
Commit MetaInfo

Revision: 582949d0d56cfa883c5ed8cfacda9808b8898103 (tree)
Time: 2022-08-31 01:44:03
Author: Albert Mietus < albert AT mietus DOT nl >
Committer: Albert Mietus < albert AT mietus DOT nl >

Log Message

asis

Change Summary

Diff

diff -r e917f1b49679 -r 582949d0d56c CCastle/2.Analyse/8.ConcurrentComputingConcepts.rst
--- a/CCastle/2.Analyse/8.ConcurrentComputingConcepts.rst Wed Aug 24 18:15:17 2022 +0200
+++ b/CCastle/2.Analyse/8.ConcurrentComputingConcepts.rst Tue Aug 30 18:44:03 2022 +0200
@@ -1,13 +1,4 @@
1-.. .. include:: /std/localtoc.irst
2-
3-.. sidebar:: On this page (override:4)
4- :class: localtoc
5-
6- .. contents::
7- :depth: 4
8- :local:
9- :backlinks: none
10-
1+.. include:: /std/localtoc.irst
112
123 .. _ConcurrentComputingConcepts:
134
@@ -19,13 +10,15 @@
1910 :category: Castle DesignStudy
2011 :tags: Castle, Concurrency, DRAFT
2112
22- Shortly, more and more cores will become available alike I described in “:ref:`BusyCores`”. Castle should make it
23- easy to write code for all of them: not to keep them busy, but to maximize speed up [useCase: :need:`U_ManyCore`].
24- We also discussed threads_: they do not scale well for CPU-bound (embedded) systems. And, I introduced some (other)
25- concurrency abstractions; which do not always fit nicely in existing languages.
13+ Sooner than we may realize, even embedded systems will have many, many cores, as I described in
14+ “:ref:`BusyCores`”. Castle should make it easy to write code for all of them: not to keep them busy, but to maximize
15+ speed up [useCase: :need:`U_ManyCore`]. There I also showed that threads_ do not scale well for CPU-bound (embedded)
16+ systems. Last, I introduced some (more) concurrency abstractions. Some are great, but they often do not fit
17+ nicely in existing languages.
18+
19+ Still, as Castle is a new language, we have the opportunity to select such a concept and incorporate it into the
20+ language ...
2621 |BR|
27- As Castle is a new language we have the opportunity to select such a concept and incorporate it into the language ...
28-
2922 In this blog, we explore a bit of theory. I will focus on semantics and the possibilities to implement them
3023 efficiently. The exact syntax will come later.
3124
@@ -33,10 +26,10 @@
3326 *****************
3427
3528 There are many theories available, and some more practical expertise, but they hardly share a common vocabulary.
36-For that reason, let describe some basic terms, that will be used in these blogs. As always, we use Wikipedia as common
37-ground, and add links for a deep-dive.
29+For that reason, let’s describe some basic terms that will be used in these blogs. As always, we use Wikipedia as common
30+ground and add links for a deep dive.
3831 |BR|
39-Again, we use ‘task’ as the most generic term of work-to-be-executed; that can be (in) a process, (on) a thread, (by) a
32+Again, we use ‘task’ as the most generic term for work-to-be-executed; that can be (in) a process, (on) a thread, (by) a
4033 computer, etc.
4134
4235 .. include:: CCC-sidebar-concurrency.irst
@@ -46,18 +39,17 @@
4639
4740 Concurrency_ is the **ability** to “compute” multiple *tasks* at the same time.
4841 |BR|
49-Designing concurrent software isn’t that complicated but; demands another mindset the when we write software that does
50-one tasks afer the other.
42+Designing concurrent software isn’t that complicated, but it demands another mindset than when we write software that does
43+one task after the other.
5144
5245 A typical example is a loop: suppose we have a sequence of numbers and we would like to compute the square of each one. Most
5346 developers will loop over those numbers, get one number, calculate the square, store it in another list, and continue.
54-It works, but we have also instructed the computer to do it in sequence — especially when the
55-task is bit more complicated, the compiler does know whether the ‘next task’ depends on the current one, and can’t
56-optimise it.
47+It works, but we have also instructed the computer to do it in sequence. Especially when the task is a bit more
48+complicated, the compiler doesn’t know whether the ‘next task’ depends on the current one, and can’t optimize it.
5749
58-A better plan is to tell the compiler about the tasks; most are independently: square a number. There is also one that
59-has to be run at the end: combine the results into a new list. And one is bit funny: distribute the sequence-elements
60-over the “square-tasks” — clearly, one has to start with this one, but it can be concurrent with many others too.
50+A better plan is to tell the compiler about different tasks. Most are independent: square a number. There is also one
51+that has to be run at the end: combine the results into a new list. And one is a bit funny: distribute the elements over
52+the “square tasks”. Clearly one has to start with this one, but it can be concurrent with many others too.
6153 |BR|
6254 This is *not* a parallel algorithm. When not specifying the order, we allow parallel execution. We do not demand it;
6355 sequential execution is allowed too.
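As an illustration, here is a minimal Python sketch (with made-up names; the ``multiprocessing`` pool merely stands in for whatever scheduler a compiler might pick). The loop prescribes an order, element by element; ``map()`` only states the independent square-tasks:

.. code-block:: python

   from multiprocessing import Pool

   def square(n):
       return n * n

   if __name__ == "__main__":
       numbers = list(range(1000))

       # Sequential: the loop prescribes an order, element by element.
       squares_seq = [square(n) for n in numbers]

       # Concurrent: map() only states the independent square-tasks; the
       # pool may distribute them, as long as the result order is kept.
       with Pool() as pool:
           squares_par = pool.map(square, numbers)

       assert squares_seq == squares_par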
@@ -66,54 +58,54 @@
6658 Parallelism
6759 ===========
6860
69-Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus running multiple concurrent
70-task (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a thousand
71-independent tasks (at least) to gain maximal speed up. A thousand at any moment!
61+Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus on running many
62+concurrent tasks (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a
63+thousand independent tasks (at least) to gain maximal speed up. A thousand at any moment!
7264 |BR|
73-It’s not only about doing a thousand tasks at the same time (that is not to complicated, for a computer), but also —
65+It’s not only about doing a thousand tasks at the same time (that is not too complicated for a computer), but also —
7466 probably: mostly — about finishing a thousand times faster…
7567
76-With many cores, multiple program-steps can be executed at the same time: from changing the same variable, acces the
77-same memory, or compete for new memory. And when solving that, we introduce new hazards: like deadlocks_ and even
78-livelocks_.
68+With many cores, multiple “program lines” can be executed at the same time, which can introduce unforeseen effects:
69+changing the same variable, accessing the same memory, or competing for new, “free” memory. And when solving that, we
70+introduce new hazards, like deadlocks_ and even livelocks_.
7971
8072
8173 Distributed
8274 -----------
8375
84-A special form of parallelisme is Distributed-Computing_: compute on many computers. Many experts consider this
85-as an independent field of expertise; still --as Multi-Core_ is basically “many computers on a chips”-- its there is an
86-analogy [#DistributedDiff]_, and we should use the know-how that is available, to design out “best ever language”.
76+A special form of parallelism is Distributed-Computing_: computing on many computers. Many experts consider this
77+an independent field of expertise. Still --as Multi-Core_ is basically “many computers on a chip”-- it’s an
78+available, adjacent [#DistributedDiff]_ theory, and we should use it to design our “best ever language”.
8779
8880 .. include:: CCC-sidebar-CS.irst
8981
9082 Efficient Communication
9183 ***********************
9284
93-When multiple tasks run in concurrently, they have to communicate to pass data and to controll progress. Unlike in a
94-sequential program -- where the controll is trivial, as is sharing data-- this needs a bit of extra effort.
85+When multiple tasks run concurrently, they have to communicate to pass data and control progress. Unlike in a
86+sequential program -- where the control is trivial, as is sharing data-- this needs a bit of extra effort.
9587 |BR|
96-There are two main approches: shared-data of message-passing; we will introduce them below.
88+There are two main approaches: shared-data or message-passing; we will introduce them below.
9789
98-Communication takes time, especially *wall-time* [#wall-time]_ (or real time) and may slow down computing. Therefore
99-communication has to be efficient. This is a hard problem and becomes harder when we have more communication, more
100-concurrently (& parallel) tasks, and/or those task are shorter living. Or better: is depends on the ratio
101-communication-time over time-between-two-communications.
90+Communication takes time, especially *wall time* [#wall-time]_ (or clock time), and may slow down computing. Therefore,
91+communication has to be efficient. This is an arduous problem and becomes harder when we have more communication, more
92+concurrency, more parallelism, and/or those tasks are short(er)-living. Or better: it depends on the ratio between the
93+communication-time and the time-between-two-communications.
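For example (illustrative numbers): when one communication costs 1 µs and a task computes for 100 µs between two communications, the communication overhead is about 1%. When the task computes for only 2 µs, that same 1 µs is a third of the wall-time -- and adding more cores hardly pays off.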
10294
10395
10496 Shared Memory
10597 =============
10698
107-In this model all tasks (usually threads or process) have some shared/common memory; typically “variables”. As the acces
108-is asynchronous, the risk exist the data is updated “at the same time” by two or more tasks. This can lead to invalid
99+In this model all tasks (usually threads or processes) have some shared/common memory; typically “variables”. As the access
100+is asynchronous, the risk exists that the data is updated “at the same time” by two or more tasks. This can lead to invalid
109101 data and so Critical-Sections_ are needed.
110102
111-This is a very basic model which assumes that there is physically memory that can be shared. In distributed systems this
112-is uncommon; but for threads it’s straightforward. As disadvantage of this model is that is hazardous: Even when a
113-single access to such a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_.
103+This is a very basic model which assumes that there is physical memory that can be shared. In distributed systems this
104+is uncommon, but for threads it’s straightforward. A disadvantage of this model is that it is hazardous: even when a
105+single modifier of a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_.
114106
115-The advantage of shared memory is the short communication-time. The wall-time and CPU-time are roughly the same: the
116-time to write & read the variable, added to the (overhead) time for the critical section -- which is typical the
107+The advantage of shared memory is the fast *communication-time*. The wall-time and CPU-time are roughly the same: the
108+time to write & read the variable, added to the (overhead) time for the critical section -- which is typically the
117109 bigger part.
118110
119111
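A minimal sketch of this model in Python (the names are illustrative): two threads update one shared variable, and the ``Lock`` marks the read-modify-write as a critical section. Without it, updates can interleave and get lost.

.. code-block:: python

   import threading

   counter = 0                     # the shared/common "variable"
   lock = threading.Lock()         # guards the critical section

   def add_many(times):
       global counter
       for _ in range(times):
           with lock:              # enter/leave the critical section
               counter += 1        # read-modify-write: not atomic by itself

   threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()
   print(counter)                  # reliably 200000 -- only thanks to the lock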
@@ -122,29 +114,29 @@
122114
123115 A more modern approach is Message-Passing_: a task sends some information to another; this can be a message, some data,
124116 or an event. In all cases, there is a distinct sender and receiver -- and apparently no common/shared memory-- so no
125-Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages can be used by all kind of task; even in a
126-distributed system -- then the message (and it data) is serialised, transmitted over a network and deserialised. Which
127-can introduce some overhead and delay.
117+Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages are easier to use and more generic: they can be
118+used in single-, multi-, and many-core systems. Even distributed systems are possible -- then the message (and its data)
119+is serialised, transmitted over a network, and deserialised.
128120
129-As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an the event-loop). They have
130-separate history, but are quite similar in nature. Like a “message”, the “event” is also used to share data (& controll)
131-to isolated “tasks”.
121+As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an event-loop). They have separate
122+histories but are quite similar in nature. Like a “message”, the “event” is also used to share data (& control) with
123+isolated “tasks”.
132124
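The idea, sketched with Python’s ``queue.Queue`` (illustrative; a real channel implementation would differ): sender and receiver share no variables, only messages, and the queue does the locking internally -- just as a compiler could insert it.

.. code-block:: python

   import threading
   import queue

   channel = queue.Queue()         # the "channel"; no shared variables needed

   def sender():
       for n in range(3):
           channel.put(("square", n))     # a message: (kind, data)
       channel.put(("done", None))

   def receiver():
       while True:
           kind, data = channel.get()     # blocks until a message arrives
           if kind == "done":
               break
           print(f"received {data}, square = {data * data}")

   threading.Thread(target=sender).start()
   receiver()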
133125 .. warning::
134126
135- Many people use the networking mental model when they thing about Message-Passing_, and *wrongly* assume there is
127+ Many people use the networking mental model when they think about Message-Passing_, and *wrongly* assume there is
136128 always serialisation (and network) overhead. This is not needed for parallel cores as they typically have shared
137129 (physical) memory.
138130
139- Then, we can use the message abstraction at developer-level, and let the compiler will translate that it into shared
140- memory instructions at processor level.
131+ Then, we can use the message abstraction at developer-level, and let the compiler translate that into shared
132+ memory instructions for the processor level.
141133 |BR|
142134 Notice: As the compiler will insert the (low level) Semaphores_, the risk that a developer forgets one is gone!
143135
144-Aspects
145--------
136+Messaging Aspects
137+-----------------
146138
147-There are many variant on messaging, mostly combinations some fundamental aspects
139+There are many variants of messaging, mostly combinations of some fundamental aspects. Let me mention some basic ones.
148140
149141 .. include:: CCC-sidebar-async.irst
150142
@@ -196,14 +188,46 @@
196188 Messages --or actually the channel that transports them-- can be *unidirectional*: from sender to receiver only;
197189 *bidirectional*: both sides can send and receive; or *broadcasted*: one message is sent to many receivers [#anycast]_.
198190
199-Reliability
200-~~~~~~~~~~~
191+Reliability & Order
192+~~~~~~~~~~~~~~~~~~~
201193
202-Especially when studying “network messages”, we have to consider Reliability_ too: Will a send message always be
203-received (and will they be received in the same order, and/or only once. This has advantages, but disadvantages too: a
204-reliable connection (over an unreliable network) needs more overhead and has more delay.
194+Especially when studying “network messages”, we have to consider Reliability_ too. Many developers assume that a sent
195+message is always received and that when multiple messages are sent, they are received in the same order. In most
196+traditional --single-core-- applications this is always true. With networking applications, this is not always the
197+case. Messages can get lost, be received out of order, or even be read twice; although it is always possible to add a
198+“reliability layer”.
199+|BR|
200+Such a layer makes writing the application easier but introduces overhead, and is therefore not always the right solution.
205201
206-This also applies to software-message, although in lesser extent.
202+In Castle, we have “active components”: many cores are running in parallel, all doing a part of the overall (concurrent)
203+program. This resembles a networking application -- even while there is no real network -- where at least three nodes
204+are active.
205+
206+This is a bit more complicated, so let us start with an example. Say we have 3 components ``A``, ``B1``, and
207+``B2``. All are connected to all others. We assume that messages are unbuffered, non-blocking, never get lost, and that
208+two messages over the same channel are never out-of-order. Sounds simple, doesn’t it?
209+|BR|
210+Now say that ``A`` sends a message (`m1`) to ``B1`` and then one (`m2`) to ``B2``. The “B components” will --on
211+receiving a message from ``A``-- send a short message to the other one (`m3` and `m4`). And that message triggers
212+(again both in ``B1`` and ``B2``) the sending of an answer to ``A``; so `m5` and `m6`.
213+
214+Now the question is: in which order will those answers be received (in ``A``)?
215+|BR|
216+The real answer is: you don’t know!
217+|BR|
218+It’s clear that ``A`` will get `m5` and `m6` -- given that all messages (aka channels) are reliable. But there are many
219+ways those messages may arrive in the opposite order. Presumably even in more ways than you can imagine. For example,
220+``B1`` might process `m4` before it processes `m1`! This can happen when channel ``A->B1`` is *slow*, or when ``B2``
221+gets CPU-time before ``B1``, or...
222+
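A small simulation of this scenario (an illustrative Python sketch; the random delays stand in for slow channels and scheduling): every channel is FIFO and loses nothing, yet ``A`` receives `m5` and `m6` in a different order per run.

.. code-block:: python

   import threading, queue, random, time

   inbox = {name: queue.Queue() for name in ("A", "B1", "B2")}

   def send(dst, msg):
       time.sleep(random.uniform(0, 0.01))   # channels differ in speed...
       inbox[dst].put(msg)                   # ...but each one stays FIFO

   def b_component(me, other):
       inbox[me].get()                       # m1/m2 from A (or m3/m4 first!)
       send(other, f"m3/m4 from {me}")
       inbox[me].get()                       # the short message from the other B
       send("A", f"answer (m5/m6) from {me}")

   threading.Thread(target=b_component, args=("B1", "B2")).start()
   threading.Thread(target=b_component, args=("B2", "B1")).start()
   send("B1", "m1")
   send("B2", "m2")
   print(inbox["A"].get(), "|", inbox["A"].get())   # order differs per run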
223+When we add buffering, more connected components, etc., this *“network”* acts less reliably than we might expect (even
224+though each message is reliable). When we add some real-time demands (see below), the ability to use/model a solution
225+using unreliable messages becomes attractive ...
226+|BR|
227+It’s not that you should always favor unreliable, out-of-order messages. Certainly not! But as we are designing a new
228+language --one that should run efficiently on thousands of cores, in a real-time embedded system-- the
229+option to utilize them may be beneficial.
230+
207231
208232 .. hint::
209233
diff -r e917f1b49679 -r 582949d0d56c CCastle/2.Analyse/CCC-sidebar-CS.irst
--- a/CCastle/2.Analyse/CCC-sidebar-CS.irst Wed Aug 24 18:15:17 2022 +0200
+++ b/CCastle/2.Analyse/CCC-sidebar-CS.irst Tue Aug 30 18:44:03 2022 +0200
@@ -3,29 +3,29 @@
33
44 .. sidebar:: About Critical Sections
55
6- For those, who are not familiar with Critical-Sections_ and/or Semaphores_, a short intro.
7-
8- .. rubric:: Dilemma: Statements are not atomic
6+ For those who are not familiar with Critical-Sections_ or Semaphores_, here is a short intro.
97
10- Unlike some developers presume “code-lines” are not *‘atomic’*: they can be interrupted. When using (e.g) threads_,
11- the “computer” can pause one thread halfway a statement, to run another one temporally and continue a millisecond
12- later. When that happens when writing or reading a variable and the other thread also access the same shared-memory,
13- the result is unpredictable. To prevent that, we need to controle the handling that variable: make it a
8+ .. rubric:: Dilemma: Statements are not atomic.
9+
10+ Unlike what some developers presume, *“code lines”* are not *‘atomic’*: they can be interrupted. When using (e.g.) threads_,
11+ the “computer” can pause one thread halfway through a statement to run another one temporarily and continue a millisecond
12+ later. When it happens during writing or reading a variable, and the other thread also accesses the same shared-memory,
13+ the result is unpredictable. To prevent that, we need to control the handling of that variable: make it a
1414 Critical-Section_.
1515
16- .. rubric:: Solve it by marking Sections as exclusive
16+ .. rubric:: Solve it by marking sections *‘exclusive’*.
1717
18- In essense, we have to tell the “computer” that a line (of a few lines) are *atomic*; to make access exclusive The
19- the compiler will add some extra fundamental instructions (specific for that CPU-type) to assure this. A check is
18+ In essence, we have to tell the “computer” that a line (or a few lines) is *atomic*. To make the access exclusive,
19+ the compiler will add some extra fundamental instructions (specific for that type of CPU) to assure this. A check is
2020 inserted just before the section is entered, and the thread will be suspended when another task is using it. When
21- granted acces, a bit of bookkeeping is done -- so the “check” in other thread is halted). That bookkeeping is updated
22- and when leaving. Along with more bookkeeping to un-pauze the suspended threads.
21+ access is granted, a bit of bookkeeping is done -- so that the “check” in other threads will halt. That bookkeeping
22+ is updated when leaving, along with more bookkeeping to un-pause the suspended threads.
2323
24- .. rubric:: Complication: overhead
24+ .. rubric:: Complication: overhead!
2525
2626 As you can imagine, this “bookkeeping” is extra complicated on a Multi-Core_ system; some global data structure is
2727 needed, which is a Critical-Section in itself.
2828 |BR|
2929 There are many algorithms to solve this. All with the same disadvantage: it takes a bit of time -- possibly by
30- “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections a usually short (e.g a assignment, or a
31- few lines) the overhead can be (relatively) huge [#timesCPU]_!
30+ “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections are usually short (e.g. one assignment, or
31+ a few lines), the overhead can be (relatively) huge [#timesCPU]_!
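That a single statement is really several interruptible steps can be seen in Python’s own bytecode (a sketch; the exact opcodes vary per version):

.. code-block:: python

   import dis

   counter = 0

   def bump():
       global counter
       counter += 1    # one "code line"...

   # ...but several bytecode steps (roughly: LOAD, ADD, STORE).
   # A thread switch between those steps corrupts the update.
   dis.dis(bump)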
diff -r e917f1b49679 -r 582949d0d56c CCastle/2.Analyse/CCC-sidebar-concurrency.irst
--- a/CCastle/2.Analyse/CCC-sidebar-concurrency.irst Wed Aug 24 18:15:17 2022 +0200
+++ b/CCastle/2.Analyse/CCC-sidebar-concurrency.irst Tue Aug 30 18:44:03 2022 +0200
@@ -1,4 +1,4 @@
1-.. -*- rst -*-
1+.. -*-rst-*-
22 included in `8.BusyCores-concepts.rst`
33
44 .. sidebar::
@@ -15,7 +15,7 @@
1515 for n in L1:
1616 L2.append(power(n))
1717
18- .. note:: As ``power()`` could have side-effects, the compiler **must** keep the defined order!
18+ .. note:: As ``power()`` could have side effects, the compiler **must** keep the defined order!
1919
2020 .. tab:: Concurrent
2121
@@ -28,8 +28,8 @@
2828 .. note::
2929
3030 Although (current) python-compilers will run it sequentially, it is *allowed* to distribute it; even when
31- ``power()`` has side-effects!
31+ ``power()`` has side effects!
3232 |BR|
3333 As long as *python* puts the results in the correct order in list ``L2``, **any order** is allowed. “Out of
34- order” side-effects are allowed by this code.
34+ order” side effects are allowed by this code.
3535