Revision | 582949d0d56cfa883c5ed8cfacda9808b8898103 (tree) |
---|---|
Time | 2022-08-31 01:44:03 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
asis
@@ -1,13 +1,4 @@ | ||
1 | -.. .. include:: /std/localtoc.irst | |
2 | - | |
3 | -.. sidebar:: On this page (override:4) | |
4 | - :class: localtoc | |
5 | - | |
6 | - .. contents:: | |
7 | - :depth: 4 | |
8 | - :local: | |
9 | - :backlinks: none | |
10 | - | |
1 | +.. include:: /std/localtoc.irst | |
11 | 2 | |
12 | 3 | .. _ConcurrentComputingConcepts: |
13 | 4 |
@@ -19,13 +10,15 @@ | ||
19 | 10 | :category: Castle DesignStudy |
20 | 11 | :tags: Castle, Concurrency, DRAFT |
21 | 12 | |
22 | - Shortly, more and more cores will become available alike I described in “:ref:`BusyCores`”. Castle should make it | |
23 | - easy to write code for all of them: not to keep them busy, but to maximize speed up [useCase: :need:`U_ManyCore`]. | |
24 | - We also discussed threads_: they do not scale well for CPU-bound (embedded) systems. And, I introduced some (other) | |
25 | - concurrency abstractions; which do not always fit nicely in existing languages. | |
13 | + Sooner than we may realize, even embedded systems will have many, many cores, as I described in | 
14 | + “:ref:`BusyCores`”. Castle should make it easy to write code for all of them: not to keep them busy, but to maximize | |
15 | + speed up [useCase: :need:`U_ManyCore`]. There I also showed that threads_ do not scale well for CPU-bound (embedded) | |
16 | + systems. Last, I introduced some (more) concurrency abstractions. Some are great, but they often do not fit | |
17 | + nicely in existing languages. | |
18 | + | |
19 | + Still, as Castle is a new language, we have the opportunity to select such a concept and incorporate it into the | 
20 | + language ... | |
26 | 21 | |BR| |
27 | - As Castle is a new language we have the opportunity to select such a concept and incorporate it into the language ... | |
28 | - | |
29 | 22 | In this blog, we explore a bit of theory. I will focus on semantics and the possibilities to implement them |
30 | 23 | efficiently. The exact syntax will come later. |
31 | 24 |
@@ -33,10 +26,10 @@ | ||
33 | 26 | ***************** |
34 | 27 | |
35 | 28 | There are many theories available, and some more practical expertise, but they hardly share a common vocabulary.
36 | -For that reason, let describe some basic terms, that will be used in these blogs. As always, we use Wikipedia as common | |
37 | -ground, and add links for a deep-dive. | |
29 | +For that reason, let’s describe some basic terms that will be used in these blogs. As always, we use Wikipedia as common | 
30 | +ground and add links for a deep dive. | |
38 | 31 | |BR| |
39 | -Again, we use ‘task’ as the most generic term of work-to-be-executed; that can be (in) a process, (on) a thread, (by) a | |
32 | +Again, we use ‘task’ as the most generic term for work-to-be-executed; that can be (in) a process, (on) a thread, (by) a | |
40 | 33 | computer, etc. |
41 | 34 | |
42 | 35 | .. include:: CCC-sidebar-concurrency.irst |
@@ -46,18 +39,17 @@ | ||
46 | 39 | |
47 | 40 | Concurrency_ is the **ability** to “compute” multiple *tasks* at the same time. |
48 | 41 | |BR| |
49 | -Designing concurrent software isn’t that complicated but; demands another mindset the when we write software that does | |
50 | -one tasks afer the other. | |
42 | +Designing concurrent software isn’t that complicated, but it demands another mindset than when we write software that does | 
43 | +one task after the other. | |
51 | 44 | |
52 | 45 | A typical example is a loop: suppose we have a sequence of numbers and we’d like to compute the square of each one. Most
53 | 46 | developers will loop over those numbers, get one number, calculate the square, store it in another list, and continue. |
54 | -It works, but we have also instructed the computer to do it in sequence — especially when the | |
55 | -task is bit more complicated, the compiler does know whether the ‘next task’ depends on the current one, and can’t | |
56 | -optimise it. | |
47 | +It works, but we have also instructed the computer to do it in sequence. Especially when the task is a bit more | |
48 | +complicated, the compiler cannot know whether the ‘next task’ depends on the current one, and so can’t optimize it. | 
57 | 49 | |
58 | -A better plan is to tell the compiler about the tasks; most are independently: square a number. There is also one that | |
59 | -has to be run at the end: combine the results into a new list. And one is bit funny: distribute the sequence-elements | |
60 | -over the “square-tasks” — clearly, one has to start with this one, but it can be concurrent with many others too. | |
50 | +A better plan is to tell the compiler about different tasks. Most are independent: square a number. There is also one | |
51 | +that has to be run at the end: combine the results into a new list. And one is a bit funny: distribute the elements over | |
52 | +the “square tasks”. Clearly one has to start with this one, but it can be concurrent with many others too. | |
61 | 53 | |BR| |
62 | 54 | This is *not* a parallel algorithm. When not specifying the order, we allow parallel execution. We do not demand it;
63 | 55 | sequential execution is allowed too. |
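
To make this concrete, here is how such a loop could be sketched in today's Python (only an illustration; Castle's syntax is still open). With ``concurrent.futures`` we describe the square-tasks as independent units of work, so the runtime *may* run them in parallel -- while sequential execution stays allowed:

.. code-block:: python

   from concurrent.futures import ProcessPoolExecutor

   def square(n: int) -> int:       # an independent task
       return n * n

   if __name__ == "__main__":
       numbers = list(range(10))
       with ProcessPoolExecutor() as pool:
           # map() distributes the elements (the "funny" first task), lets
           # the square-tasks run in any order, and combines the results
           # -- in input order -- as the one task that must run at the end.
           squares = list(pool.map(square, numbers))
       print(squares)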
@@ -66,54 +58,54 @@ | ||
66 | 58 | Parallelism |
67 | 59 | =========== |
68 | 60 | |
69 | -Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus running multiple concurrent | |
70 | -task (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a thousand | |
71 | -independent tasks (at least) to gain maximal speed up. A thousand at any moment! | |
61 | +Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus on running many | 
62 | +concurrent tasks (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a | |
63 | +thousand independent tasks (at least) to gain maximal speed up. A thousand at any moment! | |
72 | 64 | |BR| |
73 | -It’s not only about doing a thousand tasks at the same time (that is not to complicated, for a computer), but also — | |
65 | +It’s not only about doing a thousand tasks at the same time (that is not too complicated for a computer), but also — | 
74 | 66 | probably: mostly — about finishing a thousand times faster… |
75 | 67 | |
76 | -With many cores, multiple program-steps can be executed at the same time: from changing the same variable, acces the | |
77 | -same memory, or compete for new memory. And when solving that, we introduce new hazards: like deadlocks_ and even | |
78 | -livelocks_. | |
68 | +With many cores, multiple “program lines” can be executed at the same time, which can introduce unforeseen effects: | |
69 | +changing the same variable, accessing the same memory, or competing for new, “free” memory. And when solving that, we | |
70 | +introduce new hazards, like deadlocks_ and even livelocks_. | 
79 | 71 | |
80 | 72 | |
81 | 73 | Distributed |
82 | 74 | ----------- |
83 | 75 | |
84 | -A special form of parallelisme is Distributed-Computing_: compute on many computers. Many experts consider this | |
85 | -as an independent field of expertise; still --as Multi-Core_ is basically “many computers on a chips”-- its there is an | |
86 | -analogy [#DistributedDiff]_, and we should use the know-how that is available, to design out “best ever language”. | |
76 | +A special form of parallelism is Distributed-Computing_: computing on many computers. Many experts consider this | |
77 | +an independent field of expertise. Still --as Multi-Core_ is basically “many computers on a chip”-- it’s an | |
78 | +available, adjacent [#DistributedDiff]_ theory, and we should use it to design our “best ever language”. | 
87 | 79 | |
88 | 80 | .. include:: CCC-sidebar-CS.irst |
89 | 81 | |
90 | 82 | Efficient Communication |
91 | 83 | *********************** |
92 | 84 | |
93 | -When multiple tasks run in concurrently, they have to communicate to pass data and to controll progress. Unlike in a | |
94 | -sequential program -- where the controll is trivial, as is sharing data-- this needs a bit of extra effort. | |
85 | +When multiple tasks run concurrently, they have to communicate to pass data and control progress. Unlike in a | |
86 | +sequential program -- where the control is trivial, as is sharing data -- this needs a bit of extra effort. | 
95 | 87 | |BR| |
88 | +There are two main approaches: shared-data or message-passing; we will introduce them below. | 
88 | +There are two main approaches: shared-data of message-passing; we will introduce them below. | |
97 | 89 | |
98 | -Communication takes time, especially *wall-time* [#wall-time]_ (or real time) and may slow down computing. Therefore | |
99 | -communication has to be efficient. This is a hard problem and becomes harder when we have more communication, more | |
100 | -concurrently (& parallel) tasks, and/or those task are shorter living. Or better: is depends on the ratio | |
101 | -communication-time over time-between-two-communications. | |
90 | +Communication takes time, especially *wall time* [#wall-time]_ (or clock time) and may slow down computing. Therefore | |
91 | +communication has to be efficient. This is an arduous problem and becomes harder when we have more communication, more | |
92 | +concurrency, more parallelism, and/or those tasks are short(er)-living. Or better: it depends on the ratio between the | 
93 | +communication-time and the time-between-two-communications. | |
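
For intuition, a back-of-the-envelope calculation (all numbers invented for illustration): if one communication costs 1 µs and a task computes for 10 µs between two communications, roughly 9% of the wall-time is spent communicating -- and shorter-living tasks make that ratio worse.

.. code-block:: python

   comm_time = 1e-6   # seconds per communication (assumed)
   work_time = 10e-6  # compute-time between two communications (assumed)

   overhead = comm_time / (comm_time + work_time)
   print(f"{overhead:.1%} of wall-time is spent communicating")  # ~9.1%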
102 | 94 | |
103 | 95 | |
104 | 96 | Shared Memory |
105 | 97 | ============= |
106 | 98 | |
107 | -In this model all tasks (usually threads or process) have some shared/common memory; typically “variables”. As the acces | |
108 | -is asynchronous, the risk exist the data is updated “at the same time” by two or more tasks. This can lead to invalid | |
99 | +In this model all tasks (usually threads or processes) have some shared/common memory; typically “variables”. As the access | |
100 | +is asynchronous, the risk exists that the data is updated “at the same time” by two or more tasks. This can lead to invalid | 
109 | 101 | data and so Critical-Sections_ are needed. |
110 | 102 | |
111 | -This is a very basic model which assumes that there is physically memory that can be shared. In distributed systems this | |
112 | -is uncommon; but for threads it’s straightforward. As disadvantage of this model is that is hazardous: Even when a | |
113 | -single access to such a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_. | |
103 | +This is a very basic model which assumes that there is physical memory that can be shared. In distributed systems this | |
104 | +is uncommon, but for threads it’s straightforward. A disadvantage of this model is that it is hazardous: Even when a | 
105 | +single modification of a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_. | 
114 | 106 | |
115 | -The advantage of shared memory is the short communication-time. The wall-time and CPU-time are roughly the same: the | |
116 | -time to write & read the variable, added to the (overhead) time for the critical section -- which is typical the | |
107 | +The advantage of shared memory is the fast *communication-time*. The wall-time and CPU-time are roughly the same: the | |
108 | +time to write & read the variable added to the (overhead) time for the critical section -- which is typically the | |
117 | 109 | bigger part. |
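
A minimal sketch with plain Python threads_ (not Castle) shows both the model and its hazard: the one-line update ``counter += 1`` is a read-modify-write, so it must be wrapped in a critical section; without the lock, concurrent updates can get lost.

.. code-block:: python

   import threading

   counter = 0              # the shared variable
   lock = threading.Lock()  # guards the critical section

   def work(iterations: int) -> None:
       global counter
       for _ in range(iterations):
           with lock:        # enter/leave the Critical-Section
               counter += 1  # read-modify-write: not atomic on its own

   threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
   for t in threads: t.start()
   for t in threads: t.join()
   print(counter)  # 400000; without the lock, updates may get lost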
118 | 110 | |
119 | 111 |
@@ -122,29 +114,29 @@ | ||
122 | 114 | |
123 | 115 | A more modern approach is Message-Passing_: a task sends some information to another; this can be a message, some data, |
124 | 116 | or an event. In all cases, there is a distinct sender and receiver -- and apparently no common/shared memory-- so no |
125 | -Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages can be used by all kind of task; even in a | |
126 | -distributed system -- then the message (and it data) is serialised, transmitted over a network and deserialised. Which | |
127 | -can introduce some overhead and delay. | |
117 | +Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages are easier to use and more generic: they can be | |
118 | +used in single-, multi-, and many-core systems. Even distributed systems are possible -- then the message (and its data) | |
119 | +is serialised, transmitted over a network, and deserialised. | |
128 | 120 | |
129 | -As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an the event-loop). They have | |
130 | -separate history, but are quite similar in nature. Like a “message”, the “event” is also used to share data (& controll) | |
131 | -to isolated “tasks”. | |
121 | +As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an event-loop). They have separate | |
122 | +histories but are quite similar in nature. Like a “message”, the “event” is also used to share data (& control) with | 
123 | +isolated “tasks”. | |
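
Again a small Python illustration (the concept only, not Castle's syntax): the queue acts as the channel; sender and receiver share no variables, so no explicit critical section appears in the user code.

.. code-block:: python

   import queue
   import threading

   channel: queue.Queue = queue.Queue()  # the message channel

   def producer() -> None:
       for n in range(5):
           channel.put(n * n)  # send a message (here: some data)
       channel.put(None)       # a sentinel: "no more messages"

   def consumer() -> None:
       while (msg := channel.get()) is not None:
           print("received:", msg)

   threading.Thread(target=producer).start()
   consumer()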
132 | 124 | |
133 | 125 | .. warning:: |
134 | 126 | |
135 | - Many people use the networking mental model when they thing about Message-Passing_, and *wrongly* assume there is | |
127 | + Many people use the networking mental model when they think about Message-Passing_, and *wrongly* assume there is | |
136 | 128 | always serialisation (and network) overhead. This is not needed for parallel cores as they typically have shared |
137 | 129 | (physical) memory. |
138 | 130 | |
139 | - Then, we can use the message abstraction at developer-level, and let the compiler will translate that it into shared | |
140 | - memory instructions at processor level. | |
131 | + Then, we can use the message abstraction at developer-level, and let the compiler translate that into shared | |
132 | + memory instructions for the processor level. | |
141 | 133 | |BR| |
142 | 134 | Notice: As the compiler will insert the (low level) Semaphores_, the risk that a developer forgets one is gone! |
143 | 135 | |
144 | -Aspects | |
145 | -------- | |
136 | +Messaging Aspects | |
137 | +----------------- | |
146 | 138 | |
147 | -There are many variant on messaging, mostly combinations some fundamental aspects | |
139 | +There are many variants on messaging, mostly combinations of some fundamental aspects. Let’s mention some basic ones. | 
148 | 140 | |
149 | 141 | .. include:: CCC-sidebar-async.irst |
150 | 142 |
@@ -196,14 +188,46 @@ | ||
196 | 188 | Messages --or actually the channels that transport them-- can be *unidirectional*: from sender to receiver only;
197 | 189 | *bidirectional*: both sides can send and receive; or *broadcasted*: one message is sent to many receivers [#anycast]_.
198 | 190 | |
199 | -Reliability | |
200 | -~~~~~~~~~~~ | |
191 | +Reliability & Order | |
192 | +~~~~~~~~~~~~~~~~~~~ | |
201 | 193 | |
202 | -Especially when studying “network messages”, we have to consider Reliability_ too: Will a send message always be | |
203 | -received (and will they be received in the same order, and/or only once. This has advantages, but disadvantages too: a | |
204 | -reliable connection (over an unreliable network) needs more overhead and has more delay. | |
194 | +Especially when studying “network messages”, we have to consider Reliability_ too. Many developers assume that a sent | 
195 | +message is always received and that when multiple messages are sent, they are received in the same order. In most | 
196 | +traditional --single-core-- applications this is always true. With networking applications, this is not always | 
197 | +the case. Messages can get lost, be received out of order, or even be read twice; although it is always possible to add | 
198 | +a “reliability layer”. | 
199 | +|BR| | |
200 | +Such a layer makes writing the application easier but introduces overhead, and is therefore not always the right solution. | 
205 | 201 | |
206 | -This also applies to software-message, although in lesser extent. | |
202 | +In Castle, we have “active components”: many cores are running in parallel, all doing a part of the overall (concurrent) | 
203 | +program. This resembles a networking application -- even while there is no real network -- where at least three nodes | |
204 | +are active. | |
205 | + | |
206 | +This is a bit more complicated, so let us start with an example. Say, we have 3 components ``A``, ``B1``, and | |
207 | +``B2``. All are connected to all others. We assume that messages are unbuffered, non-blocking, never get lost, and that | 
208 | +two messages over the same channel are never out-of-order. Sounds simple, doesn’t it? | 
209 | +|BR| | |
210 | +Now suppose that ``A`` sends a message (`m1`) to ``B1`` and then one (`m2`) to ``B2``. The “B components” will --on | 
211 | +receiving a message from ``A`` -- send a short message to the other one (`m3` and `m4`). And that message triggers | 
212 | +both ``B1`` and ``B2`` to send an answer to ``A``; so `m5` and `m6`. | 
213 | + | |
214 | +Now the question is: in which order are those answers received (in ``A``)? | 
215 | +|BR| | |
216 | +The real answer is: you don’t know! | |
217 | +|BR| | |
218 | +It’s clear that ``A`` will get `m5` and `m6` -- given that all messages (aka channels) are reliable. But there are many | |
219 | +ways those messages may arrive in the opposite order; presumably in more ways than you can imagine. For example, | 
220 | +``B1`` might process `m4` before it processes `m1`! This can happen when channel ``A->B1`` is *slow*, or when ``B2`` | 
221 | +gets CPU-time before ``B1``, or... | |
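
This non-determinism is easy to reproduce with plain threads and queues (a rough model of the example above, not Castle semantics): every channel preserves its own order, yet the order in which ``A`` receives `m5` and `m6` differs from run to run.

.. code-block:: python

   import queue
   import random
   import threading
   import time

   to_A = queue.Queue()                                  # answers to A
   to_B = {"B1": queue.Queue(), "B2": queue.Queue()}     # A -> B1/B2
   between = {"B1": queue.Queue(), "B2": queue.Queue()}  # B1 <-> B2

   def b(name: str, other: str) -> None:
       to_B[name].get()                       # m1 or m2, from A
       time.sleep(random.random() / 100)      # scheduling jitter
       between[other].put(f"short msg from {name}")  # m3 / m4
       between[name].get()                    # the other's short message ...
       to_A.put(f"answer from {name}")        # ... triggers m5 / m6

   threading.Thread(target=b, args=("B1", "B2")).start()
   threading.Thread(target=b, args=("B2", "B1")).start()
   to_B["B1"].put("m1"); to_B["B2"].put("m2")  # A sends m1, then m2
   print(to_A.get(), "|", to_A.get())          # order varies per run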
222 | + | |
223 | +When we add buffering, more connected components, etc., this *“network”* acts less reliably than we might expect (even | 
224 | +though each message is reliable). When we add some real-time demands (see below), the ability to use/model a solution | |
225 | +using unreliable messages becomes attractive ... | 
226 | +|BR| | |
227 | +It’s not that you should always favor unreliable, out-of-order messages. Certainly not! But as we are designing a new | 
228 | +language --one that should run efficiently on thousands of cores, in a real-time embedded system-- the | 
229 | +option to utilize them may be beneficial. | 
230 | + | |
207 | 231 | |
208 | 232 | .. hint:: |
209 | 233 |
@@ -3,29 +3,29 @@ | ||
3 | 3 | |
4 | 4 | .. sidebar:: About Critical Sections |
5 | 5 | |
6 | - For those, who are not familiar with Critical-Sections_ and/or Semaphores_, a short intro. | |
7 | - | |
8 | - .. rubric:: Dilemma: Statements are not atomic | |
6 | + For those who are not familiar with Critical-Sections_ or Semaphores_, here is a short intro. | 
9 | 7 | |
10 | - Unlike some developers presume “code-lines” are not *‘atomic’*: they can be interrupted. When using (e.g) threads_, | |
11 | - the “computer” can pause one thread halfway a statement, to run another one temporally and continue a millisecond | |
12 | - later. When that happens when writing or reading a variable and the other thread also access the same shared-memory, | |
13 | - the result is unpredictable. To prevent that, we need to controle the handling that variable: make it a | |
8 | + .. rubric:: Dilemma: Statements are not atomic. | |
9 | + | |
10 | + Despite what some developers presume, *“code lines”* are not *‘atomic’*: they can be interrupted. When using (e.g.) threads_, | 
11 | + the “computer” can pause one thread halfway through a statement to run another one temporarily and continue a millisecond | 
12 | + later. When it happens during writing or reading a variable, and the other thread also accesses the same shared-memory, | |
13 | + the result is unpredictable. To prevent that, we need to control the handling of that variable: make it a | |
14 | 14 | Critical-Section_. |
15 | 15 | |
16 | - .. rubric:: Solve it by marking Sections as exclusive | |
16 | + .. rubric:: Solve it by marking sections *‘exclusive’*. | |
17 | 17 | |
18 | - In essense, we have to tell the “computer” that a line (of a few lines) are *atomic*; to make access exclusive The | |
19 | - the compiler will add some extra fundamental instructions (specific for that CPU-type) to assure this. A check is | |
18 | + In essence, we have to tell the “computer” that a line (or a few lines) is *atomic*. To make the access exclusive, | 
19 | + the compiler will add some extra fundamental instructions (specific for that type of CPU) to assure this. A check is | |
20 | 20 | inserted just before the section is entered, and the thread will be suspended when another task is using it. When |
21 | - granted acces, a bit of bookkeeping is done -- so the “check” in other thread is halted). That bookkeeping is updated | |
22 | - and when leaving. Along with more bookkeeping to un-pauze the suspended threads. | |
21 | + access is granted, a bit of bookkeeping is done -- so that the “check” in other threads will halt. That bookkeeping | 
22 | + is updated when leaving, along with more bookkeeping to un-pause the suspended threads. | 
23 | 23 | |
24 | - .. rubric:: Complication: overhead | |
24 | + .. rubric:: Complication: overhead! | |
25 | 25 | |
26 | 26 | As you can imagine, this “bookkeeping” is extra complicated on a Multi-Core_ system; some global data structure is
27 | 27 | needed, which is a Critical-Section in itself.
28 | 28 | |BR| |
29 | 29 | There are many algorithms to solve this. All with the same disadvantage: it takes a bit of time -- possible by |
30 | - “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections a usually short (e.g a assignment, or a | |
31 | - few lines) the overhead can be (relatively) huge [#timesCPU]_! | |
30 | + “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections are usually short (e.g. one assignment, or | 
31 | + a few lines) the overhead can be (relatively) huge [#timesCPU]_! |
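
One way to *see* that a statement is not atomic (demonstrated in Python, purely as illustration) is to disassemble it: the single line ``counter += 1`` becomes several instructions, and a thread may be paused between any two of them.

.. code-block:: python

   import dis

   counter = 0

   def increment() -> None:
       global counter
       counter += 1  # one "code line", but several instructions

   # Prints a LOAD / add / STORE sequence (exact opcode names vary per
   # Python version); a thread-switch between the LOAD and the STORE is
   # what makes two concurrent increments lose an update.
   dis.dis(increment)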
@@ -1,4 +1,4 @@ | ||
1 | -.. -*- rst -*- | |
1 | +.. -*-rst-*- | |
2 | 2 | included in `8.BusyCores-concepts.rst` |
3 | 3 | |
4 | 4 | .. sidebar:: |
@@ -15,7 +15,7 @@ | ||
15 | 15 | for n in L1: |
16 | 16 | L2.append(power(n)) |
17 | 17 | |
18 | - .. note:: As ``power()`` could have side-effects, the compiler **must** keep the defined order! | |
18 | + .. note:: As ``power()`` could have side effects, the compiler **must** keep the defined order! | |
19 | 19 | |
20 | 20 | .. tab:: Concurrent |
21 | 21 |
@@ -28,8 +28,8 @@ | ||
28 | 28 | .. note:: |
29 | 29 | |
30 | 30 | Although (current) python-compilers will run it sequentially, it is *allowed* to distribute it; even when |
31 | - ``power()`` has side-effects! | |
31 | + ``power()`` has side effects! | |
32 | 32 | |BR| |
33 | 33 | As long as *python* puts the results in the correct order in list ``L2``, **any order** is allowed. “Out of
34 | - order” side-effects are allowed by this code. | |
34 | + order” side effects are allowed by this code. | |
35 | 35 |