Revision | f02f0584978b96c68bf30c7836aa6675fefcce7e (tree) |
---|---|
Date | 2022-08-11 23:10:02 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
@@ -226,8 +226,9 @@ | ||
226 | 226 | At the same time it should run efficiently. The goal is **not** to keep all cores *busy*, but to use (all) cores to |
227 | 227 | gain maximal speed up. |
228 | 228 | |
229 | -Distributing tasks over multiple cores (or even distributed computers [#web]_ is not new. In :ref:`ConcurrentComputingConcepts` we | |
230 | -analyse some, before we design & describe Castle’s :ref:`XXX`. | |
229 | +Distributing tasks over multiple cores (or even distributed computers) [#web]_ is not new. In | |
230 | +:ref:`ConcurrentComputingConcepts` we describe some (common, theoretical) concepts, and in :ref:`ActorAbstractions` | |
230 | +analyse a powerful approach that we would like to use for CCastle. |
231 | 232 | |
232 | 233 | ---------- |
233 | 234 |
@@ -0,0 +1,24 @@ | ||
1 | +FSMs vs Grammars | |
2 | +================= | |
3 | + | |
4 | +In case you are wondering: *“Are FSMs and Grammars related, or the same?”* | |
5 | + | |
6 | +They are related, but not equal. |
7 | + | |
8 | +As discovered by `Chomsky <https://en.wikipedia.org/wiki/Noam_Chomsky>`__ there is a `hierarchy <https://en.wikipedia.org/wiki/Chomsky_hierarchy>`__ of languages (or actually grammars). As Chomsky was studying all kinds of languages, including natural “human” language, his perception differs from our interpretation. |BR| When we (SW/Engineers) speak about grammars, we typically refer to “Context-free grammars” (`CFG <https://en.wikipedia.org/wiki/Context-free_grammar>`__) -- Chomsky classifies them as “Type-2”. |
9 | + | |
10 | +A non-deterministic PushDown machine (`PDA <https://en.wikipedia.org/wiki/Pushdown_automaton>`__) is needed to | |
11 | +recognise a “Type-2” (CFG) Grammar. (This is a simple version of a stack-machine, and a lot more restricted than a Turing Machine). |
12 | +|BR| | |
13 | +And as an FSM is one of the most restricted machines that exist --it can recognise only “Type-3” `Regular grammar |
14 | +<https://en.wikipedia.org/wiki/Regular_grammar>`__-- an FSM can’t be used to recognise a (CFG) grammar. |
15 | + | |
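To make the difference tangible, here is a minimal sketch (in Go, purely illustrative and not part of the post): recognising balanced parentheses is the classic “Type-2” job. It needs an unbounded counter -- a degenerate stack, the essence of a PDA -- which no machine with a fixed, finite number of states can emulate.

```go
package main

import "fmt"

// balanced reports whether all '(' and ')' in s are properly nested.
// The depth counter plays the role of the PDA's stack: it can grow
// without bound, which a finite-state machine (having a fixed number
// of states) cannot emulate.
func balanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '(':
			depth++ // "push"
		case ')':
			depth-- // "pop"
			if depth < 0 {
				return false // ')' without a matching '('
			}
		}
	}
	return depth == 0 // every '(' must be closed
}

func main() {
	fmt.Println(balanced("((()))")) // true
	fmt.Println(balanced("(()"))    // false
}
```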
16 | +.. seealso:: | |
17 | + | |
18 | + .. postlist:: | |
19 | + :category: Castle | |
20 | + :tags: FSM | |
21 | + | |
22 | + .. postlist:: | |
23 | + :category: Castle | |
24 | + :tags: Grammar |
@@ -15,7 +15,7 @@ | ||
15 | 15 | :need:`U_ManyCore`]. |
16 | 16 | |BR| |
17 | 17 | We also discussed threads_: they do not scale well for CPU-bound (embedded) systems. And, I introduced some |
18 | - contemporary abstractions; which do not always fit nicely in existing languages. | |
18 | + (other) concurrency abstractions, which do not always fit nicely in existing languages. |
19 | 19 | |
20 | 20 | As Castle is a new language we have the opportunity to select such a concept and incorporate it into the language ... |
21 | 21 | |BR| |
@@ -56,8 +56,8 @@ | ||
56 | 56 | sequential execution is allowed too. |
57 | 57 | |
58 | 58 | |
59 | -Parallel | |
60 | --------- | |
59 | +Parallelism | |
60 | +----------- | |
61 | 61 | |
62 | 62 | Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus on running multiple concurrent |
63 | 63 | tasks (of the same program) on “as many cores as possible”. When we assume a thousand cores, we need a thousand |
@@ -79,25 +79,35 @@ | ||
79 | 79 | analogy [#DistributedDiff]_, and we should use the know-how that is available, to design our “best ever language”. |
80 | 80 | |
81 | 81 | |
82 | -Communication | |
83 | -============= | |
82 | +Efficient Communication | |
83 | +======================= | |
84 | 84 | |
85 | 85 | When multiple tasks run concurrently, they have to communicate to pass data and to control progress. Unlike in a |
86 | -sequential program -- where the controll is trivial, as sharing data-- this needs a bit of extra effort. | |
86 | +sequential program -- where control is trivial, as is sharing data -- this needs a bit of extra effort. |
87 | 87 | |BR| |
88 | -There are two main approches: shared-data of message-passing. | |
88 | +There are two main approaches: shared-data or message-passing; we will introduce them below. |
89 | + | |
90 | +Communication takes time, especially *wall-time* [#wall-time]_ (or real time), and may slow down computing. Therefore |
91 | +communication has to be efficient. This is a hard problem, which becomes harder when we have more communication, more |
92 | +concurrent (& parallel) tasks, and/or when those tasks are shorter-lived. Or better: it depends on the ratio of |
93 | +communication-time to time-between-two-communications. |
94 | + | |
89 | 95 | |
90 | 96 | Shared Memory |
91 | 97 | ------------- |
92 | 98 | |
93 | 99 | In this model all tasks (usually threads or processes) have some shared/common memory; typically “variables”. As the access |
94 | 100 | is asynchronous, the risk exists that the data is updated “at the same time” by two or more tasks. This can lead to invalid |
95 | -data; and so Critical-Sections_ are needed. | |
101 | +data and so Critical-Sections_ are needed. | |
96 | 102 | |
97 | 103 | This is a very basic model which assumes that there is physical memory that can be shared. In distributed systems this |
98 | 104 | is uncommon; but for threads it’s straightforward. A disadvantage of this model is that it is hazardous: even when a |
99 | 105 | single access to such a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_. |
100 | 106 | |
107 | +The advantage of shared memory is the short communication-time. The wall-time and CPU-time are roughly the same: the | |
108 | +time to write & read the variable, added to the (overhead) time for the critical section -- which is typically the |
109 | +bigger part. | |
110 | + | |
101 | 111 | About Critical Sections |
102 | 112 | ~~~~~~~~~~~~~~~~~~~~~~~ |
103 | 113 | For those who are not familiar with Critical-Sections_ and/or Semaphores, a very short intro. |
@@ -107,7 +117,7 @@ | ||
107 | 117 | that happens when writing or reading a variable and another thread also accesses the same shared-memory, the result is |
108 | 118 | unpredictable. To prevent that, we need to control the handling of that variable: make it a Critical-Section_. |
109 | 119 | |
110 | -In essense, we have to tell the “computer” to make that line (of a few lines) atomic; make it exclusive for a task. Then | |
120 | +In essence, we have to tell the “computer” that a line (or a few lines) is *atomic*, to make access exclusive. Then |
111 | 121 | the compiler will add some extra fundamental instructions (specific for that CPU-type) to assure this. A check is |
112 | 122 | inserted just before the section is entered, and the thread will be suspended when another task is using it. When |
113 | 123 | granted access, a bit of bookkeeping is done -- so the “check” in another thread is halted. That bookkeeping is updated |
@@ -118,42 +128,70 @@ | ||
118 | 128 | |BR| |
119 | 129 | There are many algorithms to solve this. All with the same disadvantage: it takes a bit of time -- possibly by |
120 | 130 | “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections are usually short (e.g. an assignment, or a |
121 | -few lines) the overhead can be huge(!) [#timesCPU]_. | |
131 | +few lines) the overhead can be (relatively) huge [#timesCPU]_! | |
122 | 132 | |
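As an illustration (CCastle does not exist yet, so a Go sketch): the mutex below marks each increment as a critical section. The `Lock`/`Unlock` pair is exactly the “check” and “bookkeeping” described above; remove it and the concurrent increments can interleave, losing updates.

```go
package main

import (
	"fmt"
	"sync"
)

// count increments a shared counter from several goroutines,
// protecting each increment with a mutex (the critical section).
func count(workers, perWorker int) int {
	var mu sync.Mutex // guards counter
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				mu.Lock()   // enter the critical section (may suspend)
				counter++   // the shared variable, now safe
				mu.Unlock() // leave; a waiting goroutine may enter
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(count(4, 1000)) // always 4000; without the mutex: unpredictable
}
```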
123 | 133 | |
124 | 134 | Messages |
125 | 135 | -------- |
126 | 136 | |
127 | 137 | A more modern approach is Message-Passing_: a task sends some information to another; this can be a message, some data, |
128 | -or an event. In all cases, there is a distinct sender and receiver -- and apparently no common/shared memory--, so no | |
129 | -Critical-Sections [#MPCS]_ are needed; at least no explicitly. Messages can be used by all kind of task; even in a | |
138 | +or an event. In all cases, there is a distinct sender and receiver -- and apparently no common/shared memory-- so no | |
139 | +Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages can be used by all kinds of tasks; even in a |
130 | 140 | distributed system -- then the message (and its data) is serialised, transmitted over a network and deserialised, which |
131 | 141 | can introduce some overhead and delay. |
132 | -|BR| | |
133 | -Many people use this networking mental model when they thing about Message-Passing_, and *wrongly* assume there is | |
134 | -always serialisation (or network) overhead. When (carefully) implemented this is not needed; and can be as efficiently | |
135 | -as shared-memory (assuming there is shared-memory that can be used). | |
136 | 142 | |
137 | -This “message | |
138 | -There is an analogy between Message-Passing_ and Events_ (and the event-loop). They have separate history, but are quite | |
139 | -similar in nature. Like a “message”, the “event” is also used to share data (& controll) to isolated “tasks”. | |
143 | +As you may have noticed, there is an analogy between Message-Passing_ and Events_ (and the event-loop). They have | |
144 | +separate histories, but are quite similar in nature. Like a “message”, the “event” is also used to share data (& control) |
145 | +with isolated “tasks”. |
140 | 146 | |
141 | -Variants | |
147 | +.. warning:: | |
148 | + | |
149 | + Many people use the networking mental model when they think about Message-Passing_, and *wrongly* assume there is |
150 | + always serialisation (or network) overhead. This is not needed for parallel cores as they typically have shared | |
151 | + memory. | |
152 | + | |
153 | + Then, at developer level we can use the messaging abstraction, where the compiler will translate it into shared- |
154 | + memory instructions at processor level. |
155 | + | |
156 | +There are many variants of messaging, mostly combinations of some fundamental aspects: |
157 | + | |
158 | +(Un)Buffered | |
159 | +~~~~~~~~~~~~ | |
160 | +Messages can be *buffered*, or not. It’s not a characteristic of the message itself, but more of the channel that |
161 | +transports the messages: how many messages can be stored ‘in’ the channel (often depicted as a queue). When there is no |
162 | +storage at all, the writer and reader need to rendezvous: send and receive at the same (wall) time. |
163 | +|BR| |
164 | +When a buffer is available, this is not needed. Depending on the size of the buffer, some messages can be sent before |
165 | +they are picked up by the receiver. Note this is always asymmetric: messages need to be sent before they can be read. |
166 | + | |
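A sketch of both variants, using Go channels as a stand-in (illustrative only, not the Castle syntax): a buffered channel stores messages ‘in’ the channel, so the sender can run ahead; an unbuffered channel (capacity 0) forces the rendezvous.

```go
package main

import "fmt"

// viaBuffered sends two messages into a buffered channel before any
// receiver runs: the buffer (capacity 2) stores them 'in' the channel.
func viaBuffered() string {
	ch := make(chan string, 2)
	ch <- "hello" // succeeds immediately: a buffer slot is free
	ch <- "world" // likewise -- no receiver is running yet
	// Asymmetry: a message must be sent before it can be read.
	return <-ch + " " + <-ch
}

// viaRendezvous uses an unbuffered channel (no storage at all): the
// send in the goroutine only completes when the receive happens at
// the same (wall) time.
func viaRendezvous() string {
	ch := make(chan string) // capacity 0
	go func() { ch <- "hi" }()
	return <-ch // rendezvous: send and receive meet here
}

func main() {
	fmt.Println(viaBuffered())   // hello world
	fmt.Println(viaRendezvous()) // hi
}
```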
167 | +Blocking | |
142 | 168 | ~~~~~~~~ |
169 | +Both the writer and the reader can be blocking (or not). A blocking reader will always return when a message is |
170 | +available -- and will pause until then. Writers can also be blocking: a blocking writer will pause until the message |
171 | +can be sent -- e.g. the reader is available (rendezvous) or a message-buffer slot is free. |
143 | 172 | |
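Again a Go sketch (illustrative): a blocking read simply pauses on the receive; a non-blocking read can be expressed with `select`/`default`, which returns immediately when no message is available instead of pausing.

```go
package main

import "fmt"

// tryReceive is a non-blocking read: the 'default' branch makes the
// select return immediately instead of pausing the reader.
func tryReceive(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true // a message was available
	default:
		return 0, false // no message right now; do not pause
	}
}

func main() {
	ch := make(chan int, 1)

	if _, ok := tryReceive(ch); !ok {
		fmt.Println("no message yet") // the channel is still empty
	}

	ch <- 42 // the buffer has a free slot, so this send does not pause

	fmt.Println("got", <-ch) // blocking read; a message is already waiting
}
```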
144 | -There are may variants of Message-Passing_ | |
173 | +(A)Synchronous | |
174 | +~~~~~~~~~~~~~~ | |
175 | +Synchronous messaging resembles normal function-calls. Typically a “question” is sent, the call awaits the |
176 | +answer-message, and that answer is returned. |
177 | + | |
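Such a synchronous “call” can be built on top of plain messages by sending a private reply-channel along with the question -- a Go sketch (illustrative; all names are hypothetical):

```go
package main

import "fmt"

// request carries the "question" plus a private channel for the answer,
// so the caller can await the reply like a function result.
type request struct {
	question int
	answer   chan int
}

// serve answers each incoming question with its square.
func serve(requests chan request) {
	for req := range requests {
		req.answer <- req.question * req.question
	}
}

// ask is the synchronous "call": send the question, then block on the answer.
func ask(requests chan request, q int) int {
	reply := make(chan int)
	requests <- request{question: q, answer: reply}
	return <-reply // pauses until the answer-message arrives
}

func main() {
	requests := make(chan request)
	go serve(requests)
	fmt.Println(ask(requests, 6)) // 36 -- reads like a normal function call
}
```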
145 | 178 | |
146 | 179 | |
147 | 180 | ************************************************************ |
148 | 181 | |
149 | -.. todo:: All below is draft and needs work!!!! | |
182 | +.. todo:: | |
150 | 183 | |
184 | + * Async, buffered, blocking | |
185 | + * Pipe : kind of data messages | |
186 | + | |
187 | + | |
188 | + .. todo:: All below is draft and needs work!!!! | |
151 | 189 | |
152 | 190 | |
153 | 191 | Models |
154 | 192 | ====== |
155 | 193 | |
156 | -Probably the oldest model to described concurrency is the Petri-net_; which has the disadvantage that is synchronous | |
194 | +Probably the oldest model to describe concurrency is the Petri-net; it is synchronous |
157 | 195 | (all tokens move at the same timeslot) -- which is hard to implement (efficiently) on Multi-Core_. |
158 | 196 | |
159 | 197 | Actors |
@@ -183,6 +221,10 @@ | ||
183 | 221 | |BR| |
184 | 222 | But that condition does apply to Multi-Core_ too, although the (timing) numbers do differ. |
185 | 223 | |
224 | +.. [#wall-time] | |
225 | + As a reminder: we speak about *CPU-time* when we count the cycles that a core is busy; so when a core is waiting, no |
226 | + CPU-time is used. And we use *wall-time* when we measure time according to “the clock on the wall”. |
227 | + | |
186 | 228 | .. [#OOCS] |
187 | 229 | The brittleness of Critical-Sections_ can be reduced by embedding (the) (shared-) variable in an OO abstraction. By |
188 | 230 | using *getters* and *setters*, that control the access, the biggest risk is (mostly) gone. That does not, however, |
@@ -214,4 +256,4 @@ | ||
214 | 256 | .. _Actor-Model: https://en.wikipedia.org/wiki/Actor_model |
215 | 257 | .. _Actor-Model-Theory: https://en.wikipedia.org/wiki/Actor_model_theory |
216 | 258 | |
217 | -.. _Petri-Net: https://en.wikipedia.org/wiki/Petri_net | |
259 | + |
@@ -0,0 +1,33 @@ | ||
1 | +.. include:: /std/localtoc.irst | |
2 | + | |
3 | +.. _ActorAbstractions: | |
4 | + | |
5 | +================================ | |
6 | +Actor Abstractions (Busy, DRAFT) | |
7 | +================================ | |
8 | + | |
9 | +.. post:: | |
10 | + :category: Castle DesignStudy | |
11 | + :tags: Castle, Concurrency | |
12 | + | |
13 | + The Actor-Model_ is inspired by physics, unlike older models like the famous Petri-net_ that are mostly |
14 | + mathematical. It has become popular for Distributed-Computing_, where each node/computer acts as an independent, |
15 | + active actor. |
16 | + | |
17 | + As an abstraction, this is similar to the “Many Core” :ref:`CC` we use for CCastle. Hence, we study this model a bit more. |
18 | + | |
19 | + | |
20 | + | |
21 | + | |
22 | + | |
23 | +---------- | |
24 | + | |
25 | +.. rubric:: Footnotes | |
26 | + | |
27 | +.. [#XXX] | |
28 | + YYY | |
29 | + | |
30 | +.. _Petri-Net: https://en.wikipedia.org/wiki/Petri_net | |
31 | +.. _Actor-Model: https://en.wikipedia.org/wiki/Actor_model | |
32 | +.. _Actor-Model-Theory: https://en.wikipedia.org/wiki/Actor_model_theory | |
33 | +.. _Distributed-Computing: https://en.wikipedia.org/wiki/Distributed_computing |
@@ -1,24 +0,0 @@ | ||
1 | -FSMs vs Grammars | |
2 | -================= | |
3 | - | |
4 | -In case you are wondering: *“Are FSMs and Grammars related, or the same?”* | |
5 | - | |
6 | -The are related, but not equal. | |
7 | - | |
8 | -As discovered by `Chomsky <https://en.wikipedia.org/wiki/Noam_Chomsky>`__ there is a `hierarchy <https://en.wikipedia.org/wiki/Chomsky_hierarchy>`__ of languages (or actually grammars). As Chomsky was studying all kind of languages, include natural “human” language, his perception differs from our interpretation. |BR| When we (SW/Engineers) speak about grammars, we typically refer to “Context free grammars” (`CFG <https://en.wikipedia.org/wiki/Context-free_grammar>`__) -- Chomsky classify them as “Type-2”. | |
9 | - | |
10 | -A non-deterministic PushDown machine (`PDA <https://en.wikipedia.org/wiki/Pushdown_automaton>`__) is needed to | |
11 | -recognise a “Type-2” (CFG) Grammar. (Which is a simple version of a stack-machine, which is lot more restricted then a Turing Machine). | |
12 | -|BR| | |
13 | -And as a FSM is one of the most restricted machines that exist --it can recognise only “Type-3” `Regular grammar | |
14 | -<https://en.wikipedia.org/wiki/Regular_grammar>`__-- a FSM can’t be used to recognise a (CFG) grammar. | |
15 | - | |
16 | -.. seealso:: | |
17 | - | |
18 | - .. postlist:: | |
19 | - :category: Castle | |
20 | - :tags: FSM | |
21 | - | |
22 | - .. postlist:: | |
23 | - :category: Castle | |
24 | - :tags: Grammar |