Revision | 582949d0d56cfa883c5ed8cfacda9808b8898103 (tree) |
---|---|
Time | 2022-08-31 01:44:03 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
asis
@@ -1,13 +1,4 @@ | ||
1 | -.. .. include:: /std/localtoc.irst | |
2 | - | |
3 | -.. sidebar:: On this page (override:4) | |
4 | - :class: localtoc | |
5 | - | |
6 | - .. contents:: | |
7 | - :depth: 4 | |
8 | - :local: | |
9 | - :backlinks: none | |
10 | - | |
1 | +.. include:: /std/localtoc.irst | |
11 | 2 | |
12 | 3 | .. _ConcurrentComputingConcepts: |
13 | 4 |
@@ -19,13 +10,15 @@ | ||
19 | 10 | :category: Castle DesignStudy |
20 | 11 | :tags: Castle, Concurrency, DRAFT |
21 | 12 | |
22 | - Shortly, more and more cores will become available alike I described in “:ref:`BusyCores`”. Castle should make it | |
23 | - easy to write code for all of them: not to keep them busy, but to maximize speed up [useCase: :need:`U_ManyCore`]. | |
24 | - We also discussed threads_: they do not scale well for CPU-bound (embedded) systems. And, I introduced some (other) | |
25 | - concurrency abstractions; which do not always fit nicely in existing languages. | |
13 | + Sooner than we may realize, even embedded systems will have many, many cores, as I described in | 
14 | + “:ref:`BusyCores`”. Castle should make it easy to write code for all of them: not to keep them busy, but to maximize | |
15 | + speed up [useCase: :need:`U_ManyCore`]. There I also showed that threads_ do not scale well for CPU-bound (embedded) | |
16 | + systems. Last, I introduced some (more) concurrency abstractions. Some are great, but they often do not fit | |
17 | + nicely in existing languages. | |
18 | + | |
19 | + Still, as Castle is a new language, we have the opportunity to select such a concept and incorporate it into the | 
20 | + language ... | |
26 | 21 | |BR| |
27 | - As Castle is a new language we have the opportunity to select such a concept and incorporate it into the language ... | |
28 | - | |
29 | 22 | In this blog, we explore a bit of theory. I will focus on semantics and the possibilities to implement them |
30 | 23 | efficiently. The exact syntax will come later. |
31 | 24 |
@@ -33,10 +26,10 @@ | ||
33 | 26 | ***************** |
34 | 27 | |
35 | 28 | There are many theories available, and some more practical expertise, but they hardly share a common vocabulary.
36 | -For that reason, let describe some basic terms, that will be used in these blogs. As always, we use Wikipedia as common | |
37 | -ground, and add links for a deep-dive. | |
29 | +For that reason, let’s describe some basic terms that will be used in these blogs. As always, we use Wikipedia as common | 
30 | +ground and add links for a deep dive. | |
38 | 31 | |BR| |
39 | -Again, we use ‘task’ as the most generic term of work-to-be-executed; that can be (in) a process, (on) a thread, (by) a | |
32 | +Again, we use ‘task’ as the most generic term for work-to-be-executed; that can be (in) a process, (on) a thread, (by) a | |
40 | 33 | computer, etc. |
41 | 34 | |
42 | 35 | .. include:: CCC-sidebar-concurrency.irst |
@@ -46,18 +39,17 @@ | ||
46 | 39 | |
47 | 40 | Concurrency_ is the **ability** to “compute” multiple *tasks* at the same time. |
48 | 41 | |BR| |
49 | -Designing concurrent software isn’t that complicated but; demands another mindset the when we write software that does | |
50 | -one tasks afer the other. | |
42 | +Designing concurrent software isn’t that complicated, but it demands another mindset than when we write software that does | 
43 | +one task after the other. | |
51 | 44 | |
52 | 45 | A typical example is a loop: suppose we have a sequence of numbers and we’d like to compute the square of each one. Most
53 | 46 | developers will loop over those numbers, get one number, calculate the square, store it in another list, and continue. |
54 | -It works, but we have also instructed the computer to do it in sequence — especially when the | |
55 | -task is bit more complicated, the compiler does know whether the ‘next task’ depends on the current one, and can’t | |
56 | -optimise it. | |
47 | +It works, but we have also instructed the computer to do it in sequence. Especially when the task is a bit more | |
48 | +complicated, the compiler cannot know whether the ‘next task’ depends on the current one, and so can’t optimize it. | 
57 | 49 | |
58 | -A better plan is to tell the compiler about the tasks; most are independently: square a number. There is also one that | |
59 | -has to be run at the end: combine the results into a new list. And one is bit funny: distribute the sequence-elements | |
60 | -over the “square-tasks” — clearly, one has to start with this one, but it can be concurrent with many others too. | |
50 | +A better plan is to tell the compiler about different tasks. Most are independent: square a number. There is also one | |
51 | +that has to be run at the end: combine the results into a new list. And one is a bit funny: distribute the elements over | |
52 | +the “square tasks”. Clearly one has to start with this one, but it can be concurrent with many others too. | |
61 | 53 | |BR| |
62 | 54 | This is *not* a parallel algorithm. When not specifying the order, we allow parallel execution. We do not demand it;
63 | 55 | sequential execution is allowed too. |
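
To make this concrete, here is how such a loop could be sketched in today's Python (only an illustration; Castle's syntax is still open). With ``concurrent.futures`` we describe the square-tasks as independent units of work, so the runtime *may* run them in parallel -- while sequential execution stays allowed:

.. code-block:: python

   from concurrent.futures import ProcessPoolExecutor

   def square(n: int) -> int:       # an independent task
       return n * n

   if __name__ == "__main__":
       numbers = list(range(10))
       with ProcessPoolExecutor() as pool:
           # map() distributes the elements (the "funny" first task), lets
           # the square-tasks run in any order, and combines the results
           # -- in input order -- as the one task that must run at the end.
           squares = list(pool.map(square, numbers))
       print(squares)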
@@ -66,54 +58,54 @@ | ||
66 | 58 | Parallelism |
67 | 59 | =========== |
68 | 60 | |
69 | -Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus running multiple concurrent | |
70 | -task (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a thousand | |
71 | -independent tasks (at least) to gain maximal speed up. A thousand at any moment! | |
61 | +Parallelism_ is about executing multiple tasks (seemingly) at the same time. We will focus on running many | 
62 | +concurrent tasks (of the same program) on *“as many cores as possible”*. When we assume a thousand cores, we need a | |
63 | +thousand independent tasks (at least) to gain maximal speed up. A thousand at any moment! | |
72 | 64 | |BR| |
73 | -It’s not only about doing a thousand tasks at the same time (that is not to complicated, for a computer), but also — | |
65 | +It’s not only about doing a thousand tasks at the same time (that is not too complicated for a computer), but also — | 
74 | 66 | probably: mostly — about finishing a thousand times faster… |
75 | 67 | |
76 | -With many cores, multiple program-steps can be executed at the same time: from changing the same variable, acces the | |
77 | -same memory, or compete for new memory. And when solving that, we introduce new hazards: like deadlocks_ and even | |
78 | -livelocks_. | |
68 | +With many cores, multiple “program lines” can be executed at the same time, which can introduce unforeseen effects: | |
69 | +changing the same variable, accessing the same memory, or competing for new, “free” memory. And when solving that, we | |
70 | +introduce new hazards, like deadlocks_ and even livelocks_. | 
79 | 71 | |
80 | 72 | |
81 | 73 | Distributed |
82 | 74 | ----------- |
83 | 75 | |
84 | -A special form of parallelisme is Distributed-Computing_: compute on many computers. Many experts consider this | |
85 | -as an independent field of expertise; still --as Multi-Core_ is basically “many computers on a chips”-- its there is an | |
86 | -analogy [#DistributedDiff]_, and we should use the know-how that is available, to design out “best ever language”. | |
76 | +A special form of parallelism is Distributed-Computing_: computing on many computers. Many experts consider this | |
77 | +an independent field of expertise. Still --as Multi-Core_ is basically “many computers on a chip”-- it’s an | |
78 | +available, adjacent [#DistributedDiff]_ theory, and we should use it to design our “best ever language”. | 
87 | 79 | |
88 | 80 | .. include:: CCC-sidebar-CS.irst |
89 | 81 | |
90 | 82 | Efficient Communication |
91 | 83 | *********************** |
92 | 84 | |
93 | -When multiple tasks run in concurrently, they have to communicate to pass data and to controll progress. Unlike in a | |
94 | -sequential program -- where the controll is trivial, as is sharing data-- this needs a bit of extra effort. | |
85 | +When multiple tasks run concurrently, they have to communicate to pass data and control progress. Unlike in a | |
86 | +sequential program -- where the control is trivial, as is sharing data -- this needs a bit of extra effort. | 
95 | 87 | |BR| |
88 | +There are two main approaches: shared-data or message-passing; we will introduce them below. | 
88 | +There are two main approaches: shared-data of message-passing; we will introduce them below. | |
97 | 89 | |
98 | -Communication takes time, especially *wall-time* [#wall-time]_ (or real time) and may slow down computing. Therefore | |
99 | -communication has to be efficient. This is a hard problem and becomes harder when we have more communication, more | |
100 | -concurrently (& parallel) tasks, and/or those task are shorter living. Or better: is depends on the ratio | |
101 | -communication-time over time-between-two-communications. | |
90 | +Communication takes time, especially *wall time* [#wall-time]_ (or clock time) and may slow down computing. Therefore | |
91 | +communication has to be efficient. This is an arduous problem and becomes harder when we have more communication, more | |
92 | +concurrency, more parallelism, and/or those tasks are short(er)-living. Or better: it depends on the ratio between the | 
93 | +communication-time and the time-between-two-communications. | |
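
For intuition, a back-of-the-envelope calculation (all numbers invented for illustration): if one communication costs 1 µs and a task computes for 10 µs between two communications, roughly 9% of the wall-time is spent communicating -- and shorter-living tasks make that ratio worse.

.. code-block:: python

   comm_time = 1e-6   # seconds per communication (assumed)
   work_time = 10e-6  # compute-time between two communications (assumed)

   overhead = comm_time / (comm_time + work_time)
   print(f"{overhead:.1%} of wall-time is spent communicating")  # ~9.1%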
102 | 94 | |
103 | 95 | |
104 | 96 | Shared Memory |
105 | 97 | ============= |
106 | 98 | |
107 | -In this model all tasks (usually threads or process) have some shared/common memory; typically “variables”. As the acces | |
108 | -is asynchronous, the risk exist the data is updated “at the same time” by two or more tasks. This can lead to invalid | |
99 | +In this model all tasks (usually threads or processes) have some shared/common memory; typically “variables”. As the access | |
100 | +is asynchronous, the risk exists that the data is updated “at the same time” by two or more tasks. This can lead to invalid | 
109 | 101 | data and so Critical-Sections_ are needed. |
110 | 102 | |
111 | -This is a very basic model which assumes that there is physically memory that can be shared. In distributed systems this | |
112 | -is uncommon; but for threads it’s straightforward. As disadvantage of this model is that is hazardous: Even when a | |
113 | -single access to such a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_. | |
103 | +This is a very basic model which assumes that there is physical memory that can be shared. In distributed systems this | |
104 | +is uncommon, but for threads it’s straightforward. A disadvantage of this model is that it is hazardous: Even when a | 
105 | +single modification of a shared variable is not protected by a Critical-Section_, the whole system can break [#OOCS]_. | 
114 | 106 | |
115 | -The advantage of shared memory is the short communication-time. The wall-time and CPU-time are roughly the same: the | |
116 | -time to write & read the variable, added to the (overhead) time for the critical section -- which is typical the | |
107 | +The advantage of shared memory is the fast *communication-time*. The wall-time and CPU-time are roughly the same: the | |
108 | +time to write & read the variable added to the (overhead) time for the critical section -- which is typically the | |
117 | 109 | bigger part. |
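
A minimal sketch with plain Python threads_ (not Castle) shows both the model and its hazard: the one-line update ``counter += 1`` is a read-modify-write, so it must be wrapped in a critical section; without the lock, concurrent updates can get lost.

.. code-block:: python

   import threading

   counter = 0              # the shared variable
   lock = threading.Lock()  # guards the critical section

   def work(iterations: int) -> None:
       global counter
       for _ in range(iterations):
           with lock:        # enter/leave the Critical-Section
               counter += 1  # read-modify-write: not atomic on its own

   threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
   for t in threads: t.start()
   for t in threads: t.join()
   print(counter)  # 400000; without the lock, updates may get lost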
118 | 110 | |
119 | 111 |
@@ -122,29 +114,29 @@ | ||
122 | 114 | |
123 | 115 | A more modern approach is Message-Passing_: a task sends some information to another; this can be a message, some data, |
124 | 116 | or an event. In all cases, there is a distinct sender and receiver -- and apparently no common/shared memory-- so no |
125 | -Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages can be used by all kind of task; even in a | |
126 | -distributed system -- then the message (and it data) is serialised, transmitted over a network and deserialised. Which | |
127 | -can introduce some overhead and delay. | |
117 | +Critical-Sections [#MPCS]_ are needed; at least not explicitly. Messages are easier to use and more generic: they can be | |
118 | +used in single-, multi-, and many-core systems. Even distributed systems are possible -- then the message (and its data) | |
119 | +is serialised, transmitted over a network, and deserialised. | |
128 | 120 | |
129 | -As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an the event-loop). They have | |
130 | -separate history, but are quite similar in nature. Like a “message”, the “event” is also used to share data (& controll) | |
131 | -to isolated “tasks”. | |
121 | +As you may have noticed, there is an analogy between Message-Passing_ and Events_ (in an event-loop). They have separate | |
122 | +histories but are quite similar in nature. Like a “message”, the “event” is also used to share data (& control) with | 
123 | +isolated “tasks”. | |
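
Again a small Python illustration (the concept only, not Castle's syntax): the queue acts as the channel; sender and receiver share no variables, so no explicit critical section appears in the user code.

.. code-block:: python

   import queue
   import threading

   channel: queue.Queue = queue.Queue()  # the message channel

   def producer() -> None:
       for n in range(5):
           channel.put(n * n)  # send a message (here: some data)
       channel.put(None)       # a sentinel: "no more messages"

   def consumer() -> None:
       while (msg := channel.get()) is not None:
           print("received:", msg)

   threading.Thread(target=producer).start()
   consumer()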
132 | 124 | |
133 | 125 | .. warning:: |
134 | 126 | |
135 | - Many people use the networking mental model when they thing about Message-Passing_, and *wrongly* assume there is | |
127 | + Many people use the networking mental model when they think about Message-Passing_, and *wrongly* assume there is | |
136 | 128 | always serialisation (and network) overhead. This is not needed for parallel cores as they typically have shared |
137 | 129 | (physical) memory. |
138 | 130 | |
139 | - Then, we can use the message abstraction at developer-level, and let the compiler will translate that it into shared | |
140 | - memory instructions at processor level. | |
131 | + Then, we can use the message abstraction at developer-level, and let the compiler translate that into shared | |
132 | + memory instructions for the processor level. | |
141 | 133 | |BR| |
142 | 134 | Notice: As the compiler will insert the (low level) Semaphores_, the risk that a developer forgets one is gone! |
143 | 135 | |
144 | -Aspects | |
145 | -------- | |
136 | +Messaging Aspects | |
137 | +----------------- | |
146 | 138 | |
147 | -There are many variant on messaging, mostly combinations some fundamental aspects | |
139 | +There are many variants on messaging, mostly combinations of some fundamental aspects. Let’s mention some basic ones. | 
148 | 140 | |
149 | 141 | .. include:: CCC-sidebar-async.irst |
150 | 142 |
@@ -196,14 +188,46 @@ | ||
196 | 188 | Messages --or actually the channels that transport them-- can be *unidirectional*: from sender to receiver only;
197 | 189 | *bidirectional*: both sides can send and receive; or *broadcasted*: one message is sent to many receivers [#anycast]_.
198 | 190 | |
199 | -Reliability | |
200 | -~~~~~~~~~~~ | |
191 | +Reliability & Order | |
192 | +~~~~~~~~~~~~~~~~~~~ | |
201 | 193 | |
202 | -Especially when studying “network messages”, we have to consider Reliability_ too: Will a send message always be | |
203 | -received (and will they be received in the same order, and/or only once. This has advantages, but disadvantages too: a | |
204 | -reliable connection (over an unreliable network) needs more overhead and has more delay. | |
194 | +Especially when studying “network messages”, we have to consider Reliability_ too. Many developers assume that a sent | 
195 | +message is always received and that when multiple messages are sent, they are received in the same order. In most | 
196 | +traditional --single-core-- applications this is always true. With networking applications, this is not always | 
197 | +the case. Messages can get lost, be received out of order, or even be read twice; although it is always possible to add | 
198 | +a “reliability layer”. | 
199 | +|BR| | |
200 | +Such a layer makes writing the application easier but introduces overhead, and is therefore not always the right solution. | 
205 | 201 | |
206 | -This also applies to software-message, although in lesser extent. | |
202 | +In Castle, we have “active components”: many cores are running in parallel, all doing a part of the overall (concurrent) | 
203 | +program. This resembles a networking application -- even while there is no real network -- where at least three nodes | |
204 | +are active. | |
205 | + | |
206 | +This is a bit more complicated, so let us start with an example. Say, we have 3 components ``A``, ``B1``, and | |
207 | +``B2``. All are connected to all others. We assume that messages are unbuffered, non-blocking, never get lost, and that | 
208 | +two messages over the same channel are never out-of-order. Sounds simple, doesn’t it? | 
209 | +|BR| | |
210 | +Now suppose that ``A`` sends a message (`m1`) to ``B1`` and then one (`m2`) to ``B2``. The “B components” will --on | 
211 | +receiving a message from ``A`` -- send a short message to the other one (`m3` and `m4`). And that message triggers | 
212 | +both ``B1`` and ``B2`` to send an answer to ``A``; so `m5` and `m6`. | 
213 | + | |
214 | +Now the question is: in which order are those answers received (in ``A``)? | 
215 | +|BR| | |
216 | +The real answer is: you don’t know! | |
217 | +|BR| | |
218 | +It’s clear that ``A`` will get `m5` and `m6` -- given that all messages (aka channels) are reliable. But there are many | |
219 | +ways those messages may arrive in the opposite order; presumably in more ways than you can imagine. For example, | 
220 | +``B1`` might process `m4` before it processes `m1`! This can happen when channel ``A->B1`` is *slow*, or when ``B2`` | 
221 | +gets CPU-time before ``B1``, or... | |
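
This non-determinism is easy to reproduce with plain threads and queues (a rough model of the example above, not Castle semantics): every channel preserves its own order, yet the order in which ``A`` receives `m5` and `m6` differs from run to run.

.. code-block:: python

   import queue
   import random
   import threading
   import time

   to_A = queue.Queue()                                  # answers to A
   to_B = {"B1": queue.Queue(), "B2": queue.Queue()}     # A -> B1/B2
   between = {"B1": queue.Queue(), "B2": queue.Queue()}  # B1 <-> B2

   def b(name: str, other: str) -> None:
       to_B[name].get()                       # m1 or m2, from A
       time.sleep(random.random() / 100)      # scheduling jitter
       between[other].put(f"short msg from {name}")  # m3 / m4
       between[name].get()                    # the other's short message ...
       to_A.put(f"answer from {name}")        # ... triggers m5 / m6

   threading.Thread(target=b, args=("B1", "B2")).start()
   threading.Thread(target=b, args=("B2", "B1")).start()
   to_B["B1"].put("m1"); to_B["B2"].put("m2")  # A sends m1, then m2
   print(to_A.get(), "|", to_A.get())          # order varies per run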
222 | + | |
223 | +When we add buffering, more connected components, etc., this *“network”* acts less reliably than we might expect (even | 
224 | +though each message is reliable). When we add some real-time demands (see below), the ability to use/model a solution | |
225 | +using unreliable messages becomes attractive ... | 
226 | +|BR| | |
227 | +It’s not that you should always favor unreliable, out-of-order messages. Certainly not! But as we are designing a new | 
228 | +language --one that should run efficiently on thousands of cores, in a real-time embedded system-- the | 
229 | +option to utilize them may be beneficial. | 
230 | + | |
207 | 231 | |
208 | 232 | .. hint:: |
209 | 233 |
@@ -3,29 +3,29 @@ | ||
3 | 3 | |
4 | 4 | .. sidebar:: About Critical Sections |
5 | 5 | |
6 | - For those, who are not familiar with Critical-Sections_ and/or Semaphores_, a short intro. | |
7 | - | |
8 | - .. rubric:: Dilemma: Statements are not atomic | |
6 | + For those who are not familiar with Critical-Sections_ or Semaphores_, here is a short intro. | 
9 | 7 | |
10 | - Unlike some developers presume “code-lines” are not *‘atomic’*: they can be interrupted. When using (e.g) threads_, | |
11 | - the “computer” can pause one thread halfway a statement, to run another one temporally and continue a millisecond | |
12 | - later. When that happens when writing or reading a variable and the other thread also access the same shared-memory, | |
13 | - the result is unpredictable. To prevent that, we need to controle the handling that variable: make it a | |
8 | + .. rubric:: Dilemma: Statements are not atomic. | |
9 | + | |
10 | + Despite what some developers presume, *“code lines”* are not *‘atomic’*: they can be interrupted. When using (e.g.) threads_, | 
11 | + the “computer” can pause one thread halfway through a statement to run another one temporarily and continue a millisecond | 
12 | + later. When it happens during writing or reading a variable, and the other thread also accesses the same shared-memory, | |
13 | + the result is unpredictable. To prevent that, we need to control the handling of that variable: make it a | |
14 | 14 | Critical-Section_. |
15 | 15 | |
16 | - .. rubric:: Solve it by marking Sections as exclusive | |
16 | + .. rubric:: Solve it by marking sections *‘exclusive’*. | |
17 | 17 | |
18 | - In essense, we have to tell the “computer” that a line (of a few lines) are *atomic*; to make access exclusive The | |
19 | - the compiler will add some extra fundamental instructions (specific for that CPU-type) to assure this. A check is | |
18 | + In essence, we have to tell the “computer” that a line (or a few lines) is *atomic*. To make the access exclusive, | 
19 | + the compiler will add some extra fundamental instructions (specific for that type of CPU) to assure this. A check is | |
20 | 20 | inserted just before the section is entered, and the thread will be suspended when another task is using it. When |
21 | - granted acces, a bit of bookkeeping is done -- so the “check” in other thread is halted). That bookkeeping is updated | |
22 | - and when leaving. Along with more bookkeeping to un-pauze the suspended threads. | |
21 | + access is granted, a bit of bookkeeping is done -- so that the “check” in other threads will halt. That bookkeeping | 
22 | + is updated when leaving, along with more bookkeeping to un-pause the suspended threads. | 
23 | 23 | |
24 | - .. rubric:: Complication: overhead | |
24 | + .. rubric:: Complication: overhead! | |
25 | 25 | |
26 | 26 | As you can imagine, this “bookkeeping” is extra complicated on a Multi-Core_ system; some global data structure is
27 | 27 | needed, which is a Critical-Section in itself.
28 | 28 | |BR| |
29 | 29 | There are many algorithms to solve this. All with the same disadvantage: it takes a bit of time -- possible by |
30 | - “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections a usually short (e.g a assignment, or a | |
31 | - few lines) the overhead can be (relatively) huge [#timesCPU]_! | |
30 | + “Spinlocking_” all other cores (for a few nanoseconds). As Critical-Sections are usually short (e.g. one assignment, or | 
31 | + a few lines) the overhead can be (relatively) huge [#timesCPU]_! |
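
One way to *see* that a statement is not atomic (demonstrated in Python, purely as illustration) is to disassemble it: the single line ``counter += 1`` becomes several instructions, and a thread may be paused between any two of them.

.. code-block:: python

   import dis

   counter = 0

   def increment() -> None:
       global counter
       counter += 1  # one "code line", but several instructions

   # Prints a LOAD / add / STORE sequence (exact opcode names vary per
   # Python version); a thread-switch between the LOAD and the STORE is
   # what makes two concurrent increments lose an update.
   dis.dis(increment)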
@@ -1,4 +1,4 @@ | ||
1 | -.. -*- rst -*- | |
1 | +.. -*-rst-*- | |
2 | 2 | included in `8.BusyCores-concepts.rst` |
3 | 3 | |
4 | 4 | .. sidebar:: |
@@ -15,7 +15,7 @@ | ||
15 | 15 | for n in L1: |
16 | 16 | L2.append(power(n)) |
17 | 17 | |
18 | - .. note:: As ``power()`` could have side-effects, the compiler **must** keep the defined order! | |
18 | + .. note:: As ``power()`` could have side effects, the compiler **must** keep the defined order! | |
19 | 19 | |
20 | 20 | .. tab:: Concurrent |
21 | 21 |
@@ -28,8 +28,8 @@ | ||
28 | 28 | .. note:: |
29 | 29 | |
30 | 30 | Although (current) python-compilers will run it sequentially, it is *allowed* to distribute it; even when |
31 | - ``power()`` has side-effects! | |
31 | + ``power()`` has side effects! | |
32 | 32 | |BR| |
33 | 33 | As long as *python* puts the results in the correct order in list ``L2``, **any order** is allowed. “Out of
34 | - order” side-effects are allowed by this code. | |
34 | + order” side effects are allowed by this code. | |
35 | 35 |