
Commit MetaInfo

Revision	f83313ef3fe05c825da58b2d265d9e91986062f1 (tree)
Date	2022-11-05 02:48:21
Author	Albert Mietus < albert AT mietus DOT nl >
Committer	Albert Mietus < albert AT mietus DOT nl >

Log Message

Moved CCastle QuickNotes on tool-design into new section

Change Summary

Diff

diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/3.Design/B.Workshop/index.rst
--- a/CCastle/3.Design/B.Workshop/index.rst Fri Nov 04 18:43:20 2022 +0100
+++ b/CCastle/3.Design/B.Workshop/index.rst Fri Nov 04 18:48:21 2022 +0100
@@ -5,7 +5,7 @@
55 =========================
66
77 No computer language can exist without a set of :ref:`Castle-WorkshopTools`; at least a compiler is needed -- and a lot
8-more. This chapter contains a growing number of notes, blogs & articles on designing them
8+more. This chapter contains a growing number of notes, blogs & articles on designing them.
99
1010
1111 .. toctree::
@@ -13,6 +13,7 @@
1313 :titlesonly:
1414 :glob:
1515
16+ */index
1617 *
1718
1819
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/3.Design/B.Workshop/short/arpeggio.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/CCastle/3.Design/B.Workshop/short/arpeggio.rst Fri Nov 04 18:48:21 2022 +0100
@@ -0,0 +1,17 @@
1+.. _QN_Arpeggio:
2+
3+===================
4+QuickNote: Arpeggio
5+===================
6+
7+.. post::
8+ :category: CastleBlogs, rough
9+ :tags: Grammar, PEG, DRAFT
10+
11+ In this short QuickNote blog we give a bit of info on `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__; a Python
12+ package to implement a (PEG) parser. Eventually, it will be implemented in Castle -- like all
13+ :ref:`Castle-WorkshopTools`. To kickstart, we use Python and Python-packages.
14+
15+ As Arpeggio is quite well `documented <https://textx.github.io/Arpeggio/2.0/>`__, it is a short note.
16+
17+.. seealso:: :ref:`QN_PEGEN`
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/3.Design/B.Workshop/short/index.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/CCastle/3.Design/B.Workshop/short/index.rst Fri Nov 04 18:48:21 2022 +0100
@@ -0,0 +1,11 @@
1+================
2+Some Quick Blogs
3+================
4+
5+.. toctree::
6+ :maxdepth: 2
7+ :titlesonly:
8+ :glob:
9+
10+ *
11+
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/3.Design/B.Workshop/short/pegen_parser.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/CCastle/3.Design/B.Workshop/short/pegen_parser.rst Fri Nov 04 18:48:21 2022 +0100
@@ -0,0 +1,178 @@
1+.. include:: /std/localtoc.irst
2+
3+.. _QN_PEGEN:
4+
5+================
6+QuickNote: PEGEN
7+================
8+
9+.. post:: 2022/11/3
10+ :category: CastleBlogs, rough
11+ :tags: Grammar, PEG
12+
13+ To implement CCastle we need a parser, as part of the compiler. Eventually, that parser will be written in Castle. For
14+ now we kickstart it in Python, which has several packages that can assist us. As we like to use a PEG-based one, there
15+ are a few options. `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__ is well known, and has some nice options --
16+ but can’t handle `left recursion <https://en.wikipedia.org/wiki/Left_recursion>`__ -- like most PEG-parsers.
17+
18+ Recently, Python itself switched to a PEG parser that supports `left recursion
19+ <https://en.wikipedia.org/wiki/Left_recursion>`__ (which is a recent development). That parser is also available as a
20+ package: `pegen <https://we-like-parsers.github.io/pegen/index.html>`__; but it is hardly documented.
21+
22+This blog is written to record some lessons learned when playing with it, and to serve as a kind of informal docs.
23+
24+.. seealso:: :ref:`QN_Arpeggio`
25+
26+Built-In Lexer
27+==============
28+
29+Pegen is specially written for Python and uses a specialized lexer; unlike most PEG-parsers, which use PEG for lexing too. Pegen
30+uses the `tokenizer <https://docs.python.org/3/library/tokenize.html>`__ that is part of Python. This comes with some
31+restrictions.
32+
33+This lexer --or tokenize(r), as Python calls it-- is used **both** to read the grammar (the peg-file), *and* to read the
34+source-files that are parsed by the generated parser.
35+
36+.. hint::
37+
38+ These restrictions apply when we use pegen as a module: ``python -m pegen ...``; that calls `simple_parser_main()`.
39+ |BR|
40+ But also when we use the parser-class in our own code --so, when importing pegen: ``from pegen.parser import Parser``-- it is
41+ restricted. Then a bit more is possible, as we can configure another (self-made) lexer. The interface is quite tied
42+ to Python, however.
43+
44+
45+Tokens
46+------
47+
48+The lexer will recognize some tokens that are special to Python, like `INDENT` & `DEDENT`. Also some generic tokens
49+like `NAME` (which is an ID) and `NUMBER` are known, and can be used to define the language.
50+
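As a quick illustration (a sketch using only Python's stdlib ``tokenize`` module, the tokenizer pegen builds on), both the Python-specific and the generic token kinds show up when tokenizing a tiny snippet:

```python
import io
import tokenize

# Tokenize a two-line snippet; the lexer emits the Python-specific
# INDENT/DEDENT tokens as well as the generic NAME and NUMBER tokens.
src = "if x:\n    y = 1\n"
kinds = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

print(kinds)
assert {"INDENT", "DEDENT", "NAME", "NUMBER"} <= set(kinds)
```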
51+Unfortunately, it will also find some tokens --typically operators-- that are *hardcoded* for Python, even when we would
52+like to use them differently, possibly combined with other characters. Then, the literal-strings as set
53+in the grammar will not be found.
54+
55+.. note::
56+
57+ Pegen speaks about *(soft)* **keywords** for all kinds of literal terminals; even when they are more like operators
58+ than *words*.
59+
60+.. warning::
61+
62+ When the grammar defines (literal) terminals (or keywords) --especially for operators-- make sure the lexer will not
63+ break them into predefined tokens!
64+ |BR|
65+ This will not give an error, but it does not work!
66+
67+ .. code-block:: PEG
68+
69+ Left_arrow_BAD: '<-' ## This is WRONG: ``<`` is seen as a token, and so `<-` is never found
70+ Left_arrow_OKE: '<' '-' ## This is acceptable
71+
72+ This *splitting*, however, results in two entries in the resulting tree --unless one uses `grammar actions
73+ <https://we-like-parsers.github.io/pegen/grammar.html#grammar-actions>`__ to create one new “token”.
74+
75+.. seealso:: See https://docs.python.org/3/library/token.html for an overview of the predefined tokens.
76+
77+.. tip::
78+
79+ A quick trick to see how a file is split into tokens: use ``python -m tokenize [-e] filename.peg``.
80+ |BR|
81+ Make sure you do not use string-literals that (e.g.) are composed of two tokens, like the above-mentioned ``<-``.
82+
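This splitting can also be checked directly with the stdlib ``tokenize`` module; a minimal sketch (nothing pegen-specific is assumed here):

```python
import io
import tokenize

# The Python lexer hardcodes its operator set: "<-" is not a Python
# operator, so it is split into the two OP tokens "<" and "-".
ops = [tok.string
       for tok in tokenize.generate_tokens(io.StringIO("a <- b\n").readline)
       if tok.type == tokenize.OP]

print(ops)
assert ops == ["<", "-"]
```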
83+
84+
85+.. sidebar:: Reserved
86+ :class: localtoc
87+
88+ - showpeek
89+ - name
90+ - number
91+ - string
92+ - op
93+ - type_comment
94+ - soft_keyword
95+ - expect
96+ - expect_forced
97+ - positive_lookahead
98+ - negative_lookahead
99+ - make_syntax_error
100+
101+Rule names
102+----------
103+
104+The *GeneratedParser* inherits from and calls the base ``pegen.parser.Parser`` class, and has methods for all
105+rule-names. This implies some names should not be used as rule-names (in all cases) -- see the sidebar.
106+
107+
108+Meta Syntax (issues)
109+====================
110+
111+No: regexps
112+-----------
113+
114+PEGEN has **no** support for regular expressions, probably because it uses a custom lexer.
115+
116+Unordered Group starts a comment
117+--------------------------------
118+
119+PEGEN (or its lexer) uses ``#`` to start a comment. This implies an **unordered group** ``( sequence )#`` --as in
120+`Arpeggio <https://textx.github.io/Arpeggio/2.0/grammars/#grammars-written-in-peg-notations>`__-- is not recognized.
121+
122+A workaround is to use another character, like ``@``, instead of the hash (``#``).
123+
124+
125+Result/Output
126+=============
127+
128+cmd-tool
129+--------
130+
131+The commandline tool ``python -m pegen ...`` only prints the parsed tree: a list (shown as ``[`` ... ``]``) with
132+sub-lists and/or `TokenInfo` namedtuples. Each `TokenInfo` has 5 elements: a token type (an int and its enum-name), the
133+token-string (that was parsed), the begin & end location (line- & column-number), and the full line that is being
134+parsed.
135+
136+No info about the matched grammar-rule (e.g. the rule-name) is shown. Actually, that info is not part of the parsed-tree.
137+
138+.. seealso:: This `structure is described <https://docs.python.org/3/library/tokenize.html?highlight=TokenInfo>`__ in
139+ the tokenize module; although without mentioning its name: TokenInfo.
140+
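Those five fields can be inspected directly from Python; a small sketch using only the stdlib ``tokenize`` module:

```python
import io
import tokenize

# Grab the first token of a one-line source; TokenInfo is a namedtuple
# with the five fields described above.
tok = next(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))

print(tok)
assert tokenize.tok_name[tok.type] == "NAME"      # token type (int + enum-name)
assert tok.string == "x"                          # the parsed token-string
assert (tok.start, tok.end) == ((1, 0), (1, 1))   # begin & end (line, column)
assert tok.line == "x = 1\n"                      # the full line being parsed
```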
141+The parser
142+----------
143+
144+The GeneratedParser (and/or its baseclass: ``pegen.parser.Parser``) returns only (lists of) tokens from the tokenizer (an
145+OO wrapper around tokenize). And so, the same TokenInfo objects as described above.
146+
147+Stability
148+=========
149+
150+The current pegen package on `pypi <https://pypi.org/project/pegen/>`__ is V0.1.0 -- which already shows it is not
151+mature. `That version on github <https://github.com/we-like-parsers/pegen/tree/v0.1.0>`__ is dated September 2021 (with 36
152+commits). The `current <https://github.com/we-like-parsers/pegen/tree/db7552dda0af6b27cbbb1230be116e8a56c49736>`__
153+version (Nov 22) has 20 more commits (56).
154+|BR|
155+It can be installed with ``pip install git+https://github.com/we-like-parsers/pegen``.
156+
157+It is, however, not fully compatible. For example, ``pegen/parser.py::simple_parser_main()`` now expects an AST object (to
158+print), not a list of TokenInfo.
159+
160+.. tip::
161+
162+ The pegen package is **NOT** used inside the `(C)Python tool
163+ <https://github.com/python/cpython/tree/main/Tools/peg_generator>`__; the CPython version is heavily tied to other
164+ details of CPython, and it can also generate C-code. The pegen-package is based on it, and is more-or-less in sync; it can
165+ generate Python-code only, but does not depend on the compiler-implementation details.
166+
167+ .. seealso:: https://we-like-parsers.github.io/pegen/#differences-with-cpythons-pegen
168+
169+
170+Buggy current version
171+---------------------
172+
173+The git version contains (at least) one bug. The function ``parser::simple_parser_main()``, that is called when using the
174+generated file, uses the AST module to print (show) the result -- which simply does not work.
175+|BR|
176+Probably, that *default main* isn't used a lot (also, I prefer to use -- and have used -- my own main). Still, it shows its
177+immaturity.
178+
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/short/arpeggio.rst
--- a/CCastle/short/arpeggio.rst Fri Nov 04 18:43:20 2022 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/short/index.rst
--- a/CCastle/short/index.rst Fri Nov 04 18:43:20 2022 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
diff -r 0f71dc240084 -r f83313ef3fe0 CCastle/short/pegen_parser.rst
--- a/CCastle/short/pegen_parser.rst Fri Nov 04 18:43:20 2022 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000