Revision | f83313ef3fe05c825da58b2d265d9e91986062f1 (tree) |
---|---|
Date | 2022-11-05 02:48:21 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
Moved CCastle QuickNotes on tool-design into new section
@@ -5,7 +5,7 @@ | ||
5 | 5 | ========================= |
6 | 6 | |
7 | 7 | No computer language can exist without a set of :ref:`Castle-WorkshopTools`; at least a compiler is needed -- and a lot |
8 | -more. This chapter contains a growing number of notes, blogs & article on desinging them | |
8 | +more. This chapter contains a growing number of notes, blogs & articles on designing them. | |
9 | 9 | |
10 | 10 | |
11 | 11 | .. toctree:: |
@@ -13,6 +13,7 @@ | ||
13 | 13 | :titlesonly: |
14 | 14 | :glob: |
15 | 15 | |
16 | + */index | |
16 | 17 | * |
17 | 18 | |
18 | 19 |
@@ -0,0 +1,17 @@ | ||
1 | +.. _QN_Arpeggio: | |
2 | + | |
3 | +=================== | |
4 | +QuickNote: Arpeggio | |
5 | +=================== | |
6 | + | |
7 | +.. post:: | |
8 | + :category: CastleBlogs, rough | |
9 | + :tags: Grammar, PEG, DRAFT | |
10 | + | |
11 | + In this short QuickNote blog we give a bit of info on `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__; a Python |
12 | + package to implement a (PEG) parser. Eventually, it will be implemented in Castle -- like all |
13 | + :ref:`Castle-WorkshopTools`. To kickstart, we use Python and Python-packages. |
14 | + |
15 | + As Arpeggio is quite well `documented <https://textx.github.io/Arpeggio/2.0/>`__, this is a short note. |
16 | + | |
17 | +.. seealso:: :ref:`QN_PEGEN` |
@@ -0,0 +1,11 @@ | ||
1 | +================ | |
2 | +Some Quick Blogs | |
3 | +================ | |
4 | + | |
5 | +.. toctree:: | |
6 | + :maxdepth: 2 | |
7 | + :titlesonly: | |
8 | + :glob: | |
9 | + | |
10 | + * | |
11 | + |
@@ -0,0 +1,178 @@ | ||
1 | +.. include:: /std/localtoc.irst | |
2 | + | |
3 | +.. _QN_PEGEN: | |
4 | + | |
5 | +================ | |
6 | +QuickNote: PEGEN | |
7 | +================ | |
8 | + | |
9 | +.. post:: 2022/11/3 | |
10 | + :category: CastleBlogs, rough | |
11 | + :tags: Grammar, PEG | |
12 | + | |
13 | + To implement CCastle we need a parser, as part of the compiler. Eventually, that parser will be written in Castle. For |
14 | + now we kickstart it in Python, which has several packages that can assist us. As we would like to use a PEG-based one, there |
15 | + are a few options. `Arpeggio <https://textx.github.io/Arpeggio/2.0/>`__ is well known, and has some nice options -- |
16 | + but can’t handle `left recursion <https://en.wikipedia.org/wiki/Left_recursion>`__ -- like most PEG-parsers. |
17 | + |
18 | + Recently, Python itself switched to a PEG parser that supports `left recursion |
19 | + <https://en.wikipedia.org/wiki/Left_recursion>`__ (which is a recent development). That parser is also available as a |
20 | + package: `pegen <https://we-like-parsers.github.io/pegen/index.html>`__; but it is hardly documented. |
21 | + |
22 | +This blog is written to record some lessons learned while playing with it, and to serve as a kind of informal documentation. |
23 | + | |
24 | +.. seealso:: :ref:`QN_Arpeggio` | |
25 | + | |
26 | +Built-In Lexer |
27 | +============== | |
28 | + | |
29 | +Pegen is written specifically for Python and uses a specialized lexer, unlike most PEG-parsers, which use PEG for lexing too. Pegen |
30 | +uses the `tokenizer <https://docs.python.org/3/library/tokenize.html>`__ that is part of Python. This comes with some |
31 | +restrictions. |
32 | + |
33 | +This lexer --or tokenize(r), as Python calls it-- is used **both** to read the grammar (the peg-file) *and* to read the |
34 | +source-files that are parsed by the generated parser. |
35 | + | |
36 | +.. hint:: | |
37 | + | |
38 | + These restrictions apply when we use pegen as a module: ``python -m pegen ...``, which calls `simple_parser_main()`. |
39 | + |BR| |
40 | + They also apply when we use the parser-class in our own code -- that is, when importing pegen with ``from pegen.parser |
41 | + import Parser``. A bit more is possible then, as we can configure another (self-made) lexer; the interface is quite |
42 | + narrowly tied to Python, however. |
43 | + | |
44 | + | |
45 | +Tokens | |
46 | +------ | |
47 | + | |
48 | +The lexer will recognize some tokens that are specific to Python, like `INDENT` & `DEDENT`. Also, some generic tokens |
49 | +like `NAME` (which is an identifier) and `NUMBER` are known, and can be used to define the language. |
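These token kinds can be observed directly with Python's standard-library tokenizer (not part of the original note -- a quick stdlib sketch, independent of pegen):

```python
import io
import tokenize

# Tokenize a tiny, indented Python-like snippet.
src = "if x:\n    y = 1\n"
tokens = [(tokenize.tok_name[t.type], t.string)
          for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# Python-specific tokens (INDENT/DEDENT) and generic ones (NAME/NUMBER) both appear.
print(tokens)
```

Note that keywords like ``if`` are just `NAME` tokens at this level; `INDENT` and `DEDENT` are emitted around the indented block.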
50 | + | |
51 | +Unfortunately, it will also find some tokens --typically operators-- that are *hardcoded* for Python, even when we would like to use |
52 | +them differently, possibly combined with other characters. In that case, the literal-strings as set |
53 | +in the grammar will not be found. |
54 | + | |
55 | +.. note:: | |
56 | + | |
57 | + Pegen speaks about *(soft)* **keywords** for all kinds of literal terminals, even when they are more like operators |
58 | + than *words*. |
59 | + | |
60 | +.. warning:: | |
61 | + | |
62 | + When the grammar defines (literal) terminals (or keywords) --especially for operators-- make sure the lexer will not | |
63 | + break them into predefined tokens! | |
64 | + |BR| | |
65 | + This will not give an error, but it does not work! | |
66 | + | |
67 | + .. code-block:: PEG | |
68 | + | |
69 | + Left_arrow_BAD: '<-' ## This is WRONG, as ``<`` is seen as a token. And so, `<-` is never found | |
70 | + Left_arrow_OKE: '<' '-' ## This is acceptable | |
71 | + | |
72 | + This *splitting* however results in 2 entries in the resulting tree -- unless one uses `grammar actions |
73 | + <https://we-like-parsers.github.io/pegen/grammar.html#grammar-actions>`__ to create one new “token”. |
74 | + | |
75 | +.. seealso:: https://docs.python.org/3/library/token.html, for an overview of the predefined tokens |
76 | + | |
77 | +.. tip:: | |
78 | + | |
79 | + A quick trick to see how a file is split into tokens: use ``python -m tokenize [-e] filename.peg``. |
80 | + |BR| |
81 | + Make sure you do not use string-literals that (e.g.) are composed of two tokens, like the above-mentioned ``<-``. |
82 | + | |
83 | + | |
84 | + | |
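The splitting behaviour from the warning above is easy to reproduce with the stdlib ``tokenize`` module: the lexer never yields a single ``<-`` operator, only the two predefined tokens ``<`` and ``-`` (a small sketch, not from the original note):

```python
import io
import tokenize

# '<' and '-' are both predefined Python operator tokens,
# so a would-be '<-' literal is split in two by the lexer.
src = "a <- b\n"
tokens = [(tokenize.tok_name[t.type], t.string)
          for t in tokenize.generate_tokens(io.StringIO(src).readline)]

assert ('OP', '<') in tokens and ('OP', '-') in tokens
assert ('OP', '<-') not in tokens  # never produced as one token
```

This is exactly why a grammar rule like ``Left_arrow_BAD: '<-'`` can never match.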
85 | +.. sidebar:: Reserved |
86 | + :class: localtoc | |
87 | + | |
88 | + - showpeek | |
89 | + - name | |
90 | + - number | |
91 | + - string | |
92 | + - op | |
93 | + - type_comment | |
94 | + - soft_keyword | |
95 | + - expect | |
96 | + - expect_forced | |
97 | + - positive_lookahead | |
98 | + - negative_lookahead | |
99 | + - make_syntax_error | |
100 | + | |
101 | +Rule names | |
102 | +---------- | |
103 | + | |
104 | +The *GeneratedParser* inherits from (and calls) the base ``pegen.parser.Parser`` class, and gets methods for all |
105 | +rule-names. This implies some names should not be used as rule-names (in any case) -- see the sidebar. |
106 | + | |
107 | + | |
108 | +Meta Syntax (issues) | |
109 | +==================== | |
110 | + | |
111 | +No: regexps | |
112 | +----------- | |
113 | + | |
114 | +PEGEN has **no** support for regular expressions, probably because it uses a custom lexer. |
115 | + | |
116 | +Unordered Group starts a comment | |
117 | +-------------------------------- | |
118 | + | |
119 | +PEGEN (or its lexer) uses the ``#`` to start a comment. This implies that an **unordered group** ``( sequence )#`` --as in |
120 | +`Arpeggio <https://textx.github.io/Arpeggio/2.0/grammars/#grammars-written-in-peg-notations>`__-- is not recognized. |
121 | + |
122 | +A workaround is to use another character, like ``@``, instead of the hash (``#``). |
123 | + | |
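This comment behaviour is inherited from Python's tokenizer, where everything after a ``#`` collapses into a single COMMENT token -- which can be checked with the stdlib directly (a sketch, not from the original note):

```python
import io
import tokenize

# The '#' after the group is taken as the start of a comment,
# so ')#' can never be matched as part of a grammar rule.
src = "( a b )# rest of the line\n"
tokens = [(tokenize.tok_name[t.type], t.string)
          for t in tokenize.generate_tokens(io.StringIO(src).readline)]
print(tokens)
```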
124 | + | |
125 | +Result/Output | |
126 | +============= | |
127 | + | |
128 | +cmd-tool | |
129 | +-------- | |
130 | + | |
131 | +The commandline tool ``python -m pegen ...`` only prints the parsed tree: a list (shown as ``[`` ... ``]``) with |
132 | +sub-lists and/or `TokenInfo` namedtuples. Each `TokenInfo` has 5 elements: a token type (an int and its enum-name), the |
133 | +token-string (that was parsed), the begin & end locations (line- & column-numbers), and the full line that is being |
134 | +parsed. |
135 | + |
136 | +No info about the matched grammar-rule (e.g. the rule-name) is shown; actually, that info is not part of the parsed-tree. |
137 | + | |
138 | +.. seealso:: This `structure is described <https://docs.python.org/3/library/tokenize.html?highlight=TokenInfo>`__ in |
139 | + the tokenize module, without mentioning its name: TokenInfo. |
140 | + | |
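The five fields of such a `TokenInfo` namedtuple can be inspected with the stdlib ``tokenize`` module directly (a small sketch, not from the original note):

```python
import io
import tokenize

# Take the first token of a one-line source.
tok = next(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))

# The five elements: type (int + enum-name), string, start, end, full line.
print(tok.type, tokenize.tok_name[tok.type])  # NAME
print(tok.string)                             # 'x'
print(tok.start, tok.end)                     # (line, column) pairs
print(tok.line)                               # 'x = 1\n'
```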
141 | +The parser | |
142 | +---------- | |
143 | + | |
144 | +The GeneratedParser (and/or its baseclass ``pegen.parser.Parser``) returns only (lists of) tokens from the tokenizer (an |
145 | +OO wrapper around tokenize) -- and so, the same TokenInfo objects as described above. |
146 | + | |
147 | +Stability | |
148 | +========= | |
149 | + | |
150 | +The current pegen package on `pypi <https://pypi.org/project/pegen/>`__ is V0.1.0 -- which already shows it is not |
151 | +mature. `That version on github <https://github.com/we-like-parsers/pegen/tree/v0.1.0>`__ is dated September 2021 (with 36 |
152 | +commits). The `current <https://github.com/we-like-parsers/pegen/tree/db7552dda0af6b27cbbb1230be116e8a56c49736>`__ |
153 | +version (Nov '22) has 20 more commits (56). |
154 | +|BR| |
155 | +It can be installed with ``pip install git+https://github.com/we-like-parsers/pegen``. |
156 | + | |
157 | +It is, however, not fully compatible. For example, ``pegen/parser.py::simple_parser_main()`` now expects an AST object (to |
158 | +print), not a list of TokenInfo. |
159 | + | |
160 | +.. tip:: | |
161 | + | |
162 | + The pegen package is **NOT** used inside the `(C)Python tool |
163 | + <https://github.com/python/cpython/tree/main/Tools/peg_generator>`__; the CPython version is heavily tied to other |
164 | + details of CPython, and it can also generate C-code. The pegen-package is based on it and more-or-less in sync; it can |
165 | + generate Python-code only, but does not depend on the compiler-implementation details. |
166 | + | |
167 | + .. seealso:: https://we-like-parsers.github.io/pegen/#differences-with-cpythons-pegen | |
168 | + | |
169 | + | |
170 | +Buggy current version | |
171 | +--------------------- | |
172 | + | |
173 | +The git version contains (at least) one bug. The function ``parser::simple_parser_main()``, which is called when using the |
174 | +generated file, uses the AST module to print (show) the result -- which simply does not work. |
175 | +|BR| |
176 | +Probably, that *default main* isn’t used a lot (also, I prefer to use -- and have used -- my own main). Still, it shows its |
177 | +immaturity. |
178 | + |