OSDN > Developer > albertmietus > Chamber > DocIdeas > 提交

DocIdeas
Fork

(Original repository, No fork origin)

提交

Commit MetaInfo

修订版	a47ed1afefeef2d5cfd19dd685f75db9a356e3be (tree)
时间	2022-05-17 00:10:49
作者	Albert Mietus < albert AT mietus DOT nl >
Commiter	Albert Mietus < albert AT mietus DOT nl >

Log Message

asis

更改概述

add: CCastle/Design/syntax/2.grammar_code.rst (diff)
add: CCastle/Usage/1.CompilerCompiler.rst (diff)
delete: CCastle/Usage/CompilerCompiler.rst

差异

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Design/syntax/2.grammar_code.rst

--- /dev/null Thu Jan 01 00:00:00 1970 +0000

+++ b/CCastle/Design/syntax/2.grammar_code.rst Mon May 16 17:10:49 2022 +0200

		@@ -0,0 +1,218 @@
	1	+.. _grammmar-code:
	2	+
	3	+===============
	4	+Grammar is code
	5	+===============
	6	+
	7	+.. post:: 2022/05/14
	8	+ :category: Castle, DesignStudy
	9	+ :tags: Castle, grammar, PEG
	10	+
	11	+ In :ref:`Castle-CompilerCompiler` we have seen that we can define a grammar within a Castle-program. And we have
	12	+ argued that each grammars-rule can be considered as a function.
	13	+
	14	+ In this post, we look into de details of how this works. And will confirm grammars is code ...
	15	+
	16	+Let’s start with an example. The grammer below is a simplified description of how a (PEG) parsing-rule looks like. The
	17	+syntax in Castle is a bit more complicated, by having more details and options; but very simular. Most of the
	18	+Caste-grammers can be parsed by the grammer below!
	19	+
	20	+.. code-block:: PEG
	21	+
	22	+ PEG_rule <- rule_name '<-' expression ';' ;
	23	+ expression <- ordered_choice+ ;
	24	+ ordered_choice <- sequence ( '\|' sequence)* ;
	25	+ sequence <- group \| atom ;
	26	+ group <- '(' expression ')' ;
	27	+ atom <- rule_xref \| str_lit \| regexp ;
	28	+ str_lit <- '"' /[^"]* '"' ;
	29	+ regexp <- '/' /[^/]* '/' ;
	30	+ rule_name <- ID ;
	31	+ rule_xref <- ID ;
	32	+
	33	+With this grammer we can read and check whether an a string is a valid rule by simply calling:
	34	+:math:`ast:=PEG\_rule(single\_line)`. When not, ``ast`` is :math:`False`, else ``ast`` has 4 children (or fields).
	35	+Where :data:`ast[0]` represent the ``rule_name``, :data:`ast[2]` is an ``expression``; which is a sequence of
	36	+``ordered_choice``\(s).
	37	+\|BR\|
	38	+As the rule has two constants :data:`ast[1]` and :data:`ast[3]` will always match the strings ‘<-’ and ‘;’.
	39	+
	40	+
	41	+Grammar to code
	42	+===============
	43	+
	44	+Before we study how Castle expands a grammar to code, let examine how to do that by manually crafted code, or using a
	45	+external compiler-compiler. In both cases, we will focus on the “validation” part only. And we skip a lot of details;
	46	+assuming you know a bit of theory of (PEG) parser. When not, read Guido van Rossum’s excellent blog-series `PEG Parsing
	47	+Series Overview <https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60>`__
	48	+
	49	+
	50	+Hand written code
	51	+-----------------
	52	+
	53	+Again, let’s suppost we like to verify a string contains a peg-rule. Then we need some functions [#func]_, which signature
	54	+are something like:
	55	+
	56	+.. tabs::
	57	+
	58	+ .. code-tab:: python
	59	+
	60	+ def PEG_rule(text: str) -> ast : ...
	61	+ def expression(text: str) -> ast : ...
	62	+ def ordered_choice(text: str) -> ast : ...
	63	+ def sequence(text: str) -> ast : ...
	64	+ ...
	65	+
	66	+ .. code-tab:: c
	67	+
	68	+ ast PEG_rule(char* txt);
	69	+ ast expression(char* txt);
	70	+ ast ordered_choice(char* txt);
	71	+ ast sequence(char* txt);
	72	+ ...
	73	+
	74	+Where we assume the type ``ast`` is defined in the chosen language and will be Null/None/Empty to signal :math:`False`:
	75	+not valid.
	76	+
	77	+The implementation of ``PEG_rule()`` checks thats the input-string starts with a ``rule_name``, followed by the literal
	78	+string “<-”, then has an ``expression`` and finally the literal string “;”. When one of those (four) steps fail,
	79	+we return :math:`False`.
	80	+\|BR\|
	81	+We follow this pattern for all rules.
	82	+
	83	+This concept is easy to implement: each rule-function calls other rule-functions as defined by the grammar. When we
	84	+need to check for a literal string we use :func:`expect(txt: str, literal: str) -> bool`. Also, we need a function to find
	85	+an ID; again easy. Sometimes we need to implement a loop --to handle ``*`` and ``+``. Or we need an :math:`or`, to
	86	+implement alternatives (``\|``). None of that is rocket science.
	87	+
	88	+A real implementation is a bit harder, as we have to strip spaces (and comments), handle newlines, and need to keep
	89	+track of where we are. Typically that is (nowadays) done by embedding those functions in a class; then the “input text” can be
	90	+stored in the instance (instead of passing them constantly). That instance also has a ‘cursor’ to the current
	91	+location.
	92	+
	93	+More details
	94	+~~~~~~~~~~~~
	95	+
	96	+There are a lot of details that make writing a grammer complex. We mention a few, and what it effect is on the (manually
	97	+written) code.
	98	+
	99	+When using alternatives (the ``\|`` operator in the grammar), a PEG-parser will always try the first alternative first,
	100	+Only when that fails, it back-ups an try the next alternative. Sometimes means (almost) start again, and parse the same file almost
	101	+completely again. Therefore the packrat algorithm is usually used; using memoization.
	102	+\|BR\|
	103	+This is not hard: just add a few lines of boilerplate before and after each call. To store intermediate partial-ast(s) in a
	104	+cache.
	105	+
	106	+Sometimes, we like to use another parser-strategy, like LALR_ (used by Yacc_), GLR_ (e.g Bison, the successor of Yacc_)
	107	+or `LL(k)`_ (introduced by ANTLR, which was popular for a while); each one has it pros and cons. Still, all (or almost)
	108	+start with the same grammar (although smarter strategies may result is shorter, easier to maintain [#maintain]_
	109	+grammars) [#notation]_.
	110	+
	111	+For a long time PEG-parsers where not able to handle left recursive rules [#leftStack]_. Until somebody discovered that is not
	112	+correct. Grammars in Castle can be left recursive! Both direct and indirect recursion is allowed. It is possible to
	113	+rewrite a grammar to remove the recursion. It makes the grammar however, more complex to maintain *(and for that reason
	114	+Castle will support recursion!)*
	115	+\|BR\|
	116	+By example, an simple calculation as :math:`7-5-3` should result in :math:`((7-5)-3)` but that needs left
	117	+recursion. When rewriting it, you must be carefull not to get :math:`(7-(5-3))`!
	118	+\|BR\|
	119	+This can be fixes, by adding an extra step. But it is better to use the update PEG-strategy: Just add more boilerplate code!
	120	+
	121	+.. tabs::
	122	+
	123	+ .. code-tab:: PEG Direct recursion
	124	+
	125	+ expr <- expr '-' term \| term
	126	+
	127	+ .. code-tab:: PEG Indirect recursion
	128	+
	129	+ A <- B "a" \| "a"
	130	+ B <- A "b" \| "b"
	131	+
	132	+ .. code-tab:: PEG A rewritten grammar
	133	+
	134	+ expr <- term ( '-' term )*
	135	+
	136	+
	137	+
	138	+Generating the code
	139	+===================
	140	+
	141	+You might recognise the pattern: To make the grammer more useful, the algorithms become more complex and adds more
	142	+code. This “extra” code, however is not hard; you just need the same (or almost the same) lines at many places.
	143	+\|BR\|
	144	+This begs for automation. And that is exactly what most compiler-compilers do.
	145	+
	146	+A compiler-compilers read the grammar and generates the code. As shown above it will generate (C, C++, C#, Java,
	147	+Python, or ...) functions [#OrTables]_ that call each-other. It will also detect left-recursion, and might compensate for
	148	+that. The result: more boilerplate-code; but as it is automatically generated this is easy.
	149	+
	150	+Classic tools
	151	+-------------
	152	+There are many tools, that we can use for inspiration. A short overview, and how it influences Castle.
	153	+
	154	+Possible the most famous compiler-compilers is Yacc_. It was developed in 197X and generates C-code that can be compiled
	155	+and linked to your code. To parse a string, you had to call ``yyparse())``. It would however be relatively simple to
	156	+generate functions with the name of each rule, using the same machinery. In that decade however, the goal was
	157	+differently. Memory was limited, what we can also see in the used grammar: one had to craft it carefully as the was no
	158	+back-tracking an only a single token look-ahead.
	159	+
	160	+Bison_ is Gnu reimplementation of Yacc_, but can use several parsing-algorithms.cLike Yacc_, it used a separate Lexer_:
	161	+flex (whereas Yacc uses lex). A lexer_ splits the input-string into a stream of Tokens using another (simpler,
	162	+but faster) algorithm. In that time that was relevant.
	163	+\|BR\|
	164	+As a lexer_ can be implemented with a parsing-algorithm (but not the other-way around), and as the need for speed doesn't
	165	+demand a separate lexer_ anymore; modern parsings are often “scannerless”. This removes the need to use two meta-syntaxes
	166	+(for the lexer/scanner and the parser) and so is simpler to use.
	167	+\|BR\|
	168	+Also Castle use a scannerless approach.
	169	+
	170	+Castle
	171	+------
	172	+Also in Castle you can use grammars; but now directly in your program, using the Castle-syntax. And Castle will generate
	173	+“code”, Castle-functions that is. But now without an extra tool.
	174	+\|BR\|
	175	+Actually, it probably will not generate code; nor ‘code as text’. Why should we generate code, to read & parse it back
	176	+and compile it directly? It easier to generate the AST, that would be the result of parsing the generated-code directly.
	177	+
	178	+But the effect is the same. You create a set of function with this generic “text to tree” signature, by writing some
	179	+simle rule. Castle does the rest for you. Easy!
	180	+
	181	+
	182	+----------
	183	+
	184	+.. rubric:: Footnotes
	185	+
	186	+.. [#func]
	187	+ Instead of a function, it can also be a method, or any callable*. We use ‘function’ a generic term, in the
	188	+ mathematical meaning: some input (parameters) and an output (return value).
	189	+
	190	+.. [#maintain]
	191	+ This is not specially for grammers; all it valid for all programming-languages. New languages may introduce new
	192	+ concepts (like --once-- OO). When the compiler becomes smarter, the programmer can focus in the important bits!
	193	+
	194	+.. [#notation]
	195	+ Aside of multiple parser-algorithms, there are also several notation to write the grammar itself; like `EBNF
	196	+ <https://en.wikipedia.org/wiki/Extended_Backus–Naur_form>`__ `ABNF
	197	+ <https://en.wikipedia.org/wiki/Augmented_Backus–Naur_form>`__, and `YACC`_
	198	+ Most implementations of a given algorithm, use a dialect of a standard one, to enable :ref:`G2C-actions`, or ..
	199	+
	200	+ Also Caste does this: We use the Caste-grammer, which is based on both EBNF and PEG; but using the classic ‘\|’
	201	+ instead of the ‘\’ for ordered-choice.
	202	+
	203	+.. [#leftStack]
	204	+ Without going into details left-recursion is hard for many parsing-algorithms. In the shown approach, a
	205	+ rule-function (for a rule that is direct left-recurse) will call itself as first step. In this way no progress is
	206	+ made, and the stack will quickly overrun.
	207	+
	208	+.. [#OrTables]
	209	+ Some tools, like Yacc by example, use another approach. Instead of many functions it has a generic (run-time) library
	210	+ that used code-tables; which are generated by the tool. Still, that is just a implementation detail.
	211	+
	212	+.. _LALR: https://en.wikipedia.org/wiki/LALR_parser
	213	+.. _LALR(1): LALR_
	214	+.. _GLR: https://en.wikipedia.org/wiki/GLR_parser
	215	+.. _LL(k): https://en.wikipedia.org/wiki/LL_parser
	216	+.. _YACC: https://en.wikipedia.org/wiki/Yacc
	217	+.. _Bison: https://en.wikipedia.org/wiki/GNU_Bison
	218	+.. _Lexer: https://en.wikipedia.org/wiki/Lexical_analysis

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Usage/1.CompilerCompiler.rst

--- /dev/null Thu Jan 01 00:00:00 1970 +0000

+++ b/CCastle/Usage/1.CompilerCompiler.rst Mon May 16 17:10:49 2022 +0200

		@@ -0,0 +1,124 @@
	1	+.. _Castle-CompilerCompiler:
	2	+
	3	+=================
	4	+Compiler Compiler
	5	+=================
	6	+
	7	+.. post:: 2022/05/7
	8	+ :category: Castle, Usage
	9	+ :tags: Castle, grammar
	10	+
	11	+ In Castle you can define a grammar directly in your code. The compiler will translate them into functions, using
	12	+ the build-in (PEG) compiler-compiler -- at least that was it called back in the days of YACC.
	13	+
	14	+ How do one use that? And why should you?
	15	+
	16	+Grammars, a short intro
	17	+=======================
	18	+
	19	+A grammar is a collection of (parsing)-rules and optionally some settings. Rules are written in a mixture of EBNF
	20	+and PEG meta-syntax. Let’s start with an simple example:
	21	+
	22	+.. code-block:: PEG
	23	+
	24	+ castle_file <- ( import_line \| interface \| implementation )* ;
	25	+ import_line <- IMPORT_stmt ( STRING_literal \| qualID ) ';' ;
	26	+ qualID <- '.'? nameID ('.' nameID )* ;
	27	+ IMPORT_stmt = "import" ;
	28	+ ...
	29	+
	30	+
	31	+This basically defines that a ``castle_file`` is either an ``import_line``, an ``interface``, an ``implementation``, or
	32	+sequence of them. Where an ``import_line`` starts with the ``IMPORT_stmt`` *(which is set to the string ‘import’, on
	33	+line 4)*, then comes either a ``STRING_literal`` (indeed a literal-string) or a ``qualID``, and ends with a semicolon
	34	+(`;`). Likewise, a ``qualID`` is a ``nameID`` (a name that is used as ID, like in any programming language),
	35	+optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in
	36	+...)*. In Castle, that name may start with a dot --which is a shorthand notation for “in the current namespace”. You can
	37	+ignore that for know.
	38	+
	39	+The grammer defines how one should read the input --a text--, or more formally: how to parse it. The result of this
	40	+parsing is twofold. It will check whether input conforms to the grammer; resulting a in boolean, for the mathematics
	41	+under us. And it will translate a sequential (flat) text into a tree-structure; which typically much more useful for a
	42	+software-engineer.
	43	+\|BR\|
	44	+A well known example is this HTML-file. On disk it’s nothing but text, which is easy to store and to transfer. But
	45	+when send to your brouwer, it’s parsed and to create the `DOM
	46	+<https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a tree of the document, with sections, paragraphs,
	47	+hyper-links, etc. By regarding it as a tree, it easy to describe (e.g. with CSS) how all parts, should be shown: all
	48	+headers have a background, the first row in a table is highlighed, etc.
	49	+
	50	+
	51	+Parsing
	52	+=======
	53	+Another well-known example is (the source of a) programm. As code, it is just text. But the compiler will parse it into
	54	+a parse-tree and/or an abstract-syntax-tree; which is build out of classes, methods, statements etc.
	55	+\|BR\|
	56	+But also your favorite IDE will parse it; to highlight the code, give tooltips, enable you to quickly navigate and
	57	+refactor it, and all those conviant features that make it your favorite editor.
	58	+
	59	+And even you are probably parsing text as part of your daily job. When you un-serialise data, you are (often) parsing
	60	+text; when you read the configuration, you are (or should be ) parsing that text. Even a simple input of the user might
	61	+need a bit of parsing. The text “42” is not the number :math:`42.0` -- you need to convert it; parse it.
	62	+
	63	+There a many ways to parse. You do not need a full-fledged grammer to translate “42” into :math:`42` or
	64	+:math:`42.0` --a stdlib functions as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers
	65	+(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)?
	66	+
	67	+Non-parsing
	68	+-----------
	69	+
	70	+As proper passing used to hard, other similar (but simpler) techniques do exist, like `globing
	71	+<https://en.wikipedia.org/wiki/Glob_(programming)>`__ (``\*.Castle`` on the bash-prompt will result in all
	72	+Castle-files). Using `regular-expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ is more powerfull, and
	73	+often used to highlight code; a pattern as ``//.*$`` can be used to highlight (single-line) comment. It often works, but
	74	+this simple pattern might match a piece of text inside a multi-line-(doc)string -- which wrong.
	75	+\|BR\|
	76	+To parse a input-text its not a sound solution; although I have seen cunning regular-expressions, that almost always
	77	+work. But reg-exps have not the same power as a grammar-- That is already proven halve a century ago and will not be
	78	+repeated here.
	79	+
	80	+Grammars are more powerfull
	81	+===========================
	82	+
	83	+A grammar (even a simple one) is more powerfull. You can define the overal structure of the input and the sub-structure
	84	+of each lump. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor other
	85	+way around; it simple is not hunting for it.
	86	+
	87	+As most programming-languages do not have build-in support for grammars, one has to resort to external tools. Like the
	88	+famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC will read a grammar-file, and generates
	89	+C-code that can be compiled and linked to your code.
	90	+
	91	+Back then, writing compiler-compilers was a popular academic research exercise (YACC stand for: Yet Another Compiler
	92	+Compiler). It was great for compiler-designers, but clumsy to use for average developers: The syntax to write a grammar
	93	+was hard to grasp, with many pitfalls, the interface between your code and the parser was awkward (you had to call
	94	+``yyparse()``; needed some globals; OO wasn't invented, no inheritance or data-hiding, which resulted in puzzling tricks
	95	+to use multiple parsers, etc).
	96	+\|BR\|
	97	+Aside of that, more and better parsing strategies are developed; that is handles in another :ref:`blog <grammmar-code>`.
	98	+
	99	+Unleash that power!
	100	+-------------------
	101	+
	102	+With those better parsing-algorithms, faster computers with a lot more memory and other inventions, writing grammars
	103	+has become more peaceful. Except that you still need an extra step, another sytax, as you still need to use an external
	104	+tool. That sometimes isn’t maintained after a couple of years ...
	105	+\|BR\|
	106	+The effect is, most developers don’t use grammars; the write parser-like code manually, the settle for less optimal
	107	+result. Or are utterly not aware that grammer can provide a other (better, easier) solution.
	108	+
	109	+Castle has build-in support for grammers, and is hiding the nasty details of parsing-strategies. There is no need to
	110	+generating, compiling, and use that code, with external tools. All that clutter is gone.
	111	+\|BR\|
	112	+With a few lines, you define the structure of the input. Each rule is like a function: it has a name (the left-hand-side
	113	+of the rule, so the part before the arrow), and a implementation; the part after the arrow. That implementation “calls”
	114	+other rules, like normal code.
	115	+
	116	+To use the grammar you simply call one of those rules as a function: pass the input (string) and it will return a
	117	+(generic) tree-structure.
	118	+\|BR\|
	119	+When you simple like to verify the syntax is correct: use the tree as a boolean: when it not-empty the input is valid.
	120	+
	121	+But typically you proces/use that tree: like you do in many situations. Read the configuration values, walk over the
	122	+tree, of traverse it as-if it is a DOM. You can even use Castle’s :ref:`matching-statements` to simply that.
	123	+
	124	+Grammars makes reading text easy. Define the structure, call the “main rule” and use the values. Castle makes that simple!

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Usage/CompilerCompiler.rst

--- a/CCastle/Usage/CompilerCompiler.rst Mon May 16 17:04:58 2022 +0200

+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

		@@ -1,124 +0,0 @@
1		-.. _Castle-CompilerCompiler:
2		-
3		-=================
4		-Compiler Compiler
5		-=================
6		-
7		-.. post:: 2022/05/7
8		- :category: Castle, Usage
9		- :tags: Castle, grammar
10		-
11		- In Castle you can define a grammar directly in your code. The compiler will translate them into functions, using
12		- the build-in (PEG) compiler-compiler -- at least that was it called back in the days of YACC.
13		-
14		- How do one use that? And why should you?
15		-
16		-Grammars, a short intro
17		-=======================
18		-
19		-A grammar is a collection of (parsing)-rules and optionally some settings. Rules are written in a mixture of EBNF
20		-and PEG meta-syntax. Let’s start with an simple example:
21		-
22		-.. code-block:: PEG
23		-
24		- castle_file <- ( import_line \| interface \| implementation )* ;
25		- import_line <- IMPORT_stmt ( STRING_literal \| qualID ) ';' ;
26		- qualID <- '.'? nameID ('.' nameID )* ;
27		- IMPORT_stmt = "import" ;
28		- ...
29		-
30		-
31		-This basically defines that a ``castle_file`` is either an ``import_line``, an ``interface``, an ``implementation``, or
32		-sequence of them. Where an ``import_line`` starts with the ``IMPORT_stmt`` *(which is set to the string ‘import’, on
33		-line 4)*, then comes either a ``STRING_literal`` (indeed a literal-string) or a ``qualID``, and ends with a semicolon
34		-(`;`). Likewise, a ``qualID`` is a ``nameID`` (a name that is used as ID, like in any programming language),
35		-optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in
36		-...)*. In Castle, that name may start with a dot --which is a shorthand notation for “in the current namespace”. You can
37		-ignore that for know.
38		-
39		-The grammer defines how one should read the input --a text--, or more formally: how to parse it. The result of this
40		-parsing is twofold. It will check whether input conforms to the grammer; resulting a in boolean, for the mathematics
41		-under us. And it will translate a sequential (flat) text into a tree-structure; which typically much more useful for a
42		-software-engineer.
43		-\|BR\|
44		-A well known example is this HTML-file. On disk it’s nothing but text, which is easy to store and to transfer. But
45		-when send to your brouwer, it’s parsed and to create the `DOM
46		-<https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a tree of the document, with sections, paragraphs,
47		-hyper-links, etc. By regarding it as a tree, it easy to describe (e.g. with CSS) how all parts, should be shown: all
48		-headers have a background, the first row in a table is highlighed, etc.
49		-
50		-
51		-Parsing
52		-=======
53		-Another well-known example is (the source of a) programm. As code, it is just text. But the compiler will parse it into
54		-a parse-tree and/or an abstract-syntax-tree; which is build out of classes, methods, statements etc.
55		-\|BR\|
56		-But also your favorite IDE will parse it; to highlight the code, give tooltips, enable you to quickly navigate and
57		-refactor it, and all those conviant features that make it your favorite editor.
58		-
59		-And even you are probably parsing text as part of your daily job. When you un-serialise data, you are (often) parsing
60		-text; when you read the configuration, you are (or should be ) parsing that text. Even a simple input of the user might
61		-need a bit of parsing. The text “42” is not the number :math:`42.0` -- you need to convert it; parse it.
62		-
63		-There a many ways to parse. You do not need a full-fledged grammer to translate “42” into :math:`42` or
64		-:math:`42.0` --a stdlib functions as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers
65		-(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)?
66		-
67		-Non-parsing
68		------------
69		-
70		-As proper passing used to hard, other similar (but simpler) techniques do exist, like `globing
71		-<https://en.wikipedia.org/wiki/Glob_(programming)>`__ (``\*.Castle`` on the bash-prompt will result in all
72		-Castle-files). Using `regular-expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ is more powerfull, and
73		-often used to highlight code; a pattern as ``//.*$`` can be used to highlight (single-line) comment. It often works, but
74		-this simple pattern might match a piece of text inside a multi-line-(doc)string -- which wrong.
75		-\|BR\|
76		-To parse a input-text its not a sound solution; although I have seen cunning regular-expressions, that almost always
77		-work. But reg-exps have not the same power as a grammar-- That is already proven halve a century ago and will not be
78		-repeated here.
79		-
80		-Grammars are more powerfull
81		-===========================
82		-
83		-A grammar (even a simple one) is more powerfull. You can define the overal structure of the input and the sub-structure
84		-of each lump. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor other
85		-way around; it simple is not hunting for it.
86		-
87		-As most programming-languages do not have build-in support for grammars, one has to resort to external tools. Like the
88		-famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC will read a grammar-file, and generates
89		-C-code that can be compiled and linked to your code.
90		-
91		-Back then, writing compiler-compilers was a popular academic research exercise (YACC stand for: Yet Another Compiler
92		-Compiler). It was great for compiler-designers, but clumsy to use for average developers: The syntax to write a grammar
93		-was hard to grasp, with many pitfalls, the interface between your code and the parser was awkward (you had to call
94		-``yyparse()``; needed some globals; OO wasn't invented, no inheritance or data-hiding, which resulted in puzzling tricks
95		-to use multiple parsers, etc).
96		-\|BR\|
97		-Aside of that, more and better parsing strategies are developed; that is handles in another :ref:`blog <grammmar-code>`.
98		-
99		-Unleash that power!
100		--------------------
101		-
102		-With those better parsing-algorithms, faster computers with a lot more memory and other inventions, writing grammars
103		-has become more peaceful. Except that you still need an extra step, another sytax, as you still need to use an external
104		-tool. That sometimes isn’t maintained after a couple of years ...
105		-\|BR\|
106		-The effect is, most developers don’t use grammars; the write parser-like code manually, the settle for less optimal
107		-result. Or are utterly not aware that grammer can provide a other (better, easier) solution.
108		-
109		-Castle has build-in support for grammers, and is hiding the nasty details of parsing-strategies. There is no need to
110		-generating, compiling, and use that code, with external tools. All that clutter is gone.
111		-\|BR\|
112		-With a few lines, you define the structure of the input. Each rule is like a function: it has a name (the left-hand-side
113		-of the rule, so the part before the arrow), and a implementation; the part after the arrow. That implementation “calls”
114		-other rules, like normal code.
115		-
116		-To use the grammar you simply call one of those rules as a function: pass the input (string) and it will return a
117		-(generic) tree-structure.
118		-\|BR\|
119		-When you simple like to verify the syntax is correct: use the tree as a boolean: when it not-empty the input is valid.
120		-
121		-But typically you proces/use that tree: like you do in many situations. Read the configuration values, walk over the
122		-tree, of traverse it as-if it is a DOM. You can even use Castle’s :ref:`matching-statements` to simply that.
123		-
124		-Grammars makes reading text easy. Define the structure, call the “main rule” and use the values. Castle makes that simple!

DocIdeas Fork

提交

标签

Frequently used words (click to add to your profile)

Commit MetaInfo

Log Message

更改概述

差异

DocIdeas
Fork