修订版 | a47ed1afefeef2d5cfd19dd685f75db9a356e3be (tree) |
---|---|
时间 | 2022-05-17 00:10:49 |
作者 | Albert Mietus < albert AT mietus DOT nl > |
Commiter | Albert Mietus < albert AT mietus DOT nl > |
asis
@@ -0,0 +1,218 @@ | ||
1 | +.. _grammmar-code: | |
2 | + | |
3 | +=============== | |
4 | +Grammar is code | |
5 | +=============== | |
6 | + | |
7 | +.. post:: 2022/05/14 | |
8 | + :category: Castle, DesignStudy | |
9 | + :tags: Castle, grammar, PEG | |
10 | + | |
11 | + In :ref:`Castle-CompilerCompiler` we have seen that we can define a grammar within a Castle-program. And we have | |
12 | + argued that each grammars-rule can be considered as a function. | |
13 | + | |
14 | + In this post, we look into de details of how this works. And will confirm grammars is code ... | |
15 | + | |
16 | +Let’s start with an example. The grammer below is a simplified description of how a (PEG) parsing-rule looks like. The | |
17 | +syntax in Castle is a bit more complicated, by having more details and options; but very simular. Most of the | |
18 | +Caste-grammers can be parsed by the grammer below! | |
19 | + | |
20 | +.. code-block:: PEG | |
21 | + | |
22 | + PEG_rule <- rule_name '<-' expression ';' ; | |
23 | + expression <- ordered_choice+ ; | |
24 | + ordered_choice <- sequence ( '|' sequence)* ; | |
25 | + sequence <- group | atom ; | |
26 | + group <- '(' expression ')' ; | |
27 | + atom <- rule_xref | str_lit | regexp ; | |
28 | + str_lit <- '"' /[^"]* '"' ; | |
29 | + regexp <- '/' /[^/]* '/' ; | |
30 | + rule_name <- ID ; | |
31 | + rule_xref <- ID ; | |
32 | + | |
33 | +With this grammer we can read and check whether an a string is a valid rule by simply calling: | |
34 | +:math:`ast:=PEG\_rule(single\_line)`. When not, ``ast`` is :math:`False`, else ``ast`` has 4 children (or fields). | |
35 | +Where :data:`ast[0]` represent the ``rule_name``, :data:`ast[2]` is an ``expression``; which is a sequence of | |
36 | +``ordered_choice``\(s). | |
37 | +|BR| | |
38 | +As the rule has two constants :data:`ast[1]` and :data:`ast[3]` will always match the strings *‘<-’* and *‘;’*. | |
39 | + | |
40 | + | |
41 | +Grammar to code | |
42 | +=============== | |
43 | + | |
44 | +Before we study how Castle *expands* a grammar to code, let examine how to do that by manually crafted code, or using a | |
45 | +external compiler-compiler. In both cases, we will focus on the “validation” part only. And we skip a lot of details; | |
46 | +assuming you know a bit of theory of (PEG) parser. When not, read Guido van Rossum’s excellent blog-series `PEG Parsing | |
47 | +Series Overview <https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60>`__ | |
48 | + | |
49 | + | |
50 | +Hand written code | |
51 | +----------------- | |
52 | + | |
53 | +Again, let’s suppost we like to verify a string contains a *peg-rule*. Then we need some functions [#func]_, which signature | |
54 | +are something like: | |
55 | + | |
56 | +.. tabs:: | |
57 | + | |
58 | + .. code-tab:: python | |
59 | + | |
60 | + def PEG_rule(text: str) -> ast : ... | |
61 | + def expression(text: str) -> ast : ... | |
62 | + def ordered_choice(text: str) -> ast : ... | |
63 | + def sequence(text: str) -> ast : ... | |
64 | + ... | |
65 | + | |
66 | + .. code-tab:: c | |
67 | + | |
68 | + ast PEG_rule(char* txt); | |
69 | + ast expression(char* txt); | |
70 | + ast ordered_choice(char* txt); | |
71 | + ast sequence(char* txt); | |
72 | + ... | |
73 | + | |
74 | +Where we assume the type ``ast`` is defined in the chosen language and will be Null/None/Empty to signal :math:`False`: | |
75 | +not valid. | |
76 | + | |
77 | +The implementation of ``PEG_rule()`` checks thats the input-string starts with a ``rule_name``, followed by the literal | |
78 | +string **“<-”**, then has an ``expression`` and finally the literal string **“;”**. When one of those (four) steps fail, | |
79 | +we return :math:`False`. | |
80 | +|BR| | |
81 | +We follow this pattern for all rules. | |
82 | + | |
83 | +This concept is easy to implement: each rule-function calls other rule-functions as defined by the grammar. When we | |
84 | +need to check for a literal string we use :func:`expect(txt: str, literal: str) -> bool`. Also, we need a function to find | |
85 | +an ID; again easy. Sometimes we need to implement a loop --to handle ``*`` and ``+``. Or we need an :math:`or`, to | |
86 | +implement alternatives (``|``). None of that is rocket science. | |
87 | + | |
88 | +A real implementation is a bit harder, as we have to strip spaces (and comments), handle newlines, and need to keep | |
89 | +track of where we are. Typically that is (nowadays) done by embedding those functions in a class; then the “input text” can be | |
90 | +stored in the instance (instead of passing them constantly). That instance also has a ‘cursor’ to the current | |
91 | +location. | |
92 | + | |
93 | +More details | |
94 | +~~~~~~~~~~~~ | |
95 | + | |
96 | +There are a lot of details that make writing a grammer complex. We mention a few, and what it effect is on the (manually | |
97 | +written) code. | |
98 | + | |
99 | +When using alternatives (the ``|`` operator in the grammar), a PEG-parser will always try the first alternative first, | |
100 | +Only when that fails, it back-ups an try the next alternative. Sometimes means (almost) start again, and parse the same file almost | |
101 | +completely again. Therefore the *packrat* algorithm is usually used; using memoization. | |
102 | +|BR| | |
103 | +This is not hard: just add a few lines of boilerplate before and after each call. To store intermediate partial-ast(s) in a | |
104 | +cache. | |
105 | + | |
106 | +Sometimes, we like to use another parser-strategy, like LALR_ (used by Yacc_), GLR_ (e.g Bison, the successor of Yacc_) | |
107 | +or `LL(k)`_ (introduced by ANTLR, which was popular for a while); each one has it pros and cons. Still, all (or almost) | |
108 | +start with the same grammar (although smarter strategies may result is shorter, easier to maintain [#maintain]_ | |
109 | +grammars) [#notation]_. | |
110 | + | |
111 | +For a long time PEG-parsers where not able to handle left recursive rules [#leftStack]_. Until somebody discovered that is not | |
112 | +correct. Grammars in Castle can be left recursive! Both direct and indirect recursion is allowed. It is possible to | |
113 | +rewrite a grammar to remove the recursion. It makes the grammar however, more complex to maintain *(and for that reason | |
114 | +Castle will support recursion!)* | |
115 | +|BR| | |
116 | +By example, an simple calculation as :math:`7-5-3` should result in :math:`((7-5)-3)` but that needs left | |
117 | +recursion. When rewriting it, you must be carefull not to get :math:`(7-(5-3))`! | |
118 | +|BR| | |
119 | +This can be fixes, by adding an extra step. But it is better to use the update PEG-strategy: Just add more boilerplate code! | |
120 | + | |
121 | +.. tabs:: | |
122 | + | |
123 | + .. code-tab:: PEG Direct recursion | |
124 | + | |
125 | + expr <- expr '-' term | term | |
126 | + | |
127 | + .. code-tab:: PEG Indirect recursion | |
128 | + | |
129 | + A <- B "a" | "a" | |
130 | + B <- A "b" | "b" | |
131 | + | |
132 | + .. code-tab:: PEG A rewritten grammar | |
133 | + | |
134 | + expr <- term ( '-' term )* | |
135 | + | |
136 | + | |
137 | + | |
138 | +Generating the code | |
139 | +=================== | |
140 | + | |
141 | +You might recognise the pattern: To make the grammer more useful, the algorithms become more complex and adds more | |
142 | +code. This “extra” code, however is not hard; you just need the same (or almost the same) lines at many places. | |
143 | +|BR| | |
144 | +This begs for automation. And that is exactly what most compiler-compilers do. | |
145 | + | |
146 | +A compiler-compilers read the grammar and generates the code. As shown above it will generate (C, C++, C#, Java, | |
147 | +Python, or ...) functions [#OrTables]_ that call each-other. It will also detect left-recursion, and might compensate for | |
148 | +that. The result: more boilerplate-code; but as it is automatically generated this is easy. | |
149 | + | |
150 | +Classic tools | |
151 | +------------- | |
152 | +There are many tools, that we can use for inspiration. A short overview, and how it influences Castle. | |
153 | + | |
154 | +Possible the most famous compiler-compilers is Yacc_. It was developed in 197X and generates C-code that can be compiled | |
155 | +and linked to your code. To parse a string, you had to call ``yyparse())``. It would however be relatively simple to | |
156 | +generate functions with the name of each rule, using the same machinery. In that decade however, the goal was | |
157 | +differently. Memory was limited, what we can also see in the used grammar: one had to craft it carefully as the was no | |
158 | +back-tracking an only a single token look-ahead. | |
159 | + | |
160 | +Bison_ is Gnu reimplementation of Yacc_, but can use several parsing-algorithms.cLike Yacc_, it used a separate Lexer_: | |
161 | +*flex* (whereas Yacc uses *lex*). A lexer_ splits the input-string into a stream of *Tokens* using another (simpler, | |
162 | +but faster) algorithm. In that time that was relevant. | |
163 | +|BR| | |
164 | +As a lexer_ can be implemented with a parsing-algorithm (but not the other-way around), and as the need for speed doesn't | |
165 | +demand a separate lexer_ anymore; modern parsings are often “scannerless”. This removes the need to use two meta-syntaxes | |
166 | +(for the lexer/scanner and the parser) and so is simpler to use. | |
167 | +|BR| | |
168 | +Also Castle use a scannerless approach. | |
169 | + | |
170 | +Castle | |
171 | +------ | |
172 | +Also in Castle you can use grammars; but now directly in your program, using the Castle-syntax. And Castle will generate | |
173 | +“code”, Castle-functions that is. But now without an extra tool. | |
174 | +|BR| | |
175 | +Actually, it probably will not generate code; nor ‘code as text’. Why should we generate code, to read & parse it back | |
176 | +and compile it directly? It easier to generate the AST, that would be the result of parsing the generated-code directly. | |
177 | + | |
178 | +But the effect is the same. You create a set of function with this generic “text to tree” signature, by writing some | |
179 | +simle rule. Castle does the rest for you. Easy! | |
180 | + | |
181 | + | |
182 | +---------- | |
183 | + | |
184 | +.. rubric:: Footnotes | |
185 | + | |
186 | +.. [#func] | |
187 | + Instead of a **function**, it can also be a *method, or any *callable*. We use ‘function’ a generic term, in the | |
188 | + mathematical meaning: some input (parameters) and an output (return value). | |
189 | + | |
190 | +.. [#maintain] | |
191 | + This is not specially for grammers; all it valid for all programming-languages. New languages may introduce new | |
192 | + concepts (like --once-- OO). When the compiler becomes smarter, the programmer can focus in the important bits! | |
193 | + | |
194 | +.. [#notation] | |
195 | + Aside of multiple parser-algorithms, there are also several notation to write the grammar itself; like `EBNF | |
196 | + <https://en.wikipedia.org/wiki/Extended_Backus–Naur_form>`__ `ABNF | |
197 | + <https://en.wikipedia.org/wiki/Augmented_Backus–Naur_form>`__, and `YACC`_ | |
198 | + Most implementations of a given algorithm, use a dialect of a standard one, to enable :ref:`G2C-actions`, or .. | |
199 | + | |
200 | + Also Caste does this: We use the Caste-grammer, which is based on both EBNF and PEG; but using the classic ‘|’ | |
201 | + instead of the ‘\’ for ordered-choice. | |
202 | + | |
203 | +.. [#leftStack] | |
204 | + Without going into details left-recursion is hard for many parsing-algorithms. In the shown approach, a | |
205 | + rule-function (for a rule that is direct left-recurse) will call itself as first step. In this way no progress is | |
206 | + made, and the stack will quickly overrun. | |
207 | + | |
208 | +.. [#OrTables] | |
209 | + Some tools, like Yacc by example, use another approach. Instead of many functions it has a generic (run-time) library | |
210 | + that used code-tables; which are generated by the tool. Still, that is just a implementation detail. | |
211 | + | |
212 | +.. _LALR: https://en.wikipedia.org/wiki/LALR_parser | |
213 | +.. _LALR(1): LALR_ | |
214 | +.. _GLR: https://en.wikipedia.org/wiki/GLR_parser | |
215 | +.. _LL(k): https://en.wikipedia.org/wiki/LL_parser | |
216 | +.. _YACC: https://en.wikipedia.org/wiki/Yacc | |
217 | +.. _Bison: https://en.wikipedia.org/wiki/GNU_Bison | |
218 | +.. _Lexer: https://en.wikipedia.org/wiki/Lexical_analysis |
@@ -0,0 +1,124 @@ | ||
1 | +.. _Castle-CompilerCompiler: | |
2 | + | |
3 | +================= | |
4 | +Compiler Compiler | |
5 | +================= | |
6 | + | |
7 | +.. post:: 2022/05/7 | |
8 | + :category: Castle, Usage | |
9 | + :tags: Castle, grammar | |
10 | + | |
11 | + In Castle you can define a *grammar* directly in your code. The compiler will *translate* them into functions, using | |
12 | + the build-in (PEG) **compiler-compiler** -- at least that was it called back in the days of *YACC*. | |
13 | + | |
14 | + How do one use that? And why should you? | |
15 | + | |
16 | +Grammars, a short intro | |
17 | +======================= | |
18 | + | |
19 | +A grammar is a collection of (parsing)-**rules** and optionally some *settings*. Rules are written in a mixture of EBNF | |
20 | +and PEG meta-syntax. Let’s start with an simple example: | |
21 | + | |
22 | +.. code-block:: PEG | |
23 | + | |
24 | + castle_file <- ( import_line | interface | implementation )* ; | |
25 | + import_line <- IMPORT_stmt ( STRING_literal | qualID ) ';' ; | |
26 | + qualID <- '.'? nameID ('.' nameID )* ; | |
27 | + IMPORT_stmt = "import" ; | |
28 | + ... | |
29 | + | |
30 | + | |
31 | +This basically defines that a ``castle_file`` is either an ``import_line``, an ``interface``, an ``implementation``, or | |
32 | +sequence of them. Where an ``import_line`` starts with the ``IMPORT_stmt`` *(which is set to the string ‘import’, on | |
33 | +line 4)*, then comes either a ``STRING_literal`` (indeed a literal-string) or a ``qualID``, and ends with a semicolon | |
34 | +(`;`). Likewise, a ``qualID`` is a ``nameID`` *(a name that is used as ID, like in any programming language)*, | |
35 | +optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in | |
36 | +...)*. In Castle, that name may start with a dot --which is a shorthand notation for “in the current namespace”. You can | |
37 | +ignore that for know. | |
38 | + | |
39 | +The grammer defines how one should read the input --a text--, or more formally: how to parse it. The result of this | |
40 | +parsing is twofold. It will check whether input conforms to the grammer; resulting a in boolean, for the mathematics | |
41 | +under us. And it will translate a sequential (flat) text into a tree-structure; which typically much more useful for a | |
42 | +software-engineer. | |
43 | +|BR| | |
44 | +A well known example is this HTML-file. On disk it’s nothing but text, which is easy to store and to transfer. But | |
45 | +when send to your brouwer, it’s *parsed* and to create the `DOM | |
46 | +<https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a tree of the document, with sections, paragraphs, | |
47 | +hyper-links, etc. By regarding it as a tree, it easy to describe (e.g. with CSS) how all parts, should be shown: all | |
48 | +headers have a background, the first row in a table is highlighed, etc. | |
49 | + | |
50 | + | |
51 | +Parsing | |
52 | +======= | |
53 | +Another well-known example is (the source of a) programm. As code, it is just text. But the compiler will parse it into | |
54 | +a parse-tree and/or an abstract-syntax-tree; which is build out of classes, methods, statements etc. | |
55 | +|BR| | |
56 | +But also your favorite IDE will *parse* it; to highlight the code, give tooltips, enable you to quickly navigate and | |
57 | +refactor it, and all those conviant features that make it your favorite editor. | |
58 | + | |
59 | +And even you are probably parsing text as part of your daily job. When you un-serialise data, you are (often) parsing | |
60 | +text; when you read the configuration, you are (or should be ) parsing that text. Even a simple input of the user might | |
61 | +need a bit of parsing. The text “42” is not the number :math:`42.0` -- you need to convert it; parse it. | |
62 | + | |
63 | +There a many ways to *parse*. You do not need a full-fledged grammer to translate “42” into :math:`42` or | |
64 | +:math:`42.0` --a stdlib functions as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers | |
65 | +(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)? | |
66 | + | |
67 | +Non-parsing | |
68 | +----------- | |
69 | + | |
70 | +As proper passing used to hard, other similar (but simpler) techniques do exist, like `globing | |
71 | +<https://en.wikipedia.org/wiki/Glob_(programming)>`__ (``\*.Castle`` on the bash-prompt will result in all | |
72 | +Castle-files). Using `regular-expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ is more powerfull, and | |
73 | +often used to highlight code; a pattern as ``//.*$`` can be used to highlight (single-line) comment. It often works, but | |
74 | +this simple pattern might match a piece of text *inside* a multi-line-(doc)string -- which wrong. | |
75 | +|BR| | |
76 | +To parse a input-text its not a sound solution; although I have seen cunning regular-expressions, that almost always | |
77 | +work. But *reg-exps* have not the same power as a grammar-- That is already proven halve a century ago and will not be | |
78 | +repeated here. | |
79 | + | |
80 | +Grammars are more powerfull | |
81 | +=========================== | |
82 | + | |
83 | +A grammar (even a simple one) is more powerfull. You can define the overal structure of the input and the sub-structure | |
84 | +of each *lump*. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor other | |
85 | +way around; it simple is not hunting for it. | |
86 | + | |
87 | +As most programming-languages do not have build-in support for grammars, one has to resort to external tools. Like the | |
88 | +famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC will read a grammar-file, and generates | |
89 | +C-code that can be compiled and linked to your code. | |
90 | + | |
91 | +Back then, writing compiler-compilers was a popular academic research exercise (YACC stand for: Yet Another Compiler | |
92 | +Compiler). It was great for compiler-designers, but clumsy to use for average developers: The syntax to write a grammar | |
93 | +was hard to grasp, with many pitfalls, the interface between your code and the parser was awkward (you had to call | |
94 | +``yyparse()``; needed some globals; OO wasn't invented, no inheritance or data-hiding, which resulted in puzzling tricks | |
95 | +to use multiple parsers, etc). | |
96 | +|BR| | |
97 | +Aside of that, more and better parsing strategies are developed; that is handles in another :ref:`blog <grammmar-code>`. | |
98 | + | |
99 | +Unleash that power! | |
100 | +------------------- | |
101 | + | |
102 | +With those better parsing-algorithms, faster computers with a lot more memory and other inventions, writing grammars | |
103 | +has become more peaceful. Except that you still need an extra step, another sytax, as you still need to use an external | |
104 | +tool. That sometimes isn’t maintained after a couple of years ... | |
105 | +|BR| | |
106 | +The effect is, most developers don’t use grammars; the write parser-like code manually, the settle for less optimal | |
107 | +result. Or are utterly not aware that grammer can provide a other (better, easier) solution. | |
108 | + | |
109 | +Castle has build-in support for grammers, and is hiding the nasty details of parsing-strategies. There is no need to | |
110 | +generating, compiling, and use that code, with external tools. All that clutter is gone. | |
111 | +|BR| | |
112 | +With a few lines, you define the structure of the input. Each rule is like a function: it has a name (the left-hand-side | |
113 | +of the rule, so the part before the arrow), and a implementation; the part after the arrow. That implementation “calls” | |
114 | +other rules, like normal code. | |
115 | + | |
116 | +To use the grammar you simply call one of those rules as a function: pass the input (string) and it will return a | |
117 | +(generic) tree-structure. | |
118 | +|BR| | |
119 | +When you simple like to verify the syntax is correct: use the tree as a boolean: when it not-empty the input is valid. | |
120 | + | |
121 | +But typically you proces/use that tree: like you do in many situations. Read the configuration values, walk over the | |
122 | +tree, of traverse it as-if it is a DOM. You can even use Castle’s :ref:`matching-statements` to simply that. | |
123 | + | |
124 | +Grammars makes reading text easy. Define the structure, call the “main rule” and use the values. Castle makes that simple! |
@@ -1,124 +0,0 @@ | ||
1 | -.. _Castle-CompilerCompiler: | |
2 | - | |
3 | -================= | |
4 | -Compiler Compiler | |
5 | -================= | |
6 | - | |
7 | -.. post:: 2022/05/7 | |
8 | - :category: Castle, Usage | |
9 | - :tags: Castle, grammar | |
10 | - | |
11 | - In Castle you can define a *grammar* directly in your code. The compiler will *translate* them into functions, using | |
12 | - the build-in (PEG) **compiler-compiler** -- at least that was it called back in the days of *YACC*. | |
13 | - | |
14 | - How do one use that? And why should you? | |
15 | - | |
16 | -Grammars, a short intro | |
17 | -======================= | |
18 | - | |
19 | -A grammar is a collection of (parsing)-**rules** and optionally some *settings*. Rules are written in a mixture of EBNF | |
20 | -and PEG meta-syntax. Let’s start with an simple example: | |
21 | - | |
22 | -.. code-block:: PEG | |
23 | - | |
24 | - castle_file <- ( import_line | interface | implementation )* ; | |
25 | - import_line <- IMPORT_stmt ( STRING_literal | qualID ) ';' ; | |
26 | - qualID <- '.'? nameID ('.' nameID )* ; | |
27 | - IMPORT_stmt = "import" ; | |
28 | - ... | |
29 | - | |
30 | - | |
31 | -This basically defines that a ``castle_file`` is either an ``import_line``, an ``interface``, an ``implementation``, or | |
32 | -sequence of them. Where an ``import_line`` starts with the ``IMPORT_stmt`` *(which is set to the string ‘import’, on | |
33 | -line 4)*, then comes either a ``STRING_literal`` (indeed a literal-string) or a ``qualID``, and ends with a semicolon | |
34 | -(`;`). Likewise, a ``qualID`` is a ``nameID`` *(a name that is used as ID, like in any programming language)*, | |
35 | -optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in | |
36 | -...)*. In Castle, that name may start with a dot --which is a shorthand notation for “in the current namespace”. You can | |
37 | -ignore that for know. | |
38 | - | |
39 | -The grammer defines how one should read the input --a text--, or more formally: how to parse it. The result of this | |
40 | -parsing is twofold. It will check whether input conforms to the grammer; resulting a in boolean, for the mathematics | |
41 | -under us. And it will translate a sequential (flat) text into a tree-structure; which typically much more useful for a | |
42 | -software-engineer. | |
43 | -|BR| | |
44 | -A well known example is this HTML-file. On disk it’s nothing but text, which is easy to store and to transfer. But | |
45 | -when send to your brouwer, it’s *parsed* and to create the `DOM | |
46 | -<https://nl.wikipedia.org/wiki/Document_Object_Model>`__; a tree of the document, with sections, paragraphs, | |
47 | -hyper-links, etc. By regarding it as a tree, it easy to describe (e.g. with CSS) how all parts, should be shown: all | |
48 | -headers have a background, the first row in a table is highlighed, etc. | |
49 | - | |
50 | - | |
51 | -Parsing | |
52 | -======= | |
53 | -Another well-known example is (the source of a) programm. As code, it is just text. But the compiler will parse it into | |
54 | -a parse-tree and/or an abstract-syntax-tree; which is build out of classes, methods, statements etc. | |
55 | -|BR| | |
56 | -But also your favorite IDE will *parse* it; to highlight the code, give tooltips, enable you to quickly navigate and | |
57 | -refactor it, and all those conviant features that make it your favorite editor. | |
58 | - | |
59 | -And even you are probably parsing text as part of your daily job. When you un-serialise data, you are (often) parsing | |
60 | -text; when you read the configuration, you are (or should be ) parsing that text. Even a simple input of the user might | |
61 | -need a bit of parsing. The text “42” is not the number :math:`42.0` -- you need to convert it; parse it. | |
62 | - | |
63 | -There a many ways to *parse*. You do not need a full-fledged grammer to translate “42” into :math:`42` or | |
64 | -:math:`42.0` --a stdlib functions as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers | |
65 | -(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)? | |
66 | - | |
67 | -Non-parsing | |
68 | ------------ | |
69 | - | |
70 | -As proper passing used to hard, other similar (but simpler) techniques do exist, like `globing | |
71 | -<https://en.wikipedia.org/wiki/Glob_(programming)>`__ (``\*.Castle`` on the bash-prompt will result in all | |
72 | -Castle-files). Using `regular-expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ is more powerfull, and | |
73 | -often used to highlight code; a pattern as ``//.*$`` can be used to highlight (single-line) comment. It often works, but | |
74 | -this simple pattern might match a piece of text *inside* a multi-line-(doc)string -- which wrong. | |
75 | -|BR| | |
76 | -To parse a input-text its not a sound solution; although I have seen cunning regular-expressions, that almost always | |
77 | -work. But *reg-exps* have not the same power as a grammar-- That is already proven halve a century ago and will not be | |
78 | -repeated here. | |
79 | - | |
80 | -Grammars are more powerfull | |
81 | -=========================== | |
82 | - | |
83 | -A grammar (even a simple one) is more powerfull. You can define the overal structure of the input and the sub-structure | |
84 | -of each *lump*. When a multi-line-string has no sub-structure, the parser will never find comments inside it. Nor other | |
85 | -way around; it simple is not hunting for it. | |
86 | - | |
87 | -As most programming-languages do not have build-in support for grammars, one has to resort to external tools. Like the | |
88 | -famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__; developed in 197X. YACC will read a grammar-file, and generates | |
89 | -C-code that can be compiled and linked to your code. | |
90 | - | |
91 | -Back then, writing compiler-compilers was a popular academic research exercise (YACC stand for: Yet Another Compiler | |
92 | -Compiler). It was great for compiler-designers, but clumsy to use for average developers: The syntax to write a grammar | |
93 | -was hard to grasp, with many pitfalls, the interface between your code and the parser was awkward (you had to call | |
94 | -``yyparse()``; needed some globals; OO wasn't invented, no inheritance or data-hiding, which resulted in puzzling tricks | |
95 | -to use multiple parsers, etc). | |
96 | -|BR| | |
97 | -Aside of that, more and better parsing strategies are developed; that is handles in another :ref:`blog <grammmar-code>`. | |
98 | - | |
99 | -Unleash that power! | |
100 | -------------------- | |
101 | - | |
102 | -With those better parsing-algorithms, faster computers with a lot more memory and other inventions, writing grammars | |
103 | -has become more peaceful. Except that you still need an extra step, another sytax, as you still need to use an external | |
104 | -tool. That sometimes isn’t maintained after a couple of years ... | |
105 | -|BR| | |
106 | -The effect is, most developers don’t use grammars; the write parser-like code manually, the settle for less optimal | |
107 | -result. Or are utterly not aware that grammer can provide a other (better, easier) solution. | |
108 | - | |
109 | -Castle has build-in support for grammers, and is hiding the nasty details of parsing-strategies. There is no need to | |
110 | -generating, compiling, and use that code, with external tools. All that clutter is gone. | |
111 | -|BR| | |
112 | -With a few lines, you define the structure of the input. Each rule is like a function: it has a name (the left-hand-side | |
113 | -of the rule, so the part before the arrow), and a implementation; the part after the arrow. That implementation “calls” | |
114 | -other rules, like normal code. | |
115 | - | |
116 | -To use the grammar you simply call one of those rules as a function: pass the input (string) and it will return a | |
117 | -(generic) tree-structure. | |
118 | -|BR| | |
119 | -When you simple like to verify the syntax is correct: use the tree as a boolean: when it not-empty the input is valid. | |
120 | - | |
121 | -But typically you proces/use that tree: like you do in many situations. Read the configuration values, walk over the | |
122 | -tree, of traverse it as-if it is a DOM. You can even use Castle’s :ref:`matching-statements` to simply that. | |
123 | - | |
124 | -Grammars makes reading text easy. Define the structure, call the “main rule” and use the values. Castle makes that simple! |