reference/parle/pattern.matching.xml
0be325d396b139feeaee38168f779be962b74f09
...
...
@@ -1,16 +1,34 @@
1
1
<?xml version="1.0" encoding="utf-8"?>
2
-
<!-- $Revision$ -->
3
-

4
-
<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
2
+
<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook">
5
3
<title>Parle pattern matching</title>
6
4
<titleabbrev>Pattern matching</titleabbrev>
7
5
<para>
8
-
Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: <literal>[:alnum:]</literal>, <literal>[:alpha:]</literal>, <literal>[:blank:]</literal>, <literal>[:cntrl:]</literal>, <literal>[:digit:]</literal>, <literal>[:graph:]</literal>, <literal>[:lower:]</literal>, <literal>[:print:]</literal>, <literal>[:punct:]</literal>, <literal>[:space:]</literal>, <literal>[:upper:]</literal> and <literal>[:xdigit:]</literal>.
6
+
Parle supports regex matching similar to flex.
7
+
Also supported are the following POSIX character sets:
8
+
<simplelist type="inline">
9
+
<member><literal>[:alnum:]</literal></member>
10
+
<member><literal>[:alpha:]</literal></member>
11
+
<member><literal>[:blank:]</literal></member>
12
+
<member><literal>[:cntrl:]</literal></member>
13
+
<member><literal>[:digit:]</literal></member>
14
+
<member><literal>[:graph:]</literal></member>
15
+
<member><literal>[:lower:]</literal></member>
16
+
<member><literal>[:print:]</literal></member>
17
+
<member><literal>[:punct:]</literal></member>
18
+
<member><literal>[:space:]</literal></member>
19
+
<member><literal>[:upper:]</literal></member>
20
+
<member><literal>[:xdigit:]</literal></member>
21
+
</simplelist>
22
+
.
9
23
</para>
10
24
<para>
11
-
The Unicode character classes are currently not enabled by default, pass --enable-parle-utf32 to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used. The pattern for an UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
25
+
The Unicode character classes are currently not enabled by default; pass --enable-parle-utf32 to make them available.
26
+
A particular encoding can be mapped with a correctly constructed regex.
27
+
For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used.
28
+
The pattern for a UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
12
29
</para>
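The byte-wise EURO example above can be illustrated outside of Parle. The following is a minimal sketch that uses Python's standard re module on bytes, not the Parle lexer itself; the helper name find_euro_offsets and the sample string are hypothetical and chosen only for illustration. It shows that the three-byte pattern from the text locates the UTF-8 encoded symbol in a byte string.

```python
import re

# Byte-level pattern from the text: the three UTF-8 bytes of U+20AC (EURO SIGN).
# This uses Python's re on bytes as a stand-in; it is not the Parle API.
EURO_UTF8 = re.compile(rb"[\xe2][\x82][\xac]")


def find_euro_offsets(data):
    """Return the byte offsets at which a UTF-8 encoded euro sign starts."""
    return [m.start() for m in EURO_UTF8.finditer(data)]


# Hypothetical sample input, encoded to UTF-8 bytes before matching.
sample = "price: 5€ or 7€".encode("utf-8")
offsets = find_euro_offsets(sample)
print(offsets)
```

Because the pattern works on raw bytes, the reported offsets are byte positions rather than character positions.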
13
-
<section xml:id="parle.regex.chars">
30
+

31
+
<section xml:id="parle.regex.chars" annotations="chunk:false">
14
32
<title>Character representations</title>
15
33
<para>
16
34
<table>
...
...
@@ -60,7 +78,7 @@
60
78
</table>
61
79
</para>
62
80
</section>
63
-
<section xml:id="parle.regex.charclass">
81
+
<section xml:id="parle.regex.charclass" annotations="chunk:false">
64
82
<title>Character classes</title>
65
83
<para>
66
84
<table>
...
...
@@ -104,7 +122,7 @@
104
122
</table>
105
123
</para>
106
124
</section>
107
-
<section xml:id="parle.regex.unicodecharclass">
125
+
<section xml:id="parle.regex.unicodecharclass" annotations="chunk:false">
108
126
<title>Unicode character classes</title>
109
127
<para>
110
128
<table>
...
...
@@ -232,10 +250,10 @@
232
250
</table>
233
251
</para>
234
252
<para>
235
-
These character clasess are only available, if the option --enable-parle-utf32 was passed at the compilation time.
253
+
These character classes are only available if the option --enable-parle-utf32 was passed at compile time.
236
254
</para>
237
255
</section>
238
-
<section xml:id="parle.regex.alternation">
256
+
<section xml:id="parle.regex.alternation" annotations="chunk:false">
239
257
<title>Alternation and repetition</title>
240
258
<para>
241
259
<table>
...
...
@@ -248,7 +266,7 @@
248
266
</thead>
249
267
<tbody>
250
268
<row>
251
-
<entry>...|...</entry><entry>-</entry><entry>Try subpatterns in alternation.</entry>
269
+
<entry>...|...</entry><entry>-</entry><entry>Try sub-patterns in alternation.</entry>
252
270
</row>
253
271
<row>
254
272
<entry>*</entry><entry>yes</entry><entry>Match 0 or more times.</entry>
...
...
@@ -291,7 +309,7 @@
291
309
</table>
292
310
</para>
293
311
</section>
294
-
<section xml:id="parle.regex.anchors">
312
+
<section xml:id="parle.regex.anchors" annotations="chunk:false">
295
313
<title>Anchors</title>
296
314
<para>
297
315
<table>
...
...
@@ -314,7 +332,7 @@
314
332
</table>
315
333
</para>
316
334
</section>
317
-
<section xml:id="parle.regex.grouping">
335
+
<section xml:id="parle.regex.grouping" annotations="chunk:false">
318
336
<title>Grouping</title>
319
337
<para>
320
338
<table>
...
...
@@ -322,49 +340,47 @@
322
340
<tgroup cols="2">
323
341
<thead>
324
342
<row>
325
-
<entry>Sequence</entry><entry>Description</entry>
343
+
<entry>Sequence</entry>
344
+
<entry>Description</entry>
326
345
</row>
327
346
</thead>
328
347
<tbody>
329
348
<row>
330
-
<entry>(...)</entry><entry>Group a regular expression to override default operator precedence.</entry>
349
+
<entry>(...)</entry>
350
+
<entry>Group a regular expression to override default operator precedence.</entry>
331
351
</row>
332
352
<row>
333
353
<entry valign="top">(?r-s:pattern)</entry>
334
354
<entry>
335
-
Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x.
336
-
<table>
337
-
<title>Options</title>
338
-
<tgroup cols="2">
339
-
<thead>
340
-
<row>
341
-
<entry>Option</entry><entry>Description</entry>
342
-
</row>
343
-
</thead>
344
-
<tbody>
345
-
<row>
346
-
<entry>i</entry><entry>Case insensitive.</entry>
347
-
</row>
348
-
<row>
349
-
<entry>-i</entry><entry>Case sensitive.</entry>
350
-
</row>
351
-
<row>
352
-
<entry>s</entry><entry>Alters the meaning of '.' to match any character whatsoever.</entry>
353
-
</row>
354
-
<row>
355
-
<entry>-s</entry><entry>Alters the meaning of '.' to match any character except '\n'.</entry>
356
-
</row>
357
-
<row>
358
-
<entry>x</entry><entry>Ignores comments and whitespace in patterns. Whitespace is ignored unless it is backslash-escaped, contained within ""s, or appears inside a character range.</entry>
359
-
</row>
360
-
</tbody>
361
-
</tgroup>
362
-
</table>
363
-
These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
355
+
<simpara>
356
+
Apply option r and omit option s while interpreting pattern.
357
+
Options may be zero or more of the characters i, s, or x.
358
+
</simpara>
359
+
<simpara>
360
+
<literal>i</literal> means case-insensitive.
361
+
</simpara>
362
+
<simpara>
363
+
<literal>-i</literal> means case-sensitive.
364
+
</simpara>
365
+
<simpara>
366
+
<literal>s</literal> alters the meaning of <literal>.</literal> to match any character whatsoever.
367
+
</simpara>
368
+
<simpara>
369
+
<literal>-s</literal> alters the meaning of <literal>.</literal> to match any character except <literal>\n</literal>.
370
+
</simpara>
371
+
<simpara>
372
+
<literal>x</literal> ignores comments and whitespace in patterns.
373
+
Whitespace is ignored unless it is backslash-escaped, contained within <literal>""</literal>s,
374
+
or appears inside a character range.
375
+
</simpara>
376
+
<simpara>
377
+
These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
378
+
</simpara>
364
379
</entry>
365
380
</row>
366
381
<row>
367
-
<entry>(?# comment )</entry><entry>Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.</entry>
382
+
<entry>(?# comment )</entry>
383
+
<entry>Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.</entry>
368
384
</row>
369
385
</tbody>
370
386
</tgroup>
371
387