PHP: Documentation Tools

reference/parle/pattern.matching.xml
0be325d396b139feeaee38168f779be962b74f09

...

@@ -1,16 +1,34 @@
 <?xml version="1.0" encoding="utf-8"?>
 <!-- $Revision$ -->
-<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
+<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook">
  <title>Parle pattern matching</title>
  <titleabbrev>Pattern matching</titleabbrev>
  <para>
-  Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: <literal>[:alnum:]</literal>, <literal>[:alpha:]</literal>, <literal>[:blank:]</literal>, <literal>[:cntrl:]</literal>, <literal>[:digit:]</literal>, <literal>[:graph:]</literal>, <literal>[:lower:]</literal>, <literal>[:print:]</literal>, <literal>[:punct:]</literal>, <literal>[:space:]</literal>, <literal>[:upper:]</literal> and <literal>[:xdigit:]</literal>.
+  Parle supports regex matching similar to flex.
+  Also supported are the following POSIX character sets:
+  <simplelist type="inline">
+   <member><literal>[:alnum:]</literal></member>
+   <member><literal>[:alpha:]</literal></member>
+   <member><literal>[:blank:]</literal></member>
+   <member><literal>[:cntrl:]</literal></member>
+   <member><literal>[:digit:]</literal></member>
+   <member><literal>[:graph:]</literal></member>
+   <member><literal>[:lower:]</literal></member>
+   <member><literal>[:print:]</literal></member>
+   <member><literal>[:punct:]</literal></member>
+   <member><literal>[:space:]</literal></member>
+   <member><literal>[:upper:]</literal></member>
+   <member><literal>[:xdigit:]</literal></member>
+  </simplelist>
+.
  </para>
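Python's re module has no POSIX bracket classes. As an illustration only, and not Parle code, the following sketch spells out the ASCII members of each listed class as the POSIX "C" locale defines them. It then rewrites [:name:] inside a bracket expression into explicit ranges. It assumes that Parle's single-byte classes follow those C-locale definitions. POSIX_CLASSES, expand and members are hypothetical helper names.

```python
import re

# ASCII membership of each POSIX class as defined for the "C" locale.
# Assumption: Parle's single-byte classes use these same definitions.
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "cntrl": "\\x00-\\x1f\\x7f",
    "digit": "0-9",
    "graph": "\\x21-\\x7e",
    "lower": "a-z",
    "print": "\\x20-\\x7e",
    "punct": "!-/:-@\\[-`{-~",
    "space": " \\t\\n\\r\\x0b\\x0c",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def expand(pattern):
    """Replace each [:name:] inside a bracket expression with explicit
    ranges that Python's re understands (hypothetical helper)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


def members(name):
    """Return the set of ASCII characters matched by [[:name:]]."""
    rx = re.compile(expand("[[:%s:]]" % name))
    return {chr(c) for c in range(128) if rx.fullmatch(chr(c))}
```

For instance, expand("[[:alpha:]_][[:alnum:]_]*") yields [A-Za-z_][0-9A-Za-z_]*. members("graph") equals the union of members("alnum") and members("punct"), which mirrors the POSIX rule that graph is alnum plus punct.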

  <para>
-  The Unicode character classes are currently not enabled by default, pass --enable-parle-utf32 to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used. The pattern for an UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
+  The Unicode character classes are currently not enabled by default, pass --enable-parle-utf32 to make them available.
+  A particular encoding can be mapped with a correctly constructed regex.
+  For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used.
+  The pattern for an UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
  </para>
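The byte-level reasoning in these two patterns can be checked with Python as a stand-in; Parle itself is not used here. Python's re has no {+} class-union operator, so this sketch assumes {+} means set union as in flex and writes the five ranges as one bracket expression over bytes. The union covers every byte from 0x20 upward except 0xc0 and 0xc1. It therefore rejects control bytes (including tab and newline) and those two never-valid UTF-8 bytes, but it does not check that lead and continuation bytes appear in a valid order. EURO, UTF8_RUN and is_utf8_run are hypothetical names.

```python
import re

# The EURO SIGN (U+20AC) is the three bytes E2 82 AC in UTF-8,
# which is what the Parle pattern [\xe2][\x82][\xac] spells out.
EURO = re.compile(rb"[\xe2][\x82][\xac]")

# Union of [ -\x7f], [\x80-\xbf], [\xc2-\xdf], [\xe0-\xef] and [\xf0-\xff],
# written as a single bracket expression because Python's re lacks {+}.
UTF8_RUN = re.compile(rb"[ -\x7f\x80-\xbf\xc2-\xdf\xe0-\xef\xf0-\xff]+")


def is_utf8_run(data: bytes) -> bool:
    """True if every byte lies in the union above (byte order is not checked)."""
    return UTF8_RUN.fullmatch(data) is not None
```

is_utf8_run(b"\x82\xe2") also returns True. This shows that the class constrains individual bytes, not well-formed sequences.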

- <section xml:id="parle.regex.chars">
+ <section xml:id="parle.regex.chars" annotations="chunk:false">
   <title>Character representations</title>
   <para>
    <table>

...

@@ -60,7 +78,7 @@
    </table>
   </para>
  </section>
- <section xml:id="parle.regex.charclass">
+ <section xml:id="parle.regex.charclass" annotations="chunk:false">
   <title>Character classes</title>
   <para>
    <table>

...

@@ -104,7 +122,7 @@
    </table>
   </para>
  </section>
- <section xml:id="parle.regex.unicodecharclass">
+ <section xml:id="parle.regex.unicodecharclass" annotations="chunk:false">
   <title>Unicode character classes</title>
   <para>
    <table>

...

@@ -232,10 +250,10 @@
    </table>
   </para>
   <para>
-   These character clasess are only available, if the option --enable-parle-utf32 was passed at the compilation time.
+   These character classes are only available, if the option --enable-parle-utf32 was passed at the compilation time.
   </para>
  </section>
- <section xml:id="parle.regex.alternation">
+ <section xml:id="parle.regex.alternation" annotations="chunk:false">
   <title>Alternation and repetition</title>
   <para>
    <table>

...

@@ -248,7 +266,7 @@
      </thead>
      <tbody>
       <row>
-       <entry>...|...</entry><entry>-</entry><entry>Try subpatterns in alternation.</entry>
+       <entry>...|...</entry><entry>-</entry><entry>Try sub-patterns in alternation.</entry>
       </row>
       <row>
        <entry>*</entry><entry>yes</entry><entry>Match 0 or more times.</entry>

...

@@ -291,7 +309,7 @@
    </table>
   </para>
  </section>
- <section xml:id="parle.regex.anchors">
+ <section xml:id="parle.regex.anchors" annotations="chunk:false">
   <title>Anchors</title>
   <para>
    <table>

...

@@ -314,7 +332,7 @@
    </table>
   </para>
  </section>
- <section xml:id="parle.regex.grouping">
+ <section xml:id="parle.regex.grouping" annotations="chunk:false">
   <title>Grouping</title>
   <para>
    <table>

...

@@ -322,49 +340,47 @@
     <tgroup cols="2">
      <thead>
       <row>
-       <entry>Sequence</entry><entry>Description</entry>
+       <entry>Sequence</entry>
+       <entry>Description</entry>
       </row>
      </thead>
      <tbody>
       <row>
-       <entry>(...)</entry><entry>Group a regular expression to override default operator precedence.</entry>
+       <entry>(...)</entry>
+       <entry>Group a regular expression to override default operator precedence.</entry>
       </row>
       <row>
        <entry valign="top">(?r-s:pattern)</entry>
        <entry>
-        Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x.
-        <table>
-         <title>Options</title>
-          <tgroup cols="2">
-           <thead>
-            <row>
-             <entry>Option</entry><entry>Description</entry>
-            </row>
-           </thead>
-          <tbody>
-            <row>
-             <entry>i</entry><entry>Case insensitive.</entry>
-            </row>
-            <row>
-             <entry>-i</entry><entry>Case sensitive.</entry>
-            </row>
-            <row>
-             <entry>s</entry><entry>Alters the meaning of '.' to match any character whatsoever.</entry>
-            </row>
-            <row>
-             <entry>-s</entry><entry>Alters the meaning of '.' to match any character except '\n'.</entry>
-            </row>
-            <row>
-             <entry>x</entry><entry>Ignores comments and whitespace in patterns. Whitespace is ignored unless it is backslash-escaped, contained within ""s, or appears inside a character range.</entry>
-            </row>
-          </tbody>
-         </tgroup>
-        </table>
-        These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
+        <simpara>
+         Apply option r and omit option s while interpreting pattern.
+         Options may be zero or more of the characters i, s, or x.
+        </simpara>
+        <simpara>
+         <literal>i</literal> means case-insensitive.
+        </simpara>
+        <simpara>
+         <literal>-i</literal> means case-sensitive.
+        </simpara>
+        <simpara>
+         <literal>s</literal> alters the meaning of <literal>.</literal> to match any character whatsoever.
+        </simpara>
+        <simpara>
+         <literal>-s</literal> alters the meaning of <literal>.</literal> to match any character except <literal>\n</literal>.
+        </simpara>
+        <simpara>
+         <literal>x</literal> ignores comments and whitespace in patterns.
+         Whitespace is ignored unless it is backslash-escaped, contained within <literal>""s</literal>,
+         or appears inside a character range.
+        </simpara>
+        <simpara>
+         These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
+        </simpara>
        </entry>
       </row>
       <row>
-       <entry>(?# comment )</entry><entry>Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.</entry>
+       <entry>(?# comment )</entry>
+       <entry>Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.</entry>
       </row>
      </tbody>
     </tgroup>
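Python's re module accepts the same scoped-option form (?r-s:pattern) for the i and s letters, and the same (?#...) comment. Python also ends a comment at the first closing parenthesis. That makes it a rough analogue for what these grouping constructs do; it is not Parle's API. The global flags re.IGNORECASE and re.DOTALL stand in, by analogy only, for the lexer-level bit flags mentioned above. Parle's flag names are not shown here. The pattern names are hypothetical.

```python
import re

# (?i-s:...) turns on case-insensitive matching and keeps '.' from matching
# '\n', but only inside the group; outside it the global flags apply.
SCOPED = re.compile(r"(?i-s:key.)=x")

# (?s:...) lets '.' match any character whatsoever, including '\n'.
DOTALL_GROUP = re.compile(r"a(?s:.)b")

# (?#...) is skipped entirely; the first ')' ends the comment.
COMMENTED = re.compile(r"ab(?# this text is skipped )c")

# Global counterpart: flags given to the compiler apply to the whole pattern.
GLOBAL = re.compile(r"key.", re.IGNORECASE | re.DOTALL)

if __name__ == "__main__":
    print(bool(SCOPED.fullmatch("KEY_=x")))      # True
    print(bool(SCOPED.fullmatch("KEY\n=x")))     # False: '.' excludes '\n'
    print(bool(SCOPED.fullmatch("KEY_=X")))      # False: 'x' is outside the group
    print(bool(DOTALL_GROUP.fullmatch("a\nb")))  # True
    print(bool(COMMENTED.fullmatch("abc")))      # True
    print(bool(GLOBAL.fullmatch("KEY\n")))       # True
```

Python's x flag is left out on purpose. Its whitespace rules differ from the ones described above; for example, it has no ""-quoting.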


Generated: 05 May 2024 19:17:58
