
regex -- Oniguruma (Onigmo) regular expressions

Overview

This module provides regular expressions based on Onigmo, a fork of the Oniguruma library. It uses Ruby grammar with several customizable syntax modifications:

  • by default, Dao string patterns are mimicked: % is used instead of \, and whitespace characters are ignored outside of [...] groups
  • one-line comments starting with # can be used
  • implicit spacing mode: outside of [...], a standalone whitespace character or \r\n is interpreted as \s*, and a pair of equal whitespace characters is interpreted as \s+

The grammar description can be found in Onigmo/doc/RE.

Installation

Currently, Onigmo must be built manually from the source provided with the module. To link it statically into the regex module, configure Onigmo with ./configure CFLAGS=-fPIC LFLAGS=-fPIC so that position-independent code is generated (on Windows, the relevant makefiles need to be edited instead). Consult Onigmo/README for details on building the library on various platforms.

Index

namespace re

invar class Regex

  • .pattern(self: Regex) => string
  • .groupCount(self: Regex) => int
  • .ignoresCase(self: Regex) => bool
  • fetch(self: Regex, target: string, group: int|string = 0, start = 0, end = -1) => string
  • search(self: Regex, target: string, start = 0, end = -1) => Match|none
  • matches(self: Regex, target: string) => bool
  • extract(self: Regex, target: string, matchType: enum<both,matched,unmatched> = $matched) => list<string>
  • replace(self: Regex, target: string, format: string, start = 0, end = -1) => string
  • scan(self: Regex, target: string, start = 0, end = -1)[found: Match => none|@V] => list<@V>
  • replace(self: Regex, target: string, start = 0, end = -1)[found: Match => string] => string
  • iter(self: Regex, target: string, start = 0, end = -1) => Iter

invar class Match

  • string(self: Match, group: int|string = 0) => string
  • size(self: Match, group: int|string = 0) => int
  • start(self: Match, group: int|string = 0) => int
  • end(self: Match, group: int|string = 0) => int
  • .groupCount(self: Match) => int

class Iter

  • for(self: Iter, iterator: ForIterator)
  • [](self: Iter, index: ForIterator) => Match

Functions:

  • compile(pattern: string) => Regex
  • compile(pattern: string, options: enum<strictSpacing;impliedSpacing;ignoreCase;allowComments;useBackslash>) => Regex

Classes

invar class Regex

Regular expression using the Onigmo fork of the Oniguruma library, with Ruby grammar, as the backend. See compile() for usage details.

Methods

.pattern(self: Regex) => string

String pattern

Note: The pattern is stored in canonical form, i.e. with strict spacing, '\' as the escape character, and without comments

.groupCount(self: Regex) => int

Number of capture groups in the pattern

.ignoresCase(self: Regex) => bool

Case-insensitivity

fetch(self: Regex, target: string, group: int|string = 0, start = 0, end = -1) => string

Finds the first match in target in the range [start; end] and returns sub-match specified by group.

Note: For the interpretation of group numbers, see Match

Errors: Param in case of invalid group or matching range

search(self: Regex, target: string, start = 0, end = -1) => Match|none

Returns the first match in target in the range [start; end], or none if no match was found

Errors: Param in case of invalid matching range

matches(self: Regex, target: string) => bool

Checks if the entire target is matched by the regex

extract(self: Regex, target: string, matchType: enum<both,matched,unmatched> = $matched) => list<string>

Returns all matches in target (or unmatched, or both, depending on matchType)
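The three matchType values can be pictured with a small Python sketch. This is not the module's implementation: Python's re stands in for Onigmo, the function name extract_pieces is hypothetical, and two behaviors are assumptions not stated above, namely that $both returns the pieces in their order of appearance and that empty unmatched pieces are skipped.

```python
import re

def extract_pieces(pattern, target, match_type="matched"):
    """Collect matched and/or unmatched pieces of target, left to right."""
    pieces = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, target):
        # Text between the previous match and this one is "unmatched".
        if match_type in ("both", "unmatched") and m.start() > pos:
            pieces.append(target[pos:m.start()])
        if match_type in ("both", "matched"):
            pieces.append(m.group(0))
        pos = m.end()
    # Trailing text after the last match is also "unmatched".
    if match_type in ("both", "unmatched") and pos < len(target):
        pieces.append(target[pos:])
    return pieces
```

With the pattern \d+ and the target "ab12cd345", this sketch yields the digit runs for matched, the letter runs for unmatched, and all four pieces in order for both.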

replace(self: Regex, target: string, format: string, start = 0, end = -1) => string

Replaces all matches in target in the range [start; end] with the format string. Returns the entire resulting string. format may contain backreferences in the form '$<group number from 0 to 9>' or '$(<group name>)'; '$$' can be used to escape '$'

Errors: Param in case of invalid matching range, Regex in case of invalid backreference
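As a rough illustration of the format syntax described above, the following Python sketch expands '$N', '$(name)' and '$$' against a match. It is a model of the documented behavior only, not the module's code: Python's re stands in for Onigmo (so named groups use Python's (?P<name>...) spelling), the helper names expand_format and replace_all are hypothetical, and leaving a stray '$' untouched is an assumption.

```python
import re

# '$$', '$<digit>' or '$(<name>)'
_REF = re.compile(r"\$(?:(\$)|(\d)|\(([^)]+)\))")

def expand_format(fmt, match):
    """Expand backreferences in fmt using the groups of match."""
    def sub(ref):
        if ref.group(1):                      # '$$' -> literal '$'
            return "$"
        if ref.group(2):                      # '$N' -> numbered group
            return match.group(int(ref.group(2))) or ""
        return match.group(ref.group(3)) or ""  # '$(name)' -> named group
    return _REF.sub(sub, fmt)

def replace_all(pattern, target, fmt):
    """Replace every match of pattern in target with the expanded fmt."""
    return re.sub(pattern, lambda m: expand_format(fmt, m), target)

result = replace_all(r"(?P<key>\w+)=(\d+)", "a=1, b=2", "$2:$(key) [$0] $$")
# result == "1:a [a=1] $, 2:b [b=2] $"
```

In this sketch an invalid group number or name surfaces as Python's IndexError, which plays the role of the Regex error mentioned above.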

scan(self: Regex, target: string, start = 0, end = -1)[found: Match => none|@V] => list<@V>

Iterates over all matches in target in the range [start; end], yielding each match as found. Returns the list of values obtained from the code section

Errors: Param in case of invalid matching range

replace(self: Regex, target: string, start = 0, end = -1)[found: Match => string] => string

Iterates over all matches in target, yielding each of them as found. Returns the string formed by replacing each match in target by the corresponding string returned from the code section

Errors: Param in case of invalid matching range

iter(self: Regex, target: string, start = 0, end = -1) => Iter

Returns a for iterator over all matches in target in the range [start; end].

Note: Changing target has no effect on the iteration process (the iterator will still be bound to the original string)

Errors: Param in case of invalid matching range


invar class Match

Single regular expression match providing information on the matched sub-string and individual captured groups.

The group parameter in Match methods may be either a group number or a group name.

A group number is interpreted as follows:

  • group == 0 -- entire matched sub-string
  • group > 0 and group <= groupCount() -- corresponding sub-match
  • group < 0 or group > groupCount() -- not permitted

If group is a name, the last group in the pattern with this name is assumed (at least one such group must exist).
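The rules above can be sketched as a small resolver. This is a Python model under stated assumptions, not the module's code: the function resolve_group and its group_names argument (one entry per capture group, None for unnamed groups) are hypothetical, and ValueError stands in for the Param error.

```python
def resolve_group(group, group_names):
    """Map a group number or name to a capture group index.

    group_names[i] holds the name of capture group i + 1, or None.
    """
    group_count = len(group_names)
    if isinstance(group, int):
        # 0 is the entire match; 1..group_count are sub-matches.
        if group < 0 or group > group_count:
            raise ValueError("invalid group number")
        return group
    # For a name, pick the last group in the pattern carrying it.
    for index in range(group_count, 0, -1):
        if group_names[index - 1] == group:
            return index
    raise ValueError("no group with this name")
```

For a pattern whose three groups are named "a", unnamed, and "a" again, resolve_group("a", ["a", None, "a"]) selects group 3.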

Methods

string(self: Match, group: int|string = 0) => string

Sub-string captured by group

Errors: Param in case of invalid group

size(self: Match, group: int|string = 0) => int

Size of the sub-string captured by group

Errors: Param in case of invalid group

start(self: Match, group: int|string = 0) => int

Start position of the sub-string captured by group

Errors: Param in case of invalid group

end(self: Match, group: int|string = 0) => int

End position of the sub-string captured by group

Errors: Param in case of invalid group

.groupCount(self: Match) => int

Number of captured groups


class Iter

A for iterator over regular expression matches in a string

for(self: Iter, iterator: ForIterator)

[](self: Iter, index: ForIterator) => Match

Functions

compile(pattern: string) => Regex
compile(pattern: string, options: enum<strictSpacing;impliedSpacing;ignoreCase;allowComments;useBackslash>) => Regex

Constructs a regular expression from pattern using the specified options (if provided).

Default options mimic Dao string patterns syntax:

  • free spacing -- whitespace is ignored outside of '[...]'
  • '%' is used as control character
  • the pattern is treated as case-sensitive

This behavior can be overridden with the following values of options:

  • $strictSpacing -- whitespace characters in the pattern are treated 'as is' (canonical behavior)
  • $impliedSpacing -- outside of '[ ... ]', a standalone whitespace character or '\r\n' is interpreted as '\s*', and a pair of equal whitespace characters is interpreted as '\s+'
  • $ignoreCase -- the pattern is treated as case-insensitive
  • $allowComments -- all characters starting from '#' up to '\n' (or end of string) are ignored as comments ('#' can be escaped)
  • $useBackslash -- use canonical '\' as control character

Note: The regular expression engine presumes UTF-8-encoded patterns

Errors: Param in case of conflicting spacing options, Regex in case of regular expression grammar error
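To make the $impliedSpacing rule concrete, here is a minimal Python sketch that rewrites a pattern into its canonical equivalent. It models the description only and is not the module's actual translation: the function name imply_spacing is hypothetical, the canonical '\' escape is assumed, nested or unusual '[...]' forms are not handled, and a run of three or more equal whitespace characters is simply processed pair by pair.

```python
import re

def imply_spacing(pattern):
    """Rewrite whitespace outside [...] as \\s* or \\s+."""
    out = []
    in_class = False  # True while inside a [...] character class
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])   # keep escaped characters as is
            i += 2
        elif in_class:
            if ch == "]":
                in_class = False
            out.append(ch)                 # whitespace in [...] is literal
            i += 1
        elif ch == "[":
            in_class = True
            out.append(ch)
            i += 1
        elif pattern.startswith("\r\n", i):
            out.append(r"\s*")             # '\r\n' counts as one standalone unit
            i += 2
        elif ch.isspace():
            if i + 1 < n and pattern[i + 1] == ch:
                out.append(r"\s+")         # pair of equal whitespace characters
                i += 2
            else:
                out.append(r"\s*")         # standalone whitespace character
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

canonical = imply_spacing("key  = value")
# canonical == r"key\s+=\s*value"
```

Here the double space after key requires at least one whitespace character in the target, while the single space after = makes whitespace optional.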