This module provides regular expressions based on the Onigmo fork of the Oniguruma library. It uses Ruby grammar with several customizable syntax modifications:
- by default, Dao string patterns are mimicked: '%' is used instead of '\', whitespace characters are ignored outside of '[...]' groups
- one-line comments starting with '#' can be used
- implicit spacing mode: outside of '[...]', a standalone whitespace character or '\r\n' are interpreted as '\s*', and a pair of equal whitespace characters is interpreted as '\s+'
Grammar description can be found in Onigmo/doc/RE.
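The implicit spacing rule can be illustrated with a short Python sketch. It is not the module's implementation: expand_implied_spacing is a hypothetical helper, it assumes '\' as the escape character (as in canonical form), treats '[...]' groups as non-nested, and copies escaped characters verbatim. Python's re module stands in for Onigmo only to show the effect of the rewritten pattern.

```python
import re


def expand_implied_spacing(pattern: str) -> str:
    """Rewrite whitespace outside [...] following the implicit spacing rule.

    Assumptions (not taken from the module): '\\' is the escape character,
    bracket groups are not nested, escaped characters are copied verbatim.
    """
    out = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])      # escaped character: keep as is
            i += 2
        elif in_class:
            in_class = ch != "]"              # ']' closes the bracket group
            out.append(ch)
            i += 1
        elif ch == "[":
            in_class = True
            out.append(ch)
            i += 1
        elif pattern.startswith("\r\n", i):
            out.append(r"\s*")                # '\r\n' counts as one standalone unit
            i += 2
        elif ch.isspace():
            if i + 1 < n and pattern[i + 1] == ch:
                out.append(r"\s+")            # pair of equal whitespace characters
                i += 2
            else:
                out.append(r"\s*")            # standalone whitespace character
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


loose = expand_implied_spacing("key = value")     # single spaces: optional whitespace
strict = expand_implied_spacing("let  x")         # double space: mandatory whitespace
print(loose, bool(re.fullmatch(loose, "key=value")))
print(strict, bool(re.fullmatch(strict, "letx")))
```

Under these assumptions a single space in the pattern tolerates missing whitespace in the target, while a doubled space requires at least one whitespace character.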
Currently, Onigmo should be built manually from the source provided with the module. In order to link it statically to the regex module, Onigmo should be
configured with ./configure CFLAGS=-fPIC LFLAGS=-fPIC to enable position-independent code (on Windows, the relevant makefiles need to be edited). Consult
Onigmo/README for details on how to build the library on various platforms.
namespace re
invar class Regex
- .pattern(self: Regex) => string
- .groupCount(self: Regex) => int
- .ignoresCase(self: Regex) => bool
- fetch(self: Regex, target: string, group: int|string = 0, start = 0, end = -1) => string
- search(self: Regex, target: string, start = 0, end = -1) => Match|none
- matches(self: Regex, target: string) => bool
- extract(self: Regex, target: string, matchType: enum<both,matched,unmatched> = $matched) => list<string>
- replace(self: Regex, target: string, format: string, start = 0, end = -1) => string
- scan(self: Regex, target: string, start = 0, end = -1)[found: Match => none|@V] => list<@V>
- replace(self: Regex, target: string, start = 0, end = -1)[found: Match => string] => string
- iter(self: Regex, target: string, start = 0, end = -1) => Iter
invar class Match
- string(self: Match, group: int|string = 0) => string
- size(self: Match, group: int|string = 0) => int
- start(self: Match, group: int|string = 0) => int
- end(self: Match, group: int|string = 0) => int
- .groupCount(self: Match) => int
class Iter
Functions:
- compile(pattern: string) => Regex
- compile(pattern: string, options: enum<strictSpacing;impliedSpacing;ignoreCase;allowComments;useBackslash>) => Regex
Regular expression using the Onigmo fork of the Oniguruma library with Ruby grammar as backend. See [compile()](#compile) for usage details.
.pattern(self: Regex) => string
String pattern
Note: The pattern is stored in canonical form, i.e. with strict spacing, '\' as escape character and without comments
.groupCount(self: Regex) => int
Number of capture groups in the pattern
.ignoresCase(self: Regex) => bool
fetch(self: Regex, target: string, group: int|string = 0, start = 0, end = -1) => string
Finds the first match in target in the range [start; end] and returns the sub-match specified by group.
Note: For the interpretation of group numbers, see Match
Errors: Param in case of invalid group or matching range
search(self: Regex, target: string, start = 0, end = -1) => Match|none
Returns the first match in target in the range [start; end], or none if no match was found
Errors: Param in case of invalid matching range
matches(self: Regex, target: string) => bool
Checks if the entire target is matched by the regex
extract(self: Regex, target: string, matchType: enum<both,matched,unmatched> = $matched) => list<string>
Returns all matched sub-strings of target (or the unmatched parts, or both, depending on matchType)
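A rough Python emulation of the three matchType modes is sketched below. It is an illustration only, not the module's code: the function name mirrors extract() for readability, Python's re replaces Onigmo, and skipping empty unmatched pieces is an assumption, since the description does not say how they are handled.

```python
import re


def extract(regex, target, match_type="matched"):
    """Collect matched and/or unmatched pieces of target, in order of appearance."""
    if match_type not in ("both", "matched", "unmatched"):
        raise ValueError("unknown match type: %r" % (match_type,))
    keep_matched = match_type != "unmatched"
    keep_unmatched = match_type != "matched"
    parts = []
    pos = 0
    for match in regex.finditer(target):
        gap = target[pos:match.start()]       # text between the previous match and this one
        if keep_unmatched and gap:            # assumption: empty gaps are dropped
            parts.append(gap)
        if keep_matched:
            parts.append(match.group(0))
        pos = match.end()
    tail = target[pos:]                       # text after the last match
    if keep_unmatched and tail:
        parts.append(tail)
    return parts


digits = re.compile(r"\d+")
print(extract(digits, "a1b22c"))               # ['1', '22']
print(extract(digits, "a1b22c", "unmatched"))  # ['a', 'b', 'c']
print(extract(digits, "a1b22c", "both"))       # ['a', '1', 'b', '22', 'c']
```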
replace(self: Regex, target: string, format: string, start = 0, end = -1) => string
Replaces all matches in target in the range [start; end] with the format string. Returns the entire resulting string. format may contain backreferences in the form '$<group number from 0 to 9>' or '$(<group name>)'; '$$' can be used to escape '$'
Errors: Param in case of invalid matching range, Regex in case of invalid backreference
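The backreference syntax of the format string can be sketched in Python as follows. This is a hedged illustration, not the module's implementation: expand_format and replace_all are hypothetical names, ValueError stands in for the Regex error, and Python's re is used in place of Onigmo (so named groups are written as '(?P<name>...)' here rather than in Ruby syntax).

```python
import re


def _group_text(match, group):
    """Text of a group given by number or name; unset groups give ''."""
    try:
        text = match.group(group)
    except IndexError:
        raise ValueError("invalid backreference: %r" % (group,)) from None
    return text or ""


def expand_format(fmt, match):
    """Expand '$N', '$(name)' and '$$' in fmt for a single match."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "$":
            out.append(ch)
            i += 1
            continue
        nxt = fmt[i + 1:i + 2]                 # character after '$' ('' at end of string)
        if nxt == "$":
            out.append("$")                    # '$$' is an escaped '$'
            i += 2
        elif nxt != "" and nxt in "0123456789":
            out.append(_group_text(match, int(nxt)))   # '$N' with N from 0 to 9
            i += 2
        elif nxt == "(":
            close = fmt.find(")", i + 2)
            if close < 0:
                raise ValueError("unterminated '$(' at offset %d" % i)
            out.append(_group_text(match, fmt[i + 2:close]))   # '$(name)'
            i = close + 1
        else:
            raise ValueError("invalid backreference at offset %d" % i)
    return "".join(out)


def replace_all(regex, target, fmt):
    """Replace every match in target with the expanded format string."""
    return regex.sub(lambda m: expand_format(fmt, m), target)


pair = re.compile(r"(?P<key>\w+)=(\d+)")
print(replace_all(pair, "a=1, b=22", "$2:$(key)"))   # 1:a, 22:b
print(replace_all(pair, "a=1", "$$$0"))              # $a=1
```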
scan(self: Regex, target: string, start = 0, end = -1)[found: Match => none|@V] => list<@V>
Iterates over all matches in target in the range [start; end], yielding each match as found. Returns the list of values obtained from the code section
Errors: Param in case of invalid matching range
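The scanning behaviour can be approximated in Python, with a callback playing the role of the code section. This is a sketch under stated assumptions rather than the module's logic: the function name mirrors scan() for readability, none results are assumed to be left out of the returned list, [start; end] is assumed to be inclusive, and a negative end is assumed to count from the end of the string (so that the default -1 means the last character).

```python
import re


def scan(regex, target, found, start=0, end=-1):
    """Call found(match) for each match in target[start..end], collecting non-None results."""
    # Convert the inclusive, possibly negative end index into Python's exclusive endpos.
    stop = end + 1 if end >= 0 else len(target) + end + 1
    values = []
    for match in regex.finditer(target, start, stop):
        value = found(match)
        if value is not None:                 # assumption: none results are skipped
            values.append(value)
    return values


digits = re.compile(r"\d+")
# Keep only multi-digit numbers, converted to int.
print(scan(digits, "a1 b22 c333",
           lambda m: int(m.group()) if len(m.group()) > 1 else None))   # [22, 333]
# Restrict the scan to the characters at positions 0..4 ("a1 b2").
print(scan(digits, "a1 b22 c333", lambda m: m.group(), 0, 4))           # ['1', '2']
```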
replace(self: Regex, target: string, start = 0, end = -1)[found: Match => string] => string
Iterates over all matches in target, yielding each of them as found. Returns the string formed by replacing each match in target by the corresponding string returned from the code section
Errors: Param in case of invalid matching range
iter(self: Regex, target: string, start = 0, end = -1) => Iter
Returns a 'for' iterator over all matches in target in the range [start; end].
Note: Changing target has no effect on the iteration process (the iterator will still be bound to the original string)
Errors: Param in case of invalid matching range
Single regular expression match providing information on matched sub-string and individual captured groups.
The group parameter in Match methods may be either a group number or a group name.
Group number is interpreted the following way:
- group == 0 -- entire matched sub-string
- group > 0 and group <= groupCount() -- corresponding sub-match
- group < 0 or group > groupCount() -- not permitted
If group is a name, the last group in the pattern with this name is assumed (at least one such group must exist).
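The group resolution rules can be restated as a small Python sketch. resolve_group and the group_names list are hypothetical and exist only for illustration; ValueError stands in for the Param error.

```python
def resolve_group(group, group_names):
    """Map a group number or name to a group index.

    group_names[i] holds the name of capture group i + 1, or None if that
    group is unnamed (hypothetical representation of the pattern's groups).
    """
    count = len(group_names)
    if isinstance(group, int):
        if 0 <= group <= count:               # 0 is the entire matched sub-string
            return group
        raise ValueError("group number out of range: %d" % group)
    # A name refers to the last group in the pattern carrying it.
    for index in range(count, 0, -1):
        if group_names[index - 1] == group:
            return index
    raise ValueError("no group named %r" % (group,))


names = ["year", None, "year"]                # three groups, two of them share a name
print(resolve_group(0, names))                # 0
print(resolve_group(2, names))                # 2
print(resolve_group("year", names))           # 3
```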
string(self: Match, group: int|string = 0) => string
Sub-string captured by group
Errors: Param in case of invalid group
size(self: Match, group: int|string = 0) => int
Size of the sub-string captured by group
Errors: Param in case of invalid group
start(self: Match, group: int|string = 0) => int
Start position of the sub-string captured by group
Errors: Param in case of invalid group
end(self: Match, group: int|string = 0) => int
End position of the sub-string captured by group
Errors: Param in case of invalid group
.groupCount(self: Match) => int
Number of captured groups
A 'for' iterator over regular expression matches in a string
for(self: Iter, iterator: ForIterator)
[](self: Iter, index: ForIterator) => Match
compile(pattern: string) => Regex
compile(pattern: string, options: enum<strictSpacing;impliedSpacing;ignoreCase;allowComments;useBackslash>) => Regex
Constructs a regular expression from pattern using the specified options (if provided).
Default options mimic Dao string patterns syntax:
- free spacing -- whitespace is ignored outside of '[...]'
- '%' is used as control character
- the pattern is treated as case-sensitive
This behavior can be overridden with the following values of options:
- $strictSpacing -- whitespace characters in the pattern are treated 'as is' (canonical behavior)
- $impliedSpacing -- outside of '[ ... ]', a standalone whitespace character or '\r\n' are interpreted as '\s*', and a pair of equal whitespace characters is interpreted as '\s+'
- $ignoreCase -- the pattern is treated as case-insensitive
- $allowComments -- all characters starting from '#' up to '\n' (or end of string) are ignored as comments ('#' can be escaped)
- $useBackslash -- use canonical '\' as control character
Note: Regular expression engine presumes UTF-8-encoded patterns
Errors: Param in case of conflicting spacing options, Regex in case of regular expression grammar error