myre is a composable regular expression library that transforms regex patterns into first-class objects supporting algebraic operations. It provides a human-friendly way to combine, filter, and transform pattern matching in Python, maintaining full compatibility with the standard re module interface.
- Algebraic Operations: Combine patterns using
|(OR),&(AND),+(SEQUENCE),/(SPLIT),^(XOR/EXCLUDE),@(MASK) - Type-Safe: Protocol-based design with full type hints
- re-Compatible: Drop-in replacement for standard
remodule patterns - Composable: Patterns are immutable objects that can be combined like mathematical sets
- Zero Dependencies: Only requires
typing-extensionsfor modern type hints
pip install myreimport myre
# OR operation: match any pattern (default)
pattern = myre.compile(r"hello", r"world", r"foo")
for match in pattern.finditer("hello world foo"):
print(match.group()) # Prints: hello, world, foo
# AND operation: all patterns must match
from myre import Mode
pattern = myre.compile(r"hello", r"world", mode=Mode.ALL)
for match in pattern.finditer("hello world"):
print(match.group()) # Prints: hello world
# SEQUENCE operation: match patterns in order
pattern = myre.compile(r"hello") + myre.compile(r"world")
for match in pattern.finditer("hello world"):
print(match.group()) # Prints: hello world
# SPLIT operation: split by delimiter, match pattern in each segment
pattern = myre.compile(r"\d{2}") / r"[,-]"
for match in pattern.finditer("12,34-56"):
print(match.group()) # Prints: 12, 34, 56 (each with correct position)
# XOR operation: match pattern but exclude another
pattern = myre.compile(r"h\wllo") ^ r"ell"
for match in pattern.finditer("Hello Hallo Hillo"):
print(match.group()) # Prints: Hallo, Hillo (excludes "Hello")
# MASK operation: ignore specific patterns by masking with placeholders
pattern = myre.compile(r"test_{3}value") @ (r"\d+", "_")
for match in pattern.finditer("test123value"):
print(match.group()) # Prints: test123valueMatches any of the given patterns, avoiding overlapping matches:
from myre import Mode
pattern = myre.compile(r"abc", r"bcd", r"cde", mode=Mode.ANY)
# or simply: pattern = myre.compile(r"abc", r"bcd", r"cde")
for match in pattern.finditer("abc bcd cde"):
print(match.group()) # Non-overlapping matchesRequires all patterns to match somewhere in the search range. Returns a combined match:
pattern = myre.compile(r"hello", r"world", mode=Mode.ALL)
for match in pattern.finditer("hello world"):
print(match.group()) # "hello world"Note: MatchALL collects the first match from each pattern and combines them into a single match result.
| Operator | Name | Description | Example |
|---|---|---|---|
| |
OR | Match any pattern | p1 | p2 → MatchAny(p1, p2) |
& |
AND | Match all patterns | p1 & p2 → MatchALL(p1, p2) |
+ |
SEQUENCE | Match patterns in order | p1 + p2 → MatchSeq(p1, p2) |
/ |
SPLIT | Split by delimiter, match in segments | p1 / p2 → SplitMatch(p1, p2) |
^ |
XOR | Exclude matches containing deny pattern | p1 ^ p2 → Match p1, exclude if text contains p2 |
@ |
MASK | Mask patterns with placeholders | p1 @ (mask, placeholder) → Replace mask with placeholder, match p1 |
^ (XOR): Currently operates at the string level - if the entire search string contains the deny pattern, all matches are rejected. This is different from traditional set XOR.
@ (MASK): Replaces matched text with placeholder characters (preserving length), then matches the pattern. Returns the original text (not masked text). Choose a placeholder that works with your pattern:
- Use
'x'if pattern usesx+orx{N}to match placeholders - Use
'_'if pattern uses_{3}etc. - Default is
'.'(dot character)
All pattern objects implement the PatternLike protocol:
class PatternLike(Protocol):
def search(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> Optional[MatchLike]: ...
def findall(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> List[str]: ...
def finditer(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> Iterator[MatchLike]: ...TODO: Standard library methods not yet implemented:
match()- Match at the beginning of the stringfullmatch()- Match the entire stringsplit()- Split string by pattern occurrencessub()/subn()- Replace pattern matchesflags- Access pattern flagspattern- Access pattern stringgroupindex- Access group name mapping
All match objects implement the MatchLike protocol:
class MatchLike(Protocol):
re: PatternLike
string: str
def start(self, group: int | str = 0) -> int: ...
def end(self, group: int | str = 0) -> int: ...
def span(self, group: int | str = 0) -> Tuple[int, int]: ...
def group(self, group: int | str = 0) -> str: ...
def groups(self) -> tuple[str, ...]: ...TODO: Standard library methods not yet implemented:
groupdict()- Return dict of named groupsexpand()- Expand template using groupslastindex- Last matched group indexlastgroup- Last matched group namepos/endpos- Search boundaries
@dataclass
class Base:
p_trim: PatternLike # Pre-process filter
p_deny: PatternLike # Post-process filter
def finditer(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> Iterator[MatchLike]
def findall(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> List[str]
def search(self, string: str, pos: int = 0, endpos: int = sys.maxsize) -> Optional[MatchLike]@dataclass
class MatchAny(Base):
patterns: tuple[PatternLike, ...]
@classmethod
def compile(cls, *patterns: Any, flag: int = 0) -> Self@dataclass
class MatchALL(MatchAny):
patterns: tuple[PatternLike, ...]
@classmethod
def compile(cls, *patterns: Any, flag: int = 0) -> Selfimport re
import myre
# Mix regex strings and compiled patterns
regex1 = re.compile(r"hello")
pattern = myre.compile(regex1, r"world", flag=re.IGNORECASE)# Complex pattern: (hello OR world) BUT NOT (foo OR bar)
base = myre.compile(r"hello", r"world")
exclude = myre.compile(r"foo", r"bar")
pattern = base ^ exclude# Mask punctuation before matching
pattern = myre.compile(r"\bhello\b") @ (r"[.,!?;:]", " ")
for match in pattern.finditer("Hello, world! Hello."):
print(match.group()) # Prints both "Hello," and "Hello."The library is built on three architectural layers:
- Protocol Layer (
protocol.py): Defines structural types for pattern/match compatibility - Pattern Layer (
pattern.py): Implements algebraic operations using Template Method pattern - Match Layer (
match.py): Provides position-aware match result wrappers
- Template Method:
Base.finditer()orchestrates the matching pipeline - Protocol-based Polymorphism: Structural subtyping over nominal inheritance
- Immutable Operators: All operations create new objects via
deepcopy
# Install dependencies
uv sync --extra dev
# Run tests
pytest tests/
# Type checking
mypy myre/
# Linting and formatting
ruff check .
ruff format .Contributions are welcome! Please ensure:
- All tests pass:
pytest tests/ - Type checking passes:
mypy myre/ - Code is formatted:
ruff format .
MIT License - see LICENSE for details.
Inspired by the need for composable, human-readable regular expressions in Python.