Possible to return empty string rather than None for the zero_or_more case? #22

@lmmx

Hi there, I was very pleased to find a solution to the parse library's inability to generate a regex for (.*?) in capture groups (it only generates (.+?)). It's a shame that the libraries could not be merged, but such is open source.

I've studied your docs and comments on the other repo and written out test cases for the behaviour I'm after.

So far I've only managed to make "optional strings" (nullable strings, Union[str, None]), whereas what I really want is "any-width strings" (length 0 or more, plain str).

Here's the code I wrote to achieve it:

from parse import with_pattern

from parse_type.cfparse import Parser


def check(parser: Parser, schema: str, expected: list[str], /) -> None:
    """Validate the parsed field values against their expected values."""
    result = parser.parse(schema)
    try:
        assert result is not None, f"Parse failed for {schema!r} ({expected=})"
        values = [result[f] or "" for f in parser.named_fields]
        assert values == expected, f"Parsed {schema!r} as {values} ({expected=})"
    except AssertionError as exc:
        print(f"  F {exc}")
    else:
        print(f"  P {schema!r} ---> {result}")


@with_pattern(r".+")
def parse_str(text: str) -> str:
    return text


extra_types = {"Stringlike": parse_str}

parser = Parser("-{content:Stringlike?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-hello world", ["hello world"])
check(parser, "-", [""])

print()

parser = Parser("-{a:Stringlike?} {b:Stringlike?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-A B", ["A", "B"])
check(parser, "-A ", ["A", ""])  # ["A", ""]
check(parser, "- B", ["", "B"])  # ["", "B"]
check(parser, "- ", ["", ""])  # ["", ""]

This produces the following output:

EXPR -(?P<content>(.+)?)
  P '-hello world' ---> <Result () {'content': 'hello world'}>
  P '-' ---> <Result () {'content': None}>

EXPR -(?P<a>(.+)?) (?P<b>(.+)?)
  P '-A B' ---> <Result () {'a': 'A', 'b': 'B'}>
  P '-A ' ---> <Result () {'a': 'A', 'b': None}>
  P '- B' ---> <Result () {'a': None, 'b': 'B'}>
  P '- ' ---> <Result () {'a': None, 'b': None}>
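The None may not come from the regex itself. Here is a quick check using only the standard library's re module, with the expression copied from the first EXPR line above. It suggests the named group does match an empty string, so the conversion to None presumably happens in a later step (that last part is my assumption; I have not traced it through parse_type).

```python
import re

# Expression copied from the first EXPR line in the output above.
pattern = re.compile(r"-(?P<content>(.+)?)")

match = pattern.fullmatch("-")
assert match is not None

outer = match.group("content")  # the named (outer) group
inner = match.group(2)          # the inner optional group, (.+)?

print(repr(outer))  # ''
print(repr(inner))  # None
```

So the outer group takes part in the match with zero width, while only the inner optional group is unset.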

Note that in my code I extract the field value or "" so I can test against lists of strings including the empty string rather than None.

What I would really like here is to eliminate that or "" fallback, because I only ever want strings back.

I suspect that the place to do so would be to hook into the TypeBuilder but I'm falling very far down the rabbit hole at this point! If you could guide me I would appreciate it greatly :-)
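To make the request concrete, here is a minimal sketch of the kind of wrapper I have in mind. The helper name empty_if_none and the stand-in converter optional_str are hypothetical and not part of parse or parse_type. The sketch assumes that the None is produced by the optional converter and that the pattern is read from the converter's pattern attribute (the attribute that with_pattern sets). I have not verified that such a wrapper can be plugged into the cfparse Parser with the ? suffix.

```python
from functools import wraps
from typing import Callable, Optional


def empty_if_none(converter: Callable[..., Optional[str]]) -> Callable[..., str]:
    """Hypothetical helper: return "" wherever `converter` returns None."""

    # wraps() also copies function attributes (e.g. `pattern`) via __dict__.
    @wraps(converter)
    def wrapper(text: str, *args, **kwargs) -> str:
        value = converter(text, *args, **kwargs)
        return "" if value is None else value

    return wrapper


# Stand-in for an optional-string converter. This is an assumption made
# for illustration only; it is not code from parse_type.
def optional_str(text: str, m=None) -> Optional[str]:
    return text or None


optional_str.pattern = r"(.+)?"

any_width_str = empty_if_none(optional_str)

print(repr(any_width_str("")))
print(repr(any_width_str("hello")))
print(any_width_str.pattern)
```

If something like this could be applied to the converter that the optional cardinality builds, the or "" in my check function would no longer be needed.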
