Comparing changes

Current (April '24) attempt to refactor the text reader.

Two goals:
* Reduce lines of code to roughly 50% and complexity to 25%
* Increase perf of impl in scope by 4-5x

In the end I want this to be performant and maintainable as we look to add 1.1 functionality in the future.

My first attempt at this was the protonic POC: `protonic-poc`

What has changed since then:
* Refactored binary parser to context frame holding parser pattern
* Implemented basic SliceableBuffer used in the above
* Understand lazy IonThunkEvent values better

Given what I know now, I think this should:
* Use some parser combinator _pattern_ to reduce boilerplate
* Lex _eagerly_ but lightly, and use IonThunkEvents for lazy "parsing"
* Use lookup tables judiciously

Next steps:
* Complete SliceableBuffer methods and tests
* Make minimum of Symbols, Lists and Structs work and test
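To make the "lex eagerly but lightly, parse lazily" idea concrete, here is a minimal sketch. It is not the real IonThunkEvent API: `LazyValue` and `lex_ints` are hypothetical names, and the only assumption is that the lexer records raw token bytes up front and defers conversion until the value is actually read.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyValue(Generic[T]):
    """Hypothetical stand-in for a thunk-backed value (not IonThunkEvent itself)."""

    def __init__(self, raw: bytes, decode: Callable[[bytes], T]) -> None:
        self.raw = raw              # token bytes found by the eager lexer
        self._decode = decode       # deferred, possibly expensive conversion
        self._value: Optional[T] = None
        self._done = False

    @property
    def value(self) -> T:
        # Decode at most once, and only when someone asks for the value.
        if not self._done:
            self._value = self._decode(self.raw)
            self._done = True
        return self._value


def lex_ints(data: bytes) -> List[LazyValue]:
    """Eager but light: find token boundaries only, defer the int() conversion."""
    return [LazyValue(token, int) for token in data.split()]
```

A caller that skips over a value never pays for decoding it, which is where the hoped-for saving would come from.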

This change updates the parser combinators to use the SliceableBuffer instead of the BufferContext as before. We're starting with just the minimal set to get off the ground with the text parser.
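For readers unfamiliar with the pattern, a rough sketch of combinators working over a sliceable buffer follows. None of this is the code in the change: `Buf` is a hypothetical stand-in for SliceableBuffer, and `tag`, `take_while` and `alt` are illustrative names. Each parser takes a buffer view and returns either `(matched bytes, remaining buffer)` or `None`.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Buf(NamedTuple):
    """Hypothetical stand-in for SliceableBuffer: an immutable view over bytes."""
    data: bytes
    pos: int = 0

    def peek(self, n: int) -> bytes:
        return self.data[self.pos:self.pos + n]

    def skip(self, n: int) -> "Buf":
        return Buf(self.data, self.pos + n)


Result = Optional[Tuple[bytes, Buf]]
Parser = Callable[[Buf], Result]


def tag(expected: bytes) -> Parser:
    """Match an exact byte sequence."""
    def parse(buf: Buf) -> Result:
        if buf.peek(len(expected)) == expected:
            return expected, buf.skip(len(expected))
        return None
    return parse


def take_while(pred: Callable[[int], bool]) -> Parser:
    """Match one or more bytes satisfying pred."""
    def parse(buf: Buf) -> Result:
        end = buf.pos
        while end < len(buf.data) and pred(buf.data[end]):
            end += 1
        if end == buf.pos:
            return None
        return buf.data[buf.pos:end], Buf(buf.data, end)
    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser in order and return the first success."""
    def parse(buf: Buf) -> Result:
        for parser in parsers:
            result = parser(buf)
            if result is not None:
                return result
        return None
    return parse


# A toy "value" parser built from the pieces above.
value = alt(tag(b"null"), tag(b"true"), tag(b"false"),
            take_while(lambda b: 48 <= b <= 57))
```

Because the buffer view is immutable, a failed alternative needs no rewinding: the next parser simply starts from the same `Buf`.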

Some fairly savage hacking at both the SliceableBuffer and protons abstractions. It works and I _think_ these are the right contracts for the components, but I'm not totally sure. At a minimum I need to fix the EOF marking so that it is clearly correct for all cases. That means an actual flag. But I don't want the parsers or the buffer to have to track that; it should just be in the reader. I need to track depth as context anyway, so we can have the parse methods in the reader track that as well.
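One possible shape for this, sketched under assumptions: the reader owns an explicit `eof` flag and the container depth, so the token logic only has to ask "could more bytes still arrive?". `ReaderState`, `feed`, `mark_eof` and `next_token` are hypothetical names, not the ones in this change, and the toy grammar covers only `[`, `]` and unsigned ints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReaderState:
    """Hypothetical reader-side context holding EOF and depth."""
    buffer: bytes = b""
    pos: int = 0
    depth: int = 0
    eof: bool = False  # explicit flag, set only by the caller

    def feed(self, chunk: bytes) -> None:
        if self.eof:
            raise ValueError("cannot feed after EOF")
        # Drop consumed bytes and append the new chunk.
        self.buffer = self.buffer[self.pos:] + chunk
        self.pos = 0

    def mark_eof(self) -> None:
        self.eof = True

    def next_token(self) -> Optional[bytes]:
        """Return the next token, or None if more input is needed."""
        data, start = self.buffer, self.pos
        while start < len(data) and data[start] in b" ,\n":
            start += 1
        if start == len(data):
            self.pos = start
            return None
        ch = data[start:start + 1]
        if ch == b"[":
            self.depth += 1
            self.pos = start + 1
            return ch
        if ch == b"]":
            if self.depth == 0:
                raise ValueError("unbalanced ']'")
            self.depth -= 1
            self.pos = start + 1
            return ch
        end = start
        while end < len(data) and data[end] in b"0123456789":
            end += 1
        if end == start:
            raise ValueError(f"unexpected byte at offset {start}")
        if end == len(data) and not self.eof:
            # The digits might continue in the next chunk, so wait.
            self.pos = start
            return None
        self.pos = end
        return data[start:end]
```

With the flag in one place, a trailing `12` is "incomplete" until the caller says the stream has ended, and the parsers themselves never need to know.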

Pretty savage hacking at SliceableBuffer and protons abstractions. Many todos, but have basics working and some momentum.

Known todos:
* Annotations
* Comments
* Underscores in Ints
* Decimals and Floats
* Timestamps
* Long quoted strings
* Blobs/Clobs
* Operator parsing in Sexps
* Typed nulls

* Added back eof() as it just makes completeness handling clearer
* Fixed protons I'd broken
* Added and uncommented tests
* Made sure reader_text2 works with changes

I'm putting this on the shelf for a bit.

Basically: I did some stuff to move this along, ran it against my own "hkc" dataset, and the results were not great. Specifically: the existing pure-Python text parser does about 0.3 ops/second whereas this code does 0.4. And it's not complete. Struct parsing is broken because, as of now, each of the container parsers is stateless but the parsers themselves are not re-entrant. So you effectively can't have containers inside of containers.

It took me about 3-4 heads-down days to get here and I suspect it would take about as much to productionize it.

Given what I saw from messing around, simply changing out the main value alt for a table (or reading only once and passing the data?) would likely have a significant impact on performance.

Changes:
* Added up to day precision Timestamp parsing
* Broken semantically, but "working" Decimal parsing
* Untested formally but attempted "table" proton
* Hacked up ability to run Refactor against tests
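To illustrate the alt-versus-table idea, here is a small sketch, not the "table" proton from this change. All names (`alt`, `table`, `parse_int`, `parse_string`, `parse_symbol`) are hypothetical, and the toy parsers ignore escapes and most of the grammar. The point is only the dispatch: `alt` tries each alternative in turn, while `table` picks one parser from the first byte.

```python
from typing import Callable, Dict, List, Optional, Tuple

Result = Optional[Tuple[str, int]]          # (token kind, end offset)
Parser = Callable[[bytes, int], Result]


def parse_int(data: bytes, pos: int) -> Result:
    end = pos
    while end < len(data) and 48 <= data[end] <= 57:
        end += 1
    return ("int", end) if end > pos else None


def parse_string(data: bytes, pos: int) -> Result:
    # Toy version: no escape handling.
    if data[pos:pos + 1] != b'"':
        return None
    end = data.find(b'"', pos + 1)
    return ("string", end + 1) if end != -1 else None


def parse_symbol(data: bytes, pos: int) -> Result:
    end = pos
    while end < len(data) and 97 <= data[end] <= 122:
        end += 1
    return ("symbol", end) if end > pos else None


def alt(*parsers: Parser) -> Parser:
    """Try every alternative in order until one succeeds."""
    def parse(data: bytes, pos: int) -> Result:
        for parser in parsers:
            result = parser(data, pos)
            if result is not None:
                return result
        return None
    return parse


def table(entries: Dict[bytes, Parser]) -> Parser:
    """Dispatch on the first byte through a 256-entry lookup list."""
    lookup: List[Optional[Parser]] = [None] * 256
    for first_bytes, parser in entries.items():
        for b in first_bytes:
            lookup[b] = parser

    def parse(data: bytes, pos: int) -> Result:
        if pos >= len(data):
            return None
        parser = lookup[data[pos]]
        return parser(data, pos) if parser is not None else None
    return parse


value_alt = alt(parse_int, parse_string, parse_symbol)
value_table = table({
    b"0123456789": parse_int,
    b'"': parse_string,
    b"abcdefghijklmnopqrstuvwxyz": parse_symbol,
})
```

Both produce the same results on these inputs; the table version makes one list lookup per value instead of up to N failed parser calls. Whether that translates into a meaningful speedup here would need measuring.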

Commits on Apr 5, 2024

Commits on Apr 19, 2024

Commits on Apr 22, 2024

Commits on Apr 23, 2024

Commits on Apr 24, 2024

Commits on Apr 26, 2024

Commits on Apr 29, 2024
