@tompng
Member

@tompng tompng commented Nov 29, 2025

To replace RDoc::Parser::RipperStateLex

@tompng tompng requested a deployment to fork-preview-protection November 29, 2025 07:48 — with GitHub Actions Waiting
@tompng tompng force-pushed the prism_syntax_highlighter branch from cf98f2b to 870b68f Compare November 29, 2025 08:15
@tompng tompng requested a deployment to fork-preview-protection November 29, 2025 08:15 — with GitHub Actions Waiting
@tompng tompng force-pushed the prism_syntax_highlighter branch from 870b68f to 5db5b5d Compare November 29, 2025 08:16
@tompng tompng requested a deployment to fork-preview-protection November 29, 2025 08:16 — with GitHub Actions Waiting
@tompng tompng force-pushed the prism_syntax_highlighter branch from 5db5b5d to 0c393a5 Compare November 29, 2025 08:42
@tompng tompng requested a deployment to fork-preview-protection November 29, 2025 08:42 — with GitHub Actions Waiting
@tompng tompng force-pushed the prism_syntax_highlighter branch from 0c393a5 to 46cc1f8 Compare November 29, 2025 09:23
@tompng tompng temporarily deployed to fork-preview-protection November 29, 2025 09:23 — with GitHub Actions Inactive
@matzbot
Collaborator

matzbot commented Nov 30, 2025

🚀 Preview deployment available at: https://1dae1d57.rdoc-6cd.pages.dev (commit: 46cc1f8)

Member

@st0012 st0012 left a comment

Why don't we make Prism a dependency and just remove RipperStateLex?

@tompng
Member Author

tompng commented Dec 10, 2025

> Why don't we make Prism a dependency and just remove RipperStateLex?

RipperStateLex is still used in parser/ruby.rb for parsing, so we can't remove it now.
This new tokenizer doesn't generate state bits required in parser/ruby.rb, so we can't replace RipperStateLex.

This pull request makes parser/prism_ruby.rb independent of RipperStateLex. It tokenizes to a compatible token stream so that the same syntax highlighter (TokenStream.to_html) can be used, while trying to keep colorization as unchanged as possible.
Because of this constraint, the tokenizer logic is a bit more complicated than it needs to be:

  • Have Prism token name to Ripper token name conversion
  • Have token squashing which is generally impossible if there is a heredoc. (≒ buggy)

We can change this, but doing so will also change the syntax highlighting result. It may also be a relatively large change.
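
A minimal sketch of the two steps listed above (name conversion and token squashing), to make the compatibility cost concrete. This is not the code from the pull request: the `Token` struct, the `to_ripper_tokens` and `squash` helpers, the fallback rule, and the choice of which kinds are squashable are all hypothetical. The mapping table is only a small illustrative subset of Prism token types and Ripper event names, and the naive squashing deliberately ignores heredocs, which is the case flagged above as problematic.

```ruby
# Illustrative subset only; the real conversion table is an assumption here.
PRISM_TO_RIPPER = {
  KEYWORD_DEF: :on_kw,
  KEYWORD_END: :on_kw,
  IDENTIFIER: :on_ident,
  CONSTANT: :on_const,
  INTEGER: :on_int,
  STRING_CONTENT: :on_tstring_content,
  COMMENT: :on_comment
}.freeze

# Kinds that this sketch merges when they are adjacent (an assumption).
SQUASHABLE = %i[on_tstring_content on_comment].freeze

# Hypothetical token type standing in for RDoc's token objects.
Token = Struct.new(:kind, :text)

# Convert [prism_type, text] pairs into Ripper-named tokens.
# Unknown types use a crude fallback: KEYWORD_* becomes :on_kw, anything else :on_op.
def to_ripper_tokens(pairs)
  pairs.map do |type, text|
    kind = PRISM_TO_RIPPER.fetch(type) do
      type.to_s.start_with?("KEYWORD_") ? :on_kw : :on_op
    end
    Token.new(kind, text)
  end
end

# Merge adjacent tokens of the same squashable kind into a single token.
# This assumes tokens arrive in source order with nothing interleaved,
# which does not hold once a heredoc body is involved.
def squash(tokens)
  tokens.each_with_object([]) do |tok, out|
    if out.last && out.last.kind == tok.kind && SQUASHABLE.include?(tok.kind)
      out.last.text += tok.text
    else
      out << Token.new(tok.kind, tok.text)
    end
  end
end
```

For example, two adjacent `STRING_CONTENT` pairs come out as one `:on_tstring_content` token, while `def` and `end` stay separate `:on_kw` tokens.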

@st0012
Member

st0012 commented Dec 10, 2025

So at the moment we have

  • 1 tokenizer using Ripper
  • 1 parser using Ripper
  • 1 parser using Prism

And this PR will add another tokenizer using Prism. Is this correct?

> Have Prism token name to Ripper token name conversion

Will we avoid this if we fully migrate to Prism parser?

I think my main concern is that after this we'll have 2 tokenizers and 2 parsers but it's not clear when we'll be able to drop the old ones.
Do we know:

  • Will this change make migrating to Prism easier
  • Will migrating to Prism make this or a similar change simpler

@tompng
Member Author

tompng commented Dec 21, 2025

> And this PR will add another tokenizer using Prism. Is this correct?

Yes. The two Ripper-based components (the parser and the tokenizer) will be left unmaintained and unchanged until we drop them.

> > Have Prism token name to Ripper token name conversion
>
> Will we avoid this if we fully migrate to Prism parser?

Yes. Having reconsidered this pull request, I think it's better to avoid it.

> Will this change make migrating to Prism easier

It should, but the current pull request is not the most direct way to get there.
I'll try to make a better, closer-to-ideal colorizer, drop the unnecessary compatibility with the old tokenizer, and so on.

@tompng tompng marked this pull request as draft December 21, 2025 15:49
@st0012
Member

st0012 commented Dec 22, 2025

WDYT about moving coloring to the frontend once we've deprecated darkfish? It would simplify RDoc's Ruby implementation quite a bit and provide a significant speedup to generation (with YJIT enabled in core, it went from 50s to 23s).
If we're happy with the current accuracy on Ruby code highlighting, we can likely achieve that with a small JS highlighter too.

@tompng
Member Author

tompng commented Dec 23, 2025

I don't think syntax highlighting is a bottleneck.
Measurements in my environment were:

| scenario | time |
| --- | --- |
| master (prism_ruby) | 31.6 sec |
| this branch | 30.7 sec |
| removing syntax highlight | 29.3 sec |
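
One way to read these numbers, assuming the "removing syntax highlight" run is the baseline with no highlighting at all: subtract it from each total to estimate the time spent on highlighting. The `highlight_cost` helper below is hypothetical and only does that arithmetic on the three figures reported above.

```ruby
# Timings in seconds, copied from the measurements above.
TIMINGS = {
  "master(prism_ruby)" => 31.6,
  "this branch" => 30.7,
  "removing syntax highlight" => 29.3
}.freeze

# Estimate the highlighting cost as (total - baseline), and its share of the total.
# Returns [seconds, percent], each rounded to one decimal place.
def highlight_cost(total, without_highlight)
  seconds = (total - without_highlight).round(1)
  percent = (seconds / total * 100).round(1)
  [seconds, percent]
end

baseline = TIMINGS.fetch("removing syntax highlight")
TIMINGS.each do |scenario, total|
  next if total == baseline
  seconds, percent = highlight_cost(total, baseline)
  puts "#{scenario}: about #{seconds} sec (#{percent}% of the run)"
end
```

Under that assumption, highlighting accounts for roughly 2.3 sec on master and 1.4 sec on this branch, a single-digit percentage of generation time in both cases.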

RDoc's C tokenizer/parser is not a complete parser, so using a JS syntax highlighter there makes sense to me.
On the other hand, a syntax highlighter built with Ruby can be a complete, accurate highlighter, and Ruby code is more important for us.
I also think writing a Ruby highlighter in JS (without ruby.wasm) is difficult compared to a C highlighter.
At the very least, it needs heredoc parsing and a heuristic to distinguish the operators % and / from %[] literals and /regexp/ literals.
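
To illustrate the kind of heuristic meant here, the sketch below guesses whether a `/` starts a regexp literal or is the division operator by looking at the previous significant token. It is written in Ruby only for consistency with the rest of the thread; the token kind names, the `slash_role` helper, and the rules themselves are assumptions, not an existing RDoc or Prism API. A lexer-free highlighter cannot know whether an identifier is a local variable or a method call, so the identifier case is a guess.

```ruby
# Token kinds after which an expression has just ended, so a following `/`
# is most likely a binary operator. The kind names are hypothetical.
VALUE_END_KINDS = %i[ident const int float string_end rparen rbracket rbrace].freeze

# prev_kind:    kind of the last non-space, non-comment token (nil at start of input)
# space_before: whether whitespace separates that token from the `/`
# next_char:    the character immediately after the `/`
def slash_role(prev_kind, space_before:, next_char:)
  # At the start of input, or after an operator, keyword, comma, or opening
  # bracket, a value is expected, so `/` begins a regexp literal.
  return :regexp_start if prev_kind.nil?
  return :regexp_start unless VALUE_END_KINDS.include?(prev_kind)

  # `foo /bar/`: space before but not after. For a method call Ruby reads this
  # as a regexp argument; for a local variable it is division. We guess regexp.
  if prev_kind == :ident && space_before && next_char != " " && next_char != "="
    return :regexp_start
  end

  :operator
end
```

A similar table-driven guess would be needed for `%`, and neither handles heredoc bodies, which need their own tracking.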
