@platypii platypii commented Dec 9, 2024

Uses hyparquet for JavaScript parquet parsing. It is a small, pure-JS parquet parser with no dependencies. parquetjs, which this replaces, is unmaintained and has not been updated in 5+ years.

Fixes #102 and #104 by using a well-maintained parquet library that supports modern parquet files.

I tested this with the parquet file generated by together-python and confirmed that upload works and fixes issue #104.

Let me know if I can help with anything!


Note

Switches parquet parsing to hyparquet and updates parquet file checks to use hyparquet metadata/schema APIs.

  • Parquet handling:
    • Migrate _check_parquet in src/lib/check-file.ts to use hyparquet (asyncBufferFromFile, parquetMetadataAsync, parquetSchema) to read schema and row count.
    • Validate columns via children.map(...name) and sample count via metadata.num_rows; remove parquetjs reader logic.
  • Dependencies:
    • Add hyparquet@1.14.0 and remove parquetjs and its type definitions from package.json.

Written by Cursor Bugbot for commit 6f1787b. This will update automatically on new commits.
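
For readers who have not opened the diff, the check described in the note can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `SchemaNode` type, the `summarize` and `checkParquet` helpers, and the required-column and minimum-sample parameters are all hypothetical names introduced here. The hyparquet calls appear only in a comment, using the function names listed in the note; their exact signatures should be confirmed against the hyparquet documentation.

```typescript
// Local structural type mirroring the part of the schema tree used below.
// Declared here (an assumption) so the sketch runs without hyparquet installed.
interface SchemaNode {
  element: { name: string };
  children: SchemaNode[];
}

interface ParquetSummary {
  columns: string[];
  numRows: number;
}

interface CheckResult {
  ok: boolean;
  message?: string;
}

// Collect top-level column names and the row count.
// With hyparquet, the inputs would presumably come from something like:
//   const file = await asyncBufferFromFile(path);
//   const metadata = await parquetMetadataAsync(file);
//   const schema = parquetSchema(metadata);
//   const summary = summarize(schema, metadata.num_rows);
function summarize(schema: SchemaNode, numRows: bigint | number): ParquetSummary {
  return {
    columns: schema.children.map((child) => child.element.name),
    numRows: Number(numRows), // the row count may arrive as a bigint
  };
}

// Hypothetical validation: which columns are required and how many samples
// are needed are passed in by the caller, not taken from the PR.
function checkParquet(
  summary: ParquetSummary,
  requiredColumns: string[],
  minSamples: number,
): CheckResult {
  const missing = requiredColumns.filter(
    (name) => summary.columns.indexOf(name) === -1,
  );
  if (missing.length > 0) {
    return { ok: false, message: "Missing required columns: " + missing.join(", ") };
  }
  if (summary.numRows < minSamples) {
    return {
      ok: false,
      message: "Expected at least " + minSamples + " rows, found " + summary.numRows,
    };
  }
  return { ok: true };
}
```

Because only the file metadata is needed for both the column names and the row count, a check along these lines does not have to read any row data.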

@platypii
Author

@Nutlope? Anyone?

@nicolasembleton

This should be straightforward to review.

@platypii
Author

platypii commented Jan 2, 2025

@samselikoff?

@platypii
Author

Here's a video showing that the instructions on the together.ai website fail due to a parquetjs parsing error. This PR fixes the issue:

together-upload.mp4

@Nutlope @samselikoff anything I can do to help move this along?

@Nutlope
Collaborator

Nutlope commented Jan 24, 2025

@platypii thanks so much for reporting and for the PR! We're in the process of fixing some things with the upload. @yogishbaliga, mind taking a look at this PR when you do your other PR on the upload functionality too?

@platypii
Author

@yogishbaliga thoughts? Happy to contribute if there's more work that needs to be done.

blainekasten and others added 10 commits November 17, 2025 16:57
…eng-48225-update-file-upload-setup-to-not-be-controlled-by-stainless

feat(api): files.upload supported with custom file checks
BREAKING CHANGE: For the TS SDK the `images.create` is now `images.generate`
BREAKING CHANGE: Change Fine Tuning method name from `download()` to `content()` to align with other namespaces
BREAKING CHANGE: Update method signature for reranking to `rerank.create()`
@blainekasten blainekasten changed the base branch from main to next November 20, 2025 20:54
@blainekasten
Contributor

@platypii apologies for the delay on this. I'm actively maintaining these codebases now. I switched the base branch to next. Could you address the conflicts and let me know when this is stable again? I would love to land this.

@platypii platypii force-pushed the main branch 2 times, most recently from 994b18a to 2a5b13e Compare November 28, 2025 19:18
@platypii
Author

platypii commented Nov 28, 2025

@blainekasten thanks, I just rebased onto the next branch. This should fix the Tokenized Data walkthrough in the Together docs.

FYI there seems to be an issue with installing packages in this repo. When I run yarn install it calls the npm prepare script which includes a call to git-swap.sh which... deletes your entire local git repo and git history 😬 I THINK what you want is to move it from prepare to prepublish? I'm happy to put up another PR for that but it's not my repo so I didn't want to mess with packaging stuff. LMK.

Thanks!

Development

Successfully merging this pull request may close these issues.

Replace parquetjs for better deno compatibility?

4 participants