
format_parser

format_parser is a Ruby library for prying open video, image, document, and audio files. It includes a number of parser modules that try to recover metadata useful for post-processing and layout while reading the absolute minimum amount of data possible.

format_parser is inspired by imagesize, fastimage and dimensions, borrowing from them where appropriate.

Currently supported filetypes:

TIFF, PSD, PNG, MP3, JPEG, GIF, DPX, AIFF, WAV, FDX, MOV, MP4

...with more on the way!

Basic usage

Pass an IO object that responds to read and seek to FormatParser.parse, and the first confirmed match will be returned.

match = FormatParser.parse(File.open("myimage.jpg", "rb"))
match.nature        #=> :image
match.format        #=> :jpg
match.width_px      #=> 320
match.height_px     #=> 240
match.orientation   #=> :top_left

If you would rather receive all potential results from the gem, call the gem as follows:

FormatParser.parse(File.open("myimage.jpg", "rb"), results: :all)

You can also optimize the metadata extraction by providing hints to the gem:

FormatParser.parse(File.open("myimage", "rb"), natures: [:video, :image], formats: [:jpg, :png, :mp4], results: :all)

Creating your own parsers

A new parser has to meet three requirements:

Instances of the new parser class need to respond to a call method that takes one IO object as an argument and returns some metadata about the corresponding file, or nil otherwise.
Instances of the new parser class need to respond to the natures and formats accessor methods, both of which return an array of symbols. A simple DSL is provided to avoid writing those accessors by hand.
The class needs to register itself as a parser.

Down below you can find a basic parser implementation:

class BasicParser
  include FormatParser::DSL # Adds the formats and natures methods to the class, which define
                            # the accessors for all instances.

  formats :foo, :baz # Indicates which formats it can read.
  natures :bar       # Indicates which type of file from a human perspective it can read:
                     #      - :audio
                     #      - :document
                     #      - :image
                     #      - :video
  def call(file)
    # Returns a DTO object that includes some metadata.
  end

  FormatParser.register_parser_constructor self # Register this parser.
end
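
To make the contract of call more concrete, here is a minimal sketch of what the body of such a method could look like for PNG files. It is a standalone illustration rather than the gem's own PNG parser: the class name PngHeaderSketch and the returned Hash are assumptions made for this example, and the class deliberately does not include FormatParser::DSL or register itself, so it runs with the Ruby standard library alone.

```ruby
require "stringio"

# Hypothetical standalone class for illustration only; it is not part of
# format_parser and is not registered with it.
class PngHeaderSketch
  PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

  # Takes an IO object that responds to read and seek.
  # Returns a Hash of metadata, or nil if the data does not look like a PNG.
  def call(io)
    io.seek(0)
    # 8-byte signature + 4-byte chunk length + 4-byte chunk type
    # + 4-byte width + 4-byte height = 24 bytes, fetched in a single read.
    header = io.read(24)
    return nil if header.nil? || header.bytesize < 24
    return nil unless header.byteslice(0, 8) == PNG_SIGNATURE
    return nil unless header.byteslice(12, 4) == "IHDR"

    # Width and height are big-endian unsigned 32-bit integers.
    width, height = header.byteslice(16, 8).unpack("N2")
    { nature: :image, format: :png, width_px: width, height_px: height }
  end
end

# Usage: any object that responds to read and seek will do, e.g. a StringIO.
png_bytes = PngHeaderSketch::PNG_SIGNATURE + [13].pack("N") + "IHDR" + [320, 240].pack("N2")
result = PngHeaderSketch.new.call(StringIO.new(png_bytes))
```

Returning nil early for anything that fails the signature check keeps the parser cheap to run against files of other formats.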

Design rationale

We need to recover metadata from various file types, and we need to do so satisfying the following constraints:

The data in those files can be malicious and/or incomplete, so we need to be failsafe
The data will be fetched from a remote location, so we want to acquire it with as few HTTP requests as possible and with sufficiently small fetches. The number of HTTP requests is the greater concern because we rely on AWS, where data transfer is much cheaper than per-request fees.
The data can be recognized ambiguously and match more than one format definition (like TIFF sections of camera RAW)
The number of supported formats is only ever going to increase, not decrease
The library is likely to be used in multiple consumer applications
The information necessary is a small subset of the overall metadata available in the file

Therefore we adopt the following approaches:

Modular parsers per file format, with some degree of code sharing between them (but not too much). Adding new formats should be low-friction, and testing these format parsers should be possible in isolation
Modular and configurable IO stack that supports limiting reads/loops from the source entity. The IO stack is isolated from the parsers, meaning parsers do not need to care about things like fetches using Range: headers, GZIP compression and the like
A caching system that allows us to ideally fetch once, and only once, and as little as possible - but still accommodate formats that have the important information at the end of the file or might need information from the middle of the file
Minimal dependencies, and if dependencies are to be used they should be very stable and low-level
Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data
Avoid using C libraries which are likely to contain buffer overflows/underflows - we stay memory safe
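
As an illustration of an IO layer that limits reads from the source entity while staying isolated from the parsers, the sketch below wraps any IO and raises once a parser exceeds its budget. The class name ReadLimiterSketch, its keyword arguments, and the BudgetExceeded error are hypothetical names chosen for this example and should not be read as the gem's actual implementation.

```ruby
require "stringio"

# Hypothetical wrapper for illustration only: it caps how many read calls,
# seek calls, and requested bytes a parser may spend on one IO.
class ReadLimiterSketch
  class BudgetExceeded < StandardError; end

  attr_reader :reads, :seeks, :bytes

  def initialize(io, max_reads:, max_seeks:, max_bytes:)
    @io = io
    @max_reads = max_reads
    @max_seeks = max_seeks
    @max_bytes = max_bytes
    @reads = 0
    @seeks = 0
    @bytes = 0
  end

  # Counts the seek, then delegates to the wrapped IO.
  def seek(offset)
    @seeks += 1
    raise BudgetExceeded, "seek budget of #{@max_seeks} exceeded" if @seeks > @max_seeks
    @io.seek(offset)
  end

  # Counts the read and the requested byte count, then delegates.
  def read(n_bytes)
    @reads += 1
    @bytes += n_bytes
    raise BudgetExceeded, "read budget of #{@max_reads} exceeded" if @reads > @max_reads
    raise BudgetExceeded, "byte budget of #{@max_bytes} exceeded" if @bytes > @max_bytes
    @io.read(n_bytes)
  end
end

# Usage: the parser only ever sees read and seek, so it does not need to
# know whether the bytes come from a file, memory, or a remote fetch.
limited = ReadLimiterSketch.new(StringIO.new("x" * 1024), max_reads: 2, max_seeks: 2, max_bytes: 16)
limited.read(8)
limited.read(8)
begin
  limited.read(1) # third read exceeds max_reads
rescue ReadLimiterSketch::BudgetExceeded => e
  warn e.message
end
```

Because the budget is enforced in the wrapper, a malicious or truncated file that sends a parser into a loop fails fast instead of triggering an unbounded number of fetches.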

Fixture Sources

Unless specified otherwise in this section the fixture files are MIT licensed and from the FastImage and Dimensions projects.

AIFF

fixture.aiff was created by one of the project maintainers and is MIT licensed

WAV

c_11k16bitpcm.wav and c_8kmp316.wav are from Wikipedia WAV, retrieved January 7, 2018
c_39064__alienbomb__atmo-truck.wav is from freesound and is CC0 licensed
c_M1F1-Alaw-AFsp.wav and d_6_Channel_ID.wav are from a McGill Engineering site

MP3

Cassy.mp3 has been produced by WeTransfer and may be used with the library for the purposes of testing

FDX

fixture.fdx was created by one of the project maintainers and is MIT licensed

MOOV

bmff.mp4 is borrowed from the bmff project
Test_Circular MOV files were created by one of the project maintainers and are MIT licensed
