Higher-level languages, such as Ruby, make interacting with CSV (Comma Separated Values) files trivial. Even so, this library provides a very simple object/CSV mapper that allows you to interact with CSVs in a declarative way. Locking in common patterns, even in higher-level languages, is important in large codebases. Using a library such as this helps standardize CSV interaction.
To install through Rubygems:
gem install bumblebee
You can also add this to your Gemfile:
bundle add bumblebee
Imagine the following CSV:
| id | name | dob | phone |
|---|---|---|---|
| 1 | Matt | 1901-02-03 | 555-555-5555 |
| 2 | Nick | 1921-09-03 | 444-444-4444 |
| 3 | Sam | 1932-12-12 | 333-333-3333 |
Using the following column configuration:
columns = %i[id name dob phone]
We could parse this data and turn it into hashes:
objects = Bumblebee::Template.new(columns: columns).parse(data)
Then objects is this array of hashes:
[
{ id: '1', name: 'Matt', dob: '1901-02-03', phone: '555-555-5555' },
{ id: '2', name: 'Nick', dob: '1921-09-03', phone: '444-444-4444' },
{ id: '3', name: 'Sam', dob: '1932-12-12', phone: '333-333-3333' }
]
Note: Data, in this case, would be the CSV file contents in string format.
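For comparison, here is a minimal sketch of the same mapping using only Ruby's standard CSV library. It is not Bumblebee's implementation; the `data` string is a hypothetical stand-in for the file contents, and it mainly shows what the input and the resulting hashes look like.

```ruby
require 'csv'

# Hypothetical sample input: the CSV file contents as a string.
data = <<~TEXT
  id,name,dob,phone
  1,Matt,1901-02-03,555-555-5555
  2,Nick,1921-09-03,444-444-4444
  3,Sam,1932-12-12,333-333-3333
TEXT

columns = %i[id name dob phone]

# Read each row by header and keep only the configured columns,
# keyed by symbol. Values stay strings, as in the array shown above.
objects = CSV.parse(data, headers: true).map do |row|
  columns.to_h { |column| [column, row[column.to_s]] }
end
```

Every value remains a string here because no type conversion is applied to any column.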
If our headers are not a perfect 1:1 match to our object, such as:
| ID # | First Name | Date of Birth | Phone # |
|---|---|---|---|
| 1 | Matt | 1901-02-03 | 555-555-5555 |
| 2 | Nick | 1921-09-03 | 444-444-4444 |
| 3 | Sam | 1932-12-12 | 333-333-3333 |
Then we can explicitly map those as:
columns = {
'ID #' => :id,
'First Name' => :name,
'Date of Birth' => :dob,
'Phone #' => :phone
}
Let's say we have the following data which we want to create a CSV from:
objects = [
{
id: 1,
name: { first: 'Matt' },
demo: { dob: '1901-02-03' },
contact: { phone: '555-555-5555' }
},
{
id: 2,
name: { first: 'Nick' },
demo: { dob: '1921-09-03' },
contact: { phone: '444-444-4444' }
},
{
id: 3,
name: { first: 'Sam' },
demo: { dob: '1932-12-12' },
contact: { phone: '333-333-3333' }
}
]
We could create a flat-file CSV:
| ID # | First Name | Date of Birth | Phone # |
|---|---|---|---|
| 1 | Matt | 1901-02-03 | 555-555-5555 |
| 2 | Nick | 1921-09-03 | 444-444-4444 |
| 3 | Sam | 1932-12-12 | 333-333-3333 |
Using the following column config:
columns = {
'ID #' => :id,
'First Name': {
property: :first,
through: :name
},
'Date of Birth': {
property: :dob,
through: :demo
},
'Phone #': {
property: :phone,
through: :contact
}
}
And executing the following:
csv = Bumblebee::Template.new(columns: columns).generate(objects)
The above columns config would work both ways, so if we received the CSV, we could parse it to an array of nested hashes.
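To make the two-way behavior concrete, the sketch below shows one way a `through`/`property` pair could be resolved in both directions with plain hashes. The names `COLUMNS`, `flatten_object`, and `nest_row` are hypothetical and exist only for this illustration; Bumblebee's internals may work differently.

```ruby
# Hypothetical stand-in for the column config above, reduced to plain data.
COLUMNS = {
  'ID #' => { property: :id },
  'First Name' => { property: :first, through: :name },
  'Date of Birth' => { property: :dob, through: :demo },
  'Phone #' => { property: :phone, through: :contact }
}.freeze

# Object -> flat row: follow the :through path, then read :property.
def flatten_object(object)
  COLUMNS.transform_values do |config|
    path = [*config[:through], config[:property]]
    object.dig(*path)
  end
end

# Flat row -> nested object: rebuild the intermediate hashes along the path.
def nest_row(row)
  COLUMNS.each_with_object({}) do |(header, config), object|
    *parents, leaf = [*config[:through], config[:property]]
    target = parents.reduce(object) { |hash, key| hash[key] ||= {} }
    target[leaf] = row[header]
  end
end

person = {
  id: 1,
  name: { first: 'Matt' },
  demo: { dob: '1901-02-03' },
  contact: { phone: '555-555-5555' }
}

row = flatten_object(person)   # keyed by CSV header
round_trip = nest_row(row)     # nested again, equal to person
```

Because the same path drives both directions, a single column definition is enough to flatten an object into a row and to rebuild the nested object from that row.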
You can also pass in built-in or custom functions that can do the value formatting. For example:
columns = {
'ID #': {
property: :id,
to_object: :integer
},
'First Name': {
property: :first,
through: :name,
to_csv: ->(v) { v.to_s.upcase }
},
'Date of Birth': {
property: :dob,
through: :demo,
to_object: { type: :date, nullable: true }
},
'Phone #': {
property: :phone,
through: :contact
}
}
would ensure:
- id is an integer data type when parsed
- the CSV has only upper-case First Name values
- dob is a date data type when parsed
Other formatting functions that can be used for to_object and/or to_csv:
- bigdecimal: converts to BigDecimal (nullable; non-nullable default is 0)
- boolean: converts to flexible boolean (nullable; non-nullable default is false). 1,t,true,y,yes all parse to true, 0,f,false,n,no all parse to false
- date: converts to Date (nullable; non-nullable default is 1900-01-01)
- integer: converts to Integer (nullable; non-nullable default is 0)
- join: array is joined by separator option (defaults to comma)
- float: converts to Float (nullable; non-nullable default is 0.0)
- function: custom lambda function (input is the resolved value, output of the lambda will be used as the resolved value)
- pluck_join: map the sub-property (sub_property option) then join them with separator (defaults to comma)
- pluck_split: string is split by separator option (defaults to comma), then a new object (object_class option) is created for each value and its sub-property (sub_property option) set.
- split: string is split by separator option (defaults to comma)
- string: calls to_s method on the value
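The nullable and default rules in the list above can be sketched in plain Ruby as follows. These helpers are hypothetical and only approximate the documented behavior; case-insensitive matching and the handling of unrecognized boolean input are assumptions, not something the list states.

```ruby
require 'date'

TRUE_VALUES  = %w[1 t true y yes].freeze
FALSE_VALUES = %w[0 f false n no].freeze

# Flexible boolean. Assumption: matching ignores case and surrounding
# whitespace, and unrecognized or blank input falls back to the default.
def to_boolean(value, nullable: false)
  text = value.to_s.strip.downcase
  return true if TRUE_VALUES.include?(text)
  return false if FALSE_VALUES.include?(text)

  nullable ? nil : false
end

# Integer with a default of 0, or nil when nullable and blank.
def to_integer(value, nullable: false)
  text = value.to_s.strip
  return (nullable ? nil : 0) if text.empty?

  Integer(text, 10)
end

# Date with a default of 1900-01-01, or nil when nullable and blank.
def to_date(value, nullable: false)
  text = value.to_s.strip
  return (nullable ? nil : Date.new(1900, 1, 1)) if text.empty?

  Date.parse(text)
end
```

With these helpers, a blank cell becomes `nil` only when `nullable: true` is passed; otherwise it falls back to the type's default.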
Pluck join and pluck split come in handy when you have an array of objects and would like to:
- map one value from each object and join it (in order to output in a CSV)
- take a string value, split it, then map each value to a new object (in order to parse as objects)
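The two operations can be sketched in plain Ruby as below. The `pluck_join` and `pluck_split` methods here are hypothetical helpers written for illustration; they borrow the option names from the list above but are not Bumblebee's code.

```ruby
# Map one sub-property from each object and join the values
# (object -> CSV cell).
def pluck_join(objects, sub_property:, separator: ',')
  Array(objects).map { |object| object[sub_property] }.join(separator)
end

# Split a cell, then wrap each value in a new object
# (CSV cell -> objects). Values are left as strings in this sketch.
def pluck_split(cell, sub_property:, separator: ',', object_class: Hash)
  cell.to_s.split(separator).map do |value|
    object_class.new.tap { |object| object[sub_property] = value }
  end
end

children = [{ id: 9, name: 'Spunky' }, { id: 10, name: 'Dunker' }]

cell = pluck_join(children, sub_property: :id, separator: ';')
parsed = pluck_split(cell, sub_property: :id, separator: ';')
```

In this sketch `object_class` can be any class that responds to `new` and `[]=`, and a missing children array simply yields an empty cell.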
Take this input and configuration for example:
objects = [
{
id: 1,
name: { first: 'Matt' },
demo: { dob: '1901-02-03' },
contact: { phone: '555-555-5555' },
children: [ { id: 9, name: 'Spunky' }, { id: 10, name: 'Dunker' } ]
},
{
id: 2,
name: { first: 'Nick' },
demo: { dob: '1921-09-03' },
contact: { phone: '444-444-4444' },
children: [ { id: 11, name: 'Bonzi' }, { id: 12, name: 'Buddy' } ]
},
{
id: 3,
name: { first: 'Sam' },
demo: { dob: '1932-12-12' },
contact: { phone: '333-333-3333' }
}
]
columns = {
'ID #': {
property: :id,
to_object: :integer
},
'Children ID #s': {
property: :children,
to_csv: { type: :pluck_join, separator: ';', sub_property: :id },
to_object: { type: :pluck_split, separator: ';', sub_property: :id },
}
}
Generating a CSV:
csv = Bumblebee::Template.new(columns: columns).generate(objects)
would output:
| ID # | Children ID #s |
|---|---|
| 1 | 9;10 |
| 2 | 11;12 |
Parsing a CSV:
objects = Bumblebee::Template.new(columns: columns).parse(csv)
would output:
objects = [
{
id: 1,
children: [ { id: 9 }, { id: 10 } ]
},
{
id: 2,
children: [ { id: 11 }, { id: 12 } ]
},
{
id: 3
}
]
Hash is the default return type when parsing a CSV. You can change this by providing a Hash-like class:
objects = Bumblebee::Template.new(columns: columns, object_class: OpenStruct).parse(csv)
Objects will now be an array of OpenStruct objects instead of Hash objects.
Note: you must also specify this in pluck_split:
columns = {
'ID #': {
property: :id,
to_object: :integer
},
'Children ID #s': {
property: :children,
to_csv: { type: :pluck_join, separator: ';', sub_property: :id },
to_object: { type: :pluck_split, separator: ';', sub_property: :id, object_class: OpenStruct },
}
}
The two main methods:
- Template#generate
- Template#parse
also accept custom options that Ruby's CSV::new accepts. The only caveat is that Bumblebee needs headers for its mapping, so it overrides the header options.
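As a reminder of what such options look like, the sketch below uses Ruby's standard CSV library directly with `col_sep`, one of the options `CSV.new` accepts. It only demonstrates the stdlib behavior; how Bumblebee forwards the option is not shown here.

```ruby
require 'csv'

# Hypothetical semicolon-separated input.
data = "ID #;First Name\n1;Matt\n2;Nick\n"

# Parsing with a custom column separator and header row.
rows = CSV.parse(data, col_sep: ';', headers: true).map(&:to_h)

# Generating with the same separator.
generated = CSV.generate(col_sep: ';') do |csv|
  csv << ['ID #', 'First Name']
  csv << [1, 'Matt']
end
```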
You can pass in a block for template/column specification if you prefer a code-first approach over a configuration-first approach.
csv = Bumblebee::Template.new do |t|
t.column 'ID #', property: :id,
to_object: :integer
t.column 'First Name', property: :first,
through: :name
end.generate(objects)
objects = Bumblebee::Template.new do |t|
t.column 'ID #', property: :id,
to_object: :integer
t.column 'First Name', property: :first,
through: :name
end.parse(data)
Another option is to subclass Template and declare your columns at the class-level:
class PersonTemplate < Bumblebee::Template
column 'ID #', property: :id,
to_object: :integer
column 'First Name', property: :first,
through: :name,
to_object: :pluck_split
end
template = PersonTemplate.new
csv = template.generate(objects)
objects = template.parse(data)
The preceding examples showed three ways to declare columns, and each is additive to the next (in the following order):
- Class level (parent-first)
- Argument level (passed into constructor)
- Block level
To illustrate all three:
class PersonTemplate < Bumblebee::Template # first
column 'ID #', property: :id,
to_object: :integer
column 'First Name', property: :first,
through: :name,
to_object: :pluck_split
end
columns = {
'Middle Name': {
property: :middle
}
}
template = PersonTemplate.new(columns: columns) do |t| # second
t.column 'Last Name', property: :last # third
end
When executed to generate a CSV, the columns would be (in order): ID #, First Name, Middle Name, Last Name.
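The additive ordering can be modeled with a small stand-alone class. `MiniTemplate` below is a hypothetical miniature that tracks headers only; it is a sketch of the ordering rule, not Bumblebee's source.

```ruby
# Hypothetical miniature of the declaration order described above.
class MiniTemplate
  class << self
    # Parent-first: inherited columns come before this class's own.
    def columns
      parent = superclass.respond_to?(:columns) ? superclass.columns : []
      parent + own_columns
    end

    # Class-level declaration.
    def column(header)
      own_columns << header
    end

    private

    def own_columns
      @own_columns ||= []
    end
  end

  attr_reader :headers

  def initialize(columns: [])
    # Class level first, then argument level.
    @headers = self.class.columns + columns
    # Block level last.
    yield self if block_given?
  end

  # Block-level declaration.
  def column(header)
    @headers << header
  end
end

class PersonMiniTemplate < MiniTemplate
  column 'ID #'
  column 'First Name'
end

template = PersonMiniTemplate.new(columns: ['Middle Name']) do |t|
  t.column 'Last Name'
end

headers = template.headers
```

Each instance builds its own header array, so block-level columns added to one template do not leak into the class or into other instances.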
This library currently supports only UTF-8. You can choose to force the inclusion of the UTF-8 byte order mark, for example:
csv = Bumblebee::Template.new(columns: columns).generate(objects, bom: true)
# csv will now start with "\xEF\xBB\xBF"
UTF-8 byte order marks will also be ignored while parsing.
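For readers unfamiliar with byte order marks, the following sketch shows what adding and ignoring one amounts to on a plain string. The `with_bom` and `without_bom` helpers are hypothetical and are not part of Bumblebee's API.

```ruby
# The UTF-8 byte order mark: the three bytes EF BB BF (U+FEFF).
BOM = "\xEF\xBB\xBF".freeze

# Prepend the mark unless it is already present.
def with_bom(csv)
  csv.start_with?(BOM) ? csv : BOM + csv
end

# Drop a leading mark, leaving other content untouched.
def without_bom(csv)
  csv.delete_prefix(BOM)
end

marked = with_bom("ID #,First Name\n1,Matt\n")
plain = without_bom(marked)
```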
Basic steps to take to get this repository compiling:
- Install Ruby (check bumblebee.gemspec for versions supported)
- Install bundler (gem install bundler)
- Clone the repository (git clone git@github.com:bluemarblepayroll/bumblebee.git)
- Navigate to the root folder (cd bumblebee)
- Install dependencies (bundle)
To execute the test suite run:
bundle exec rspec spec --format documentation
Alternatively, you can have Guard watch for changes:
bundle exec guard
Also, do not forget to run Rubocop:
bundle exec rubocop
Note: ensure you have proper authorization before trying to publish new versions.
After code changes have successfully gone through the Pull Request review process, follow these steps to publish a new version:
- Merge Pull Request into master
- Update lib/bumblebee/version.rb using semantic versioning
- Install dependencies: bundle
- Update CHANGELOG.md with release notes
- Commit & push master to remote and ensure CI builds master successfully
- Run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Everyone interacting in this codebase, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
This project is MIT Licensed.