This gem copies data from Postgres to Redshift. It's especially useful for copying data from Postgres to Redshift on Heroku.
Add this line to your application's Gemfile:

    gem 'postgres_to_redshift'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install postgres_to_redshift
Set your source and target databases, as well as your S3 intermediary.

    export POSTGRES_TO_REDSHIFT_SOURCE_URI='postgres://username:password@host:port/database-name'
    export POSTGRES_TO_REDSHIFT_TARGET_URI='postgres://username:password@host:port/database-name'
    export POSTGRES_TO_REDSHIFT_TARGET_SCHEMA='testing-data'
    export S3_DATABASE_EXPORT_ID='yourid'
    export S3_DATABASE_EXPORT_KEY='yourkey'
    export S3_DATABASE_EXPORT_BUCKET='some-bucket-to-use'
    export REDSHIFT_INCLUDE_TABLES='table-pattern-to-include1,table-pattern-to-include2'
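These environment variables are the gem's only configuration. As a rough sketch of how they map onto connection settings (illustrative only; the local variable names below are not part of the gem's API):

```ruby
# Illustrative sketch: how the variables above might be consumed.
# The variable names here are hypothetical, not the gem's internals.
source_uri    = ENV.fetch('POSTGRES_TO_REDSHIFT_SOURCE_URI')
target_uri    = ENV.fetch('POSTGRES_TO_REDSHIFT_TARGET_URI')
target_schema = ENV['POSTGRES_TO_REDSHIFT_TARGET_SCHEMA']

s3_export = {
  access_key_id:     ENV.fetch('S3_DATABASE_EXPORT_ID'),
  secret_access_key: ENV.fetch('S3_DATABASE_EXPORT_KEY'),
  bucket:            ENV.fetch('S3_DATABASE_EXPORT_BUCKET')
}

# Comma-separated list of table name patterns to include in the copy.
include_tables = ENV['REDSHIFT_INCLUDE_TABLES'].to_s.split(',')
```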
Then run the import:

    postgres_to_redshift

It is possible to run an import that picks up only records that have been updated since the last run of the import. It has the following caveats:
- Does not apply deletions to the target table
- Requires that the source table has either an `updated_at` or `created_at` field
Should you wish to enable incremental mode, set the following ENV:

    export POSTGRES_TO_REDSHIFT_INCREMENTAL=true

It will record the start time of the last import in a local file and, on subsequent imports, will pick up changes made on or after that start time.
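Conceptually, the incremental bookkeeping works along these lines. This is an illustrative sketch only, not the gem's code: the marker file name, the `orders` table, and the SQL are assumptions made for the example.

```ruby
require 'pg'
require 'time'

# Hypothetical marker file used to remember when the previous import started.
MARKER_FILE = '.postgres_to_redshift_last_run'.freeze

last_run       = File.exist?(MARKER_FILE) ? Time.parse(File.read(MARKER_FILE)) : nil
run_started_at = Time.now.utc

source = PG.connect(ENV.fetch('POSTGRES_TO_REDSHIFT_SOURCE_URI'))

rows =
  if last_run
    # Pick up only rows touched on or after the previous run's start time.
    source.exec_params('SELECT * FROM orders WHERE updated_at >= $1', [last_run.iso8601])
  else
    # First run: no marker yet, so copy everything.
    source.exec('SELECT * FROM orders')
  end

# ... export `rows` to S3 and COPY them into Redshift here ...

# Record this run's start time for the next incremental import.
File.write(MARKER_FILE, run_started_at.iso8601)
```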
For tables that cannot be imported incrementally, list them in this variable and they will always run in "full" mode:

    export TABLES_ALWAYS_FULL_IMPORT='table-that-has-no-primary-id,table-that-has-no-updated_at'

It is possible to run the import in dry run mode, whereby the import runs but rolls back any changes to the target tables:

    export POSTGRES_TO_REDSHIFT_DRY_RUN=true

If an error is encountered during an import, it will be handled as follows:
- Incremental imports: The import will be rolled back and retried from the beginning
- Full imports: The import will be rolled back to the previous table and the current table's import will be retried
An import will be attempted three times before giving up and raising the exception to the caller.
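A sketch of that retry behaviour (illustrative only; `import_table` is a hypothetical stand-in for the work being retried, not part of the gem's API):

```ruby
MAX_ATTEMPTS = 3

# Retry a block up to MAX_ATTEMPTS times, re-raising the last error if it
# still fails. Rolling back the failed work is assumed to happen inside the
# block's own transaction handling before the retry.
def with_retries(max_attempts: MAX_ATTEMPTS)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => error
    retry if attempts < max_attempts
    raise error
  end
end

# Hypothetical usage: retry a single table's import up to three times.
# with_retries { import_table('orders') }
```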
For an incremental import, the entire import is performed in a single database transaction so that the data remains in a consistent state while the import is running; the assumption is that an incremental import runs during business hours and moves a relatively small amount of data. For a full import, each table is imported in its own transaction; the assumption is that a full import runs outside of business hours and moves too large a volume of data to be handled in a single transaction.
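The difference in transaction scoping, and the dry-run rollback described earlier, can be pictured roughly like this. Again, a sketch under stated assumptions: `copy_table` is a hypothetical helper and the table list is made up; this is not the gem's internal code.

```ruby
require 'pg'

# Hypothetical helper standing in for the real copy work (S3 export + COPY).
def copy_table(connection, table)
  # ... export `table` from the source and COPY it into Redshift via S3 ...
end

target  = PG.connect(ENV.fetch('POSTGRES_TO_REDSHIFT_TARGET_URI'))
dry_run = ENV['POSTGRES_TO_REDSHIFT_DRY_RUN'] == 'true'
tables  = %w[orders customers] # hypothetical table list

if ENV['POSTGRES_TO_REDSHIFT_INCREMENTAL'] == 'true'
  # Incremental: one transaction around the entire import.
  target.exec('BEGIN')
  tables.each { |table| copy_table(target, table) }
  target.exec(dry_run ? 'ROLLBACK' : 'COMMIT')
else
  # Full: each table gets its own transaction.
  tables.each do |table|
    target.exec('BEGIN')
    copy_table(target, table)
    target.exec(dry_run ? 'ROLLBACK' : 'COMMIT')
  end
end
```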
- Fork it ( https://github.com/kitchensurfing/postgres_to_redshift/fork )
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Add some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create a new Pull Request