You can use this library in your applications by launching the Spark shell with the `--packages` option:

```
$SPARK_HOME/bin/spark-shell --packages databricks:spark-redshift:0.3
```
If you use the sbt-spark-package plugin, in your sbt build file, add:

```scala
spDependencies += "databricks/spark-redshift:0.3"
```
Otherwise,

```scala
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "databricks" % "spark-redshift" % "0.3"
```
If you use Maven, add the following to your pom.xml:
```xml
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>databricks</groupId>
    <artifactId>spark-redshift</artifactId>
    <version>0.3</version>
  </dependency>
</dependencies>

<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>
```
The library also provides `RedshiftInputFormat`, a Hadoop input format for reading Redshift tables unloaded with the ESCAPE option.
Usage in Spark Core:

```scala
import com.databricks.spark.redshift.RedshiftInputFormat

val records = sc.newAPIHadoopFile(
  path,
  classOf[RedshiftInputFormat],
  classOf[java.lang.Long],
  classOf[Array[String]])
```
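For illustration, here is a minimal sketch of consuming the resulting RDD; the `Record` case class and the two-column (name, age) layout are assumptions for the example, not part of the library:

```scala
// Hypothetical two-column layout: name, age. Adjust to your unloaded table.
case class Record(name: String, age: Int)

// Each element is a (java.lang.Long, Array[String]) pair; drop the key
// and parse the raw column values.
val parsed = records.map { case (_, fields) => Record(fields(0), fields(1).toInt) }
parsed.take(5).foreach(println)
```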
Usage in Spark SQL:

```scala
import org.apache.spark.sql.DataFrame
import com.databricks.spark.redshift._

// Call redshiftFile() with column names only; it returns a DataFrame in which
// every column is typed as string.
val records: DataFrame = sqlContext.redshiftFile(path, Seq("name", "age"))

// Call redshiftFile() with the table schema to get typed columns.
val typedRecords: DataFrame = sqlContext.redshiftFile(path, "name varchar(10) age integer")
```
Some breaking changes were made in version 0.3. Users should make the following changes in their code if they would like to use the 0.3 version:

- Replace the package `com.databricks.examples.redshift.input` with `com.databricks.spark.redshift`
- Replace `SchemaRDD` with `DataFrame`
- Replace `import com.databricks.examples.redshift.input.RedshiftInputFormat._` with `import com.databricks.spark.redshift._`
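For instance, an import written against the pre-0.3 package changes as follows:

```scala
// Before (pre-0.3):
// import com.databricks.examples.redshift.input.RedshiftInputFormat._

// After (0.3):
import com.databricks.spark.redshift._
```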