- This project implements the CP-ALS algorithm on Spark. It also implements the related vector, matrix, and tensor operations on Spark.
- The aim of this project is to use Spark RDDs to achieve fast iteration while performing the decomposition.
- This project is also an exploration of type-safe tensor operations, inspired by TensorSafe.
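
For orientation, the CP model and the ALS update that the name CP-ALS refers to can be written as below. The notation follows the Kolda and Bader survey listed in the references; it is not taken from this project's code, so the symbols are an assumption about naming only. An N-way tensor is approximated by a sum of R rank-one terms:

$$
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r
$$

Each ALS sweep fixes all factor matrices except one and solves a linear least-squares problem for the remaining one, where $\odot$ is the Khatri-Rao product, $\ast$ is the element-wise product, and $\mathbf{X}_{(n)}$ is the mode-n unfolding:

$$
\mathbf{A}^{(n)} \leftarrow \mathbf{X}_{(n)} \left( \mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)} \right) \mathbf{V}^{\dagger},
\qquad
\mathbf{V} = \mathop{\ast}_{k \neq n} \mathbf{A}^{(k)\top} \mathbf{A}^{(k)}
$$

The columns of $\mathbf{A}^{(n)}$ are then normalized and their norms are stored in the vector $\lambda$.
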
- sbt 0.13.15
- Scala 2.11.11
- Spark 2.1.1
# package without dependencies
sbt package

It will produce the package target/scala-2.11/paraten_2.11-1.0.jar.

# package with all dependencies
sbt assembly

You will get a large package, target/scala-2.11/ParaTen-assembly-1.0.jar.

CP decomposition on Spark
Usage: ParaCP [options]
-s, --shape <value> shape
-r, --rank <value> rank
--maxIter <value> number of iterations of ALS. default: 500
--tol <value> tolerance for the ALS. default: 0.001
--tries <value> tries
-o, --output-dir <dir> output write path.
-i, --input <value> path of input file.

The options -s, --shape, -r, --rank and -i, --input are required.

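
The sketch below illustrates one plausible way `--maxIter`, `--tol` and `--tries` interact. It is not ParaCP's code: it assumes `--tries` means independent random restarts, that `--tol` is a threshold on the change in fit between sweeps, and it restricts itself to a rank-1 model of a 3-way tensor so that it fits in plain Python. The names `rank1_sweep`, `fit` and `cp_rank1` are hypothetical.

```python
import math
import random


def rank1_sweep(X, a, b, c):
    """One ALS sweep for the rank-1 model X[i][j][k] ~ a[i] * b[j] * c[k].

    Assumes none of the vectors is all zeros (otherwise the division fails).
    """
    I, J, K = len(a), len(b), len(c)
    sq = lambda v: sum(x * x for x in v)
    # Update each vector in turn while the other two are held fixed.
    a = [sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
         / (sq(b) * sq(c)) for i in range(I)]
    b = [sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
         / (sq(a) * sq(c)) for j in range(J)]
    c = [sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
         / (sq(a) * sq(b)) for k in range(K)]
    return a, b, c


def fit(X, a, b, c):
    """1 - ||X - model|| / ||X||, so 1.0 means an exact reconstruction."""
    err = sum((X[i][j][k] - a[i] * b[j] * c[k]) ** 2
              for i in range(len(a))
              for j in range(len(b))
              for k in range(len(c)))
    norm = sum(x * x for plane in X for row in plane for x in row)
    return 1.0 - math.sqrt(err / norm)


def cp_rank1(X, max_iter=500, tol=1e-3, tries=1, seed=0):
    """Run `tries` random restarts and keep the one with the best fit."""
    rng = random.Random(seed)
    shape = (len(X), len(X[0]), len(X[0][0]))
    best = None
    for _ in range(tries):
        a, b, c = ([rng.uniform(0.5, 1.5) for _ in range(n)] for n in shape)
        prev = fit(X, a, b, c)
        iters = 0
        for iters in range(1, max_iter + 1):
            a, b, c = rank1_sweep(X, a, b, c)
            cur = fit(X, a, b, c)
            done = abs(cur - prev) < tol  # assumed meaning of --tol
            prev = cur
            if done:
                break
        if best is None or prev > best[0]:
            best = (prev, iters, (a, b, c))
    return best


# Build an exactly rank-1 tensor of shape 2 x 2 x 3 and recover it.
u, v, w = [1.0, 2.0], [3.0, 1.0], [1.0, 0.5, 2.0]
X = [[[ui * vj * wk for wk in w] for vj in v] for ui in u]
best_fit, iters, _ = cp_rank1(X, max_iter=30, tol=0.1, tries=2)
print(best_fit, iters)
```

A loose tolerance such as 0.1 stops the loop as soon as the fit stops changing by more than that amount, which is why it pairs naturally with a small iteration cap in a quick test run.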
Use spark-submit to run ParaCP on a cluster:
spark-submit \
--class org.chaomai.paraten.apps.ParaCP \
--master spark://Chaos-MacBook-Pro.local:7077 \
--total-executor-cores=4 \
--executor-memory=2g \
target/scala-2.11/ParaTen-assembly-1.0.jar \
-s 2,2,3,2 -r 5 \
-i hdfs://localhost:9000/user/chaomai/paraten/data/test_dim4_dense.tensor \
-o hdfs://localhost:9000/user/chaomai/paraten/result \
--maxIter 30 --tol 0.1 --tries 2

You can also use ParaCP_cluster_deploy_mode.sh or ParaCP_client_deploy_mode.sh to run it. Both accept the same parameters:

[dim_1,..,dim_N (tensor)] [rank] [tensor file path] [output path] \
[max iteration] [tolerance] [tries] [master of Spark] \
[total-executor-cores] [executor-memory]
The first script submits with --deploy-mode cluster.
./src/main/scala/org/chaomai/paraten/apps/ParaCP_cluster_deploy_mode.sh \
2,2,3,2 5 \
hdfs://localhost:9000/user/chaomai/paraten/data/test_dim4_dense.tensor \
hdfs://localhost:9000/user/chaomai/paraten/result \
30 0.1 2 \
spark://Chaos-MacBook-Pro.local:6066 \
4 2g

- original image
- image reconstructed from the factor matrices

Set the rank to 1000 and iterate until convergence. Then pick the top k rank-one components according to the lambda vector and reconstruct the tensor, i.e. the image.

| top k rank | norm | image |
|---|---|---|
| 200 | ||
| 400 | ||
| 600 | ||
| 800 | ||
| 1000 | ||
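
The top-k selection described above can be sketched as follows. This is a minimal illustration, not the project's Spark code: it assumes a 3-way tensor, plain nested lists for the factor matrices, and that "top k" means the k largest entries of the lambda vector by absolute value. The function names `top_k_components` and `reconstruct` are hypothetical.

```python
def top_k_components(lambdas, k):
    """Return the indices of the k components with the largest |lambda|."""
    order = sorted(range(len(lambdas)),
                   key=lambda r: abs(lambdas[r]),
                   reverse=True)
    return order[:k]


def reconstruct(lambdas, factors, keep):
    """Sum lambda_r * a_r (outer) b_r (outer) c_r over the kept components.

    factors is (A, B, C); each matrix is a list of rows, so A[i][r] is
    row i, column r of the mode-1 factor matrix.
    """
    A, B, C = factors
    return [[[sum(lambdas[r] * A[i][r] * B[j][r] * C[k][r] for r in keep)
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Toy example: three components, keep the two with the largest weight.
lambdas = [0.5, 4.0, 2.0]
A = [[1, 0, 1], [0, 1, 1]]
B = [[1, 1, 0], [0, 1, 1]]
C = [[1, 1, 1], [0, 1, 2]]

keep = top_k_components(lambdas, 2)
X_hat = reconstruct(lambdas, (A, B, C), keep)
print(keep, X_hat[1][1][1])
```

Dropping the components with the smallest weights gives a cheaper approximation, which is the trade-off the table above explores as k grows from 200 to 1000.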
- An Investigation of Sparse Tensor Formats for Tensor Libraries. Parker Allen Tew. 2015.
- HaTen2: Billion-scale Tensor Decompositions. Inah Jeon, Evangelos E. Papalexakis, U Kang, Christos Faloutsos. 31st IEEE International Conference on Data Engineering (ICDE) 2015, Seoul, Korea.
- Scalable Tensor Decompositions for Multi-aspect Data Mining. Tamara G. Kolda, Jimeng Sun. Data Mining, 2008. ICDM '08.
- Tensor Decompositions and Applications. Tamara G. Kolda, Brett W. Bader. SIAM Review Volume 51 Issue 3, August 2009.
