Hydrosphere Mist

Hydrosphere Mist is a multi-tenant, multi-user Spark server.

Main features:

  • Serverless: resource isolation, sharing and auto-scaling are handled for you.
  • REST HTTP & Messaging (MQTT, Kafka) API for Scala & Python Spark jobs.
  • Compatibility with EMR, Hortonworks, Cloudera, DC/OS and vanilla Spark distributions.
  • Spark MLlib model serving, which has been moved to the spark-ml-serving library and the hydro-serving project.

It implements Spark Compute as a Service and creates a unified API layer for building enterprise solutions and services on top of a big data stack.

Mist use cases

Discover more Hydrosphere Mist use cases.


Getting Started Guide and user documentation


More Features

  • Spark context orchestration (Cluster of Spark Clusters): manages multiple Spark contexts in separate JVMs or Docker containers
  • Realtime, low-latency serving/scoring for MLlib models (Mist Local Serving). Moved to the spark-ml-serving library and the hydro-serving project
  • Clear end-user REST API
    POST v2/api/endpoints/weather-forecast?force=true
    {
        "lat": 37.777114,
        "long": -122.419631,
        "radius": 100
    }
  • Spark 2.1.1 support!
  • Scala and Python Spark jobs support
  • Support for Spark SQL and Hive
  • High Availability and Fault Tolerance
  • Self Healing after driver program failure
  • Powerful logging
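
The REST call shown in the feature list above can be issued from any HTTP client. The following is a minimal Python sketch, not an official client: the helper name `build_job_request` is hypothetical, and the base URL `http://localhost:2004` is an assumption about where a Mist instance might be listening. Only the path, the `force=true` query parameter and the JSON parameters come from the example above.

```python
import json
from urllib import request


def build_job_request(base_url, endpoint, params, force=True):
    """Build (but do not send) a POST request for a Mist endpoint.

    base_url is an assumed address of a running Mist server;
    endpoint and params mirror the weather-forecast example.
    """
    url = f"{base_url}/v2/api/endpoints/{endpoint}"
    if force:
        # force=true is taken from the example request in this README
        url += "?force=true"
    body = json.dumps(params).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_job_request(
    "http://localhost:2004",  # assumption: adjust to your deployment
    "weather-forecast",
    {"lat": 37.777114, "long": -122.419631, "radius": 100},
)

# To send the request against a running Mist instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the script at a real server.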

Version Information

| Mist Version | Scala Version  | Python Version | Spark Version |
|--------------|----------------|----------------|---------------|
| 0.1.4        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.2.0        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.3.0        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.4.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.5.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.6.5        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.7.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.8.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.9.1        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.10.0       | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| master       | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |

Roadmap


  • Persist job state for self healing
  • Super parallel mode: run Spark contexts in separate JVMs
  • Powerful logging
  • RESTification
  • Support streaming contexts/jobs
  • Reactive API
  • Realtime ML models serving/scoring
  • CLI
  • Web Interface
  • Apache Kafka support
  • AWS ECS CloudFormation package
  • AWS EMR CloudFormation package
  • Hortonworks Ambari package
  • Kerberos integration
  • DC/OS package
  • Dynamic auto-configurable Spark settings based on jobs history
  • Bi-directional streaming API
  • Spark Structured Streaming API
  • AMQP support

Contact

Please report bugs/problems to: https://github.com/Hydrospheredata/mist/issues.

http://hydrosphere.io/

LinkedIn

Facebook

Twitter
