Spark is the rising star of the big data world. Because it keeps intermediate data in memory, it can run much faster than the traditional disk-based MapReduce approach, and it is gradually becoming a go-to platform for big data analytics. Spark provides APIs in multiple languages: Scala, Java, Python, and R. As a matter of personal preference, I work mainly with PySpark, its Python API.
The key modules in Spark (beyond its traditional RDD data type) are:
1. DataFrame / SQL
2. ML/MLlib (machine learning library)
3. GraphX/GraphFrames (graph analytics)
4. Streaming