Magic Analytics
  • Home
  • Python
    • Pandas
    • Matplotlib
    • Interactive Visualization
    • Folium
  • Spark
    • DataFrame
  • Machine Learning
    • Classification
      • Logistic Regression
    • Dimension Reduction
    • Model Explanation
  • Blog
  • About

Big Data Analytics Platform

Spark is the emerging superstar of the big data world. Its in-memory processing can be much faster than the traditional disk-based MapReduce approach, especially for iterative workloads, because intermediate results stay in memory instead of being written to disk between steps. It is gradually becoming the go-to platform for big data analytics. Spark provides APIs in Scala, Java, Python, and R. As a personal preference, I work mainly with PySpark, its Python API.

The key modules in Spark (beyond its core RDD, or Resilient Distributed Dataset, abstraction) are:
1. DataFrame / SQL (Spark SQL)
2. ML / MLlib (machine learning library)
3. GraphX / GraphFrames (graph analytics)
4. Streaming (Spark Streaming and Structured Streaming)
