Magic Analytics

Aries Research Note

PySpark vs. Pandas (Part 4: set-related operations)

10/24/2016


 
"Set"-related operations treat a data frame as if it were a set of rows. The common set operations are union, intersection, and difference. Pandas and PySpark handle them in different ways.

Because Pandas has the concept of an Index, the Pandas way of thinking is sometimes a little different from the traditional set operations.
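As a rough illustration of the Pandas side, here is a minimal sketch of the three operations built from `pd.concat` and `DataFrame.merge`. The frames `df1` and `df2` and their contents are made up for this example, and this is only one of several ways to get set semantics in Pandas.

```python
import pandas as pd

# Two small example frames with the same columns (illustrative data only)
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "c", "d"]})

# Union: stack the rows, drop duplicate rows, and rebuild the index
union = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)

# Intersection: an inner merge with no `on` argument joins on all shared columns
intersect = df1.merge(df2, how="inner").drop_duplicates().reset_index(drop=True)

# Difference (rows in df1 but not in df2): left merge with an indicator column,
# then keep only the rows that were found on the left side alone
marked = df1.merge(df2.drop_duplicates(), how="left", indicator=True)
difference = (
    marked.loc[marked["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(union)
print(intersect)
print(difference)
```

Note the `reset_index` and `ignore_index` calls: the Index is carried along by default, so it has to be handled explicitly to get plain row-set behaviour.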

    
For Spark, it is quite easy: because Spark is so close to SQL, it implements these operations directly as DataFrame methods.
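A minimal sketch of the Spark side, assuming Spark 2.0 or later (where `DataFrame.union` is available) and two DataFrames with the same schema. The helper name `spark_set_ops`, the `demo` function, and the sample rows are hypothetical and exist only for this illustration.

```python
def spark_set_ops(df1, df2):
    """Return the union, intersection, and difference of two Spark DataFrames."""
    return {
        # union() keeps duplicate rows (like SQL UNION ALL), so distinct() is
        # added to get set semantics
        "union": df1.union(df2).distinct(),
        # intersect() keeps the distinct rows present in both DataFrames
        "intersect": df1.intersect(df2),
        # subtract() keeps the distinct rows of df1 that are not in df2
        "difference": df1.subtract(df2),
    }


def demo():
    # Imported here so the helper above does not require Spark at import time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("set-ops").getOrCreate()
    df1 = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "name"])
    df2 = spark.createDataFrame([(2, "b"), (3, "c"), (4, "d")], ["id", "name"])
    for name, result in spark_set_ops(df1, df2).items():
        print(name)
        result.show()
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed prints the three results. Each operation is a single method call, with no index to manage.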

    
So in this round of comparison, Spark handles SQL-style set operations more intuitively than Pandas.

    Author

    Data Magician
