Set operations treat a data frame as if it were a set of rows. The common ones are union, intersection, and difference. Pandas and PySpark handle them differently. Because Pandas has the concept of an Index, the way you think about these operations in Pandas is sometimes a little different from traditional set operations. Spark, on the other hand, is so close to SQL that it implements those keywords directly. So in this round of comparison, Spark is more intuitive than Pandas for SQL-style set operations.
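To make the Pandas side concrete, here is a minimal sketch of the three operations. The frames `df1` and `df2` and their columns are made-up sample data, not taken from the comparison above, and this is only one of several ways to do it in Pandas. On the PySpark side, the corresponding DataFrame methods are `union`, `intersect`, and `subtract`; note that `union` there keeps duplicates like SQL's UNION ALL, so a true set union needs a `.distinct()` afterwards.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "c", "d"]})

# Union: stack the rows, drop duplicate rows, and rebuild the index,
# because concat would otherwise carry over the original index labels.
union = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)

# Intersection: an inner merge with no `on` argument joins on all
# columns the two frames share, so only rows present in both survive.
intersection = (
    df1.merge(df2, how="inner")
    .drop_duplicates()
    .reset_index(drop=True)
)

# Difference (df1 minus df2): left merge with an indicator column,
# then keep only the rows that appear in the left frame alone.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)
difference = (
    merged.loc[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(union)
print(intersection)
print(difference)
```

The `reset_index` calls are where the Index shows up: without them the results keep index labels inherited from the inputs, which is the part that has no counterpart in a plain SQL set operation.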
Author: Data Magician