As long as the data fits fully in memory, Pandas is a great data analysis tool. With much larger datasets, however, Spark comes into play. Pandas and PySpark DataFrames have different APIs, and it is easy to get confused or to miss the best practice. I want to summarize my best practices here so that others can avoid the detours I took.
Author: Data Magician