A data frame holds three things: its index, its columns, and its values. Most selection operations do not change the table's structure; they only take the selected pieces out. Other operations, such as group-by, do change the structure. So how do we change the structure back, or reshape it on purpose? Excel has the concept of a pivot table, which turns one or more columns into the row index or the column headers and presents the data neatly. It is a very handy feature and a fast way to get analytic insights. Does Pandas support this? The answer is "for sure!". Here are a few functions that are very often used in Pandas to manipulate the "shape" of a data frame.

1. reset_index / set_index

These are fairly self-explanatory: reset_index moves the index back into a regular column, while set_index moves a column into the index.

2. pivot and pivot_table

It always gets confusing (to me) how to build a pivot table in Pandas, but Nikolay Grozev's blog provides a very intuitive visualization. I will use one of his figures here for easy illustration, and I encourage you to visit his blog for more details. As you can see, pivot_table can be considered an "advanced" pivot: it builds the table with more control, for example over which aggregation function to use, while pivot provides a faster way to simply "reshape" the data frame into the form needed. In practice, pivot only rearranges values and raises an error if an index/column pair appears more than once, whereas pivot_table aggregates such duplicates (using the mean by default).

3. stack and unstack

Nikolay Grozev's blog illustrates this section very well too. I borrow one figure here, and I highly recommend checking the details in the original blog. In short, stack moves the innermost column level into the index (making the table longer), while unstack moves the innermost index level into the columns (making it wider). Let's see how it works on the Titanic data set; this is what it looks like:

Now, changing a data frame's shape should not be a problem any more.
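A minimal sketch of the reset_index / set_index round trip described above, assuming a tiny hand-made frame; the column names passenger_id, pclass and fare are illustrative only and the rows are made up, not taken from the real Titanic data.

```python
import pandas as pd

# A tiny hand-made frame with Titanic-style column names (made-up rows, not the real data set)
df = pd.DataFrame({
    "passenger_id": [101, 102, 103],
    "pclass": [1, 3, 2],
    "fare": [80.0, 7.5, 20.0],
})

# set_index: the "passenger_id" column becomes the row index
indexed = df.set_index("passenger_id")
print(indexed)

# reset_index: the index is moved back into an ordinary column (placed first)
# and a default 0..n-1 integer index is created in its place
restored = indexed.reset_index()
print(restored)
```

Because reset_index puts the former index back as the first column, the round trip here returns a frame equal to the original one.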
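The difference between pivot and pivot_table shows up as soon as index/column pairs repeat. The sketch below uses two small made-up frames (the numbers are invented for illustration, not Titanic statistics): pivot reshapes a frame whose (sex, pclass) pairs are unique, while pivot_table aggregates a frame where they repeat.

```python
import pandas as pd

# Made-up long-format rows; each (sex, pclass) pair appears exactly once
long_df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "pclass": [1, 3, 1, 3],
    "survival_rate": [0.4, 0.1, 0.9, 0.5],
})

# pivot: a pure reshape, sex becomes the index and pclass values become columns
wide = long_df.pivot(index="sex", columns="pclass", values="survival_rate")
print(wide)

# Made-up per-passenger rows; some (sex, pclass) pairs repeat
passengers = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "female"],
    "pclass": [1, 1, 3, 3, 1],
    "fare": [30.0, 50.0, 8.0, 10.0, 80.0],
})

# pivot would raise ValueError here because of the duplicate pairs;
# pivot_table aggregates them instead ("mean" is also the default aggfunc)
mean_fare = passengers.pivot_table(index="sex", columns="pclass", values="fare", aggfunc="mean")
print(mean_fare)

# The aggregation is configurable, e.g. count the rows behind each cell
counts = passengers.pivot_table(index="sex", columns="pclass", values="fare", aggfunc="count")
print(counts)
```

Cells with no matching rows (male passengers in class 3 here) come out as NaN unless a fill_value is passed to pivot_table.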
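Since group-by is one of the operations that changes a frame's structure, a natural use of unstack and stack is turning a grouped result into a wide table and back again. This sketch assumes a few made-up passenger rows with Titanic-style columns; it does not load the real data set.

```python
import pandas as pd

# Made-up passenger rows with Titanic-style columns (not the real data set)
passengers = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "female"],
    "pclass": [1, 3, 1, 3, 3],
    "survived": [0, 0, 1, 1, 0],
})

# group-by returns a Series indexed by a two-level (sex, pclass) MultiIndex
rate = passengers.groupby(["sex", "pclass"])["survived"].mean()
print(rate)

# unstack: the innermost index level (pclass) moves into the columns -> wide table
wide = rate.unstack()
print(wide)

# stack: the column level moves back into the index -> long Series again
long_again = wide.stack()
print(long_again)
```

Depending on the pandas version, DataFrame.stack may print a FutureWarning about a newer stack implementation; for a table without missing values like this one, the reshaped result is the same.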
Author: Data Magician
Archives: October 2017