Magic Analytics
  • Home
  • Python
    • Pandas
    • Matplotlib
    • Interactive Visualization
    • Folium
  • Spark
    • DataFrame
  • Machine Learning
    • Classification >
      • Logistic Regression
    • Dimension Reduction
    • Model Explaination
  • Blog
  • About

Aries Research Note

Pandas: reshape data frame

10/2/2016

A data frame has an index, columns, and values. A selection operation usually does not affect the table's structure; it only takes the selected pieces out. Other operations, such as group-by, do change the structure. So how do we change the structure back?

Excel has the concept of a pivot table, which converts one or more columns into the index or the column headers and presents the data neatly. It is a handy feature that provides analytic insights very quickly. Does Pandas support this?

The answer is "for sure!". Here are a few functions that are used very often in Pandas to manipulate the "shape" of a data frame.

1. reset_index / set_index
    Very self-explanatory: reset_index changes an index back into a column, while set_index moves a column into the index.
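
    As a minimal sketch of the two functions, assuming a small made-up table (the column names "city", "year", and "sales" are only for illustration), the round trip looks like this. It also shows reset_index flattening a group-by result, which is the "change the structure back" case from the introduction.

```python
import pandas as pd

# Toy data invented for illustration only.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago"],
    "year": [2015, 2015, 2016],
    "sales": [10, 20, 30],
})

# set_index moves the "city" column into the index.
indexed = df.set_index("city")

# reset_index moves the index back into an ordinary column.
restored = indexed.reset_index()

# A group-by result carries the grouping key in its index;
# reset_index turns it back into a flat table.
per_year = df.groupby("year")["sales"].sum().reset_index()
```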

2. pivot and pivot_table
    I always found it confusing how to build a pivot table in Pandas, but Nikolay Grozev's blog provides a very intuitive visualization. I will use one of his figures here for easy illustration, and I encourage you to visit his blog for more details.
Picture
Picture
As you can see, pivot_table can be considered an "advanced" pivot: it gives more control over which aggregation function is used to build the table, so it can handle duplicate index/column pairs, while pivot is a quicker way to simply reshape the data frame into the form needed.
Picture
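The difference can be sketched on a small made-up table (the columns "item", "store", and "price" are assumptions for illustration, not taken from the figures). pivot only reshapes, so it needs each index/column pair to be unique; pivot_table aggregates the duplicates with the chosen function.

```python
import pandas as pd

# Toy data invented for illustration only.
df = pd.DataFrame({
    "item": ["A", "A", "B", "B", "B"],
    "store": ["north", "south", "north", "south", "south"],
    "price": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# pivot only reshapes, so each (index, columns) pair must be unique.
# The first four rows contain no duplicate pairs.
wide = df.iloc[:4].pivot(index="item", columns="store", values="price")

# pivot_table aggregates duplicates; ("B", "south") appears twice here,
# so its cell holds the mean of 4.0 and 6.0.
wide_mean = df.pivot_table(
    index="item", columns="store", values="price", aggfunc="mean"
)
```

Calling pivot on the full table instead of the first four rows raises a ValueError, because the pair ("B", "south") is duplicated.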
3. stack and unstack
   This section is also very well illustrated in Nikolay Grozev's blog. I borrow one figure here, and the reader is highly recommended to check the details in the original blog.
Picture
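In code, a minimal sketch on a made-up table (the "student", "math", and "art" labels are assumptions for illustration) would be:

```python
import pandas as pd

# Toy data invented for illustration only.
df = pd.DataFrame(
    {"math": [90, 80], "art": [70, 60]},
    index=pd.Index(["ann", "bob"], name="student"),
)

# stack moves the column labels into an inner index level,
# producing a long Series indexed by (student, subject).
long = df.stack()

# unstack moves the inner index level back out into columns.
wide = long.unstack()
```

Here stack turns the two-by-two table into a Series with four entries, and unstack restores the wide layout.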
Let's see how it works on the Titanic data set. This is what it looks like:
Picture
Picture
Picture
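The screenshots are not reproduced as text here, so the following is only a sketch of the same idea. It assumes a handful of made-up rows shaped like Titanic passenger data (the column names "sex", "pclass", and "survived" and all values are illustrative, not real records) and combines the functions above.

```python
import pandas as pd

# Made-up rows shaped like Titanic passenger data; NOT real records.
titanic = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 3, 1],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Group-by yields a Series with a two-level index (sex, pclass).
rate = titanic.groupby(["sex", "pclass"])["survived"].mean()

# unstack moves "pclass" into the columns: one row per sex.
by_class = rate.unstack("pclass")

# The same table in one step with pivot_table.
same = titanic.pivot_table(
    index="sex", columns="pclass", values="survived", aggfunc="mean"
)

# reset_index flattens the grouped result back into plain columns.
flat = rate.reset_index()
```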
Now, changing the shape of a data frame should not be a problem anymore.

    Author

    Data Magician
