Magic Analytics

Aries Research Note

Pandas: data frame sub-selection

10/1/2016


 
Selecting the right block of data from a data frame used to be inelegant: one had to juggle different functions such as query, selection, loc, and iloc, or directly use a boolean mask like "df['x'] > df['y']", which is not chainable because the mask has to refer to the data frame by its variable name.

Here are a few new features in Pandas 0.18.1 that simplify the flow of data frame selection.

    
However complex the selection logic is, the newly improved .loc, which accepts a callable as of Pandas 0.18.1, keeps things looking neat!
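As a rough illustration of the chainable style, here is a minimal sketch. It is not the original snippet from this post; the data frame, its column names (x, y, label), and the derived column diff are made up for the example. It relies on .loc accepting a callable, which receives the data frame as it exists at that point in the chain.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "x": [1, 5, 3, 8],
    "y": [2, 4, 3, 1],
    "label": ["a", "b", "c", "d"],
})

# Non-chainable style: the mask must refer to the variable `df` by name.
mask_result = df[df["x"] > df["y"]]

# Chainable style: the callable passed to .loc receives the intermediate
# frame, so it can filter on a column created earlier in the same chain.
chained = (
    df
    .assign(diff=lambda d: d["x"] - d["y"])            # derive a new column
    .loc[lambda d: d["diff"] > 0, ["label", "diff"]]   # filter rows, pick columns
    .reset_index(drop=True)
)

print(chained)
```

The point of the callable is that no intermediate variable is needed: the filter on diff works even though diff does not exist on the original df.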

Pandas: data cleaning functions

10/1/2016


 
Data cleaning is a key part of working with Pandas. It usually involves operations such as:
- drop columns
- fill/drop missing data
- drop duplicate rows
- replace data values

Doing this in Pandas is quite straightforward (based on Pandas 0.18.1).

    
Here are the functions used:
drop, dropna, fillna, drop_duplicates, replace

Simple, but powerful  :)
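For reference, here is a minimal sketch of how those five functions can be combined in one chain. It is not the original example from this post; the sample data and the column names (id, city, score, unused) are invented for illustration, and the choice to treat "N/A" as a placeholder value is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data; all names and values are assumptions.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "city": ["NYC", "N/A", "N/A", "LA", None],
    "score": [10.0, np.nan, np.nan, 7.5, 3.0],
    "unused": [0, 0, 0, 0, 0],
})

cleaned = (
    raw
    .drop("unused", axis=1)                  # drop a column
    .replace({"city": {"N/A": "Unknown"}})   # replace a value in one column
    .fillna({"score": 0.0})                  # fill missing scores with 0.0
    .dropna(subset=["city"])                 # drop rows still missing a city
    .drop_duplicates()                       # drop duplicate rows
    .reset_index(drop=True)
)

print(cleaned)
```

Each of these methods returns a new data frame by default, which is what makes the chain possible. The order matters here: filling the missing scores first is what turns the two id 2 rows into exact duplicates that drop_duplicates can then remove.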

    Author

    Data Magician
