Keep the DataFrame Structure after Applying Groupby in Python

Indexing is an important concept in data analysis and programming. One tricky thing for casual users is that in most languages, including Python, indexing starts from 0; R is a notable exception, where it starts from 1. Even though this seems counterintuitive in daily life, it makes sense in computing, where an index is typically the offset of an element from the start of the data, so the first element sits at offset 0.

When we create a simple DataFrame, we can set the index labels directly, for instance df = pd.DataFrame(data=d, index=i); or we can add or change the index later by assigning a list of labels to df.index, e.g. i = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']; df.index = i (the list length must match the number of rows).
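A minimal sketch of both approaches, assuming a small hypothetical dictionary d with ten rows (the column names col and test and all values are made up for illustration). It also shows that a frame without explicit labels gets positions starting at 0:

```python
import pandas as pd

# Hypothetical data: ten rows, two columns
d = {"col": list(range(10)), "test": [x * 10 for x in range(10)]}
i = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

# Option 1: set the index labels when the DataFrame is created
df = pd.DataFrame(data=d, index=i)

# Option 2: create with the default RangeIndex (0..9), then assign labels
df2 = pd.DataFrame(data=d)
print(df2.index.tolist())  # prints [0, 1, 2, ..., 9]: positions start at 0
df2.index = i  # the list length must equal the number of rows

print(df.equals(df2))   # True: same data, same labels
print(df.iloc[0].name)  # 'a': position 0 is the first row
```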

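The index is useful for slicing, segmenting, or locating data. Some commonly used syntaxes are:
df.loc['a':'d']  (label-based; the end label 'd' is included)
df.iloc[0:3]  (position-based; position 3 is excluded)
df.loc[rows, columns] and df.iloc[rows, columns]  (select rows and columns together)
df.iloc[:3][['col', 'test']]  (first three rows of two named columns)
The older df.ix[rows, columns] accessor, which mixed labels and positions, was deprecated in pandas 0.20 and removed in pandas 1.0, so a call such as df.ix[:3, ['col', 'test']] should be rewritten with loc or iloc as above.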
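The sketch below checks these slicing rules on a hypothetical ten-row frame indexed by letters (the column names and values are illustrative only): loc includes the end label, iloc excludes the end position, and the old ix call can be expressed either positionally or purely by label.

```python
import pandas as pd

# Hypothetical ten-row DataFrame indexed by letters 'a'..'j'
df = pd.DataFrame(
    {"col": range(10), "test": range(100, 110), "other": range(200, 210)},
    index=list("abcdefghij"),
)

by_label = df.loc["a":"d"]   # label-based: includes 'd' -> 4 rows
by_position = df.iloc[0:3]   # position-based: stops before 3 -> 3 rows

# Replacement for the removed df.ix[:3, ['col', 'test']]
subset = df.iloc[:3][["col", "test"]]
# Label-only equivalent: take the first 3 index labels explicitly
subset_loc = df.loc[df.index[:3], ["col", "test"]]

print(len(by_label), len(by_position))  # 4 3
print(subset.equals(subset_loc))        # True
```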

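The index also matters a great deal with groupby, which changes the structure of the result. Depending on what we want, we can set as_index=True (the default, which turns the group keys into the index of the result) or as_index=False (which keeps them as ordinary columns), or append reset_index() at the end. For reference, the main parameters of Series.reset_index are: Series.reset_index(level=None, drop=False, name=None, inplace=False). With the default drop=False, the index is inserted back into the data as a column (so a Series becomes a DataFrame); drop=True discards the index instead.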
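To make the effect of as_index and reset_index concrete, here is a sketch on a small hypothetical sales table (the column names store and sales and the values are illustrative only):

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [10, 20, 5, 5, 30],
})

# Default as_index=True: the group keys become the index of the result
s = sales.groupby("store")["sales"].sum()
print(s.index.tolist())        # ['A', 'B']

# as_index=False: the group keys stay as an ordinary column
flat = sales.groupby("store", as_index=False)["sales"].sum()
print(flat.columns.tolist())   # ['store', 'sales']

# reset_index() with the default drop=False moves the index back into a column
print(s.reset_index().columns.tolist())  # ['store', 'sales']

# drop=True discards the index, so the store labels are lost
print(s.reset_index(drop=True).tolist())  # [30, 40]
```

Both routes end with the group keys as a column; as_index=False simply skips the intermediate step.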

In daily work, we frequently use groupby to aggregate data values, but the aggregation collapses the DataFrame to one row per group, and columns other than the grouping keys and the aggregated ones are removed. Often we need to keep these columns or the original structure, and sometimes we want to add extra columns computed with groupby. The following two code samples address these needs:

1. When applying groupby, add a column to the original DataFrame

dfsales1['L6_TOTAL'] = dfsales1.groupby(['DATE', 'L6_ID'])['CL6_SALES'].transform('sum')
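Here is a sketch of what transform('sum') does, on a tiny made-up dfsales1 (only the column names come from the line above; the values are hypothetical). Unlike .sum(), which collapses the frame, transform returns one value per original row, aligned on the original index, so it can be assigned straight back as a new column:

```python
import pandas as pd

# Hypothetical data using the column names from the example above
dfsales1 = pd.DataFrame({
    "DATE": ["2018-01-01"] * 3 + ["2018-01-02"] * 2,
    "L6_ID": [1, 1, 2, 1, 1],
    "CL6_SALES": [100, 50, 70, 20, 30],
})

# Aggregation collapses the frame to one row per (DATE, L6_ID) group
collapsed = dfsales1.groupby(["DATE", "L6_ID"])["CL6_SALES"].sum()
print(len(collapsed))  # 3

# transform broadcasts each group total back to every row of that group
dfsales1["L6_TOTAL"] = (
    dfsales1.groupby(["DATE", "L6_ID"])["CL6_SALES"].transform("sum")
)
print(dfsales1["L6_TOTAL"].tolist())  # [150, 150, 70, 50, 50]
```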

2. When applying groupby, keep the full structure of a DataFrame by applying a function (here, keeping every column of the 5 rows with the largest p_market_val_sec on each date)

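def lheap(df):
    # keep the 5 rows with the largest p_market_val_sec in this group, all columns intact
    dftemp = df.nlargest(5, ['p_market_val_sec'])
    return dftemp

test = df.groupby('date', as_index=False).apply(lheap)
# reset_index returns a new frame, so assign it; drop=True discards the group/row index levels
test = test.reset_index(drop=True)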
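For the specific task in lheap (the top rows per date), one alternative that avoids apply is to sort once and keep the first n rows of each group with groupby(...).head(n), which returns the original rows with all their columns. This is a swapped-in technique rather than the method above; the helper name top_n_per_date, the extra column ticker and all values are hypothetical, and only date and p_market_val_sec come from the example. Ties at the cut-off may be broken differently than nlargest breaks them.

```python
import pandas as pd

def top_n_per_date(df, n=5, col="p_market_val_sec"):
    """Hypothetical helper: keep the n rows with the largest `col` per date.

    Sorts once by `col` (descending), takes the first n rows of each date,
    then orders the result by date and value and resets the index.
    """
    return (
        df.sort_values(col, ascending=False)
        .groupby("date")
        .head(n)
        .sort_values(["date", col], ascending=[True, False])
        .reset_index(drop=True)
    )

# Hypothetical data: 2 dates with 3 securities each; keep the top 2
df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "ticker": ["x", "y", "z", "x", "y", "z"],
    "p_market_val_sec": [3.0, 9.0, 6.0, 4.0, 1.0, 8.0],
})

top = top_n_per_date(df, n=2)
print(top["ticker"].tolist())  # ['y', 'z', 'z', 'x']
```

Because head keeps whole rows, every column survives, which is the same structure-preserving goal as the apply version.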

Note that groupby alters the index of its result, so in real problems it is important to be aware of this and set as_index=True or as_index=False (or call reset_index()) accordingly.
