Indexing is an important concept in data analysis and programming. One tricky point for non-programmers is that in most programming languages, including Python, indexing starts at 0. R is a notable exception that starts at 1. Even though this seems counterintuitive in daily life, it makes sense in computing, where an index is typically the offset of an element from the start of a sequence, so the first element sits at offset 0.
When we create a simple DataFrame, we can assign the index labels directly, for instance df = pd.DataFrame(data=d, index=i). Alternatively, we can add or change the index later with a single assignment. The list must contain one label per row:
i = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df.index = i
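The two ways of setting the index can be compared in a minimal sketch. It assumes hypothetical toy data: a single column x with 10 rows, so that the 10 labels in i fit.

```python
import pandas as pd

# Hypothetical toy data: one column with 10 rows
d = {'x': range(10)}
i = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Option 1: set the index labels when constructing the DataFrame
df1 = pd.DataFrame(data=d, index=i)

# Option 2: construct with the default 0-based RangeIndex, then replace it
df2 = pd.DataFrame(data=d)
print(df2.index)        # RangeIndex(start=0, stop=10, step=1)
df2.index = i           # a list of the wrong length would raise ValueError

print(df1.equals(df2))  # True
```

Without an explicit index, pandas falls back to the 0-based positions 0 to 9. This matches the zero-based convention described at the start.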
The index is useful for slicing, subsetting and locating data. Some commonly used forms are:
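df.loc['a':'d']   # label-based: includes both 'a' and 'd'
df.iloc[0:3]      # position-based: rows 0, 1 and 2 (the end position is excluded)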
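The difference between the two slices is easy to miss. loc includes the end label, while iloc excludes the end position, as in ordinary Python slicing. The following sketch shows this on a hypothetical toy DataFrame labelled 'a' to 'j':

```python
import pandas as pd

# Hypothetical toy DataFrame labelled 'a'..'j'
df = pd.DataFrame({'x': range(10)}, index=list('abcdefghij'))

by_label = df.loc['a':'d']    # label slice: the end label 'd' is included
by_position = df.iloc[0:3]    # position slice: the end position 3 is excluded

print(by_label.index.tolist())     # ['a', 'b', 'c', 'd']
print(by_position.index.tolist())  # ['a', 'b', 'c']

# Positions are 0-based: position 0 is the first row, labelled 'a'
print(df.iloc[0].name)             # a
```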
The index is also used extensively with groupby, which changes the structure of the result. By default (as_index=True), the group keys become the index of the output. Depending on what we need, we can set as_index=False to keep the keys as ordinary columns, or we can append reset_index() at the end. The main parameters of Series.reset_index are level, drop, name and inplace. With drop=False (the default), the index is inserted into the result as a DataFrame column, whereas drop=True simply discards it.
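Here is a minimal sketch of the three options, plus the effect of drop. It uses hypothetical toy data with columns store and sales:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({'store': ['A', 'A', 'B'], 'sales': [1, 2, 5]})

# Default as_index=True: the group key 'store' becomes the index
s = df.groupby('store')['sales'].sum()
print(s.index.tolist())            # ['A', 'B']

# as_index=False: 'store' stays an ordinary column and the index is 0..n-1
flat = df.groupby('store', as_index=False)['sales'].sum()
print(flat.columns.tolist())       # ['store', 'sales']

# reset_index() with drop=False (the default) moves the index into a column
restored = s.reset_index()
print(restored.columns.tolist())   # ['store', 'sales']

# drop=True discards the index instead, so the result stays a Series
dropped = s.reset_index(drop=True)
print(dropped.tolist())            # [3, 5]
```

In this case, as_index=False and reset_index() produce the same columns. The choice is mostly about whether the keys should be columns from the start.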
In daily work, we frequently use groupby to aggregate data values. However, the aggregation collapses the DataFrame to one row per group, and any columns other than the group keys and the aggregated values are dropped. We often need to keep these columns or the original row structure, and sometimes we want to add extra columns computed per group. The following two snippets address these needs:
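1. When applying groupby, add a column to the original DataFrame. transform returns a result aligned with the original rows, so each row receives its group's total:
dfsales1['L6_TOTAL'] = dfsales1.groupby(['DATE', 'L6_ID'])['CL6_SALES'].transform('sum')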
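2. When applying groupby, keep the full structure of the DataFrame by applying a function to each group. Here the helper lheap wraps the nlargest call, so each date keeps its five rows with the largest p_market_val_sec:
def lheap(group):
    dftemp = group.nlargest(5, ['p_market_val_sec'])  # top 5 rows within this date
    return dftemp
test = df.groupby('date', as_index=False).apply(lheap)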
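A small sketch on hypothetical toy data contrasts aggregating with transforming. It reuses the column names from the first snippet. For the top-n case, it swaps in a different technique from the apply-based snippet: sort the rows first, then call groupby(...).head(n). This keeps every column and the original row labels. We use n=2 only because the toy data is tiny.

```python
import pandas as pd

# Hypothetical toy data reusing the column names from snippet 1
dfsales1 = pd.DataFrame({
    'DATE':      ['d1', 'd1', 'd1', 'd2', 'd2'],
    'L6_ID':     [10, 10, 20, 10, 10],
    'CL6_SALES': [1.0, 2.0, 4.0, 8.0, 16.0],
})

# Aggregating collapses the frame to one row per (DATE, L6_ID) group
collapsed = dfsales1.groupby(['DATE', 'L6_ID'])['CL6_SALES'].sum()
print(len(collapsed))                  # 3

# transform('sum') broadcasts each group total back to every original row
dfsales1['L6_TOTAL'] = dfsales1.groupby(['DATE', 'L6_ID'])['CL6_SALES'].transform('sum')
print(dfsales1['L6_TOTAL'].tolist())   # [3.0, 3.0, 4.0, 24.0, 24.0]

# Hypothetical toy data for the top-n case
df = pd.DataFrame({
    'date': ['d1', 'd1', 'd1', 'd2', 'd2'],
    'ticker': ['x', 'y', 'z', 'x', 'y'],
    'p_market_val_sec': [5.0, 9.0, 7.0, 3.0, 1.0],
})

# Alternative to groupby().apply(lheap): sort descending, then keep the
# first n rows of each date; head() keeps all columns and original labels
top2 = df.sort_values('p_market_val_sec', ascending=False).groupby('date').head(2)
print(top2.index.tolist())             # [1, 2, 3, 4]
```

Because head() returns rows of the original frame rather than a new aggregated object, the index needs no cleanup afterwards.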
Note that groupby alters the index of its result. In real problems, it is therefore important to check the index and choose as_index=True or as_index=False, or call reset_index(), accordingly.