2024-heraklion-data/notebooks/030_tabular_data/041_window_functions_tutor.ipynb
2024-08-27 15:27:53 +03:00

Window functions for tabular data

In [1]:
import pandas as pd

Load experimental data

In [2]:
df = pd.read_csv('timed_responses.csv', index_col=0)
In [3]:
df
Out[3]:
      subject_id  time (ms)  response  accuracy
574            3        540     RIGHT      0.04
1190           2        552      LEFT      0.43
1895           2       1036      LEFT      0.36
53             3        257     RIGHT      0.11
158            2        743     RIGHT      0.32
551            3        619      LEFT      0.25
1602           1         43     RIGHT      0.65
413            1        471      LEFT      0.80
785            1        121      LEFT      0.10
1393           2        903     RIGHT      0.33
629            2        353      LEFT      0.17
1829           3        768     RIGHT      0.26
902            1       1093      LEFT      0.34
1486           2          3     RIGHT      0.29

Split-apply-combine operations return one aggregated value per group

In [ ]:
df.groupby('subject_id')['accuracy'].max()
In [ ]:

However, some calculations need one value per row rather than one value per group

For example: for each subject, rank the responses by decreasing accuracy
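One possible way to do this is to call rank() on the grouped column, which returns a value for every row instead of one per group. The sketch below rebuilds the table from Out[3] inline so that it runs without timed_responses.csv; the column name accuracy_rank is an illustrative choice, not part of the original notebook.

```python
import io

import pandas as pd

# Rebuild the table shown in Out[3] so the example is self-contained.
csv_text = """,subject_id,time (ms),response,accuracy
574,3,540,RIGHT,0.04
1190,2,552,LEFT,0.43
1895,2,1036,LEFT,0.36
53,3,257,RIGHT,0.11
158,2,743,RIGHT,0.32
551,3,619,LEFT,0.25
1602,1,43,RIGHT,0.65
413,1,471,LEFT,0.80
785,1,121,LEFT,0.10
1393,2,903,RIGHT,0.33
629,2,353,LEFT,0.17
1829,3,768,RIGHT,0.26
902,1,1093,LEFT,0.34
1486,2,3,RIGHT,0.29
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Rank accuracies within each subject; rank 1 is the highest accuracy.
# 'accuracy_rank' is a hypothetical column name chosen for this sketch.
df['accuracy_rank'] = (
    df.groupby('subject_id')['accuracy'].rank(ascending=False)
)

# The result has one rank per row, aligned with the original index.
print(df.sort_values(['subject_id', 'accuracy_rank']))
```

Unlike max(), which collapses each subject to a single number, rank() keeps all 14 rows and restarts the ranking for every subject.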

In [ ]:

In many cases, a window function is combined with a sorting operation

For example: for each subject, count the number of "LEFT" responses up until any moment in the experiment
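A minimal sketch of this running count, assuming that "time (ms)" gives the moment of each response within the experiment and that the count should include the current row. The helper columns is_left and n_left_so_far are illustrative names, and the table from Out[3] is rebuilt inline so the example runs on its own.

```python
import io

import pandas as pd

# Rebuild the table shown in Out[3] so the example is self-contained.
csv_text = """,subject_id,time (ms),response,accuracy
574,3,540,RIGHT,0.04
1190,2,552,LEFT,0.43
1895,2,1036,LEFT,0.36
53,3,257,RIGHT,0.11
158,2,743,RIGHT,0.32
551,3,619,LEFT,0.25
1602,1,43,RIGHT,0.65
413,1,471,LEFT,0.80
785,1,121,LEFT,0.10
1393,2,903,RIGHT,0.33
629,2,353,LEFT,0.17
1829,3,768,RIGHT,0.26
902,1,1093,LEFT,0.34
1486,2,3,RIGHT,0.29
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Sort first: a cumulative count only makes sense in chronological order.
df = df.sort_values('time (ms)')

# Mark LEFT responses with 1 and everything else with 0
# ('is_left' is a hypothetical helper column).
df['is_left'] = (df['response'] == 'LEFT').astype(int)

# Running total of LEFT responses, restarted for every subject.
df['n_left_so_far'] = df.groupby('subject_id')['is_left'].cumsum()

print(df.sort_values(['subject_id', 'time (ms)']))
```

Sorting before grouping matters here: groupby keeps the row order within each group, so cumsum() accumulates in the order the rows were sorted.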

In [ ]:

Window functions are also useful to compute changes in the data for each group

In this case, the window function often uses the shift(n) method, which lags the data by n rows
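As an illustration, the sketch below uses shift(1) within each subject to compute the time elapsed since that subject's previous response. The choice of quantity is an assumption made for this example, as are the column names prev_time and interval; the table from Out[3] is rebuilt inline so the code is self-contained.

```python
import io

import pandas as pd

# Rebuild the table shown in Out[3] so the example is self-contained.
csv_text = """,subject_id,time (ms),response,accuracy
574,3,540,RIGHT,0.04
1190,2,552,LEFT,0.43
1895,2,1036,LEFT,0.36
53,3,257,RIGHT,0.11
158,2,743,RIGHT,0.32
551,3,619,LEFT,0.25
1602,1,43,RIGHT,0.65
413,1,471,LEFT,0.80
785,1,121,LEFT,0.10
1393,2,903,RIGHT,0.33
629,2,353,LEFT,0.17
1829,3,768,RIGHT,0.26
902,1,1093,LEFT,0.34
1486,2,3,RIGHT,0.29
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Order the rows chronologically so "previous row" means "previous response".
df = df.sort_values('time (ms)')

# shift(1) moves each subject's values down by one row; the first row of
# every subject has no predecessor and becomes NaN.
df['prev_time'] = df.groupby('subject_id')['time (ms)'].shift(1)

# Change between consecutive responses of the same subject
# ('prev_time' and 'interval' are hypothetical column names).
df['interval'] = df['time (ms)'] - df['prev_time']

print(df.sort_values(['subject_id', 'time (ms)']))
```

Because the shift is applied per group, the first response of each subject gets a missing interval instead of being compared with another subject's last response.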

In [ ]:

In [ ]:

In [ ]: