2024-heraklion-data/exercises/tabular_window_functions/.ipynb_checkpoints/window_functions_solution-checkpoint.ipynb
2024-08-27 15:27:53 +03:00

Exercise: For each patcher, compute the average number of days they waited between experiments

Here is how to proceed:

  1. Use a window function to compute, for each patcher, the number of days that elapsed between consecutive experiments (i.e., the difference between each date and the previous one). Add that as a new column, 'days from prev'
  2. Compute the average 'days from prev' per patcher

With your new awesome vectorization skills, it should take two lines!
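The two steps above can be sketched on a small toy table before running them on the real data. This is only an illustration: the `toy` DataFrame, the patcher names 'A' and 'B', and the dates are made up, and only the column names follow the exercise.

```python
import pandas as pd

# Toy data (hypothetical, not taken from the exercise dataset)
toy = pd.DataFrame({
    'patcher': ['A', 'B', 'A', 'B', 'A'],
    'date': pd.to_datetime([
        '2024-01-05', '2024-01-02', '2024-01-01', '2024-01-10', '2024-01-03',
    ]),
})

# Step 1: within each patcher, shift() gives the previous (earlier) date.
# The shifted Series keeps the original index, so the subtraction lines up
# row by row even though `toy` itself is not sorted.
prev_date = toy.sort_values('date').groupby('patcher')['date'].shift()
toy['days from prev'] = toy['date'] - prev_date

# Step 2: average the gaps per patcher (NaT for the first experiment is skipped)
avg_wait = toy.groupby('patcher')['days from prev'].mean()
print(avg_wait)
```

The first experiment of each patcher has no predecessor, so its gap is NaT and does not contribute to the mean.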

In [1]:
import pandas as pd

# Set some Pandas options: maximum number of rows/columns it's going to display
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)

Load the neural data

In [61]:
df = pd.read_csv('shuffled_QC_passed_2024-07-04_collected_v1.csv', parse_dates=['date'])
In [62]:
df.head()
Out[62]:
OP patcher date slice cell_ch cell_ID day treatment hrs_incubation repatch hrs_after_OP Rs Rin resting_potential max_spikes Rheobase AP_heigth TH max_depol max_repol membra_time_constant_tau capacitance comments rheo_ramp AP_halfwidth Rheobse_ramp Unnamed: 27 rheos_ramp comment high K concentration RMP_from_char tissue_source area patient_age
0 OP211209 Verji 2024-03-13 S2 8 21d10S2c8 D1 Ctrl 0.0 no 13.298889 14.470281 166.878916 -67.962646 34 50.0 83.190918 -36.132812 302.124023 -72.631836 20.75 152.623120 NaN NaN 0.966102 NaN NaN NaN NaN NaN 15 mM -59.101382 Bielefeld temporal 27.0
1 OP221024 Verji 2024-06-16 S3 4 22o24S3c4 D1 Ctrl 0.0 no 23.964167 11.521243 137.820797 -71.789551 41 50.0 93.322754 -42.968750 465.820312 -83.740234 14.85 124.324170 17 NaN 0.959995 NaN NaN NaN NaN NaN 8 mM -62.265689 Bielefeld temporal 42.0
2 OP230810 Verji 2024-05-14 S2 5 23810S2c5 D1 TTX 0.0 no 7.043056 10.120637 67.739416 -70.629883 47 100.0 91.973877 -37.817383 415.771484 -107.666016 13.00 228.654858 10 402.013400 0.760052 NaN NaN NaN NaN NaN 8 mM -61.329228 Mitte temporal 63.0
3 OP230209 Verji 2024-04-27 S2_D2 3 23209S2_D2c3 D2 high K 25.0 no 21.848333 7.745503 43.009610 -68.371582 31 500.0 67.163086 -29.284668 212.036133 -61.645508 11.05 215.784505 30 672.202407 0.958735 NaN NaN NaN NaN NaN 8 mM -62.577472 Bielefeld temporal 63.0
4 OP240321 Verji 2024-04-11 S2 4 24321S2c4 D1 Ctrl 0.0 no 11.530278 7.632941 32.884808 -52.453613 21 200.0 84.008789 -36.785889 403.442383 -71.899414 14.80 695.791105 8 NaN 1.063838 324.520817 NaN NaN NaN NaN 8 mM -63.149769 Bielefeld temporal 31.0
In [64]:
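# Sort by date so that shift() returns each patcher's previous experiment date.
# The subtraction aligns on the index, so df itself does not need to be sorted.
df['days from prev'] = df['date'] - df.sort_values('date').groupby('patcher')['date'].shift()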
In [68]:
df.sort_values(['patcher', 'date'])[['patcher', 'date', 'days from prev']].head()
Out[68]:
patcher date days from prev
251 Rosie 2024-01-01 NaT
102 Rosie 2024-01-04 3 days
355 Rosie 2024-01-04 0 days
47 Rosie 2024-01-05 1 days
477 Rosie 2024-01-05 0 days
In [74]:
df.groupby('patcher')['days from prev'].mean()
Out[74]:
patcher
Rosie   1 days 05:45:15.789473684
Verji   0 days 10:53:42.269807280
Name: days from prev, dtype: timedelta64[ns]
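The result above is a timedelta, while the exercise asks for a number of days. One possible way to get fractional days is to divide by a one-day `Timedelta`. The sketch below rebuilds the two averages by hand from the output above so that it runs on its own; the name `mean_wait` is just for illustration.

```python
import pandas as pd

# Averages copied from the output above, rebuilt as a standalone Series
mean_wait = pd.Series(
    pd.to_timedelta(['1 days 05:45:15.789473684', '0 days 10:53:42.269807280']),
    index=pd.Index(['Rosie', 'Verji'], name='patcher'),
    name='days from prev',
)

# Dividing a timedelta Series by a one-day Timedelta yields plain floats (days)
mean_wait_days = mean_wait / pd.Timedelta(days=1)
print(mean_wait_days)
```

In the notebook itself, the same division could be applied directly to the result of `df.groupby('patcher')['days from prev'].mean()`.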