2024-heraklion-data/exercises/pandas_intro/.ipynb_checkpoints/pandas_intro_solution-checkpoint.ipynb
2024-08-27 15:27:53 +03:00

Exercise: Have a look at the neural data using Pandas

In [1]:
import pandas as pd

# Set some Pandas options: maximum number of rows/columns it's going to display
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)

Load electrophysiology data

In [2]:
df = pd.read_csv('../../data/QC_passed_2024-07-04_collected.csv')

1. How many rows/columns does the data set have?

In [3]:
df.shape
Out[3]:
(827, 31)

2. Display the first 5 rows of the DataFrame

In [4]:
df.head()
Out[4]:
OP filename slice cell_ch cell_ID day treatment hrs_incubation repatch hrs_after_OP Rs Rin resting_potential max_spikes Rheobase AP_heigth TH max_depol max_repol membra_time_constant_tau capacitance comments rheo_ramp AP_halfwidth Rheobse_ramp Unnamed: 27 rheos_ramp comment high K concentration RMP_from_char
0 OP230420 23420003.abf S1 1 23420S1c1 D1 TTX 0.0 no 10.416389 6.675643 39.025301 -74.285889 24 200.0 80.749512 -35.278320 336.181641 -60.791016 19.40 510.601767 0 753.380113 1.151009 NaN NaN NaN NaN NaN 8 mM -61.828554
1 OP230420 23420003.abf S1 3 23420S1c3 D1 TTX 0.0 no 10.416389 7.867174 48.728367 -69.573975 26 300.0 78.448486 -32.043457 350.097656 -67.138672 17.30 393.397918 1 585.102837 1.006321 NaN NaN NaN NaN NaN 8 mM -60.460298
2 OP230420 23420003.abf S1 6 23420S1c6 D1 TTX 0.0 no 10.416389 8.820134 35.971082 -54.956055 22 300.0 76.660156 -29.827881 270.629883 -52.246094 14.85 426.098774 3 173.915797 1.266335 NaN NaN NaN NaN NaN 8 mM -59.615979
3 OP230420 23420003.abf S1 7 23420S1c7 D1 TTX 0.0 yes 10.416389 7.269195 39.186101 -69.268799 24 300.0 75.030518 -29.699707 242.553711 -71.411133 17.15 478.273362 4 598.079936 0.994396 NaN NaN NaN NaN NaN 8 mM -61.173839
4 OP230420 23420003.abf S1 8 23420S1c8 D1 TTX 0.0 yes 10.416389 6.000400 31.599917 -70.550537 22 350.0 81.011963 -33.068848 309.448242 -61.401367 16.65 575.513924 5 786.927898 1.182830 NaN NaN NaN NaN NaN 8 mM -60.956350

3. Display the names and dtypes of all the columns

In [5]:
df.dtypes
Out[5]:
OP                           object
filename                     object
slice                        object
cell_ch                       int64
cell_ID                      object
day                          object
treatment                    object
hrs_incubation              float64
repatch                      object
hrs_after_OP                float64
Rs                          float64
Rin                         float64
resting_potential           float64
max_spikes                    int64
Rheobase                    float64
AP_heigth                   float64
TH                          float64
max_depol                   float64
max_repol                   float64
membra_time_constant_tau    float64
capacitance                 float64
comments                     object
rheo_ramp                   float64
AP_halfwidth                float64
Rheobse_ramp                float64
Unnamed: 27                 float64
rheos_ramp                  float64
comment                      object
                            float64
high K concentration         object
RMP_from_char               float64
dtype: object
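
The listing above includes a column called Unnamed: 27 and one whose name appears blank. Names like these typically come from empty header cells in the CSV file. The sketch below uses a small made-up DataFrame (`toy`, not the real recording data) and assumes the blank name is a single space. It shows one way to spot such columns, count missing values per column, and list the numeric columns.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; all values are made up.
# The ' ' column name is an assumption about what the blank name contains.
toy = pd.DataFrame({
    'OP': ['OP1', 'OP2', 'OP3'],
    'max_spikes': [24, 26, 22],
    'Unnamed: 27': [np.nan, np.nan, np.nan],
    ' ': [np.nan, np.nan, np.nan],
})

# Columns whose name is blank or was auto-generated for an empty header cell
suspicious = [c for c in toy.columns if c.strip() == '' or c.startswith('Unnamed')]

# Number of missing values in each column
n_missing = toy.isna().sum()

# Names of the columns that hold numbers (int or float)
numeric_cols = toy.select_dtypes(include='number').columns.tolist()

print(suspicious)
print(n_missing)
print(numeric_cols)
```

Columns that turn out to be entirely empty could then be dropped with `toy.dropna(axis=1, how='all')`, if that suits the analysis.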

4. Display the unique values of the high K concentration and of the treatment columns

In [6]:
df['high K concentration'].unique()
Out[6]:
array(['8 mM', '15 mM'], dtype=object)
In [7]:
df['treatment'].unique()
Out[7]:
array(['TTX', 'high K', 'Ctrl', 'wash in high K'], dtype=object)
In [8]:
# Extra: count the distinct values in the OP column
df['OP'].nunique()
Out[8]:
44
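
`unique()` lists the distinct values and `nunique()` counts them. When it is also useful to know how often each value occurs, `value_counts()` is a common companion. A minimal sketch on made-up treatment labels (the `toy` frame is hypothetical, not the real data):

```python
import pandas as pd

# Made-up treatment labels, for illustration only
toy = pd.DataFrame({'treatment': ['TTX', 'high K', 'Ctrl', 'TTX', 'Ctrl', 'TTX']})

# Number of rows per distinct value, most frequent first
counts = toy['treatment'].value_counts()

# Number of distinct values
n_distinct = toy['treatment'].nunique()

print(counts)
print(n_distinct)
```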

5. Display the main statistics of the max_spikes column

In [9]:
df['max_spikes'].describe()
Out[9]:
count     827.000000
mean       27.920193
std        57.997378
min         0.000000
25%        19.000000
50%        26.000000
75%        33.000000
max      1664.000000
Name: max_spikes, dtype: float64
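
In the summary above, the maximum (1664) is far from the 75% quartile (33), which hints at an extreme value pulling the mean and standard deviation upwards. The sketch below, on a made-up Series, compares the mean with the median and flags extreme values with the 1.5 x IQR rule of thumb. That rule is an illustration chosen here, not something the exercise prescribes.

```python
import pandas as pd

# Made-up spike counts with one extreme value
spikes = pd.Series([19, 22, 26, 30, 33, 1664], name='max_spikes')

mean = spikes.mean()      # sensitive to the extreme value
median = spikes.median()  # robust to the extreme value

# Interquartile range and the upper fence of the 1.5 x IQR rule
q1, q3 = spikes.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Values above the upper fence
outliers = spikes[spikes > upper_fence]

print(mean, median)
print(outliers)
```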

6. Show all the rows where the max number of spikes is larger than 50

In [10]:
df.loc[df['max_spikes'] > 50]
Out[10]:
OP filename slice cell_ch cell_ID day treatment hrs_incubation repatch hrs_after_OP Rs Rin resting_potential max_spikes Rheobase AP_heigth TH max_depol max_repol membra_time_constant_tau capacitance comments rheo_ramp AP_halfwidth Rheobse_ramp Unnamed: 27 rheos_ramp comment high K concentration RMP_from_char
70 OP231130 23n30003.abf S1 1 23n30S1c1 D1 Ctrl 0.0 yes 11.992778 15.627081 58.666581 -78.060913 55 300.0 88.775635 -41.577148 352.294922 -103.515625 13.80 329.350619 0 NaN 0.811086 351.581719 NaN NaN NaN NaN 8 mM -71.584465
74 OP231130 23n30037.abf S1_D2 2 23n30S1_D2c2 D2 wash in high K 21.0 no 32.699444 7.426442 65.804793 -67.544556 61 250.0 77.716064 -36.505127 246.459961 -74.340820 12.50 200.293398 8; exclude; Rs_end > 30 NaN 0.950158 322.090736 NaN NaN NaN NaN 8 mM -59.579331
262 OP230808 23808003.abf S1 6 23808S1c6 D1 TTX 0.0 no 8.163333 10.394754 106.082649 -83.300781 1664 -300.0 71.081543 -15.057373 237.426758 -69.702148 13.05 98.940861 5 NaN 1.025354 NaN NaN NaN NaN NaN 8 mM -32.684415
321 OP230209 23209012.abf S2 5 23209S2c5 D1 high K 0.0 no -2.874722 8.525730 81.231493 -69.049072 53 100.0 82.275391 -34.912109 365.478516 -98.266602 13.05 141.831642 8 192.024601 0.713136 NaN NaN NaN NaN NaN 8 mM -58.246034
396 OP230810 23810004.abf S1 5 23810S1c5 D1 high K 0.0 yes 5.660833 25.468412 79.043216 -65.155029 55 100.0 85.491943 -43.072510 389.770508 -80.078125 11.45 241.592788 3 224.587500 0.966593 NaN NaN NaN NaN NaN 8 mM -60.860893
397 OP230810 23810004.abf S1 7 23810S1c7 D1 high K 0.0 yes 5.660833 26.756530 74.709503 -64.855957 53 150.0 90.942383 -42.932129 423.950195 -83.007812 9.10 239.316854 5 307.540300 0.965371 NaN NaN NaN NaN NaN 8 mM -61.513494
398 OP230810 23810004.abf S1 8 23810S1c8 D1 high K 0.0 no 5.660833 18.023665 63.532613 -61.413574 55 200.0 84.509277 -39.605713 339.843750 -77.392578 7.10 146.691551 6 199.506700 1.043352 NaN NaN NaN NaN NaN 8 mM -62.291177
558 OP230314 23314003.abf S1 8 23314S1c8 D1 high K 0.0 no 5.940833 22.054204 97.596130 -67.358398 52 100.0 79.431152 -41.333008 325.073242 -111.572266 13.30 186.404790 5 NaN 0.776089 NaN NaN 201.505075 NaN NaN 8 mM -61.035575
646 OP240117 24118004.abf S2_D2 4 24117S2_D2c4 D1 Ctrl 20.0 no 26.424200 17.188385 139.095453 -76.916504 61 100.0 78.436279 -40.686035 316.040039 -95.092773 19.80 201.367598 3 NaN 0.740537 295.539851 NaN NaN NaN NaN 8 mM -59.561161
647 OP240117 24118004.abf S2_D2 5 24117S2_D2c5 D1 Ctrl 20.0 no 26.424200 27.929918 140.091217 -70.422363 56 100.0 82.684326 -44.421387 325.561523 -96.923828 18.85 226.172391 4 NaN 0.769121 207.006900 NaN NaN NaN NaN 8 mM -60.495223
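
The expression `df['max_spikes'] > 50` produces a boolean Series (a mask), and `.loc` keeps the rows where the mask is True. With this many columns the result is hard to read, so it can help to pass a list of columns as the second argument to `.loc` and to sort the result. A sketch on a made-up frame (the `toy` values are hypothetical):

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    'cell_ID': ['a', 'b', 'c', 'd'],
    'treatment': ['Ctrl', 'TTX', 'high K', 'Ctrl'],
    'max_spikes': [55, 24, 53, 61],
})

# Boolean mask: True for the rows to keep
mask = toy['max_spikes'] > 50

# Keep the matching rows and a few columns, largest spike count first
subset = (
    toy.loc[mask, ['cell_ID', 'treatment', 'max_spikes']]
    .sort_values('max_spikes', ascending=False)
)

print(subset)
```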

7. Display the main statistics of max_spikes separately for the rows where high K concentration is 8 mM and for the rows where it is 15 mM

Are the distributions any different?

In [11]:
df.loc[df['high K concentration'] == '8 mM', 'max_spikes'].describe()
Out[11]:
count     474.000000
mean       30.955696
std        75.960740
min         1.000000
25%        21.000000
50%        27.000000
75%        34.000000
max      1664.000000
Name: max_spikes, dtype: float64
In [12]:
df.loc[df['high K concentration'] == '15 mM', 'max_spikes'].describe()
Out[12]:
count    353.000000
mean      23.844193
std       10.519791
min        0.000000
25%       18.000000
50%       24.000000
75%       31.000000
max       48.000000
Name: max_spikes, dtype: float64
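
The two summaries above can also be produced side by side in a single table with `groupby`, which makes the comparison easier to read. A minimal sketch on made-up values (the `toy` frame is hypothetical):

```python
import pandas as pd

# Made-up data with two concentration groups
toy = pd.DataFrame({
    'high K concentration': ['8 mM', '8 mM', '8 mM', '15 mM', '15 mM'],
    'max_spikes': [20, 30, 1000, 18, 30],
})

# One row of summary statistics per concentration
summary = toy.groupby('high K concentration')['max_spikes'].describe()

print(summary)
```

Each row of `summary` holds the same statistics that `describe()` returns for a single group.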

8. Display the statistics of max_spikes for the rows where high K concentration is 8 mM and max_spikes is <= 100

Does that change your conclusion?

In [13]:
df.loc[(df['high K concentration'] == '8 mM') & (df['max_spikes'] <= 100), 'max_spikes'].describe()
Out[13]:
count    473.000000
mean      27.503171
std       10.965493
min        1.000000
25%       21.000000
50%       27.000000
75%       34.000000
max       61.000000
Name: max_spikes, dtype: float64
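
The count drops from 474 to 473, so the filter removes a single row: the one with 1664 spikes. As a quick arithmetic check, removing that one value from the unfiltered 8 mM mean should reproduce the filtered mean. The numbers below are copied from the outputs above:

```python
# Values taken from the describe() outputs above
n_all = 474            # count for 8 mM, unfiltered
mean_all = 30.955696   # mean for 8 mM, unfiltered
outlier = 1664         # the single value above 100

# Mean after removing one value: (total sum - outlier) / (n - 1)
mean_without = (n_all * mean_all - outlier) / (n_all - 1)

print(round(mean_without, 4))
```

The result agrees with the filtered mean of 27.503171, which remains above the 15 mM mean of 23.844193, while the standard deviation drops from about 76 to about 11.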

9. Transform the high K concentration column into a numerical column

a) Discard the last three characters of each value in the column (' mM')

b) Use .astype(float) to convert to floating point numbers

c) Save the result in a new column named K (mM)

In [14]:
df['K (mM)'] = df['high K concentration'].str[:-3].astype(float)
In [15]:
df.head()
Out[15]:
OP filename slice cell_ch cell_ID day treatment hrs_incubation repatch hrs_after_OP Rs Rin resting_potential max_spikes Rheobase AP_heigth TH max_depol max_repol membra_time_constant_tau capacitance comments rheo_ramp AP_halfwidth Rheobse_ramp Unnamed: 27 rheos_ramp comment high K concentration RMP_from_char K (mM)
0 OP230420 23420003.abf S1 1 23420S1c1 D1 TTX 0.0 no 10.416389 6.675643 39.025301 -74.285889 24 200.0 80.749512 -35.278320 336.181641 -60.791016 19.40 510.601767 0 753.380113 1.151009 NaN NaN NaN NaN NaN 8 mM -61.828554 8.0
1 OP230420 23420003.abf S1 3 23420S1c3 D1 TTX 0.0 no 10.416389 7.867174 48.728367 -69.573975 26 300.0 78.448486 -32.043457 350.097656 -67.138672 17.30 393.397918 1 585.102837 1.006321 NaN NaN NaN NaN NaN 8 mM -60.460298 8.0
2 OP230420 23420003.abf S1 6 23420S1c6 D1 TTX 0.0 no 10.416389 8.820134 35.971082 -54.956055 22 300.0 76.660156 -29.827881 270.629883 -52.246094 14.85 426.098774 3 173.915797 1.266335 NaN NaN NaN NaN NaN 8 mM -59.615979 8.0
3 OP230420 23420003.abf S1 7 23420S1c7 D1 TTX 0.0 yes 10.416389 7.269195 39.186101 -69.268799 24 300.0 75.030518 -29.699707 242.553711 -71.411133 17.15 478.273362 4 598.079936 0.994396 NaN NaN NaN NaN NaN 8 mM -61.173839 8.0
4 OP230420 23420003.abf S1 8 23420S1c8 D1 TTX 0.0 yes 10.416389 6.000400 31.599917 -70.550537 22 350.0 81.011963 -33.068848 309.448242 -61.401367 16.65 575.513924 5 786.927898 1.182830 NaN NaN NaN NaN NaN 8 mM -60.956350 8.0
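
Slicing off the last three characters works because every value ends in ' mM'. If the unit suffix could vary in length, extracting the leading number with a regular expression is one possible alternative. The sketch below applies both approaches to a made-up Series; the regular expression is an illustration, not part of the exercise.

```python
import pandas as pd

# Made-up concentration strings in the same format as the real column
conc = pd.Series(['8 mM', '15 mM', '8 mM'], name='high K concentration')

# Approach used in the exercise: drop the last three characters, then convert
by_slice = conc.str[:-3].astype(float)

# Alternative: extract the leading number, whatever follows it
by_regex = conc.str.extract(r'^(\d+(?:\.\d+)?)', expand=False).astype(float)

print(by_slice.tolist())
print(by_regex.tolist())
```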