Exercise: Add experiment information to electrophysiology data¶

In [1]:

import pandas as pd

# Set some Pandas options: maximum number of rows/columns it's going to display
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)

Load electrophysiology data¶

In [2]:

df = pd.read_csv('../../data/QC_passed_2024-07-04_collected.csv')
info = pd.read_csv('../../data/op_info.csv')

In [3]:

df.head()

Out[3]:

	OP	filename	slice	cell_ch	cell_ID	day	treatment	repatch	hrs_after_OP	Rs	Rin	resting_potential	max_spikes	Rheobase	AP_heigth	TH	max_depol	max_repol	membra_time_constant_tau	capacitance	comments	rheo_ramp	AP_halfwidth	Rheobse_ramp	Unnamed: 27	rheos_ramp	comment		high K concentration	RMP_from_char
0	OP230420	23420003.abf	S1	1	23420S1c1	D1	TTX	no	10.416389	6.675643	39.025301	-74.285889	24	200.0	80.749512	-35.278320	336.181641	-60.791016	19.40	510.601767	0	753.380113	1.151009	NaN	NaN	NaN	NaN	NaN	8 mM	-61.828554
1	OP230420	23420003.abf	S1	3	23420S1c3	D1	TTX	no	10.416389	7.867174	48.728367	-69.573975	26	300.0	78.448486	-32.043457	350.097656	-67.138672	17.30	393.397918	1	585.102837	1.006321	NaN	NaN	NaN	NaN	NaN	8 mM	-60.460298
2	OP230420	23420003.abf	S1	6	23420S1c6	D1	TTX	no	10.416389	8.820134	35.971082	-54.956055	22	300.0	76.660156	-29.827881	270.629883	-52.246094	14.85	426.098774	3	173.915797	1.266335	NaN	NaN	NaN	NaN	NaN	8 mM	-59.615979
3	OP230420	23420003.abf	S1	7	23420S1c7	D1	TTX	yes	10.416389	7.269195	39.186101	-69.268799	24	300.0	75.030518	-29.699707	242.553711	-71.411133	17.15	478.273362	4	598.079936	0.994396	NaN	NaN	NaN	NaN	NaN	8 mM	-61.173839
4	OP230420	23420003.abf	S1	8	23420S1c8	D1	TTX	yes	10.416389	6.000400	31.599917	-70.550537	22	350.0	81.011963	-33.068848	309.448242	-61.401367	16.65	575.513924	5	786.927898	1.182830	NaN	NaN	NaN	NaN	NaN	8 mM	-60.956350

In [4]:

info.head()

Out[4]:

	OP	tissue_source	area	patient_age	patcher
0	OP201027	Mitte	temporal	33	Rosie
1	OP201029	Mitte	temporal	47	Rosie
2	OP210323	Virchow	temporal	10	Rosie
3	OP210615	Virchow	temporal	19	Rosie
4	OP211123	Bielefeld	temporal	68	Rosie

1. Add experiment information to the electrophysiology results¶

Is there information for every experiment?
How many experiments did each patcher perform? (i.e., individual OPs, or rows in info)
How many samples did each patcher analyze? (i.e., individual rows in df)

In [5]:

df_with_info = df.merge(info, on='OP', how='left')

In [6]:

df_with_info.count()

Out[6]:

OP                          827
filename                    827
slice                       827
cell_ch                     827
cell_ID                     827
day                         827
treatment                   827
hrs_incubation              827
repatch                     827
hrs_after_OP                827
Rs                          827
Rin                         827
resting_potential           827
max_spikes                  827
Rheobase                    824
AP_heigth                   824
TH                          824
max_depol                   824
max_repol                   824
membra_time_constant_tau    827
capacitance                 827
comments                    742
rheo_ramp                   120
AP_halfwidth                820
Rheobse_ramp                160
Unnamed: 27                   0
rheos_ramp                   32
comment                       5
                             37
high K concentration        827
RMP_from_char               827
tissue_source               800
area                        800
patient_age                 800
patcher                     800
dtype: int64

In [7]:

info['patcher'].value_counts()

Out[7]:

Verji    35
Rosie     8
Anna      2
Name: patcher, dtype: int64

In [8]:

df_with_info['patcher'].value_counts()

Out[8]:

Verji    594
Rosie    206
Name: patcher, dtype: int64

2. Remove outliers from the table¶

Load the list of outliers in outliers.csv
Use an anti-join to remove the outliers from the table
How many samples (rows) are left in the data?

In [9]:

outliers = pd.read_csv('outliers.csv')

In [10]:

outliers.shape

Out[10]:

(134, 2)

In [11]:

outliers.head()

Out[11]:

	OP	cell_ID
0	OP240201	24201S2c2
1	OP210323	2021_03_25_0S4_D2c6
2	OP230808	23808S2c6
3	OP240503	24503S1c6
4	OP230109	2311S3c2

In [12]:

temp = df_with_info.merge(outliers, on=['OP', 'cell_ID'], how='outer', indicator=True)

In [13]:

df_without_outliers = temp[temp['_merge'] == 'left_only'].drop('_merge', axis=1)

In [14]:

df_without_outliers.shape

Out[14]:

(659, 35)

In [15]:

df_without_outliers.head()

Out[15]:

	OP	filename	slice	cell_ch	cell_ID	day	treatment	hrs_incubation	repatch	hrs_after_OP	Rs	Rin	resting_potential	max_spikes	Rheobase	AP_heigth	TH	max_depol	max_repol	membra_time_constant_tau	capacitance	comments	rheo_ramp	AP_halfwidth	Rheobse_ramp	Unnamed: 27	rheos_ramp	comment		high K concentration	RMP_from_char	tissue_source	area	patient_age	patcher
0	OP230420	23420003.abf	S1	1	23420S1c1	D1	TTX	0.0	no	10.416389	6.675643	39.025301	-74.285889	24	200.0	80.749512	-35.278320	336.181641	-60.791016	19.40	510.601767	0	753.380113	1.151009	NaN	NaN	NaN	NaN	NaN	8 mM	-61.828554	Bielefeld	temporal	13.0	Verji
1	OP230420	23420003.abf	S1	3	23420S1c3	D1	TTX	0.0	no	10.416389	7.867174	48.728367	-69.573975	26	300.0	78.448486	-32.043457	350.097656	-67.138672	17.30	393.397918	1	585.102837	1.006321	NaN	NaN	NaN	NaN	NaN	8 mM	-60.460298	Bielefeld	temporal	13.0	Verji
2	OP230420	23420003.abf	S1	6	23420S1c6	D1	TTX	0.0	no	10.416389	8.820134	35.971082	-54.956055	22	300.0	76.660156	-29.827881	270.629883	-52.246094	14.85	426.098774	3	173.915797	1.266335	NaN	NaN	NaN	NaN	NaN	8 mM	-59.615979	Bielefeld	temporal	13.0	Verji
5	OP230420	23420003.abf	S1	8	23420S1c8	D1	TTX	0.0	yes	10.416389	6.000400	31.599917	-70.550537	22	350.0	81.011963	-33.068848	309.448242	-61.401367	16.65	575.513924	5	786.927898	1.182830	NaN	NaN	NaN	NaN	NaN	8 mM	-60.956350	Bielefeld	temporal	13.0	Verji
6	OP230420	23420061.abf	S1_D2	8	23420S1c8	D2	TTX	19.0	yes	29.633333	8.271614	30.607259	-70.745850	1	1300.0	48.883057	-20.855713	100.952148	-27.465820	13.25	864.892430	29	565.938865	1.504127	NaN	NaN	NaN	NaN	NaN	8 mM	-61.283967	Bielefeld	temporal	13.0	Verji

3. Save final result in `processed_QC_passed_2024-07-04_collected_v1.csv`¶

Using the .to_csv method of Pandas DataFrames

In [16]:

df_without_outliers.to_csv('processed_QC_passed_2024-07-04_collected_v1.csv', index=None)

In [ ]:

34 KiB Raw Blame History

Exercise: Add experiment information to electrophysiology data¶

Load electrophysiology data¶

1. Add experiment information to the electrophysiology results¶

2. Remove outliers from the table¶

3. Save final result in processed_QC_passed_2024-07-04_collected_v1.csv¶

34 KiB

Raw Blame History

3. Save final result in `processed_QC_passed_2024-07-04_collected_v1.csv`¶