Exercise: Add experiment information to electrophysiology data
In [1]:
import pandas as pd
# Set some Pandas options: maximum number of rows/columns it's going to display
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)
Load electrophysiology data
In [2]:
df = pd.read_csv('../../data/QC_passed_2024-07-04_collected.csv')
info = pd.read_csv('../../data/op_info.csv')
In [3]:
df.head()
Out[3]:
In [4]:
info.head()
Out[4]:
1. Add experiment information to the electrophysiology results
- Is there information for every experiment?
- How many experiments did each patcher perform? (i.e., individual OPs, or rows in info)
- How many samples did each patcher analyze? (i.e., individual rows in df)
In [5]:
df_with_info = df.merge(info, on='OP', how='left')
In [6]:
df_with_info.count()
Out[6]:
In [7]:
info['patcher'].value_counts()
Out[7]:
In [8]:
df_with_info['patcher'].value_counts()
Out[8]:
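The three questions above can also be checked on a small toy table. The sketch below is only an illustration: the column names OP, cell_ID and patcher are taken from the notebook, while the values and the names toy_df, toy_info and missing_ops are made up. It assumes that info holds one row per OP, which validate='many_to_one' enforces.

```python
import pandas as pd

# Toy stand-ins for df and info; all values are invented for illustration.
toy_df = pd.DataFrame({
    'OP': ['OP1', 'OP1', 'OP2', 'OP3'],
    'cell_ID': ['c1', 'c2', 'c1', 'c1'],
})
toy_info = pd.DataFrame({
    'OP': ['OP1', 'OP2'],
    'patcher': ['Alice', 'Bob'],
})

# indicator=True adds a '_merge' column that says where each row came from.
# validate='many_to_one' raises an error if an OP appears twice in toy_info.
merged = toy_df.merge(
    toy_info, on='OP', how='left', indicator=True, validate='many_to_one'
)

# Rows marked 'left_only' had no matching experiment information.
missing_ops = merged.loc[merged['_merge'] == 'left_only', 'OP'].unique().tolist()

# Experiments per patcher: one row per OP in the info table.
experiments_per_patcher = toy_info['patcher'].value_counts()

# Samples per patcher: one row per sample in the merged table.
samples_per_patcher = merged['patcher'].value_counts()
```

Here missing_ops lists the OPs without experiment information. An empty list would mean that every sample found a match.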
2. Remove outliers from the table
- Load the list of outliers in outliers.csv
- Use an anti-join to remove the outliers from the table
- How many samples (rows) are left in the data?
In [9]:
outliers = pd.read_csv('outliers.csv')
In [10]:
outliers.shape
Out[10]:
In [11]:
outliers.head()
Out[11]:
In [12]:
# A left merge is enough for an anti-join and keeps the row order of df_with_info
temp = df_with_info.merge(outliers, on=['OP', 'cell_ID'], how='left', indicator=True)
In [13]:
df_without_outliers = temp[temp['_merge'] == 'left_only'].drop('_merge', axis=1)
In [14]:
df_without_outliers.shape
Out[14]:
In [15]:
df_without_outliers.head()
Out[15]:
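To see what the anti-join does, it can help to run it on a table small enough to check by eye. This is a minimal sketch with invented values; toy, toy_outliers, marked and kept are hypothetical names. It uses how='left', which selects the same 'left_only' rows as an outer merge while keeping the row order of the left table. The final sanity check assumes that every outlier matches exactly one row of the table.

```python
import pandas as pd

# Toy table of samples and a toy list of outliers (values are invented).
toy = pd.DataFrame({
    'OP': ['OP1', 'OP1', 'OP2', 'OP2'],
    'cell_ID': ['c1', 'c2', 'c1', 'c2'],
    'value': [1.0, 2.0, 3.0, 4.0],
})
toy_outliers = pd.DataFrame({
    'OP': ['OP1', 'OP2'],
    'cell_ID': ['c2', 'c1'],
})

# Step 1: mark each sample as 'both' (it is an outlier) or 'left_only' (it is not).
marked = toy.merge(toy_outliers, on=['OP', 'cell_ID'], how='left', indicator=True)

# Step 2: keep only the samples that did not match an outlier.
kept = marked[marked['_merge'] == 'left_only'].drop(columns='_merge')

# Sanity check: valid only if each outlier matches exactly one sample.
n_removed = len(toy) - len(kept)
assert n_removed == len(toy_outliers)
```

If the row counts do not add up on the real data, some entries of the outlier list probably have no match in the table, or match more than one row.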
3. Save final result in processed_QC_passed_2024-07-04_collected_v1.csv
- Using the .to_csv method of Pandas DataFrames
In [16]:
# index=False leaves the row index out of the file
df_without_outliers.to_csv('processed_QC_passed_2024-07-04_collected_v1.csv', index=False)
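A quick way to check the saved file is to read it back and compare it with the table in memory. The sketch below writes to an in-memory buffer instead of a real file so that it does not touch the disk; the table toy and its values are invented for illustration.

```python
import io

import pandas as pd

# Toy table standing in for df_without_outliers (values are invented).
toy = pd.DataFrame({'OP': ['OP1', 'OP2'], 'value': [1.5, 2.5]})

# Write the CSV to an in-memory text buffer; index=False leaves out the row index.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Read the CSV text back, as one would with the saved file.
reloaded = pd.read_csv(io.StringIO(buffer.getvalue()))

# With the index written out, an extra unnamed column appears on reload.
buffer_with_index = io.StringIO()
toy.to_csv(buffer_with_index, index=True)
reloaded_with_index = pd.read_csv(io.StringIO(buffer_with_index.getvalue()))
```

Comparing reloaded.columns with reloaded_with_index.columns shows why the index is usually left out when the row index carries no information.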