Combine information across tables: joins and anti-joins
In [1]:
import pandas as pd
"Load" some experimental data
In [2]:
data = pd.DataFrame(
    data=[
        ['312', 'A1', 0.12, 'LEFT'],
        ['312', 'A2', 0.37, 'LEFT'],
        ['312', 'C2', 0.68, 'LEFT'],
        ['711', 'A1', 4.01, 'RIGHT'],
        ['711', 'A2', 0.44, 'LEFT'],
        ['313', 'A1', 0.07, 'RIGHT'],
        ['313', 'B1', 0.08, 'RIGHT'],
        ['712', 'A2', 3.29, 'LEFT'],
        ['314', 'A2', 0.29, 'LEFT'],
        ['714', 'B2', 3.32, 'RIGHT'],
        ['314', 'B1', 0.14, 'RIGHT'],
        ['314', 'C2', 0.73, 'RIGHT'],
        ['713', 'B1', 5.74, 'LEFT'],
    ],
    columns=['subject_id', 'condition_id', 'response_time', 'response'],
)
data
Each experiment belongs to one experimental condition, but the parameters of each condition are not in the table. Instead, they are kept in separate dictionaries, one per parameter, keyed by condition ID:
In [3]:
condition_to_orientation = {
    'A1': 0,
    'A2': 0,
    'B1': 45,
    'B2': 45,
    'C1': 90,
}
condition_to_duration = {
    'A1': 0.1,
    'A2': 0.01,
    'B1': 0.1,
    'B2': 0.01,
    'C1': 0.2,
}
condition_to_surround = {
    'A1': 'FULL',
    'A2': 'NONE',
    'B1': 'NONE',
    'B2': 'FULL',
    'C1': 'FULL',
}
condition_to_stimulus_type = {
    'A1': 'LINES',
    'A2': 'DOTS',
    'B1': 'PLAID',
    'B2': 'PLAID',
    'C1': 'WIGGLES',
}
Manually adding the condition parameters to the table
In [73]:
data_with_properties = data.copy()
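One way to fill in the copy by hand is `Series.map` with each lookup dictionary, adding one new column per parameter. The sketch below assumes that layout (the column names are illustrative). Condition IDs missing from a dictionary come out as `NaN`: here that affects `'C2'`, which appears in the data but not in the dictionaries (while `'C1'` is defined but never used). The data and dictionaries are repeated so the block runs on its own.

```python
import pandas as pd

# Same trial data and condition dictionaries as in the cells above
data = pd.DataFrame(
    data=[
        ['312', 'A1', 0.12, 'LEFT'],
        ['312', 'A2', 0.37, 'LEFT'],
        ['312', 'C2', 0.68, 'LEFT'],
        ['711', 'A1', 4.01, 'RIGHT'],
        ['711', 'A2', 0.44, 'LEFT'],
        ['313', 'A1', 0.07, 'RIGHT'],
        ['313', 'B1', 0.08, 'RIGHT'],
        ['712', 'A2', 3.29, 'LEFT'],
        ['314', 'A2', 0.29, 'LEFT'],
        ['714', 'B2', 3.32, 'RIGHT'],
        ['314', 'B1', 0.14, 'RIGHT'],
        ['314', 'C2', 0.73, 'RIGHT'],
        ['713', 'B1', 5.74, 'LEFT'],
    ],
    columns=['subject_id', 'condition_id', 'response_time', 'response'],
)
condition_to_orientation = {'A1': 0, 'A2': 0, 'B1': 45, 'B2': 45, 'C1': 90}
condition_to_duration = {'A1': 0.1, 'A2': 0.01, 'B1': 0.1, 'B2': 0.01, 'C1': 0.2}
condition_to_surround = {'A1': 'FULL', 'A2': 'NONE', 'B1': 'NONE', 'B2': 'FULL', 'C1': 'FULL'}
condition_to_stimulus_type = {'A1': 'LINES', 'A2': 'DOTS', 'B1': 'PLAID', 'B2': 'PLAID', 'C1': 'WIGGLES'}

data_with_properties = data.copy()

# Look up each parameter by condition ID; IDs missing from a dictionary become NaN
for column, mapping in [
    ('orientation', condition_to_orientation),
    ('duration', condition_to_duration),
    ('surround', condition_to_surround),
    ('stimulus_type', condition_to_stimulus_type),
]:
    data_with_properties[column] = data_with_properties['condition_id'].map(mapping)

print(data_with_properties)
```

This works, but every parameter needs its own dictionary and its own `map` call, and the parameters of a single condition end up scattered across several objects.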
Using a join operation
In [4]:
# Often, a condition table like this is kept in a spreadsheet; here it is built
# from the dictionaries, one row per condition after transposing (.T)
condition_properties = pd.DataFrame(
    [condition_to_orientation, condition_to_duration, condition_to_surround, condition_to_stimulus_type],
    index=['orientation', 'duration', 'surround', 'stimulus_type'],
).T
condition_properties
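A minimal sketch of the join itself, assuming the condition table built above (indexed by condition ID). `DataFrame.join` with `on='condition_id'` matches that column of `data` against the index of `condition_properties`. The default `how='left'` keeps every trial, so the two `'C2'` trials get `NaN` parameters; `how='inner'` drops them instead. The call to `infer_objects()` is an addition not in the original cell: transposing rows of mixed types leaves every column with `object` dtype, and `infer_objects()` converts the numeric columns back. The names `joined` and `joined_inner` are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    data=[
        ['312', 'A1', 0.12, 'LEFT'],
        ['312', 'A2', 0.37, 'LEFT'],
        ['312', 'C2', 0.68, 'LEFT'],
        ['711', 'A1', 4.01, 'RIGHT'],
        ['711', 'A2', 0.44, 'LEFT'],
        ['313', 'A1', 0.07, 'RIGHT'],
        ['313', 'B1', 0.08, 'RIGHT'],
        ['712', 'A2', 3.29, 'LEFT'],
        ['314', 'A2', 0.29, 'LEFT'],
        ['714', 'B2', 3.32, 'RIGHT'],
        ['314', 'B1', 0.14, 'RIGHT'],
        ['314', 'C2', 0.73, 'RIGHT'],
        ['713', 'B1', 5.74, 'LEFT'],
    ],
    columns=['subject_id', 'condition_id', 'response_time', 'response'],
)

# Condition table: one row per condition ID, one column per parameter
condition_properties = pd.DataFrame(
    [
        {'A1': 0, 'A2': 0, 'B1': 45, 'B2': 45, 'C1': 90},
        {'A1': 0.1, 'A2': 0.01, 'B1': 0.1, 'B2': 0.01, 'C1': 0.2},
        {'A1': 'FULL', 'A2': 'NONE', 'B1': 'NONE', 'B2': 'FULL', 'C1': 'FULL'},
        {'A1': 'LINES', 'A2': 'DOTS', 'B1': 'PLAID', 'B2': 'PLAID', 'C1': 'WIGGLES'},
    ],
    index=['orientation', 'duration', 'surround', 'stimulus_type'],
).T.infer_objects()  # restore numeric dtypes lost by transposing mixed-type rows

# Left join (default): keep all trials; conditions without parameters ('C2') get NaN
joined = data.join(condition_properties, on='condition_id')

# Inner join: keep only trials whose condition appears in the condition table
joined_inner = data.join(condition_properties, on='condition_id', how='inner')

print(joined)
print(len(joined), len(joined_inner))
```

Compared with the dictionary approach, all parameters of a condition now live in one row, and adding a parameter means adding one column to the condition table.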
Anti-join: filter out unwanted data
In [5]:
# We are given a list of subjects that are outliers and should be disregarded in the analysis
outliers = pd.DataFrame([['711'], ['712'], ['713'], ['714'], ['888']], columns=['subject_id'])
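An anti-join keeps the rows of one table whose key has no match in another. Below is a minimal sketch of two common ways to write it in pandas, assuming `subject_id` is the key: a boolean mask built with `Series.isin`, and a left `merge` with `indicator=True` that keeps only the rows flagged `'left_only'`. Subject `'888'` is in the outlier list but not in the data, which is harmless for an anti-join. The names `clean_isin` and `clean_merge` are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    data=[
        ['312', 'A1', 0.12, 'LEFT'],
        ['312', 'A2', 0.37, 'LEFT'],
        ['312', 'C2', 0.68, 'LEFT'],
        ['711', 'A1', 4.01, 'RIGHT'],
        ['711', 'A2', 0.44, 'LEFT'],
        ['313', 'A1', 0.07, 'RIGHT'],
        ['313', 'B1', 0.08, 'RIGHT'],
        ['712', 'A2', 3.29, 'LEFT'],
        ['314', 'A2', 0.29, 'LEFT'],
        ['714', 'B2', 3.32, 'RIGHT'],
        ['314', 'B1', 0.14, 'RIGHT'],
        ['314', 'C2', 0.73, 'RIGHT'],
        ['713', 'B1', 5.74, 'LEFT'],
    ],
    columns=['subject_id', 'condition_id', 'response_time', 'response'],
)
outliers = pd.DataFrame([['711'], ['712'], ['713'], ['714'], ['888']], columns=['subject_id'])

# Option 1: keep rows whose subject_id does NOT appear in the outlier table
clean_isin = data[~data['subject_id'].isin(outliers['subject_id'])]

# Option 2: a left merge with indicator=True adds a '_merge' column that is
# 'left_only' for rows with no match in `outliers`; keep those, then drop the flag
merged = data.merge(outliers, on='subject_id', how='left', indicator=True)
clean_merge = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(clean_isin)
```

The `isin` mask is the shorter spelling when the key is a single column; the `merge` form extends to keys spanning several columns by passing a list to `on`.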