1908 lines
58 KiB
Text
1908 lines
58 KiB
Text
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f11a76bf",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Exercise on Joins and anti-joins: add information from other tables"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "b6f2742b",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"# Set some Pandas options: maximum number of rows/columns it's going to display\n",
|
||
"#pd.set_option('display.max_rows', 1000)\n",
|
||
"#pd.set_option('display.max_columns', 100)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2967c84e",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Load data from clinical trial\n",
|
||
"\n",
|
||
"Data comes in two different files. The file `predimed_records.csv` file contains the clinical data for each patient, except which diet group they were assigned. The file `predimed_mapping.csv` contain the information of which patient was assigned to which diet group. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "ed626ee3",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>smoke</th>\n",
|
||
" <th>bmi</th>\n",
|
||
" <th>waist</th>\n",
|
||
" <th>wth</th>\n",
|
||
" <th>htn</th>\n",
|
||
" <th>diab</th>\n",
|
||
" <th>hyperchol</th>\n",
|
||
" <th>famhist</th>\n",
|
||
" <th>hormo</th>\n",
|
||
" <th>p14</th>\n",
|
||
" <th>toevent</th>\n",
|
||
" <th>event</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>436</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>58</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>33.53</td>\n",
|
||
" <td>122</td>\n",
|
||
" <td>0.753086</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>5.374401</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>1130</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Current</td>\n",
|
||
" <td>31.05</td>\n",
|
||
" <td>119</td>\n",
|
||
" <td>0.730061</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>6.097194</td>\n",
|
||
" <td>No</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1131</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>72</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>30.86</td>\n",
|
||
" <td>106</td>\n",
|
||
" <td>0.654321</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>5.946612</td>\n",
|
||
" <td>No</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1132</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>71</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>27.68</td>\n",
|
||
" <td>118</td>\n",
|
||
" <td>0.694118</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2.907598</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1111</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>79</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>35.94</td>\n",
|
||
" <td>129</td>\n",
|
||
" <td>0.806250</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>4.761123</td>\n",
|
||
" <td>No</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" patient-id location-id sex age smoke bmi waist wth htn \\\n",
|
||
"0 436 4 Male 58 Former 33.53 122 0.753086 No \n",
|
||
"1 1130 4 Male 77 Current 31.05 119 0.730061 Yes \n",
|
||
"2 1131 4 Female 72 Former 30.86 106 0.654321 No \n",
|
||
"3 1132 4 Male 71 Former 27.68 118 0.694118 Yes \n",
|
||
"4 1111 2 Female 79 Never 35.94 129 0.806250 Yes \n",
|
||
"\n",
|
||
" diab hyperchol famhist hormo p14 toevent event \n",
|
||
"0 No Yes No No 10 5.374401 Yes \n",
|
||
"1 Yes No No No 10 6.097194 No \n",
|
||
"2 Yes No Yes No 8 5.946612 No \n",
|
||
"3 No Yes No No 8 2.907598 Yes \n",
|
||
"4 No Yes No No 9 4.761123 No "
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('../../data/predimed_records.csv')\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "48d5375f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>group</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>885</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>182</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>971</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>691</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>632</td>\n",
|
||
" <td>Control</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" location-id patient-id group\n",
|
||
"0 2 885 MedDiet + VOO\n",
|
||
"1 1 182 MedDiet + Nuts\n",
|
||
"2 1 971 MedDiet + Nuts\n",
|
||
"3 2 691 MedDiet + Nuts\n",
|
||
"4 2 632 Control"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"info = pd.read_csv('../../data/predimed_mapping.csv')\n",
|
||
"info.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2b4b98ed-d7ec-4b7c-b983-adc616d2f16f",
|
||
"metadata": {},
|
||
"source": [
|
||
"There were 5 different locations where the study was conducted, each one gave an identification number `patient-id` to each participant."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "b9dbc492-1489-4530-96ac-5f33f7389caa",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([2, 1, 3, 4, 5])"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"info['location-id'].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2fef4d37",
|
||
"metadata": {},
|
||
"source": [
|
||
"# 1. Add diet information to the patients' records\n",
|
||
"\n",
|
||
"* For how many patients do we have clinical information? (i.e., rows in `df`)\n",
|
||
"* For how many patients do we have diet information? (i.e., rows in `info`)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "861ac334-14ce-490a-b3c4-877b32789f3e",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6324"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "14f57842-5722-4953-88d6-d7cf3070400c",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6287"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(info)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3f23fa17-af3e-41c3-883f-3e1279d4820e",
|
||
"metadata": {},
|
||
"source": [
|
||
"Perform the merge, keeping in mind that it only make sense to analyze patients with the diet information. \n",
|
||
"* Which type of merge would you do? \n",
|
||
"* For how many patients do we have full information (records and which diet they followed? "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "35e19a53",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_with_info = df.merge(info, on=['patient-id', 'location-id'], how='right')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "eac1244f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"patient-id 6287\n",
|
||
"location-id 6287\n",
|
||
"sex 6287\n",
|
||
"age 6287\n",
|
||
"smoke 6287\n",
|
||
"bmi 6287\n",
|
||
"waist 6287\n",
|
||
"wth 6287\n",
|
||
"htn 6287\n",
|
||
"diab 6287\n",
|
||
"hyperchol 6287\n",
|
||
"famhist 6287\n",
|
||
"hormo 5629\n",
|
||
"p14 6287\n",
|
||
"toevent 6287\n",
|
||
"event 6287\n",
|
||
"group 6287\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_with_info.count()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "0d770d69-7a0a-47c9-bb5d-93a11329e7ad",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>smoke</th>\n",
|
||
" <th>bmi</th>\n",
|
||
" <th>waist</th>\n",
|
||
" <th>wth</th>\n",
|
||
" <th>htn</th>\n",
|
||
" <th>diab</th>\n",
|
||
" <th>hyperchol</th>\n",
|
||
" <th>famhist</th>\n",
|
||
" <th>hormo</th>\n",
|
||
" <th>p14</th>\n",
|
||
" <th>toevent</th>\n",
|
||
" <th>event</th>\n",
|
||
" <th>group</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>885</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>74</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>29.94</td>\n",
|
||
" <td>107</td>\n",
|
||
" <td>0.681529</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>5.711157</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>182</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>30.76</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>0.555556</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>3.274470</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>971</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>23.81</td>\n",
|
||
" <td>86</td>\n",
|
||
" <td>0.540881</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>3.088296</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>691</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>64</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>32.70</td>\n",
|
||
" <td>102</td>\n",
|
||
" <td>0.637500</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3.028063</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>632</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>73</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>28.32</td>\n",
|
||
" <td>91</td>\n",
|
||
" <td>0.594771</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>5.919233</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Control</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6282</th>\n",
|
||
" <td>855</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>29.77</td>\n",
|
||
" <td>106</td>\n",
|
||
" <td>0.612717</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>3.449692</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6283</th>\n",
|
||
" <td>711</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>78</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>34.72</td>\n",
|
||
" <td>104</td>\n",
|
||
" <td>0.712329</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>1.921971</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Control</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6284</th>\n",
|
||
" <td>113</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>31.48</td>\n",
|
||
" <td>98</td>\n",
|
||
" <td>0.640523</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>6.403833</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6285</th>\n",
|
||
" <td>50</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>28.92</td>\n",
|
||
" <td>92</td>\n",
|
||
" <td>0.625850</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>5.360712</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6286</th>\n",
|
||
" <td>85</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>61</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>37.50</td>\n",
|
||
" <td>106</td>\n",
|
||
" <td>0.662500</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>1.823409</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6287 rows × 17 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" patient-id location-id sex age smoke bmi waist wth \\\n",
|
||
"0 885 2 Male 74 Former 29.94 107 0.681529 \n",
|
||
"1 182 1 Female 60 Former 30.76 85 0.555556 \n",
|
||
"2 971 1 Female 65 Never 23.81 86 0.540881 \n",
|
||
"3 691 2 Female 64 Never 32.70 102 0.637500 \n",
|
||
"4 632 2 Female 73 Never 28.32 91 0.594771 \n",
|
||
"... ... ... ... ... ... ... ... ... \n",
|
||
"6282 855 1 Male 55 Former 29.77 106 0.612717 \n",
|
||
"6283 711 4 Female 78 Never 34.72 104 0.712329 \n",
|
||
"6284 113 4 Female 60 Never 31.48 98 0.640523 \n",
|
||
"6285 50 5 Female 77 Never 28.92 92 0.625850 \n",
|
||
"6286 85 4 Female 61 Never 37.50 106 0.662500 \n",
|
||
"\n",
|
||
" htn diab hyperchol famhist hormo p14 toevent event group \n",
|
||
"0 Yes Yes Yes No NaN 8 5.711157 No MedDiet + VOO \n",
|
||
"1 No No Yes Yes Yes 10 3.274470 No MedDiet + Nuts \n",
|
||
"2 Yes Yes Yes No No 6 3.088296 No MedDiet + Nuts \n",
|
||
"3 Yes Yes No No No 8 3.028063 No MedDiet + Nuts \n",
|
||
"4 Yes Yes Yes Yes No 9 5.919233 No Control \n",
|
||
"... ... ... ... ... ... ... ... ... ... \n",
|
||
"6282 Yes No Yes Yes No 9 3.449692 No MedDiet + VOO \n",
|
||
"6283 Yes No Yes No No 9 1.921971 No Control \n",
|
||
"6284 No Yes Yes No No 10 6.403833 No MedDiet + VOO \n",
|
||
"6285 Yes No Yes No No 11 5.360712 No MedDiet + Nuts \n",
|
||
"6286 Yes No Yes Yes No 9 1.823409 No MedDiet + Nuts \n",
|
||
"\n",
|
||
"[6287 rows x 17 columns]"
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_with_info"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "946beb08-30a5-4020-8612-360385cdfc1e",
|
||
"metadata": {},
|
||
"source": [
|
||
"# 2. Add location information to the patients' records\n",
|
||
"\n",
|
||
"There were five locations where the study was conducted. Here is a DataFrame containing the information of each location. \n",
|
||
"\n",
|
||
"- Add a new column to the dataset that contains the city where each patient was recorded.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"id": "36ce0688-d421-4a07-b00e-0e9b3201f0e0",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>City</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Valencia</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Barcelona</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Bilbao</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" location-id City\n",
|
||
"0 1 Madrid\n",
|
||
"1 2 Valencia\n",
|
||
"2 3 Barcelona\n",
|
||
"3 4 Bilbao\n",
|
||
"4 5 Malaga"
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"locations = pd.DataFrame.from_dict({'location-id': [1, 2, 3, 4, 5], \n",
|
||
" 'City': ['Madrid', 'Valencia', 'Barcelona', 'Bilbao','Malaga']})\n",
|
||
"locations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"id": "b636dde4-129a-4dd1-8cbf-c539c9c8a5f2",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>smoke</th>\n",
|
||
" <th>bmi</th>\n",
|
||
" <th>waist</th>\n",
|
||
" <th>wth</th>\n",
|
||
" <th>htn</th>\n",
|
||
" <th>diab</th>\n",
|
||
" <th>hyperchol</th>\n",
|
||
" <th>famhist</th>\n",
|
||
" <th>hormo</th>\n",
|
||
" <th>p14</th>\n",
|
||
" <th>toevent</th>\n",
|
||
" <th>event</th>\n",
|
||
" <th>group</th>\n",
|
||
" <th>City</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>182</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>30.76</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>0.555556</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>3.274470</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>971</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>23.81</td>\n",
|
||
" <td>86</td>\n",
|
||
" <td>0.540881</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>3.088296</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>485</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>71</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>22.41</td>\n",
|
||
" <td>92</td>\n",
|
||
" <td>0.516854</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>0.438056</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Control</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>621</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>71</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>32.70</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>0.662651</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>2.661191</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>954</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>29.48</td>\n",
|
||
" <td>107</td>\n",
|
||
" <td>0.633136</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4.186174</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6282</th>\n",
|
||
" <td>1120</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>63</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>28.83</td>\n",
|
||
" <td>92</td>\n",
|
||
" <td>0.534884</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>3.392197</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6283</th>\n",
|
||
" <td>1082</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>63</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>25.33</td>\n",
|
||
" <td>92</td>\n",
|
||
" <td>0.613333</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>5.886379</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6284</th>\n",
|
||
" <td>302</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>59</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>31.71</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>0.519481</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>5.631759</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6285</th>\n",
|
||
" <td>566</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>58</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>27.81</td>\n",
|
||
" <td>98</td>\n",
|
||
" <td>0.583333</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>1.913758</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6286</th>\n",
|
||
" <td>50</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>28.92</td>\n",
|
||
" <td>92</td>\n",
|
||
" <td>0.625850</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>5.360712</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6287 rows × 18 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" patient-id location-id sex age smoke bmi waist wth \\\n",
|
||
"0 182 1 Female 60 Former 30.76 85 0.555556 \n",
|
||
"1 971 1 Female 65 Never 23.81 86 0.540881 \n",
|
||
"2 485 1 Male 71 Former 22.41 92 0.516854 \n",
|
||
"3 621 1 Male 71 Former 32.70 110 0.662651 \n",
|
||
"4 954 1 Male 70 Former 29.48 107 0.633136 \n",
|
||
"... ... ... ... ... ... ... ... ... \n",
|
||
"6282 1120 5 Male 63 Never 28.83 92 0.534884 \n",
|
||
"6283 1082 5 Female 63 Never 25.33 92 0.613333 \n",
|
||
"6284 302 5 Female 59 Never 31.71 80 0.519481 \n",
|
||
"6285 566 5 Male 58 Former 27.81 98 0.583333 \n",
|
||
"6286 50 5 Female 77 Never 28.92 92 0.625850 \n",
|
||
"\n",
|
||
" htn diab hyperchol famhist hormo p14 toevent event group \\\n",
|
||
"0 No No Yes Yes Yes 10 3.274470 No MedDiet + Nuts \n",
|
||
"1 Yes Yes Yes No No 6 3.088296 No MedDiet + Nuts \n",
|
||
"2 Yes Yes Yes No No 9 0.438056 No Control \n",
|
||
"3 No Yes Yes No No 11 2.661191 No MedDiet + Nuts \n",
|
||
"4 No Yes Yes No No 8 4.186174 No MedDiet + Nuts \n",
|
||
"... ... ... ... ... ... ... ... ... ... \n",
|
||
"6282 Yes No Yes No NaN 9 3.392197 No MedDiet + VOO \n",
|
||
"6283 Yes No Yes No No 5 5.886379 No MedDiet + Nuts \n",
|
||
"6284 No Yes Yes No No 8 5.631759 No MedDiet + VOO \n",
|
||
"6285 Yes No Yes No No 10 1.913758 No MedDiet + VOO \n",
|
||
"6286 Yes No Yes No No 11 5.360712 No MedDiet + Nuts \n",
|
||
"\n",
|
||
" City \n",
|
||
"0 Madrid \n",
|
||
"1 Madrid \n",
|
||
"2 Madrid \n",
|
||
"3 Madrid \n",
|
||
"4 Madrid \n",
|
||
"... ... \n",
|
||
"6282 Malaga \n",
|
||
"6283 Malaga \n",
|
||
"6284 Malaga \n",
|
||
"6285 Malaga \n",
|
||
"6286 Malaga \n",
|
||
"\n",
|
||
"[6287 rows x 18 columns]"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_with_info = df_with_info.merge(locations, on='location-id', how='right')\n",
|
||
"df_with_info"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "44031178",
|
||
"metadata": {},
|
||
"source": [
|
||
"# 3. Remove drops from table\n",
|
||
"\n",
|
||
"Some patients drop from the study early on and they should be removed from our analysis. Their IDS are stored in file `dropped.csv`.\n",
|
||
"1. Load the list of patients who droped, from `dropped.csv`\n",
|
||
"2. Use an anti-join to remove them from the table\n",
|
||
"3. How many patients (rows) are left in the data?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "d1d4cc27",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dropped = pd.read_csv('dropped.csv')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "fbebbd97",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(42, 2)"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dropped.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "8a3c7943",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>217</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1147</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1170</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>627</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>541</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" location-id patient-id\n",
|
||
"0 1 217\n",
|
||
"1 1 1147\n",
|
||
"2 1 1170\n",
|
||
"3 1 627\n",
|
||
"4 4 541"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dropped.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"id": "573687e7",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>smoke</th>\n",
|
||
" <th>bmi</th>\n",
|
||
" <th>waist</th>\n",
|
||
" <th>wth</th>\n",
|
||
" <th>htn</th>\n",
|
||
" <th>diab</th>\n",
|
||
" <th>hyperchol</th>\n",
|
||
" <th>famhist</th>\n",
|
||
" <th>hormo</th>\n",
|
||
" <th>p14</th>\n",
|
||
" <th>toevent</th>\n",
|
||
" <th>event</th>\n",
|
||
" <th>group</th>\n",
|
||
" <th>City</th>\n",
|
||
" <th>_merge</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>25.92</td>\n",
|
||
" <td>94</td>\n",
|
||
" <td>0.657343</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>5.538672</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>68</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>34.85</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>0.949367</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>3.063655</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>37.50</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>0.750000</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>5.590691</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>29.26</td>\n",
|
||
" <td>93</td>\n",
|
||
" <td>0.628378</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>5.456537</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>5</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>30.02</td>\n",
|
||
" <td>104</td>\n",
|
||
" <td>0.662420</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>2.746064</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Control</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6282</th>\n",
|
||
" <td>1253</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>79</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>25.28</td>\n",
|
||
" <td>105</td>\n",
|
||
" <td>0.640244</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>5.828884</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6283</th>\n",
|
||
" <td>1254</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>62</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>27.10</td>\n",
|
||
" <td>104</td>\n",
|
||
" <td>0.594286</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>5.067762</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6284</th>\n",
|
||
" <td>1255</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>35.02</td>\n",
|
||
" <td>103</td>\n",
|
||
" <td>0.686667</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>1.993155</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6285</th>\n",
|
||
" <td>1256</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>61</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>28.42</td>\n",
|
||
" <td>94</td>\n",
|
||
" <td>0.576687</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>2.039699</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6286</th>\n",
|
||
" <td>1257</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Male</td>\n",
|
||
" <td>58</td>\n",
|
||
" <td>Former</td>\n",
|
||
" <td>24.43</td>\n",
|
||
" <td>93</td>\n",
|
||
" <td>0.547059</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>2.590007</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Malaga</td>\n",
|
||
" <td>left_only</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6287 rows × 19 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" patient-id location-id sex age smoke bmi waist wth \\\n",
|
||
"0 1 1 Female 77 Never 25.92 94 0.657343 \n",
|
||
"1 2 1 Female 68 Never 34.85 150 0.949367 \n",
|
||
"2 3 1 Female 66 Never 37.50 120 0.750000 \n",
|
||
"3 4 1 Female 77 Never 29.26 93 0.628378 \n",
|
||
"4 5 1 Female 60 Never 30.02 104 0.662420 \n",
|
||
"... ... ... ... ... ... ... ... ... \n",
|
||
"6282 1253 5 Male 79 Never 25.28 105 0.640244 \n",
|
||
"6283 1254 5 Male 62 Former 27.10 104 0.594286 \n",
|
||
"6284 1255 5 Female 65 Never 35.02 103 0.686667 \n",
|
||
"6285 1256 5 Male 61 Never 28.42 94 0.576687 \n",
|
||
"6286 1257 5 Male 58 Former 24.43 93 0.547059 \n",
|
||
"\n",
|
||
" htn diab hyperchol famhist hormo p14 toevent event group \\\n",
|
||
"0 Yes No Yes Yes No 9 5.538672 No MedDiet + VOO \n",
|
||
"1 Yes No Yes Yes NaN 10 3.063655 No MedDiet + Nuts \n",
|
||
"2 Yes Yes No No No 6 5.590691 No MedDiet + Nuts \n",
|
||
"3 Yes Yes No No No 6 5.456537 No MedDiet + VOO \n",
|
||
"4 Yes No Yes No No 9 2.746064 No Control \n",
|
||
"... ... ... ... ... ... ... ... ... ... \n",
|
||
"6282 Yes No Yes No No 8 5.828884 No MedDiet + VOO \n",
|
||
"6283 Yes No Yes Yes No 9 5.067762 No MedDiet + Nuts \n",
|
||
"6284 Yes No Yes No No 10 1.993155 No MedDiet + VOO \n",
|
||
"6285 Yes Yes No No No 9 2.039699 No MedDiet + Nuts \n",
|
||
"6286 Yes Yes Yes No No 9 2.590007 No MedDiet + Nuts \n",
|
||
"\n",
|
||
" City _merge \n",
|
||
"0 Madrid left_only \n",
|
||
"1 Madrid left_only \n",
|
||
"2 Madrid left_only \n",
|
||
"3 Madrid left_only \n",
|
||
"4 Madrid left_only \n",
|
||
"... ... ... \n",
|
||
"6282 Malaga left_only \n",
|
||
"6283 Malaga left_only \n",
|
||
"6284 Malaga left_only \n",
|
||
"6285 Malaga left_only \n",
|
||
"6286 Malaga left_only \n",
|
||
"\n",
|
||
"[6287 rows x 19 columns]"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp = df_with_info.merge(dropped, on=['location-id', 'patient-id'], how='outer', indicator=True)\n",
|
||
"temp"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"id": "a4a6574b",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_without_dropped = temp[temp['_merge'] == 'left_only'].drop('_merge', axis=1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"id": "8fd89a40",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(6245, 18)"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_without_dropped.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"id": "07f4776a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>patient-id</th>\n",
|
||
" <th>location-id</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>smoke</th>\n",
|
||
" <th>bmi</th>\n",
|
||
" <th>waist</th>\n",
|
||
" <th>wth</th>\n",
|
||
" <th>htn</th>\n",
|
||
" <th>diab</th>\n",
|
||
" <th>hyperchol</th>\n",
|
||
" <th>famhist</th>\n",
|
||
" <th>hormo</th>\n",
|
||
" <th>p14</th>\n",
|
||
" <th>toevent</th>\n",
|
||
" <th>event</th>\n",
|
||
" <th>group</th>\n",
|
||
" <th>City</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>25.92</td>\n",
|
||
" <td>94</td>\n",
|
||
" <td>0.657343</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>5.538672</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>68</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>34.85</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>0.949367</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>3.063655</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>37.50</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>0.750000</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>5.590691</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + Nuts</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>29.26</td>\n",
|
||
" <td>93</td>\n",
|
||
" <td>0.628378</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>5.456537</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>MedDiet + VOO</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>5</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Female</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>Never</td>\n",
|
||
" <td>30.02</td>\n",
|
||
" <td>104</td>\n",
|
||
" <td>0.662420</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Yes</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>2.746064</td>\n",
|
||
" <td>No</td>\n",
|
||
" <td>Control</td>\n",
|
||
" <td>Madrid</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" patient-id location-id sex age smoke bmi waist wth htn \\\n",
|
||
"0 1 1 Female 77 Never 25.92 94 0.657343 Yes \n",
|
||
"1 2 1 Female 68 Never 34.85 150 0.949367 Yes \n",
|
||
"2 3 1 Female 66 Never 37.50 120 0.750000 Yes \n",
|
||
"3 4 1 Female 77 Never 29.26 93 0.628378 Yes \n",
|
||
"4 5 1 Female 60 Never 30.02 104 0.662420 Yes \n",
|
||
"\n",
|
||
" diab hyperchol famhist hormo p14 toevent event group City \n",
|
||
"0 No Yes Yes No 9 5.538672 No MedDiet + VOO Madrid \n",
|
||
"1 No Yes Yes NaN 10 3.063655 No MedDiet + Nuts Madrid \n",
|
||
"2 Yes No No No 6 5.590691 No MedDiet + Nuts Madrid \n",
|
||
"3 Yes No No No 6 5.456537 No MedDiet + VOO Madrid \n",
|
||
"4 No Yes No No 9 2.746064 No Control Madrid "
|
||
]
|
||
},
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_without_dropped.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "84270332",
|
||
"metadata": {},
|
||
"source": [
|
||
"# 4. Save final result in `processed_data_predimed.csv`\n",
|
||
"\n",
|
||
"1. Using the `.to_csv` method of Pandas DataFrames"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"id": "85902eea",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_without_dropped.to_csv('processed_data_predimed.csv', index=None)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "c7bcff45",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.5"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|