{ "cells": [ { "cell_type": "markdown", "id": "6f6aa857", "metadata": {}, "source": [ "# Exercise (window functions): compute the cumulative number of cases over time, per diet group\n", "\n", "The variable `toevent` contains the time for which each patient was followed up. We want to calculate the number of events as a function of follow-up time, separately for each diet group. We expect that, if the Mediterranean diet has an effect, more cases will accumulate over time in the control group than in the other diet groups.\n", "\n", "Here is how to proceed:\n", "- Use a window function to compute the cumulative number of events separately for each diet group. Because we are interested in follow-up time, first sort the rows by follow-up time (`toevent`), and then calculate the cumulative sum of `event` within each group.\n", "- Add the result as a new column called `'cumulative_event_count'`.\n", "\n", "With your new awesome vectorization skills, these two steps should take only one line!\n", "\n", "When you are ready, run the code at the end, which already creates a visualization of the cumulative number of events per group as a function of follow-up time." ] }, { "cell_type": "code", "execution_count": 1, "id": "8f9bc8b1", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "1be11d54", "metadata": {}, "source": [ "### Load patient data" ] }, { "cell_type": "code", "execution_count": 2, "id": "8dfc3020", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | patient-id | location-id | sex | age | smoke | bmi | waist | wth | htn | diab | hyperchol | famhist | hormo | p14 | toevent | event | group | City |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Female | 77 | Never | 25.92 | 94 | 0.657343 | Yes | No | Yes | Yes | No | 9 | 5.538672 | 0 | MedDiet + VOO | Madrid |
1 | 2 | 1 | Female | 68 | Never | 34.85 | 150 | 0.949367 | Yes | No | Yes | Yes | NaN | 10 | 3.063655 | 0 | MedDiet + Nuts | Madrid |
2 | 3 | 1 | Female | 66 | Never | 37.50 | 120 | 0.750000 | Yes | Yes | No | No | No | 6 | 5.590691 | 0 | MedDiet + Nuts | Madrid |
3 | 4 | 1 | Female | 77 | Never | 29.26 | 93 | 0.628378 | Yes | Yes | No | No | No | 6 | 5.456537 | 0 | MedDiet + VOO | Madrid |
4 | 5 | 1 | Female | 60 | Never | 30.02 | 104 | 0.662420 | Yes | No | Yes | No | No | 9 | 2.746064 | 0 | Control | Madrid |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6240 | 1253 | 5 | Male | 79 | Never | 25.28 | 105 | 0.640244 | Yes | No | Yes | No | No | 8 | 5.828884 | 0 | MedDiet + VOO | Malaga |
6241 | 1254 | 5 | Male | 62 | Former | 27.10 | 104 | 0.594286 | Yes | No | Yes | Yes | No | 9 | 5.067762 | 0 | MedDiet + Nuts | Malaga |
6242 | 1255 | 5 | Female | 65 | Never | 35.02 | 103 | 0.686667 | Yes | No | Yes | No | No | 10 | 1.993155 | 0 | MedDiet + VOO | Malaga |
6243 | 1256 | 5 | Male | 61 | Never | 28.42 | 94 | 0.576687 | Yes | Yes | No | No | No | 9 | 2.039699 | 0 | MedDiet + Nuts | Malaga |
6244 | 1257 | 5 | Male | 58 | Former | 24.43 | 93 | 0.547059 | Yes | Yes | Yes | No | No | 9 | 2.590007 | 0 | MedDiet + Nuts | Malaga | \n", "
6245 rows × 18 columns
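A minimal sketch of the one-line solution, assuming the patient table is held in a pandas DataFrame named `df` (a hypothetical name; the notebook's own variable is not visible here) with the `toevent`, `event` and `group` columns shown above. A small made-up frame stands in for the study data so the snippet runs on its own.

```python
import pandas as pd

# Toy stand-in for the patient table (hypothetical values, not study data)
df = pd.DataFrame({
    'toevent': [5.5, 3.1, 5.6, 5.4, 2.7, 1.0],
    'event':   [0, 1, 1, 1, 1, 0],
    'group':   ['MedDiet + VOO', 'MedDiet + Nuts', 'MedDiet + Nuts',
                'MedDiet + VOO', 'Control', 'Control'],
})

# Sort by follow-up time, then take a running sum of `event` within each group.
# cumsum() keeps the original index labels, so the assignment puts every value
# back on its own row even though the computation ran on the sorted frame.
df['cumulative_event_count'] = df.sort_values('toevent').groupby('group')['event'].cumsum()

print(df)
```

Because the running sum is computed on the sorted view but assigned by index label, the original row order is kept, which is consistent with the unsorted rows and their `cumulative_event_count` values in the next table. If two patients in the same group share a `toevent` value, their individual running counts depend on how the tie is ordered; `sort_values('toevent', kind='stable')` at least makes that order deterministic.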
\n", " | patient-id | location-id | sex | age | smoke | bmi | waist | wth | htn | diab | hyperchol | famhist | hormo | p14 | toevent | event | group | City | cumulative_event_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Female | 77 | Never | 25.92 | 94 | 0.657343 | Yes | No | Yes | Yes | No | 9 | 5.538672 | 0 | MedDiet + VOO | Madrid | 73 |
1 | 2 | 1 | Female | 68 | Never | 34.85 | 150 | 0.949367 | Yes | No | Yes | Yes | NaN | 10 | 3.063655 | 0 | MedDiet + Nuts | Madrid | 35 |
2 | 3 | 1 | Female | 66 | Never | 37.50 | 120 | 0.750000 | Yes | Yes | No | No | No | 6 | 5.590691 | 0 | MedDiet + Nuts | Madrid | 61 |
3 | 4 | 1 | Female | 77 | Never | 29.26 | 93 | 0.628378 | Yes | Yes | No | No | No | 6 | 5.456537 | 0 | MedDiet + VOO | Madrid | 73 |
4 | 5 | 1 | Female | 60 | Never | 30.02 | 104 | 0.662420 | Yes | No | Yes | No | No | 9 | 2.746064 | 0 | Control | Madrid | 50 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6240 | 1253 | 5 | Male | 79 | Never | 25.28 | 105 | 0.640244 | Yes | No | Yes | No | No | 8 | 5.828884 | 0 | MedDiet + VOO | Malaga | 74 |
6241 | 1254 | 5 | Male | 62 | Former | 27.10 | 104 | 0.594286 | Yes | No | Yes | Yes | No | 9 | 5.067762 | 0 | MedDiet + Nuts | Malaga | 57 |
6242 | 1255 | 5 | Female | 65 | Never | 35.02 | 103 | 0.686667 | Yes | No | Yes | No | No | 10 | 1.993155 | 0 | MedDiet + VOO | Malaga | 27 |
6243 | 1256 | 5 | Male | 61 | Never | 28.42 | 94 | 0.576687 | Yes | Yes | No | No | No | 9 | 2.039699 | 0 | MedDiet + Nuts | Malaga | 16 |
6244 | 1257 | 5 | Male | 58 | Former | 24.43 | 93 | 0.547059 | Yes | Yes | Yes | No | No | 9 | 2.590007 | 0 | MedDiet + Nuts | Malaga | 27 | \n", "
6245 rows × 19 columns
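The next table adds a column `N` whose value is constant within each diet group in the rows shown (2152 for MedDiet + VOO, 2077 for MedDiet + Nuts, 2016 for Control), and those three values add up to the 6245 rows. That pattern is consistent with the group size being broadcast back onto every row. Assuming that is what `N` holds, a sketch using `groupby(...).transform('size')` on a toy frame (hypothetical values):

```python
import pandas as pd

# Toy stand-in for the patient table (hypothetical values)
df = pd.DataFrame({
    'group': ['MedDiet + VOO', 'MedDiet + Nuts', 'MedDiet + Nuts',
              'Control', 'Control', 'Control'],
    'event': [0, 1, 0, 1, 0, 1],
})

# transform('size') computes one size per group and repeats it on every row
# of that group, so the result lines up with df's index
df['N'] = df.groupby('group')['event'].transform('size')

print(df['N'].tolist())
```

An equivalent alternative is `df['group'].map(df['group'].value_counts())`, which looks up each row's group in a table of group counts.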
\n", " | patient-id | location-id | sex | age | smoke | bmi | waist | wth | htn | diab | hyperchol | famhist | hormo | p14 | toevent | event | group | City | cumulative_event_count | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Female | 77 | Never | 25.92 | 94 | 0.657343 | Yes | No | Yes | Yes | No | 9 | 5.538672 | 0 | MedDiet + VOO | Madrid | 73 | 2152 |
1 | 2 | 1 | Female | 68 | Never | 34.85 | 150 | 0.949367 | Yes | No | Yes | Yes | NaN | 10 | 3.063655 | 0 | MedDiet + Nuts | Madrid | 35 | 2077 |
2 | 3 | 1 | Female | 66 | Never | 37.50 | 120 | 0.750000 | Yes | Yes | No | No | No | 6 | 5.590691 | 0 | MedDiet + Nuts | Madrid | 61 | 2077 |
3 | 4 | 1 | Female | 77 | Never | 29.26 | 93 | 0.628378 | Yes | Yes | No | No | No | 6 | 5.456537 | 0 | MedDiet + VOO | Madrid | 73 | 2152 |
4 | 5 | 1 | Female | 60 | Never | 30.02 | 104 | 0.662420 | Yes | No | Yes | No | No | 9 | 2.746064 | 0 | Control | Madrid | 50 | 2016 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6240 | 1253 | 5 | Male | 79 | Never | 25.28 | 105 | 0.640244 | Yes | No | Yes | No | No | 8 | 5.828884 | 0 | MedDiet + VOO | Malaga | 74 | 2152 |
6241 | 1254 | 5 | Male | 62 | Former | 27.10 | 104 | 0.594286 | Yes | No | Yes | Yes | No | 9 | 5.067762 | 0 | MedDiet + Nuts | Malaga | 57 | 2077 |
6242 | 1255 | 5 | Female | 65 | Never | 35.02 | 103 | 0.686667 | Yes | No | Yes | No | No | 10 | 1.993155 | 0 | MedDiet + VOO | Malaga | 27 | 2152 |
6243 | 1256 | 5 | Male | 61 | Never | 28.42 | 94 | 0.576687 | Yes | Yes | No | No | No | 9 | 2.039699 | 0 | MedDiet + Nuts | Malaga | 16 | 2077 |
6244 | 1257 | 5 | Male | 58 | Former | 24.43 | 93 | 0.547059 | Yes | Yes | Yes | No | No | 9 | 2.590007 | 0 | MedDiet + Nuts | Malaga | 27 | 2077 | \n", "
6245 rows × 20 columns
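Having both a running count and a group size on every row makes it possible to compare groups of different sizes on a common scale. One option, offered here as an assumption rather than as what the plotting code at the end actually does, is to divide the two; `cumulative_event_fraction` is a hypothetical column name and the data below are made up.

```python
import pandas as pd

# Toy stand-in for the patient table (hypothetical values)
df = pd.DataFrame({
    'toevent': [1.0, 2.0, 3.0, 4.0, 5.0],
    'event':   [1, 0, 1, 1, 0],
    'group':   ['Control', 'Control', 'MedDiet + VOO', 'Control', 'MedDiet + VOO'],
})

# Running event count per group, ordered by follow-up time
df['cumulative_event_count'] = (
    df.sort_values('toevent').groupby('group')['event'].cumsum()
)
# Group size repeated on every row of the group
df['N'] = df.groupby('group')['event'].transform('size')

# Hypothetical column: fraction of the group with an event by this follow-up time
df['cumulative_event_fraction'] = df['cumulative_event_count'] / df['N']

print(df)
```

Plotting this fraction against `toevent` per group keeps a larger group from looking worse simply because it has more patients.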
\n", "