{ "cells": [ { "cell_type": "markdown", "id": "6f6aa857", "metadata": {}, "source": [ "# Exercise on window functions: compute the cumulative number of cases across time, per diet group\n", "\n", "The variable `toevent` contains the length of time for which patients were followed up. We want to calculate the number of events as a function of the follow-up time, separately for each diet group. We expect that, if the Mediterranean diet has an effect, then over time more cases will appear in the control group than in the other diet groups.\n", "\n", "Here is how to proceed:\n", "- Use a window function to compute the cumulative number of events for each diet group separately. As we are interested in the follow-up time, you need to sort the events by the follow-up time first (`toevent`), and then calculate the cumulative sum of events, separately per group.\n", "- Add the result as a new column called `'cumulative_event_count'`\n", "\n", "With your new awesome vectorization skills, these two steps should take only one line!\n", "\n", "When ready, execute the code at the end, which already creates a visualization of the cumulative number of events per group, as a function of the follow-up time." ] }, { "cell_type": "code", "execution_count": 1, "id": "8f9bc8b1", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "1be11d54", "metadata": {}, "source": [ "### Load patient data" ] }, { "cell_type": "code", "execution_count": 2, "id": "8dfc3020", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | patient-id | location-id | sex | age | smoke | bmi | waist | wth | htn | diab | hyperchol | famhist | hormo | p14 | toevent | event | group | City |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Female | 77 | Never | 25.92 | 94 | 0.657343 | Yes | No | Yes | Yes | No | 9 | 5.538672 | 0 | MedDiet + VOO | Madrid |
| 1 | 2 | 1 | Female | 68 | Never | 34.85 | 150 | 0.949367 | Yes | No | Yes | Yes | NaN | 10 | 3.063655 | 0 | MedDiet + Nuts | Madrid |
| 2 | 3 | 1 | Female | 66 | Never | 37.50 | 120 | 0.750000 | Yes | Yes | No | No | No | 6 | 5.590691 | 0 | MedDiet + Nuts | Madrid |
| 3 | 4 | 1 | Female | 77 | Never | 29.26 | 93 | 0.628378 | Yes | Yes | No | No | No | 6 | 5.456537 | 0 | MedDiet + VOO | Madrid |
| 4 | 5 | 1 | Female | 60 | Never | 30.02 | 104 | 0.662420 | Yes | No | Yes | No | No | 9 | 2.746064 | 0 | Control | Madrid |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6240 | 1253 | 5 | Male | 79 | Never | 25.28 | 105 | 0.640244 | Yes | No | Yes | No | No | 8 | 5.828884 | 0 | MedDiet + VOO | Malaga |
| 6241 | 1254 | 5 | Male | 62 | Former | 27.10 | 104 | 0.594286 | Yes | No | Yes | Yes | No | 9 | 5.067762 | 0 | MedDiet + Nuts | Malaga |
| 6242 | 1255 | 5 | Female | 65 | Never | 35.02 | 103 | 0.686667 | Yes | No | Yes | No | No | 10 | 1.993155 | 0 | MedDiet + VOO | Malaga |
| 6243 | 1256 | 5 | Male | 61 | Never | 28.42 | 94 | 0.576687 | Yes | Yes | No | No | No | 9 | 2.039699 | 0 | MedDiet + Nuts | Malaga |
| 6244 | 1257 | 5 | Male | 58 | Former | 24.43 | 93 | 0.547059 | Yes | Yes | Yes | No | No | 9 | 2.590007 | 0 | MedDiet + Nuts | Malaga |

6245 rows × 18 columns