Exercise: Analysis of tuberculosis cases by country and year period
In [1]:
import pandas as pd
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)
pd.set_option("display.max_colwidth", None)
Load the TB data from the World Health Organization
In [2]:
tb_raw = pd.read_csv('who2.csv', index_col='rownames')
Keep only the years 2000 to 2012 (inclusive) and only the country, year, and sp* columns
In [3]:
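# Identifier columns plus every column whose name starts with 'sp'
cols = ['country', 'year'] + [c for c in tb_raw.columns if c.startswith('sp')]
# between() is inclusive on both ends by default
tb_raw = tb_raw.loc[tb_raw['year'].between(2000, 2012), cols]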
In [4]:
tb_raw.shape
Out[4]:
In [5]:
tb_raw.sample(7, random_state=727)
Out[5]:
In [6]:
tb_raw[tb_raw['country'] == 'Angola']
Out[6]:
In [7]:
tb_raw.columns
Out[7]:
1. Make data tidy
The final table should have these columns: country, year, gender, age_range, cases
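One possible approach to the tidying step, shown on a toy table rather than the real data. It is only a sketch and assumes the sp columns are named like sp_m_014, i.e. method, gender, and age range separated by underscores; check the output of tb_raw.columns before relying on that. The toy frame, its values, and the helper column key are illustrative, not part of the WHO data. Dropping rows with missing counts is also a choice, not a requirement of the exercise.

```python
import pandas as pd

# Toy stand-in for tb_raw; column names and values are illustrative assumptions.
toy = pd.DataFrame({
    'country': ['A', 'A', 'B'],
    'year': [2000, 2007, 2000],
    'sp_m_014': [1.0, 2.0, None],
    'sp_f_014': [3.0, 4.0, 5.0],
    'sp_m_1524': [6.0, 7.0, 8.0],
})

# Wide to long: one row per (country, year, original column name).
tidy = toy.melt(id_vars=['country', 'year'], var_name='key', value_name='cases')

# Split e.g. 'sp_m_014' into its three parts: method, gender, age range.
parts = tidy['key'].str.split('_', expand=True)
tidy['gender'] = parts[1]
tidy['age_range'] = parts[2]

# Drop missing counts, drop the helper column, and fix the column order.
tidy = (
    tidy.dropna(subset=['cases'])
        .loc[:, ['country', 'year', 'gender', 'age_range', 'cases']]
        .reset_index(drop=True)
)
```

On the real data, the same steps would start from tb_raw instead of toy.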
In [ ]:
2. Compute summary tables
- Compute the number of cases per country and gender, for data between 2000 and 2006 (inclusive)
- Compute the number of cases with country and year range (2000-2006, 2007-2012) on rows and gender on columns
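Both summaries can be sketched with groupby and pivot_table. The example below runs on a small made-up tidy table with the columns requested in step 1; the values and the year_range column name are illustrative assumptions. Treating combinations with no rows as 0 via fill_value is a choice, and leaving them as NaN is equally defensible.

```python
import pandas as pd

# Toy tidy table; values are made up for illustration.
tidy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B'],
    'year': [2000, 2006, 2007, 2003, 2012],
    'gender': ['m', 'f', 'm', 'f', 'f'],
    'age_range': ['014', '014', '1524', '014', '65'],
    'cases': [1, 2, 4, 8, 16],
})

# Summary 1: cases per country and gender, 2000-2006 inclusive.
early = tidy[tidy['year'].between(2000, 2006)]
by_country_gender = early.groupby(['country', 'gender'])['cases'].sum()

# Summary 2: label each row with its year range.
# pd.cut bins are right-inclusive here: (1999, 2006] and (2006, 2012].
year_range = pd.cut(
    tidy['year'], bins=[1999, 2006, 2012], labels=['2000-2006', '2007-2012']
).astype(str)

# Country and year range on rows, gender on columns.
by_range = tidy.assign(year_range=year_range).pivot_table(
    index=['country', 'year_range'],
    columns='gender',
    values='cases',
    aggfunc='sum',
    fill_value=0,
)
```

Converting the binned labels to plain strings keeps the row index simple; keeping them categorical would also work but changes how unobserved combinations are handled.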
In [ ]: