WDI Helpers Demo#

Python Logo

This short notebook demonstrates my helper functions for working with the World Development Indicators dataset.

With these functions, you can prepare a dataset for any year you choose, with each indicator in its own column.

# Load helper functions.
%run WDI.ipynb
data = get_wdi()

Remove Country “Groups”#

One issue with this dataset is that includes “groups” of countries, like “High Income” or even “World”. Let’s remove them, so we have only the actual countries.

First, we’ll look at all the “country” names.

data['Country Name'].unique()
array(['Africa Eastern and Southern', 'Africa Western and Central',
       'Arab World', 'Caribbean small states',
       'Central Europe and the Baltics', 'Early-demographic dividend',
       'East Asia & Pacific',
       'East Asia & Pacific (excluding high income)',
       'East Asia & Pacific (IDA & IBRD countries)', 'Euro area',
       'Europe & Central Asia',
       'Europe & Central Asia (excluding high income)',
       'Europe & Central Asia (IDA & IBRD countries)', 'European Union',
       'Fragile and conflict affected situations',
       'Heavily indebted poor countries (HIPC)', 'High income',
       'IBRD only', 'IDA & IBRD total', 'IDA blend', 'IDA only',
       'IDA total', 'Late-demographic dividend',
       'Latin America & Caribbean',
       'Latin America & Caribbean (excluding high income)',
       'Latin America & the Caribbean (IDA & IBRD countries)',
       'Least developed countries: UN classification',
       'Low & middle income', 'Low income', 'Lower middle income',
       'Middle East & North Africa',
       'Middle East & North Africa (excluding high income)',
       'Middle East & North Africa (IDA & IBRD countries)',
       'Middle income', 'North America', 'Not classified', 'OECD members',
       'Other small states', 'Pacific island small states',
       'Post-demographic dividend', 'Pre-demographic dividend',
       'Small states', 'South Asia', 'South Asia (IDA & IBRD)',
       'Sub-Saharan Africa', 'Sub-Saharan Africa (excluding high income)',
       'Sub-Saharan Africa (IDA & IBRD countries)', 'Upper middle income',
       'World', 'Afghanistan', 'Albania', 'Algeria', 'American Samoa',
       'Andorra', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Armenia',
       'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas, The',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei Darussalam', 'Bulgaria',
       'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Cayman Islands', 'Central African Republic', 'Chad',
       'Channel Islands', 'Chile', 'China', 'Colombia', 'Comoros',
       'Congo, Dem. Rep.', 'Congo, Rep.', 'Costa Rica', "Cote d'Ivoire",
       'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador',
       'Egypt, Arab Rep.', 'El Salvador', 'Equatorial Guinea', 'Eritrea',
       'Estonia', 'Eswatini', 'Ethiopia', 'Faroe Islands', 'Fiji',
       'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia, The',
       'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland',
       'Grenada', 'Guam', 'Guatemala', 'Guinea', 'Guinea-Bissau',
       'Guyana', 'Haiti', 'Honduras', 'Hong Kong SAR, China', 'Hungary',
       'Iceland', 'India', 'Indonesia', 'Iran, Islamic Rep.', 'Iraq',
       'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan',
       'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati',
       "Korea, Dem. People's Rep.", 'Korea, Rep.', 'Kosovo', 'Kuwait',
       'Kyrgyz Republic', 'Lao PDR', 'Latvia', 'Lebanon', 'Lesotho',
       'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg',
       'Macao SAR, China', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives',
       'Mali', 'Malta', 'Marshall Islands', 'Mauritania', 'Mauritius',
       'Mexico', 'Micronesia, Fed. Sts.', 'Moldova', 'Monaco', 'Mongolia',
       'Montenegro', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia',
       'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand',
       'Nicaragua', 'Niger', 'Nigeria', 'North Macedonia',
       'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau',
       'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines',
       'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania',
       'Russian Federation', 'Rwanda', 'Samoa', 'San Marino',
       'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia',
       'Seychelles', 'Sierra Leone', 'Singapore',
       'Sint Maarten (Dutch part)', 'Slovak Republic', 'Slovenia',
       'Solomon Islands', 'Somalia', 'South Africa', 'South Sudan',
       'Spain', 'Sri Lanka', 'St. Kitts and Nevis', 'St. Lucia',
       'St. Martin (French part)', 'St. Vincent and the Grenadines',
       'Sudan', 'Suriname', 'Sweden', 'Switzerland',
       'Syrian Arab Republic', 'Tajikistan', 'Tanzania', 'Thailand',
       'Timor-Leste', 'Togo', 'Tonga', 'Trinidad and Tobago', 'Tunisia',
       'Turkiye', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu',
       'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom',
       'United States', 'Uruguay', 'Uzbekistan', 'Vanuatu',
       'Venezuela, RB', 'Vietnam', 'Virgin Islands (U.S.)',
       'West Bank and Gaza', 'Yemen, Rep.', 'Zambia', 'Zimbabwe'],
      dtype=object)

Good news! The names seem to be in order, with all the various groups preceding the countries. The final group is World, followed by the first country alphabetically, Afghanistan.

Let’s make an array of these groups, so we can filter them out. For maximum precision, we’ll use the three-letter country codes.

data['Country Code'].unique()
array(['AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'EAR', 'EAS', 'EAP', 'TEA',
       'EMU', 'ECS', 'ECA', 'TEC', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD',
       'IBT', 'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC',
       'LMY', 'LIC', 'LMC', 'MEA', 'MNA', 'TMN', 'MIC', 'NAC', 'INX',
       'OED', 'OSS', 'PSS', 'PST', 'PRE', 'SST', 'SAS', 'TSA', 'SSF',
       'SSA', 'TSS', 'UMC', 'WLD', 'AFG', 'ALB', 'DZA', 'ASM', 'AND',
       'AGO', 'ATG', 'ARG', 'ARM', 'ABW', 'AUS', 'AUT', 'AZE', 'BHS',
       'BHR', 'BGD', 'BRB', 'BLR', 'BEL', 'BLZ', 'BEN', 'BMU', 'BTN',
       'BOL', 'BIH', 'BWA', 'BRA', 'VGB', 'BRN', 'BGR', 'BFA', 'BDI',
       'CPV', 'KHM', 'CMR', 'CAN', 'CYM', 'CAF', 'TCD', 'CHI', 'CHL',
       'CHN', 'COL', 'COM', 'COD', 'COG', 'CRI', 'CIV', 'HRV', 'CUB',
       'CUW', 'CYP', 'CZE', 'DNK', 'DJI', 'DMA', 'DOM', 'ECU', 'EGY',
       'SLV', 'GNQ', 'ERI', 'EST', 'SWZ', 'ETH', 'FRO', 'FJI', 'FIN',
       'FRA', 'PYF', 'GAB', 'GMB', 'GEO', 'DEU', 'GHA', 'GIB', 'GRC',
       'GRL', 'GRD', 'GUM', 'GTM', 'GIN', 'GNB', 'GUY', 'HTI', 'HND',
       'HKG', 'HUN', 'ISL', 'IND', 'IDN', 'IRN', 'IRQ', 'IRL', 'IMN',
       'ISR', 'ITA', 'JAM', 'JPN', 'JOR', 'KAZ', 'KEN', 'KIR', 'PRK',
       'KOR', 'XKX', 'KWT', 'KGZ', 'LAO', 'LVA', 'LBN', 'LSO', 'LBR',
       'LBY', 'LIE', 'LTU', 'LUX', 'MAC', 'MDG', 'MWI', 'MYS', 'MDV',
       'MLI', 'MLT', 'MHL', 'MRT', 'MUS', 'MEX', 'FSM', 'MDA', 'MCO',
       'MNG', 'MNE', 'MAR', 'MOZ', 'MMR', 'NAM', 'NRU', 'NPL', 'NLD',
       'NCL', 'NZL', 'NIC', 'NER', 'NGA', 'MKD', 'MNP', 'NOR', 'OMN',
       'PAK', 'PLW', 'PAN', 'PNG', 'PRY', 'PER', 'PHL', 'POL', 'PRT',
       'PRI', 'QAT', 'ROU', 'RUS', 'RWA', 'WSM', 'SMR', 'STP', 'SAU',
       'SEN', 'SRB', 'SYC', 'SLE', 'SGP', 'SXM', 'SVK', 'SVN', 'SLB',
       'SOM', 'ZAF', 'SSD', 'ESP', 'LKA', 'KNA', 'LCA', 'MAF', 'VCT',
       'SDN', 'SUR', 'SWE', 'CHE', 'SYR', 'TJK', 'TZA', 'THA', 'TLS',
       'TGO', 'TON', 'TTO', 'TUN', 'TUR', 'TKM', 'TCA', 'TUV', 'UGA',
       'UKR', 'ARE', 'GBR', 'USA', 'URY', 'UZB', 'VUT', 'VEN', 'VNM',
       'VIR', 'PSE', 'YEM', 'ZMB', 'ZWE'], dtype=object)

We’ll keep only the rows of data that are for actual countries, not these groups.

data = wdi_remove_groups(data)
data
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2013 2014 2015 2016 2017 2018 2019 2020 2021 Unnamed: 66
70658 Afghanistan AFG Access to clean fuels and technologies for coo... EG.CFT.ACCS.ZS NaN NaN NaN NaN NaN NaN ... 24.800000 26.100000 27.400000 28.600000 29.700000 30.900000 31.900000 33.200000 NaN NaN
70659 Afghanistan AFG Access to clean fuels and technologies for coo... EG.CFT.ACCS.RU.ZS NaN NaN NaN NaN NaN NaN ... 9.100000 10.200000 11.100000 12.200000 13.000000 13.850000 15.100000 15.900000 NaN NaN
70660 Afghanistan AFG Access to clean fuels and technologies for coo... EG.CFT.ACCS.UR.ZS NaN NaN NaN NaN NaN NaN ... 75.700000 77.600000 78.800000 79.700000 80.900000 81.600000 82.300000 82.600000 NaN NaN
70661 Afghanistan AFG Access to electricity (% of population) EG.ELC.ACCS.ZS NaN NaN NaN NaN NaN NaN ... 68.290649 89.500000 71.500000 97.699997 97.699997 96.616135 97.699997 97.699997 NaN NaN
70662 Afghanistan AFG Access to electricity, rural (% of rural popul... EG.ELC.ACCS.RU.ZS NaN NaN NaN NaN NaN NaN ... 60.566135 86.500511 64.573357 97.099358 97.091972 95.586174 97.075630 97.066711 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
383567 Zimbabwe ZWE Women who believe a husband is justified in be... SG.VAW.REFU.ZS NaN NaN NaN NaN NaN NaN ... NaN NaN 14.500000 NaN NaN NaN NaN NaN NaN NaN
383568 Zimbabwe ZWE Women who were first married by age 15 (% of w... SP.M15.2024.FE.ZS NaN NaN NaN NaN NaN NaN ... NaN NaN 3.700000 NaN NaN NaN 5.418352 NaN NaN NaN
383569 Zimbabwe ZWE Women who were first married by age 18 (% of w... SP.M18.2024.FE.ZS NaN NaN NaN NaN NaN NaN ... NaN 33.500000 32.400000 NaN NaN NaN 33.658057 NaN NaN NaN
383570 Zimbabwe ZWE Women's share of population ages 15+ living wi... SH.DYN.AIDS.FE.ZS NaN NaN NaN NaN NaN NaN ... 59.200000 59.400000 59.500000 59.700000 59.900000 60.000000 60.200000 60.400000 NaN NaN
383571 Zimbabwe ZWE Young people (ages 15-24) newly infected with HIV SH.HIV.INCD.YG NaN NaN NaN NaN NaN NaN ... 18000.000000 17000.000000 15000.000000 14000.000000 12000.000000 9700.000000 9600.000000 7500.000000 NaN NaN

312914 rows × 67 columns

Choose a Year#

wdi_show_by_year(data)
Total rows: 312914
Year	 Entries
1960 	 29778
1961 	 33817
1962 	 35333
1963 	 35610
1964 	 36130
1965 	 38030
1966 	 38102
1967 	 39245
1968 	 39257
1969 	 39951
1970 	 52976
1971 	 59137
1972 	 61404
1973 	 61258
1974 	 62488
1975 	 65546
1976 	 67666
1977 	 70591
1978 	 70571
1979 	 71217
1980 	 75353
1981 	 77216
1982 	 78029
1983 	 77949
1984 	 78507
1985 	 79619
1986 	 80305
1987 	 81350
1988 	 80466
1989 	 81708
1990 	 102450
1991 	 108469
1992 	 112714
1993 	 113218
1994 	 115012
1995 	 121963
1996 	 121754
1997 	 123608
1998 	 124563
1999 	 129907
2000 	 152109
2001 	 147025
2002 	 151785
2003 	 151335
2004 	 155793
2005 	 165272
2006 	 165659
2007 	 169506
2008 	 168367
2009 	 169741
2010 	 180751
2011 	 175760
2012 	 178712
2013 	 174605
2014 	 180448
2015 	 177736
2016 	 176842
2017 	 175140
2018 	 171100
2019 	 158408
2020 	 127080
2021 	 52770

Let’s use 2018. It’s recent, but has more data than the more recent years.

year = 2018

Limit to this Year and Pivot the Table#

With our chosen year, we can limit the data to a single indicator per country per year.

Once we do that, we can pivot the table, so that each indicator will be its own column.

df = wdi_pivot(data, year=year)
df
Indicator Code AG.CON.FERT.PT.ZS AG.CON.FERT.ZS AG.LND.AGRI.K2 AG.LND.AGRI.ZS AG.LND.ARBL.HA AG.LND.ARBL.HA.PC AG.LND.ARBL.ZS AG.LND.CREL.HA AG.LND.CROP.ZS AG.LND.FRST.K2 ... per_sa_allsa.cov_q4_tot per_sa_allsa.cov_q5_tot per_si_allsi.adq_pop_tot per_si_allsi.ben_q1_tot per_si_allsi.cov_pop_tot per_si_allsi.cov_q1_tot per_si_allsi.cov_q2_tot per_si_allsi.cov_q3_tot per_si_allsi.cov_q4_tot per_si_allsi.cov_q5_tot
Country Code
ABW 0.000000 0.000000 20.00 11.111111 2000.0 0.018895 11.111111 0 0.000000 4.2 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
AFG 369.324810 7.650676 379190.00 58.081365 7703000.0 0.207226 11.798854 1912634 0.330852 12084.4 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
AGO 0.000000 7.930094 569524.90 45.682594 4900000.0 0.159040 3.930376 3245206 0.252667 677175.1 ... 17.259922 14.61893 54.712662 0.821459 3.632175 0.718997 2.145329 3.193753 4.284947 7.815573
ALB 0.000000 66.585076 11740.81 42.849672 611346.0 0.213282 22.311898 140110 3.089562 7889.0 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
AND 0.000000 0.000000 188.30 40.063830 830.0 0.010778 1.765957 0 0.000000 160.0 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
XKX 0.000000 0.000000 0.00 0.000000 0.0 0.000000 0.000000 0 0.000000 0.0 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
YEM 0.000000 3.832632 233877.00 44.297403 1097700.0 0.038518 2.079095 630061 0.549274 5490.0 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZAF 238.147139 72.833333 963410.00 79.417850 12000000.0 0.207639 9.892094 3034761 0.340453 171228.9 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZMB 0.000000 52.510934 238360.00 32.063923 3800000.0 0.218999 5.111718 1208016 0.048427 451904.6 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZWE 345.495495 38.350000 162000.00 41.876696 4000000.0 0.277031 10.339925 1641701 0.258498 175367.2 ... 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

217 rows × 1286 columns

If you prefer, you can use more human-readable options for the countries and indicators.

df = wdi_pivot(data, year=year,index_column='Country Name', pivot_column='Indicator Name')
df
Indicator Name ARI treatment (% of children under 5 taken to a health provider) Access to clean fuels and technologies for cooking (% of population) Access to clean fuels and technologies for cooking, rural (% of rural population) Access to clean fuels and technologies for cooking, urban (% of urban population) Access to electricity (% of population) Access to electricity, rural (% of rural population) Access to electricity, urban (% of urban population) Adequacy of social insurance programs (% of total welfare of beneficiary households) Adequacy of social protection and labor programs (% of total welfare of beneficiary households) Adequacy of social safety net programs (% of total welfare of beneficiary households) ... Women who believe a husband is justified in beating his wife (any of five reasons) (%) Women who believe a husband is justified in beating his wife when she argues with him (%) Women who believe a husband is justified in beating his wife when she burns the food (%) Women who believe a husband is justified in beating his wife when she goes out without telling him (%) Women who believe a husband is justified in beating his wife when she neglects the children (%) Women who believe a husband is justified in beating his wife when she refuses sex with him (%) Women who were first married by age 15 (% of women ages 20-24) Women who were first married by age 18 (% of women ages 20-24) Women's share of population ages 15+ living with HIV (%) Young people (ages 15-24) newly infected with HIV
Country Name
Afghanistan 67.7 30.9 13.85 81.6 96.616135 95.586174 99.626022 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 28.9 500
Albania 82.4 79.8 62.50 92.2 100.000000 100.000000 100.000000 0.0 0.0 0.0 ... 6.8 1.8 0.8 3.7 5.2 0.9 1.4 11.8 27.2 100
Algeria 0.0 99.6 98.70 100.0 99.697838 99.071304 99.933952 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 45.5 200
American Samoa 0.0 0.0 0.00 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
Andorra 0.0 100.0 100.00 100.0 100.000000 100.000000 100.000000 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Virgin Islands (U.S.) 0.0 0.0 0.00 0.0 100.000000 100.000000 100.000000 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
West Bank and Gaza 0.0 0.0 0.00 0.0 100.000000 100.000000 100.000000 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
Yemen, Rep. 0.0 61.5 42.90 93.7 62.000000 0.000000 0.000000 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 37.5 200
Zambia 76.0 11.9 2.10 24.6 40.317890 12.466436 76.461876 0.0 0.0 0.0 ... 45.1 32.2 21.3 26.0 31.3 29.5 5.2 29.0 61.2 29000
Zimbabwe 0.0 29.9 6.40 79.1 45.572647 26.617121 85.468765 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 60.0 9700

217 rows × 1286 columns

Done#

We now have a dataset we can work with.

  • Only actual countries are included (not groupings like “High income”.

  • Only one year of data is included.

  • Each country gets one row, and each indicator gets one column.