R Exercises - Religion and babies
In these exercises we will see the power of the libraries ggplot2 and plotly to make sense of statistical data. The goal is to reproduce the moving chart that you can see in this video from Hans Rosling (I invite you to watch his other videos, they are quite enlightening and inspiring):
For this, we will need to gather the data:
- From Gapminder, data per country and per year from 1800 to 2018:
- From the PEW research center, data per country:
Data handling
The first thing to do is to load and regroup all these datasets into a single one.
- Load the tidyverse library and, using read_csv(), load the 4 datasets into 4 separate tibbles called children, income, pop and religion.
- To reproduce the chart in the video, we need to determine the dominant religion in each country. In the religion dataset, add a column Religion that gives the name of the dominant religion for each country. For this, you need to reduce the table to just the Country column and all the religion columns, make the table tidy, and then select the religion with the highest proportion for each country. We will filter the data to keep only the year 2020.
- Using pivot_longer(), make all datasets tidy.
  - children should now contain 3 columns: Country, Year and Fertility.
  - income should now contain 3 columns: Country, Year and Income.
  - pop should now contain 3 columns: Country, Year and Population.
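The dominant-religion step described above can be sketched on a toy table. This is only an illustration under assumptions: the column names Christians, Muslims and Buddhists and all values are made up, and the real religion dataset may be laid out differently.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the religion table (hypothetical columns and values)
religion_toy <- tibble(
  Country    = c("A", "A", "B"),
  Year       = c(2010, 2020, 2020),
  Christians = c(60, 55, 10),
  Muslims    = c(30, 35, 80),
  Buddhists  = c(10, 10, 10)
)

religion_dominant <- religion_toy |>
  filter(Year == 2020) |>                         # keep only the year 2020
  select(-Year) |>                                # keep Country and the religion columns
  pivot_longer(cols = -Country,
               names_to = "Religion",
               values_to = "Proportion") |>       # one row per country and religion
  group_by(Country) |>
  slice_max(Proportion, n = 1, with_ties = FALSE) |>  # largest share per country
  ungroup() |>
  select(Country, Religion)

religion_dominant
```

Setting with_ties = FALSE keeps exactly one row per country even if two religions have the same share; whether that is the desired behavior for ties is a choice left to you.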
We will only consider data from 1900 to 2018. Example of syntax using the pipe operator |>:
DF <- read_table("name 2010 2011 2012 2014
Kevin 10 11 12 123
Jane 122 56 23 4
"
)
DF
# A tibble: 2 × 5
name `2010` `2011` `2012` `2014`
<chr> <dbl> <dbl> <dbl> <dbl>
1 Kevin 10 11 12 123
2 Jane 122 56 23 4
DF |>
  select(name, `2010`:`2012`) |>
  pivot_longer(cols = -name,
               names_to = "Year",
               values_to = "Score",
               names_transform = list(Year = as.numeric))
# A tibble: 6 × 3
name Year Score
<chr> <dbl> <dbl>
1 Kevin 2010 10
2 Kevin 2011 11
3 Kevin 2012 12
4 Jane 2010 122
5 Jane 2011 56
6 Jane 2012 23
The line names_transform = list(Year = as.numeric) converts the character year values to numerical values.
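Once Year is numeric, restricting the data to the 1900 to 2018 range is a one-line filter. The sketch below uses a made-up wide table (the name wide_toy and its values are hypothetical) shaped like the example above.

```r
library(dplyr)
library(tidyr)

# Toy wide table with one column per year (hypothetical values)
wide_toy <- tibble(
  Country = c("A", "B"),
  `1899`  = c(1, 2),
  `1900`  = c(3, 4),
  `2018`  = c(5, 6),
  `2019`  = c(7, 8)
)

long_toy <- wide_toy |>
  pivot_longer(cols = -Country,
               names_to = "Year",
               values_to = "Fertility",
               names_transform = list(Year = as.numeric)) |>
  filter(between(Year, 1900, 2018))   # both bounds are inclusive

long_toy
```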
- Now we want to combine all these datasets into a single one called dat, containing the columns Country, Year, Population, Religion, Fertility and Income. Look into the inner_join() function of the dplyr library (which is part of the tidyverse library).
You should end up with a dataset like this one:
# A tibble: 20,587 × 6
Country Year Fertility Income Population Religion
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Afghanistan 1900 7 793 5020000 Muslims
2 Afghanistan 1901 7 796 5050000 Muslims
3 Afghanistan 1902 7 798 5090000 Muslims
4 Afghanistan 1903 7 801 5120000 Muslims
5 Afghanistan 1904 7 804 5150000 Muslims
6 Afghanistan 1905 7 807 5180000 Muslims
7 Afghanistan 1906 7 809 5220000 Muslims
8 Afghanistan 1907 7 812 5250000 Muslims
9 Afghanistan 1908 7 815 5280000 Muslims
10 Afghanistan 1909 7 818 5320000 Muslims
# ℹ 20,577 more rows
In case you struggled to get there, download the archive with the button at the top and get the dat tibble with dat <- read_csv("Data/dat.csv").
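For reference, the joining step can be sketched on toy tibbles. All names ending in _toy and all values are made up. The sketch assumes the three yearly tables share the keys Country and Year, while the religion table has one row per country.

```r
library(dplyr)

# Toy stand-ins for the tidy tables (hypothetical values)
children_toy <- tibble(Country = c("A", "A", "B"), Year = c(1900, 1901, 1900),
                       Fertility = c(7, 6.9, 5))
income_toy   <- tibble(Country = c("A", "A", "B"), Year = c(1900, 1901, 1900),
                       Income = c(800, 810, 1500))
pop_toy      <- tibble(Country = c("A", "A", "B"), Year = c(1900, 1901, 1900),
                       Population = c(5e6, 5.1e6, 2e6))
religion_toy <- tibble(Country = c("A", "B", "C"),
                       Religion = c("Muslims", "Christians", "Buddhists"))

dat_toy <- children_toy |>
  inner_join(income_toy, by = c("Country", "Year")) |>
  inner_join(pop_toy, by = c("Country", "Year")) |>
  inner_join(religion_toy, by = "Country")   # country "C" has no yearly data and is dropped

dat_toy
```

inner_join() keeps only the rows whose keys appear in both tables, so countries or years missing from any one dataset disappear from the result.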
Now that our dataset is ready, let’s plot it.
Plotting
- Load the library ggplot2 and set the global theme to theme_bw() using theme_set().
- Create a subset of dat concerning your country of origin. For me it will be dat_france.
- Plot the evolution of the income per capita and the number of children per woman as a function of the year, and make it look like this (notice the kinks during the two world wars):
- Create a subset of dat containing the data for your country plus all the neighboring countries (if you come from an island, the nearest countries). For me, dat_france_region will contain data from France, Spain, Italy, Switzerland, Germany, Luxembourg and Belgium.
- Plot income and fertility again as a function of the year, but map the color to the country and the point size to its population:
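As a rough sketch of these plotting steps, the code below sets the theme and maps color and size on a toy data frame. The name dat_toy and its values are made up; with the real data you would use your own regional subset instead.

```r
library(ggplot2)

theme_set(theme_bw())   # global theme for all following plots

# Toy data with the same columns as dat (hypothetical values)
dat_toy <- data.frame(
  Country    = rep(c("A", "B"), each = 3),
  Year       = rep(c(1900, 1950, 2000), times = 2),
  Income     = c(800, 1500, 9000, 2000, 6000, 30000),
  Fertility  = c(7, 6.5, 4, 3, 2.7, 1.9),
  Population = c(5e6, 8e6, 2e7, 4e7, 4.2e7, 6e7)
)

# Income per capita over time, colored by country, sized by population
p_income <- ggplot(dat_toy, aes(x = Year, y = Income,
                                colour = Country, size = Population)) +
  geom_point() +
  labs(y = "Income per capita")

# Children per woman over time, same mappings
p_fertility <- ggplot(dat_toy, aes(x = Year, y = Fertility,
                                   colour = Country, size = Population)) +
  geom_point() +
  labs(y = "Children per woman")

p_income
p_fertility
```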
- Load the library plotly and make the previous graphs interactive. You can make an interactive graph by calling ggplotly(), like this:
library(plotly)
P <- ggplot(data = dat_france, aes(x = Population, y = Income)) +
  geom_point()
ggplotly(P) # adding dynamicTicks = TRUE redraws the ticks when zooming in
- Finally, you can add a slider to the interactive graph to select a value of another variable (just like in the video) by adding the keyword frame = in the chart’s aesthetics. So now, reproduce the graph from the video! (You can also add the aesthetic id=Country to show the country name in the popup when hovering over a point.)
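A minimal sketch of the frame mechanism on toy data follows. The data frame dat_toy and its values are made up. The sketch uses ids = Country, the spelling used in plotly's own animation examples; this is an assumption that differs slightly from the id= form mentioned above.

```r
library(ggplot2)
library(plotly)

# Toy data with the same columns as dat (hypothetical values)
dat_toy <- data.frame(
  Country    = rep(c("A", "B"), times = 3),
  Religion   = rep(c("Muslims", "Christians"), times = 3),
  Year       = rep(c(1900, 1950, 2000), each = 2),
  Income     = c(800, 2000, 1500, 6000, 9000, 30000),
  Fertility  = c(7, 3, 6.5, 2.7, 4, 1.9),
  Population = c(5e6, 4e7, 8e6, 4.2e7, 2e7, 6e7)
)

# frame = Year makes ggplotly() build one animation frame per year, with a slider
p <- ggplot(dat_toy, aes(x = Income, y = Fertility,
                         colour = Religion, size = Population,
                         frame = Year, ids = Country)) +
  geom_point() +
  scale_x_log10()

fig <- ggplotly(p)
fig
```

ggplot2 itself does not know the frame and ids aesthetics; they are picked up only when the plot is converted by ggplotly().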
Optionally, you can try working with the gganimate library to make an animated graph. Here is a tutorial to get you started.
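As a starting point, a gganimate version might look like the sketch below, again on a made-up dat_toy data frame. It only builds the animation object. Rendering it, by printing anim or calling animate(anim), additionally requires a renderer package such as gifski to be installed.

```r
library(ggplot2)
library(gganimate)

# Toy data with the same columns as dat (hypothetical values)
dat_toy <- data.frame(
  Country    = rep(c("A", "B"), times = 3),
  Religion   = rep(c("Muslims", "Christians"), times = 3),
  Year       = rep(c(1900, 1950, 2000), each = 2),
  Income     = c(800, 2000, 1500, 6000, 9000, 30000),
  Fertility  = c(7, 3, 6.5, 2.7, 4, 1.9),
  Population = c(5e6, 4e7, 8e6, 4.2e7, 2e7, 6e7)
)

anim <- ggplot(dat_toy, aes(x = Income, y = Fertility,
                            colour = Religion, size = Population)) +
  geom_point() +
  scale_x_log10() +
  transition_time(Year) +                # interpolate between years
  labs(title = "Year: {frame_time}")     # show the current year in the title
```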