R Exercises - CO2 emissions
I recently saw this graph I thought it would be a good exercise to try and reproduce a similar graph from online-available data (graph source: J.-M. Jancovici):
Of course, it would be too easy if we could find the actual data in the correct format…
Here, we will try reproducing the following graphs from CO2_emission.csv, energy-use-per-capita.csv, country-and-continent.csv and tot_population.csv (sources: CO2, Energy, Population, countries and continents).
You can get help on dataset merging here.
Data wrangling
- Load the different datasets in tibbles named
CO2
,continent
,pop
andenergy
- In the case of
continent
, we care only about the 1st and 5th columns, respectively “Continent”, and “Code”
- In the case of
- Make sure all tibbles are tidy, and that the column types (double, character…) are set correctly.
- Merge all tibbles into a single one using
inner_join
- We are only interested in the columns ‘Code’, ‘Year’, ‘CO2’, ‘Continent’, ‘Energy’, ‘Country’ and ‘Population’. This tibble should look like the tibble printed below
- Create a vector of the countries you are interested in (e.g. EU…)
- Create a tibble
ave
containing the averaged CO2 emission, Energy consumption and Population for each country of this group of countries
library(tidyverse)
# CO2 data
CO2 <- read_csv("Data/CO2_emission.csv", na="..")[,-c(1,2)]
names(CO2) <- c("Country","Code",1960:2018)
CO2 <- CO2 |> pivot_longer(cols="1960":"2018",
names_to="Year",
values_to="CO2",
names_transform = list(Year = as.numeric),
values_drop_na=TRUE)
# Attribute the correct continent
continent <- read_csv("Data/country-and-continent.csv")[,c(1,5)]
names(continent) <- c("Continent","Code")
CO2 <- inner_join(CO2, continent)
# Population
pop <- read_csv("Data/tot_population.csv", na="..")
names(pop) <- c("Country","Code","Year","Population")
# Energy data
energy <- read_csv("Data/energy-use-per-capita.csv")
names(energy) <- c("Country", "Code", "Year", "Energy")
# Merge data
DF <- CO2
DF <- inner_join(DF, energy, by=c("Code","Year"))
DF <- inner_join(DF, pop, by=c("Code","Year"))
DF <- DF |> select(-Country.x, -Country.y)
DF
# A tibble: 5,990 × 7
Code Year CO2 Continent Energy Country Population
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 ALB 1971 1.99 Europe 9131. Albania 2187853
2 ALB 1972 2.52 Europe 10067. Albania 2243126
3 ALB 1973 2.30 Europe 8870. Albania 2296752
4 ALB 1974 1.85 Europe 9036. Albania 2350124
5 ALB 1975 1.91 Europe 9617. Albania 2404831
6 ALB 1976 2.01 Europe 10362. Albania 2458526
7 ALB 1977 2.28 Europe 10743. Albania 2513546
8 ALB 1978 2.53 Europe 11756. Albania 2566266
9 ALB 1979 2.90 Europe 10051. Albania 2617832
10 ALB 1980 1.94 Europe 13369. Albania 2671997
# ℹ 5,980 more rows
# EU countries
EU <- c("Austria","Italy","Belgium","Latvia","Bulgaria","Lithuania","Croatia",
"Luxembourg","Cyprus","Malta","Czechia","Netherlands","Denmark","Poland",
"Estonia","Portugal","Finland","Romania","France","Slovakia","Germany",
"Slovenia","Greece","Spain","Hungary","Sweden","Ireland","United Kingdom")
# Averaged values for the past 20 years
this_year <- as.numeric(format(Sys.time(), '%Y'))
ave <- DF |> filter(Country %in% EU & Year>=this_year-20) |>
group_by(Country) |>
summarise(CO2 = mean(CO2, na.rm =TRUE),
Energy = mean(Energy, na.rm =TRUE),
Population = mean(Population, na.rm =TRUE)
)
Plotting
- Try reproducing the following plots. The following graphs are for countries within the EU (as of 2019). Make it for the continent of your origin.
- Don’t bother with the text labels first
- Try adding them using the library
ggrepel
# Plotting
library(ggplot2)
library(ggrepel)
p1 <- ggplot(data=subset(DF, Country%in%EU),
aes(x=Energy, y=CO2, col=Country)
)+
lims(y=c(0,15), x=c(0,80e3))+
scale_colour_discrete(guide = FALSE) +
geom_point(alpha=0.1, aes(size=Population/1e6))+
scale_size(name="Population (millions)")+
geom_label_repel(data=ave, show.legend=FALSE, segment.size = 0.5,
force=30,
aes(x=Energy, y=CO2, col=Country, label = Country))+
geom_point(data=ave, alpha=0.9,
aes(x=Energy, y=CO2, col=Country, size=Population/1e6))+
labs(x="Energy consumption [kWh/capita]",
y="CO2 emission [ton/capita]")+
theme_bw()+
theme(legend.position = "top")
p2 <- ggplot(data=subset(DF, Country%in%EU),
aes(x=Energy*Population/1e9, y=CO2*Population/1e9, col=Country)
)+
lims(y=c(0,1))+
scale_colour_discrete(guide = FALSE) +
geom_point(alpha=0.1, aes(size=Population/1e6))+
scale_size(name="Population (millions)")+
geom_label_repel(data=ave, show.legend=FALSE, segment.size = 0.5,
aes(x=Energy*Population/1e9, y=CO2*Population/1e9,
col=Country, label=Country))+
geom_point(data=ave,alpha=0.9,
aes(x=Energy*Population/1e9, y=CO2*Population/1e9,
col=Country, size=Population/1e6))+
labs(x="Total energy consumption [TWh]",
y="Total CO2 emission [Gton]")+
theme_bw()+
theme(legend.position = "top")
p1
p2