R Exercises - CO2 emissions


I recently saw this graph, and I thought it would be a good exercise to try to reproduce a similar one from data available online (graph source: J.-M. Jancovici):

Of course, it would be too easy if we could find the actual data in the correct format…

Here, we will try reproducing the following graphs from CO2_emission.csv, energy-use-per-capita.csv, country-and-continent.csv and tot_population.csv (sources: CO2, Energy, Population, countries and continents).

You can get help on dataset merging here.
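As a quick reminder of what a merge by keys does, here is a minimal sketch on two toy tibbles. The names emissions and population and all the values are made up for illustration; they are not taken from the exercise files.

```r
library(dplyr)

# Toy tibbles with made-up values (illustration only)
emissions <- tibble(Code = c("AAA", "BBB", "CCC"),
                    Year = c(2000, 2000, 2000),
                    CO2  = c(1.5, 7.2, 3.1))
population <- tibble(Code = c("AAA", "BBB", "DDD"),
                     Year = c(2000, 2000, 2000),
                     Population = c(1e6, 5e6, 2e6))

# inner_join() keeps only the rows whose keys appear in both tibbles
merged <- inner_join(emissions, population, by = c("Code", "Year"))
merged

# anti_join() lists the rows of the first tibble that found no match
unmatched <- anti_join(emissions, population, by = c("Code", "Year"))
unmatched
```

Stating the by argument explicitly avoids surprises when two tibbles happen to share a column, such as a country name, that is spelled differently in each source.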

Data wrangling

  • Load the different datasets in tibbles named CO2, continent, pop and energy
    • In the case of continent, we only care about the 1st and 5th columns, which hold the “Continent” and the “Code” respectively
  • Make sure all tibbles are tidy, and that the column types (double, character…) are set correctly.
  • Merge all tibbles into a single one using inner_join
  • We are only interested in the columns ‘Code’, ‘Year’, ‘CO2’, ‘Continent’, ‘Energy’, ‘Country’ and ‘Population’. This tibble should look like the tibble printed below
  • Create a vector of the countries you are interested in (e.g. EU…)
  • Create a tibble ave containing, for each country in this group, the CO2 emission, Energy consumption and Population averaged over the past 20 years
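As a hint for the tidying step, here is a small sketch of how a wide table with one column per year can be turned into a tidy one with pivot_longer. The tibble wide and its values are hypothetical; the real CO2 file is assumed to have a similar one-column-per-year layout.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per year (made-up values)
wide <- tibble(Country = c("Country A", "Country B"),
               Code    = c("AAA", "BBB"),
               `2000`  = c(1.5, NA),
               `2001`  = c(1.7, 7.4))

# One row per (country, year); year labels become a numeric column,
# and rows with a missing value are dropped
long <- wide |>
  pivot_longer(cols = -c(Country, Code),
               names_to = "Year",
               values_to = "CO2",
               names_transform = list(Year = as.numeric),
               values_drop_na = TRUE)
long
```

Converting Year to a number at this stage matters for the merge: joining on a character Year in one tibble and a numeric Year in another would fail with a type error.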
library(tidyverse)
# CO2 data
CO2 <- read_csv("Data/CO2_emission.csv", na="..")[,-c(1,2)]
names(CO2) <- c("Country","Code",1960:2018)
CO2 <- CO2 |> pivot_longer(cols="1960":"2018", 
                            names_to="Year", 
                            values_to="CO2",
                            names_transform = list(Year = as.numeric),
                            values_drop_na=TRUE)
# Attribute the correct continent
continent <- read_csv("Data/country-and-continent.csv")[,c(1,5)]
names(continent) <- c("Continent","Code")
CO2 <- inner_join(CO2, continent, by="Code")
# Population
pop <- read_csv("Data/tot_population.csv", na="..")
names(pop) <- c("Country","Code","Year","Population")
# Energy data
energy <- read_csv("Data/energy-use-per-capita.csv")
names(energy) <- c("Country", "Code", "Year", "Energy")
# Merge data
DF <- CO2
DF <- inner_join(DF, energy, by=c("Code","Year"))
DF <- inner_join(DF, pop, by=c("Code","Year"))
DF <- DF |> select(-Country.x, -Country.y)
DF
# A tibble: 5,990 × 7
   Code   Year   CO2 Continent Energy Country Population
   <chr> <dbl> <dbl> <chr>      <dbl> <chr>        <dbl>
 1 ALB    1971  1.99 Europe     9131. Albania    2187853
 2 ALB    1972  2.52 Europe    10067. Albania    2243126
 3 ALB    1973  2.30 Europe     8870. Albania    2296752
 4 ALB    1974  1.85 Europe     9036. Albania    2350124
 5 ALB    1975  1.91 Europe     9617. Albania    2404831
 6 ALB    1976  2.01 Europe    10362. Albania    2458526
 7 ALB    1977  2.28 Europe    10743. Albania    2513546
 8 ALB    1978  2.53 Europe    11756. Albania    2566266
 9 ALB    1979  2.90 Europe    10051. Albania    2617832
10 ALB    1980  1.94 Europe    13369. Albania    2671997
# ℹ 5,980 more rows
# EU countries
EU <- c("Austria","Italy","Belgium","Latvia","Bulgaria","Lithuania","Croatia",
        "Luxembourg","Cyprus","Malta","Czechia","Netherlands","Denmark","Poland",
        "Estonia","Portugal","Finland","Romania","France","Slovakia","Germany",
        "Slovenia","Greece","Spain","Hungary","Sweden","Ireland","United Kingdom")
# Averaged values for the past 20 years
this_year <- as.numeric(format(Sys.time(), '%Y'))
ave <- DF |> filter(Country %in% EU & Year>=this_year-20) |>
         group_by(Country) |>
         summarise(CO2        = mean(CO2, na.rm =TRUE),
                   Energy     = mean(Energy, na.rm =TRUE),
                   Population = mean(Population, na.rm =TRUE)
                   )
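Filtering with Country %in% EU silently drops any country whose name is spelled differently in the data. The sketch below shows one way to list such names. The vector wanted and the tibble toy_DF are hypothetical stand-ins for EU and DF, and the spellings in toy_DF are only an assumption used to illustrate a mismatch; check what your own files contain.

```r
library(tibble)

# Hypothetical stand-ins for the EU vector and the merged tibble DF
wanted <- c("Czechia", "France", "Slovakia")
toy_DF <- tibble(Country = c("Czech Republic", "France", "Slovak Republic"),
                 Year    = c(2010, 2010, 2010))

# Names requested but absent from the data
missing <- wanted[!wanted %in% toy_DF$Country]
missing
```

If the result is not empty, either adjust the spelling in the vector or filter on the Code column instead, which is less prone to naming differences.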

Plotting

  • Try reproducing the following plots. They show the countries within the EU (as of 2019). Then make the same plots for your continent of origin.
    • Don’t bother with the text labels at first
    • Then try adding them using the ggrepel package
# Plotting
library(ggplot2)
library(ggrepel)
p1 <- ggplot(data=subset(DF, Country%in%EU), 
             aes(x=Energy, y=CO2, col=Country)
            )+
    lims(y=c(0,15), x=c(0,80e3))+
    scale_colour_discrete(guide = "none") +
    geom_point(alpha=0.1, aes(size=Population/1e6))+
    scale_size(name="Population (millions)")+
    geom_label_repel(data=ave, show.legend=FALSE, segment.size  = 0.5,
              force=30,
              aes(x=Energy, y=CO2, col=Country, label = Country))+
    geom_point(data=ave, alpha=0.9, 
               aes(x=Energy, y=CO2, col=Country, size=Population/1e6))+
    labs(x="Energy consumption [kWh/capita]", 
         y="CO2 emission [ton/capita]")+
    theme_bw()+
    theme(legend.position = "top")
p2 <- ggplot(data=subset(DF, Country%in%EU), 
             aes(x=Energy*Population/1e9, y=CO2*Population/1e9, col=Country)
            )+
    lims(y=c(0,1))+
    scale_colour_discrete(guide = "none") +
    geom_point(alpha=0.1, aes(size=Population/1e6))+
    scale_size(name="Population (millions)")+
    geom_label_repel(data=ave, show.legend=FALSE, segment.size  = 0.5,
              aes(x=Energy*Population/1e9, y=CO2*Population/1e9, 
                  col=Country, label=Country))+
    geom_point(data=ave,alpha=0.9, 
               aes(x=Energy*Population/1e9, y=CO2*Population/1e9, 
                   col=Country, size=Population/1e6))+
    labs(x="Total energy consumption [TWh]",
         y="Total CO2 emission [Gton]")+
    theme_bw()+
    theme(legend.position = "top")
p1

p2
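To adapt the plots to a continent rather than to the EU, the selection can be made on the Continent column instead of a vector of country names. Here is a minimal sketch on a toy tibble; toy_DF, my_continent and ave_continent are hypothetical names, the values are made up, and the continent label is assumed to be spelled as in the Continent column.

```r
library(dplyr)

# Toy stand-in for the merged tibble DF (made-up values)
toy_DF <- tibble(
  Country    = c("A", "A", "B", "C"),
  Continent  = c("Asia", "Asia", "Asia", "Europe"),
  Year       = c(2017, 2018, 2018, 2018),
  CO2        = c(2, 4, 6, 8),
  Energy     = c(10000, 12000, 20000, 30000),
  Population = c(1e6, 1.2e6, 5e6, 3e6)
)

my_continent <- "Asia"  # assumption: must match the spelling in the data

# Average per country over the last 20 years available in the data
ave_continent <- toy_DF |>
  filter(Continent == my_continent, Year >= max(Year) - 20) |>
  group_by(Country) |>
  summarise(across(c(CO2, Energy, Population), \(x) mean(x, na.rm = TRUE)))
ave_continent
```

In the plotting code, subset(DF, Continent == my_continent) would then replace subset(DF, Country %in% EU), and ave_continent would replace ave. Counting the 20 years back from the last year in the data, rather than from the current calendar year, is a design choice of this sketch that keeps the result stable when the script is rerun later.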