I recently saw this graph and thought it would be a good exercise to try to reproduce a similar one from data available online (graph source: J.-M. Jancovici):

Of course, it would be too easy if we could find the actual data in the correct format…

Here, we will try reproducing the following graphs from CO2_emission.csv, energy-use-per-capita.csv, country-and-continent.csv and tot_population.csv (sources: CO2, Energy, Population, countries and continents).

You can get help on dataset merging here.

Data wrangling

  • Load the different datasets in tibbles named CO2, continent, pop and energy
    • In the case of continent, we only need the 1st and 5th columns, “Continent” and “Code” respectively
  • Make sure all tibbles are tidy, and that the column types (double, character…) are set correctly.
  • Merge all tibbles into a single one using inner_join
  • We are only interested in the columns ‘Code’, ‘Year’, ‘CO2’, ‘Continent’, ‘Energy’, ‘Country’ and ‘Population’. The resulting tibble should look like the one printed below.
  • Create a vector of the countries you are interested in (e.g. EU…)
  • Create a tibble ave containing the average CO2 emission, Energy consumption and Population for each country in this group
## # A tibble: 5,990 × 7
##    Code   Year   CO2 Continent Energy Country Population
##    <chr> <dbl> <dbl> <chr>      <dbl> <chr>        <dbl>
##  1 ALB    1971  1.99 Europe     9131. Albania    2187853
##  2 ALB    1972  2.52 Europe    10067. Albania    2243126
##  3 ALB    1973  2.30 Europe     8870. Albania    2296752
##  4 ALB    1974  1.85 Europe     9036. Albania    2350124
##  5 ALB    1975  1.91 Europe     9617. Albania    2404831
##  6 ALB    1976  2.01 Europe    10362. Albania    2458526
##  7 ALB    1977  2.28 Europe    10743. Albania    2513546
##  8 ALB    1978  2.53 Europe    11756. Albania    2566266
##  9 ALB    1979  2.90 Europe    10051. Albania    2617832
## 10 ALB    1980  1.94 Europe    13369. Albania    2671997
## # … with 5,980 more rows
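
The wrangling steps above can be sketched on toy data as follows. This is only a minimal illustration, not the reference solution: the layouts of the four toy tibbles, the column names of continent_raw, the placeholder country "Xland" and the names CO2_wide, all_data and my_countries are all assumptions, and the real CSV files may need different handling. The sketch also assumes that "averaged" means averaged over the years.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-ins for the CSV files (hypothetical layouts). The ALB rows reuse
# values from the printout above; the "XXX" rows are invented placeholders.
CO2_wide <- tribble(
  ~Country,  ~Code, ~`1971`, ~`1972`,
  "Albania", "ALB",    1.99,    2.52,
  "Xland",   "XXX",    5.00,    6.00
)
energy <- tribble(
  ~Code, ~Year, ~Energy,
  "ALB",  1971,    9131,
  "ALB",  1972,   10067,
  "XXX",  1971,   20000,
  "XXX",  1972,   22000
)
pop <- tribble(
  ~Code, ~Year, ~Population,
  "ALB",  1971,     2187853,
  "ALB",  1972,     2243126,
  "XXX",  1971,     1000000,
  "XXX",  1972,     1100000
)
continent_raw <- tribble(
  ~Continent_Name, ~Continent_Code, ~Country_Name, ~Two_Letter, ~Three_Letter,
  "Europe",        "EU",            "Albania",     "AL",        "ALB",
  "Nowhere",       "NW",            "Xland",       "XX",        "XXX"
)

# Keep only the 1st and 5th columns, renaming them on the way.
continent <- continent_raw %>% select(Continent = 1, Code = 5)

# Tidy the wide table: one row per country and year, with Year as a double.
CO2 <- CO2_wide %>%
  pivot_longer(cols = -c(Country, Code), names_to = "Year", values_to = "CO2") %>%
  mutate(Year = as.numeric(Year))

# inner_join keeps only the rows whose keys are present in both tables.
all_data <- CO2 %>%
  inner_join(energy, by = c("Code", "Year")) %>%
  inner_join(pop, by = c("Code", "Year")) %>%
  inner_join(continent, by = "Code") %>%
  select(Code, Year, CO2, Continent, Energy, Country, Population)

# Restrict to a group of countries and average each variable over the years.
my_countries <- c("Albania")
ave <- all_data %>%
  filter(Country %in% my_countries) %>%
  group_by(Country) %>%
  summarise(across(c(CO2, Energy, Population), mean), .groups = "drop")
```

Because inner_join drops any country or year missing from one of the tables, the merged tibble can be much shorter than the individual datasets; comparing row counts before and after each join is a quick sanity check.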

Plotting

  • Try reproducing the following plots. They show the countries within the EU (as of 2019). Then make the same plots for your continent of origin.
    • Don’t bother with the text labels at first
    • Then try adding them using the ggrepel library
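
As a starting point for the labels, here is a minimal sketch of how geom_text_repel from ggrepel can be layered on a scatter plot. The toy tibble below uses invented values and placeholder country names, and the choice of Energy on x, CO2 on y and Population as point size is an assumption, not necessarily the layout of the target plots.

```r
library(ggplot2)
library(ggrepel)
library(tibble)

# Hypothetical averaged tibble with invented values, standing in for `ave`.
ave <- tribble(
  ~Country, ~CO2, ~Energy, ~Population,
  "A",         2,   10000,         1e6,
  "B",         6,   30000,         5e6,
  "C",         9,   45000,         2e7
)

p <- ggplot(ave, aes(x = Energy, y = CO2)) +
  # One point per country, sized by population (assumed mapping).
  geom_point(aes(size = Population), alpha = 0.6) +
  # geom_text_repel moves labels away from each other and from the points.
  geom_text_repel(aes(label = Country)) +
  labs(x = "Energy", y = "CO2")

# print(p) draws the plot.
```

Replacing geom_text_repel with geom_text gives the same labels without the repulsion, which makes it easy to see what ggrepel adds when points are crowded.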