This exercise is based on data from a PhD student in cognitive psychology. In his work, the student records response times (RT) in two simultaneous tasks, which he then has to analyse.

  • At \(t_0\), the stimulus S1 is triggered
  • At \(t_1=t_0 + SOA\), the stimulus S2 is triggered (SOA is the time between the 2 stimuli)
  • At \(t_2=t_0 + RT1\), the subject responds to stimulus S1
  • At \(t_3=t_1 + RT2\), the subject responds to stimulus S2

The inter-response interval (IRI) is defined as the time difference \(t_3 - t_2\), thus \(IRI=SOA+RT2-RT1\). A negative IRI therefore means that the response to S2 is emitted before the response to S1.

For theoretical reasons, one can consider that if \(SOA=1500\) ms, the two stimuli are answered independently. One can thus use the RT1 and RT2 values measured at \(SOA=1500\) ms to estimate the IRI that would be observed at the shorter SOAs if the responses to the stimuli were independent.


1 Data wrangling

  • Download the data files archive and unzip it, then create an R project in this folder in RStudio.
  • Load the tidyverse and patchwork packages:
library(tidyverse)
library(patchwork)
  • Load the .csv file in the Data folder and save it into raw_data. Look at the help page of read_csv() to work out how to get rid of the import error.
raw_data <- read_csv2("Data/data exemple.csv", show_col_types = FALSE)
  • Using successive pipe operations, we will now create a dataclean table from raw_data, where we:
    • Filter raw_data so that Procedure[Trial] is only equal to "EssaisDT"
    • Mutate the table by adding a column IRI containing SOAdur + S2Visuel.RT - S1Audio.RT
    • Filter the table so that some extreme values are excluded:
      • S1Audio.RT is less than or equal to 2500 and greater than or equal to 100
      • S2Visuel.RT is less than or equal to 2500 and greater than or equal to 100
      • S1Audio.ACC, S1response.ACC and S2Visuel.ACC are equal to 1
dataclean <- raw_data %>% 
    filter(`Procedure[Trial]` == "EssaisDT") %>%
    mutate(IRI = SOAdur + S2Visuel.RT - S1Audio.RT) %>% 
    filter(S1Audio.RT <= 2500 & S1Audio.RT >= 100 &
        S2Visuel.RT <= 2500 & S2Visuel.RT >= 100 &
        S1Audio.ACC == 1 & S1response.ACC == 1 & S2Visuel.ACC == 1)
  • Let’s compute the simulated IRI values for the case where the responses to the stimuli are independent, i.e. using the trials with SOA = 1500 ms, and save them into a tibble called IRI_sim. Using successive pipe operations and starting from dataclean:
    • Filter rows so that SOAdur is equal to 1500
    • Delete the SOAdur and IRI columns
    • Mutate the table to add 3 columns IRI_sim_xx, where xx=15, 65 or 250 and IRI_sim_xx = xx + S2Visuel.RT - S1Audio.RT.
    • Using pivot_longer() and the options names_prefix = "IRI_sim_", names_to = "SOAdur", values_to = "IRI_sim", pivot the columns containing "IRI_sim_" into a long table (you also need to add the cols argument to select the corresponding columns).
IRI_sim <- dataclean %>%
    filter(SOAdur == 1500) %>%
    select(-c(SOAdur, IRI)) %>%
    mutate(
        IRI_sim_15 = 15 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_65 = 65 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_250 = 250 + S2Visuel.RT - S1Audio.RT
    ) %>%
    pivot_longer(
        cols = contains("IRI_sim_"),
        names_prefix = "IRI_sim_",
        names_to = "SOAdur",
        values_to = "IRI_sim"
    )
  • We now want the mean IRI per subject and per SOA, along with its standard deviation. Using group_by() and summarise(), store the mean and standard deviation of IRI in a table called stats_obs, starting from dataclean. It should look like this:
(stats_obs <- dataclean %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI),
        sd = sd(IRI)
    ))
## # A tibble: 24 × 4
## # Groups:   Subject [6]
##    Subject SOAdur    mean    sd
##      <dbl>  <dbl>   <dbl> <dbl>
##  1       1     15  -64.8   291.
##  2       1     65   20.8   297.
##  3       1    250  218.    217.
##  4       1   1500 1019.    228.
##  5       2     15 -159.    193.
##  6       2     65 -156.    236.
##  7       2    250    4.42  191.
##  8       2   1500 1184.    198.
##  9       3     15 -289.    207.
## 10       3     65 -251.    262.
## # … with 14 more rows
  • Do the same for the simulated IRI at the three SOAs, storing the result in stats_sim.
(stats_sim <- IRI_sim %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI_sim),
        sd = sd(IRI_sim)
    ))
## # A tibble: 18 × 4
## # Groups:   Subject [6]
##    Subject SOAdur   mean    sd
##      <dbl> <chr>   <dbl> <dbl>
##  1       1 15     -466.   228.
##  2       1 250    -231.   228.
##  3       1 65     -416.   228.
##  4       2 15     -301.   198.
##  5       2 250     -65.9  198.
##  6       2 65     -251.   198.
##  7       3 15      -99.0  157.
##  8       3 250     136.   157.
##  9       3 65      -49.0  157.
## 10       4 15     -168.   214.
## 11       4 250      66.9  214.
## 12       4 65     -118.   214.
## 13       5 15     -367.   249.
## 14       5 250    -132.   249.
## 15       5 65     -317.   249.
## 16       6 15     -313.   161.
## 17       6 250     -77.9  161.
## 18       6 65     -263.   161.

2 Plotting

  • We now want to produce a graph showing the histograms of the observed IRI column using ggplot2.
    • Create a ggplot using the dataclean dataset
    • Set the aesthetics to x = IRI
    • Create the histograms using geom_histogram(), with a fill color depending on SOAdur
    • Arrange the plots on a grid depending on the Subject column using facet_wrap()
    • Add a vertical line marking the mean IRI for each subject and SOA using geom_vline(). The data for these lines are stored in the stats_obs dataset.
    • Play with the theme and other ggplot commands to make the plot look like the one below
dataclean %>% 
    ggplot(aes(x = IRI, fill = factor(SOAdur))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +  # after_stat() replaces the deprecated stat()
    scale_y_continuous(labels = scales::percent_format())+
    facet_wrap(~ paste("Subject", Subject)) +
    geom_vline(data = stats_obs, aes(xintercept = mean, color=factor(SOAdur)), lty = 2, show.legend = FALSE) +
    scale_x_continuous(limits = c(-2000,2000))+
    labs(title = "Observations",
         x = "IRI [ms]",
         y = "Occurrence",
         fill="SOA [ms]") +
    theme_bw() +
    theme(strip.background = element_rect(fill = "transparent", colour = NA),
          strip.text = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

  • Let’s do the same for the simulated dataset. It should look like this:
IRI_sim %>% 
    ggplot(aes(x = IRI_sim, fill = factor(SOAdur, levels=c(15, 65, 250)))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +  # after_stat() replaces the deprecated stat()
    scale_y_continuous(labels = scales::percent_format()) +
    facet_wrap(~ paste("Subject", Subject)) +
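    # SOAdur is a character column here: use the same level order as the fill so line and bar colours match
    geom_vline(data = stats_sim,
               aes(xintercept = mean, color = factor(SOAdur, levels = c(15, 65, 250))),
               lty = 2, show.legend = FALSE) +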
    scale_x_continuous(limits = c(-2000,2000))+
    labs(
        title = "Simulations",
        x = "IRI [ms]",
        y = "Occurrence",
        fill = "SOA [ms]"
    ) +
    theme_bw() +
    theme(
        strip.background = element_rect(fill = "transparent", colour = NA),
        strip.text = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    )

---
title : "R Exercises - Cognitive Psychology"
date  : "`r Sys.Date()`"
output: 
    html_document:
        toc            : true
        toc_float      : true
        toc_depth      : 4
        highlight      : tango
        number_sections: true
        code_download  : TRUE
params: 
    solution:
        value: TRUE
---

<style type="text/css">
blockquote {
  background: #E9F9FF;
  border-left: 5px solid #026086;
  margin: 1.5em 10px;
  padding: 0.5em 10px;
  font-size: 1em;
}
</style>

```{r echo=FALSE, warning=FALSE, message=FALSE, fig.align="center"}
library(downloadthis)
download_link(
  link = "./Archive.zip",
  output_name = "Data Files",
  button_label = "Download Data Files",
  button_type = "default",
  has_icon = TRUE,
  icon = "fa fa-save",
  self_contained = FALSE
)
```
<br>

```{r include=FALSE}
library(knitr)
knitr::opts_chunk$set(cache = FALSE, out.width='100%', warning=FALSE, message=FALSE)
options(width = 80)
```

----

This exercise is based on data from a PhD student in cognitive psychology. In his work, the student records response times (RT) in two simultaneous tasks, which he then has to analyse.

- At $t_0$, the stimulus S1 is triggered
- At $t_1=t_0 + SOA$, the stimulus S2 is triggered (SOA is the time between the 2 stimuli)
- At $t_2=t_0 + RT1$, the subject responds to stimulus S1
- At $t_3=t_1 + RT2$, the subject responds to stimulus S2

The inter-response interval (IRI) is defined as the time difference $t_3 - t_2$, thus $IRI=SOA+RT2-RT1$. A negative IRI therefore means that the response to S2 is emitted before the response to S1.
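
Spelling out the substitution behind this formula, using only the definitions in the list above:

$$
\begin{aligned}
IRI &= t_3 - t_2 \\
    &= (t_1 + RT2) - (t_0 + RT1) \\
    &= (t_0 + SOA + RT2) - (t_0 + RT1) \\
    &= SOA + RT2 - RT1
\end{aligned}
$$

As an illustration with made-up values (not taken from the dataset), $SOA=65$ ms, $RT1=800$ ms and $RT2=600$ ms give $IRI = 65 + 600 - 800 = -135$ ms: the response to S2 comes 135 ms before the response to S1.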

For theoretical reasons, one can consider that if $SOA=1500$ ms, the two stimuli are answered independently. One can thus use the RT1 and RT2 values measured at $SOA=1500$ ms to estimate the IRI that would be observed at the shorter SOAs if the responses to the stimuli were independent.

---

# Data wrangling

- Download the data files <a href="Archive.zip" download target="_blank">archive</a> and unzip it, then create an R project in this folder in RStudio.
- Load the `tidyverse` and `patchwork` packages:

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
library(tidyverse)
library(patchwork)
```

- Load the .csv file in the `Data` folder and save it into `raw_data`. Look at the help page of `read_csv()`{.R} to work out how to get rid of the import error.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
raw_data <- read_csv2("Data/data exemple.csv", show_col_types = FALSE)
```
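
As a minimal sketch of why the choice of reader matters, the toy text below (made-up values, not the real file) uses semicolons as field separators and commas as decimal marks, which is the layout `read_csv2()`{.R} is designed for. The names `toy_text`, `as_commas` and `as_semicolons` are illustrative only, and the sketch assumes readr 2.0 or later for `I()` and `show_col_types`.

```r
library(readr)

# Toy semicolon-separated text with decimal commas (hypothetical values)
toy_text <- "Subject;SOAdur;RT\n1;15;812,5\n2;65;640,25\n"

# read_csv() splits on commas, so the whole header line becomes one column name
as_commas <- suppressWarnings(read_csv(I(toy_text), show_col_types = FALSE))
names(as_commas)[1]

# read_csv2() splits on semicolons and reads "812,5" as the number 812.5
as_semicolons <- read_csv2(I(toy_text), show_col_types = FALSE)
names(as_semicolons)
as_semicolons$RT
```

For files that mix other separators and decimal marks, `read_delim()`{.R} with an explicit `delim` and `locale` gives finer control.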

- Using successive **pipe operations**, we will now create a `dataclean` table from `raw_data`, where we:
    - **Filter** `raw_data` so that `Procedure[Trial]` is only equal to `"EssaisDT"`
    - **Mutate** the table by adding a column `IRI` containing `SOAdur + S2Visuel.RT - S1Audio.RT`
    - **Filter** the table so that some extreme values are excluded:
        - `S1Audio.RT` is less than or equal to 2500 and greater than or equal to 100
        - `S2Visuel.RT` is less than or equal to 2500 and greater than or equal to 100
        - `S1Audio.ACC`, `S1response.ACC` and `S2Visuel.ACC` are equal to 1

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
dataclean <- raw_data %>% 
    filter(`Procedure[Trial]` == "EssaisDT") %>%
    mutate(IRI = SOAdur + S2Visuel.RT - S1Audio.RT) %>% 
    filter(S1Audio.RT <= 2500 & S1Audio.RT >= 100 &
        S2Visuel.RT <= 2500 & S2Visuel.RT >= 100 &
        S1Audio.ACC == 1 & S1response.ACC == 1 & S2Visuel.ACC == 1)
```
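
The effect of these steps can be checked by hand on a small table. The sketch below uses a hypothetical four-row tibble `toy` (invented values, only the RT columns) and `between()`{.R}, which is inclusive at both ends, as a shorter way of writing the two comparisons on each RT column.

```r
library(dplyr)

# Hypothetical trials: row 2 has a too-fast S1 response, row 3 a too-slow S2 response
toy <- tibble(
    SOAdur      = c(15, 65, 250, 1500),
    S1Audio.RT  = c(800, 90, 700, 650),
    S2Visuel.RT = c(600, 500, 2600, 700)
)

toy_clean <- toy %>%
    mutate(IRI = SOAdur + S2Visuel.RT - S1Audio.RT) %>%
    # between(x, 100, 2500) is equivalent to x >= 100 & x <= 2500
    filter(between(S1Audio.RT, 100, 2500),
           between(S2Visuel.RT, 100, 2500))

toy_clean$IRI
```

Only rows 1 and 4 survive, with IRI values of 15 + 600 - 800 = -185 and 1500 + 700 - 650 = 1550. Separating conditions with commas inside `filter()`{.R} combines them with a logical AND, just like `&`.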

- Let's compute the simulated IRI values for the case where the responses to the stimuli are independent, _i.e._ using the trials with SOA = 1500 ms, and save them into a tibble called `IRI_sim`. Using successive pipe operations and starting from `dataclean`:
    - **Filter** rows so that `SOAdur` is equal to 1500
    - Delete the `SOAdur` and `IRI` columns
    - **Mutate** the table to add 3 columns `IRI_sim_xx`, where `xx`=15, 65 or 250 and `IRI_sim_xx = xx + S2Visuel.RT - S1Audio.RT`.
    - Using `pivot_longer()`{.R} and the options `names_prefix = "IRI_sim_", names_to = "SOAdur", values_to = "IRI_sim"`, pivot the columns containing `"IRI_sim_"` into a long table (you also need to add the `cols` argument to select the corresponding columns).

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
IRI_sim <- dataclean %>%
    filter(SOAdur == 1500) %>%
    select(-c(SOAdur, IRI)) %>%
    mutate(
        IRI_sim_15 = 15 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_65 = 65 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_250 = 250 + S2Visuel.RT - S1Audio.RT
    ) %>%
    pivot_longer(
        cols = contains("IRI_sim_"),
        names_prefix = "IRI_sim_",
        names_to = "SOAdur",
        values_to = "IRI_sim"
    )
```
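
To see what the pivot does to the column names, here is a small sketch on a hypothetical wide table `toy_wide` (invented values). It also shows that the new `SOAdur` column is of type character, because it is built from column names; the `names_transform` argument (assumed available, tidyr 1.1.0 or later) is one way to get numbers instead. The solution above keeps the character column.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one simulated IRI column per SOA
toy_wide <- tibble(
    Subject    = c(1, 2),
    IRI_sim_15 = c(-185, 40),
    IRI_sim_65 = c(-135, 90)
)

# Default behaviour: the stripped column names become character values
toy_long <- toy_wide %>%
    pivot_longer(
        cols = starts_with("IRI_sim_"),
        names_prefix = "IRI_sim_",
        names_to = "SOAdur",
        values_to = "IRI_sim"
    )
class(toy_long$SOAdur)

# Optional variant: convert the names to numbers while pivoting
toy_long_num <- toy_wide %>%
    pivot_longer(
        cols = starts_with("IRI_sim_"),
        names_prefix = "IRI_sim_",
        names_to = "SOAdur",
        values_to = "IRI_sim",
        names_transform = list(SOAdur = as.numeric)
    )
class(toy_long_num$SOAdur)

# Character values sort alphabetically, which puts "250" before "65"
sort(c("15", "65", "250"))
```

The alphabetical ordering of character values is why a grouped summary of `IRI_sim` lists the SOAs as 15, 250, 65 rather than 15, 65, 250.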

- We now want the mean `IRI` per subject and per SOA, along with its standard deviation. Using `group_by()`{.R} and `summarise()`{.R}, store the mean and standard deviation of `IRI` in a table called `stats_obs`, starting from `dataclean`. It should look like this:

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
(stats_obs <- dataclean %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI),
        sd = sd(IRI)
    ))
```

- Do the same for the simulated IRI at the three SOAs, storing the result in `stats_sim`.

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
(stats_sim <- IRI_sim %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI_sim),
        sd = sd(IRI_sim)
    ))
```
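
The grouped summary can be verified by hand on a tiny example. The sketch below uses a hypothetical tibble `toy_iri` (invented values) with two trials per subject. The `.groups = "drop"` argument is an optional addition, not used in the solutions above, that returns an ungrouped table.

```r
library(dplyr)

# Hypothetical IRI values: two trials per subject at a single SOA
toy_iri <- tibble(
    Subject = c(1, 1, 2, 2),
    SOAdur  = c(15, 15, 15, 15),
    IRI     = c(-100, 100, 50, 150)
)

toy_stats <- toy_iri %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI),
        sd = sd(IRI),          # sample standard deviation (n - 1 denominator)
        .groups = "drop"       # return an ungrouped tibble
    )

toy_stats
```

Subject 1 has a mean of 0 and a standard deviation of $\sqrt{20000} \approx 141.4$; subject 2 has a mean of 100 and a standard deviation of $\sqrt{5000} \approx 70.7$.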

# Plotting

- We now want to produce a graph showing the histograms of the observed `IRI` column using `ggplot2`.
    - Create a `ggplot` using the `dataclean` dataset
    - Set the aesthetics to `x = IRI`
    - Create the histograms using `geom_histogram()`{.R}, with a fill color depending on `SOAdur`
    - Arrange the plots on a grid depending on the `Subject` column using `facet_wrap()`{.R}
    - Add a vertical line marking the mean IRI for each subject and SOA using `geom_vline()`{.R}. The data for these lines are stored in the `stats_obs` dataset.
    - Play with the theme and other ggplot commands to make the plot look like the one below
        - Try plotting a normalized histogram by [looking up on the Internet how to do this](https://www.google.fr/search?source=hp&ei=g0MLXeGwKNLPgweV04-IBA&q=ggplot+normalized+histogram&oq=ggplot+normalized+histogram)

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
dataclean %>% 
    ggplot(aes(x = IRI, fill = factor(SOAdur))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +  # after_stat() replaces the deprecated stat()
    scale_y_continuous(labels = scales::percent_format())+
    facet_wrap(~ paste("Subject", Subject)) +
    geom_vline(data = stats_obs, aes(xintercept = mean, color=factor(SOAdur)), lty = 2, show.legend = FALSE) +
    scale_x_continuous(limits = c(-2000,2000))+
    labs(title = "Observations",
         x = "IRI [ms]",
         y = "Occurrence",
         fill="SOA [ms]") +
    theme_bw() +
    theme(strip.background = element_rect(fill = "transparent", colour = NA),
          strip.text = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
```
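
The normalisation used in this plot can be checked in isolation. The following is a minimal sketch on simulated values (the data frame `toy` and the seed are arbitrary choices, not part of the exercise data): mapping `y` to `after_stat(count / sum(count))` turns bin counts into proportions, so the bar heights of one histogram add up to 1. It assumes ggplot2 3.3.0 or later, where `after_stat()`{.R} is available.

```r
library(ggplot2)

# Simulated IRI-like values (arbitrary, for illustration only)
set.seed(1)
toy <- data.frame(IRI = rnorm(200, mean = 0, sd = 300))

p <- ggplot(toy, aes(x = IRI)) +
    # each bar shows the share of observations falling in its bin
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
    scale_y_continuous(labels = scales::percent_format())

# Extract the computed bar heights from the first layer and check that they sum to 1
bar_heights <- layer_data(p, 1)$y
sum(bar_heights)
```

With a `fill` aesthetic and facets, as in the plot above, the statistic is computed separately for each group within each panel, so the proportions are relative to each subject and SOA combination rather than to the whole dataset.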

- Let's do the same for the simulated dataset. It should look like this:

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
IRI_sim %>% 
    ggplot(aes(x = IRI_sim, fill = factor(SOAdur, levels=c(15, 65, 250)))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +  # after_stat() replaces the deprecated stat()
    scale_y_continuous(labels = scales::percent_format()) +
    facet_wrap(~ paste("Subject", Subject)) +
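    # SOAdur is a character column here: use the same level order as the fill so line and bar colours match
    geom_vline(data = stats_sim,
               aes(xintercept = mean, color = factor(SOAdur, levels = c(15, 65, 250))),
               lty = 2, show.legend = FALSE) +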
    scale_x_continuous(limits = c(-2000,2000))+
    labs(
        title = "Simulations",
        x = "IRI [ms]",
        y = "Occurrence",
        fill = "SOA [ms]"
    ) +
    theme_bw() +
    theme(
        strip.background = element_rect(fill = "transparent", colour = NA),
        strip.text = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    )
```