---
title : "R Exercises - Cognitive Psychology"
date  : "`r Sys.Date()`"
output: 
    html_document:
        toc            : true
        toc_float      : true
        toc_depth      : 4
        highlight      : tango
        number_sections: true
        code_download  : TRUE
params: 
    solution:
        value: TRUE
---

<style type="text/css">
blockquote {
  background: #E9F9FF;
  border-left: 5px solid #026086;
  margin: 1.5em 10px;
  padding: 0.5em 10px;
  font-size: 1em;
}
</style>

```{r echo=FALSE, warning=FALSE, message=FALSE, fig.align="center"}
library(downloadthis)
download_link(
  link = "./Archive.zip",
  output_name = "Data Files",
  button_label = "Download Data Files",
  button_type = "default",
  has_icon = TRUE,
  icon = "fa fa-save",
  self_contained = FALSE
)
```
<br>

```{r include=FALSE}
library(knitr)
knitr::opts_chunk$set(cache = FALSE, out.width='100%', warning=FALSE, message=FALSE)
options(width = 80)
```

----

This exercise is based on data from a PhD student in cognitive psychology. In his work, this student records response times (RT) for two simultaneous tasks and then has to analyse them.

- At $t_0$, the stimulus S1 is triggered
- At $t_1=t_0 + SOA$, the stimulus S2 is triggered (SOA, the stimulus onset asynchrony, is the delay between the onsets of the two stimuli)
- At $t_2=t_0 + RT1$, the subject responds to stimulus S1
- At $t_3=t_1 + RT2$, the subject responds to stimulus S2

The inter-response interval (IRI) is defined as the time difference between $t_3$ and $t_2$: $IRI = t_3 - t_2 = (t_0 + SOA + RT2) - (t_0 + RT1) = SOA + RT2 - RT1$. A negative IRI therefore means that the response to S2 was emitted before the response to S1.
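
As a quick check of the formula, here is a timeline with made-up values (SOA = 15 ms, RT1 = 600 ms, RT2 = 450 ms; illustrative numbers, not taken from the data), showing how a negative IRI arises:

```r
# Illustrative values in ms (hypothetical, not from the dataset)
t0  <- 0
SOA <- 15
RT1 <- 600
RT2 <- 450

t1 <- t0 + SOA   # onset of S2
t2 <- t0 + RT1   # response to S1
t3 <- t1 + RT2   # response to S2

IRI <- t3 - t2
IRI                       # -135: S2 was answered before S1
IRI == SOA + RT2 - RT1    # TRUE: same value as the closed-form expression
```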

For theoretical reasons, one can consider that when $SOA=1500$ ms, both stimuli are answered independently. The RT1 and RT2 values measured at $SOA=1500$ ms can thus be used to estimate what the IRI would be at the shorter SOAs if the responses to the two stimuli were independent.
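
The idea can be sketched in base R with two hypothetical trials recorded at SOA = 1500 ms: each trial's RT difference is reused with each shorter SOA (15, 65 and 250 ms, the values used later in the exercise), giving one simulated IRI per trial and per SOA. The RT values below are invented for illustration.

```r
# Hypothetical RTs (ms) from two trials recorded at SOA = 1500 ms
rt1  <- c(520, 610)   # RT to S1
rt2  <- c(480, 700)   # RT to S2
soas <- c(15, 65, 250)

# Under independence, the predicted IRI at a shorter SOA is SOA + RT2 - RT1:
# one row per trial, one column per SOA
iri_sim <- outer(rt2 - rt1, soas, `+`)
colnames(iri_sim) <- paste0("IRI_sim_", soas)
iri_sim
# row 1: -25, 25, 210
# row 2: 105, 155, 340
```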

---

# Data wrangling

- Download the datafiles <a href="Archive.zip" download target="_blank">archive</a> and unzip it, then create an R project in this folder in RStudio.
- Load the `tidyverse` and `patchwork` packages:

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
library(tidyverse)
library(patchwork)
```

- Load the .csv file in the `Data` folder and save it into `raw_data`. Look into the help page of `read_csv()`{.R} to find out how to get rid of the error.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
raw_data <- read_csv2("Data/data exemple.csv", show_col_types = FALSE)
```
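
Why `read_csv2()` rather than `read_csv()`? Assuming the data file is semicolon-separated with commas as decimal marks (which is what the `read_csv2()` solution suggests), the toy file below, with invented content, shows the difference: `read_csv()` splits fields on `,`, whereas `read_csv2()` splits on `;` and reads `,` as the decimal mark.

```r
library(readr)

# Write a tiny semicolon-separated file with comma decimals (hypothetical content)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Subject;S1Audio.RT", "1;512,5", "2;630"), tmp)

# read_csv() would split on "," and mangle the columns here;
# read_csv2() splits on ";" and treats "," as the decimal mark
df <- read_csv2(tmp, show_col_types = FALSE)
df$S1Audio.RT   # 512.5 and 630
```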

- Using successive **pipe operations**, we will now create a `dataclean` table from `raw_data`, where we:
    - **Filter** `raw_data` so that `Procedure[Trial]` is only equal to `"EssaisDT"`
    - **Mutate** the table by adding a column `IRI` containing `SOAdur + S2Visuel.RT - S1Audio.RT`
    - **Filter** the table so that some extreme values are excluded:
        - `S1Audio.RT` is smaller than or equal to 2500 and larger than or equal to 100
        - `S2Visuel.RT` is smaller than or equal to 2500 and larger than or equal to 100
        - `S1Audio.ACC`, `S1response.ACC` and `S2Visuel.ACC` are equal to 1

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
dataclean <- raw_data %>% 
    filter(`Procedure[Trial]` == "EssaisDT") %>%
    mutate(IRI = SOAdur + S2Visuel.RT - S1Audio.RT) %>% 
    filter(S1Audio.RT <= 2500 & S1Audio.RT >= 100 &
        S2Visuel.RT <= 2500 & S2Visuel.RT >= 100 &
        S1Audio.ACC == 1 & S1response.ACC == 1 & S2Visuel.ACC == 1)
```
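
The same filtering logic can be tried on a small hypothetical tibble that only mimics the column names of `raw_data`. This sketch also uses `dplyr::between()`, an inclusive range test equivalent to `x >= 100 & x <= 2500`; comma-separated conditions in `filter()` are combined with `&`. Of the four toy trials, only the first survives (the second has `S1Audio.RT` below 100, the third is not an `"EssaisDT"` trial, the fourth has `S2Visuel.RT` above 2500).

```r
library(dplyr)

# Hypothetical toy trials mimicking the column names of raw_data
toy <- tibble(
  `Procedure[Trial]` = c("EssaisDT", "EssaisDT", "Training", "EssaisDT"),
  SOAdur         = c(15, 65, 15, 250),
  S1Audio.RT     = c(600, 80, 500, 700),
  S2Visuel.RT    = c(450, 400, 300, 2600),
  S1Audio.ACC    = 1,
  S1response.ACC = 1,
  S2Visuel.ACC   = 1
)

kept <- toy %>%
  # backticks are needed because the column name contains brackets
  filter(`Procedure[Trial]` == "EssaisDT") %>%
  mutate(IRI = SOAdur + S2Visuel.RT - S1Audio.RT) %>%
  # between(x, 100, 2500) is the same as x >= 100 & x <= 2500
  filter(between(S1Audio.RT, 100, 2500),
         between(S2Visuel.RT, 100, 2500),
         S1Audio.ACC == 1, S1response.ACC == 1, S2Visuel.ACC == 1)

kept$IRI   # -135 (15 + 450 - 600)
```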

- Let's compute the simulated values for the case where the responses to the stimuli are independent, _i.e._ using the trials with SOA=1500ms, and save them into a tibble called `IRI_sim`. Using successive pipe operations and starting from `dataclean`:
    - **Filter** rows so that `SOAdur` is equal to 1500
    - Delete the `SOAdur` and `IRI` columns
    - **Mutate** the table to add 3 columns `IRI_sim_xx`, where `xx` is 15, 65 or 250 and `IRI_sim_xx = xx + S2Visuel.RT - S1Audio.RT`.
    - Using `pivot_longer()`{.R} and the options `names_prefix = "IRI_sim_", names_to = "SOAdur", values_to = "IRI_sim"`, pivot the columns containing `"IRI_sim_"` into a long table (you also need to add the `cols` option to select these columns).

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
IRI_sim <- dataclean %>%
    filter(SOAdur == 1500) %>%
    select(-c(SOAdur, IRI)) %>%
    mutate(
        IRI_sim_15 = 15 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_65 = 65 + S2Visuel.RT - S1Audio.RT,
        IRI_sim_250 = 250 + S2Visuel.RT - S1Audio.RT
    ) %>%
    pivot_longer(
        cols = contains("IRI_sim_"),
        names_prefix = "IRI_sim_",
        names_to = "SOAdur",
        values_to = "IRI_sim"
    )
```
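
Note that `names_prefix` only strips text from the column names, so the new `SOAdur` column is of type character. When grouping or sorting, character values are ordered alphabetically (`"15"`, `"250"`, `"65"`), not numerically. The toy sketch below (invented values) shows this, and shows the `names_transform` argument of `pivot_longer()` as one possible way to get numbers instead; it is an optional variant, not part of the exercise instructions.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per trial, one column per simulated SOA
wide <- tibble(
  Subject     = c(1, 1),
  IRI_sim_15  = c(-25, 105),
  IRI_sim_65  = c(25, 155),
  IRI_sim_250 = c(210, 340)
)

long <- pivot_longer(
  wide,
  cols         = starts_with("IRI_sim_"),
  names_prefix = "IRI_sim_",      # strip the prefix from the column names
  names_to     = "SOAdur",
  values_to    = "IRI_sim"
)
long$SOAdur   # character: "15" "65" "250" "15" "65" "250"

# Optional: convert the names to numbers while pivoting
long_num <- pivot_longer(
  wide,
  cols            = starts_with("IRI_sim_"),
  names_prefix    = "IRI_sim_",
  names_to        = "SOAdur",
  names_transform = list(SOAdur = as.numeric),
  values_to       = "IRI_sim"
)
```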

- We now want the mean `IRI` per subject and per SOA, together with its standard deviation. Using `group_by()`{.R} and `summarise()`{.R}, store the mean and standard deviation of `IRI` in a table called `stats_obs`, starting from `dataclean`. It should look like this:

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
(stats_obs <- dataclean %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI),
        sd = sd(IRI)
    ))
```
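
By default, `summarise()` drops only the last grouping variable (`SOAdur`), so `stats_obs` stays grouped by `Subject`. The toy sketch below, with invented IRI values, shows the `.groups = "drop"` argument that returns an ungrouped table, and adds a trial count per cell with `n()`; both are optional additions, not required by the exercise.

```r
library(dplyr)

# Hypothetical toy data: two subjects, two SOAs, two trials per cell
toy <- tibble(
  Subject = c(1, 1, 1, 1, 2, 2, 2, 2),
  SOAdur  = c(15, 15, 250, 250, 15, 15, 250, 250),
  IRI     = c(-100, -50, 200, 260, -150, -170, 0, 10)
)

stats_toy <- toy %>%
  group_by(Subject, SOAdur) %>%
  # .groups = "drop" removes all grouping; n() counts trials per cell
  summarise(mean = mean(IRI), sd = sd(IRI), n = n(), .groups = "drop")

stats_toy$mean   # -75 230 -160 5
```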

- Do the same for the 3 simulated IRIs and store the result in `stats_sim`:

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
(stats_sim <- IRI_sim %>%
    group_by(Subject, SOAdur) %>%
    summarise(
        mean = mean(IRI_sim),
        sd = sd(IRI_sim)
    ))
```
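
Within a subject, the three simulated IRIs are built from the same trials as `SOA + (RT2 - RT1)`, so the three distributions have the same shape, only shifted by a constant. Consequently their means differ by exactly the SOA difference and their standard deviations are identical. A base R check with invented RT differences:

```r
# Hypothetical RT differences (RT2 - RT1, ms) from trials at SOA = 1500 ms
d <- c(-40, 90, 15, -120)

sim_15  <- 15  + d
sim_250 <- 250 + d

mean(sim_250) - mean(sim_15)          # 235, i.e. exactly 250 - 15
isTRUE(all.equal(sd(sim_250), sd(sim_15)))  # TRUE: adding a constant leaves the sd unchanged
```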

# Plotting

- We now want to produce a graph showing the histograms of the observed `IRI` column using `ggplot2`. 
    - Create a `ggplot` using the `dataclean` dataset
    - Set the aesthetics to `x = IRI`
    - Create the histograms using `geom_histogram()`{.R}, with a fill color depending on `SOAdur`
    - Arrange the plots on a grid depending on the `Subject` column using `facet_wrap()`{.R}
    - Add a vertical line marking the average value for each subject and SOA using `geom_vline()`{.R}. The data for these lines are stored in the `stats_obs` dataset.
    - Play with the theme and other ggplot commands to make the plot look like the one below
        - Try plotting a normalized histogram by [looking up on the Internet how to do this](https://www.google.fr/search?source=hp&ei=g0MLXeGwKNLPgweV04-IBA&q=ggplot+normalized+histogram&oq=ggplot+normalized+histogram)

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
dataclean %>% 
    ggplot(aes(x = IRI, fill = factor(SOAdur))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
    scale_y_continuous(labels = scales::percent_format())+
    facet_wrap(~ paste("Subject", Subject)) +
    geom_vline(data = stats_obs, aes(xintercept = mean, color=factor(SOAdur)), lty = 2, show.legend = FALSE) +
    scale_x_continuous(limits = c(-2000,2000))+
    labs(title = "Observations",
         x = "IRI [ms]",
         y = "Occurrence", 
         fill="SOA [ms]") +
    theme_bw() +
    theme(strip.background = element_rect(fill = "transparent", colour = NA),
          strip.text = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
```
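
A normalized histogram maps `y` to a statistic computed by the histogram layer. In current ggplot2 versions this is written with `after_stat()`; the older `stat()` helper is, as far as we know, deprecated since ggplot2 3.4.0. The minimal sketch below uses random, purely illustrative data and checks with `layer_data()` that the bar heights sum to 1. One caveat, based on our understanding of ggplot2 rather than on the exercise: in a faceted plot, `sum(count)` is taken over the whole layer (all panels and fill groups), so each bar is a share of all trials rather than of one subject's trials.

```r
library(ggplot2)

set.seed(1)
# Hypothetical IRI values (ms), drawn at random for illustration only
toy <- data.frame(IRI = rnorm(200, mean = -100, sd = 250))

p <- ggplot(toy, aes(x = IRI)) +
  # after_stat() refers to variables computed by the stat (here: count per bin);
  # dividing by the total count turns counts into proportions
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  scale_y_continuous(labels = scales::percent_format())

# The bar heights add up to 1 (100 %), up to floating-point rounding
sum(layer_data(p)$y)
```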

- Let's do the same for the simulated dataset. It should look like this:

```{r echo=TRUE, warning = FALSE, message=FALSE, cache=FALSE}
IRI_sim %>% 
    ggplot(aes(x = IRI_sim, fill = factor(SOAdur, levels=c(15, 65, 250)))) +
    geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
    scale_y_continuous(labels = scales::percent_format()) +
    facet_wrap(~ paste("Subject", Subject)) +
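    # SOAdur is character in stats_sim: set the same level order as the fill scale so line and bar colours match
    geom_vline(data = stats_sim, aes(xintercept = mean, color = factor(SOAdur, levels = c(15, 65, 250))), lty = 2, show.legend = FALSE) +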
    scale_x_continuous(limits = c(-2000,2000))+
    labs(
        title = "Simulations",
        x = "IRI [ms]",
        y = "Occurrence",
        fill = "SOA [ms]"
    ) +
    theme_bw() +
    theme(
        strip.background = element_rect(fill = "transparent", colour = NA),
        strip.text = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
    )
```
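
The `patchwork` package is loaded at the start of the exercise but not used in the code above. One plausible use, which is our assumption and not stated in the exercise, is to stack the observed and simulated histograms so they can be compared over the same x range. The sketch below uses two tiny stand-in plots; `p_obs` and `p_sim` are hypothetical names for the two ggplot objects built above.

```r
library(ggplot2)
library(patchwork)

# Two small stand-in plots (hypothetical data) for the observation and simulation figures
toy   <- data.frame(x = c(-300, -100, 0, 150, 400))
p_obs <- ggplot(toy, aes(x)) + geom_histogram(bins = 5) + labs(title = "Observations")
p_sim <- ggplot(toy, aes(x)) + geom_histogram(bins = 5) + labs(title = "Simulations")

# With patchwork, `/` stacks plots vertically and `|` places them side by side
combined <- p_obs / p_sim
combined
```

In the exercise, one would assign the two ggplot pipelines above to objects instead of printing them directly, then combine those objects in the same way.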