We will now work with Raman spectroscopy data from the luminescence of a ruby particle under high pressure: Data/rubis_xx.txt. This measurement is performed in a diamond anvil cell with a laser of wavelength \(\lambda_L\)=568.189 nm, and the pressure is known by calibrating the frequency of a diamond line.

The ruby luminescence contains two lines, and we want to use the position of the R1 line, which is the most intense and has the highest energy, to determine the pressure. We therefore need to determine the parameters \(A\) and \(B\) of the equation of state linking wavelength (in nm) and pressure (in GPa):

\[ P~[GPa] = \frac{A}{B}\left[\left(\frac{\lambda}{\lambda_0}\right)^B-1\right], \] where \(\lambda_0\) = 694.24 nm is the wavelength of the R1 line at zero pressure, and \(\lambda\) is the wavelength of the R1 line at \(P\) pressure. For a given line, the relationship between its measured Raman shift, \(\omega\), and its wavelength is given by \(\lambda = \frac{10^7}{\omega_L - \omega}\), with \(\omega_L= \frac{10^7}{\lambda_L}\) the frequency of the probe laser.

Open one of the files, for example Data/rubis_01.txt to see its structure - they’re all the same. They are made up of two columns, which are the Raman shifts in cm-1 and the intensities. The nomenclature of the files "Data/rubis_xx.txt" is such that xx is the pressure in GPa.

First, consider the steps involved in determining the parameters \(A\) and \(B\) of the equation of state.

  • What data do you need?
  • What steps do you need to take to obtain them?

  • Let’s start by defining the function Prubis(w,A,B), which returns the pressure in GPa as a function of the Raman shift in cm-1, and the parameters \(A\) and \(B\).
Prubis <- function(w,A,B){
    wl  <-  1e7/568.189
    lamb0  <-  694.24
    lamb  <-  1e7/(wl - w)
    return(A/B * ((lamb/lamb0)^B - 1))
}

Find in the folder “Data” the list of files with the pattern “rubis_” in their name (check out the list.files() function) and store it in flist

flist <- list.files(path="Data", pattern = "rubis_")

What is the length of flist?

length(flist)
## [1] 17

Initialize an empty tibble called spec that will store all spectra:

library(tidyverse)
spec   <- tibble()

Using a for loop, load each file in flist and store them into spec by making it tidy.

for(file in flist){
    d <- read_table(file.path("Data",file), col_names = c("w","int"))
    d$name <- file
    spec <- bind_rows(spec,d)
}
spec
## # A tibble: 17,255 × 3
##        w   int name        
##    <dbl> <dbl> <chr>       
##  1 3064.  43.9 rubis_01.txt
##  2 3064.  47.9 rubis_01.txt
##  3 3064.  44.5 rubis_01.txt
##  4 3065.  50.5 rubis_01.txt
##  5 3065.  50.5 rubis_01.txt
##  6 3065.  44.5 rubis_01.txt
##  7 3065.  44.9 rubis_01.txt
##  8 3066.  39.9 rubis_01.txt
##  9 3066.  49.5 rubis_01.txt
## 10 3066.  48.9 rubis_01.txt
## # ℹ 17,245 more rows

Another version of this that does not use a for loop would be like so (try to understand what’s happening in this code by running each line one after the other and looking at the help on the functions used):

# spec <- tibble(name=flist) %>% 
#     mutate(data=map(name, ~read_table(file.path("Data",.), 
#                                       col_names = c("w","int")))
#           ) %>% 
#     unnest(data)

Modify spec to get the pressure in GPa from the file name, and so that the intensity column is normalized to [0,1].

norm01 <- function(x){
    (x-min(x))/(max(x)-min(x))
}
spec <- spec %>% 
    mutate(pressure = as.numeric(str_extract(name, "[0-9]+"))) %>% 
    group_by(name) %>% 
    mutate(int=norm01(int))

Plot the first spectrum in spec to see the x range

spec %>% 
    filter(name==flist[1]) %>%
    ggplot(aes(x=w, y=int))+
        geom_line()+
        labs(x="Raman Shift [1/cm]", y="Intensity [arb. units]")+
        theme_bw()


Using ggplot2, plot all spectra normalized to [0,1] (with points) and stacked on top of each other with a vertical shift corresponding to the pressure in GPa

spec %>% 
    ggplot(aes(x=w, 
               y=int + pressure,
               group=name))+
        geom_point(size=.2, alpha=.2)+
        labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
        theme_bw()


What we want to do now is to fit this data by the sum of two Lorentzians to find the precise position of the R1 line as a function of pressure. To do this, we’ll use the function nls(), which allows us to make a non-linear fit of the data. We also need to define the function lor(x,x0,gamma) which returns a Lorentzian centered at x0 and of width at half-height gamma. Start by defining the function lor(x,x0,gamma).

lor <- function(x,x0,gamma){
    2/pi/gamma / (1 + ((x-x0)/(gamma/2))^2)
}

Now create an empty tibble called results, in which we’ll store all the fitting parameters in a tidy tibble. Then fit each spectrum by the sum of two Lorentzians, and save all fit parameters in results, along with the file name and pressure. For the fit to work, you’ll need to give reasonable initial values for the fit parameters using the start=list(...) argument of nls(). For example, you can use the which.max() function to find the index of the maximum of a vector.

results <- tibble()
for(file in flist){
    d <- spec %>% filter(name==file)
    fit <- nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2), 
               data=d, 
               start=list(x1=d$w[which.max(d$int)], 
                          x2=d$w[which.max(d$int)]-30, 
                          gamma1=10, gamma2=10,
                          A1=10,A2=7))
    res <- tibble(file, 
                  pressure=unique(d$pressure),
                  as_tibble(t(coef(fit))))
    results <- bind_rows(results, res)
}
results
## # A tibble: 17 × 8
##    file         pressure    x1    x2 gamma1 gamma2    A1    A2
##    <chr>           <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
##  1 rubis_01.txt        1 3203. 3173.   15.0  11.1   21.8 10.8 
##  2 rubis_02.txt        2 3212. 3182.   11.9   8.92  18.1  8.76
##  3 rubis_03.txt        3 3219. 3189.   11.9   8.96  18.3  8.78
##  4 rubis_04.txt        4 3226. 3196.   12.3   9.16  18.8  8.70
##  5 rubis_05.txt        5 3234. 3205.   13.9  10.2   21.4  9.68
##  6 rubis_06.txt        6 3241. 3212.   13.6  10.2   20.9  9.47
##  7 rubis_07.txt        7 3248. 3219.   14.6  10.7   22.6  9.71
##  8 rubis_08.txt        8 3256. 3226.   15.3  11.0   23.4  9.90
##  9 rubis_10.txt       10 3270. 3241.   18.2  12.5   27.7 10.4 
## 10 rubis_12.txt       12 3285. 3255.   19.5  13.5   29.7 10.8 
## 11 rubis_14.txt       14 3301. 3271.   22.0  15.2   33.4 11.3 
## 12 rubis_17.txt       17 3322. 3291.   23.8  14.8   37.0 11.0 
## 13 rubis_21.txt       21 3348. 3315.   29.7  17.4   46.0 11.9 
## 14 rubis_23.txt       23 3361. 3328.   32.1  17.4   50.1 11.7 
## 15 rubis_25.txt       25 3373. 3339.   31.8  16.8   49.9 11.9 
## 16 rubis_27.txt       27 3385. 3351.   31.4  18.1   47.1 12.4 
## 17 rubis_29.txt       29 3401. 3366.   30.4  15.6   46.5 11.3

Another way of doing that is to use the map() function from the purrr package, which allows us to apply a function to each element of a list. Here, we want to apply the nls() function to each spectrum in spec, and store the result in a new column of spec. We can do this like so:

library(broom)
specfitted <- spec %>% 
    nest(data=c(w,int)) %>% 
    mutate(fit=map(data, ~nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2), 
                              data=., 
                              start=list(x1=.$w[which.max(.$int)], 
                                         x2=.$w[which.max(.$int)]-30, 
                                         gamma1=10, gamma2=10,
                                         A1=10,A2=7))),
           tidied = map(fit, tidy),
           augmented = map(fit, augment))
results <- specfitted %>% 
    unnest(tidied) %>% 
    select(name, pressure, term, estimate)
specfitted <- specfitted %>% 
    unnest(augmented)

Plot the result of the fit on the graph above.

specfitted %>% 
    ggplot(aes(x=w, group=name))+
        geom_point(aes(y=int + pressure),size=.2, alpha=.2)+
        geom_line(aes(y=.fitted + pressure), color='red')+
        labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
        theme_bw()


Now all you have to do is plot the position of the peak as a function of pressure, and model this data by the equation of state defined at the beginning. What values of A and B do you obtain? Compare with the literature values: A = 1876 GPa and B = 10.71.

tofit <- results %>% filter(term=='x1')
fit <- nls(pressure~Prubis(estimate, A, B), 
           data=tofit, 
           start=list(A=10, B=10))
A <- tidy(fit) %>% filter(term=='A') %>% 
        select(estimate,std.error) %>% round(1) %>% paste(collapse="±")
B <- tidy(fit) %>% filter(term=='B') %>% 
        select(estimate,std.error) %>% round(1) %>% paste(collapse="±")
tofit %>%  
    ggplot(aes(x=estimate, y=pressure))+
        geom_point(size=4)+
        geom_line(aes(y=predict(fit)), color='red')+
        labs(y="Pressure [GPa]", x="Raman Shift [1/cm]")+
        theme_bw()+
        ggtitle(glue::glue("A = {A} GPa, B = {B}"))

---
title : "R Exercises - Spectroscopic data - Solution"
date  : "`r Sys.Date()`"
output: 
    html_document:
        toc            : true
        toc_float      : true
        toc_depth      : 4
        highlight      : tango
        number_sections: false
        code_download  : true
params: 
    solution:
        value: true
---


```{r echo=FALSE, warning=FALSE, message=FALSE, fig.align="center"}
library(downloadthis)
download_link(
  link = "https://github.com/colinbousige/rclass/raw/main/Exo/spectro2/Archive.zip",
  output_name = "Data Files",
  button_label = "Download Data Files",
  button_type = "default",
  has_icon = TRUE,
  icon = "fa fa-save",
  self_contained = FALSE
)
```
<br>



We will now work with Raman spectroscopy data from the luminescence of a ruby particle under high pressure: `Data/rubis_xx.txt`. 
This measurement is performed in a diamond anvil cell with a laser of wavelength $\lambda_L$=568.189 nm, and the pressure is known by calibrating the frequency of a diamond line. 

The ruby luminescence contains two lines, and we want to use the position of the R1 line, which is the most intense and has the highest energy, to determine the pressure. We therefore need to determine the parameters $A$ and $B$ of the equation of state linking wavelength (in nm) and pressure (in GPa):

$$
P~[GPa] = \frac{A}{B}\left[\left(\frac{\lambda}{\lambda_0}\right)^B-1\right],
$$
where $\lambda_0$ = 694.24 nm is the wavelength of the R1 line at zero pressure, and $\lambda$ is the wavelength of the R1 line at $P$ pressure. 
For a given line, the relationship between its measured Raman shift, $\omega$, and its wavelength is given by $\lambda = \frac{10^7}{\omega_L - \omega}$, with $\omega_L= \frac{10^7}{\lambda_L}$ the frequency of the probe laser.

Open one of the files, for example [Data/rubis_01.txt](Data/rubis_01.txt) to see its structure - they're all the same. They are made up of two columns, which are the Raman shifts in cm^-1^ and the intensities. The nomenclature of the files `"Data/rubis_xx.txt"` is such that `xx` is the pressure in GPa.


> First, consider the steps involved in determining the parameters $A$ and $B$ of the equation of state. 
> 
> - What data do you need? 
> - What steps do you need to take to obtain them?

----------

- Let's start by defining the function `Prubis(w,A,B)`, which returns the pressure in GPa as a function of the Raman shift in cm^-1^, and the parameters $A$ and $B$. 


```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
Prubis <- function(w,A,B){
    wl  <-  1e7/568.189
    lamb0  <-  694.24
    lamb  <-  1e7/(wl - w)
    return(A/B * ((lamb/lamb0)^B - 1))
}
```

----------

Find in the folder "Data" the list of files with the pattern "rubis\_" in their name (check out the `list.files()` function) and store it in `flist`

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
flist <- list.files(path="Data", pattern = "rubis_")
```

----------

What is the length of `flist`?

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
length(flist)
```

----------

Initialize an empty tibble called `spec` that will store all spectra:

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
library(tidyverse)
spec   <- tibble()
```

----------

Using a `for` loop, load each file in `flist` and store them into `spec` by *making it tidy*.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
for(file in flist){
    d <- read_table(file.path("Data",file), col_names = c("w","int"))
    d$name <- file
    spec <- bind_rows(spec,d)
}
spec
```

Another version of this that does not use a for loop would be like so (try to understand what's happening in this code by running each line one after the other and looking at the help on the functions used):

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
# spec <- tibble(name=flist) %>% 
#     mutate(data=map(name, ~read_table(file.path("Data",.), 
#                                       col_names = c("w","int")))
#           ) %>% 
#     unnest(data)
```

----------

Modify `spec` to get the pressure in GPa from the file name, and so that the intensity column is normalized to [0,1].

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
norm01 <- function(x){
    (x-min(x))/(max(x)-min(x))
}
spec <- spec %>% 
    mutate(pressure = as.numeric(str_extract(name, "[0-9]+"))) %>% 
    group_by(name) %>% 
    mutate(int=norm01(int))
```

----------

Plot the first spectrum in `spec` to see the x range

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
spec %>% 
    filter(name==flist[1]) %>%
    ggplot(aes(x=w, y=int))+
        geom_line()+
        labs(x="Raman Shift [1/cm]", y="Intensity [arb. units]")+
        theme_bw()
```

----------

Using `ggplot2`, plot all spectra normalized to [0,1] (with points) and stacked on top of each other with a vertical shift corresponding to the pressure in GPa

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
spec %>% 
    ggplot(aes(x=w, 
               y=int + pressure,
               group=name))+
        geom_point(size=.2, alpha=.2)+
        labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
        theme_bw()
```

----------

What we want to do now is to fit this data by the sum of two Lorentzians to find the precise position of the R1 line as a function of pressure. To do this, we'll use the function [`nls()`](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/nls), which allows us to make a non-linear fit of the data. We also need to define the function `lor(x,x0,gamma)` which returns a [Lorentzian](https://fr.wikipedia.org/wiki/Fonction_lorentzienne) centered at `x0` and of width at half-height `gamma`. Start by defining the function `lor(x,x0,gamma)`.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
lor <- function(x,x0,gamma){
    2/pi/gamma / (1 + ((x-x0)/(gamma/2))^2)
}
```

----------

Now create an empty tibble called `results`, in which we'll store all the fitting parameters in a tidy tibble. Then fit each spectrum by the sum of two Lorentzians, and save all fit parameters in `results`, along with the file name and pressure. For the fit to work, you'll need to give reasonable initial values for the fit parameters using the `start=list(...)` argument of `nls()`. For example, you can use the `which.max()` function to find the index of the maximum of a vector.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
results <- tibble()
for(file in flist){
    d <- spec %>% filter(name==file)
    fit <- nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2), 
               data=d, 
               start=list(x1=d$w[which.max(d$int)], 
                          x2=d$w[which.max(d$int)]-30, 
                          gamma1=10, gamma2=10,
                          A1=10,A2=7))
    res <- tibble(file, 
                  pressure=unique(d$pressure),
                  as_tibble(t(coef(fit))))
    results <- bind_rows(results, res)
}
results
```

Another way of doing that is to use the `map()` function from the `purrr` package, which allows us to apply a function to each element of a list. Here, we want to apply the `nls()` function to each spectrum in `spec`, and store the result in a new column of `spec`. We can do this like so:

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
library(broom)
specfitted <- spec %>% 
    nest(data=c(w,int)) %>% 
    mutate(fit=map(data, ~nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2), 
                              data=., 
                              start=list(x1=.$w[which.max(.$int)], 
                                         x2=.$w[which.max(.$int)]-30, 
                                         gamma1=10, gamma2=10,
                                         A1=10,A2=7))),
           tidied = map(fit, tidy),
           augmented = map(fit, augment))
results <- specfitted %>% 
    unnest(tidied) %>% 
    select(name, pressure, term, estimate)
specfitted <- specfitted %>% 
    unnest(augmented)
```

----------

Plot the result of the fit on the graph above. 

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
specfitted %>% 
    ggplot(aes(x=w, group=name))+
        geom_point(aes(y=int + pressure),size=.2, alpha=.2)+
        geom_line(aes(y=.fitted + pressure), color='red')+
        labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
        theme_bw()
```

----------

Now all you have to do is plot the position of the peak as a function of pressure, and model this data by the equation of state defined at the beginning. What values of A and B do you obtain? Compare with the [literature values](https://doi.org/10.1063/1.2135877): A = 1876 GPa and B = 10.71.

```{r include=params$solution, warning = FALSE, message=FALSE, cache=FALSE}
tofit <- results %>% filter(term=='x1')
fit <- nls(pressure~Prubis(estimate, A, B), 
           data=tofit, 
           start=list(A=10, B=10))
A <- tidy(fit) %>% filter(term=='A') %>% 
        select(estimate,std.error) %>% round(1) %>% paste(collapse="±")
B <- tidy(fit) %>% filter(term=='B') %>% 
        select(estimate,std.error) %>% round(1) %>% paste(collapse="±")
tofit %>%  
    ggplot(aes(x=estimate, y=pressure))+
        geom_point(size=4)+
        geom_line(aes(y=predict(fit)), color='red')+
        labs(y="Pressure [GPa]", x="Raman Shift [1/cm]")+
        theme_bw()+
        ggtitle(glue::glue("A = {A} GPa, B = {B}"))
```

