R Exercises - Spectroscopic data - Solution
We will now work with Raman spectroscopy data from the luminescence of a ruby particle under high pressure: Data/rubis_xx.txt
. This measurement is performed in a diamond anvil cell with a laser of wavelength \(\lambda_L\)=568.189 nm, and the pressure is known by calibrating the frequency of a diamond line.
The ruby luminescence contains two lines, and we want to use the position of the R1 line, which is the most intense and has the highest energy, to determine the pressure. We therefore need to determine the parameters \(A\) and \(B\) of the equation of state linking wavelength (in nm) and pressure (in GPa):
\[ P~[GPa] = \frac{A}{B}\left[\left(\frac{\lambda}{\lambda_0}\right)^B-1\right], \] where \(\lambda_0\) = 694.24 nm is the wavelength of the R1 line at zero pressure, and \(\lambda\) is the wavelength of the R1 line at \(P\) pressure. For a given line, the relationship between its measured Raman shift, \(\omega\), and its wavelength is given by \(\lambda = \frac{10^7}{\omega_L - \omega}\), with \(\omega_L= \frac{10^7}{\lambda_L}\) the frequency of the probe laser.
Open one of the files, for example Data/rubis_01.txt to see its structure - they’re all the same. They are made up of two columns, which are the Raman shifts in cm-1 and the intensities. The nomenclature of the files "Data/rubis_xx.txt"
is such that xx
is the pressure in GPa.
First, consider the steps involved in determining the parameters \(A\) and \(B\) of the equation of state.
- What data do you need?
- What steps do you need to take to obtain them?
- Let’s start by defining the function
Prubis(w,A,B)
, which returns the pressure in GPa as a function of the Raman shift in cm-1, and the parameters \(A\) and \(B\).
Prubis <- function(w,A,B){
wl <- 1e7/568.189
lamb0 <- 694.24
lamb <- 1e7/(wl - w)
return(A/B * ((lamb/lamb0)^B - 1))
}
Find in the folder “Data” the list of files with the pattern “rubis_” in their name (check out the list.files()
function) and store it in flist
flist <- list.files(path="Data", pattern = "rubis_")
What is the length of flist
?
length(flist)
[1] 17
Initialize an empty tibble called spec
that will store all spectra:
Using a for
loop, load each file in flist
and store them into spec
by making it tidy.
for(file in flist){
d <- read_table(file.path("Data",file), col_names = c("w","int"))
d$name <- file
spec <- bind_rows(spec,d)
}
spec
# A tibble: 17,255 × 3
w int name
<dbl> <dbl> <chr>
1 3064. 43.9 rubis_01.txt
2 3064. 47.9 rubis_01.txt
3 3064. 44.5 rubis_01.txt
4 3065. 50.5 rubis_01.txt
5 3065. 50.5 rubis_01.txt
6 3065. 44.5 rubis_01.txt
7 3065. 44.9 rubis_01.txt
8 3066. 39.9 rubis_01.txt
9 3066. 49.5 rubis_01.txt
10 3066. 48.9 rubis_01.txt
# ℹ 17,245 more rows
Another version of this that does not use a for loop would be like so (try to understand what’s happening in this code by running each line one after the other and looking at the help on the functions used):
# spec <- tibble(name=flist) |>
# mutate(data=map(name, ~read_table(file.path("Data",.),
# col_names = c("w","int")))
# ) |>
# unnest(data)
Modify spec
to get the pressure in GPa from the file name, and so that the intensity column is normalized to [0,1].
norm01 <- function(x){
(x-min(x))/(max(x)-min(x))
}
spec <- spec |>
mutate(pressure = as.numeric(str_extract(name, "[0-9]+"))) |>
group_by(name) |>
mutate(int=norm01(int))
Plot the first spectrum in spec
to see the x range
spec |>
filter(name==flist[1]) |>
ggplot(aes(x=w, y=int))+
geom_line()+
labs(x="Raman Shift [1/cm]", y="Intensity [arb. units]")+
theme_bw()
Using ggplot2
, plot all spectra normalized to [0,1] (with points) and stacked on top of each other with a vertical shift corresponding to the pressure in GPa
spec |>
ggplot(aes(x=w,
y=int + pressure,
group=name))+
geom_point(size=.2, alpha=.2)+
labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
theme_bw()
What we want to do now is to fit this data by the sum of two Lorentzians to find the precise position of the R1 line as a function of pressure. To do this, we’ll use the function nls()
, which allows us to make a non-linear fit of the data. We also need to define the function lor(x,x0,gamma)
which returns a Lorentzian centered at x0
and of width at half-height gamma
. Start by defining the function lor(x,x0,gamma)
.
lor <- function(x,x0,gamma){
2/pi/gamma / (1 + ((x-x0)/(gamma/2))^2)
}
Now create an empty tibble called results
, in which we’ll store all the fitting parameters in a tidy tibble. Then fit each spectrum by the sum of two Lorentzians, and save all fit parameters in results
, along with the file name and pressure. For the fit to work, you’ll need to give reasonable initial values for the fit parameters using the start=list(...)
argument of nls()
. For example, you can use the which.max()
function to find the index of the maximum of a vector.
results <- tibble()
for(file in flist){
d <- spec |> filter(name==file)
fit <- nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2),
data=d,
start=list(x1=d$w[which.max(d$int)],
x2=d$w[which.max(d$int)]-30,
gamma1=10, gamma2=10,
A1=10,A2=7))
res <- tibble(file,
pressure=unique(d$pressure),
as_tibble(t(coef(fit))))
results <- bind_rows(results, res)
}
results
# A tibble: 17 × 8
file pressure x1 x2 gamma1 gamma2 A1 A2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rubis_01.txt 1 3203. 3173. 15.0 11.1 21.8 10.8
2 rubis_02.txt 2 3212. 3182. 11.9 8.92 18.1 8.76
3 rubis_03.txt 3 3219. 3189. 11.9 8.96 18.3 8.78
4 rubis_04.txt 4 3226. 3196. 12.3 9.16 18.8 8.70
5 rubis_05.txt 5 3234. 3205. 13.9 10.2 21.4 9.68
6 rubis_06.txt 6 3241. 3212. 13.6 10.2 20.9 9.47
7 rubis_07.txt 7 3248. 3219. 14.6 10.7 22.6 9.71
8 rubis_08.txt 8 3256. 3226. 15.3 11.0 23.4 9.90
9 rubis_10.txt 10 3270. 3241. 18.2 12.5 27.7 10.4
10 rubis_12.txt 12 3285. 3255. 19.5 13.5 29.7 10.8
11 rubis_14.txt 14 3301. 3271. 22.0 15.2 33.4 11.3
12 rubis_17.txt 17 3322. 3291. 23.8 14.8 37.0 11.0
13 rubis_21.txt 21 3348. 3315. 29.7 17.4 46.0 11.9
14 rubis_23.txt 23 3361. 3328. 32.1 17.4 50.1 11.7
15 rubis_25.txt 25 3373. 3339. 31.8 16.8 49.9 11.9
16 rubis_27.txt 27 3385. 3351. 31.4 18.1 47.1 12.4
17 rubis_29.txt 29 3401. 3366. 30.4 15.6 46.5 11.3
Another way of doing that is to use the map()
function from the purrr
package, which allows us to apply a function to each element of a list. Here, we want to apply the nls()
function to each spectrum in spec
, and store the result in a new column of spec
. We can do this like so:
library(broom)
specfitted <- spec |>
nest(data=c(w,int)) |>
mutate(fit=map(data, ~nls(int ~ A1*lor(w, x1, gamma1) + A2*lor(w, x2, gamma2),
data=.,
start=list(x1=.$w[which.max(.$int)],
x2=.$w[which.max(.$int)]-30,
gamma1=10, gamma2=10,
A1=10,A2=7))),
tidied = map(fit, tidy),
augmented = map(fit, augment))
results <- specfitted |>
unnest(tidied) |>
select(name, pressure, term, estimate)
specfitted <- specfitted |>
unnest(augmented)
Plot the result of the fit on the graph above.
specfitted |>
ggplot(aes(x=w, group=name))+
geom_point(aes(y=int + pressure),size=.2, alpha=.2)+
geom_line(aes(y=.fitted + pressure), color='red')+
labs(x="Raman Shift [1/cm]", y="Pressure [GPa]")+
theme_bw()
Now all you have to do is plot the position of the peak as a function of pressure, and model this data by the equation of state defined at the beginning. What values of A and B do you obtain? Compare with the literature values: A = 1876 GPa and B = 10.71.
tofit <- results |> filter(term=='x1')
fit <- nls(pressure~Prubis(estimate, A, B),
data=tofit,
start=list(A=10, B=10))
A <- tidy(fit) |> filter(term=='A') |>
select(estimate,std.error) |> round(1) |> paste(collapse="±")
B <- tidy(fit) |> filter(term=='B') |>
select(estimate,std.error) |> round(1) |> paste(collapse="±")
tofit |>
ggplot(aes(x=estimate, y=pressure))+
geom_point(size=4)+
geom_line(aes(y=predict(fit)), color='red')+
labs(y="Pressure [GPa]", x="Raman Shift [1/cm]")+
theme_bw()+
ggtitle(glue::glue("A = {A} GPa, B = {B}"))