# R Exercises - Particle analysis from SEM images

## Context
The aim of this exercise is to determine the correlations between experimental parameters and the size distributions of nickel nanoparticles obtained by dewetting. The nanoparticles were obtained by heat treatment at different temperatures under hydrogen of a silicon wafer on which a layer of 2, 4.5, or 8 nm of nickel had previously been deposited. Treatment was carried out for 5, 10, 15, 30, or 60 minutes.
These wafers were then observed by scanning electron microscopy, and several images of the surface were taken. These images were analyzed with ImageJ, as shown in Figure 1. This analysis produced a set of files named `[x]nmNi-T[y]-[z]min-[u].csv`, where:

- `[x]` is the Ni thickness in the sample before treatment,
- `[y]` is the treatment temperature in ˚C,
- `[z]` is the treatment time in minutes, and
- `[u]` is the image number.

Area values are given in pixel², with scales stored in a separate file `Data/scales.csv`.
In this tutorial, we'll see how to import data from a large number of files and aggregate them into a single tidy table. This table can then be exported in CSV format or used to generate graphs.
## Data wrangling
- Load the package `tidyverse`. Set the global `ggplot2` theme to black and white. Also, make it so that the `strip.background` (background of the facet titles) is blank, and that the `strip.text` is bold.
```r
library(tidyverse)
theme_set(theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(face = "bold")
  ))
```
- Find all `[x]nmNi-T[y]-[z]min-[u].csv` files in the `Data` folder and store them in `flist`. You can use the function `glob2rx()` to help write a regular expression using the wildcard character `*`.
```r
flist <- list.files(path = "Data", pattern = glob2rx("*nmNi*.csv"), full.names = TRUE)
```
- The pixel <-> length correspondence for each image has been stored in the file `Data/scales.csv`. Import this file into a `scales` tibble.
  - Next, add to `scales` a column `pix2_to_nm2` which will contain the pixel² -> nm² conversion factor for each image.
  - Then, modify the `file` column to contain the file name without extension.
  - Separate the `file` column into 4 columns: `thickness`, `temperature`, `time`, and `img`.
  - Convert these new columns to numeric values (you need to remove the text in them first).
```r
scales <- read_csv("Data/scales.csv") |>
  mutate(pix2_to_nm2 = size^2 / pixel^2,
         file = file |> str_remove(".jpg")) |>
  separate(file, c("thickness", "temperature", "time", "img"), convert = TRUE, sep = "-") |>
  mutate(thickness = as.numeric(thickness |> str_remove("nmNi")),
         temperature = as.numeric(temperature |> str_remove("T")),
         time = as.numeric(time |> str_remove("min")))
```
- Let's now import all our data files into a tibble called `data`, and modify this tibble to also store the information written in the file names. We will do this in a succession of pipe operations.
  - First, import all CSV files into a tibble called `data`. You can use the `read_csv()` function to do so, and look into the `id` parameter to store the file name in a column `file`. Also, we are only interested in the `Area` column.
  - Nest the `Area` column into a column `data` using the `nest()` function. This will make the next operations faster.
  - Modify the `file` column so that it contains the file name without extension and path.
  - Separate the `file` column into 4 columns: `thickness`, `temperature`, `time`, and `img`.
  - Convert these new columns to numeric values (you need to remove the text in them first).
  - Finally, unnest the `data` column to get a single column `Area` containing the area values.
```r
data <- read_csv(flist, id = "file", col_select = Area) |>
  nest(data = Area) |>
  mutate(file = basename(file) |> str_remove(".csv")) |>
  separate(file, c("thickness", "temperature", "time", "img"), convert = TRUE, sep = "-") |>
  mutate(thickness = as.numeric(thickness |> str_remove("nmNi")),
         temperature = as.numeric(temperature |> str_remove("T")),
         time = as.numeric(time |> str_remove("min"))) |>
  unnest(data)
```
- Now we want to convert the areas in pixel² to areas in nm².
  - We will first join the `data` and `scales` tibbles into a tibble called `alldata`.
  - Then, we'll modify the `Area` column to convert it to nm², and we'll create a `diameter` column containing the diameter of the particles in nm.
  - Then, we'll get rid of the `pix2_to_nm2`, `pixel`, and `size` columns, which are no longer needed.
  - Finally, we'll filter the data to keep only particles with a diameter between 10 and 400 nm.
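One possible solution sketch for these steps, assuming the particles are roughly circular (so that diameter = 2·√(Area/π)) and joining on the four columns shared by both tibbles:

```r
# Join measured areas with the per-image scale factors,
# convert areas to nm^2, derive an equivalent circular diameter,
# drop the helper columns, and keep diameters between 10 and 400 nm.
alldata <- data |>
  left_join(scales, by = c("thickness", "temperature", "time", "img")) |>
  mutate(Area = Area * pix2_to_nm2,
         diameter = 2 * sqrt(Area / pi)) |>
  select(-pix2_to_nm2, -pixel, -size) |>
  filter(diameter >= 10, diameter <= 400)
```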
In case you didn't manage to get there, here is the `alldata` tibble; you can read it with `alldata <- read_csv("Data/alldata.csv")`.
## Plotting and analysis

### Plotting
- First we want to take a look at our data. Plot the histogram of all particle diameters, with a fill color depending on the time (you need to convert time to a factor), and with a grid showing temperature vs. substrate thickness. Put the legend on top of the graph, and add some transparency to your colors.
- In fact, I usually prefer to plot it using `geom_density()`, which is basically a histogram convolved with a Gaussian distribution of bandwidth `bw`. This allows for smoother graphs. Make this plot and play with the `bw` parameter.
- Your plot should look like this:
```r
alldata |>
  ggplot(aes(x = diameter, fill = factor(time), color = factor(time))) +
  geom_density(alpha = .1, bw = 5) +
  labs(x = "Particle Diameter [nm]",
       y = "Density [arb. units]",
       color = "Time [min]",
       fill = "Time [min]") +
  facet_grid(glue::glue("{thickness} nm Ni") ~
               glue::glue("{temperature}˚C"),
             scales = "free_y") +
  theme(legend.position = "top",
        text = element_text(size = 10),
        strip.text.y = element_text(angle = 0)) +
  ggthemes::scale_fill_tableau() +
  ggthemes::scale_color_tableau()
```
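For the histogram asked in the first question, one possible sketch (the bin width and transparency are arbitrary choices, not prescribed by the exercise):

```r
# Histogram of particle diameters, filled by treatment time,
# faceted by substrate thickness (rows) vs. temperature (columns).
alldata |>
  ggplot(aes(x = diameter, fill = factor(time))) +
  geom_histogram(alpha = .5, position = "identity", binwidth = 10) +
  labs(x = "Particle Diameter [nm]", y = "Count", fill = "Time [min]") +
  facet_grid(thickness ~ temperature) +
  theme(legend.position = "top")
```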
### Analysis
- Now, store in `particles_ave` the average particle diameter and its standard deviation per substrate thickness, time, and temperature of reaction.
```r
particles_ave <- alldata |>
  group_by(thickness, time, temperature) |>
  summarise(diam = mean(diameter),
            sddiam = sd(diameter))
particles_ave
```
```
# A tibble: 23 × 5
# Groups:   thickness, time [10]
   thickness  time temperature  diam sddiam
       <dbl> <dbl>       <dbl> <dbl>  <dbl>
 1       2      15         600  40.1   20.3
 2       2      15         700  44.9   18.9
 3       2      60         800  50.5   17.2
 4       4.5     5         600 105.    39.1
 5       4.5     5         800  94.7   42.9
 6       4.5    10         600 151.    77.6
 7       4.5    10         650 125.    48.6
 8       4.5    10         700 109.    31.1
 9       4.5    10         750  76.5   25.5
10       4.5    10         800  82.0   27.1
# ℹ 13 more rows
```
- Let's see which parameters influence the particle diameters the most. For this, we'll perform a multiple linear regression using `lm()`, fitting the `diam` variable as a function of a combination of `thickness`, `temperature`, and `time`. We can also set the weights for the `diam` values as the inverse of their standard deviation squared (by convention). Store the result in `fit`, and print the summary of the fit.
```r
fit <- lm(diam ~ thickness + temperature + time,
          data = particles_ave,
          weights = 1/sddiam^2)
fit |> summary()
```
```
Call:
lm(formula = diam ~ thickness + temperature + time, data = particles_ave,
    weights = 1/sddiam^2)

Weighted Residuals:
     Min       1Q   Median       3Q      Max
-1.09185 -0.25392 -0.03437  0.46899  0.97474

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.52077   41.06198  -0.013 0.990013
thickness    9.73350    2.13589   4.557 0.000215 ***
temperature  0.05712    0.06091   0.938 0.360126
time        -0.22459    0.26069  -0.862 0.399689
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5931 on 19 degrees of freedom
Multiple R-squared:  0.5803,    Adjusted R-squared:  0.514
F-statistic: 8.756 on 3 and 19 DF,  p-value: 0.0007448
```
How would you interpret the result? What are the most important parameters?
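To help answer this, you can rank the predictors directly from the coefficient table returned by `summary()`; this is a base-R sketch (no extra packages assumed):

```r
# Extract the coefficient matrix from the fit summary and
# order the rows by p-value (smallest = most significant predictor).
coefs <- summary(fit)$coefficients
coefs[order(coefs[, "Pr(>|t|)"]), ]
```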