# R Exercises - Particle analysis from SEM images

## Context
The aim of this exercise is to determine the correlations between experimental parameters and the size distributions of nickel nanoparticles obtained by dewetting. The nanoparticles were obtained by heat treatment at different temperatures under hydrogen of a silicon wafer on which a layer of 2, 4.5, or 8 nm of nickel had previously been deposited. Treatment was carried out for 5, 10, 15, 30, or 60 minutes.
These wafers were then observed by scanning electron microscopy, and several images of the surface were taken. These images were analyzed with ImageJ, as shown in Figure 1. This analysis produced a set of files named `[x]nmNi-T[y]-[z]min-[u].csv`, where:

- `[x]` is the Ni thickness in the sample before treatment,
- `[y]` is the treatment temperature in ˚C,
- `[z]` is the treatment time in minutes, and
- `[u]` is the image number.

Area values are given in pixel², with scales stored in a separate file `Data/scales.csv`.
In this tutorial, we'll see how to import data from a large number of files and aggregate them into a single tidy table. This table can then be exported in CSV format or used to generate graphs.
## Data wrangling
- Load the package `tidyverse`. Set the global `ggplot2` theme to black and white. Also, make it so that the `strip.background` (background of the facet titles) is blank, and that the `strip.text` is bold.
```r
library(tidyverse)
theme_set(theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(face = "bold")
  ))
```
- Find all `[x]nmNi-T[y]-[z]min-[u].csv` files in the `Data` folder and store them in `flist`. You can use the function `glob2rx()` to help write a regular expression using the wildcard character `*`.
```r
flist <- list.files(path = "Data", pattern = glob2rx("*nmNi*.csv"), full.names = TRUE)
```
- The pixel <-> length correspondence for each image has been stored in the file `Data/scales.csv`. Import this file into a `scales` tibble.
  - Next, add to `scales` a column `pix2_to_nm2` which will contain the pixel² -> nm² conversion factor for each image.
  - Then, modify the `file` column to contain the file name without extension.
  - Separate the `file` column into 4 columns: `thickness`, `temperature`, `time`, and `img`.
  - Convert these new columns to numeric values (you need to remove the text in them first).
```r
scales <- read_csv("Data/scales.csv") |>
  mutate(pix2_to_nm2 = size^2 / pixel^2,
         file = file |> str_remove(".jpg")) |>
  separate(file, c("thickness", "temperature", "time", "img"), convert = TRUE, sep = "-") |>
  mutate(thickness = as.numeric(thickness |> str_remove("nmNi")),
         temperature = as.numeric(temperature |> str_remove("T")),
         time = as.numeric(time |> str_remove("min")))
```
- Let's now import all our data files into a tibble called `data`, and modify this tibble to also store the information written in the file names. We will do this in a succession of pipe operations.
  - First, import all CSV files into a tibble called `data`. You can use the `read_csv()` function to do so, and look into the `id` parameter to store the file name in a column `file`. Also, we are only interested in the `Area` column.
  - Nest the `Area` column into a column `data` using the `nest()` function. This will make the next operations faster.
  - Modify the `file` column so that it contains the file name without extension and path.
  - Separate the `file` column into 4 columns: `thickness`, `temperature`, `time`, and `img`.
  - Convert these new columns to numeric values (you need to remove the text in them first).
  - Finally, unnest the `data` column to get a single column `Area` containing the area values.
```r
data <- read_csv(flist, id = "file", col_select = Area) |>
  nest(data = Area) |>
  mutate(file = basename(file) |> str_remove(".csv")) |>
  separate(file, c("thickness", "temperature", "time", "img"), convert = TRUE, sep = "-") |>
  mutate(thickness = as.numeric(thickness |> str_remove("nmNi")),
         temperature = as.numeric(temperature |> str_remove("T")),
         time = as.numeric(time |> str_remove("min"))) |>
  unnest(data)
```
- Now we want to convert the areas in pixel² to areas in nm².
  - We will first join the `data` and `scales` tibbles into a tibble called `alldata`.
  - Then, we'll modify the `Area` column to convert it to nm², and we'll create a `diameter` column containing the diameter of the particles in nm.
  - Then, we'll get rid of the `pix2_to_nm2`, `pixel`, and `size` columns, which are no longer needed.
  - Finally, we'll filter the data to keep only particles with a diameter between 10 and 400 nm.
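One possible solution sketch for these steps, assuming the particles are roughly circular (so that diameter = 2·√(Area/π)) and joining on the four columns shared by both tibbles:

```r
# Join measured areas with the per-image scale factors,
# convert areas to nm^2, derive an equivalent circular diameter,
# drop the helper columns, and keep diameters between 10 and 400 nm.
alldata <- data |>
  left_join(scales, by = c("thickness", "temperature", "time", "img")) |>
  mutate(Area = Area * pix2_to_nm2,
         diameter = 2 * sqrt(Area / pi)) |>
  select(-pix2_to_nm2, -pixel, -size) |>
  filter(diameter >= 10, diameter <= 400)
```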
In case you didn't manage to get there, here is the `alldata` tibble; you can read it with `alldata <- read_csv("Data/alldata.csv")`.
## Plotting and analysis

### Plotting
- First we want to take a look at our data. Plot the histogram of all particle diameters, with a fill color depending on the time (you need to convert time to a factor), and with a grid showing temperature vs. substrate thickness. Put the legend on top of the graph, and add some transparency to your colors.
- In fact, I usually prefer to plot it using `geom_density()`, which is basically a histogram convolved with a Gaussian distribution of bandwidth `bw`. This allows for smoother graphs. Make this plot and play with the `bw` parameter.
- Your plot should look like this:
```r
alldata |>
  ggplot(aes(x = diameter, fill = factor(time), color = factor(time))) +
  geom_density(alpha = .1, bw = 5) +
  labs(x = "Particle Diameter [nm]",
       y = "Density [arb. units]",
       color = "Time [min]",
       fill = "Time [min]") +
  facet_grid(glue::glue("{thickness} nm Ni") ~
               glue::glue("{temperature}˚C"),
             scales = "free_y") +
  theme(legend.position = "top",
        text = element_text(size = 10),
        strip.text.y = element_text(angle = 0)) +
  ggthemes::scale_fill_tableau() +
  ggthemes::scale_color_tableau()
```
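For the histogram asked in the first question, one possible sketch (the bin width and transparency are arbitrary choices, not prescribed by the exercise):

```r
# Histogram of particle diameters, filled by treatment time,
# faceted by substrate thickness (rows) vs. temperature (columns).
alldata |>
  ggplot(aes(x = diameter, fill = factor(time))) +
  geom_histogram(alpha = .5, position = "identity", binwidth = 10) +
  labs(x = "Particle Diameter [nm]", y = "Count", fill = "Time [min]") +
  facet_grid(thickness ~ temperature) +
  theme(legend.position = "top")
```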
### Analysis
- Now, store in `particles_ave` the average particle diameter and its standard deviation per substrate thickness, time, and temperature of reaction.
```r
particles_ave <- alldata |>
  group_by(thickness, time, temperature) |>
  summarise(diam = mean(diameter),
            sddiam = sd(diameter))
particles_ave
```
```
# A tibble: 23 × 5
# Groups:   thickness, time [10]
   thickness  time temperature  diam sddiam
       <dbl> <dbl>       <dbl> <dbl>  <dbl>
 1       2      15         600  40.1   20.3
 2       2      15         700  44.9   18.9
 3       2      60         800  50.5   17.2
 4       4.5     5         600 105.    39.1
 5       4.5     5         800  94.7   42.9
 6       4.5    10         600 151.    77.6
 7       4.5    10         650 125.    48.6
 8       4.5    10         700 109.    31.1
 9       4.5    10         750  76.5   25.5
10       4.5    10         800  82.0   27.1
# ℹ 13 more rows
```
- Let's see which parameters influence the particle diameters the most. For this, we'll perform a multiple linear regression using `lm()`, fitting the `diam` variable as a function of a combination of `thickness`, `temperature`, and `time`. We can also set the weights for the `diam` values as the inverse of their standard deviation squared (by convention). Store the result in `fit`, and print the summary of the fit.
```r
fit <- lm(diam ~ thickness + temperature + time,
          data = particles_ave,
          weights = 1/sddiam^2)
fit |> summary()
```
```
Call:
lm(formula = diam ~ thickness + temperature + time, data = particles_ave,
    weights = 1/sddiam^2)

Weighted Residuals:
     Min       1Q   Median       3Q      Max
-1.09185 -0.25392 -0.03437  0.46899  0.97474

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.52077   41.06198  -0.013 0.990013
thickness    9.73350    2.13589   4.557 0.000215 ***
temperature  0.05712    0.06091   0.938 0.360126
time        -0.22459    0.26069  -0.862 0.399689
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5931 on 19 degrees of freedom
Multiple R-squared:  0.5803,    Adjusted R-squared:  0.514
F-statistic: 8.756 on 3 and 19 DF,  p-value: 0.0007448
```
How would you interpret the result? What are the most important parameters?
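To help answer this, you can rank the predictors directly from the coefficient table returned by `summary()`; this is a base-R sketch (no extra packages assumed):

```r
# Extract the coefficient matrix from the fit summary and
# order the rows by p-value (smallest = most significant predictor).
coefs <- summary(fit)$coefficients
coefs[order(coefs[, "Pr(>|t|)"]), ]
```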