R Exercises - Particle analysis from SEM images

Author

Your Name

Published

November 14, 2024

Context

The aim of this exercise is to determine how experimental parameters correlate with the size distribution of nickel nanoparticles obtained by dewetting. The nanoparticles were produced by heat-treating, at different temperatures and under dihydrogen, silicon wafers on which a 2, 4.5 or 8 nm layer of nickel had previously been deposited. The treatment lasted 5, 10, 15, 30 or 60 minutes.

These wafers were then observed by scanning electron microscopy (SEM), and several images of each surface were taken. The images were analyzed with ImageJ, as shown in Figure 1. This analysis produced a set of files named [x]nmNi-T[y]-[z]min-[u].csv, where:

[x] is the Ni thickness in the sample before treatment,
[y] is the treatment temperature in °C,
[z] is the treatment time in minutes, and
[u] is the image number.

Area values are given in pixel², with scales stored in a separate file Data/scales.csv.

Figure 1: Typical SEM image of Ni nanoparticles: from the raw image to particle analysis

Note

In this tutorial, we’ll see how to import data from a large number of files and aggregate them into a single tidy table. This table can then be exported in csv format or used to generate graphs.

Data wrangling

Load the package tidyverse. Set the global ggplot2 theme to black and white. Also, make it so that the strip.background (background of the facets titles) is blank, and that the strip.text is bold.
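One possible way to set up the session is sketched below. It only uses standard ggplot2 calls (theme_set(), theme_bw(), theme()); treat it as an illustration rather than the expected solution.

```r
library(tidyverse)

# Start from the black-and-white theme and make it the default for all plots
theme_set(
  theme_bw() +
    theme(
      strip.background = element_blank(),            # no box behind facet titles
      strip.text       = element_text(face = "bold") # facet titles in bold
    )
)
```

Because theme_set() changes the global default, every plot created afterwards in the session picks up these settings.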
Find all [x]nmNi-T[y]-[z]min-[u].csv files in the Data folder and store them in flist. You could use the function glob2rx() to help write a regular expression using the wildcard sign *.
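A minimal sketch of the file search, assuming the csv files sit directly in the Data folder. The two file names used in the final check are made up to show what the pattern accepts and rejects.

```r
# glob2rx() translates a wildcard pattern into a regular expression
pattern <- glob2rx("*nmNi-T*-*min-*.csv")

# List the matching files, keeping the folder in the path so they can be read later
flist <- list.files(path = "Data", pattern = pattern, full.names = TRUE)

# Quick check of the pattern on two hypothetical names:
# a data file should match, scales.csv should not
check <- grepl(pattern, c("2nmNi-T500-5min-1.csv", "scales.csv"))
check
```

Filtering with a pattern matters here because Data/scales.csv lives in the same folder and must not be read as a particle file.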
The pixel <-> length correspondence for each image has been stored in the file Data/scales.csv. Import this file into a scales tibble.
- Next, add to scales a column pix2_to_nm2 which will contain the pixel² -> nm² conversion value for each image.
- Then, modify the file column to contain the file name without extension.
- Separate the file column into 4 columns thickness, temperature, time and img.
- Convert these new columns to numeric values (you need to remove the text in them first).
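The steps for scales could be sketched as follows. The tibble below is a made-up stand-in for read_csv("Data/scales.csv"): the column names pixel and size are taken from this exercise, but their meaning (size being the length in nm that corresponds to pixel pixels) is an assumption, as are all the values.

```r
library(tidyverse)

# Toy stand-in for read_csv("Data/scales.csv"); values are illustrative only
scales <- tibble(
  file  = c("2nmNi-T500-5min-1.csv", "4.5nmNi-T600-10min-2.csv"),
  pixel = c(100, 200),  # assumed: length of the scale bar in pixels
  size  = c(500, 500)   # assumed: length of the scale bar in nm
)

scales <- scales |>
  mutate(
    pix2_to_nm2 = (size / pixel)^2,           # nm² per pixel²
    file        = str_remove(file, "\\.csv$") # drop the extension
  ) |>
  # the four pieces of information are separated by "-" in the file name
  separate(file, into = c("thickness", "temperature", "time", "img"), sep = "-") |>
  # parse_number() keeps the first number found and drops the surrounding text
  mutate(across(thickness:img, parse_number))

scales
```

parse_number() is convenient here because it handles "4.5nmNi", "T600" and "10min" alike without a dedicated regular expression for each column.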
Let’s now import all our data files into a tibble called data, and modify this tibble so that it also stores the information encoded in the file names. We will do this in a single chain of pipe operations.
- First, import all csv files into a tibble called data. You can use the read_csv() function to do so, and look into the id parameter to store the file name in a column file. Also, we are only interested in the Area column.
- Nest the Area column into a column data using the nest() function. This makes the next operations faster, because the file names are then processed once per file instead of once per particle.
- Modify the file column so that it contains the file name without extension and path.
- Separate the file column into 4 columns thickness, temperature, time and img.
- Convert these new columns to numeric values (you need to remove the text in them first).
- Finally, unnest the data column to get a single column Area containing the area values.
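The whole import pipeline could look like the sketch below. To keep it runnable on its own, it first writes two tiny made-up files to a temporary folder; in the exercise, flist would come from the Data folder instead. The extra Mean column is hypothetical and only there to show that unwanted columns get dropped.

```r
library(tidyverse)

# Two made-up files in a temporary folder, standing in for the Data folder
tmp <- file.path(tempdir(), "Data")
dir.create(tmp, showWarnings = FALSE)
write_csv(tibble(Area = c(10, 20), Mean = c(1, 2)),
          file.path(tmp, "2nmNi-T500-5min-1.csv"))
write_csv(tibble(Area = c(30, 40, 50), Mean = c(3, 4, 5)),
          file.path(tmp, "4.5nmNi-T600-10min-2.csv"))
flist <- list.files(tmp, pattern = glob2rx("*nmNi-T*-*min-*.csv"), full.names = TRUE)

data <- read_csv(flist, id = "file", show_col_types = FALSE) |>
  select(file, Area) |>                     # keep only the columns of interest
  nest(data = Area) |>                      # one row per file
  mutate(file = str_remove(basename(file), "\\.csv$")) |>  # no path, no extension
  separate(file, into = c("thickness", "temperature", "time", "img"), sep = "-") |>
  mutate(across(thickness:img, parse_number)) |>
  unnest(data)                              # back to one row per particle

data
```

read_csv() accepts a vector of paths and stacks the files, so no explicit loop is needed; the id argument records which file each row came from.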
Now we want to convert the areas in pixel² to areas in nm².
- We will first join the data and scales tibbles into a tibble called alldata.
- Then, we’ll modify the Area column to convert it to nm², and we’ll create a diameter column containing the diameter of the particles in nm.
- Then, we’ll drop the pix2_to_nm2, pixel and size columns, which are no longer needed.
- Finally, we’ll filter the data to keep only particles with a diameter between 10 and 400 nm.
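A minimal sketch of the conversion, using toy tibbles with the same columns as the data and scales tibbles described in the steps above (all values are made up). The diameter is computed as the equivalent circular diameter, d = 2·sqrt(Area/π), which assumes that each particle can be treated as a disc; the exercise does not state which definition is expected.

```r
library(tidyverse)

# Toy tibbles standing in for `data` and `scales`
data <- tibble(thickness = 2, temperature = 500, time = 5, img = 1,
               Area = c(1, 400, 40000))   # areas in pixel²
scales <- tibble(thickness = 2, temperature = 500, time = 5, img = 1,
                 pixel = 100, size = 500, pix2_to_nm2 = 25)

alldata <- data |>
  inner_join(scales, by = c("thickness", "temperature", "time", "img")) |>
  mutate(
    Area     = Area * pix2_to_nm2,   # pixel² -> nm²
    diameter = 2 * sqrt(Area / pi)   # assumption: disc-shaped particles
  ) |>
  select(-pix2_to_nm2, -pixel, -size) |>
  filter(between(diameter, 10, 400)) # keep 10 nm <= diameter <= 400 nm

alldata
```

With these toy values only the middle particle survives the filter: the first one is smaller than 10 nm and the last one larger than 400 nm.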

Note

In case you didn’t manage to get there, the resulting alldata tibble is provided: you can read it with alldata <- read_csv("Data/alldata.csv").

Plotting and analysis

Plotting

First we want to take a look at our data. Plot the histogram of all particle diameters, with a fill color depending on the time (you need to convert time to a factor), and with a grid showing temperature vs. substrate thickness. Put the legend on top of the graph, and add some transparency to your colors.
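One way to build such a histogram is sketched below on randomly generated data with the same columns as alldata (the values, temperatures included, are purely illustrative). The choices of bins = 30, alpha = 0.5 and position = "identity" are assumptions to be adjusted to taste.

```r
library(tidyverse)

# Made-up data with the same columns as `alldata`
set.seed(1)
alldata <- expand_grid(thickness = c(2, 8), temperature = c(500, 600),
                       time = c(5, 60), particle = 1:50) |>
  mutate(diameter = rnorm(n(), mean = 50 + time, sd = 10))

p_hist <- ggplot(alldata, aes(x = diameter, fill = factor(time))) +
  # overlay the histograms instead of stacking them, with some transparency
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  facet_grid(temperature ~ thickness) +  # rows: temperature, columns: thickness
  labs(x = "Diameter (nm)", y = "Count", fill = "Time (min)") +
  theme(legend.position = "top")

p_hist
```

Converting time with factor() gives one discrete fill color per treatment time rather than a continuous color scale.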
In fact, I usually prefer to plot it using geom_density(), which is essentially a histogram convolved with a Gaussian kernel of bandwidth bw. This gives smoother graphs. Make this plot and play with the bw parameter.
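A possible density version is sketched here, again on randomly generated data with the same columns as alldata. The value bw = 5 (in nm, the unit of the x axis) is an arbitrary starting point, not a recommended setting.

```r
library(tidyverse)

# Made-up data with the same columns as `alldata`
set.seed(1)
alldata <- expand_grid(thickness = c(2, 8), temperature = c(500, 600),
                       time = c(5, 60), particle = 1:50) |>
  mutate(diameter = rnorm(n(), mean = 50 + time, sd = 10))

p_dens <- ggplot(alldata, aes(x = diameter, fill = factor(time))) +
  # bw is the standard deviation of the Gaussian smoothing kernel
  geom_density(alpha = 0.5, bw = 5) +
  facet_grid(temperature ~ thickness) +
  labs(x = "Diameter (nm)", y = "Density", fill = "Time (min)") +
  theme(legend.position = "top")

p_dens
```

A small bw follows the raw data closely and looks noisy, while a large bw smooths away real features such as a second population of particles.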
Your plot should look like this:

Analysis

Now, store in particles_ave the average particle diameter and its standard deviation per substrate thickness, time and temperature of reaction.
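A minimal sketch of the summary step on a toy alldata (values made up). The column name diam matches the variable used in the regression step; diam_sd is a hypothetical name for the standard deviation.

```r
library(tidyverse)

# Toy stand-in for `alldata`
alldata <- tibble(
  thickness   = c(2, 2, 8, 8),
  temperature = 500,
  time        = 5,
  diameter    = c(10, 20, 30, 50)
)

particles_ave <- alldata |>
  group_by(thickness, temperature, time) |>
  summarise(
    diam    = mean(diameter),  # average diameter per condition
    diam_sd = sd(diameter),    # spread of the diameters per condition
    .groups = "drop"           # return an ungrouped tibble
  )

particles_ave
```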
Let’s see which parameters influence the particle diameters the most. For this, we’ll perform a multiple linear regression using lm(), fitting the diam variable as a function of a combination of thickness, temperature, and time. We can also weight the diam values by the inverse of their squared standard deviation (inverse-variance weighting, the usual convention). Store the result in fit, and print the summary of the fit.
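The weighted fit could be sketched as follows, on a synthetic particles_ave in which the true dependence is known. The purely additive model is only one possible "combination" of the three parameters, diam_sd is a hypothetical column name for the standard deviation, and all numbers are made up.

```r
library(tidyverse)

# Synthetic stand-in for `particles_ave`, built from a known linear relation
set.seed(42)
particles_ave <- expand_grid(thickness = c(2, 4.5, 8),
                             temperature = c(500, 600),
                             time = c(5, 30, 60)) |>
  mutate(
    diam    = 10 * thickness + 0.05 * temperature + 0.1 * time + rnorm(n(), sd = 2),
    diam_sd = runif(n(), 5, 15)
  )

# Additive model; each point is weighted by the inverse of its variance
fit <- lm(diam ~ thickness + temperature + time,
          data    = particles_ave,
          weights = 1 / diam_sd^2)

summary(fit)
```

Replacing + with * in the formula would add interaction terms. When reading the summary, keep in mind that each estimate is expressed per unit of its own variable (nm, °C, min), so raw coefficients are not directly comparable; the t values and p-values give a fairer picture of which parameters matter.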

How would you interpret the result? What are the most important parameters?