In this tutorial, you will analyze hyperspectral images of nut samples (almond, hazelnut, pecan, and walnut) and of the shells of each nut type. The data set contains samples of known type as well as a mix of unknown samples.

Before starting this tutorial, complete the tutorial “Classification of nuts step 1”.

Your goal is to learn how to use Breeze to make a classification model and then use it to predict the class of new samples.

Import known class information to samples

In this tutorial, you will use the same images as in the tutorial “Classification of nuts step 1 - basic”, but add a new class variable that classifies the type of nut. You will also test two classification model types (PLS-DA and SIMCA).

Make sure that you are in Record mode. If you are in “Play” or “Model” mode, use the shortcut buttons in the lower-left corner to return to Record mode.

The “Nuts_classification” study and “Table” tab should be selected. Make sure that the Segmentation drop-down menu under the table is set to “Sample - Nuts_Classification”.

Press the “Import” tab, select “Import variables and id data” and press “Apply”.

Select “Nuts_Classification_Train.csv” and press “Next”.

Make sure the correct segmentation is selected and press “Finish”.

In the Table view, you should see the new “Nut type” class variable that was imported. The reference values were automatically matched with the correct sample object.

The spreadsheet .CSV file that you imported looks like this when opened in Excel. The column “Measurement” matches the class data (“Nut or shell” and “Nut type”) to the correct images and samples.
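Conceptually, the import groups the class information by the “Measurement” column so that each image gets its reference values. The minimal Python sketch below shows that grouping. Only the three column names come from the tutorial; the rows and measurement names (`Almond_1`, `Pecan_1`) are made up, and Breeze's actual matching logic (for example how rows are paired with individual segmented samples) is not shown and may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; only the three column names come from the tutorial.
EXAMPLE_CSV = """Measurement,Nut or shell,Nut type
Almond_1,Nut,Almond
Almond_1,Shell,Shell
Pecan_1,Nut,Pecan
"""

def group_by_measurement(text):
    """Map each measurement name to its (Nut or shell, Nut type) rows, in file order."""
    classes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        classes[row["Measurement"]].append((row["Nut or shell"], row["Nut type"]))
    return dict(classes)

labels = group_by_measurement(EXAMPLE_CSV)
print(labels["Almond_1"])  # [('Nut', 'Almond'), ('Shell', 'Shell')]
```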

Press the “Explore” tab and then, under “Color”, select “Nut type” to see how the different types cluster.

Press “Model”.

Create classification model (PLS-DA)

Now you will create a classification model for “Nut type”. Press “Add” to make a new model, select “Classification” and set “Method” to PLS-DA. Enter a name or use the default, and press “OK”.

Choose the “Nut type” category in the first step of the wizard and press “Next”.

In the 2nd and 3rd steps of the model wizard just press “Next” (use the default) so that you come to the 4th step (“Model”).

In the “Overview (total for all Y)” plot, check how many components the PLS-DA model has. Under “Components”, press “Add” to add components until the model has a total of 6 (i.e. 6 red and 6 blue bars). In the “Overview (total for all Y)” plot you can see that the model improves (higher R2 and Q2). Press “Next”.
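For reference, R2 and Q2 in such overview plots are conventionally defined as below. This is the general chemometrics definition, assumed here rather than taken from Breeze's documentation: $y_i$ are the reference values, $\hat{y}_i$ the fitted values, $\hat{y}_{(i)}$ the cross-validated prediction for sample $i$ made by a model fitted without it, and $\bar{y}$ the mean of the reference values.

```latex
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Adding components never lowers R2 on the training data, whereas Q2 typically stops improving once extra components begin to fit noise, which is why Q2 is the more useful guide when choosing the number of components.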

In the last step, you can see the class separation by pressing the tabs for the different classes under the “Nut type” vs “Ycalc.Nut type” plot. The “Variable overview” graph shows that Almond and Hazelnut could not be classified as well as the other types. Press “Finish”.
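PLS-DA conventionally fits a PLS regression to a dummy-coded Y matrix with one column per class, and values such as “Ycalc.Nut type” are the model's predictions for those columns. The sketch below shows that coding and one common decision rule: pick the class with the highest predicted value at or above 0.5, otherwise “No class”. The 0.5 threshold, the rule itself, and the class list (four nut types plus “Shell”) are assumptions; the tutorial does not state the exact rule Breeze uses.

```python
# Assumed class list: the four nut types plus "Shell" (as in the SIMCA step).
CLASSES = ["Almond", "Hazelnut", "Pecan", "Walnut", "Shell"]

def dummy_code(label, classes=CLASSES):
    """One Y column per class: 1.0 for the sample's own class, 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in classes]

def assign_class(ycalc, classes=CLASSES, threshold=0.5):
    """Highest predicted class at or above the (assumed) threshold, else 'No class'."""
    candidates = [(value, name) for value, name in zip(ycalc, classes) if value >= threshold]
    if not candidates:
        return "No class"
    return max(candidates, key=lambda pair: pair[0])[1]

print(dummy_code("Pecan"))                      # [0.0, 0.0, 1.0, 0.0, 0.0]
print(assign_class([0.1, 0.7, 0.2, 0.0, 0.0]))  # Hazelnut
print(assign_class([0.3, 0.4, 0.2, 0.1, 0.0]))  # No class
```

Under a rule like this, a sample whose predictions are all low ends up as “No class”, which is one way the “No class” results in the next step can arise.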

Press the “Classification” tab to see how well the training data was classified by the PLS-DA model (the results might vary slightly depending on how you built your sample model).

In this example, 3 of the “Almond” samples are misclassified as “Hazelnut” and 3 as “No class”. For “Hazelnut” and “Pecan”, 5 and 1 samples respectively are incorrectly classified as “No class”.
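The misclassification counts listed above add up to the total used later for comparison with SIMCA. The dictionary layout below is just an illustrative way to hold off-diagonal cells of a classification table, with “No class” counted as an error.

```python
# Off-diagonal cells from the PLS-DA example: (true class, predicted class) -> count.
misclassified = {
    ("Almond", "Hazelnut"): 3,
    ("Almond", "No class"): 3,
    ("Hazelnut", "No class"): 5,
    ("Pecan", "No class"): 1,
}

def count_errors(cells):
    """Sum the counts of every cell where the predicted class differs from the true class."""
    return sum(n for (true, predicted), n in cells.items() if true != predicted)

total = count_errors(misclassified)
print(total)  # 12
```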

Create classification model (SIMCA)

Let’s compare the PLS-DA model with a different classification model type. Press “Add”, select “Classification” and “SIMCA” as “Method”. Enter a name for the model or use the default, and press “OK”.

Choose the “Nut type” category in the first step of the wizard and press “Next”.

In the 2nd and 3rd steps just press “Next” (use the default).

In the SIMCA method, one PCA model is created for each class. Every sample is then compared with each class model to determine whether it belongs to that class. In the Coomans plot, you can set the critical distance for each class model. If a sample lies within the critical distance, it belongs to the class.
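The SIMCA idea can be sketched in plain Python: fit one PCA model per class, measure each sample's distance to every class model, and accept the sample into each class whose critical distance it falls within. This is a simplified illustration, not Breeze's implementation: it assumes a single principal component per class (found by power iteration), uses only the residual distance to the model, and uses toy two-variable data with hand-picked limits. Real SIMCA implementations typically use several components and statistically derived limits.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_class_model(rows, iterations=200):
    """One-component PCA model for one class: (mean vector, unit loading vector)."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Power iteration on X^T X converges to the leading principal direction.
    loading = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [_dot(row, loading) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:
            break
        loading = [x / norm for x in w]
    return mean, loading

def distance(model, sample):
    """Residual (orthogonal) distance from a sample to a class model."""
    mean, loading = model
    centered = [x - m for x, m in zip(sample, mean)]
    score = _dot(centered, loading)
    residual = [c - score * p for c, p in zip(centered, loading)]
    return math.sqrt(_dot(residual, residual))

def classify(models, limits, sample):
    """Every class whose model is within its critical distance, or ['No class']."""
    hits = [name for name, model in models.items() if distance(model, sample) <= limits[name]]
    return hits or ["No class"]

# Toy two-variable "spectra"; real spectra have many more bands.
class_a = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
class_b = [[10.0, 0.0], [10.1, 1.0], [9.9, 2.0], [10.0, 3.0]]
models = {"A": fit_class_model(class_a), "B": fit_class_model(class_b)}
limits = {"A": 0.5, "B": 0.5}
print(classify(models, limits, [1.5, 0.05]))  # ['A']
print(classify(models, limits, [10.0, 2.5]))  # ['B']
print(classify(models, limits, [5.0, 5.0]))   # ['No class']
```

Because each class is judged independently, a sample can be accepted by several classes or by none, which is why SIMCA results can include “No class”.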

a. Select the class model for “Almond” using the tabs under the Coomans plot.

b. Drag the vertical red line to adjust the limit so that it includes all “Almond” samples (but as few of the other samples as possible). Samples to the left of the red line are included in the Almond model.

Press the tab for each of the other classes and repeat steps a and b.
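The trade-off made when dragging the red line can be expressed in a few lines: the tightest limit that still includes every sample of the selected class, and the number of other-class samples that then fall inside it. The sketch assumes the horizontal axis shows each sample's distance to the selected class model (as the “left of the line” rule suggests); the distances are made-up numbers.

```python
def tightest_limit(distances, labels, target):
    """Smallest critical distance that still includes every sample of the target class."""
    return max(d for d, label in zip(distances, labels) if label == target)

def intruders(distances, labels, target, limit):
    """Number of samples from other classes that fall inside the limit."""
    return sum(1 for d, label in zip(distances, labels) if label != target and d <= limit)

# Made-up distances to the Almond class model.
distances = [0.8, 1.1, 0.9, 1.3, 2.5, 1.0, 3.1]
labels = ["Almond", "Almond", "Almond", "Hazelnut", "Pecan", "Hazelnut", "Walnut"]

limit = tightest_limit(distances, labels, "Almond")
print(limit)                                          # 1.1
print(intruders(distances, labels, "Almond", limit))  # 1
```

Placing the limit exactly at the farthest class member is only one choice; if that member is an outlier, a slightly lower limit may exclude more of the other samples at the cost of one missed sample.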

Example: All Shell samples (blue dots) are inside the limit of the Shell model.

Note that “Overview (Total for all Y)” only shows how well each class model explains the samples in its own class; it does not show how well the model classifies other samples.

Press “Finish” to complete the model.

Press the “Classification” tab to see how well the samples in the training data were classified. The results will vary depending on how you adjusted the critical limits in the previous step. Press the three dots to maximize the table view and the arrow to open the preview image.

Click on a field in the table to see the corresponding samples in the preview image.

In this example, with the SIMCA classification a total of only 6 samples were misclassified (you might get slightly different results depending on how you set the critical distance in the previous step). This can be compared to 12 misclassified samples for the PLS-DA.

Import the known class information for the test samples

To validate the model, use an external test set to see how well it classifies samples that were not in the training data set. You will now add the known class information to the image “Mix” in the “Test” group.

Press the “Record” button and then press the “Open” button to open the “Nuts_Classification” study.

  • Select the “Test” group.

  • Select the “Import” tab and select “Import variables and id data”.

  • Press “Apply”.

Select “Nuts_Classification_Test_Samples.csv” and press “Next”.

In the next step make sure the correct segmentation is selected and press “Finish”.

The table should look like this for the Test group. Scroll down to see the classes for the “Mix” samples.

Press the “Model” button and, with the SIMCA model selected, press “Edit”.

Press the plus sign to add an “External Record Test Group”. Make sure the group named “Test” is selected in the menu on the right.

Press the “Up” button located in the upper left corner.

Press the “Classification” tab and make sure that “Test (External)” is selected in the drop-down menu under the table.

You can now see how well the SIMCA model could classify the 23 samples in the “Test” group.

Let’s see how well the PLS-DA model classifies the test set. Select the PLS-DA model, press “Edit” and add the “Test” group as the “External Record Test Group” (just as you did for the SIMCA model).

Look at the Classification table for the PLS-DA model. Which model performed better?

Create workflow and Import Record test data

Now let’s test the new models for “Nut type” in Play mode. Press “Play”.

Press the “Add” button, “Record data” tab, and select the “Test” group. Press “OK”.

By default, Breeze applies the most recent model in “Model” to the workflow; in this case, the SIMCA model.

Press the “Analyse Tree” tab. The Analyse Tree shows the steps in the workflow. First, the measurement is analyzed by the sample model, which finds the samples (Objects). Each Object is then analyzed by the classification model and assigned to a class.

Click on the blue symbol for the “Nut type” model (the model that classifies the samples by nut type), and then press the “Duplicate” button.

The “Nut type” model has now been copied and added to the Analyse Tree.

Click on the first “Nut type” model to see its settings menu on the right side (drag the vertical divider to widen the panel). This menu shows the settings for the selected node in the Analyse Tree.

In the “Model” drop-down menu, you can see that this is the SIMCA model, so type “SIMCA” in the “Alias” field and press Enter.

Click on the second “Nut type” model in the Analyse Tree and select the PLS-DA model in the “Model” drop-down menu. Enter “PLS-DA” as the Alias and press Enter. The label for each model is now updated in the Analyse Tree.
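The duplicate-and-rename steps above amount to a simple tree edit: a node is copied as a sibling under the same parent, and each copy gets an alias that replaces its default label. The Python sketch below models that with a hypothetical `Node` class, `duplicate` function, and `outline` helper; none of these are Breeze's API, and the Object level between the sample model and the classification model is omitted for brevity.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in for an Analyse Tree node; not Breeze's API."""
    name: str
    alias: str = ""
    children: list = field(default_factory=list)

    def label(self):
        # The alias, when set, replaces the default name in the tree view.
        return self.alias or self.name

def duplicate(parent, child):
    """Deep-copy `child` and attach the copy as a sibling under `parent`."""
    clone = copy.deepcopy(child)
    parent.children.append(clone)
    return clone

def outline(node, depth=0):
    """Indented text view of the tree, one node per line."""
    lines = ["  " * depth + node.label()]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

nut_type = Node("Nut type")
sample_model = Node("Sample model", children=[nut_type])
measurement = Node("Measurement", children=[sample_model])

copy_of_nut_type = duplicate(sample_model, nut_type)
nut_type.alias = "SIMCA"
copy_of_nut_type.alias = "PLS-DA"
print("\n".join(outline(measurement)))
```

The deep copy matters: the duplicate is an independent node, so giving it a different model or alias does not change the original.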

In Breeze, you can add many different types of descriptors to your workflow. Click on the sample model in the Analyse Tree and then press “Add node”.

A new node has now been created as a subnode under the Object generated by the sample model. In the “Method” drop-down menu, you can see the different types of descriptors that are available in Breeze. Select “Spatial values from sample”. The default “Output” is “area”.

With the Area node selected in the Analyse Tree, press “Duplicate” and select “Width” in the “Output” menu. Press “Duplicate” again and this time select “Circumference”.
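To give a feel for what area, width, and circumference descriptors measure, the sketch below computes pixel-based versions from a binary sample mask. The definitions are assumptions chosen for simplicity (area as a pixel count, width as the bounding-box width, circumference as the number of pixel edges bordering background); the tutorial does not say how Breeze defines these outputs or in which units it reports them.

```python
def area(mask):
    """Number of foreground pixels."""
    return sum(1 for row in mask for v in row if v)

def width(mask):
    """Horizontal extent of the foreground in pixels (bounding-box width)."""
    cols = [j for row in mask for j, v in enumerate(row) if v]
    return max(cols) - min(cols) + 1 if cols else 0

def circumference(mask):
    """Count of foreground pixel edges that touch background or the image border."""
    h, w = len(mask), len(mask[0])
    edges = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                    edges += 1
    return edges

# A 2 x 3 pixel "sample" in a small image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(area(mask), width(mask), circumference(mask))  # 6 3 10
```

Pixel-edge perimeters overestimate the circumference of round objects, so software often uses smoother estimates; treat this only as an illustration of what each descriptor captures.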

Select the Table view and then press “Apply Changes”.

The “Table” should look like this after it has been updated to the new workflow.

Nice job! You have reached the end of step 2 of the “Classification of Nuts” tutorial. See step 3 at:

Nut classification step 3