
Breeze Modelling: Best Practices

While creating models in Breeze is intuitive, building optimal models requires a strategic approach to data. These guidelines help you navigate the balance between flexibility and model stability.

1. Define Your Design Space

Build models with representative data that covers the full range of conditions your model will encounter in the real world.

  • Design of Experiments (DOE): We recommend using DOE to systematically map your process variables. This helps ensure that your model isn't "blind" to natural variations in your production environment.
    See also: Design of experiments (Wikipedia).
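As an illustration of what systematically mapping process variables can look like, the Python sketch below builds a full factorial design: every combination of every factor level becomes one experimental run (one or more measurements). The factor names and levels are hypothetical placeholders. A full factorial is only one DOE design; fractional factorial and other designs cover the space with fewer runs.

```python
from itertools import product

# Hypothetical process variables and their levels; replace with your own.
factors = {
    "moisture_pct": [8, 12, 16],
    "belt_speed_m_s": [0.5, 1.0],
    "temperature_c": [20, 30],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts (one per run)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
print(f"{len(design)} runs")  # 3 * 2 * 2 combinations
for run in design:
    print(run)
```

Each printed run describes one set of conditions to capture, so the training data covers the full design space rather than only the most common operating point.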

2. Organizing Data: Simplified vs. Granular

Breeze allows you to assign Classes, Properties (numerical values), and Identifiers (text labels) at different levels.

  • The Simplified Approach (Recommended): Assign only one class or one property type per measurement (image). Benefit: This mirrors the structure of the Breeze tutorials and makes it easy to create, evaluate, and audit your models.

  • The Granular Approach: You can assign unique classes or properties to individual samples within a single measurement. Benefit: This is powerful for complex scenes where many different items are captured at once.

3. The Re-training Trap: Segmentation Sensitivity

Re-training is a common real-world scenario where a model is updated with new data to account for process changes. However, there is a significant risk if your model relies on Segmentations or Excluded Samples in the Analysis Tree.

The Technical Risk: Breeze models refer to samples by their measurement file and by each sample's numerical index within that file.

If you change your segmentation logic (e.g., adjusting a threshold), the index that previously referred to a specific sample might now refer to a completely different sample or a background area in that same image.

For example, if you have assigned a specific Class or Property to "Sample #5," and a segmentation change makes the new "Sample #5" a different object, your model will re-train using incorrect labels.
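The toy Python sketch below illustrates this mechanism. It is not Breeze's internal data model: it simply assumes that samples are numbered in scan order after a threshold-based segmentation and that labels are stored by that number alone. The object names, intensities, and thresholds are made up.

```python
# Toy model of index-based labelling; not Breeze's internal data structures.
# Hypothetical objects in one image, in scan order: (name, mean_intensity, true_class)
objects = [
    ("a", 0.62, "good"),
    ("b", 0.35, "defect"),
    ("c", 0.71, "good"),
    ("d", 0.48, "defect"),
    ("e", 0.80, "good"),
]


def segment(objects, threshold):
    """Keep objects brighter than threshold and number them 1..n in scan order."""
    kept = [obj for obj in objects if obj[1] > threshold]
    return {index: obj for index, obj in enumerate(kept, start=1)}


# Labels assigned under the original segmentation are stored by sample index only.
original = segment(objects, threshold=0.30)
labels_by_index = {index: obj[2] for index, obj in original.items()}

# A stricter threshold drops object "b", so every later sample shifts down by one.
updated = segment(objects, threshold=0.40)
mismatches = []
for index, (name, _, true_class) in updated.items():
    stored = labels_by_index[index]
    if stored != true_class:
        mismatches.append(index)
    print(f"Sample #{index}: object {name}, stored label {stored}, actual {true_class}")

print("Samples with wrong labels:", mismatches)
```

Only Sample #1 keeps a correct label; every sample after the dropped object inherits the label of its predecessor, and no error is raised, which is why the problem is easy to miss during re-training.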

When you re-train a model in the Modelling wizard, check which samples are included in the updated model to make sure it is trained on the right data.

Safe Practice: If you plan to re-train frequently, keep your Analysis Tree and segmentation steps consistent. If you must significantly alter how samples are identified, it is often safer to create a new model to ensure your metadata remains synced with the correct pixels.

Modelling Wizard usage tips

Selecting rows in the Model wizard

The Select pane on the right side of the Model wizard has commands for quickly selecting the table rows you want to act on while creating the model.


The Select rows by buttons provide the following commands for selecting rows in the table:

  • Interval - select evenly spaced rows

  • Class - select rows matching a class

  • Property value - select rows which have certain property values

  • Random percentage - select a percentage of rows randomly

  • Model binning - select a percentage of rows by using the model space to distribute the selection evenly. This can be used to remove sparse outliers (noise) and keep the most representative data, resulting in a more stable model.

    • This works by grouping the scatter plot points into a 2D grid (similar to a 2D histogram). Density represents how many pixels fall within the same grid cell. The percentage threshold selects only points whose density is above a given fraction of the maximum density.

  • Included, Excluded - select rows that are Included in the model, or Excluded from the model.
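The Model binning description above can be sketched in Python with NumPy: bin the two plotted score values into a 2D histogram, look up the count of the cell each point falls in, and keep points whose cell count exceeds a fraction of the densest cell's count. This is a minimal sketch under assumptions, not Breeze's implementation: the grid size (bins=20), the function name model_binning_mask, and the exact density definition are hypothetical, and the sketch follows the density-threshold description rather than a target percentage of rows.

```python
import numpy as np


def model_binning_mask(scores, bins=20, fraction=0.1):
    """Return a boolean mask selecting points in dense cells of a 2D score plot.

    scores:   array of shape (n, 2), e.g. two model components per row.
    bins:     grid cells per axis (assumed value; Breeze's grid size may differ).
    fraction: keep points whose cell count exceeds fraction * the densest cell's count.
    """
    x, y = scores[:, 0], scores[:, 1]
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # Find each point's grid cell; digitize is 1-based and the max value lands past the last edge.
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, bins - 1)
    density = counts[ix, iy]
    return density > fraction * counts.max()


# Usage: a dense cluster of 1000 points plus two far-away outliers.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(0.0, 1.0, size=(1000, 2)), [[8.0, 8.0], [-9.0, 7.5]]])
mask = model_binning_mask(scores)
print(f"Selected {mask.sum()} of {len(scores)} rows; outliers selected: {mask[-2:]}")
```

The two isolated points sit alone in their grid cells, so they fall below the threshold and are left out, while points in the dense core are kept. Raising fraction keeps only the densest core; lowering it keeps more of the sparse edges.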

You can also select rows in the table manually by clicking. Use Shift + Click to select a range of rows, or Ctrl + Click to select non-adjacent rows.

After selecting rows, use the toolbar above the table to act on them, for example to exclude them from the model or to set them as test data.

Subpages

  • Cross-validation

    Cross-validation evaluates model performance by predicting data that was held out from training. Methods such as random, stratified, and leave-one-out cross-validation help assess how robust and generalizable a model is.
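To make the three methods concrete, here is a generic Python sketch of how rows can be assigned to cross-validation folds. It is not Breeze's implementation, and all function names are hypothetical. Random assigns rows to folds at random, stratified spreads each class evenly across folds, and leave-one-out gives every row its own fold. For each fold, a model would be trained on the train rows and used to predict the held-out test rows.

```python
import random
from collections import defaultdict


def random_folds(n, k, seed=0):
    """Assign each of n rows to one of k folds at random, with near-equal fold sizes."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, row in enumerate(order):
        folds[row] = position % k
    return folds


def stratified_folds(labels, k, seed=0):
    """Like random_folds, but distribute the rows of each class evenly across the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in enumerate(labels):
        by_class[label].append(row)
    folds = [0] * len(labels)
    offset = 0  # continue the round-robin across classes so fold sizes stay balanced
    for rows in by_class.values():
        rng.shuffle(rows)
        for position, row in enumerate(rows):
            folds[row] = (position + offset) % k
        offset += len(rows)
    return folds


def leave_one_out(n):
    """Each row is its own fold: n models, each tested on a single held-out row."""
    return list(range(n))


def splits(folds):
    """Yield (train_rows, test_rows) for each fold id."""
    for fold in sorted(set(folds)):
        test = [row for row, f in enumerate(folds) if f == fold]
        train = [row for row, f in enumerate(folds) if f != fold]
        yield train, test


# Usage: 6 "ok" rows and 3 "defect" rows split into 3 stratified folds.
labels = ["ok"] * 6 + ["defect"] * 3
for train, test in splits(stratified_folds(labels, k=3)):
    print(len(train), "train rows; test labels:", [labels[row] for row in test])
```

With stratification, every test fold contains two "ok" rows and one "defect" row, so no fold is evaluated on a class mix that differs from the full data set.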