## Learning objectives

The course provides the foundations of chemometrics and illustrates its application, and that of multivariate data analysis in general, in the following contexts: optimization and experimental design; exploratory and descriptive statistical data analysis; methodologies for quality control of products and industrial processes (QbD, PAT); management of data from instrumental techniques; applied aspects of regression and classification.

At the end of the course the student will:

- know the main methods for multivariate analysis of high-dimensional and complex data and their graphical representation;

- know the main contexts in which chemometrics is applied in the experimental sciences and in industrial production;

- recognize the different types and structures of data and the problems involved in preprocessing;

- know how to use the main methods of multivariate data analysis in the following contexts: exploratory analysis, experimental planning, classification and multivariate calibration;

- be able to build appropriate models for the analysis of data sets through practical exercises with commercial and/or open source tools;

- acquire knowledge of, and competence in, tools for model validation and the critical interpretation of data analysis results.

Further details on the learning objectives are given in the expected learning results section.

## Prerequisites

Basic knowledge of General Chemistry and Analytical Chemistry, covering the basics of sampling, measurement uncertainty, and an overview of the main instrumental analysis techniques. Some univariate statistical concepts: the distribution of a variable, mean, standard deviation, and correlation between pairs of variables. Ability to use a computer, a spreadsheet and a text editor.

## Course unit content

Introduction to experimental design and optimization techniques: exploration and screening (full and fractional factorial designs, Plackett-Burman designs); optimization (central composite designs); D-optimal designs; designs for the study of formulations. Introduction to response surface modelling. Model diagnostics through the analysis of residuals and normal probability plots.
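As a small illustration of the screening designs listed above, the following sketch generates a two-level full factorial design and a half fraction in coded units (-1, +1). It is written in plain Python for readability and is not taken from the course material, which uses PLS-Toolbox for MATLAB; the function names `full_factorial` and `half_fraction` are hypothetical, and the half fraction assumes the generator "last factor = product of the others" (for three factors, C = AB).

```python
from itertools import product


def full_factorial(n_factors):
    """Return all 2**n_factors runs of a two-level design in coded units."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def half_fraction(n_factors):
    """Return a 2**(n_factors - 1) half fraction.

    Assumption: the last factor is aliased with the product of all the
    other factors (for three factors this is the generator C = AB).
    """
    design = []
    for run in full_factorial(n_factors - 1):
        last = 1
        for level in run:
            last *= level
        design.append(run + [last])
    return design


full = full_factorial(3)   # 8 runs for factors A, B, C
half = half_fraction(3)    # 4 runs, with C confounded with the AB interaction

for run in half:
    print(run)
```

The half fraction needs half the experiments, at the price of confounding the main effect of the last factor with an interaction of the others, which is the usual trade-off in screening.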

Exploratory data analysis: univariate methods (frequency histograms, box plots, scatter plots); multivariate methods: meaning, definition and calculation of latent variables. Principal component analysis, PCA (definition, derivation, application) and its graphical representation (scores, loadings, biplot); cluster analysis. Pretreatment of multivariate data: point variables (single-value measurements), instrumental signals, images. Introduction to multivariate methods for process monitoring: multivariate control charts. Introduction to NIR spectroscopy and pretreatment of NIR signals. Outline of spectroscopic techniques that can be implemented at-line, on-line or in-line in the PAT context.
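To make the notions of scores and loadings concrete, here is a minimal sketch of how the first principal component of an autoscaled data table could be computed. It assumes plain Python and uses power iteration on the correlation matrix as a deliberately simple stand-in; it does not reproduce the algorithm used by any particular chemometrics package. The data table `X` and the function names are invented for illustration.

```python
import math


def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(p)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def first_pc(Z, n_iter=500):
    """Return loadings, scores and eigenvalue of the first component.

    Uses power iteration on the covariance matrix of Z (the correlation
    matrix if Z is autoscaled). Assumes the starting vector is not
    orthogonal to the first eigenvector.
    """
    n, p = len(Z), len(Z[0])
    C = [
        [sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    scores = [sum(row[j] * v[j] for j in range(p)) for row in Z]
    return v, scores, eigenvalue


# Hypothetical data: 5 samples (rows) measured on 3 variables (columns).
X = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.4],
    [4.0, 8.1, 0.9],
    [5.0, 9.8, 0.6],
]

Z = autoscale(X)
loadings, scores, eigenvalue = first_pc(Z)
explained = eigenvalue / len(X[0])  # fraction of total variance after autoscaling

print("loadings:", loadings)
print("scores:", scores)
print("explained variance fraction:", explained)
```

The loadings describe how each original variable contributes to the component, the scores give the coordinates of each sample along it, and plotting the two together gives the biplot mentioned above.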

Methods for univariate and multivariate regression: MLR, PCR and PLS. Multivariate calibration. Illustration of some variable ranking methods.

Introduction to classification: SIMCA, LDA, PLS-DA; their differences and contexts of application.

Each topic is complemented by exercises involving the analysis (and in some cases the acquisition) of data sets from chemical experiments. Software used: OpenOffice, PLS-Toolbox for MATLAB.

## Full programme

- - -

## Bibliography

- PLS-Toolbox manual, Eigenvector Research: www.eigenvector.com

- Several tutorial papers suggested and provided by the lecturer.

- K. Varmuza, P. Filzmoser, Introduction to Multivariate Statistical Analysis in Chemometrics, CRC Press, 2009. Print ISBN: 978-1-4200-5947-2; eBook ISBN: 978-1-4200-5949-6.

- Kim H. Esbensen, Brad Swarbrick, Frank Westad, Pat Whitcomb, Mark Anderson, Multivariate Data Analysis: An Introduction to Multivariate Analysis, Process Analytical Technology and Quality by Design, 6th Edition, 2018. ISBN-13: 978-8269110401; ISBN-10: 826911040X.

- Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics, Springer. Stanford, California, 2008.

- M. Forina, Fondamenta di Chimica Analitica, e-book: www.sisnir.org/452/index.html (download at the bottom of the page).

- D.L. Massart and B. Vandeginste, Chemometrics: A Textbook, Elsevier, 1988. ISBN: 978-0-444-42660-4, www.sciencedirect.com/science/bookseries/09223487/23

## Teaching methods

Lessons will be delivered in person unless the COVID-19 emergency requires remote teaching, in which case they will be delivered online in synchronous mode.

Lectures with PowerPoint slides. Interactive exercises with chemometrics software. Independent exercises carried out by the students on real data sets. Discussion of the topics presented.

Feedback on and correction of the students' exercise reports. Seminars illustrating applications of chemometrics in industry.

## Assessment methods and criteria

Evaluation during the course: students are required to write a report for each exercise session. Each report is marked on a scale from 0 to 10 according to the following criteria: organization, language and ability to synthesize (0-3); selection of appropriate methods of analysis (0-2); correct application (0-2); ability to describe and interpret the results (0-3). These marks do not count towards the final exam grade but serve as a qualification to sit the exam (average ≥ 6).

Final examination: students may choose between two formats:

1) Towards the end of the course, each student is assigned a data set to be processed, according to the given instructions, with some of the methods used during the exercises. The student then prepares a presentation of the processed data, in electronic format, to be discussed during the final exam. During the discussion, some questions are asked that arise from the student's presentation. In addition, two questions are asked that may concern other course topics not previously discussed, in order to ascertain the student's level of knowledge.

2) The student is asked 3 broad questions on the three basic parts of the program (DoE, exploratory analysis, modeling) and a further 2-3 questions on more specific topics, aimed at evaluating the ability to apply the methodologies studied.

The final assessment evaluates: the correct selection/description and application/discussion of the processing methods used (30%); the ability to apply the acquired knowledge (30%); communication skills (10%); the level of theoretical knowledge (30%). The final score is expressed on a 30-point scale, with possible honours (cum laude).

Non-attending students who have not submitted exercise reports must answer additional questions to ascertain their knowledge and judgment in the analysis of data sets.

## Other information

OFFICE HOURS (Dipartimento di Scienze Chimiche e Geologiche, first floor):

Monday 12:00-13:00; Thursday 14:30-16:30.

Curriculum: http://personale.unimore.it/AddressBook/curriculum/cocchi