CODING IN R FOR DATA ANALYSIS
Course code 1011639

Academic year 2023/24
2nd year of the programme - Second semester
Lecturer
- Elisa BERGAMI
Scientific disciplinary sector
Ecology (BIO/07)
Field
Student's choice
Type of learning activity
Student's choice
24 hours
of classroom teaching
3 credits
Location: UNIMORE
Course taught
in ENGLISH

Learning objectives

Students attending the course will acquire general knowledge of the R environment and its extensions, and will be able to put into practice the data analysis techniques introduced in other statistics courses.

Knowledge and understanding:
- recognise the general features of the R environment, RStudio and R syntax.
- list the main R packages for data wrangling, analysis and visualization.

Applied knowledge and understanding:
- understand and apply functions from R packages to transform and plot data.
- apply the knowledge to perform data analysis in the field of Biological/Environmental Sciences within R.

Autonomy of judgment: evaluate and interpret the outputs of different functions, reorganise the knowledge acquired and choose the most suitable function for data analysis in R.

Communication skills: generate a report showing the results of data processing and analysis using RMarkdown.

Prerequisites

Knowledge of mathematics and data analysis. Students should be able to recognise and use the main statistical analysis techniques used in the biological sciences and the related graphical representations.

Course contents

The course is an introduction to the R statistical programming language and to the main packages for data organization, analysis and plotting in a univariate and bivariate context. It provides the knowledge and skills needed to perform statistical analysis in the R environment within the field of Biological and Environmental Sciences.

Detailed programme

Introduction to R and data wrangling (1 CFU)
Presentation of the course and teaching material.
Download and installation of R and RStudio. R syntax and foundational R programming concepts.
Data import from online datasets. Data types, vectors, matrices, indexing and operations in R, including sorting and data wrangling (dplyr and tidyr).
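
As a rough idea of what this first block covers, the following minimal sketch uses base R only to show vectors, a matrix, indexing and sorting. The values and object names are invented for illustration and are not course material; the final grouped mean is the kind of summary that dplyr expresses with group_by() and summarise().

```r
# Illustrative values only; not course data.
lengths_mm <- c(12.5, 9.8, 15.2, 11.0)   # numeric vector
species <- c("A", "B", "A", "B")         # character vector

# Indexing: by position and by logical condition
first_two <- lengths_mm[1:2]
long_ones <- lengths_mm[lengths_mm > 11]

# Sorting: sort() returns the sorted values,
# order() returns the positions that would sort the vector
sorted_lengths <- sort(lengths_mm)
by_length <- order(lengths_mm, decreasing = TRUE)

# Matrix with 2 rows, filled column by column; extract the first row
m <- matrix(lengths_mm, nrow = 2)
m_row1 <- m[1, ]

# Data frame sorted by length, then a mean per species
df <- data.frame(species = species, length_mm = lengths_mm)
df_sorted <- df[order(df$length_mm), ]
mean_by_species <- tapply(df$length_mm, df$species, mean)
```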

Data visualization and report (1 CFU)
Read in, process and visualize distributions of univariate and bivariate data using ggplot2. The main plot types covered include histograms, density plots, bar plots, Tukey boxplots and scatter plots. Produce data analysis reports using RMarkdown.

Data analysis (1 CFU)
Normal linear models (e.g. t-test, one-way ANOVA) and generalised linear models, with exercises based on real data sets from case studies in the field of Biological and Environmental Sciences. Basics of programming in R.
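
The models named above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the PlantGrowth and InsectSprays data sets ship with R (datasets package) and are chosen here for convenience, not because the course uses them, and the object names are arbitrary.

```r
# PlantGrowth: dried plant weight under a control and two treatments.
# InsectSprays: insect counts for six sprays. Both ship with R and are
# stand-ins here, not the course's case studies.

# Two-sample t-test, control vs. treatment 1 (Welch's test by default)
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))
tt <- t.test(weight ~ group, data = two_groups)

# One-way ANOVA across the three groups
fit_aov <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit_aov)[[1]]

# The same comparison written as a normal linear model
fit_lm <- lm(weight ~ group, data = PlantGrowth)

# Generalised linear model: Poisson regression for count data
fit_glm <- glm(count ~ spray, data = InsectSprays, family = poisson)

print(tt)
print(aov_table)
print(summary(fit_glm))
```

The t-test and the ANOVA are both special cases of lm(), which is why they are grouped under normal linear models; glm() extends the same formula interface to non-normal responses such as counts.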

The program will be adapted to the audience, and additional time may be dedicated to one section of the program as needed, depending on feedback from the students.

Bibliography

Additional reference text(s) for the R environment and its packages will soon be added.

Teaching methods

Classroom lecture-style presentations introduce and explain key concepts and theory (using presentations, spreadsheets and scientific articles), followed by practical exercises held in the computer room. Exercises will be performed step by step with the professor, allowing time to become familiar with the R environment. Additional exercises will be solved individually by the students to gain confidence with the concepts used to solve the exercises.

Assessment methods

The exam consists of multiple-choice questions (30% of the final mark) followed by an exercise to be performed in R and RStudio (70% of the final mark) in the computer room. The exam lasts 45-60 minutes.

Additional information

Attendance is not mandatory, although it is strongly encouraged given the hands-on activities planned.