Learning objectives
This course will teach the basics of data-driven inference in the physical sciences. Students will acquire computational skills, knowledge of statistical analysis and error analysis, good practices for handling, processing, and analyzing data (including big data) programmatically, and communication and visualization skills.
Prerequisites
Coding experience, preferably in Python. Basic statistical knowledge (descriptive statistics). Basic linear algebra knowledge (matrices, vectors, matrix multiplication, and transformations).
Course unit content
The course will be organized in a modular fashion, with some guest lectures. Each machine learning method will be studied as it applies to a physical problem, using open data and examples from the literature. Students will learn from examples of machine learning methods applied to current problems in Physics and the Natural Sciences. Some of the simpler algorithms will be explored in detail and implemented from scratch, while others will be used through dedicated Python libraries.
Full programme
The course will review traditional Null Hypothesis Testing concepts as well as modern applied statistics and machine learning methods, including: Bayesian Statistics, Markov Chain Monte Carlo, Principal Component Analysis, Support Vector Machines, Tree methods, Clustering, and Neural Networks (including Autoencoders, Convolutional, and Recurrent Neural Networks).
You will see examples of machine learning methods applied to current problems in Physics and the Natural Sciences. You will acquire basic computational skills, knowledge of statistical analysis and error analysis, good practices for handling, processing, and analyzing data (including big data) programmatically, and communication and visualization skills.
Bibliography
No textbook is required, but several textbooks may be helpful throughout the class, including:
• The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer 2001
• Statistics, Data Mining, and Machine Learning in Astronomy, Ivezić, Connolly, VanderPlas, Gray, Princeton University Press, 2nd edition
• Machine learning in Python: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, A. Géron. This is probably the book closest to the syllabus in terms of techniques, but don't buy it yet: the second edition is due out imminently, and the deep learning chapters of the previous edition are now out of date.
Additional textbooks, particularly helpful for students with less experience in coding or Python, include:
• Python Data Science Handbook, Jake VanderPlas, O'Reilly Media [https://www.oreilly.com/library/view/python-data-science/9781491912126/]
• Computing and coding: Beginning Python Visualization, 2009
• Data analysis: Statistics in a Nutshell, S. Boslaugh, O'Reilly Media
• Visualization: Visualization Analysis and Design, T. Munzner, 2014
Most of the material from these books that is referred to in lectures can be found online.
Teaching methods
Google Colaboratory (Colab) will be used for the class. Homework can be developed on any platform as long as the computational setup is consistent across the entire class: the class assistants and I need to be able to reproduce your work and obtain the same results. Modules and libraries used in your work must be accessible to me, the graders, and your classmates. We may also provide a Docker image and a virtual environment, along with instructions for setting up your environment so that you can work offline.
Assessment methods and criteria
Group homework (reproducing analyses from the literature), a real-time midterm exam, a final group project, and quizzes to assess ongoing understanding. Participation is also included in the grade.
Other information
Original syllabus from last year's course, on which this one is based: https://docs.google.com/document/d/16Rh9lnr4cMicDvpxehaZGEGMRW_PVB1FATyJcKRdEGI/edit
Translated syllabus (translated automatically with https://www.deepl.com/translator): https://docs.google.com/document/d/1HJNnSd3ghgVBQj4ugB-jp5YlYU2J1x-4mUUCtyS38lM/edit?usp=sharing