BIG DATA AND BUSINESS INTELLIGENCE
Code: 1007077

Academic year 2021/22
3rd year of the course - First semester
Professor
- Gianfranco LOMBARDO
Academic discipline
Sistemi di elaborazione delle informazioni (ING-INF/05)
Field
Ingegneria informatica
Type of training activity
Characterising
48 hours of face-to-face activities
6 credits
hub: PARMA
Course unit taught in ITALIAN

Learning objectives

This course aims to provide general knowledge of the main methodologies and techniques of Data Science and Data Engineering used to develop applications for Big Data Mining. In particular, students will develop problem-solving skills in descriptive and inferential analysis, learning how to develop an intelligent system, how to design a machine learning model, and how to process and store Big Data.

Prerequisites

There are no prerequisite courses. However, students should have programming knowledge (especially Python) and the basic mathematics needed to understand the tools behind inferential analysis.

Course unit content

-Data Science and Business Intelligence
-Artificial Intelligence
-Machine Learning
-Data Engineering
-Network Science

Full programme

Introduction to Big Data and Python review (3 hours)
1.1 Definitions
1.2 Reasons behind this phenomenon
1.3 Data warehouse vs ERP
1.4 Structured and unstructured data
1.5 What is Data Science?
1.6 Procedural and object-oriented programming (OOP) with Python

Business intelligence and data science (7 hours)
2.1 Definitions
2.2 Value of Knowledge
2.3 Challenges in Business Intelligence
2.4 Data: a value for BI
2.5 Business Intelligence vs Data Science
2.6 Data visualization
2.7 Descriptive analysis and its metrics
2.8 Inferential analysis
2.9 Python for data analysis (pandas, NumPy, Matplotlib, SciPy)
2.10 Practical lesson on data analysis

Artificial Intelligence (6 hours)
3.1 Weak AI vs strong AI: the Turing test
3.2 History of AI
3.3 AI techniques: symbolic AI vs subsymbolic AI
3.4 Building an AI system: knowledge and logic
3.5 Semantic technologies
3.6 Inductive reasoning and learning
3.7 Inference in AI: from expert systems to machine learning
3.8 XAI: Black Box AI and Explainable AI

Machine learning and inferential analysis (18 hours)
4.1 Definitions, datasets and types of ML
4.2 Overfitting and underfitting
4.3 Regression: metrics and linear regression
4.4 Practical lesson with Scikit-learn on Regression
4.5 Classification: metrics, logistic regression and decision tree
4.6 Practical lesson with Scikit-learn on Classification
4.7 Ensemble learning: bagging and boosting (AdaBoost, Gradient Boosting)
4.8 Machine learning project management: parameter tuning, model comparison, regularization
4.9 Practical lesson with Scikit-learn on Ensemble learning
4.10 Natural Language Processing: bag of words, tokenization and TF-IDF
4.11 Neural networks: Multilayer Perceptron, backpropagation and hyperparameters
4.12 Practical lesson with Keras on neural networks and NLP

Data Engineering (6 hours)
5.1 Big data management
5.2 Relational databases
5.3 NoSQL databases
5.4 An introduction to Hadoop and PySpark

Network Science for data analysis (8 hours)
6.1 Definitions
6.2 Modelling data with graphs (networks)
6.3 Centrality measures and community detection
6.4 Features for graph analysis
6.5 NetworkX and Gephi
6.6 Practical lesson with networks

Bibliography

- A. Rezzani (2017). Big Data Analytics. Il manuale del data scientist. Maggioli Editore (Apogeo Education).
- A. Géron (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media.
- A.-L. Barabási (2016). Network Science. Cambridge University Press.

Teaching methods

Lectures and laboratory exercises.
Lectures will cover the theoretical aspects of the course subjects.
Practical exercises on real problems will be carried out in the laboratory.

Assessment methods and criteria

There are no mid-term tests.
The exam consists of two parts:
i) a written exam on the theoretical topics covered in class, aimed at evaluating the knowledge acquired on these subjects;
ii) a written report on a project, aimed at assessing the ability to apply the knowledge gained during the course. Its evaluation will also depend on the quality of the developed system and of the accompanying documentation.
The exam is passed if the student achieves at least a pass mark in each of the two parts.
The final mark is the weighted average of the score obtained in the written test (60%) and the score obtained in the project work (40%).
Honours (cum laude) are awarded if the student achieves the highest score in both parts.

Other information

- - -