BIG DATA AND DATA MINING
cod. 1009070

Academic year 2023/24
1st year of course - Second semester
Professor
- Flavio BERTINI
Academic discipline
Informatica (INF/01)
Field
Discipline informatiche
Type of training activity
Characterising
48 hours
of face-to-face activities
6 credits
hub:
course unit
in ITALIAN

Learning objectives


At the end of the course the student should have acquired knowledge and skills related to knowledge representation techniques and data mining algorithms. In particular, the student is expected to be able to:
- Know the main problems of Big Data and the objectives of Data Mining.
- Know the main techniques of knowledge representation.
- Use formalisms appropriately for the representation of knowledge.
- Use the main data mining techniques and algorithms.
- Present a work project.
- Analyze a problem and develop a data mining project.

Taking the Dublin Descriptors into account:

Knowledge and understanding
The course introduces the first concepts related to operating systems. Particular emphasis is given to understanding the main algorithms underlying prominent kernel tasks. The reference text is in Italian, but standard English terminology is commonly used during the lessons to facilitate consultation of the international scientific literature.

Applying knowledge and understanding
The knowledge presented is always applied to the resolution of specific problems. The companion exercises focus on problem solving and on testing comprehension of the proposed algorithms. Solution methods are often presented in the form of an algorithm, giving students the ability to formalize procedures that are useful in many areas of computer science, and not only in the study of operating systems.

Making judgments
The exercises, which are proposed in relation to the theoretical part presented in class, can be solved individually or in groups. Comparison with classmates, whether working at home or in the classroom, helps students develop the specific skills needed to explain their arguments to peers and teachers. The exercises can often be solved in many different ways, and listening to the solutions proposed by others allows students to develop the ability to identify common structures beyond apparent superficial differences.

Communication skills
The numerous discussions on the different methods for solving problems allow students to improve their communication skills. The specific terminology of computer technology is also used during classes and exercises.

Learning skills
The study of the origins of technological solutions and their introduction motivated by quantitative considerations contributes to the students’ ability to learn in a comprehensive way. The knowledge acquired is never rigid and definitive, but it is adaptable to any evolution and change of perspective and context.

Prerequisites


Good knowledge of the relational data model and of imperative programming languages is strongly recommended.

Course unit content


■ Semi-structured and unstructured data models
■ The limits of SQL and an introduction to SQL/XML and XQuery
■ Information retrieval models and web information retrieval
■ Data warehousing and data mining

Full programme


■ Part I
  ■ Introduction
  ■ Semi-structured and unstructured data models
■ Part II
  ■ XML introduction
  ■ SQL/XML language
  ■ XQuery language
  ■ XQuery and database management systems
  ■ NoSQL databases
■ Part III
  ■ Information Retrieval introduction
  ■ Ranking
  ■ Web Information Retrieval
  ■ Information Retrieval evaluation
  ■ Advanced methods
■ Part IV
  ■ Data analytics
  ■ Data warehouse
  ■ Data mining: association rules, classification and clustering
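
As a minimal illustration of the association rule topic listed in Part IV, the Python sketch below computes the support of an itemset and the confidence of a rule. The toy transactions and the function names (`count`, `support`, `confidence`) are hypothetical and are not taken from the course material.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given `antecedent`.

    Returns 0.0 when the antecedent never occurs, to avoid dividing by zero.
    """
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / antecedent_count


if __name__ == "__main__":
    print(support({"diapers", "beer"}, transactions))
    print(confidence({"diapers"}, {"beer"}, transactions))
```

In this sketch a rule such as {diapers} → {beer} would be kept only if both values exceed thresholds chosen by the analyst; algorithms such as Apriori avoid enumerating every candidate itemset, which this example does not attempt.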

Bibliography


■ A. Møller, M. Schwartzbach - Introduzione a XML - Pearson, 2007, ISBN: 9788871923734
■ P.-N. Tan, M. Steinbach, V. Kumar - Introduction to data mining - Addison Wesley, 2005, ISBN: 0321420527
■ C.D. Manning, P. Raghavan, H. Schütze - Introduction to Information Retrieval - Cambridge University Press, 2008, ISBN: 0521865719
■ M. Golfarelli, S. Rizzi - Datawarehouse. Teoria e pratica della progettazione - McGraw-Hill Education, 2006, ISBN: 9788838662911

Teaching methods


Teaching activity partly in the classroom

Assessment methods and criteria


The assessment consists of the discussion of a scientific article. The student explores an advanced topic, starting from one of the proposed research papers, and prepares a presentation to be used during the exam. The discussion will focus mainly on the topics of the chosen article. Alternatively, with the instructor's approval, the student can carry out a project on a topic of the course; the results of the project must be discussed during the exam. To take part in an exam session, you must register at least 7 days before the exam date. Further health-related guidance and restrictions may require the exam to be held remotely.

Other information

- - -