Data engineering for the quantified self
Welcome to the homepage of the 2024 edition of the Everyday AI 3 (EAI3) course on Data engineering for the quantified self. The course is taught as a module of the Studies on Human Behaviour course, part of the Data Science degree at the University of Trento.

 

 

News


Website online!

August 19th, 2024

 

 

Last modification: August 19th, 2024

Instructions


The Fall 2024 edition of the course is delivered in person, in the classroom. The idea behind the course is that students do most of the work while the course is running, because this should yield considerably better results.

Learning how to handle one’s own data involves working on real personal data, so data collection is a crucial part of the whole process. For this reason, students will be taught how to collect data from their own smartphones via an application developed by the Knowdive group, and asked to do so, ideally for around one month. The calendar shows the dates of the data collection and when the data will be made available (dates can still change). Participants unwilling to collect their personal data will work on data from past collections. More details will be provided by the professors during the first lessons.

Syllabus


Course Objectives and Outcomes

The course participants will learn how to collect and manage data about their everyday life to become researchers of themselves. The general objective is to allow the participants to gain a practical understanding of how data about their everyday life are related to AI. The participants have the opportunity to explore both technical questions about data quality and data generation, and ethical and privacy aspects. The course covers all the phases of the data life cycle: collection, preparation, documentation and distribution. After a brief theoretical introduction, the participants will collect behavioural data about themselves using a mobile application called iLog (available for Android and iOS), developed by the Knowdive group at the University of Trento. The app collects phone sensor data and sends questions to the participants at regular intervals. The participants will then prepare and document their data. The course is data-intensive and hands-on. The participants are free to decide whether to collect their own data or to work on data collected in previous studies. However, collecting and managing one’s own data is strongly recommended in order to fully understand the challenges and the potential of handling personal data. The overall objective of the course is to familiarise participants with the management of personal data and to raise awareness of the value and the impact of one’s own data.

 

General Description

This course will cover the following topics:
  • Why and how to collect personal data
  • What type of data can be collected with mobile devices
  • How to prepare the data to reduce the cost of data use and reuse
  • How to document and distribute data

 

Prerequisites

Students from all backgrounds are welcome. Because of the practical parts, a basic knowledge of Python programming is strongly recommended. Basic knowledge of data science, ethics, and data governance is useful but not required to attend the course.

 

Course modality

Theory:
  • The lessons will cover the basic aspects of data collection, data management and data distribution, mainly from a computer science perspective but also with some notions from social science.
Practice:
  • Participants will collect and process their own data. During the course, they will be guided through the various phases and asked to work on their data at home. Multiple lessons are allocated to answering questions and addressing issues about the project.
Modality:
  • The project work will run in parallel with the theory lessons. After the completion of each theory module, there is a practical application to one’s own data. Each theory lesson will include a Q&A session about its content, and some lessons will be fully allocated to Q&A on previous content and project progress.

Teachers


name.surname@unitn.it

Prof. Fausto Giunchiglia

Prof. Ivano Bison

Dr. Andrea Bontempelli

Dr. Matteo Busso

Calendar and Material


The course runs from September 9, 2024 to November 26, 2024, with the following schedule:

     

  • Monday, 10:30-13:30, Room A216 Povo 1

  • Tuesday, 14:30-16:30, Room A216 Povo 1

 

You might want to read the Instructions to understand how to take the course.

 

Note also that the titles and structure of the lessons yet to be delivered might change slightly. The rule of thumb is: if there are links to materials, things won’t change; if there are no links to materials, titles and content are just suggestions.

 

Lesson Number | Date | Time | Material | Content of Material | Lecturer(s) | External resources
1 | Mon 9 Sep, 2024 | 10:30 | Module organization slides | Module organization | IB and FG | -
2 | Tue 10 Sep, 2024 | 14:30 | Module Project organization slides; Quantified self slides | Big thick data and quantified self + project organization | FG | -
5 | Mon 23 Sep, 2024 | 10:30 | Quantified self, data and privacy; iLog app | Data collection | FG | -
- | Wed 24 Sep, 2024 | - | Google Play; App Store; Datascientia Project | iLog data collection starts | - | -
7 | Mon 30 Sep, 2024 | 10:30 | Slides | Types of data: passive and active data | FG | -
8 | Tue 1 Oct, 2024 | 14:30 | - | Q/A: project definition | IB and FG | -
9 | Mon 7 Oct, 2024 | 10:30 | - | Data cleaning and preparation - part 1 (motivation, problem, methods) | FG | -
10 | Tue 8 Oct, 2024 | 14:30 | - | Data cleaning and preparation - part 2 (Total survey error) | IB | -
12 | Tue 15 Oct, 2024 | 14:30 | - | Data cleaning and preparation - part 3 (feature engineering) | IB | -
13 | Mon 21 Oct, 2024 | 10:30 | - | Data cleaning and preparation - part 4 (pseudonymization and anonymization) | FG | -
14 | Tue 22 Oct, 2024 | 14:30 | - | Data cleaning and preparation - part 5 (documenting and sharing) | FG | -
- | Tue 22 Oct, 2024 | - | - | iLog data collection ends | - | -
17 | Mon 4 Nov, 2024 | 10:30 | - | Q/A: data preparation | IB and FG | -
23 | Wed 25 Nov, 2024 | 10:30 | - | Final presentation | IB and FG | -

Exam


The exam consists of a short presentation of the results of the study during the last lesson. The required deliverables (templates to be provided) are:

  • pseudonymized, cleaned and prepared dataset (only for evaluation purposes and removed from the course storage after the evaluation process);

  • metadata catalog with the metadata of the pseudonymized dataset (catalog example). The catalog is a static webpage hosted by the participant and can be private or public. If the webpage is private, remember to grant the professors access to it;

  • static website describing the data collection and the data preparation
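
As an illustration of what a pseudonymized dataset might look like, here is a minimal Python sketch that replaces a direct identifier with a stable pseudonym computed with a keyed hash. It is not the course template and not the iLog data format: the field names (`user_id`, `timestamp`, `steps`), the example records and the key are all hypothetical, and the choice of HMAC-SHA256 is an assumption made only for this example.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption): in practice it would be stored
# separately from the dataset and never distributed with it.
SECRET_KEY = b"replace-with-a-private-random-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an illustrative choice


def pseudonymize_records(records, id_field="user_id"):
    """Return copies of the records with the identifier field replaced."""
    return [{**r, id_field: pseudonymize(str(r[id_field]))} for r in records]


# Hypothetical records; the field names are not taken from iLog.
raw = [
    {"user_id": "alice@example.org", "timestamp": "2024-10-01T09:00:00", "steps": 120},
    {"user_id": "alice@example.org", "timestamp": "2024-10-01T10:00:00", "steps": 340},
    {"user_id": "bob@example.org", "timestamp": "2024-10-01T09:00:00", "steps": 75},
]
prepared = pseudonymize_records(raw)
```

The same identifier always maps to the same pseudonym, so records of one participant can still be linked to each other. Anyone holding the key can recompute the mapping, which is why this counts as pseudonymization rather than anonymization.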

Collaboration Opportunities


Multiple positions are available as 150h activities and internships. They should be considered the first part of a research project and thesis with the Knowdive group. The general activities of the group are listed on the website (http://knowdive.disi.unitn.it/), while activities already scheduled and available now can be found at http://knowdive.disi.unitn.it/work-with-us/. The 150h activities have variable length and are strictly related to software development: for this reason, knowledge of software development with at least one programming language is a must. All the activities can also be carried out remotely.

 

Anyone interested in these opportunities can send an email to knowdive-positions@disi.unitn.it, indicating any preferences in terms of topics or activities (if known). For 150h activities it is important to list the programming languages known, each with the corresponding level on a scale from 1 to 5, where 1 = basic knowledge and 5 = advanced knowledge.

 

Applications to the “150 ore” program can be submitted at:
https://www.unitn.it/servizi/224/collaborazioni-studenti-150-ore
Note that the application deadline for A.Y. 2024-2025 is September 30, 2024.