FAQ

The DataScientia team is at your disposal for any questions related to technical aspects, clarifications or possible collaborations. Contact Us.

1. Data catalog search and navigation

What is the LivePeople catalog?

See the catalog description.

Why are you not distributing your data through one of the existing data catalogs?

Existing data catalogs are not designed to distribute person-centric data at our level of granularity, so custom procedures are required. To reduce the risk of re-identification or abuse, the data are shared only for research purposes and only with identified researchers. The current procedure was designed together with legal and privacy experts.

What is the difference between datasets, bundles and projects?

See the dataset organization.

Why are the datasets organized into datasets, bundles and projects?

Under the GDPR data minimization principle, data must be adequate, relevant, and limited to what the analysis requires. For this reason, data from different measurement instruments, such as accelerometers and time diaries, are provided separately. Researchers can request access to a single dataset of a specific data collection (e.g., WiFi networks in Italy in the DiversityOne data collection) or a combination of datasets from multiple data collections or sensors. To streamline dataset selection and download, we have created thematic bundles that group data commonly used together for key research purposes. For instance, researchers working on activity recognition can download the motion bundle, which groups accelerometer, activities, step counter and other datasets. Another bundle is tailored for studying social interaction and combines questionnaires, time diaries, and location data. The catalog lists both bundles and datasets containing a single sensor.

What is the meaning of the metadata?

The metadata provide information about the dataset and allow data consumers to understand whether it fits their needs or research questions. The metadata glossary describes each metadata field.

What is the Parquet format?

Each file in the datasets is stored in Apache Parquet, an efficient data storage format supported by most of the main data processing and analysis tools. Suggested tools to process or view the data are:

  • pandas, a Python library. A Parquet file can be read with pd.read_parquet('path/to/dataset.parquet');
  • DuckDB, an in-process database solution (see how to read Parquet files in the DuckDB documentation);
  • Tad, a desktop application to visualize Parquet files.

2. Data request, download and usage

Can I download the data directly from the data catalog?

No, data are not directly accessible from the data catalog. The LivePeople catalog provides access to the metadata only. Each dataset webpage has a link to the request forms.

How do I download the data?

After you submit the online form, and if the request is accepted, you will receive an email with instructions on how to access the storage containing the requested datasets. Access will be granted for a limited period.

Can I request the data from multiple data collection projects?

Yes, in the request form, you can request data from multiple projects.

Can I request all the data of a project?

Yes. DataScientia will evaluate whether the requested datasets are coherent with the research proposal.

What are the eligibility criteria to request the data?

The specific criteria for each dataset are listed in its license. For most datasets, the main requirement is to be a researcher affiliated with a research institution, and the usage of the data is restricted to research purposes.

Can I also use the same data for another project besides the approved one?

No, a new dataset request or an update of the previous project proposal is needed.

Can I redistribute or transfer the downloaded datasets or their derived datasets to third parties?

No. The research entity cannot, directly or indirectly, sell, license or sub-license, rent or otherwise transfer to third parties the datasets provided by this catalog, nor permit any third party to do so. Specific datasets may have different policies; please check the use terms and license linked in the metadata on each dataset webpage.

Can I keep the data after the end of my research project?

No, the research entity that requested the data must delete them by the end date specified in the research proposal. The research entity is asked to give notice of the deletion.


3. Data upload and custom catalog

You can upload your metadata values and/or your data to our catalog.

Why should I create my catalog?

You can create your own instance of the catalog to redistribute data that you own. This will allow you to join the DataScientia network and increase the visibility of your data. Contact us if you have additional questions or if you want to join the community.

How can I create my own data catalog?

The DataScientia foundation provides a data catalog template, built on top of JKAN, which can be customized to your needs. Contact DataScientia to get the template and become part of the community with your catalog.

Why should I upload my metadata and/or data on your catalog?

This allows you to make your data more visible and, if you want, leverage our data distribution procedure.

How can I upload my own data?

Contact us and we will provide the detailed steps. In summary, you need to provide the metadata values, the data documentation, the license, and instructions on how interested data consumers can download the data.

Can I organize a data collection using your infrastructure and/or services?

Yes, we can support you in designing the study and provide access to our services. Contact us and describe the study you would like to organize.
