The DataScientia team is at your disposal for any questions related to technical aspects, clarifications or possible collaborations. Contact Us.
It is a data catalog that lets data consumers discover which data are available and understand whether they fit their purposes. Each dataset is described by:
You can find out who we are on the DataScientia webpage.
Current data catalogs are not designed to distribute person-centric data at our granularity level, thus requiring custom procedures. To reduce the risk of re-identification or abuse, the data are shared only for research purposes with identified researchers. The current procedure has been designed with legal and privacy experts.
<year>-<acronym for the data collection experiment>-<data collection location-dataset name>
of the sensor or measurement instrument>. All three types can be requested. Selecting a bundle or a project also selects all the datasets it contains.
Under the GDPR data minimization principle, data must be adequate, relevant, and limited to what the analysis requires. For this reason, data from different measurement instruments, such as the accelerometer and time diaries, are provided separately. Researchers can request access to a single dataset of a specific data collection (e.g., WiFi networks in Italy in the DiversityOne data collection) or a combination of datasets from multiple data collections or sensors. To streamline dataset selection and download, we have created thematic bundles that group data commonly used together for key research purposes. For instance, researchers studying activity recognition can download the motion bundle, which groups accelerometer, activities, step counter, and other datasets. Another bundle is tailored to studying social interaction and combines questionnaires, time diaries, and location data. The catalog lists both bundles and single-sensor datasets.
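As a rough illustration of how a selection that mixes bundles and single datasets resolves to a flat list of datasets, here is a minimal Python sketch. The `BUNDLES` mapping and the `expand_selection` helper are hypothetical names introduced for this example and are not part of the catalog; the bundle contents only repeat the examples given above and are not the full lists.

```python
# Hypothetical, partial bundle contents taken from the examples in the text.
# The real motion bundle contains more datasets than the three listed here.
BUNDLES = {
    "motion": ["accelerometer", "activities", "step counter"],
    "social interaction": ["questionnaires", "time diaries", "location"],
}


def expand_selection(selected, bundles=BUNDLES):
    """Resolve a mix of bundle names and dataset names to a sorted dataset list."""
    datasets = set()
    for name in selected:
        # A bundle name stands for every dataset it contains;
        # any other name is treated as a single dataset.
        datasets.update(bundles.get(name, [name]))
    return sorted(datasets)


if __name__ == "__main__":
    # One bundle plus one single-sensor dataset.
    print(expand_selection(["motion", "location"]))
```

Using a set means a dataset that is requested both on its own and through a bundle appears only once in the result.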
The metadata describe the dataset and allow data consumers to understand whether it fits their needs or research questions. The metadata glossary explains each metadata field.
Every file in the datasets is in Apache Parquet, an efficient data storage format that most data processing tools can open. Suggested tools: the Python library pandas (pd.read_parquet('path/to/dataset.parquet')); DuckDB, an in-process database; and Tad, a desktop application for viewing Parquet files.
No, data are not directly accessible from the data catalog. Each dataset webpage has a link to the request form. The LivePeople catalog provides access to the metadata only.
After submitting the form online, if the request is accepted, you will receive an email with instructions on how to access the storage with the requested datasets. Access will be granted for a limited period.
Yes, in the request form, you can request data from multiple projects.
Yes, DataScientia will evaluate whether the research proposal is consistent with the requested datasets.
The specific criteria for each dataset are listed in its license. For most datasets, the main requirement is that the applicant is a researcher affiliated with a research institution, and use of the data is restricted to research purposes.
No, a new dataset request or an update of the previous project proposal is needed.
The research entity may not, directly or indirectly, sell, license or sub-license, rent, or otherwise transfer to third parties the dataset provided by this catalog, nor permit any third party to do so. Specific datasets may have different policies; please check the terms of use and license linked in the metadata on each dataset webpage.
No, the research entity that requested the data must delete it by the end date specified in the research proposal. The research entity is asked to give notice of the deletion.
You can upload your metadata values and/or your data to our catalog.
You can create your own instance of the catalog to redistribute data that you own. This will allow you to join the DataScientia network and increase the visibility of your data. Contact us if you have additional questions or if you want to join the community.
The DataScientia foundation provides a data catalog template, built on top of JKAN, which can be customized to your needs. Contact DataScientia to get the template and become part of the community with your own catalog.
This allows you to make your data more visible and, if you want, leverage our data distribution procedure.
Contact us and we will provide the detailed steps. In summary, you need to provide the metadata values, data documentation, license, and how interested data consumers can download the data.
Yes, we support you in designing the study and provide access to our services. Contact us and describe what study you would like to organize.