Overview

Dataset statistics

Number of variables: 10
Number of observations: 2959286
Missing cells: 2959286
Missing cells (%): 10.0%
Total size in memory: 276.6 MiB
Average record size in memory: 98.0 B
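
The derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024² bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics table.
n_variables = 10
n_observations = 2_959_286
missing_cells = 2_959_286
total_mib = 276.6

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024**2 bytes.
avg_record_bytes = total_mib * 1024**2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 10.0% figure is what one fully missing column out of ten would produce, which matches the alert on provider.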

Variable types

Text: 1
Numeric: 7
DateTime: 1
Unsupported: 1

Dataset

Description: [unitless] Returns the Points of Interest (POIs) surrounding the geocoordinates where the phone is located. POIs are extracted every 5 minutes. To make the sensor observations comparable, the frequency was reduced to one minute. The first non-missing name is reported for each categorical variable.
Creator: Andrea Bontempelli, Matteo Busso, Roy Alia Asiku
Author: Andrea Bontempelli, Matteo Busso, Fausto Giunchiglia
URL:
Copyright: (c) University of Trento - Knowledge Diversity 2023

Variable descriptions

experimentid: Experiment Id
userid: User id
timestamp: show month(2), day(2), hour(2), minute(2), second(2), decimals(3)
accuracy: The GPS accuracy in meters.
bearing: The compass direction from the current position to the intended destination. Bearing is measured in degrees, clockwise from true north (e.g., the bearing for due east is 090°).
latitude: Geographic coordinate that specifies the N/S position. Latitude is an angle that ranges from 0° at the Equator to 90° at the poles. It is expressed in decimal degrees.
longitude: Geographic coordinate that specifies the E/W position. Longitude is an angle that ranges from 0° at the Prime Meridian to 180°. It is expressed in decimal degrees.
altitude: Elevation above sea level in meters.
provider: Indicates whether the coordinates were found using the network/Wi-Fi or using GPS.
speed: The speed of the device over ground, in meters/second.

Alerts

experimentid has constant value "wenetIndia" (Constant)
altitude is highly overall correlated with bearing and 1 other field (High correlation)
bearing is highly overall correlated with altitude and 1 other field (High correlation)
latitude is highly overall correlated with longitude (High correlation)
longitude is highly overall correlated with latitude (High correlation)
speed is highly overall correlated with altitude and 1 other field (High correlation)
provider has 2959286 (100.0%) missing values (Missing)
provider is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
userid has 1725537 (58.3%) zeros (Zeros)
bearing has 994276 (33.6%) zeros (Zeros)
speed has 974502 (32.9%) zeros (Zeros)
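
The percentages in the Zeros alerts are the zero counts divided by the number of observations. A minimal sketch of that arithmetic, using the counts quoted in the alerts (the helper name percent is illustrative, not part of any library):

```python
# Counts copied from the Zeros alerts and the Dataset statistics table.
n_observations = 2_959_286
zero_counts = {"userid": 1_725_537, "bearing": 994_276, "speed": 974_502}


def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: percent(c, n_observations) for name, c in zero_counts.items()}
print(zero_pct)
```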

Reproduction

Analysis started: 2024-11-24 10:11:09.561433
Analysis finished: 2024-11-24 10:11:34.597369
Duration: 25.04 seconds
Software version: ydata-profiling v4.8.3
Download configuration: config.json
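
As a quick consistency check, the reported duration can be recomputed from the start and finish timestamps. This sketch uses only the standard library and the two timestamps listed in this section.

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2024-11-24 10:11:09.561433")
finished = datetime.fromisoformat("2024-11-24 10:11:34.597369")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```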

Variables

experimentid
Text

CONSTANT 

Experiment Id

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.4 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 29592860
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: wenetIndia
2nd row: wenetIndia
3rd row: wenetIndia
4th row: wenetIndia
5th row: wenetIndia

Value        Count      Frequency (%)
wenetindia   2959286    100.0%

Most occurring characters

Value   Count     Frequency (%)
e       5918572   20.0%
n       5918572   20.0%
w       2959286   10.0%
t       2959286   10.0%
I       2959286   10.0%
d       2959286   10.0%
i       2959286   10.0%
a       2959286   10.0%
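
Because experimentid holds the single value "wenetIndia" in every row, the character counts are the per-string character counts multiplied by the number of rows. A small sketch of that reasoning (variable names are illustrative):

```python
from collections import Counter

value = "wenetIndia"      # the only value of experimentid
n_rows = 2_959_286        # number of observations

# Count characters in one string, then scale by the number of rows.
per_value = Counter(value)
char_counts = {ch: n * n_rows for ch, n in per_value.items()}
total_chars = sum(char_counts.values())

for ch, count in sorted(char_counts.items(), key=lambda kv: -kv[1]):
    print(ch, count, f"{100 * count / total_chars:.1f}%")
```

Upper-case "I" and lower-case "i" are counted separately, which is why the report lists 8 distinct characters for a 10-character string.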

Most occurring categories

Value       Count      Frequency (%)
(unknown)   29592860   100.0%

Most frequent character per category

(unknown)
Value   Count     Frequency (%)
e       5918572   20.0%
n       5918572   20.0%
w       2959286   10.0%
t       2959286   10.0%
I       2959286   10.0%
d       2959286   10.0%
i       2959286   10.0%
a       2959286   10.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   29592860   100.0%

Most frequent character per script

(unknown)
Value   Count     Frequency (%)
e       5918572   20.0%
n       5918572   20.0%
w       2959286   10.0%
t       2959286   10.0%
I       2959286   10.0%
d       2959286   10.0%
i       2959286   10.0%
a       2959286   10.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   29592860   100.0%

Most frequent character per block

(unknown)
Value   Count     Frequency (%)
e       5918572   20.0%
n       5918572   20.0%
w       2959286   10.0%
t       2959286   10.0%
I       2959286   10.0%
d       2959286   10.0%
i       2959286   10.0%
a       2959286   10.0%

userid
Real number (ℝ)

ZEROS 

User id

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.01533546
Minimum: 0
Maximum: 57
Zeros: 1725537
Zeros (%): 58.3%
Negative: 0
Negative (%): 0.0%
Memory size: 45.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 17
95-th percentile: 49
Maximum: 57
Range: 57
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 16.64628318
Coefficient of variation (CV): 1.51119167
Kurtosis: 0.61166495
Mean: 11.01533546
Median Absolute Deviation (MAD): 0
Skewness: 1.408791675
Sum: 32597528
Variance: 277.0987438
Monotonicity: Increasing
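
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks those identities against the userid figures; it does not recompute anything from the raw data.

```python
# Figures copied from the userid statistics.
n_observations = 2_959_286
total = 32_597_528
std = 16.64628318

mean = total / n_observations   # mean = sum / count
variance = std ** 2             # variance = std squared
cv = std / mean                 # coefficient of variation = std / mean

print(f"mean={mean:.8f} variance={variance:.7f} cv={cv:.8f}")
```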
Histogram with fixed size bins (bins=15)
Common values

Value              Count     Frequency (%)
0                  1725537   58.3%
17                 322268    10.9%
9                  224551    7.6%
49                 175404    5.9%
43                 172356    5.8%
12                 140857    4.8%
26                 75059     2.5%
57                 42561     1.4%
35                 38928     1.3%
44                 35535     1.2%
Other values (5)   6230      0.2%

Minimum 5 values

Value   Count     Frequency (%)
0       1725537   58.3%
8       3754      0.1%
9       224551    7.6%
12      140857    4.8%
17      322268    10.9%

Maximum 5 values

Value   Count    Frequency (%)
57      42561    1.4%
49      175404   5.9%
44      35535    1.2%
43      172356   5.8%
40      170      < 0.1%

timestamp
Date

show month(2), day(2), hour(2), minute(2), second(2), decimals(3)

Distinct: 2958046
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 45.2 MiB
Minimum: 2021-07-12 08:00:05.380000
Maximum: 2021-08-12 14:41:26.813000
Histogram with fixed size bins (bins=50)

accuracy
Real number (ℝ)

The GPS accuracy in meters

Distinct: 12897
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 158.8652758
Minimum: 0
Maximum: 4800
Zeros: 23
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 45.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 11.79199982
Q1: 15.74199963
Median: 20.36800003
Q3: 32
95-th percentile: 1700
Maximum: 4800
Range: 4800
Interquartile range (IQR): 16.25800037
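
The quantiles show a long right tail: the median accuracy is about 20 m, yet the 95-th percentile is 1700 m. One common way to make that concrete is Tukey's 1.5 × IQR rule, sketched below. The report itself does not apply this rule; it is used here only as an illustration, with the quantiles copied from the table.

```python
# Quantiles copied from the accuracy statistics.
q1 = 15.74199963
q3 = 32.0
p95 = 1700.0

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # Tukey's upper fence (illustrative rule)

print(f"IQR = {iqr:.8f}")
print(f"Upper fence = {upper_fence:.3f} m")
print("95-th percentile beyond fence:", p95 > upper_fence)
```

Under this rule, more than 5% of the readings would count as unusually imprecise, which is consistent with the large gap between the mean (158.9 m) and the median.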

Descriptive statistics

Standard deviation: 530.5685565
Coefficient of variation (CV): 3.339738994
Kurtosis: 20.0523486
Mean: 158.8652758
Median Absolute Deviation (MAD): 6.431999207
Skewness: 4.343201165
Sum: 470127786.6
Variance: 281502.9932
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count     Frequency (%)
20                     285465    9.6%
18.22400093            62814     2.1%
17.15200043            61928     2.1%
19.29600143            60639     2.0%
20.36800003            60299     2.0%
1899.999023            60169     2.0%
16.07999992            58134     2.0%
21.44000053            55399     1.9%
23.58399963            50911     1.7%
22.51200104            49516     1.7%
Other values (12887)   2154012   72.8%

Minimum 5 values

Value   Count   Frequency (%)
0       23      < 0.1%
0.75    39      < 0.1%
1       102     < 0.1%
1.5     25      < 0.1%
2       10      < 0.1%

Maximum 5 values

Value         Count   Frequency (%)
4800          10      < 0.1%
4400          47      < 0.1%
4232          2       < 0.1%
4154.649902   1       < 0.1%
4099.999023   23      < 0.1%

bearing
Real number (ℝ)

HIGH CORRELATION  ZEROS 

The compass direction from the current position to the intended destination. Bearing is measured in degrees, clockwise from true north (e.g., the bearing for due east is 090°).

Distinct: 4168
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.93113383
Minimum: -1
Maximum: 359.98
Zeros: 994276
Zeros (%): 33.6%
Negative: 1306496
Negative (%): 44.1%
Memory size: 45.2 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: -1
Median: 0
Q3: 0
95-th percentile: 280.6
Maximum: 359.98
Range: 360.98
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 91.60481454
Coefficient of variation (CV): 2.294069959
Kurtosis: 3.528110499
Mean: 39.93113383
Median Absolute Deviation (MAD): 1
Skewness: 2.203514313
Sum: 118167645.3
Variance: 8391.442047
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count     Frequency (%)
-1                    1306496   44.1%
0                     994276    33.6%
40.9                  1674      0.1%
105.9                 1230      < 0.1%
359.4                 1070      < 0.1%
0.1                   950       < 0.1%
359.5                 884       < 0.1%
0.4                   868       < 0.1%
359.7                 863       < 0.1%
359.8                 830       < 0.1%
Other values (4158)   650145    22.0%

Minimum 5 values

Value   Count     Frequency (%)
-1      1306496   44.1%
0       994276    33.6%
0.1     950       < 0.1%
0.2     648       < 0.1%
0.21    13        < 0.1%

Maximum 5 values

Value    Count   Frequency (%)
359.98   20      < 0.1%
359.9    810     < 0.1%
359.8    830     < 0.1%
359.7    863     < 0.1%
359.6    775     < 0.1%
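
A bearing is defined on 0° to 360°, so the value -1 (44.1% of rows) looks like a placeholder for "no bearing available" rather than a measurement. That reading is an assumption; the report does not document it. Under that assumption, the sketch below masks the placeholder before any statistics are computed. The function name mask_sentinel and the sample list are made up for illustration.

```python
def mask_sentinel(values, sentinel=-1.0):
    """Replace the assumed placeholder value with None."""
    return [None if v == sentinel else v for v in values]


# Hypothetical sample, not rows from the dataset.
sample = [-1.0, 0.0, 40.9, -1.0, 359.4]
masked = mask_sentinel(sample)
print(masked)

# Share of rows that would remain, using the counts from the report.
n_observations = 2_959_286
n_sentinel = 1_306_496
valid_pct = 100 * (n_observations - n_sentinel) / n_observations
print(f"Rows with a usable bearing: {valid_pct:.1f}%")
```

Masking first matters because the reported mean (39.9°) and quantiles are computed with the -1 values included.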

latitude
Real number (ℝ)

HIGH CORRELATION 

Geographic coordinate that specifies the N/S position. Latitude is an angle that ranges from 0° at the Equator to 90° at the poles. It is expressed in decimal degrees.

Distinct: 2205
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.82031836
Minimum: 8.8222
Maximum: 22.603
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.2 MiB

Quantile statistics

Minimum: 8.8222
5-th percentile: 10.9059
Q1: 12.4937
Median: 17.6788
Q3: 17.6788
95-th percentile: 22.4673
Maximum: 22.603
Range: 13.7808
Interquartile range (IQR): 5.1851

Descriptive statistics

Standard deviation: 3.380108103
Coefficient of variation (CV): 0.2136561367
Kurtosis: -0.8651079623
Mean: 15.82031836
Median Absolute Deviation (MAD): 0.0001
Skewness: -0.2931366385
Sum: 46816846.64
Variance: 11.42513079
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count     Frequency (%)
17.6788               1299493   43.9%
17.6789               243619    8.2%
10.9069               214079    7.2%
22.4675               95688     3.2%
12.4937               88984     3.0%
11.392                80855     2.7%
12.9089               74404     2.5%
10.9515               61791     2.1%
17.6787               56014     1.9%
12.4938               51090     1.7%
Other values (2195)   693269    23.4%

Minimum 5 values

Value    Count   Frequency (%)
8.8222   1       < 0.1%
8.8238   17      < 0.1%
8.824    8       < 0.1%
8.8241   8       < 0.1%
8.8242   8       < 0.1%

Maximum 5 values

Value     Count   Frequency (%)
22.603    3       < 0.1%
22.6001   4       < 0.1%
22.5986   3       < 0.1%
22.5985   9       < 0.1%
22.5984   9       < 0.1%

longitude
Real number (ℝ)

HIGH CORRELATION 

Geographic coordinate that specifies the E/W position. Longitude is an angle that ranges from 0° at the Prime Meridian to 180°. It is expressed in decimal degrees.

Distinct: 1977
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.46204161
Minimum: 75.9163
Maximum: 88.3514
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.2 MiB

Quantile statistics

Minimum: 75.9163
5-th percentile: 75.9163
Q1: 78.5681
Median: 83.1905
Q3: 83.1905
95-th percentile: 88.3474
Maximum: 88.3514
Range: 12.4351
Interquartile range (IQR): 4.6224

Descriptive statistics

Standard deviation: 3.351574719
Coefficient of variation (CV): 0.04114277831
Kurtosis: -0.5891114238
Mean: 81.46204161
Median Absolute Deviation (MAD): 0.0001
Skewness: -0.2994814235
Sum: 241069479.3
Variance: 11.2330531
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count     Frequency (%)
83.1905               1345512   45.5%
75.9163               213880    7.2%
83.1904               168147    5.7%
78.5681               128852    4.4%
83.1906               114156    3.9%
76.8138               84861     2.9%
88.3476               80546     2.7%
80.0667               63791     2.2%
76.9282               62507     2.1%
80.0668               54956     1.9%
Other values (1967)   642078    21.7%

Minimum 5 values

Value     Count    Frequency (%)
75.9163   213880   7.2%
75.9167   28       < 0.1%
75.9182   6        < 0.1%
75.9192   11       < 0.1%
75.9196   94       < 0.1%

Maximum 5 values

Value     Count   Frequency (%)
88.3514   23      < 0.1%
88.3511   13      < 0.1%
88.3507   4360    0.1%
88.3505   11080   0.4%
88.3504   511     < 0.1%

altitude
Real number (ℝ)

HIGH CORRELATION 

Elevation above sea level in meters.

Distinct: 81071
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.25340244
Minimum: -3188.1157
Maximum: 2065.1025
Zeros: 1551
Zeros (%): 0.1%
Negative: 2683804
Negative (%): 90.7%
Memory size: 45.2 MiB

Quantile statistics

Minimum: -3188.1157
5-th percentile: -77.3483
Q1: -51.57655
Median: -1
Q3: -1
95-th percentile: 307.9227
Maximum: 2065.1025
Range: 5253.2182
Interquartile range (IQR): 50.57655

Descriptive statistics

Standard deviation: 334.1075854
Coefficient of variation (CV): 7.223416391
Kurtosis: 24.92051768
Mean: 46.25340244
Median Absolute Deviation (MAD): 30.22935
Skewness: 5.025307199
Sum: 136877046.3
Variance: 111627.8786
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count     Frequency (%)
-1                     1307162   44.2%
-59                    8653      0.3%
-69                    6757      0.2%
-65                    5897      0.2%
-64                    5787      0.2%
-56                    5576      0.2%
-58                    5567      0.2%
-67                    5430      0.2%
-61                    5195      0.2%
-63                    5189      0.2%
Other values (81061)   1598073   54.0%

Minimum 5 values

Value        Count   Frequency (%)
-3188.1157   48      < 0.1%
-667.0139    13      < 0.1%
-636.4958    13      < 0.1%
-461.9189    13      < 0.1%
-437.8804    13      < 0.1%

Maximum 5 values

Value       Count   Frequency (%)
2065.1025   5       < 0.1%
2055.7131   11      < 0.1%
2045.4749   5       < 0.1%
2045.1738   6       < 0.1%
2042.4321   15      < 0.1%

provider
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Indicates whether the coordinates were found using the network/Wi-Fi or using GPS.

Missing: 2959286
Missing (%): 100.0%
Memory size: 45.2 MiB

speed
Real number (ℝ)

HIGH CORRELATION  ZEROS 

The speed of the device, measured in meters/second over ground

Distinct: 1647
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2948011549
Minimum: -1
Maximum: 53
Zeros: 974502
Zeros (%): 32.9%
Negative: 1306496
Negative (%): 44.1%
Memory size: 45.2 MiB

Quantile statistics

Minimum: -1
5-th percentile: -0.009999999776
Q1: -0.009999999776
Median: 0
Q3: 0
95-th percentile: 1.639999986
Maximum: 53
Range: 54
Interquartile range (IQR): 0.009999999776

Descriptive statistics

Standard deviation: 1.137970045
Coefficient of variation (CV): 3.860127498
Kurtosis: 151.3292039
Mean: 0.2948011549
Median Absolute Deviation (MAD): 0.009999999776
Skewness: 9.215967963
Sum: 872400.9306
Variance: 1.294975822
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count     Frequency (%)
-0.009999999776       1253866   42.4%
0                     974502    32.9%
-1                    52630     1.8%
0.5199999809          10193     0.3%
0.5799999833          8996      0.3%
0.4499999881          6822      0.2%
0.4199999869          6407      0.2%
0.4799999893          6279      0.2%
0.3899999857          6189      0.2%
0.2899999917          6062      0.2%
Other values (1637)   627340    21.2%

Minimum 5 values

Value             Count     Frequency (%)
-1                52630     1.8%
-0.009999999776   1253866   42.4%
0                 974502    32.9%
0.009999999776    2826      0.1%
0.01999999955     874       < 0.1%

Maximum 5 values

Value   Count   Frequency (%)
53      15      < 0.1%
50      12      < 0.1%
49      8       < 0.1%
45      5       < 0.1%
44      5       < 0.1%
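
Values such as -0.009999999776 and 0.5199999809 are what two-decimal numbers (-0.01, 0.52) look like after being stored as 32-bit floats and printed with ten significant digits. That the speed column was stored in single precision is an inference from these digits, not something the report states. The sketch below reproduces the effect with the standard library; as_float32 is an illustrative helper.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest 32-bit float and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for original in (0.01, 0.52):
    stored = as_float32(original)
    # Ten significant digits, matching the precision shown in the report.
    print(original, "->", f"{stored:.10g}")
```

In practice this means the most common speed value, -0.009999999776, is best read as -0.01.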

Correlations

            accuracy   altitude   bearing   latitude   longitude   speed    userid
accuracy    1.000      -0.198     0.093     -0.132     -0.170      0.092    0.255
altitude    -0.198     1.000      -0.589    -0.318     -0.287      -0.582   0.220
bearing     0.093      -0.589     1.000     0.257      0.197       0.979    -0.165
latitude    -0.132     -0.318     0.257     1.000      0.880       0.257    -0.490
longitude   -0.170     -0.287     0.197     0.880      1.000       0.203    -0.474
speed       0.092      -0.582     0.979     0.257      0.203       1.000    -0.165
userid      0.255      0.220      -0.165    -0.490     -0.474      -0.165   1.000
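
The High correlation alerts can be cross-checked against this matrix by listing every pair of variables whose absolute coefficient reaches a cutoff. The sketch below assumes a cutoff of 0.5; that value is an assumption chosen because it reproduces the alerts listed earlier, not a setting read from this report's configuration.

```python
# Correlation matrix copied from the Correlations section.
names = ["accuracy", "altitude", "bearing", "latitude", "longitude", "speed", "userid"]
matrix = [
    [1.000, -0.198, 0.093, -0.132, -0.170, 0.092, 0.255],
    [-0.198, 1.000, -0.589, -0.318, -0.287, -0.582, 0.220],
    [0.093, -0.589, 1.000, 0.257, 0.197, 0.979, -0.165],
    [-0.132, -0.318, 0.257, 1.000, 0.880, 0.257, -0.490],
    [-0.170, -0.287, 0.197, 0.880, 1.000, 0.203, -0.474],
    [0.092, -0.582, 0.979, 0.257, 0.203, 1.000, -0.165],
    [0.255, 0.220, -0.165, -0.490, -0.474, -0.165, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, see the note above

# Walk the upper triangle so each pair is reported once.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(matrix[i][j]) >= THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```

With this cutoff the pairs are altitude/bearing, altitude/speed, bearing/speed, and latitude/longitude. The bearing/speed coefficient of 0.979 is likely inflated by the rows where both columns hold placeholder or zero values.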