Overview

Dataset statistics

Number of variables: 10
Number of observations: 8347758
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 851.6 MiB
Average record size in memory: 107.0 B
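The average record size follows from the two figures above it. A minimal sketch of that arithmetic, assuming MiB means 1024 × 1024 bytes (the variable names are illustrative, not part of the report):

```python
MIB = 1024 ** 2  # bytes per mebibyte (assumed unit definition)

total_size_mib = 851.6      # "Total size in memory" from the report
n_observations = 8_347_758  # "Number of observations" from the report

# Average bytes per row = total bytes / number of rows
avg_record_bytes = total_size_mib * MIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 107.0 B shown in the report.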

Variable types

Text: 2
Numeric: 7
DateTime: 1

Dataset

Description: [unitless] Returns the Points of Interest (POIs) surrounding the geocoordinates where the phone is located. POIs are extracted every 5 minutes. To allow comparison with each sensor observation, the frequency was reduced to one minute. The first non-missing name is reported for each of the categorical variables.
Creator: Andrea Bontempelli, Matteo Busso, Roy Alia Asiku
Author: Andrea Bontempelli, Matteo Busso, Fausto Giunchiglia
URL:
Copyright: (c) University of Trento - Knowledge Diversity 2023

Variable descriptions

experimentid: Experiment Id
userid: User id
timestamp: Timestamp showing month (2 digits), day (2), hour (2), minute (2), second (2), and decimals (3)
accuracy: The GPS accuracy in meters.
bearing: The compass direction from the current position to the intended destination. Bearing is measured in degrees and calculated clockwise from true north (e.g., the bearing for the direction of east is 090°).
latitude: Geographic coordinate that specifies the N/S position. Latitude is an angle which ranges from 0° at the Equator to 90° at the poles. Values are given in decimal degrees.
longitude: Geographic coordinate that specifies the E/W position. Longitude is an angle which ranges from 0° at the Prime Meridian to 180°. Values are given in decimal degrees.
altitude: Elevation above sea level in meters.
provider: Indicates whether the coordinates were found using the network/Wi-Fi or using GPS.
speed: The speed of the device over ground, measured in meters/second.

Alerts

experimentid has constant value "wenetDenmark" (Constant)
accuracy is highly overall correlated with bearing (High correlation)
altitude is highly overall correlated with bearing and 1 other field (High correlation)
bearing is highly overall correlated with accuracy and 2 other fields (High correlation)
latitude is highly overall correlated with userid (High correlation)
speed is highly overall correlated with altitude and 1 other field (High correlation)
userid is highly overall correlated with latitude (High correlation)
latitude is highly skewed (γ1 = -20.43444326) (Skewed)
userid has 355760 (4.3%) zeros (Zeros)
bearing has 3165485 (37.9%) zeros (Zeros)
speed has 3043352 (36.5%) zeros (Zeros)

Reproduction

Analysis started: 2024-11-24 10:26:55.908496
Analysis finished: 2024-11-24 10:28:14.514396
Duration: 1 minute and 18.61 seconds
Software version: ydata-profiling v4.8.3
Download configuration: config.json

Variables

experimentid
Text

CONSTANT 

Experiment Id

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 222.9 MiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 100173096
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: wenetDenmark
2nd row: wenetDenmark
3rd row: wenetDenmark
4th row: wenetDenmark
5th row: wenetDenmark

Value         Count    Frequency (%)
wenetdenmark  8347758  100.0%

Most occurring characters

Value  Count     Frequency (%)
e      25043274  25.0%
n      16695516  16.7%
w      8347758   8.3%
t      8347758   8.3%
D      8347758   8.3%
m      8347758   8.3%
a      8347758   8.3%
r      8347758   8.3%
k      8347758   8.3%
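Because experimentid holds a single constant value, the character counts above can be reproduced from that value and the row count alone. A small sketch, assuming every row contains exactly "wenetDenmark" (the variable names are illustrative):

```python
from collections import Counter

value = "wenetDenmark"  # the single distinct value of experimentid
n_rows = 8_347_758      # number of observations in the report

per_value = Counter(value)  # character counts within one value
totals = {ch: count * n_rows for ch, count in per_value.items()}
total_chars = len(value) * n_rows

# Print characters from most to least frequent, with their share of all characters
for ch, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(ch, count, f"{100 * count / total_chars:.1f}%")
```

The totals agree with the report: 12 characters per row give 100173096 characters overall, with "e" occurring three times per value and "n" twice.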

Most occurring categories

Value      Count      Frequency (%)
(unknown)  100173096  100.0%

Most occurring scripts

Value      Count      Frequency (%)
(unknown)  100173096  100.0%

Most occurring blocks

Value      Count      Frequency (%)
(unknown)  100173096  100.0%

userid
Real number (ℝ)

HIGH CORRELATION  ZEROS 

User id

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.25540367
Minimum: 0
Maximum: 27
Zeros: 355760
Zeros (%): 4.3%
Negative: 0
Negative (%): 0.0%
Memory size: 127.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
median: 6
Q3: 23
95-th percentile: 23
Maximum: 27
Range: 27
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 9.820145598
Coefficient of variation (CV): 0.8724827549
Kurtosis: -1.795675052
Mean: 11.25540367
Median Absolute Deviation (MAD): 4
Skewness: 0.2430803855
Sum: 93957386
Variance: 96.43525957
Monotonicity: Increasing

Histogram with fixed size bins (bins=17)

Common values

Value             Count    Frequency (%)
2                 2879409  34.5%
23                2574351  30.8%
17                888516   10.6%
3                 763339   9.1%
6                 540901   6.5%
0                 355760   4.3%
26                122914   1.5%
25                87108    1.0%
27                56672    0.7%
20                30343    0.4%
Other values (7)  48445    0.6%

Minimum 5 values

Value  Count    Frequency (%)
0      355760   4.3%
2      2879409  34.5%
3      763339   9.1%
6      540901   6.5%
8      3550     < 0.1%

Maximum 5 values

Value  Count    Frequency (%)
27     56672    0.7%
26     122914   1.5%
25     87108    1.0%
23     2574351  30.8%
22     403      < 0.1%
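The descriptive statistics for userid are linked by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short sketch that recomputes them from the reported figures (inputs are copied from the report, so results agree only up to rounding):

```python
n_observations = 8_347_758  # rows in the dataset
total = 93_957_386          # "Sum" reported for userid
std = 9.820145598           # "Standard deviation" reported for userid

mean = total / n_observations  # arithmetic mean
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance as squared standard deviation

print(f"mean={mean:.8f} cv={cv:.10f} variance={variance:.8f}")
```

Since userid is an identifier rather than a measurement, these figures describe how rows are distributed across users, not a physical quantity.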

timestamp
Date

Timestamp showing month (2 digits), day (2), hour (2), minute (2), second (2), and decimals (3)

Distinct: 8335509
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 127.4 MiB
Minimum: 2020-11-16 07:00:01.596000
Maximum: 2020-12-11 21:59:59.410000
Histogram with fixed size bins (bins=50)
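The minimum and maximum timestamps give the length of the collection window, and the distinct count gives the share of unique timestamps. A minimal sketch using only the values reported above (the mean spacing is pooled over all users, which is an assumption about how to read it, not a per-device sampling rate):

```python
from datetime import datetime

first = datetime.fromisoformat("2020-11-16 07:00:01.596000")  # reported minimum
last = datetime.fromisoformat("2020-12-11 21:59:59.410000")   # reported maximum
n_observations = 8_347_758
n_distinct = 8_335_509

span = last - first  # length of the collection window
distinct_pct = 100 * n_distinct / n_observations
mean_spacing_s = span.total_seconds() / n_observations  # pooled over all users

print(span)
print(f"{distinct_pct:.1f}% distinct timestamps")
print(f"{mean_spacing_s:.3f} s between observations on average")
```

The window covers 25 days and just under 15 hours, and the distinct share matches the 99.9% in the report.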

accuracy
Real number (ℝ)

HIGH CORRELATION 

The GPS accuracy in meters

Distinct: 58219
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.12208393
Minimum: 0
Maximum: 19491.63086
Zeros: 65
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 127.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.790092468
Q1: 11.79199982
median: 16.07999992
Q3: 23.59000015
95-th percentile: 500
Maximum: 19491.63086
Range: 19491.63086
Interquartile range (IQR): 11.79800034

Descriptive statistics

Standard deviation: 160.84333
Coefficient of variation (CV): 2.865954339
Kurtosis: 242.8493051
Mean: 56.12208393
Median Absolute Deviation (MAD): 5.420000076
Skewness: 8.021189504
Sum: 468493575.1
Variance: 25870.5768
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
10.72000027           507904   6.1%
9.648000717           397815   4.8%
11.79199982           383980   4.6%
500                   291305   3.5%
12.86400032           277304   3.3%
13.93600082           203121   2.4%
15.00800037           148263   1.8%
3.21600008            133966   1.6%
4.288000107           96780    1.2%
3                     87054    1.0%
Other values (58209)  5820266  69.7%

Minimum 5 values

Value  Count  Frequency (%)
0      65     < 0.1%
0.75   47     < 0.1%
1      17275  0.2%
1.5    24593  0.3%
2      66553  0.8%

Maximum 5 values

Value        Count  Frequency (%)
19491.63086  1      < 0.1%
15895.68066  1      < 0.1%
14766.28711  1      < 0.1%
13052.87207  1      < 0.1%
11670.86719  1      < 0.1%
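The gap between the median (about 16 m) and the 95-th percentile (500 m) suggests a long right tail. One common way to quantify this is Tukey's 1.5 × IQR rule. The rule is not something the report itself applies, so the sketch below is only an illustration on the reported quartiles:

```python
q1 = 11.79199982  # reported first quartile of accuracy (meters)
q3 = 23.59000015  # reported third quartile of accuracy (meters)
p95 = 500.0       # reported 95-th percentile (meters)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # values below this would be flagged
upper_fence = q3 + 1.5 * iqr  # values above this would be flagged

print(f"IQR={iqr:.3f} m, fences=({lower_fence:.3f}, {upper_fence:.3f}) m")
print("95-th percentile beyond upper fence:", p95 > upper_fence)
```

Under this rule the upper fence is roughly 41 m, so more than 5% of the accuracy readings, including the frequent value 500, would count as outliers.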

bearing
Real number (ℝ)

HIGH CORRELATION  ZEROS 

The compass direction from the current position to the intended destination. Bearing is measured in degrees and calculated clockwise from true north (e.g., the bearing for the direction of east is 090°).

Distinct: 22006
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.71276922
Minimum: -1
Maximum: 359.98
Zeros: 3165485
Zeros (%): 37.9%
Negative: 3868586
Negative (%): 46.3%
Memory size: 127.4 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: -1
median: 0
Q3: 0
95-th percentile: 247.5
Maximum: 359.98
Range: 360.98
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 78.21396772
Coefficient of variation (CV): 2.724013386
Kurtosis: 6.085894308
Mean: 28.71276922
Median Absolute Deviation (MAD): 1
Skewness: 2.693662012
Sum: 239687249
Variance: 6117.424746
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
-1                    3868586  46.3%
0                     3165485  37.9%
57.69                 1704     < 0.1%
215.4                 1182     < 0.1%
181.8                 1053     < 0.1%
234.4                 1047     < 0.1%
78.7                  1013     < 0.1%
256                   1007     < 0.1%
171.5                 989      < 0.1%
82.1                  981      < 0.1%
Other values (21996)  1304711  15.6%

Minimum 5 values

Value  Count    Frequency (%)
-1     3868586  46.3%
0      3165485  37.9%
0.01   22       < 0.1%
0.06   5        < 0.1%
0.07   29       < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
359.98  1      < 0.1%
359.95  9      < 0.1%
359.94  5      < 0.1%
359.93  2      < 0.1%
359.92  1      < 0.1%

latitude
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Geographic coordinate that specifies the N/S position. Latitude is an angle which ranges from 0° at the Equator to 90° at the poles. Values are given in decimal degrees.

Distinct: 2400
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.67934835
Minimum: 49.1523
Maximum: 57.073
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 127.4 MiB

Quantile statistics

Minimum: 49.1523
5-th percentile: 55.6498
Q1: 55.6738
median: 55.6836
Q3: 55.7049
95-th percentile: 55.7068
Maximum: 57.073
Range: 7.9207
Interquartile range (IQR): 0.0311

Descriptive statistics

Standard deviation: 0.2782766924
Coefficient of variation (CV): 0.004997843916
Kurtosis: 439.5341958
Mean: 55.67934835
Median Absolute Deviation (MAD): 0.0176
Skewness: -20.43444326
Sum: 464797725.6
Variance: 0.07743791753
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
55.705               1107618  13.3%
55.7049              1030127  12.3%
55.6739              909351   10.9%
55.6738              669453   8.0%
55.6977              345131   4.1%
55.6836              338150   4.1%
55.6835              236385   2.8%
55.6976              235314   2.8%
55.6894              196035   2.3%
55.7068              172727   2.1%
Other values (2390)  3107467  37.2%

Minimum 5 values

Value    Count  Frequency (%)
49.1523  2      < 0.1%
49.1585  3      < 0.1%
49.1586  9      < 0.1%
49.1591  6      < 0.1%
49.1592  6      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
57.073   1      < 0.1%
57.0728  4      < 0.1%
57.0726  4      < 0.1%
57.0724  14     < 0.1%
57.0722  12     < 0.1%

longitude
Real number (ℝ)

Geographic coordinate that specifies the E/W position. Longitude is an angle which ranges from 0° at the Prime Meridian to 180°. Values are given in decimal degrees.

Distinct: 4380
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.49972573
Minimum: 9.218
Maximum: 12.6574
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 127.4 MiB

Quantile statistics

Minimum: 9.218
5-th percentile: 12.3587
Q1: 12.5168
median: 12.5567
Q3: 12.5804
95-th percentile: 12.5922
Maximum: 12.6574
Range: 3.4394
Interquartile range (IQR): 0.0636

Descriptive statistics

Standard deviation: 0.3270433062
Coefficient of variation (CV): 0.02616403856
Kurtosis: 53.67855847
Mean: 12.49972573
Median Absolute Deviation (MAD): 0.0319
Skewness: -7.306708426
Sum: 104344685.5
Variance: 0.1069573241
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
12.5568              626481   7.5%
12.5922              570335   6.8%
12.5168              478753   5.7%
12.5921              425770   5.1%
12.5569              413810   5.0%
12.557               389087   4.7%
12.592               369964   4.4%
12.5566              257748   3.1%
12.5472              256489   3.1%
12.5567              216578   2.6%
Other values (4370)  4342743  52.0%

Minimum 5 values

Value   Count  Frequency (%)
9.218   2      < 0.1%
9.2237  12     < 0.1%
9.2239  6      < 0.1%
9.2243  6      < 0.1%
9.2254  6      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
12.6574  91     < 0.1%
12.6573  277    < 0.1%
12.6572  104    < 0.1%
12.6571  73     < 0.1%
12.657   144    < 0.1%
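The interquartile ranges of latitude (0.0311°) and longitude (0.0636°) are easier to interpret as distances. The sketch below converts them to kilometres under a spherical-Earth assumption with a mean radius of 6371 km. The helper names are hypothetical, and the result is an approximation rather than a geodesic computation:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, assumed)

def lat_span_km(delta_deg: float) -> float:
    """North-south distance covered by a latitude difference in degrees."""
    return math.radians(delta_deg) * EARTH_RADIUS_KM

def lon_span_km(delta_deg: float, at_lat_deg: float) -> float:
    """East-west distance covered by a longitude difference at a given latitude."""
    return math.radians(delta_deg) * EARTH_RADIUS_KM * math.cos(math.radians(at_lat_deg))

lat_iqr_km = lat_span_km(0.0311)           # latitude IQR from the report
lon_iqr_km = lon_span_km(0.0636, 55.6836)  # longitude IQR at the median latitude

print(f"latitude IQR  ~ {lat_iqr_km:.2f} km")
print(f"longitude IQR ~ {lon_iqr_km:.2f} km")
```

Both spans come out at a few kilometres, which indicates that the middle half of the observations lies within a compact area.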

altitude
Real number (ℝ)

HIGH CORRELATION 

Elevation above sea level in meters.

Distinct: 80078
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.96541956
Minimum: -775.0223
Maximum: 1592
Zeros: 3313
Zeros (%): < 0.1%
Negative: 3918952
Negative (%): 46.9%
Memory size: 127.4 MiB

Quantile statistics

Minimum: -775.0223
5-th percentile: -1
Q1: -1
median: 34
Q3: 54
95-th percentile: 72
Maximum: 1592
Range: 2367.0223
Interquartile range (IQR): 55

Descriptive statistics

Standard deviation: 33.52588441
Coefficient of variation (CV): 1.157445151
Kurtosis: 30.21041423
Mean: 28.96541956
Median Absolute Deviation (MAD): 35
Skewness: 2.058292561
Sum: 241796312.9
Variance: 1123.984926
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
-1                    3869594  46.4%
52                    215888   2.6%
53                    164625   2.0%
54                    162402   1.9%
51                    159207   1.9%
50                    144514   1.7%
55                    133617   1.6%
56                    121845   1.5%
57                    119441   1.4%
49                    112472   1.3%
Other values (80068)  3144153  37.7%

Minimum 5 values

Value      Count  Frequency (%)
-775.0223  6      < 0.1%
-651.6061  8      < 0.1%
-558.2589  1      < 0.1%
-490.9298  8      < 0.1%
-432.1291  1      < 0.1%

Maximum 5 values

Value      Count  Frequency (%)
1592       10     < 0.1%
1105.0639  8      < 0.1%
1027.3932  8      < 0.1%
1011.5348  8      < 0.1%
986.5196   7      < 0.1%

provider
Text

Indicates whether the coordinates were found using the network/Wi-Fi or using GPS.

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 182.8 MiB

Length

Max length: 7
Median length: 7
Mean length: 6.965081403
Min length: 3

Characters and Unicode

Total characters: 58142814
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: network
2nd row: passive
3rd row: passive
4th row: passive
5th row: network

Value    Count    Frequency (%)
passive  8109094  97.1%
network  165791   2.0%
gps      72873    0.9%
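The length statistics for provider follow directly from the value counts: the total character count is the sum of each value's length times its frequency, and the mean length is that total divided by the number of rows. A small sketch using the counts reported above (variable names are illustrative):

```python
# Value counts for provider, copied from the report
value_counts = {"passive": 8_109_094, "network": 165_791, "gps": 72_873}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows  # frequency-weighted mean string length

print(n_rows, total_chars, f"{mean_length:.9f}")
```

The recomputed totals match the report's 8347758 rows and 58142814 characters.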

Most occurring characters

Value             Count     Frequency (%)
s                 16291061  28.0%
e                 8274885   14.2%
p                 8181967   14.1%
a                 8109094   13.9%
i                 8109094   13.9%
v                 8109094   13.9%
n                 165791    0.3%
t                 165791    0.3%
w                 165791    0.3%
o                 165791    0.3%
Other values (3)  404455    0.7%

Most occurring categories

Value      Count     Frequency (%)
(unknown)  58142814  100.0%

Most occurring scripts

Value      Count     Frequency (%)
(unknown)  58142814  100.0%

Most occurring blocks

Value      Count     Frequency (%)
(unknown)  58142814  100.0%

speed
Real number (ℝ)

HIGH CORRELATION  ZEROS 

The speed of the device, measured in meters/second over ground

Distinct: 2535
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4649905785
Minimum: -1
Maximum: 81.5
Zeros: 3043352
Zeros (%): 36.5%
Negative: 3868586
Negative (%): 46.3%
Memory size: 127.4 MiB

Quantile statistics

Minimum: -1
5-th percentile: -0.009999999776
Q1: -0.009999999776
median: 0
Q3: 0
95-th percentile: 3.730000019
Maximum: 81.5
Range: 82.5
Interquartile range (IQR): 0.009999999776

Descriptive statistics

Standard deviation: 2.09217548
Coefficient of variation (CV): 4.499393272
Kurtosis: 189.388445
Mean: 0.4649905785
Median Absolute Deviation (MAD): 0.009999999776
Skewness: 11.24471986
Sum: 3881628.821
Variance: 4.377198241
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
-0.009999999776      3702795  44.4%
0                    3043352  36.5%
-1                   165791   2.0%
0.3899999857         11024    0.1%
0.2199999988         11006    0.1%
0.5199999809         10693    0.1%
0.4499999881         10588    0.1%
1.159999967          10417    0.1%
0.1899999976         10023    0.1%
0.2599999905         9980     0.1%
Other values (2525)  1362089  16.3%

Minimum 5 values

Value            Count    Frequency (%)
-1               165791   2.0%
-0.009999999776  3702795  44.4%
0                3043352  36.5%
0.009999999776   7747     0.1%
0.01999999955    5031     0.1%

Maximum 5 values

Value        Count  Frequency (%)
81.5         18     < 0.1%
59.04000092  4      < 0.1%
58.40999985  2      < 0.1%
57.83000183  4      < 0.1%
53.95999908  2      < 0.1%
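A physical speed cannot be negative, yet 46.3% of the speed values are below zero. The report does not say what these values mean. The counts do show that the two negative values (-1 and about -0.01) together account for every negative speed, and that their total equals the number of rows where bearing is -1. The sketch below checks this arithmetic and shows one possible cleaning step. It assumes that negative speeds are placeholders for "no reading", which is a guess, and both helpers are hypothetical:

```python
# Counts copied from the speed and bearing tables of the report
speed_minus_one = 165_791
speed_minus_point01 = 3_702_795
speed_negative = 3_868_586
bearing_minus_one = 3_868_586

# The two negative speed values account for all negative speeds...
negatives_explained = speed_minus_one + speed_minus_point01 == speed_negative
# ...and that total equals the number of rows with bearing == -1
matches_bearing = speed_negative == bearing_minus_one

def clean_speed(value):
    """Hypothetical helper: treat negative speeds as missing (assumption)."""
    return None if value < 0 else value

def to_kmh(meters_per_second):
    """Convert meters/second to kilometres/hour."""
    return meters_per_second * 3.6

print(negatives_explained, matches_bearing)
print([clean_speed(v) for v in (-1.0, -0.01, 0.0, 3.73)])
print(f"{to_kmh(81.5):.1f} km/h")  # the reported maximum speed
```

Whether masking is appropriate depends on how the collecting app encodes unavailable readings, so the assumption should be checked against the dataset documentation before use.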

Correlations

           accuracy  altitude  bearing  latitude  longitude  speed   userid
accuracy   1.000     -0.488    -0.503   -0.209    0.260      -0.494  0.328
altitude   -0.488    1.000     0.826    0.290     -0.275     0.815   -0.402
bearing    -0.503    0.826     1.000    0.164     -0.301     0.980   -0.272
latitude   -0.209    0.290     0.164    1.000     -0.116     0.144   -0.697
longitude  0.260     -0.275    -0.301   -0.116    1.000      -0.299  0.329
speed      -0.494    0.815     0.980    0.144     -0.299     1.000   -0.241
userid     0.328     -0.402    -0.272   -0.697    0.329      -0.241  1.000
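The "High correlation" alerts in the Overview can be related back to this matrix by listing the variable pairs whose absolute coefficient reaches a cut-off. The sketch below assumes a cut-off of 0.5. The report does not state that value, but it is consistent with the alerts listed:

```python
from itertools import combinations

names = ["accuracy", "altitude", "bearing", "latitude", "longitude", "speed", "userid"]
matrix = [
    [1.000, -0.488, -0.503, -0.209, 0.260, -0.494, 0.328],
    [-0.488, 1.000, 0.826, 0.290, -0.275, 0.815, -0.402],
    [-0.503, 0.826, 1.000, 0.164, -0.301, 0.980, -0.272],
    [-0.209, 0.290, 0.164, 1.000, -0.116, 0.144, -0.697],
    [0.260, -0.275, -0.301, -0.116, 1.000, -0.299, 0.329],
    [-0.494, 0.815, 0.980, 0.144, -0.299, 1.000, -0.241],
    [0.328, -0.402, -0.272, -0.697, 0.329, -0.241, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off for "high" correlation

# Visit each unordered pair once (upper triangle of the symmetric matrix)
high_pairs = [
    (a, b, matrix[i][j])
    for (i, a), (j, b) in combinations(enumerate(names), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```

With this cut-off the pairs are accuracy/bearing, altitude/bearing, altitude/speed, bearing/speed, and latitude/userid, which matches the alert list. Longitude has no coefficient at or above 0.5 and correspondingly no alert.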