Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 568824 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Total size in memory | 20.1 MiB |
Average record size in memory | 37.0 B |
Variable types
Text | 1 |
---|---|
Numeric | 1 |
DateTime | 1 |
Boolean | 1 |
Dataset
Variable descriptions
experimentid | Experiment Id |
---|---|
userid | User id |
timestamp | Timestamp fields with their digit widths: month (2), day (2), hour (2), minute (2), second (2), decimals (3) |
status | Whether the user is present or not |
Alerts
experimentid has constant value "wenetDenmark" | Constant |
status is highly imbalanced (50.3%) | Imbalance |
userid has 18114 (3.2%) zeros | Zeros |
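The alert percentages can be recomputed from counts. The sketch below recomputes the zeros share for userid from the figures in this report. It also shows one common way to define an imbalance score: one minus the Shannon entropy of the class counts, normalized by log2 of the number of classes. That definition is an assumption about how the 50.3% figure is derived; the report itself does not state it. The helper names `share_percent` and `imbalance_score` are hypothetical.

```python
import math


def share_percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def imbalance_score(counts):
    """Return 1 minus the normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced classes, 1.0 means a single class holds
    every row. Assumed definition, not taken from the report.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    # Classes with zero rows contribute nothing to the entropy.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts taken from the report: 18114 zeros out of 568824 observations.
zeros_pct = share_percent(18114, 568824)
print(f"userid zeros: {zeros_pct}%")
```

The report does not list the per-class counts of status, so the score for that column cannot be recomputed here; the function is only meant to show what a value near 0.5 would express under this definition.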
Reproduction
Analysis started | 2024-11-23 01:50:53.055309 |
---|---|
Analysis finished | 2024-11-23 01:50:54.957283 |
Duration | 1.9 seconds |
Software version | ydata-profiling v4.8.3 |
Download configuration | config.json |
experimentid
Text
Experiment Id
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.8 MiB |
Length
Max length | 12 |
---|---|
Median length | 12 |
Mean length | 12 |
Min length | 12 |
Characters and Unicode
Total characters | 6825888 |
---|---|
Distinct characters | 9 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | wenetDenmark |
---|---|
2nd row | wenetDenmark |
3rd row | wenetDenmark |
4th row | wenetDenmark |
5th row | wenetDenmark |
Value | Count | Frequency (%) |
wenetdenmark | 568824 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
e | 1706472 | 25.0% |
n | 1137648 | 16.7% |
w | 568824 | 8.3% |
t | 568824 | 8.3% |
D | 568824 | 8.3% |
m | 568824 | 8.3% |
a | 568824 | 8.3% |
r | 568824 | 8.3% |
k | 568824 | 8.3% |
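Because experimentid is constant, the character table follows directly from the single value and the row count. A minimal sketch, using only the value and the number of observations given in this report:

```python
from collections import Counter

value = "wenetDenmark"  # the constant experimentid value
n_rows = 568824         # number of observations in the report

per_value = Counter(value)           # characters in one occurrence of the value
total_chars = len(value) * n_rows    # 12 characters per row

# Each character count scales linearly with the number of rows.
for char, k in per_value.most_common():
    count = k * n_rows
    print(f"{char} | {count} | {100 * count / total_chars:.1f}% |")
```

Upper-case `D` is counted separately from lower-case letters, which is why the report lists 9 distinct characters.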
Most occurring categories
Value | Count | Frequency (%) |
(unknown) | 6825888 | 100.0% |
Most frequent character per category
(unknown)
Value | Count | Frequency (%) |
e | 1706472 | 25.0% |
n | 1137648 | 16.7% |
w | 568824 | 8.3% |
t | 568824 | 8.3% |
D | 568824 | 8.3% |
m | 568824 | 8.3% |
a | 568824 | 8.3% |
r | 568824 | 8.3% |
k | 568824 | 8.3% |
Most occurring scripts
Value | Count | Frequency (%) |
(unknown) | 6825888 | 100.0% |
Most frequent character per script
(unknown)
Value | Count | Frequency (%) |
e | 1706472 | 25.0% |
n | 1137648 | 16.7% |
w | 568824 | 8.3% |
t | 568824 | 8.3% |
D | 568824 | 8.3% |
m | 568824 | 8.3% |
a | 568824 | 8.3% |
r | 568824 | 8.3% |
k | 568824 | 8.3% |
Most occurring blocks
Value | Count | Frequency (%) |
(unknown) | 6825888 | 100.0% |
Most frequent character per block
(unknown)
Value | Count | Frequency (%) |
e | 1706472 | 25.0% |
n | 1137648 | 16.7% |
w | 568824 | 8.3% |
t | 568824 | 8.3% |
D | 568824 | 8.3% |
m | 568824 | 8.3% |
a | 568824 | 8.3% |
r | 568824 | 8.3% |
k | 568824 | 8.3% |
userid
Real number (ℝ)
User id
Distinct | 17 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 12.8791577 |
Minimum | 0 |
---|---|
Maximum | 27 |
Zeros | 18114 |
Zeros (%) | 3.2% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.3 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 2 |
Q1 | 6 |
median | 17 |
Q3 | 17 |
95-th percentile | 25 |
Maximum | 27 |
Range | 27 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 7.938786329 |
---|---|
Coefficient of variation (CV) | 0.6164057086 |
Kurtosis | -1.444711791 |
Mean | 12.8791577 |
Median Absolute Deviation (MAD) | 6 |
Skewness | -0.07130086789 |
Sum | 7325974 |
Variance | 63.02432838 |
Monotonicity | Increasing |
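Several of the quantile and descriptive statistics for userid are tied together by simple identities, which gives a quick consistency check. The sketch below recomputes the derived figures from the primary ones reported above; the inputs are copied from the tables and the variable names are illustrative only.

```python
# Primary figures copied from the report.
n = 568824              # number of observations
total = 7325974         # Sum
std = 7.938786329       # Standard deviation
q1, q3 = 6, 17          # Q1 and Q3
minimum, maximum = 0, 27

# Derived figures.
mean = total / n                   # Mean = Sum / number of observations
variance = std ** 2                # Variance = standard deviation squared
cv = std / mean                    # Coefficient of variation = std / mean
iqr = q3 - q1                      # Interquartile range
value_range = maximum - minimum    # Range

print(f"mean={mean:.7f} variance={variance:.5f} cv={cv:.7f}")
print(f"iqr={iqr} range={value_range}")
```

Small differences in the last digits are expected, since the report rounds its inputs.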
Histogram with fixed size bins (bins=17)
Value | Count | Frequency (%) |
17 | 199953 | 35.2% |
6 | 119188 | 21.0% |
3 | 92512 | 16.3% |
23 | 63674 | 11.2% |
21 | 19537 | 3.4% |
0 | 18114 | 3.2% |
26 | 12082 | 2.1% |
2 | 11142 | 2.0% |
25 | 9810 | 1.7% |
27 | 7174 | 1.3% |
Other values (7) | 15638 | 2.7% |
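The Frequency (%) column is each count divided by the number of observations, and the "Other values" row is whatever remains after the ten listed values. A short sketch using the counts from the table above:

```python
n_rows = 568824  # number of observations in the report

# Counts of the ten most common userid values, copied from the table.
common_values = {
    17: 199953, 6: 119188, 3: 92512, 23: 63674, 21: 19537,
    0: 18114, 26: 12082, 2: 11142, 25: 9810, 27: 7174,
}


def percent(count):
    """Format a count as a percentage of all rows with one decimal place."""
    return f"{100 * count / n_rows:.1f}"


# Rows not covered by the ten listed values.
other = n_rows - sum(common_values.values())

for value, count in common_values.items():
    print(f"{value} | {count} | {percent(count)}% |")
print(f"Other values | {other} | {percent(other)}% |")
```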
Value | Count | Frequency (%) |
0 | 18114 | 3.2% |
2 | 11142 | 2.0% |
3 | 92512 | 16.3% |
6 | 119188 | 21.0% |
8 | 937 | 0.2% |
Value | Count | Frequency (%) |
27 | 7174 | 1.3% |
26 | 12082 | 2.1% |
25 | 9810 | 1.7% |
23 | 63674 | 11.2% |
22 | 4553 | 0.8% |
timestamp
Date
Timestamp fields with their digit widths: month (2), day (2), hour (2), minute (2), second (2), decimals (3)
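The description lists field widths, which suggests the raw value may be a fixed-width digit string. The sketch below decodes such a string under that assumption: 13 digits in the order month, day, hour, minute, second, milliseconds, with the year supplied separately because the description does not mention one. Both the raw encoding and the helper name `decode_timestamp` are hypothetical, not confirmed by the report.

```python
from datetime import datetime


def decode_timestamp(raw: str, year: int) -> datetime:
    """Decode an assumed MMDDhhmmssfff digit string into a datetime.

    The year is not part of the assumed encoding and must be passed in.
    """
    if len(raw) != 13 or not raw.isdigit():
        raise ValueError(f"expected 13 digits, got {raw!r}")
    # Five two-digit fields, then three digits of fractional seconds.
    month, day, hour, minute, second = (int(raw[i:i + 2]) for i in range(0, 10, 2))
    millis = int(raw[10:13])
    return datetime(year, month, day, hour, minute, second, millis * 1000)


# Example built from the minimum timestamp reported for this variable.
print(decode_timestamp("1116070000484", year=2020))
```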
Distinct | 568732 |
---|---|
Distinct (%) | > 99.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.3 MiB |
Minimum | 2020-11-16 07:00:00.484000 |
---|---|
Maximum | 2020-12-11 21:58:45.492000 |
Histogram with fixed size bins (bins=50)
 | status | userid |
---|---|---|
status | 1.000 | 0.072 |
userid | 0.072 | 1.000 |