Overview
Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 53599 |
Missing cells | 21780 |
Missing cells (%) | 13.5% |
Duplicate rows | 1160 |
Duplicate rows (%) | 2.2% |
Total size in memory | 3.7 MiB |
Average record size in memory | 71.4 B |
Variable types
Text | 1 |
---|---|
Numeric | 2 |
Alerts
Dataset has 1160 (2.2%) duplicate rows | Duplicates |
no_of_hours_used_during_last_week has 21780 (40.6%) missing values | Missing |
no_of_hours_used_during_last_week has 7837 (14.6%) zeros | Zeros |
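The percentages in the overview and in the alerts use different denominators: the 13.5% missing-cell rate is taken over every cell in the table, while the 40.6% and 14.6% figures are taken over the rows of a single column. A minimal sketch that recomputes them from the counts reported above (the variable and function names are illustrative, not part of the report):

```python
# Counts copied from the "Dataset statistics" table and the alert rows.
n_rows = 53599
n_columns = 3
missing_cells = 21780   # all in no_of_hours_used_during_last_week
zero_cells = 7837
duplicate_rows = 1160


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Dataset-level rate: missing cells over every cell in the table.
missing_cells_pct = pct(missing_cells, n_rows * n_columns)
# Column-level and row-level rates: the denominator is the number of rows.
missing_column_pct = pct(missing_cells, n_rows)
zeros_pct = pct(zero_cells, n_rows)
duplicates_pct = pct(duplicate_rows, n_rows)

print(missing_cells_pct, missing_column_pct, zeros_pct, duplicates_pct)
```

Because the other two columns have no missing values, the single affected column accounts for the whole dataset-level figure.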
Reproduction
Analysis started | 2024-11-18 08:37:37.994301 |
---|---|
Analysis finished | 2024-11-18 08:37:39.021516 |
Duration | 1.03 seconds |
Software version | ydata-profiling v4.11.0 |
Download configuration | config.json |
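A report of this kind can be regenerated along the following lines. This is only a sketch under stated assumptions: the input file name, the output file name and the report title are hypothetical, and the source data is assumed to be a CSV file whose `household_ID` column serves as the index (as the Sample section suggests). `ProfileReport` and its `to_file` method come from the ydata-profiling package named above.

```python
def build_profile(csv_path: str, html_path: str) -> None:
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported inside the function so the module still loads where
    # pandas or ydata-profiling is not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: household_ID is the index, as in the Sample section.
    df = pd.read_csv(csv_path, index_col="household_ID")
    profile = ProfileReport(df, title="Appliance usage profile")  # hypothetical title
    profile.to_file(html_path)


# Example call with hypothetical paths:
# build_profile("appliances.csv", "appliances_profile.html")
```

The start and finish timestamps will differ on every run, so only the statistics, not the Reproduction table, should be expected to match.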
Variables
appliance_ID
Text
Distinct | 210 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.8 MiB |
Value | Count | Frequency (%) |
o_45_1 | 3575 | 6.7% |
o_26_1 | 3482 | 6.5% |
o_1_1 | 3433 | 6.4% |
o_31_1 | 3230 | 6.0% |
o_12_1 | 3059 | 5.7% |
o_8_1 | 2965 | 5.5% |
o_45_2 | 2523 | 4.7% |
o_24_1 | 2351 | 4.4% |
o_33_1 | 1677 | 3.1% |
o_13_1 | 1586 | 3.0% |
Other values (200) | 25718 | 48.0% |
Most occurring characters
Value | Count | Frequency (%) |
_ | 107198 | 34.3% |
1 | 61383 | 19.6% |
O | 53599 | 17.1% |
4 | 22678 | 7.3% |
2 | 17531 | 5.6% |
3 | 15588 | 5.0% |
5 | 13717 | 4.4% |
6 | 8615 | 2.8% |
8 | 4936 | 1.6% |
7 | 3911 | 1.3% |
Other values (2) | 3453 | 1.1% |
Most occurring categories
Value | Count | Frequency (%) |
(unknown) | 312609 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
(unknown) | 312609 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
(unknown) | 312609 | 100.0% |
appliance_type
Real number (ℝ)
Distinct | 77 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 31.122913 |
Minimum | 1 |
---|---|
Maximum | 77 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.8 MiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 13 |
median | 31 |
Q3 | 45 |
95-th percentile | 66 |
Maximum | 77 |
Range | 76 |
Interquartile range (IQR) | 32 |
Descriptive statistics
Standard deviation | 18.593103 |
---|---|
Coefficient of variation (CV) | 0.59740882 |
Kurtosis | -0.7891563 |
Mean | 31.122913 |
Median Absolute Deviation (MAD) | 14 |
Skewness | 0.11427738 |
Sum | 1668157 |
Variance | 345.70346 |
Monotonicity | Not monotonic |
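Several of the descriptive statistics follow directly from others in the same tables, which makes them easy to sanity-check. A short sketch using the reported values for appliance_type (variable names are illustrative):

```python
# Values copied from the appliance_type tables in this report.
mean = 31.122913
std = 18.593103
q1, q3 = 13, 45
minimum, maximum = 1, 77
n_rows = 53599   # no missing values in this column

cv = std / mean                  # coefficient of variation
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1
total = mean * n_rows            # matches the reported Sum up to rounding

print(f"CV={cv:.6f} variance={variance:.3f} range={value_range} IQR={iqr} sum={total:.0f}")
```

Note that appliance_type takes integer values from 1 to 77 that mirror the middle part of appliance_ID (for example `O_45_1` has type 45), so it looks like a category code; the mean and spread are then descriptive of the code distribution rather than of a physical quantity.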
Value | Count | Frequency (%) |
45 | 7976 | 14.9% |
1 | 3545 | 6.6% |
26 | 3535 | 6.6% |
31 | 3402 | 6.3% |
12 | 3223 | 6.0% |
8 | 2989 | 5.6% |
24 | 2395 | 4.5% |
43 | 1802 | 3.4% |
47 | 1733 | 3.2% |
33 | 1705 | 3.2% |
Other values (67) | 21294 | 39.7% |
Value | Count | Frequency (%) |
1 | 3545 | 6.6% |
2 | 38 | 0.1% |
3 | 10 | < 0.1% |
4 | 528 | 1.0% |
5 | 902 | 1.7% |
6 | 273 | 0.5% |
7 | 184 | 0.3% |
8 | 2989 | 5.6% |
9 | 516 | 1.0% |
10 | 443 | 0.8% |
Value | Count | Frequency (%) |
77 | 53 | 0.1% |
76 | 313 | 0.6% |
75 | 66 | 0.1% |
74 | 8 | < 0.1% |
73 | 12 | < 0.1% |
72 | 378 | 0.7% |
71 | 26 | < 0.1% |
70 | 23 | < 0.1% |
69 | 518 | 1.0% |
68 | 234 | 0.4% |
no_of_hours_used_during_last_week
Real number (ℝ)
Alerts: Missing, Zeros
Distinct | 374 |
---|---|
Distinct (%) | 1.2% |
Missing | 21780 |
Missing (%) | 40.6% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 22.21993 |
Minimum | 0 |
---|---|
Maximum | 168 |
Zeros | 7837 |
Zeros (%) | 14.6% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.8 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0.12 |
median | 1.25 |
Q3 | 7 |
95-th percentile | 168 |
Maximum | 168 |
Range | 168 |
Interquartile range (IQR) | 6.88 |
Descriptive statistics
Standard deviation | 49.93416 |
---|---|
Coefficient of variation (CV) | 2.247269 |
Kurtosis | 4.0407563 |
Mean | 22.21993 |
Median Absolute Deviation (MAD) | 1.25 |
Skewness | 2.3841297 |
Sum | 707015.96 |
Variance | 2493.4203 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 7837 | 14.6% |
1 | 2997 | 5.6% |
168 | 2944 | 5.5% |
0.25 | 1647 | 3.1% |
2 | 1432 | 2.7% |
0.5 | 1421 | 2.7% |
7 | 1366 | 2.5% |
3.5 | 1098 | 2.0% |
14 | 999 | 1.9% |
3 | 896 | 1.7% |
Other values (364) | 9182 | 17.1% |
(Missing) | 21780 | 40.6% |
Value | Count | Frequency (%) |
0 | 7837 | 14.6% |
0.02 | 1 | < 0.1% |
0.025 | 2 | < 0.1% |
0.05 | 10 | < 0.1% |
0.083 | 1 | < 0.1% |
0.1 | 78 | 0.1% |
0.11 | 1 | < 0.1% |
0.117 | 3 | < 0.1% |
0.12 | 30 | 0.1% |
0.125 | 9 | < 0.1% |
Value | Count | Frequency (%) |
168 | 2944 | 5.5% |
167 | 3 | < 0.1% |
166.075 | 1 | < 0.1% |
166 | 3 | < 0.1% |
165 | 7 | < 0.1% |
164 | 5 | < 0.1% |
163 | 1 | < 0.1% |
161 | 1 | < 0.1% |
160 | 24 | < 0.1% |
159 | 1 | < 0.1% |
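The figures for no_of_hours_used_during_last_week are easier to read once two things are made explicit: the mean and the distinct percentage appear to be computed over the non-missing rows only, and the maximum of 168 is exactly the number of hours in a week. The sketch below checks both from the reported values (variable names are illustrative):

```python
# Values copied from the no_of_hours_used_during_last_week tables.
n_rows = 53599
n_missing = 21780
n_distinct = 374
total_hours = 707015.96    # reported Sum
n_always_on = 2944         # rows reporting 168 hours
hours_per_week = 24 * 7    # 168, the reported Maximum

n_valid = n_rows - n_missing
mean = total_hours / n_valid
distinct_pct = 100 * n_distinct / n_valid
# Share of all reported hours contributed by the rows at the 168-hour maximum.
always_on_share = n_always_on * hours_per_week / total_hours

print(n_valid, round(mean, 5), round(distinct_pct, 1), round(always_on_share, 2))
```

Roughly 70% of the summed hours come from the rows at the maximum, which helps explain why the mean (22.2) sits far above the median (1.25) and why the skewness is strongly positive.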
Interactions
Correlations
Variable | appliance_type | no_of_hours_used_during_last_week |
---|---|---|
appliance_type | 1.000 | 0.039 |
no_of_hours_used_during_last_week | 0.039 | 1.000 |
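The report does not name the coefficient behind the 0.039 value. As an illustration of how a correlation can be computed when one column has missing entries, the sketch below applies a Pearson correlation to pairwise-complete observations, using only the first ten sample rows shown in this report. Both the choice of Pearson and the helper name `pearson` are assumptions made for the example, and ten rows are not expected to reproduce the full-dataset value.

```python
import math

# (appliance_type, hours) pairs copied from the first ten sample rows.
sample = [
    (1, 84.0), (12, 3.0), (26, 1.0), (31, 0.0), (45, 5.0),
    (45, 5.0), (47, math.nan), (47, math.nan), (1, 168.0), (12, 0.0),
]


def pearson(pairs):
    """Pearson correlation over the pairs in which neither value is NaN.

    Returns the coefficient and the number of pairs actually used.
    """
    complete = [(x, y) for x, y in pairs
                if not (math.isnan(x) or math.isnan(y))]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    var_x = sum((x - mean_x) ** 2 for x, _ in complete)
    var_y = sum((y - mean_y) ** 2 for _, y in complete)
    return cov / math.sqrt(var_x * var_y), n


r, n_used = pearson(sample)
print(f"r = {r:.3f} from {n_used} complete pairs")
```

Since appliance_type looks like a category code rather than a measured quantity, a near-zero coefficient between the two columns says little about whether usage differs by appliance type.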
Missing values
Sample
household_ID | appliance_ID | appliance_type | no_of_hours_used_during_last_week |
---|---|---|---|
ID0001 | O_1_1 | 1 | 84.0 |
ID0001 | O_12_1 | 12 | 3.0 |
ID0001 | O_26_1 | 26 | 1.0 |
ID0001 | O_31_1 | 31 | 0.0 |
ID0001 | O_45_1 | 45 | 5.0 |
ID0001 | O_45_2 | 45 | 5.0 |
ID0001 | O_47_1 | 47 | NaN |
ID0001 | O_47_2 | 47 | NaN |
ID0002 | O_1_1 | 1 | 168.0 |
ID0002 | O_12_1 | 12 | 0.0 |
household_ID | appliance_ID | appliance_type | no_of_hours_used_during_last_week |
---|---|---|---|
ID4063 | O_33_1 | 33 | NaN |
ID4063 | O_43_1 | 43 | 2.0 |
ID4063 | O_44_1 | 44 | NaN |
ID4063 | O_45_1 | 45 | 3.0 |
ID4063 | O_45_2 | 45 | 2.0 |
ID4063 | O_45_3 | 45 | NaN |
ID4063 | O_45_4 | 45 | NaN |
ID4063 | O_45_5 | 45 | NaN |
ID4063 | O_47_1 | 47 | NaN |
ID4063 | O_47_2 | 47 | NaN |
Duplicate rows
Most frequently occurring
Index | appliance_ID | appliance_type | no_of_hours_used_during_last_week | # duplicates |
---|---|---|---|---|
308 | O_1_1 | 1 | 168.0 | 2755 |
843 | O_45_1 | 45 | NaN | 2418 |
859 | O_45_2 | 45 | NaN | 2363 |
628 | O_33_1 | 33 | NaN | 1573 |
886 | O_47_1 | 47 | NaN | 1289 |
863 | O_45_3 | 45 | NaN | 1224 |
443 | O_26_1 | 26 | 1.0 | 1163 |
205 | O_15_1 | 15 | 0.0 | 1104 |
625 | O_32_1 | 32 | NaN | 1051 |
637 | O_34_1 | 34 | NaN | 1033 |
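The ten groups listed above already cover 15973 rows, far more than the 1160 duplicate rows stated in the overview, so the overview figure more plausibly counts distinct row patterns that occur more than once than individual repeated rows. The sketch below shows the two ways of counting on a handful of made-up rows; the rows and the helper `normalise` are illustrative and not taken from the dataset.

```python
import math
from collections import Counter

# Illustrative rows (appliance_ID, appliance_type, hours); not real data.
rows = [
    ("O_1_1", 1, 168.0),
    ("O_1_1", 1, 168.0),
    ("O_1_1", 1, 168.0),
    ("O_45_1", 45, math.nan),
    ("O_45_1", 45, float("nan")),
    ("O_26_1", 26, 1.0),
]


def normalise(row):
    """Replace NaN with None so rows with missing hours compare equal."""
    return tuple(
        None if isinstance(v, float) and math.isnan(v) else v for v in row
    )


counts = Counter(normalise(row) for row in rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

n_patterns = len(duplicated)                # distinct rows seen more than once
n_rows_involved = sum(duplicated.values())  # every row belonging to those groups

print(n_patterns, n_rows_involved)
```

The normalisation step matters because NaN does not compare equal to itself in Python, so rows with missing hours would otherwise never be grouped together. Note also that household_ID is the index here, so identical rows from different households are counted as duplicates.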