Overview

Dataset statistics

Number of variables: 3
Number of observations: 53599
Missing cells: 21780
Missing cells (%): 13.5%
Duplicate rows: 1160
Duplicate rows (%): 2.2%
Total size in memory: 3.7 MiB
Average record size in memory: 71.4 B
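
The two percentages can be recomputed from the raw counts. The sketch below assumes the missing-cell share is taken over all cells (observations times variables) and the duplicate share over observations; those denominators are an inference from the figures, not something the report states.

```python
# Counts copied from the dataset statistics above.
n_variables = 3
n_observations = 53599
missing_cells = 21780
duplicate_rows = 1160

# Assumption: missing cells are measured against all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = round(100 * missing_cells / total_cells, 1)

# Assumption: duplicate rows are measured against the number of observations.
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)

print(missing_pct, duplicate_pct)
```

Both rounded values agree with the percentages reported above.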

Variable types

Text: 1
Numeric: 2

Alerts

Duplicates: Dataset has 1160 (2.2%) duplicate rows
Missing: no_of_hours_used_during_last_week has 21780 (40.6%) missing values
Zeros: no_of_hours_used_during_last_week has 7837 (14.6%) zeros

Reproduction

Analysis started: 2024-11-18 08:37:37.994301
Analysis finished: 2024-11-18 08:37:39.021516
Duration: 1.03 seconds
Software version: ydata-profiling v4.11.0
Download configuration: config.json
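
As a quick consistency check, the duration can be derived from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Assumed timestamp layout, matching the two values printed above.
fmt = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-11-18 08:37:37.994301", fmt)
finished = datetime.strptime("2024-11-18 08:37:39.021516", fmt)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()

print(round(duration_seconds, 2))
```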

Variables

appliance_ID
Text

Distinct: 210
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 6
Median length: 6
Mean length: 5.8323663
Min length: 5

Characters and Unicode

Total characters: 312609
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 0.1%

Sample

1st row: O_1_1
2nd row: O_12_1
3rd row: O_26_1
4th row: O_31_1
5th row: O_45_1

Value               Count   Frequency (%)
o_45_1              3575    6.7%
o_26_1              3482    6.5%
o_1_1               3433    6.4%
o_31_1              3230    6.0%
o_12_1              3059    5.7%
o_8_1               2965    5.5%
o_45_2              2523    4.7%
o_24_1              2351    4.4%
o_33_1              1677    3.1%
o_13_1              1586    3.0%
Other values (200)  25718   48.0%

Most occurring characters

Value             Count    Frequency (%)
_                 107198   34.3%
1                 61383    19.6%
O                 53599    17.1%
4                 22678    7.3%
2                 17531    5.6%
3                 15588    5.0%
5                 13717    4.4%
6                 8615     2.8%
8                 4936     1.6%
7                 3911     1.3%
Other values (2)  3453     1.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  312609   100.0%

Most frequent character per category

(unknown): all 312609 characters fall in this single category, so the per-character counts are identical to those listed under Most occurring characters.

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  312609   100.0%

Most frequent character per script

(unknown): all 312609 characters fall in this single script, so the per-character counts are identical to those listed under Most occurring characters.

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  312609   100.0%

Most frequent character per block

(unknown): all 312609 characters fall in this single block, so the per-character counts are identical to those listed under Most occurring characters.

appliance_type
Real number (ℝ)

Distinct: 77
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.122913
Minimum: 1
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
Median: 31
Q3: 45
95-th percentile: 66
Maximum: 77
Range: 76
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 18.593103
Coefficient of variation (CV): 0.59740882
Kurtosis: -0.7891563
Mean: 31.122913
Median Absolute Deviation (MAD): 14
Skewness: 0.11427738
Sum: 1668157
Variance: 345.70346
Monotonicity: Not monotonic
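
Several of the appliance_type statistics above are derived from others. The sketch below recomputes them from the reported values, assuming the usual definitions (CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1); the report itself does not spell these formulas out.

```python
# Values copied from the appliance_type statistics above.
n_observations = 53599
total = 1668157
std = 18.593103
q1, q3 = 13, 45
minimum, maximum = 1, 77

# Assumed standard definitions of the derived statistics.
mean = total / n_observations   # no values are missing for this variable
cv = std / mean                 # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(round(mean, 6), round(cv, 4), round(variance, 2), iqr, value_range)
```

Small differences in the last decimals are expected because the inputs are already rounded.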
Histogram with fixed size bins (bins=50)

Most frequent values

Value              Count   Frequency (%)
45                 7976    14.9%
1                  3545    6.6%
26                 3535    6.6%
31                 3402    6.3%
12                 3223    6.0%
8                  2989    5.6%
24                 2395    4.5%
43                 1802    3.4%
47                 1733    3.2%
33                 1705    3.2%
Other values (67)  21294   39.7%

10 smallest values

Value  Count  Frequency (%)
1      3545   6.6%
2      38     0.1%
3      10     < 0.1%
4      528    1.0%
5      902    1.7%
6      273    0.5%
7      184    0.3%
8      2989   5.6%
9      516    1.0%
10     443    0.8%

10 largest values

Value  Count  Frequency (%)
77     53     0.1%
76     313    0.6%
75     66     0.1%
74     8      < 0.1%
73     12     < 0.1%
72     378    0.7%
71     26     < 0.1%
70     23     < 0.1%
69     518    1.0%
68     234    0.4%

no_of_hours_used_during_last_week
Real number (ℝ)

Alerts: Missing, Zeros

Distinct: 374
Distinct (%): 1.2%
Missing: 21780
Missing (%): 40.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.21993
Minimum: 0
Maximum: 168
Zeros: 7837
Zeros (%): 14.6%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.12
Median: 1.25
Q3: 7
95-th percentile: 168
Maximum: 168
Range: 168
Interquartile range (IQR): 6.88

Descriptive statistics

Standard deviation: 49.93416
Coefficient of variation (CV): 2.247269
Kurtosis: 4.0407563
Mean: 22.21993
Median Absolute Deviation (MAD): 1.25
Skewness: 2.3841297
Sum: 707015.96
Variance: 2493.4203
Monotonicity: Not monotonic
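
With 40.6% of this column missing, it matters which denominator each statistic uses. The sketch below suggests, as an inference from the reported numbers rather than a documented rule, that Distinct (%) and Mean are computed over the non-missing values only. It also notes that the maximum of 168 equals the number of hours in a week.

```python
# Values copied from the no_of_hours_used_during_last_week statistics above.
n_observations = 53599
n_missing = 21780
n_distinct = 374
total_hours = 707015.96

# Rows that actually carry a value for this column.
n_present = n_observations - n_missing

# Distinct share under the two candidate denominators.
distinct_pct_present = round(100 * n_distinct / n_present, 1)
distinct_pct_all = round(100 * n_distinct / n_observations, 1)

# Mean over non-missing values only.
mean_hours = total_hours / n_present

# Upper bound for weekly usage: 24 hours on each of 7 days.
hours_in_week = 24 * 7

print(n_present, distinct_pct_present, distinct_pct_all, round(mean_hours, 4), hours_in_week)
```

Only the non-missing denominator reproduces the reported 1.2% and the mean of 22.21993.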
Histogram with fixed size bins (bins=50)

Most frequent values

Value               Count   Frequency (%)
0                   7837    14.6%
1                   2997    5.6%
168                 2944    5.5%
0.25                1647    3.1%
2                   1432    2.7%
0.5                 1421    2.7%
7                   1366    2.5%
3.5                 1098    2.0%
14                  999     1.9%
3                   896     1.7%
Other values (364)  9182    17.1%
(Missing)           21780   40.6%

10 smallest values

Value  Count  Frequency (%)
0      7837   14.6%
0.02   1      < 0.1%
0.025  2      < 0.1%
0.05   10     < 0.1%
0.083  1      < 0.1%
0.1    78     0.1%
0.11   1      < 0.1%
0.117  3      < 0.1%
0.12   30     0.1%
0.125  9      < 0.1%

10 largest values

Value    Count  Frequency (%)
168      2944   5.5%
167      3      < 0.1%
166.075  1      < 0.1%
166      3      < 0.1%
165      7      < 0.1%
164      5      < 0.1%
163      1      < 0.1%
161      1      < 0.1%
160      24     < 0.1%
159      1      < 0.1%

Interactions

[Figures: four interaction plots for appliance_type and no_of_hours_used_during_last_week; images not preserved]

Correlations

                                   appliance_type  no_of_hours_used_during_last_week
appliance_type                     1.000           0.039
no_of_hours_used_during_last_week  0.039           1.000

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

household_ID  appliance_ID  appliance_type  no_of_hours_used_during_last_week
ID0001        O_1_1         1               84.0
ID0001        O_12_1        12              3.0
ID0001        O_26_1        26              1.0
ID0001        O_31_1        31              0.0
ID0001        O_45_1        45              5.0
ID0001        O_45_2        45              5.0
ID0001        O_47_1        47              NaN
ID0001        O_47_2        47              NaN
ID0002        O_1_1         1               168.0
ID0002        O_12_1        12              0.0

Last rows

household_ID  appliance_ID  appliance_type  no_of_hours_used_during_last_week
ID4063        O_33_1        33              NaN
ID4063        O_43_1        43              2.0
ID4063        O_44_1        44              NaN
ID4063        O_45_1        45              3.0
ID4063        O_45_2        45              2.0
ID4063        O_45_3        45              NaN
ID4063        O_45_4        45              NaN
ID4063        O_45_5        45              NaN
ID4063        O_47_1        47              NaN
ID4063        O_47_2        47              NaN
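
In every sample row, the middle token of appliance_ID equals appliance_type, and the last token looks like a per-household instance counter. The sketch below parses an ID under that assumed `O_<type>_<instance>` layout. The function name `split_appliance_id` is hypothetical, and the layout is inferred from the sample rows only, not confirmed for the full dataset.

```python
def split_appliance_id(appliance_id: str) -> tuple[int, int]:
    """Split an ID such as 'O_45_2' into (appliance_type, instance).

    Assumes the layout O_<type>_<instance> seen in the sample rows.
    """
    prefix, type_code, instance = appliance_id.split("_")
    if prefix != "O":
        raise ValueError(f"unexpected prefix in {appliance_id!r}")
    return int(type_code), int(instance)


# Sample IDs taken from the rows above.
for sample_id in ["O_1_1", "O_12_1", "O_45_2"]:
    print(sample_id, split_appliance_id(sample_id))
```

If the pattern holds throughout, appliance_type carries no information beyond what appliance_ID already encodes.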

Duplicate rows

Most frequently occurring

     appliance_ID  appliance_type  no_of_hours_used_during_last_week  # duplicates
308  O_1_1         1               168.0                              2755
843  O_45_1        45              NaN                                2418
859  O_45_2        45              NaN                                2363
628  O_33_1        33              NaN                                1573
886  O_47_1        47              NaN                                1289
863  O_45_3        45              NaN                                1224
443  O_26_1        26              1.0                                1163
205  O_15_1        15              0.0                                1104
625  O_32_1        32              NaN                                1051
637  O_34_1        34              NaN                                1033
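
A table like the one above can be produced by counting identical rows. The sketch below uses a handful of made-up rows (not the real dataset) and represents missing hours as `None`, an assumption that lets missing values compare equal to each other; it is not necessarily how the profiling tool counts duplicates.

```python
from collections import Counter

# Hypothetical toy rows: (appliance_ID, appliance_type, hours); None = missing.
rows = [
    ("O_1_1", 1, 168.0),
    ("O_45_1", 45, None),
    ("O_1_1", 1, 168.0),
    ("O_26_1", 26, 1.0),
    ("O_45_1", 45, None),
    ("O_1_1", 1, 168.0),
]

# Count how often each distinct row occurs.
counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = {row: n for row, n in counts.items() if n > 1}
most_common = sorted(duplicates.items(), key=lambda item: item[1], reverse=True)

for row, n in most_common:
    print(row, n)
```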