Overview


Dataset statistics

Number of variables: 16
Number of observations: 36,408
Missing cells: 94,148
Missing cells (%): 16.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.4 MiB
Average record size in memory: 128.0 B
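
The derived figures in this block can be checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is missing cells divided by (variables × observations) and that total memory is roughly observations × average record size; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics block.
n_variables = 16
n_observations = 36_408
missing_cells = 94_148
avg_record_bytes = 128.0

# Assumption: percentage of missing cells = missing / (rows * columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is approximately rows * average record size.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both values round to the figures reported above (16.2% and 4.4 MiB).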

Variable types

Text: 1
Categorical: 9
Numeric: 6

Alerts

have_extra_ventilation_openings is highly overall correlated with main_material_used_for_the_floor_of_the_room and 2 other fields High correlation
main_material_used_for_the_floor_of_the_room is highly overall correlated with have_extra_ventilation_openings High correlation
main_material_used_for_the_roof_of_the_room is highly overall correlated with have_extra_ventilation_openings High correlation
no_of_fans_in_the_room is highly overall correlated with no_of_windows High correlation
no_of_windows is highly overall correlated with no_of_fans_in_the_room High correlation
type_of_ceiling_of_the_room is highly overall correlated with have_extra_ventilation_openings High correlation
main_material_used_for_the_floor_of_the_room is highly imbalanced (51.5%) Imbalance
main_material_used_for_window_panes is highly imbalanced (61.2%) Imbalance
no_of_ACs_in_the_room is highly imbalanced (90.6%) Imbalance
storey_which_the_room_located has 3017 (8.3%) missing values Missing
main_material_used_for_the_roof_of_the_room has 3009 (8.3%) missing values Missing
type_of_ceiling_of_the_room has 3017 (8.3%) missing values Missing
main_material_used_for_the_floor_of_the_room has 3017 (8.3%) missing values Missing
no_of_doors_opened_to_external_environment has 3017 (8.3%) missing values Missing
no_of_windows has 3017 (8.3%) missing values Missing
main_material_used_for_window_panes has 22084 (60.7%) missing values Missing
have_curtains_or_blinds_for_windows has 15296 (42.0%) missing values Missing
have_extra_ventilation_openings has 3017 (8.3%) missing values Missing
no_of_bulbs_in_the_room has 3017 (8.3%) missing values Missing
no_of_bulbs_used_during_last_week has 26603 (73.1%) missing values Missing
no_of_fans_in_the_room has 3017 (8.3%) missing values Missing
no_of_ACs_in_the_room has 3017 (8.3%) missing values Missing
no_of_doors_opened_to_external_environment is highly skewed (γ1 = 30.98014855) Skewed
storey_which_the_room_located has 26381 (72.5%) zeros Zeros
no_of_doors_opened_to_external_environment has 19548 (53.7%) zeros Zeros
no_of_windows has 12279 (33.7%) zeros Zeros
no_of_bulbs_in_the_room has 2257 (6.2%) zeros Zeros
no_of_bulbs_used_during_last_week has 1291 (3.5%) zeros Zeros
no_of_fans_in_the_room has 21657 (59.5%) zeros Zeros
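
The Imbalance percentages in the alerts can be checked by hand. The sketch below assumes the score is one minus the Shannon entropy of the category counts divided by the maximum possible entropy, log2(k) for k categories. Under that assumption it reproduces the 61.2% reported for main_material_used_for_window_panes from that variable's value counts in this report. `imbalance_score` is a hypothetical helper name, not a library function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of category counts (assumed definition)."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; zero counts contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Non-missing counts for main_material_used_for_window_panes, taken from this report:
# Glass, Wood, Other, "None, it's open", Net, "Garden - Not relevant".
window_pane_counts = [10942, 2839, 203, 185, 142, 13]
score = imbalance_score(window_pane_counts)
print(f"{score:.1%}")
```

A score near 0 means the categories are evenly spread; a score near 1 means a single category dominates, as with no_of_ACs_in_the_room (90.6%).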

Reproduction

Analysis started: 2024-12-06 05:54:36.382353
Analysis finished: 2024-12-06 05:54:41.691799
Duration: 5.31 seconds
Software version: ydata-profiling v4.11.0
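
The reported duration follows directly from the two timestamps. This is a minimal sketch using only the Python standard library; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Assumed format of the timestamps as printed in the Reproduction block.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-12-06 05:54:36.382353", fmt)
finished = datetime.strptime("2024-12-06 05:54:41.691799", fmt)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```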

Variables

Distinct: 4063
Distinct (%): 11.2%
Missing: 0
Missing (%): 0.0%
Memory size: 284.6 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 218,448
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
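
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is only an approximation of how such a breakdown could be computed, not the report's own implementation; `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# "ID0001" is the sample value shown for this variable.
counts = category_counts(["ID0001"])
total = sum(counts.values())
shares = {cat: n / total for cat, n in counts.items()}
print(counts, shares)
```

Here `Lu` is Uppercase Letter and `Nd` is Decimal Number. A six-character identifier made of two letters and four digits gives the 33.3% / 66.7% split reported for this variable.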

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: ID0001
2nd row: ID0001
3rd row: ID0001
4th row: ID0001
5th row: ID0001
Value                 Count  Frequency (%)
id0698                   32  0.1%
id1165                   29  0.1%
id1142                   24  0.1%
id2632                   24  0.1%
id0255                   23  0.1%
id1621                   23  0.1%
id1132                   23  0.1%
id1676                   22  0.1%
id3864                   22  0.1%
id1792                   22  0.1%
Other values (4053)   36164  99.3%

Most occurring characters

ValueCountFrequency (%)
I 36408
16.7%
D 36408
16.7%
0 20584
9.4%
3 19966
9.1%
1 19852
9.1%
2 19503
8.9%
4 11487
 
5.3%
6 11138
 
5.1%
7 10945
 
5.0%
8 10826
 
5.0%
Other values (2) 21331
9.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 145632
66.7%
Uppercase Letter 72816
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 20584
14.1%
3 19966
13.7%
1 19852
13.6%
2 19503
13.4%
4 11487
7.9%
6 11138
7.6%
7 10945
7.5%
8 10826
7.4%
5 10712
7.4%
9 10619
7.3%
Uppercase Letter
ValueCountFrequency (%)
I 36408
50.0%
D 36408
50.0%

Most occurring scripts

ValueCountFrequency (%)
Common 145632
66.7%
Latin 72816
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 20584
14.1%
3 19966
13.7%
1 19852
13.6%
2 19503
13.4%
4 11487
7.9%
6 11138
7.6%
7 10945
7.5%
8 10826
7.4%
5 10712
7.4%
9 10619
7.3%
Latin
ValueCountFrequency (%)
I 36408
50.0%
D 36408
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 218448
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 36408
16.7%
D 36408
16.7%
0 20584
9.4%
3 19966
9.1%
1 19852
9.1%
2 19503
8.9%
4 11487
 
5.3%
6 11138
 
5.1%
7 10945
 
5.0%
8 10826
 
5.0%
Other values (2) 21331
9.8%

room_ID
Categorical

Distinct: 32
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 284.6 KiB
I1
4063 
I2
4060 
I3
4044 
I4
3967 
I5
3711 
Other values (27)
16563 

Length

Max length: 8
Median length: 8
Mean length: 7.986102
Min length: 7

Characters and Unicode

Total characters: 290,758
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: I1
2nd row: I2
3rd row: I3
4th row: I4
5th row: I5

Common Values

Value               Count  Frequency (%)
I1                   4063  11.2%
I2                   4060  11.2%
I3                   4044  11.1%
I4                   3967  10.9%
I5                   3711  10.2%
I6                   3243  8.9%
I7                   2724  7.5%
I8                   2488  6.8%
I9                   2382  6.5%
I10                  2343  6.4%
Other values (22)    3383  9.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
i1 4063
11.2%
i2 4060
11.2%
i3 4044
11.1%
i4 3967
10.9%
i5 3711
10.2%
i6 3243
8.9%
i7 2724
7.5%
i8 2488
6.8%
i9 2382
6.5%
i10 2343
6.4%
Other values (22) 3383
9.3%

Most occurring characters

ValueCountFrequency (%)
211201
72.6%
I 36408
 
12.5%
1 10389
 
3.6%
2 4735
 
1.6%
3 4492
 
1.5%
4 4307
 
1.5%
5 3970
 
1.4%
6 3432
 
1.2%
7 2868
 
1.0%
8 2593
 
0.9%
Other values (5) 6363
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Space Separator 211201
72.6%
Decimal Number 41631
 
14.3%
Uppercase Letter 36914
 
12.7%
Lowercase Letter 1012
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 10389
25.0%
2 4735
11.4%
3 4492
10.8%
4 4307
10.3%
5 3970
 
9.5%
6 3432
 
8.2%
7 2868
 
6.9%
8 2593
 
6.2%
9 2455
 
5.9%
0 2390
 
5.7%
Uppercase Letter
ValueCountFrequency (%)
I 36408
98.6%
O 506
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
t 506
50.0%
h 506
50.0%
Space Separator
ValueCountFrequency (%)
211201
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 252832
87.0%
Latin 37926
 
13.0%

Most frequent character per script

Common
ValueCountFrequency (%)
211201
83.5%
1 10389
 
4.1%
2 4735
 
1.9%
3 4492
 
1.8%
4 4307
 
1.7%
5 3970
 
1.6%
6 3432
 
1.4%
7 2868
 
1.1%
8 2593
 
1.0%
9 2455
 
1.0%
Latin
ValueCountFrequency (%)
I 36408
96.0%
O 506
 
1.3%
t 506
 
1.3%
h 506
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 290758
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
211201
72.6%
I 36408
 
12.5%
1 10389
 
3.6%
2 4735
 
1.6%
3 4492
 
1.5%
4 4307
 
1.5%
5 3970
 
1.4%
6 3432
 
1.2%
7 2868
 
1.0%
8 2593
 
0.9%
Other values (5) 6363
 
2.2%
Distinct: 18
Distinct (%): < 0.1%
Missing: 3
Missing (%): < 0.1%
Memory size: 284.6 KiB
Bedrooms
10380 
Bathroom and / or toilets
5767 
Kitchen and/ or pantry
5159 
Living room
4431 
Gaming room
3776 
Other values (13)
6892 

Length

Max length: 29
Median length: 25
Mean length: 13.412663
Min length: 5

Characters and Unicode

Total characters: 488,288
Distinct characters: 31
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Living room
2nd row: Bedrooms
3rd row: Bedrooms
4th row: Bedrooms
5th row: Bedrooms

Common Values

Value                       Count  Frequency (%)
Bedrooms                    10380  28.5%
Bathroom and / or toilets    5767  15.8%
Kitchen and/ or pantry       5159  14.2%
Living room                  4431  12.2%
Gaming room                  3776  10.4%
Veranda                      3405  9.4%
Passage                      1825  5.0%
Servant's Room                736  2.0%
Storage room                  609  1.7%
Other                          81  0.2%
Other values (8)              236  0.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 10926
12.9%
or 10926
12.9%
bedrooms 10380
12.3%
room 9616
11.4%
5767
6.8%
bathroom 5767
6.8%
toilets 5767
6.8%
kitchen 5159
6.1%
pantry 5159
6.1%
living 4431
5.2%
Other values (18) 10773
12.7%

Most occurring characters

ValueCountFrequency (%)
o 68996
14.1%
48266
9.9%
r 46164
 
9.5%
a 37657
 
7.7%
n 33682
 
6.9%
m 29563
 
6.1%
t 29224
 
6.0%
e 28198
 
5.8%
d 24806
 
5.1%
i 23739
 
4.9%
Other values (21) 117993
24.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 391155
80.1%
Space Separator 48266
 
9.9%
Uppercase Letter 37152
 
7.6%
Other Punctuation 11715
 
2.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 68996
17.6%
r 46164
11.8%
a 37657
9.6%
n 33682
8.6%
m 29563
7.6%
t 29224
7.5%
e 28198
7.2%
d 24806
 
6.3%
i 23739
 
6.1%
s 20745
 
5.3%
Other values (9) 48381
12.4%
Uppercase Letter
ValueCountFrequency (%)
B 16175
43.5%
K 5159
 
13.9%
L 4431
 
11.9%
G 3853
 
10.4%
V 3449
 
9.3%
P 1825
 
4.9%
S 1408
 
3.8%
R 738
 
2.0%
O 114
 
0.3%
Other Punctuation
ValueCountFrequency (%)
/ 10935
93.3%
' 780
 
6.7%
Space Separator
ValueCountFrequency (%)
48266
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 428307
87.7%
Common 59981
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 68996
16.1%
r 46164
10.8%
a 37657
8.8%
n 33682
 
7.9%
m 29563
 
6.9%
t 29224
 
6.8%
e 28198
 
6.6%
d 24806
 
5.8%
i 23739
 
5.5%
s 20745
 
4.8%
Other values (18) 85533
20.0%
Common
ValueCountFrequency (%)
48266
80.5%
/ 10935
 
18.2%
' 780
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 488288
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 68996
14.1%
48266
9.9%
r 46164
 
9.5%
a 37657
 
7.7%
n 33682
 
6.9%
m 29563
 
6.1%
t 29224
 
6.0%
e 28198
 
5.8%
d 24806
 
5.1%
i 23739
 
4.9%
Other values (21) 117993
24.2%

storey_which_the_room_located
Real number (ℝ)

Missing  Zeros 

Distinct: 14
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.270522
Minimum: 0
Maximum: 18
Zeros: 26381
Zeros (%): 72.5%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 18
Range: 18
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.71070568
Coefficient of variation (CV): 2.6271641
Kurtosis: 75.218635
Mean: 0.270522
Median Absolute Deviation (MAD): 0
Skewness: 6.6014703
Sum: 9033
Variance: 0.50510257
Monotonicity: Not monotonic
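
Several of these descriptive statistics are tied together by simple identities. The sketch below recomputes the mean, variance and coefficient of variation for storey_which_the_room_located from the other reported figures. It assumes the statistics are computed over non-missing rows only and that CV is the standard deviation divided by the mean; the variable names are illustrative.

```python
# Figures copied from the storey_which_the_room_located statistics.
n_observations = 36_408
n_missing = 3_017
total_sum = 9_033
std_dev = 0.71070568

# Assumption: statistics are computed over non-missing rows only.
n_valid = n_observations - n_missing

mean = total_sum / n_valid   # report shows 0.270522
variance = std_dev ** 2      # report shows 0.50510257
cv = std_dev / mean          # report shows 2.6271641

print(n_valid, round(mean, 6), round(variance, 6), round(cv, 4))
```

The recomputed values agree with the report to within rounding of the printed inputs.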
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
0 26381
72.5%
1 5985
 
16.4%
2 702
 
1.9%
3 152
 
0.4%
8 33
 
0.1%
4 30
 
0.1%
7 29
 
0.1%
5 24
 
0.1%
6 20
 
0.1%
10 12
 
< 0.1%
Other values (4) 23
 
0.1%
(Missing) 3017
 
8.3%
ValueCountFrequency (%)
0 26381
72.5%
1 5985
 
16.4%
2 702
 
1.9%
3 152
 
0.4%
4 30
 
0.1%
5 24
 
0.1%
6 20
 
0.1%
7 29
 
0.1%
8 33
 
0.1%
9 11
 
< 0.1%
ValueCountFrequency (%)
18 1
 
< 0.1%
12 3
 
< 0.1%
11 8
 
< 0.1%
10 12
 
< 0.1%
9 11
 
< 0.1%
8 33
0.1%
7 29
0.1%
6 20
0.1%
5 24
0.1%
4 30
0.1%

main_material_used_for_the_roof_of_the_room
Categorical

High correlation  Missing 

Distinct: 10
Distinct (%): < 0.1%
Missing: 3009
Missing (%): 8.3%
Memory size: 284.6 KiB
Asbestos
14980 
Concrete
11566 
Tile
4329 
Garden - Not relevant
1739 
Takaran
 
287
Other values (5)
 
498

Length

Max length: 21
Median length: 8
Mean length: 8.1620108
Min length: 4

Characters and Unicode

Total characters: 272,603
Distinct characters: 32
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Asbestos
2nd row: Asbestos
3rd row: Asbestos
4th row: Asbestos
5th row: Asbestos

Common Values

Value                    Count  Frequency (%)
Asbestos                 14980  41.1%
Concrete                 11566  31.8%
Tile                      4329  11.9%
Garden - Not relevant     1739  4.8%
Takaran                    287  0.8%
Metal Sheet                226  0.6%
Other                      211  0.6%
Plastic sheets              59  0.2%
Tent                         1  < 0.1%
Cadjun/Palmyra/Straw         1  < 0.1%
(Missing)                 3009  8.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
asbestos 14980
38.5%
concrete 11566
29.7%
tile 4329
 
11.1%
garden 1739
 
4.5%
1739
 
4.5%
not 1739
 
4.5%
relevant 1739
 
4.5%
takaran 287
 
0.7%
metal 226
 
0.6%
sheet 226
 
0.6%
Other values (5) 331
 
0.9%

Most occurring characters

ValueCountFrequency (%)
e 48666
17.9%
s 45117
16.6%
t 30807
11.3%
o 28285
10.4%
r 15544
 
5.7%
n 15333
 
5.6%
b 14980
 
5.5%
A 14980
 
5.5%
c 11625
 
4.3%
C 11567
 
4.2%
Other values (22) 35699
13.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 229994
84.4%
Uppercase Letter 35366
 
13.0%
Space Separator 5502
 
2.0%
Dash Punctuation 1739
 
0.6%
Other Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 48666
21.2%
s 45117
19.6%
t 30807
13.4%
o 28285
12.3%
r 15544
 
6.8%
n 15333
 
6.7%
b 14980
 
6.5%
c 11625
 
5.1%
l 6354
 
2.8%
a 4628
 
2.0%
Other values (10) 8655
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
A 14980
42.4%
C 11567
32.7%
T 4617
 
13.1%
G 1739
 
4.9%
N 1739
 
4.9%
S 227
 
0.6%
M 226
 
0.6%
O 211
 
0.6%
P 60
 
0.2%
Space Separator
ValueCountFrequency (%)
5502
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1739
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 265360
97.3%
Common 7243
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 48666
18.3%
s 45117
17.0%
t 30807
11.6%
o 28285
10.7%
r 15544
 
5.9%
n 15333
 
5.8%
b 14980
 
5.6%
A 14980
 
5.6%
c 11625
 
4.4%
C 11567
 
4.4%
Other values (19) 28456
10.7%
Common
ValueCountFrequency (%)
5502
76.0%
- 1739
 
24.0%
/ 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272603
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 48666
17.9%
s 45117
16.6%
t 30807
11.3%
o 28285
10.4%
r 15544
 
5.7%
n 15333
 
5.6%
b 14980
 
5.5%
A 14980
 
5.5%
c 11625
 
4.3%
C 11567
 
4.2%
Other values (22) 35699
13.1%

type_of_ceiling_of_the_room
Categorical

High correlation  Missing 

Distinct: 9
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Memory size: 284.6 KiB
No ceiling, the concrete slab
10521 
No ceiling, just the roof above
9847 
A conventional ceiling
5960 
Wooden ceiling
1847 
Garden - Not relevant
1843 
Other values (4)
3373 

Length

Max length: 31
Median length: 30
Mean length: 25.608487
Min length: 5

Characters and Unicode

Total characters: 855,093
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A beamed ceiling
2nd row: A beamed ceiling
3rd row: A beamed ceiling
4th row: A beamed ceiling
5th row: A beamed ceiling

Common Values

Value                             Count  Frequency (%)
No ceiling, the concrete slab     10521  28.9%
No ceiling, just the roof above    9847  27.0%
A conventional ceiling             5960  16.4%
Wooden ceiling                     1847  5.1%
Garden - Not relevant              1843  5.1%
A hanging ceiling                  1320  3.6%
A beamed ceiling                   1281  3.5%
Other                               682  1.9%
A polythene cover as a ceiling       90  0.2%
(Missing)                          3017  8.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
ceiling 30866
20.6%
no 20368
13.6%
the 20368
13.6%
concrete 10521
 
7.0%
slab 10521
 
7.0%
just 9847
 
6.6%
roof 9847
 
6.6%
above 9847
 
6.6%
a 8741
 
5.8%
conventional 5960
 
4.0%
Other values (11) 12772
8.5%

Most occurring characters

ValueCountFrequency (%)
116267
13.6%
e 98973
11.6%
o 78067
 
9.1%
i 69012
 
8.1%
n 67530
 
7.9%
c 57958
 
6.8%
t 51154
 
6.0%
l 49280
 
5.8%
g 33506
 
3.9%
a 32795
 
3.8%
Other values (19) 200551
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 681381
79.7%
Space Separator 116267
 
13.6%
Uppercase Letter 35234
 
4.1%
Other Punctuation 20368
 
2.4%
Dash Punctuation 1843
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 98973
14.5%
o 78067
11.5%
i 69012
10.1%
n 67530
9.9%
c 57958
8.5%
t 51154
7.5%
l 49280
7.2%
g 33506
 
4.9%
a 32795
 
4.8%
r 24826
 
3.6%
Other values (11) 118280
17.4%
Uppercase Letter
ValueCountFrequency (%)
N 22211
63.0%
A 8651
 
24.6%
W 1847
 
5.2%
G 1843
 
5.2%
O 682
 
1.9%
Space Separator
ValueCountFrequency (%)
116267
100.0%
Other Punctuation
ValueCountFrequency (%)
, 20368
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1843
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 716615
83.8%
Common 138478
 
16.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 98973
13.8%
o 78067
10.9%
i 69012
9.6%
n 67530
9.4%
c 57958
 
8.1%
t 51154
 
7.1%
l 49280
 
6.9%
g 33506
 
4.7%
a 32795
 
4.6%
r 24826
 
3.5%
Other values (16) 153514
21.4%
Common
ValueCountFrequency (%)
116267
84.0%
, 20368
 
14.7%
- 1843
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 855093
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
116267
13.6%
e 98973
11.6%
o 78067
 
9.1%
i 69012
 
8.1%
n 67530
 
7.9%
c 57958
 
6.8%
t 51154
 
6.0%
l 49280
 
5.8%
g 33506
 
3.9%
a 32795
 
3.8%
Other values (19) 200551
23.5%

main_material_used_for_the_floor_of_the_room
Categorical

High correlation  Imbalance  Missing 

Distinct: 11
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Memory size: 284.6 KiB
Tile
16272 
Cement
12961 
Garden - Not relevant
1806 
Concrete
 
1263
Teraso
 
398
Other values (6)
 
691

Length

Max length: 21
Median length: 15
Mean length: 5.8957803
Min length: 3

Characters and Unicode

Total characters: 196,866
Distinct characters: 28
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Tile
2nd row: Tile
3rd row: Tile
4th row: Cement
5th row: Cement

Common Values

Value                    Count  Frequency (%)
Tile                     16272  44.7%
Cement                   12961  35.6%
Garden - Not relevant     1806  5.0%
Concrete                  1263  3.5%
Teraso                     398  1.1%
Other                      306  0.8%
Wood                       137  0.4%
Granite                     73  0.2%
Sand                        72  0.2%
Mud                         69  0.2%
(Missing)                 3017  8.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tile 16272
41.9%
cement 12961
33.4%
garden 1806
 
4.6%
1806
 
4.6%
not 1806
 
4.6%
relevant 1806
 
4.6%
concrete 1263
 
3.3%
teraso 398
 
1.0%
other 306
 
0.8%
wood 171
 
0.4%
Other values (4) 248
 
0.6%

Most occurring characters

ValueCountFrequency (%)
e 50949
25.9%
t 18215
 
9.3%
l 18078
 
9.2%
n 18015
 
9.2%
T 16670
 
8.5%
i 16413
 
8.3%
C 14224
 
7.2%
m 12961
 
6.6%
r 5652
 
2.9%
5452
 
2.8%
Other values (18) 20237
 
10.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 154343
78.4%
Uppercase Letter 35197
 
17.9%
Space Separator 5452
 
2.8%
Dash Punctuation 1806
 
0.9%
Open Punctuation 34
 
< 0.1%
Close Punctuation 34
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 50949
33.0%
t 18215
 
11.8%
l 18078
 
11.7%
n 18015
 
11.7%
i 16413
 
10.6%
m 12961
 
8.4%
r 5652
 
3.7%
a 4155
 
2.7%
o 3809
 
2.5%
d 2152
 
1.4%
Other values (6) 3944
 
2.6%
Uppercase Letter
ValueCountFrequency (%)
T 16670
47.4%
C 14224
40.4%
G 1879
 
5.3%
N 1806
 
5.1%
O 306
 
0.9%
W 171
 
0.5%
S 72
 
0.2%
M 69
 
0.2%
Space Separator
ValueCountFrequency (%)
5452
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1806
100.0%
Open Punctuation
ValueCountFrequency (%)
( 34
100.0%
Close Punctuation
ValueCountFrequency (%)
) 34
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 189540
96.3%
Common 7326
 
3.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 50949
26.9%
t 18215
 
9.6%
l 18078
 
9.5%
n 18015
 
9.5%
T 16670
 
8.8%
i 16413
 
8.7%
C 14224
 
7.5%
m 12961
 
6.8%
r 5652
 
3.0%
a 4155
 
2.2%
Other values (14) 14208
 
7.5%
Common
ValueCountFrequency (%)
5452
74.4%
- 1806
 
24.7%
( 34
 
0.5%
) 34
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 196866
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 50949
25.9%
t 18215
 
9.3%
l 18078
 
9.2%
n 18015
 
9.2%
T 16670
 
8.5%
i 16413
 
8.3%
C 14224
 
7.2%
m 12961
 
6.6%
r 5652
 
2.9%
5452
 
2.8%
Other values (18) 20237
 
10.3%

no_of_doors_opened_to_external_environment
Real number (ℝ)

Missing  Skewed  Zeros 

Distinct: 18
Distinct (%): 0.1%
Missing: 3017
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.50076368
Minimum: 0
Maximum: 90
Zeros: 19548
Zeros (%): 53.7%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 1
Maximum: 90
Range: 90
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.93502175
Coefficient of variation (CV): 1.8671916
Kurtosis: 2638.9937
Mean: 0.50076368
Median Absolute Deviation (MAD): 0
Skewness: 30.980149
Sum: 16721
Variance: 0.87426566
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
ValueCountFrequency (%)
0 19548
53.7%
1 12298
33.8%
2 972
 
2.7%
3 280
 
0.8%
4 174
 
0.5%
5 33
 
0.1%
6 33
 
0.1%
8 20
 
0.1%
7 16
 
< 0.1%
12 4
 
< 0.1%
Other values (8) 13
 
< 0.1%
(Missing) 3017
 
8.3%
ValueCountFrequency (%)
0 19548
53.7%
1 12298
33.8%
2 972
 
2.7%
3 280
 
0.8%
4 174
 
0.5%
5 33
 
0.1%
6 33
 
0.1%
7 16
 
< 0.1%
8 20
 
0.1%
9 3
 
< 0.1%
ValueCountFrequency (%)
90 1
 
< 0.1%
41 1
 
< 0.1%
21 1
 
< 0.1%
15 1
 
< 0.1%
13 1
 
< 0.1%
12 4
 
< 0.1%
11 3
 
< 0.1%
10 2
 
< 0.1%
9 3
 
< 0.1%
8 20
0.1%

no_of_windows
Real number (ℝ)

High correlation  Missing  Zeros 

Distinct: 24
Distinct (%): 0.1%
Missing: 3017
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5957294
Minimum: 0
Maximum: 52
Zeros: 12279
Zeros (%): 33.7%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 52
Range: 52
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.9270753
Coefficient of variation (CV): 1.2076455
Kurtosis: 23.017829
Mean: 1.5957294
Median Absolute Deviation (MAD): 1
Skewness: 2.7094613
Sum: 53283
Variance: 3.7136194
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
0 12279
33.7%
1 6797
18.7%
2 6039
16.6%
3 4770
 
13.1%
4 1498
 
4.1%
6 643
 
1.8%
5 466
 
1.3%
8 282
 
0.8%
7 279
 
0.8%
9 137
 
0.4%
Other values (14) 201
 
0.6%
(Missing) 3017
 
8.3%
ValueCountFrequency (%)
0 12279
33.7%
1 6797
18.7%
2 6039
16.6%
3 4770
 
13.1%
4 1498
 
4.1%
5 466
 
1.3%
6 643
 
1.8%
7 279
 
0.8%
8 282
 
0.8%
9 137
 
0.4%
ValueCountFrequency (%)
52 1
 
< 0.1%
28 1
 
< 0.1%
22 2
 
< 0.1%
21 1
 
< 0.1%
20 4
< 0.1%
18 2
 
< 0.1%
17 4
< 0.1%
16 8
< 0.1%
15 6
< 0.1%
14 9
< 0.1%

main_material_used_for_window_panes
Categorical

Imbalance  Missing 

Distinct: 6
Distinct (%): < 0.1%
Missing: 22084
Missing (%): 60.7%
Memory size: 284.6 KiB
Glass
10942 
Wood
2839 
Other
 
203
None, it's open
 
185
Net
 
142

Length

Max length: 21
Median length: 5
Mean length: 4.9256493
Min length: 3

Characters and Unicode

Total characters: 70,555
Distinct characters: 21
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Glass
2nd row: Glass
3rd row: Glass
4th row: Glass
5th row: Glass

Common Values

Value                    Count  Frequency (%)
Glass                    10942  30.1%
Wood                      2839  7.8%
Other                      203  0.6%
None, it's open            185  0.5%
Net                        142  0.4%
Garden - Not relevant       13  < 0.1%
(Missing)                22084  60.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
glass 10942
74.3%
wood 2839
 
19.3%
other 203
 
1.4%
none 185
 
1.3%
it's 185
 
1.3%
open 185
 
1.3%
net 142
 
1.0%
garden 13
 
0.1%
13
 
0.1%
not 13
 
0.1%

Most occurring characters

ValueCountFrequency (%)
s 22069
31.3%
a 10968
15.5%
G 10955
15.5%
l 10955
15.5%
o 6061
 
8.6%
d 2852
 
4.0%
W 2839
 
4.0%
e 754
 
1.1%
t 556
 
0.8%
409
 
0.6%
Other values (11) 2137
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 55426
78.6%
Uppercase Letter 14337
 
20.3%
Space Separator 409
 
0.6%
Other Punctuation 370
 
0.5%
Dash Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 22069
39.8%
a 10968
19.8%
l 10955
19.8%
o 6061
 
10.9%
d 2852
 
5.1%
e 754
 
1.4%
t 556
 
1.0%
n 396
 
0.7%
r 229
 
0.4%
h 203
 
0.4%
Other values (3) 383
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
G 10955
76.4%
W 2839
 
19.8%
N 340
 
2.4%
O 203
 
1.4%
Other Punctuation
ValueCountFrequency (%)
, 185
50.0%
' 185
50.0%
Space Separator
ValueCountFrequency (%)
409
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 69763
98.9%
Common 792
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 22069
31.6%
a 10968
15.7%
G 10955
15.7%
l 10955
15.7%
o 6061
 
8.7%
d 2852
 
4.1%
W 2839
 
4.1%
e 754
 
1.1%
t 556
 
0.8%
n 396
 
0.6%
Other values (7) 1358
 
1.9%
Common
ValueCountFrequency (%)
409
51.6%
, 185
23.4%
' 185
23.4%
- 13
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 70555
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 22069
31.3%
a 10968
15.5%
G 10955
15.5%
l 10955
15.5%
o 6061
 
8.6%
d 2852
 
4.0%
W 2839
 
4.0%
e 754
 
1.1%
t 556
 
0.8%
409
 
0.6%
Other values (11) 2137
 
3.0%

have_curtains_or_blinds_for_windows
Categorical

Missing

Distinct: 3
Distinct (%): < 0.1%
Missing: 15296
Missing (%): 42.0%
Memory size: 284.6 KiB
Yes
13235 
No
7839 
Garden - Not relevant
 
38

Length

Max length: 21
Median length: 3
Mean length: 2.6610932
Min length: 2

Characters and Unicode

Total characters: 56,181
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes
2nd row: Yes
3rd row: Yes
4th row: Yes
5th row: Yes

Common Values

Value                    Count  Frequency (%)
Yes                      13235  36.4%
No                        7839  21.5%
Garden - Not relevant       38  0.1%
(Missing)                15296  42.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
yes 13235
62.4%
no 7839
36.9%
garden 38
 
0.2%
38
 
0.2%
not 38
 
0.2%
relevant 38
 
0.2%

Most occurring characters

ValueCountFrequency (%)
e 13349
23.8%
Y 13235
23.6%
s 13235
23.6%
N 7877
14.0%
o 7877
14.0%
114
 
0.2%
a 76
 
0.1%
n 76
 
0.1%
r 76
 
0.1%
t 76
 
0.1%
Other values (5) 190
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 34879
62.1%
Uppercase Letter 21150
37.6%
Space Separator 114
 
0.2%
Dash Punctuation 38
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 13349
38.3%
s 13235
37.9%
o 7877
22.6%
a 76
 
0.2%
n 76
 
0.2%
r 76
 
0.2%
t 76
 
0.2%
d 38
 
0.1%
l 38
 
0.1%
v 38
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
Y 13235
62.6%
N 7877
37.2%
G 38
 
0.2%
Space Separator
ValueCountFrequency (%)
114
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 38
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 56029
99.7%
Common 152
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 13349
23.8%
Y 13235
23.6%
s 13235
23.6%
N 7877
14.1%
o 7877
14.1%
a 76
 
0.1%
n 76
 
0.1%
r 76
 
0.1%
t 76
 
0.1%
G 38
 
0.1%
Other values (3) 114
 
0.2%
Common
ValueCountFrequency (%)
114
75.0%
- 38
 
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 56181
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 13349
23.8%
Y 13235
23.6%
s 13235
23.6%
N 7877
14.0%
o 7877
14.0%
114
 
0.2%
a 76
 
0.1%
n 76
 
0.1%
r 76
 
0.1%
t 76
 
0.1%
Other values (5) 190
 
0.3%

have_extra_ventilation_openings
Categorical

High correlation  Missing 

Distinct: 3
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Memory size: 284.6 KiB
Yes
17714 
No
13370 
Garden - Not relevant
2307 

Length

Max length: 21
Median length: 3
Mean length: 3.8432212
Min length: 2

Characters and Unicode

Total characters: 128,329
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value                    Count  Frequency (%)
Yes                      17714  48.7%
No                       13370  36.7%
Garden - Not relevant     2307  6.3%
(Missing)                 3017  8.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
yes 17714
43.9%
no 13370
33.2%
garden 2307
 
5.7%
2307
 
5.7%
not 2307
 
5.7%
relevant 2307
 
5.7%

Most occurring characters

ValueCountFrequency (%)
e 24635
19.2%
Y 17714
13.8%
s 17714
13.8%
N 15677
12.2%
o 15677
12.2%
6921
 
5.4%
a 4614
 
3.6%
n 4614
 
3.6%
r 4614
 
3.6%
t 4614
 
3.6%
Other values (5) 11535
9.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 83403
65.0%
Uppercase Letter 35698
27.8%
Space Separator 6921
 
5.4%
Dash Punctuation 2307
 
1.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 24635
29.5%
s 17714
21.2%
o 15677
18.8%
a 4614
 
5.5%
n 4614
 
5.5%
r 4614
 
5.5%
t 4614
 
5.5%
d 2307
 
2.8%
l 2307
 
2.8%
v 2307
 
2.8%
Uppercase Letter
ValueCountFrequency (%)
Y 17714
49.6%
N 15677
43.9%
G 2307
 
6.5%
Space Separator
ValueCountFrequency (%)
6921
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2307
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 119101
92.8%
Common 9228
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 24635
20.7%
Y 17714
14.9%
s 17714
14.9%
N 15677
13.2%
o 15677
13.2%
a 4614
 
3.9%
n 4614
 
3.9%
r 4614
 
3.9%
t 4614
 
3.9%
G 2307
 
1.9%
Other values (3) 6921
 
5.8%
Common
ValueCountFrequency (%)
6921
75.0%
- 2307
 
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 128329
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 24635
19.2%
Y 17714
13.8%
s 17714
13.8%
N 15677
12.2%
o 15677
12.2%
6921
 
5.4%
a 4614
 
3.6%
n 4614
 
3.6%
r 4614
 
3.6%
t 4614
 
3.6%
Other values (5) 11535
9.0%

no_of_bulbs_in_the_room
Real number (ℝ)

Alerts: Missing, Zeros

Distinct: 21
Distinct (%): 0.1%
Missing: 3017
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7330418
Minimum: 0
Maximum: 20
Zeros: 2257
Zeros (%): 6.2%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 2
95-th percentile: 5
Maximum: 20
Range: 20
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.0331509
Coefficient of variation (CV): 1.173169
Kurtosis: 25.597735
Mean: 1.7330418
Median Absolute Deviation (MAD): 0
Skewness: 4.3728391
Sum: 57868
Variance: 4.1337028
Monotonicity: Not monotonic
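
Several of these figures are tied together by simple identities: the mean is the sum divided by the number of non-missing observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks this using only numbers quoted in the report; the variable names are illustrative.

```python
# Figures copied from the report for no_of_bulbs_in_the_room.
n_rows = 36408        # "Number of observations" in the overview
n_missing = 3017      # "Missing"
total = 57868         # "Sum"
std = 2.0331509       # "Standard deviation"

n_valid = n_rows - n_missing      # observations that actually carry a value
mean = total / n_valid            # should agree with the reported Mean
cv = std / mean                   # should agree with the reported CV
variance = std ** 2               # should agree with the reported Variance

print(f"valid={n_valid} mean={mean:.7f} cv={cv:.6f} variance={variance:.7f}")
```

Small differences in the last digit are expected because the inputs are already rounded.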
Histogram with fixed size bins (bins=21)

Value | Count | Frequency (%)
1 | 20781 | 57.1%
2 | 5581 | 15.3%
0 | 2257 | 6.2%
3 | 1558 | 4.3%
4 | 1157 | 3.2%
5 | 557 | 1.5%
6 | 452 | 1.2%
7 | 220 | 0.6%
8 | 197 | 0.5%
10 | 166 | 0.5%
Other values (11) | 465 | 1.3%
(Missing) | 3017 | 8.3%

Value | Count | Frequency (%)
0 | 2257 | 6.2%
1 | 20781 | 57.1%
2 | 5581 | 15.3%
3 | 1558 | 4.3%
4 | 1157 | 3.2%
5 | 557 | 1.5%
6 | 452 | 1.2%
7 | 220 | 0.6%
8 | 197 | 0.5%
9 | 98 | 0.3%

Value | Count | Frequency (%)
20 | 54 | 0.1%
19 | 9 | < 0.1%
18 | 23 | 0.1%
17 | 3 | < 0.1%
16 | 16 | < 0.1%
15 | 102 | 0.3%
14 | 32 | 0.1%
13 | 23 | 0.1%
12 | 64 | 0.2%
11 | 41 | 0.1%

no_of_bulbs_used_during_last_week
Real number (ℝ)

Alerts: Missing, Zeros

Distinct: 12
Distinct (%): 0.1%
Missing: 26603
Missing (%): 73.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0333503
Minimum: 0
Maximum: 11
Zeros: 1291
Zeros (%): 3.5%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 11
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.70639199
Coefficient of variation (CV): 0.6835939
Kurtosis: 22.445811
Mean: 1.0333503
Median Absolute Deviation (MAD): 0
Skewness: 3.1117452
Sum: 10132
Variance: 0.49898964
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
1 | 7458 | 20.5%
0 | 1291 | 3.5%
2 | 726 | 2.0%
3 | 200 | 0.5%
4 | 83 | 0.2%
6 | 19 | 0.1%
5 | 16 | < 0.1%
7 | 6 | < 0.1%
8 | 3 | < 0.1%
11 | 1 | < 0.1%
Other values (2) | 2 | < 0.1%
(Missing) | 26603 | 73.1%

Value | Count | Frequency (%)
0 | 1291 | 3.5%
1 | 7458 | 20.5%
2 | 726 | 2.0%
3 | 200 | 0.5%
4 | 83 | 0.2%
5 | 16 | < 0.1%
6 | 19 | 0.1%
7 | 6 | < 0.1%
8 | 3 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
11 | 1 | < 0.1%
10 | 1 | < 0.1%
9 | 1 | < 0.1%
8 | 3 | < 0.1%
7 | 6 | < 0.1%
6 | 19 | 0.1%
5 | 16 | < 0.1%
4 | 83 | 0.2%
3 | 200 | 0.5%
2 | 726 | 2.0%

no_of_fans_in_the_room
Real number (ℝ)

Alerts: High correlation, Missing, Zeros

Distinct: 11
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.40504926
Minimum: 0
Maximum: 10
Zeros: 21657
Zeros (%): 59.5%
Negative: 0
Negative (%): 0.0%
Memory size: 284.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 1
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.63488269
Coefficient of variation (CV): 1.5674209
Kurtosis: 12.267449
Mean: 0.40504926
Median Absolute Deviation (MAD): 0
Skewness: 2.3606338
Sum: 13525
Variance: 0.40307603
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Value | Count | Frequency (%)
0 | 21657 | 59.5%
1 | 10512 | 28.9%
2 | 873 | 2.4%
3 | 200 | 0.5%
4 | 101 | 0.3%
5 | 41 | 0.1%
9 | 3 | < 0.1%
8 | 1 | < 0.1%
6 | 1 | < 0.1%
10 | 1 | < 0.1%
(Missing) | 3017 | 8.3%

Value | Count | Frequency (%)
0 | 21657 | 59.5%
1 | 10512 | 28.9%
2 | 873 | 2.4%
3 | 200 | 0.5%
4 | 101 | 0.3%
5 | 41 | 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 3 | < 0.1%

Value | Count | Frequency (%)
10 | 1 | < 0.1%
9 | 3 | < 0.1%
8 | 1 | < 0.1%
7 | 1 | < 0.1%
6 | 1 | < 0.1%
5 | 41 | 0.1%
4 | 101 | 0.3%
3 | 200 | 0.5%
2 | 873 | 2.4%
1 | 10512 | 28.9%

no_of_ACs_in_the_room
Categorical

Alerts: Imbalance, Missing

Distinct: 5
Distinct (%): < 0.1%
Missing: 3017
Missing (%): 8.3%
Memory size: 284.6 KiB

Value | Count
0.0 | 32305
1.0 | 1029
2.0 | 30
3.0 | 26
4.0 | 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 100,173
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0
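
The sample values `0.0`, `1.0` and so on suggest that this count is being profiled as a category rather than as a number. If an integer column is preferred for later analysis, one option is sketched below with pandas. This is an assumption about downstream use, not part of the report, and whether the source column holds strings or floats is not visible here; `pd.to_numeric` handles both.

```python
import pandas as pd

# Illustrative values in the form shown in the sample rows, plus one missing entry.
raw = pd.Series(["0.0", "0.0", "1.0", None, "3.0"], name="no_of_ACs_in_the_room")

# Parse to float first, then cast to pandas' nullable integer dtype so that
# missing entries stay missing instead of forcing the column back to float.
as_int = pd.to_numeric(raw, errors="coerce").astype("Int64")

print(as_int.dtype)
print(as_int.tolist())
```

The nullable `Int64` dtype (capital I) is used because the plain NumPy `int64` dtype cannot represent missing values.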

Common Values

Value | Count | Frequency (%)
0.0 | 32305 | 88.7%
1.0 | 1029 | 2.8%
2.0 | 30 | 0.1%
3.0 | 26 | 0.1%
4.0 | 1 | < 0.1%
(Missing) | 3017 | 8.3%
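
The Imbalance flag on this variable is not explained in this section. One common definition, assumed here rather than taken from the report, is one minus the Shannon entropy of the non-missing class frequencies divided by the maximum entropy for that number of classes. With the counts above, this sketch evaluates to about 90.6%, which is consistent with the figure quoted in the alert.

```python
import math

# Non-missing counts from the Common Values table above.
counts = [32305, 1029, 30, 26, 1]

total = sum(counts)
probabilities = [c / total for c in counts]

# Shannon entropy in bits, normalised by the maximum possible for k classes.
entropy = -sum(p * math.log2(p) for p in probabilities)
max_entropy = math.log2(len(counts))

# 0 means perfectly balanced classes, 1 means a single class holds everything.
imbalance = 1 - entropy / max_entropy

print(f"{imbalance:.1%}")
```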

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0.0 | 32305 | 96.7%
1.0 | 1029 | 3.1%
2.0 | 30 | 0.1%
3.0 | 26 | 0.1%
4.0 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 65696 | 65.6%
. | 33391 | 33.3%
1 | 1029 | 1.0%
2 | 30 | < 0.1%
3 | 26 | < 0.1%
4 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 66782 | 66.7%
Other Punctuation | 33391 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 65696 | 98.4%
1 | 1029 | 1.5%
2 | 30 | < 0.1%
3 | 26 | < 0.1%
4 | 1 | < 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 33391 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 100173 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 65696 | 65.6%
. | 33391 | 33.3%
1 | 1029 | 1.0%
2 | 30 | < 0.1%
3 | 26 | < 0.1%
4 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 100173 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 65696 | 65.6%
. | 33391 | 33.3%
1 | 1029 | 1.0%
2 | 30 | < 0.1%
3 | 26 | < 0.1%
4 | 1 | < 0.1%

Interactions

[Figure: 36 pairwise interaction plots]

Correlations

(variable) | have_curtains_or_blinds_for_windows | have_extra_ventilation_openings | main_material_used_for_the_floor_of_the_room | main_material_used_for_the_roof_of_the_room | main_material_used_for_window_panes | main_purpose_of_the_room | no_of_ACs_in_the_room | no_of_bulbs_in_the_room | no_of_bulbs_used_during_last_week | no_of_doors_opened_to_external_environment | no_of_fans_in_the_room | no_of_windows | room_ID | storey_which_the_room_located | type_of_ceiling_of_the_room
have_curtains_or_blinds_for_windows | 1.000 | 0.363 | 0.179 | 0.210 | 0.295 | 0.327 | 0.101 | 0.117 | 0.098 | 0.045 | 0.077 | 0.105 | 0.259 | 0.017 | 0.151
have_extra_ventilation_openings | 0.363 | 1.000 | 0.558 | 0.543 | 0.296 | 0.462 | 0.039 | 0.074 | 0.070 | 0.000 | 0.063 | 0.082 | 0.277 | 0.029 | 0.549
main_material_used_for_the_floor_of_the_room | 0.179 | 0.558 | 1.000 | 0.357 | 0.178 | 0.223 | 0.082 | 0.073 | 0.051 | 0.000 | 0.043 | 0.044 | 0.126 | 0.025 | 0.372
main_material_used_for_the_roof_of_the_room | 0.210 | 0.543 | 0.357 | 1.000 | 0.179 | 0.235 | 0.027 | 0.031 | 0.034 | 0.000 | 0.033 | 0.027 | 0.118 | 0.037 | 0.455
main_material_used_for_window_panes | 0.295 | 0.296 | 0.178 | 0.179 | 1.000 | 0.136 | 0.015 | 0.030 | 0.000 | 0.000 | 0.029 | 0.020 | 0.030 | 0.011 | 0.127
main_purpose_of_the_room | 0.327 | 0.462 | 0.223 | 0.235 | 0.136 | 1.000 | 0.093 | 0.124 | 0.089 | 0.000 | 0.080 | 0.162 | 0.443 | 0.038 | 0.258
no_of_ACs_in_the_room | 0.101 | 0.039 | 0.082 | 0.027 | 0.015 | 0.093 | 1.000 | 0.373 | 0.195 | 0.011 | 0.205 | 0.033 | 0.070 | 0.028 | 0.069
no_of_bulbs_in_the_room | 0.117 | 0.074 | 0.073 | 0.031 | 0.030 | 0.124 | 0.373 | 1.000 | 0.342 | 0.186 | 0.305 | 0.275 | 0.121 | 0.065 | 0.079
no_of_bulbs_used_during_last_week | 0.098 | 0.070 | 0.051 | 0.034 | 0.000 | 0.089 | 0.195 | 0.342 | 1.000 | 0.187 | 0.215 | 0.181 | 0.083 | -0.117 | 0.069
no_of_doors_opened_to_external_environment | 0.045 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.011 | 0.186 | 0.187 | 1.000 | 0.165 | 0.269 | 0.000 | -0.004 | 0.003
no_of_fans_in_the_room | 0.077 | 0.063 | 0.043 | 0.033 | 0.029 | 0.080 | 0.205 | 0.305 | 0.215 | 0.165 | 1.000 | 0.520 | 0.080 | 0.044 | 0.051
no_of_windows | 0.105 | 0.082 | 0.044 | 0.027 | 0.020 | 0.162 | 0.033 | 0.275 | 0.181 | 0.269 | 0.520 | 1.000 | 0.159 | 0.022 | 0.046
room_ID | 0.259 | 0.277 | 0.126 | 0.118 | 0.030 | 0.443 | 0.070 | 0.121 | 0.083 | 0.000 | 0.080 | 0.159 | 1.000 | 0.017 | 0.134
storey_which_the_room_located | 0.017 | 0.029 | 0.025 | 0.037 | 0.011 | 0.038 | 0.028 | 0.065 | -0.117 | -0.004 | 0.044 | 0.022 | 0.017 | 1.000 | 0.036
type_of_ceiling_of_the_room | 0.151 | 0.549 | 0.372 | 0.455 | 0.127 | 0.258 | 0.069 | 0.079 | 0.069 | 0.003 | 0.051 | 0.046 | 0.134 | 0.036 | 1.000
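
The pairs flagged as highly correlated in the Alerts can be picked out of the matrix by thresholding its off-diagonal entries. The sketch below uses a four-variable sub-matrix copied from the values above. The cut-off of 0.5 is an assumption: it is consistent with the pairs listed in the alerts but is not stated in this section.

```python
# Sub-matrix copied from the correlation values above (4 of the 15 variables).
names = [
    "have_extra_ventilation_openings",
    "main_material_used_for_the_floor_of_the_room",
    "no_of_fans_in_the_room",
    "no_of_windows",
]
matrix = [
    [1.000, 0.558, 0.063, 0.082],
    [0.558, 1.000, 0.043, 0.044],
    [0.063, 0.043, 1.000, 0.520],
    [0.082, 0.044, 0.520, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off for "high correlation"

# Walk the upper triangle only, so each pair is reported once
# and the diagonal (always 1.000) is skipped.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(matrix[i][j]) >= THRESHOLD
]

for first, second, value in high_pairs:
    print(f"{first} ~ {second}: {value:.3f}")
```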

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
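
Many variables in this report share the same missing count of 3017, which suggests (but does not prove) that whole rows are empty, as in the two Veranda rows at the end of the Sample section. The following sketch shows one way to check for rows that are missing across several columns at once, using pandas. The four rows are transcribed from the last rows of the sample, and only three feature columns are kept for brevity.

```python
import pandas as pd

# Rows 36398, 36404, 36406 and 36407 from the sample, reduced to a few columns.
df = pd.DataFrame(
    {
        "room_ID": ["I1", "I7", "I9", "I10"],
        "no_of_windows": [6.0, 3.0, None, None],
        "no_of_bulbs_in_the_room": [2.0, 1.0, None, None],
        "no_of_fans_in_the_room": [1.0, 0.0, None, None],
    }
)

feature_columns = [c for c in df.columns if c != "room_ID"]

# Per-column missing counts: the quantity summarised in the nullity-by-column view.
missing_per_column = df[feature_columns].isna().sum()

# Rows in which every feature column is missing at the same time.
all_missing = df[feature_columns].isna().all(axis=1)

print(missing_per_column)
print(df.loc[all_missing, "room_ID"].tolist())
```

On the full dataset, comparing `all_missing.sum()` with the per-column counts would show how much of the missingness comes from entirely empty rows.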

Sample

(index) | household_ID | room_ID | main_purpose_of_the_room | storey_which_the_room_located | main_material_used_for_the_roof_of_the_room | type_of_ceiling_of_the_room | main_material_used_for_the_floor_of_the_room | no_of_doors_opened_to_external_environment | no_of_windows | main_material_used_for_window_panes | have_curtains_or_blinds_for_windows | have_extra_ventilation_openings | no_of_bulbs_in_the_room | no_of_bulbs_used_during_last_week | no_of_fans_in_the_room | no_of_ACs_in_the_room
0 | ID0001 | I1 | Living room | 0.0 | Asbestos | A beamed ceiling | Tile | 1.0 | 2.0 | Glass | Yes | No | 4.0 | NaN | 1.0 | 0.0
1 | ID0001 | I2 | Bedrooms | 0.0 | Asbestos | A beamed ceiling | Tile | 1.0 | 1.0 | Glass | Yes | No | 1.0 | NaN | 1.0 | 0.0
2 | ID0001 | I3 | Bedrooms | 0.0 | Asbestos | A beamed ceiling | Tile | 1.0 | 1.0 | Glass | Yes | No | 1.0 | NaN | 1.0 | 0.0
3 | ID0001 | I4 | Bedrooms | 1.0 | Asbestos | A beamed ceiling | Cement | 1.0 | 1.0 | Glass | Yes | No | 1.0 | NaN | 1.0 | 0.0
4 | ID0001 | I5 | Bedrooms | 1.0 | Asbestos | A beamed ceiling | Cement | 1.0 | 1.0 | Glass | Yes | No | 1.0 | NaN | 1.0 | 0.0
5 | ID0001 | I6 | Kitchen and/ or pantry | 1.0 | Asbestos | A beamed ceiling | Tile | 1.0 | 1.0 | Glass | Yes | Yes | 1.0 | NaN | 0.0 | 0.0
6 | ID0001 | I7 | Bathroom and / or toilets | 0.0 | Asbestos | A beamed ceiling | Tile | 1.0 | 0.0 | NaN | NaN | Yes | 1.0 | NaN | 0.0 | 0.0
7 | ID0001 | I8 | Gaming room | 0.0 | Asbestos | No ceiling, the concrete slab | Concrete | 0.0 | 0.0 | NaN | NaN | No | 0.0 | NaN | 0.0 | 0.0
8 | ID0001 | I9 | Servant's Room | 0.0 | Garden - Not relevant | Garden - Not relevant | Garden - Not relevant | 0.0 | 0.0 | NaN | NaN | Garden - Not relevant | 0.0 | NaN | 0.0 | 0.0
9 | ID0001 | I10 | Passage | 0.0 | Garden - Not relevant | Garden - Not relevant | Garden - Not relevant | 0.0 | 0.0 | NaN | NaN | Garden - Not relevant | 0.0 | NaN | 0.0 | 0.0

(index) | household_ID | room_ID | main_purpose_of_the_room | storey_which_the_room_located | main_material_used_for_the_roof_of_the_room | type_of_ceiling_of_the_room | main_material_used_for_the_floor_of_the_room | no_of_doors_opened_to_external_environment | no_of_windows | main_material_used_for_window_panes | have_curtains_or_blinds_for_windows | have_extra_ventilation_openings | no_of_bulbs_in_the_room | no_of_bulbs_used_during_last_week | no_of_fans_in_the_room | no_of_ACs_in_the_room
36398 | ID4063 | I1 | Living room | 0.0 | Concrete | No ceiling, the concrete slab | Concrete | 1.0 | 6.0 | NaN | Yes | No | 2.0 | 1.0 | 1.0 | 0.0
36399 | ID4063 | I2 | Living room | 0.0 | Concrete | No ceiling, the concrete slab | Concrete | 1.0 | 3.0 | NaN | Yes | No | 1.0 | 1.0 | 1.0 | 0.0
36400 | ID4063 | I3 | Bedrooms | 0.0 | Concrete | No ceiling, the concrete slab | Concrete | 1.0 | 3.0 | NaN | Yes | No | 1.0 | 1.0 | 1.0 | 0.0
36401 | ID4063 | I4 | Bedrooms | 1.0 | Asbestos | No ceiling, just the roof above | Concrete | 1.0 | 2.0 | NaN | No | No | 0.0 | NaN | 0.0 | 0.0
36402 | ID4063 | I5 | Bedrooms | 1.0 | Asbestos | No ceiling, just the roof above | Concrete | 1.0 | 2.0 | NaN | No | No | 0.0 | NaN | 0.0 | 0.0
36403 | ID4063 | I6 | Bedrooms | 1.0 | Asbestos | No ceiling, just the roof above | Concrete | 2.0 | 4.0 | NaN | No | No | 1.0 | 0.0 | 0.0 | 0.0
36404 | ID4063 | I7 | Kitchen and/ or pantry | 0.0 | Concrete | No ceiling, the concrete slab | Concrete | 1.0 | 3.0 | NaN | Yes | No | 1.0 | 1.0 | 0.0 | 0.0
36405 | ID4063 | I8 | Passage | 0.0 | Concrete | No ceiling, the concrete slab | Concrete | 1.0 | 3.0 | NaN | No | No | 1.0 | 1.0 | 0.0 | 0.0
36406 | ID4063 | I9 | Veranda | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
36407 | ID4063 | I10 | Veranda | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN