Overview


Dataset statistics

Number of variables: 26
Number of observations: 4,063
Missing cells: 27,643
Missing cells (%): 26.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 825.4 KiB
Average record size in memory: 208.0 B
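
The derived figures in this table can be re-checked from the raw counts. The sketch below uses only numbers quoted in this report; it assumes that missing cells are counted over the full variables × observations grid and that 1 KiB is 1,024 bytes. The per-column counts are the ones listed under Alerts, plus the 26 missing values reported for floor_area.

```python
# Figures quoted in the Dataset statistics table.
n_variables = 26
n_observations = 4_063
missing_cells = 27_643
total_size_kib = 825.4

# Share of missing cells over the variables x observations grid (assumed definition).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1,024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Per-column missing counts quoted in the Alerts list and the floor_area summary.
missing_by_column = {
    "occupy_renters_boarders": 536,
    "awareness_of_electricity_consumption_of_renters": 3959,
    "floor_which_house_located": 3970,
    "no_of_storeys": 3814,
    "charging_method_of_renters_for_electricity": 3959,
    "charged_method_for_rent_for_electricity": 3527,
    "type_of_business": 3877,
    "whom_or_how_the_house_was_designed": 1280,
    "availability_of_certificate_of_compliance": 1280,
    "main_material_used_for_roof_of_the_house": 1280,
    "total_monthly_expenditure_of_last_month": 135,
    "floor_area": 26,
}

print(f"{missing_pct:.1f}%")             # 26.2%
print(f"{avg_record_bytes:.1f} B")       # 208.0 B
print(sum(missing_by_column.values()))   # 27643
```

The per-column counts add up to the reported total of 27,643 missing cells, which suggests that the columns listed here account for all missing data in the dataset.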

Variable types

Text: 1
Numeric: 6
Categorical: 17
Boolean: 2

Alerts

awareness_of_electricity_consumption_of_renters has constant value "I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc." [Constant]
charging_method_of_renters_for_electricity is highly overall correlated with no_of_storeys and 1 other field [High correlation]
electricity_provider_csc_area is highly overall correlated with type_of_electricity_meter [High correlation]
floor_which_house_located is highly overall correlated with occupy_renters_boarders [High correlation]
highest_level_of_education_of_the_chief_wage_earner is highly overall correlated with socio_economic_class [High correlation]
is_there_business_carried_out_in_the_household is highly overall correlated with type_of_business [High correlation]
no_of_storeys is highly overall correlated with charging_method_of_renters_for_electricity and 1 other field [High correlation]
occupation_of_the_chief_wage_earner is highly overall correlated with socio_economic_class [High correlation]
occupy_renters_boarders is highly overall correlated with floor_which_house_located and 1 other field [High correlation]
own_the_house_or_living_on_rent is highly overall correlated with charging_method_of_renters_for_electricity and 1 other field [High correlation]
socio_economic_class is highly overall correlated with highest_level_of_education_of_the_chief_wage_earner and 1 other field [High correlation]
type_of_business is highly overall correlated with is_there_business_carried_out_in_the_household and 1 other field [High correlation]
type_of_electricity_meter is highly overall correlated with electricity_provider_csc_area [High correlation]
own_the_house_or_living_on_rent is highly imbalanced (68.5%) [Imbalance]
occupy_renters_boarders is highly imbalanced (86.2%) [Imbalance]
type_of_house is highly imbalanced (59.8%) [Imbalance]
charged_method_for_rent_for_electricity is highly imbalanced (79.8%) [Imbalance]
is_there_business_carried_out_in_the_household is highly imbalanced (73.2%) [Imbalance]
main_material_used_for_roof_of_the_house is highly imbalanced (50.4%) [Imbalance]
any_constructions_or_renovations_in_the_household is highly imbalanced (71.3%) [Imbalance]
occupy_renters_boarders has 536 (13.2%) missing values [Missing]
awareness_of_electricity_consumption_of_renters has 3959 (97.4%) missing values [Missing]
floor_which_house_located has 3970 (97.7%) missing values [Missing]
no_of_storeys has 3814 (93.9%) missing values [Missing]
charging_method_of_renters_for_electricity has 3959 (97.4%) missing values [Missing]
charged_method_for_rent_for_electricity has 3527 (86.8%) missing values [Missing]
type_of_business has 3877 (95.4%) missing values [Missing]
whom_or_how_the_house_was_designed has 1280 (31.5%) missing values [Missing]
availability_of_certificate_of_compliance has 1280 (31.5%) missing values [Missing]
main_material_used_for_roof_of_the_house has 1280 (31.5%) missing values [Missing]
total_monthly_expenditure_of_last_month has 135 (3.3%) missing values [Missing]
household_ID has unique values [Unique]
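
The imbalance percentages in these alerts are consistent with a normalised-entropy score, 1 − H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of distinct categories. The sketch below is a check against the category counts quoted in this report, not a statement of how the profiling tool is implemented; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of k category counts (assumed definition)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no entropy to normalise
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Category counts quoted in the variable sections of this report.
own_or_rent = [3527, 482, 50, 4]      # own_the_house_or_living_on_rent
renters_boarders = [3423, 72, 32]     # occupy_renters_boarders (non-missing only)

print(round(100 * imbalance_score(own_or_rent), 1))       # 68.5
print(round(100 * imbalance_score(renters_boarders), 1))  # 86.2
```

A score near 0% means the categories are roughly equally frequent; a score near 100% means almost all rows fall in one category.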

Reproduction

Analysis started: 2024-12-06 05:54:12.129767
Analysis finished: 2024-12-06 05:54:18.907451
Duration: 6.78 seconds
Software version: ydata-profiling v4.11.0

Variables

household_ID
Text

Unique 

Distinct: 4063
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 24,378
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4,063
Unique (%): 100.0%

Sample

1st row: ID0001
2nd row: ID0002
3rd row: ID0003
4th row: ID0004
5th row: ID0005
ValueCountFrequency (%)
id0039 1
 
< 0.1%
id4063 1
 
< 0.1%
id0001 1
 
< 0.1%
id0002 1
 
< 0.1%
id0003 1
 
< 0.1%
id0004 1
 
< 0.1%
id0005 1
 
< 0.1%
id0006 1
 
< 0.1%
id0007 1
 
< 0.1%
id0008 1
 
< 0.1%
Other values (4053) 4053
99.8%

Most occurring characters

ValueCountFrequency (%)
I 4063
16.7%
D 4063
16.7%
0 2277
9.3%
3 2217
9.1%
2 2217
9.1%
1 2217
9.1%
4 1280
 
5.3%
5 1216
 
5.0%
6 1210
 
5.0%
7 1206
 
4.9%
Other values (2) 2412
9.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 16252
66.7%
Uppercase Letter 8126
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2277
14.0%
3 2217
13.6%
2 2217
13.6%
1 2217
13.6%
4 1280
7.9%
5 1216
7.5%
6 1210
7.4%
7 1206
7.4%
8 1206
7.4%
9 1206
7.4%
Uppercase Letter
ValueCountFrequency (%)
I 4063
50.0%
D 4063
50.0%

Most occurring scripts

ValueCountFrequency (%)
Common 16252
66.7%
Latin 8126
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2277
14.0%
3 2217
13.6%
2 2217
13.6%
1 2217
13.6%
4 1280
7.9%
5 1216
7.5%
6 1210
7.4%
7 1206
7.4%
8 1206
7.4%
9 1206
7.4%
Latin
ValueCountFrequency (%)
I 4063
50.0%
D 4063
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24378
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 4063
16.7%
D 4063
16.7%
0 2277
9.3%
3 2217
9.1%
2 2217
9.1%
1 2217
9.1%
4 1280
 
5.3%
5 1216
 
5.0%
6 1210
 
5.0%
7 1206
 
4.9%
Other values (2) 2412
9.9%
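
The character statistics for household_ID can be reproduced with a short script. The sketch below assumes that the 4,063 identifiers run consecutively from ID0001 to ID4063; the report shows only the first rows and the distinct count, so the consecutive numbering is an assumption. Under that assumption the character and Unicode category counts match the tables above.

```python
import unicodedata
from collections import Counter

# Assumption: identifiers are "ID0001" .. "ID4063" with no gaps.
ids = [f"ID{i:04d}" for i in range(1, 4064)]

# Count individual characters across all identifiers.
chars = Counter("".join(ids))

# Count Unicode general categories: "Nd" = Decimal Number, "Lu" = Uppercase Letter.
categories = Counter(unicodedata.category(ch) for s in ids for ch in s)

total_chars = sum(chars.values())

print(total_chars)                 # 24378
print(chars["0"], chars["4"])      # 2277 1280
print(categories["Nd"], categories["Lu"])  # 16252 8126
```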

no_of_electricity_meters
Real number (ℝ)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0762983
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 7
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.31628519
Coefficient of variation (CV): 0.29386387
Kurtosis: 51.158272
Mean: 1.0762983
Median Absolute Deviation (MAD): 0
Skewness: 5.6452849
Sum: 4373
Variance: 0.10003632
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
1 | 3799 | 93.5%
2 | 225 | 5.5%
3 | 36 | 0.9%
5 | 1 | < 0.1%
7 | 1 | < 0.1%
4 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 3799 | 93.5%
2 | 225 | 5.5%
3 | 36 | 0.9%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
7 | 1 | < 0.1%

Value | Count | Frequency (%)
7 | 1 | < 0.1%
5 | 1 | < 0.1%
4 | 1 | < 0.1%
3 | 36 | 0.9%
2 | 225 | 5.5%
1 | 3799 | 93.5%
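
The descriptive statistics for no_of_electricity_meters can be recomputed from its value counts. The sketch below assumes the variance uses the sample (n − 1) denominator; with that assumption the results agree with the figures reported above.

```python
import math

# Value counts quoted above for no_of_electricity_meters.
value_counts = {1: 3799, 2: 225, 3: 36, 4: 1, 5: 1, 7: 1}

n = sum(value_counts.values())                       # number of observations
total = sum(v * c for v, c in value_counts.items())  # reported as "Sum"
mean = total / n

# Sample variance with an (n - 1) denominator (assumed convention).
variance = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(n, total)            # 4063 4373
print(round(mean, 7))      # 1.0762983
print(round(variance, 8))  # 0.10003632
```

The large skewness and kurtosis follow from the same counts: 93.5% of households report a single meter, and the few households with 4, 5, or 7 meters form a long right tail.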

electricity_provider_csc_area
Categorical

High correlation 

Distinct: 23
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB
MORATUWA NORTH
533 
MORATUWA SOUTH
370 
PANADURA
357 
GALLE
 
216
KESELWATTA
 
206
Other values (18)
2381 

Length

Max length14
Median length11
Mean length9.6475511
Min length5

Characters and Unicode

Total characters39,198
Distinct characters22
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGALLE
2nd rowGALLE
3rd rowGALLE
4th rowBORALASGAMUWA
5th rowKOLONNAWA

Common Values

Value | Count | Frequency (%)
MORATUWA NORTH | 533 | 13.1%
MORATUWA SOUTH | 370 | 9.1%
PANADURA | 357 | 8.8%
GALLE | 216 | 5.3%
KESELWATTA | 206 | 5.1%
MAHARAGAMA | 202 | 5.0%
PAYAGALA | 196 | 4.8%
KALUTARA | 189 | 4.7%
HIKKADUWA | 163 | 4.0%
ALUTHGAMA | 158 | 3.9%
Other values (13) | 1473 | 36.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
moratuwa 903
18.2%
north 533
 
10.7%
south 370
 
7.5%
panadura 357
 
7.2%
galle 216
 
4.3%
keselwatta 206
 
4.1%
maharagama 202
 
4.1%
payagala 196
 
3.9%
kalutara 189
 
3.8%
hikkaduwa 163
 
3.3%
Other values (14) 1631
32.8%

Most occurring characters

ValueCountFrequency (%)
A 10055
25.7%
T 3417
 
8.7%
O 2914
 
7.4%
U 2568
 
6.6%
R 2454
 
6.3%
M 2125
 
5.4%
L 1849
 
4.7%
W 1788
 
4.6%
N 1693
 
4.3%
H 1565
 
4.0%
Other values (12) 8770
22.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 38065
97.1%
Space Separator 903
 
2.3%
Dash Punctuation 230
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 10055
26.4%
T 3417
 
9.0%
O 2914
 
7.7%
U 2568
 
6.7%
R 2454
 
6.4%
M 2125
 
5.6%
L 1849
 
4.9%
W 1788
 
4.7%
N 1693
 
4.4%
H 1565
 
4.1%
Other values (10) 7637
20.1%
Space Separator
ValueCountFrequency (%)
903
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 230
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 38065
97.1%
Common 1133
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 10055
26.4%
T 3417
 
9.0%
O 2914
 
7.7%
U 2568
 
6.7%
R 2454
 
6.4%
M 2125
 
5.6%
L 1849
 
4.9%
W 1788
 
4.7%
N 1693
 
4.4%
H 1565
 
4.1%
Other values (10) 7637
20.1%
Common
ValueCountFrequency (%)
903
79.7%
- 230
 
20.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39198
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 10055
25.7%
T 3417
 
8.7%
O 2914
 
7.4%
U 2568
 
6.6%
R 2454
 
6.3%
M 2125
 
5.4%
L 1849
 
4.7%
W 1788
 
4.6%
N 1693
 
4.3%
H 1565
 
4.0%
Other values (12) 8770
22.4%

own_the_house_or_living_on_rent
Categorical

High correlation  Imbalance 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB
Yes, I or a household member owns it.
3527 
No, I am living on rent and the rent is paid by me or a household member.
482 
No, I or any household member does not own or rent this household. We occupy this household without any payment of rent.
 
50
No, I am living on rent and the rent is paid by the employer.
 
4

Length

Max length120
Median length37
Mean length42.315777
Min length37

Characters and Unicode

Total characters171,929
Distinct characters28
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowYes, I or a household member owns it.
2nd rowYes, I or a household member owns it.
3rd rowYes, I or a household member owns it.
4th rowYes, I or a household member owns it.
5th rowYes, I or a household member owns it.

Common Values

Value | Count | Frequency (%)
Yes, I or a household member owns it. | 3527 | 86.8%
No, I am living on rent and the rent is paid by me or a household member. | 482 | 11.9%
No, I or any household member does not own or rent this household. We occupy this household without any payment of rent. | 50 | 1.2%
No, I am living on rent and the rent is paid by the employer. | 4 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
household 4159
11.1%
or 4109
10.9%
i 4063
10.8%
member 4059
10.8%
a 4009
10.7%
yes 3527
9.4%
owns 3527
9.4%
it 3527
9.4%
rent 1072
 
2.9%
no 536
 
1.4%
Other values (20) 4978
13.3%

Most occurring characters

ValueCountFrequency (%)
33503
19.5%
e 18006
 
10.5%
o 17280
 
10.1%
s 11849
 
6.9%
r 9244
 
5.4%
m 9140
 
5.3%
h 8958
 
5.2%
n 6307
 
3.7%
i 5621
 
3.3%
a 5617
 
3.3%
Other values (18) 46404
27.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 122074
71.0%
Space Separator 33503
 
19.5%
Other Punctuation 8176
 
4.8%
Uppercase Letter 8176
 
4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 18006
14.8%
o 17280
14.2%
s 11849
9.7%
r 9244
 
7.6%
m 9140
 
7.5%
h 8958
 
7.3%
n 6307
 
5.2%
i 5621
 
4.6%
a 5617
 
4.6%
t 5389
 
4.4%
Other values (11) 24663
20.2%
Uppercase Letter
ValueCountFrequency (%)
I 4063
49.7%
Y 3527
43.1%
N 536
 
6.6%
W 50
 
0.6%
Other Punctuation
ValueCountFrequency (%)
. 4113
50.3%
, 4063
49.7%
Space Separator
ValueCountFrequency (%)
33503
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 130250
75.8%
Common 41679
 
24.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 18006
13.8%
o 17280
13.3%
s 11849
 
9.1%
r 9244
 
7.1%
m 9140
 
7.0%
h 8958
 
6.9%
n 6307
 
4.8%
i 5621
 
4.3%
a 5617
 
4.3%
t 5389
 
4.1%
Other values (15) 32839
25.2%
Common
ValueCountFrequency (%)
33503
80.4%
. 4113
 
9.9%
, 4063
 
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 171929
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
33503
19.5%
e 18006
 
10.5%
o 17280
 
10.1%
s 11849
 
6.9%
r 9244
 
5.4%
m 9140
 
5.3%
h 8958
 
5.2%
n 6307
 
3.7%
i 5621
 
3.3%
a 5617
 
3.3%
Other values (18) 46404
27.0%

occupy_renters_boarders
Categorical

High correlation  Imbalance  Missing 

Distinct: 3
Distinct (%): 0.1%
Missing: 536
Missing (%): 13.2%
Memory size: 31.9 KiB
I don't occupy any of the above.
3423 
Renters / boarders who are living in your annexe or any other attached place, maintaining separate living conditions but share the same electricity meter.
 
72
Boarders who live in your house using a room/s that is attached to your living conditions.
 
32

Length

Max length154
Median length32
Mean length35.016728
Min length32

Characters and Unicode

Total characters123,504
Distinct characters30
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowI don't occupy any of the above.
2nd rowI don't occupy any of the above.
3rd rowI don't occupy any of the above.
4th rowI don't occupy any of the above.
5th rowI don't occupy any of the above.

Common Values

Value | Count | Frequency (%)
I don't occupy any of the above. | 3423 | 84.2%
Renters / boarders who are living in your annexe or any other attached place, maintaining separate living conditions but share the same electricity meter. | 72 | 1.8%
Boarders who live in your house using a room/s that is attached to your living conditions. | 32 | 0.8%
(Missing) | 536 | 13.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
any 3495
13.3%
the 3495
13.3%
don't 3423
13.1%
i 3423
13.1%
occupy 3423
13.1%
of 3423
13.1%
above 3423
13.1%
living 176
 
0.7%
your 136
 
0.5%
in 104
 
0.4%
Other values (26) 1680
6.4%

Most occurring characters

ValueCountFrequency (%)
22674
18.4%
o 14516
11.8%
e 8270
 
6.7%
a 7942
 
6.4%
t 7902
 
6.4%
n 7870
 
6.4%
c 7270
 
5.9%
y 7126
 
5.8%
h 3911
 
3.2%
d 3735
 
3.0%
Other values (20) 32288
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90177
73.0%
Space Separator 22674
 
18.4%
Other Punctuation 7126
 
5.8%
Uppercase Letter 3527
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 14516
16.1%
e 8270
9.2%
a 7942
8.8%
t 7902
8.8%
n 7870
8.7%
c 7270
8.1%
y 7126
 
7.9%
h 3911
 
4.3%
d 3735
 
4.1%
u 3695
 
4.1%
Other values (12) 17940
19.9%
Other Punctuation
ValueCountFrequency (%)
. 3527
49.5%
' 3423
48.0%
/ 104
 
1.5%
, 72
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
I 3423
97.1%
R 72
 
2.0%
B 32
 
0.9%
Space Separator
ValueCountFrequency (%)
22674
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 93704
75.9%
Common 29800
 
24.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 14516
15.5%
e 8270
 
8.8%
a 7942
 
8.5%
t 7902
 
8.4%
n 7870
 
8.4%
c 7270
 
7.8%
y 7126
 
7.6%
h 3911
 
4.2%
d 3735
 
4.0%
u 3695
 
3.9%
Other values (15) 21467
22.9%
Common
ValueCountFrequency (%)
22674
76.1%
. 3527
 
11.8%
' 3423
 
11.5%
/ 104
 
0.3%
, 72
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 123504
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22674
18.4%
o 14516
11.8%
e 8270
 
6.7%
a 7942
 
6.4%
t 7902
 
6.4%
n 7870
 
6.4%
c 7270
 
5.9%
y 7126
 
5.8%
h 3911
 
3.2%
d 3735
 
3.0%
Other values (20) 32288
26.1%

awareness_of_electricity_consumption_of_renters
Categorical

Constant  Missing 

Distinct: 1
Distinct (%): 1.0%
Missing: 3959
Missing (%): 97.4%
Memory size: 31.9 KiB
I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
104 

Length

Max length218
Median length218
Mean length218
Min length218

Characters and Unicode

Total characters22,672
Distinct characters27
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowI know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
2nd rowI know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
3rd rowI know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
4th rowI know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
5th rowI know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.

Common Values

Value | Count | Frequency (%)
I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc. | 104 | 2.6%
(Missing) | 3959 | 97.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
the 728
18.4%
they 312
 
7.9%
of 208
 
5.3%
use 208
 
5.3%
and 208
 
5.3%
all 104
 
2.6%
details 104
 
2.6%
electricity 104
 
2.6%
know 104
 
2.6%
about 104
 
2.6%
Other values (17) 1768
44.7%

Most occurring characters

ValueCountFrequency (%)
3848
17.0%
e 2912
12.8%
t 2080
 
9.2%
h 1456
 
6.4%
a 1248
 
5.5%
s 1248
 
5.5%
n 1144
 
5.0%
i 1040
 
4.6%
o 936
 
4.1%
c 832
 
3.7%
Other values (17) 5928
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17992
79.4%
Space Separator 3848
 
17.0%
Other Punctuation 728
 
3.2%
Uppercase Letter 104
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2912
16.2%
t 2080
11.6%
h 1456
 
8.1%
a 1248
 
6.9%
s 1248
 
6.9%
n 1144
 
6.4%
i 1040
 
5.8%
o 936
 
5.2%
c 832
 
4.6%
l 728
 
4.0%
Other values (11) 4368
24.3%
Other Punctuation
ValueCountFrequency (%)
. 312
42.9%
; 208
28.6%
/ 104
 
14.3%
, 104
 
14.3%
Space Separator
ValueCountFrequency (%)
3848
100.0%
Uppercase Letter
ValueCountFrequency (%)
I 104
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 18096
79.8%
Common 4576
 
20.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2912
16.1%
t 2080
11.5%
h 1456
 
8.0%
a 1248
 
6.9%
s 1248
 
6.9%
n 1144
 
6.3%
i 1040
 
5.7%
o 936
 
5.2%
c 832
 
4.6%
l 728
 
4.0%
Other values (12) 4472
24.7%
Common
ValueCountFrequency (%)
3848
84.1%
. 312
 
6.8%
; 208
 
4.5%
/ 104
 
2.3%
, 104
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 22672
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3848
17.0%
e 2912
12.8%
t 2080
 
9.2%
h 1456
 
6.4%
a 1248
 
5.5%
s 1248
 
5.5%
n 1144
 
5.0%
i 1040
 
4.6%
o 936
 
4.1%
c 832
 
3.7%
Other values (17) 5928
26.1%
Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB
2000-2009
918 
2010-2019
758 
Before 1980
740 
1990-1999
615 
1980-1989
482 
Other values (2)
550 

Length

Max length21
Median length9
Mean length10.108787
Min length9

Characters and Unicode

Total characters41,072
Distinct characters20
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2000-2009
2nd rowBefore 1980
3rd row1980-1989
4th row2010-2019
5th row2010-2019

Common Values

Value | Count | Frequency (%)
2000-2009 | 918 | 22.6%
2010-2019 | 758 | 18.7%
Before 1980 | 740 | 18.2%
1990-1999 | 615 | 15.1%
1980-1989 | 482 | 11.9%
Don't know | 325 | 8.0%
In 2020 or After 2020 | 225 | 5.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2000-2009 918
15.2%
2010-2019 758
12.6%
before 740
12.3%
1980 740
12.3%
1990-1999 615
10.2%
1980-1989 482
8.0%
2020 450
7.5%
don't 325
 
5.4%
know 325
 
5.4%
in 225
 
3.7%
Other values (2) 450
7.5%

Most occurring characters

ValueCountFrequency (%)
0 9601
23.4%
9 6937
16.9%
1 4450
10.8%
2 4252
10.4%
- 2773
 
6.8%
1965
 
4.8%
e 1705
 
4.2%
8 1704
 
4.1%
o 1615
 
3.9%
r 1190
 
2.9%
Other values (10) 4880
11.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 26944
65.6%
Lowercase Letter 7550
 
18.4%
Dash Punctuation 2773
 
6.8%
Space Separator 1965
 
4.8%
Uppercase Letter 1515
 
3.7%
Other Punctuation 325
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1705
22.6%
o 1615
21.4%
r 1190
15.8%
f 965
12.8%
n 875
11.6%
t 550
 
7.3%
k 325
 
4.3%
w 325
 
4.3%
Decimal Number
ValueCountFrequency (%)
0 9601
35.6%
9 6937
25.7%
1 4450
16.5%
2 4252
15.8%
8 1704
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
B 740
48.8%
D 325
21.5%
I 225
 
14.9%
A 225
 
14.9%
Dash Punctuation
ValueCountFrequency (%)
- 2773
100.0%
Space Separator
ValueCountFrequency (%)
1965
100.0%
Other Punctuation
ValueCountFrequency (%)
' 325
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 32007
77.9%
Latin 9065
 
22.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1705
18.8%
o 1615
17.8%
r 1190
13.1%
f 965
10.6%
n 875
9.7%
B 740
8.2%
t 550
 
6.1%
D 325
 
3.6%
k 325
 
3.6%
w 325
 
3.6%
Other values (2) 450
 
5.0%
Common
ValueCountFrequency (%)
0 9601
30.0%
9 6937
21.7%
1 4450
13.9%
2 4252
13.3%
- 2773
 
8.7%
1965
 
6.1%
8 1704
 
5.3%
' 325
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41072
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 9601
23.4%
9 6937
16.9%
1 4450
10.8%
2 4252
10.4%
- 2773
 
6.8%
1965
 
4.8%
e 1705
 
4.2%
8 1704
 
4.1%
o 1615
 
3.9%
r 1190
 
2.9%
Other values (10) 4880
11.9%

type_of_house
Categorical

Imbalance 

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB
Single House - Single Floor
2482 
Single House - Double Floor
1332 
Single House - More than 2 floors
 
113
Flat
 
80
Condominium/ Luxury apartments
 
13
Other values (5)
 
43

Length

Max length33
Median length27
Mean length26.605464
Min length4

Characters and Unicode

Total characters108,098
Distinct characters35
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSingle House - Double Floor
2nd rowSingle House - Single Floor
3rd rowSingle House - Single Floor
4th rowSingle House - Double Floor
5th rowFlat

Common Values

Value | Count | Frequency (%)
Single House - Single Floor | 2482 | 61.1%
Single House - Double Floor | 1332 | 32.8%
Single House - More than 2 floors | 113 | 2.8%
Flat | 80 | 2.0%
Condominium/ Luxury apartments | 13 | 0.3%
Slum / Shanty | 11 | 0.3%
Line room/row house | 11 | 0.3%
Attached house / Annex | 10 | 0.2%
Twin houses | 9 | 0.2%
Other | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
single 6409
31.9%
house 3948
19.6%
3948
19.6%
floor 3814
19.0%
double 1332
 
6.6%
more 113
 
0.6%
than 113
 
0.6%
2 113
 
0.6%
floors 113
 
0.6%
flat 80
 
0.4%
Other values (12) 123
 
0.6%

Most occurring characters

ValueCountFrequency (%)
16043
14.8%
o 13315
12.3%
e 11857
11.0%
l 11759
10.9%
n 6612
 
6.1%
i 6455
 
6.0%
S 6431
 
5.9%
g 6409
 
5.9%
u 5339
 
4.9%
s 4092
 
3.8%
Other values (25) 19786
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 72205
66.8%
Space Separator 16043
 
14.8%
Uppercase Letter 15765
 
14.6%
Dash Punctuation 3927
 
3.6%
Decimal Number 113
 
0.1%
Other Punctuation 45
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 13315
18.4%
e 11857
16.4%
l 11759
16.3%
n 6612
9.2%
i 6455
8.9%
g 6409
8.9%
u 5339
7.4%
s 4092
 
5.7%
r 4090
 
5.7%
b 1332
 
1.8%
Other values (11) 945
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
S 6431
40.8%
H 3927
24.9%
F 3894
24.7%
D 1332
 
8.4%
M 113
 
0.7%
L 24
 
0.2%
A 20
 
0.1%
C 13
 
0.1%
T 9
 
0.1%
O 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
16043
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3927
100.0%
Decimal Number
ValueCountFrequency (%)
2 113
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 45
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 87970
81.4%
Common 20128
 
18.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 13315
15.1%
e 11857
13.5%
l 11759
13.4%
n 6612
7.5%
i 6455
7.3%
S 6431
7.3%
g 6409
7.3%
u 5339
6.1%
s 4092
 
4.7%
r 4090
 
4.6%
Other values (21) 11611
13.2%
Common
ValueCountFrequency (%)
16043
79.7%
- 3927
 
19.5%
2 113
 
0.6%
/ 45
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 108098
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16043
14.8%
o 13315
12.3%
e 11857
11.0%
l 11759
10.9%
n 6612
 
6.1%
i 6455
 
6.0%
S 6431
 
5.9%
g 6409
 
5.9%
u 5339
 
4.9%
s 4092
 
3.8%
Other values (25) 19786
18.3%

floor_which_house_located
Real number (ℝ)

High correlation  Missing 

Distinct: 12
Distinct (%): 12.9%
Missing: 3970
Missing (%): 97.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7741935
Minimum: 0
Maximum: 11
Zeros: 17
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 4
95-th percentile: 9
Maximum: 11
Range: 11
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.8173725
Coefficient of variation (CV): 1.0155645
Kurtosis: 0.61342339
Mean: 2.7741935
Median Absolute Deviation (MAD): 1
Skewness: 1.2342572
Sum: 258
Variance: 7.9375877
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
1 24
 
0.6%
2 18
 
0.4%
0 17
 
0.4%
3 7
 
0.2%
4 7
 
0.2%
5 4
 
0.1%
8 4
 
0.1%
6 3
 
0.1%
7 3
 
0.1%
9 3
 
0.1%
Other values (2) 3
 
0.1%
(Missing) 3970
97.7%
ValueCountFrequency (%)
0 17
0.4%
1 24
0.6%
2 18
0.4%
3 7
 
0.2%
4 7
 
0.2%
5 4
 
0.1%
6 3
 
0.1%
7 3
 
0.1%
8 4
 
0.1%
9 3
 
0.1%
ValueCountFrequency (%)
11 1
 
< 0.1%
10 2
 
< 0.1%
9 3
 
0.1%
8 4
 
0.1%
7 3
 
0.1%
6 3
 
0.1%
5 4
 
0.1%
4 7
 
0.2%
3 7
 
0.2%
2 18
0.4%

no_of_storeys
Real number (ℝ)

High correlation  Missing 

Distinct: 6
Distinct (%): 2.4%
Missing: 3814
Missing (%): 93.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.746988
Minimum: 0
Maximum: 5
Zeros: 35
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 3
95-th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.1378039
Coefficient of variation (CV): 0.65129465
Kurtosis: -1.2583883
Mean: 1.746988
Median Absolute Deviation (MAD): 1
Skewness: 0.013050986
Sum: 435
Variance: 1.2945977
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
3 93
 
2.3%
1 91
 
2.2%
0 35
 
0.9%
2 28
 
0.7%
4 1
 
< 0.1%
5 1
 
< 0.1%
(Missing) 3814
93.9%
ValueCountFrequency (%)
0 35
 
0.9%
1 91
2.2%
2 28
 
0.7%
3 93
2.3%
4 1
 
< 0.1%
5 1
 
< 0.1%
ValueCountFrequency (%)
5 1
 
< 0.1%
4 1
 
< 0.1%
3 93
2.3%
2 28
 
0.7%
1 91
2.2%
0 35
 
0.9%
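
The quantile statistics for no_of_storeys can be recovered from the value counts above. The sketch below expands the counts into the 249 non-missing observations and assumes quantiles are taken by linear interpolation between order statistics (the "inclusive" method of Python's `statistics.quantiles`); that interpolation rule is an assumption, although it reproduces the reported quartiles, IQR, and MAD.

```python
import statistics

# Value counts quoted above for no_of_storeys (non-missing rows only).
value_counts = {0: 35, 1: 91, 2: 28, 3: 93, 4: 1, 5: 1}

# Expand the counts into a sorted list of individual observations.
data = sorted(v for v, c in value_counts.items() for _ in range(c))

# Quartiles by linear interpolation between order statistics (assumed rule).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation: median distance of each value from the median.
mad = statistics.median(abs(x - median) for x in data)

print(len(data), sum(data))   # 249 435
print(q1, median, q3, iqr)    # 1.0 1.0 3.0 2.0
print(mad)                    # 1.0
```

The near-zero skewness and negative kurtosis reflect the two almost equal peaks at 1 and 3 storeys (91 and 93 households).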

floor_area
Real number (ℝ)

Distinct: 386
Distinct (%): 9.6%
Missing: 26
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1356.1812
Minimum: 100
Maximum: 9000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 100
5-th percentile: 300
Q1: 600
Median: 1000
Q3: 2000
95-th percentile: 3000
Maximum: 9000
Range: 8900
Interquartile range (IQR): 1400

Descriptive statistics

Standard deviation: 950.2113
Coefficient of variation (CV): 0.70065219
Kurtosis: 2.1828572
Mean: 1356.1812
Median Absolute Deviation (MAD): 500
Skewness: 1.2111539
Sum: 5474903.3
Variance: 902901.51
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
3000 277
 
6.8%
1000 264
 
6.5%
1200 256
 
6.3%
800 203
 
5.0%
600 202
 
5.0%
2400 180
 
4.4%
1500 175
 
4.3%
2000 171
 
4.2%
500 140
 
3.4%
400 111
 
2.7%
Other values (376) 2058
50.7%
ValueCountFrequency (%)
100 12
0.3%
108 1
 
< 0.1%
120 2
 
< 0.1%
125 1
 
< 0.1%
136.5 1
 
< 0.1%
140 2
 
< 0.1%
143 1
 
< 0.1%
144 1
 
< 0.1%
150 22
0.5%
160 1
 
< 0.1%
ValueCountFrequency (%)
9000 2
 
< 0.1%
6000 1
 
< 0.1%
5000 1
 
< 0.1%
4700 1
 
< 0.1%
4600 7
 
0.2%
4400 8
 
0.2%
4200 15
0.4%
4000 23
0.6%
3960 2
 
< 0.1%
3900 1
 
< 0.1%

no_of_household_members
Real number (ℝ)

Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.0044302
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 4
Q3: 5
95-th percentile: 7
Maximum: 13
Range: 12
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.6872622
Coefficient of variation (CV): 0.42134889
Kurtosis: 1.2549374
Mean: 4.0044302
Median Absolute Deviation (MAD): 1
Skewness: 0.68467186
Sum: 16270
Variance: 2.8468538
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
4 1014
25.0%
3 818
20.1%
5 772
19.0%
2 602
14.8%
6 407
10.0%
1 186
 
4.6%
7 144
 
3.5%
8 65
 
1.6%
9 29
 
0.7%
10 15
 
0.4%
Other values (3) 11
 
0.3%
ValueCountFrequency (%)
1 186
 
4.6%
2 602
14.8%
3 818
20.1%
4 1014
25.0%
5 772
19.0%
6 407
10.0%
7 144
 
3.5%
8 65
 
1.6%
9 29
 
0.7%
10 15
 
0.4%
ValueCountFrequency (%)
13 2
 
< 0.1%
12 4
 
0.1%
11 5
 
0.1%
10 15
 
0.4%
9 29
 
0.7%
8 65
 
1.6%
7 144
 
3.5%
6 407
10.0%
5 772
19.0%
4 1014
25.0%

charging_method_of_renters_for_electricity
Categorical

High correlation  Missing 

Distinct: 5
Distinct (%): 4.8%
Missing: 3959
Missing (%): 97.4%
Memory size: 31.9 KiB
You charge a fixed amount every month for electricity.
34 
You don't charge them for electricity consumption.
24 
You charge an amount for electricity depending on the variance of the bill.
21 
You don't charge a specific amount for electricity but charge a fixed amount for all the utilities such as electricity, water etc.
19 
You don't charge a specific amount for electricity but charge a varied amount for all the utilities such as electricity, water etc. The amount charged varied based on the utility bills.
6

Length

Max length: 185
Median length: 130
Mean length: 78.759615
Min length: 50

Characters and Unicode

Total characters: 8,191
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: You don't charge them for electricity consumption.
2nd row: You don't charge them for electricity consumption.
3rd row: You don't charge them for electricity consumption.
4th row: You don't charge a specific amount for electricity but charge a fixed amount for all the utilities such as electricity, water etc.
5th row: You don't charge a specific amount for electricity but charge a fixed amount for all the utilities such as electricity, water etc.

Common Values

Value | Count | Frequency (%)
You charge a fixed amount every month for electricity. | 34 | 0.8%
You don't charge them for electricity consumption. | 24 | 0.6%
You charge an amount for electricity depending on the variance of the bill. | 21 | 0.5%
You don't charge a specific amount for electricity but charge a fixed amount for all the utilities such as electricity, water etc. | 19 | 0.5%
You don't charge a specific amount for electricity but charge a varied amount for all the utilities such as electricity, water etc. The amount charged varied based on the utility bills. | 6 | 0.1%
(Missing) | 3959 | 97.4%
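
The Missing (%) and Distinct (%) figures for sparsely populated variables can look inconsistent at first glance: 5 distinct values in 4,063 rows is far below 4.8%. The sketch below shows one reading that reproduces the reported numbers, assuming Missing (%) is taken over all rows and Distinct (%) over non-missing rows only. That denominator choice is an assumption inferred from the figures, not something the report states. The same check is applied to two variables profiled further down in this report.

```python
n_rows = 4063  # number of observations, from the dataset overview

# variable -> (missing cells, distinct values), as reported
variables = {
    "charging_method_of_renters_for_electricity": (3959, 5),
    "charged_method_for_rent_for_electricity": (3527, 6),
    "type_of_business": (3877, 3),
}


def summarise(missing, distinct):
    """Derive the percentage columns from raw counts."""
    present = n_rows - missing
    return {
        "present": present,
        "missing_pct": round(100 * missing / n_rows, 1),
        # Assumption: the distinct share uses non-missing rows as denominator.
        "distinct_pct": round(100 * distinct / present, 1),
    }


summary = {name: summarise(*vals) for name, vals in variables.items()}
for name, row in summary.items():
    print(name, row)
```

For charging_method_of_renters_for_electricity only 104 rows carry a value, so any conclusion drawn from its category shares rests on a small subsample.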

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
charge 129
 
9.5%
for 129
 
9.5%
electricity 129
 
9.5%
amount 111
 
8.2%
you 104
 
7.7%
a 84
 
6.2%
the 79
 
5.8%
fixed 53
 
3.9%
don't 49
 
3.6%
every 34
 
2.5%
Other values (22) 450
33.3%

Most occurring characters

ValueCountFrequency (%)
1247
15.2%
e 798
 
9.7%
t 710
 
8.7%
i 553
 
6.8%
c 538
 
6.6%
o 523
 
6.4%
a 486
 
5.9%
r 485
 
5.9%
n 353
 
4.3%
u 320
 
3.9%
Other values (18) 2178
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6650
81.2%
Space Separator 1247
 
15.2%
Other Punctuation 184
 
2.2%
Uppercase Letter 110
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 798
12.0%
t 710
10.7%
i 553
 
8.3%
c 538
 
8.1%
o 523
 
7.9%
a 486
 
7.3%
r 485
 
7.3%
n 353
 
5.3%
u 320
 
4.8%
h 297
 
4.5%
Other values (12) 1587
23.9%
Other Punctuation
ValueCountFrequency (%)
. 110
59.8%
' 49
26.6%
, 25
 
13.6%
Uppercase Letter
ValueCountFrequency (%)
Y 104
94.5%
T 6
 
5.5%
Space Separator
ValueCountFrequency (%)
1247
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6760
82.5%
Common 1431
 
17.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 798
11.8%
t 710
10.5%
i 553
 
8.2%
c 538
 
8.0%
o 523
 
7.7%
a 486
 
7.2%
r 485
 
7.2%
n 353
 
5.2%
u 320
 
4.7%
h 297
 
4.4%
Other values (14) 1697
25.1%
Common
ValueCountFrequency (%)
1247
87.1%
. 110
 
7.7%
' 49
 
3.4%
, 25
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8191
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1247
15.2%
e 798
 
9.7%
t 710
 
8.7%
i 553
 
6.8%
c 538
 
6.6%
o 523
 
6.4%
a 486
 
5.9%
r 485
 
5.9%
n 353
 
4.3%
u 320
 
3.9%
Other values (18) 2178
26.6%

charged_method_for_rent_for_electricity
Categorical

Imbalance  Missing 

Distinct: 6
Distinct (%): 1.1%
Missing: 3527
Missing (%): 86.8%
Memory size: 31.9 KiB
You pay the full amount of the electricity bill.
496 
You don't pay the owner for electricity consumption.
 
17
You pay a fixed amount to the owner every month for electricity.
 
13
You pay a varied amount to the owner every month for electricity. The amount paid varies depending on the variance of the bill.
 
6
You don't pay a specific amount for electricity, but pay a fixed amount for all the utilities such as electricity, water etc.
 
3

Length

Max length197
Median length48
Mean length50.108209
Min length48

Characters and Unicode

Total characters26,858
Distinct characters28
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.2%

Sample

1st rowYou pay the full amount of the electricity bill.
2nd rowYou pay the full amount of the electricity bill.
3rd rowYou don't pay the owner for electricity consumption.
4th rowYou pay the full amount of the electricity bill.
5th rowYou pay the full amount of the electricity bill.

Common Values

Value | Count | Frequency (%)
You pay the full amount of the electricity bill. | 496 | 12.2%
You don't pay the owner for electricity consumption. | 17 | 0.4%
You pay a fixed amount to the owner every month for electricity. | 13 | 0.3%
You pay a varied amount to the owner every month for electricity. The amount paid varies depending on the variance of the bill. | 6 | 0.1%
You don't pay a specific amount for electricity, but pay a fixed amount for all the utilities such as electricity, water etc. | 3 | 0.1%
You don't pay a specific amount for electricity, but pay a varied amount for all the utilities such as electricity, water etc. The amount paid varies depending on the variance of the utility bills. | 1 | < 0.1%
(Missing) | 3527 | 86.8%
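
The Imbalance alert on this variable reflects how strongly one category dominates the non-missing answers. One common way to quantify that is one minus the normalised Shannon entropy of the category counts. The sketch below uses that measure as an illustration; whether the report uses exactly this formula, and what threshold triggers the alert, are assumptions here. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """1 - normalised Shannon entropy: 0 = balanced, values near 1 = one class dominates."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)


# Non-missing category counts copied from the Common Values tables.
rent_payment = [496, 17, 13, 6, 3, 1]    # charged_method_for_rent_for_electricity
renter_charging = [34, 24, 21, 19, 6]    # charging_method_of_renters_for_electricity

print("rent payment score:", round(imbalance_score(rent_payment), 3))
print("renter charging score:", round(imbalance_score(renter_charging), 3))
```

Under this measure the renter-side variable scores far higher than the owner-side one, which is consistent with only the former carrying the Imbalance flag.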

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
the 1053
21.1%
electricity 540
10.8%
pay 540
10.8%
you 536
10.7%
amount 530
10.6%
of 503
10.1%
bill 502
10.1%
full 496
9.9%
for 44
 
0.9%
owner 36
 
0.7%
Other values (23) 214
 
4.3%

Most occurring characters

ValueCountFrequency (%)
4458
16.6%
t 2754
10.3%
l 2551
 
9.5%
e 2274
 
8.5%
o 1749
 
6.5%
i 1673
 
6.2%
u 1592
 
5.9%
a 1144
 
4.3%
c 1120
 
4.2%
y 1100
 
4.1%
Other values (18) 6443
24.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 21285
79.3%
Space Separator 4458
 
16.6%
Other Punctuation 572
 
2.1%
Uppercase Letter 543
 
2.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 2754
12.9%
l 2551
12.0%
e 2274
10.7%
o 1749
8.2%
i 1673
 
7.9%
u 1592
 
7.5%
a 1144
 
5.4%
c 1120
 
5.3%
y 1100
 
5.2%
h 1076
 
5.1%
Other values (12) 4252
20.0%
Other Punctuation
ValueCountFrequency (%)
. 543
94.9%
' 21
 
3.7%
, 8
 
1.4%
Uppercase Letter
ValueCountFrequency (%)
Y 536
98.7%
T 7
 
1.3%
Space Separator
ValueCountFrequency (%)
4458
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21828
81.3%
Common 5030
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 2754
12.6%
l 2551
11.7%
e 2274
10.4%
o 1749
 
8.0%
i 1673
 
7.7%
u 1592
 
7.3%
a 1144
 
5.2%
c 1120
 
5.1%
y 1100
 
5.0%
h 1076
 
4.9%
Other values (14) 4795
22.0%
Common
ValueCountFrequency (%)
4458
88.6%
. 543
 
10.8%
' 21
 
0.4%
, 8
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26858
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4458
16.6%
t 2754
10.3%
l 2551
 
9.5%
e 2274
 
8.5%
o 1749
 
6.5%
i 1673
 
6.2%
u 1592
 
5.9%
a 1144
 
4.3%
c 1120
 
4.2%
y 1100
 
4.1%
Other values (18) 6443
24.0%

is_there_business_carried_out_in_the_household
Boolean

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 KiB
Value | Count | Frequency (%)
False | 3877 | 95.4%
True | 186 | 4.6%

type_of_business
Categorical

High correlation  Missing 

Distinct: 3
Distinct (%): 1.6%
Missing: 3877
Missing (%): 95.4%
Memory size: 31.9 KiB
Other
121 
A shop
63 
A communication
 
2

Length

Max length15
Median length5
Mean length5.4462366
Min length5

Characters and Unicode

Total characters1,013
Distinct characters16
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowOther
2nd rowOther
3rd rowOther
4th rowOther
5th rowOther

Common Values

Value | Count | Frequency (%)
Other | 121 | 3.0%
A shop | 63 | 1.6%
A communication | 2 | < 0.1%
(Missing) | 3877 | 95.4%
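
The High correlation alert linking type_of_business with is_there_business_carried_out_in_the_household is consistent with type_of_business being recorded only when a business is present. The totals in this report can be checked against that reading with simple arithmetic. This is a sketch using aggregate counts only: matching totals do not prove that the same rows line up, which would require the underlying data.

```python
n_rows = 4063

# Counts copied from the two variable summaries above.
business_flag = {False: 3877, True: 186}
type_of_business = {"Other": 121, "A shop": 63, "A communication": 2}
type_missing = 3877

n_typed = sum(type_of_business.values())

# Every row is either typed or missing for type_of_business.
rows_accounted = n_typed + type_missing

# The two totals that should match if the type is asked only of businesses.
typed_matches_true = n_typed == business_flag[True]
missing_matches_false = type_missing == business_flag[False]

print(rows_accounted, typed_matches_true, missing_matches_false)
```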

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
other 121
48.2%
a 65
25.9%
shop 63
25.1%
communication 2
 
0.8%

Most occurring characters

ValueCountFrequency (%)
h 184
18.2%
t 123
12.1%
O 121
11.9%
e 121
11.9%
r 121
11.9%
o 67
 
6.6%
65
 
6.4%
A 65
 
6.4%
s 63
 
6.2%
p 63
 
6.2%
Other values (6) 20
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 762
75.2%
Uppercase Letter 186
 
18.4%
Space Separator 65
 
6.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
h 184
24.1%
t 123
16.1%
e 121
15.9%
r 121
15.9%
o 67
 
8.8%
s 63
 
8.3%
p 63
 
8.3%
c 4
 
0.5%
m 4
 
0.5%
n 4
 
0.5%
Other values (3) 8
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
O 121
65.1%
A 65
34.9%
Space Separator
ValueCountFrequency (%)
65
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 948
93.6%
Common 65
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
h 184
19.4%
t 123
13.0%
O 121
12.8%
e 121
12.8%
r 121
12.8%
o 67
 
7.1%
A 65
 
6.9%
s 63
 
6.6%
p 63
 
6.6%
c 4
 
0.4%
Other values (5) 16
 
1.7%
Common
ValueCountFrequency (%)
65
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1013
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
h 184
18.2%
t 123
12.1%
O 121
11.9%
e 121
11.9%
r 121
11.9%
o 67
 
6.6%
65
 
6.4%
A 65
 
6.4%
s 63
 
6.2%
p 63
 
6.2%
Other values (6) 20
 
2.0%
Distinct6
Distinct (%)0.2%
Missing1280
Missing (%)31.5%
Memory size31.9 KiB
The house is designed by a certified architect.
1389 
I am not aware of that.
738 
The house plan is not done by an architect, nor checked by a certified architect or engineer. The house was not designed keeping in mind the legal requirements of the local authorities. The house was designed only to suit our needs.
359 
The house plan is not done by an architect, nor checked by a certified architect / engineer, the house is designed to barely pass the legal requirements of the local authorities.
 
125
The house plan was not done by an architect but checked by a certified architect or engineer.
 
117

Length

Max length232
Median length178
Mean length72.238951
Min length23

Characters and Unicode

Total characters201,041
Distinct characters29
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowThe house is designed by a certified architect.
2nd rowThis is a house provided by the government.
3rd rowThe house is designed by a certified architect.
4th rowThe house is designed by a certified architect.
5th rowThe house is designed by a certified architect.

Common Values

Value | Count | Frequency (%)
The house is designed by a certified architect. | 1389 | 34.2%
I am not aware of that. | 738 | 18.2%
The house plan is not done by an architect, nor checked by a certified architect or engineer. The house was not designed keeping in mind the legal requirements of the local authorities. The house was designed only to suit our needs. | 359 | 8.8%
The house plan is not done by an architect, nor checked by a certified architect / engineer, the house is designed to barely pass the legal requirements of the local authorities. | 125 | 3.1%
The house plan was not done by an architect but checked by a certified architect or engineer. | 117 | 2.9%
This is a house provided by the government. | 55 | 1.4%
(Missing) | 1280 | 31.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
the 3856
 
10.5%
house 2888
 
7.9%
by 2646
 
7.2%
architect 2591
 
7.1%
designed 2232
 
6.1%
is 2053
 
5.6%
a 2045
 
5.6%
certified 1990
 
5.4%
not 1698
 
4.6%
of 1222
 
3.3%
Other values (31) 13342
36.5%

Most occurring characters

ValueCountFrequency (%)
33780
16.8%
e 26269
13.1%
i 14455
 
7.2%
t 13961
 
6.9%
a 11327
 
5.6%
h 11213
 
5.6%
s 9999
 
5.0%
n 9808
 
4.9%
o 9649
 
4.8%
r 8926
 
4.4%
Other values (19) 51654
25.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 159525
79.3%
Space Separator 33780
 
16.8%
Other Punctuation 4235
 
2.1%
Uppercase Letter 3501
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 26269
16.5%
i 14455
9.1%
t 13961
 
8.8%
a 11327
 
7.1%
h 11213
 
7.0%
s 9999
 
6.3%
n 9808
 
6.1%
o 9649
 
6.0%
r 8926
 
5.6%
c 8858
 
5.6%
Other values (13) 35060
22.0%
Other Punctuation
ValueCountFrequency (%)
. 3501
82.7%
, 609
 
14.4%
/ 125
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
T 2763
78.9%
I 738
 
21.1%
Space Separator
ValueCountFrequency (%)
33780
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 163026
81.1%
Common 38015
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 26269
16.1%
i 14455
 
8.9%
t 13961
 
8.6%
a 11327
 
6.9%
h 11213
 
6.9%
s 9999
 
6.1%
n 9808
 
6.0%
o 9649
 
5.9%
r 8926
 
5.5%
c 8858
 
5.4%
Other values (15) 38561
23.7%
Common
ValueCountFrequency (%)
33780
88.9%
. 3501
 
9.2%
, 609
 
1.6%
/ 125
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 201041
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
33780
16.8%
e 26269
13.1%
i 14455
 
7.2%
t 13961
 
6.9%
a 11327
 
5.6%
h 11213
 
5.6%
s 9999
 
5.0%
n 9808
 
4.9%
o 9649
 
4.8%
r 8926
 
4.4%
Other values (19) 51654
25.7%
Distinct3
Distinct (%)0.1%
Missing1280
Missing (%)31.5%
Memory size31.9 KiB
Yes
1268 
Don't know
851 
No
664 

Length

Max length10
Median length3
Mean length4.9019044
Min length2

Characters and Unicode

Total characters13,642
Distinct characters12
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowYes
3rd rowNo
4th rowNo
5th rowYes

Common Values

Value | Count | Frequency (%)
Yes | 1268 | 31.2%
Don't know | 851 | 20.9%
No | 664 | 16.3%
(Missing) | 1280 | 31.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
yes 1268
34.9%
don't 851
23.4%
know 851
23.4%
no 664
18.3%

Most occurring characters

ValueCountFrequency (%)
o 2366
17.3%
n 1702
12.5%
Y 1268
9.3%
e 1268
9.3%
s 1268
9.3%
D 851
 
6.2%
' 851
 
6.2%
t 851
 
6.2%
851
 
6.2%
k 851
 
6.2%
Other values (2) 1515
11.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9157
67.1%
Uppercase Letter 2783
 
20.4%
Other Punctuation 851
 
6.2%
Space Separator 851
 
6.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 2366
25.8%
n 1702
18.6%
e 1268
13.8%
s 1268
13.8%
t 851
 
9.3%
k 851
 
9.3%
w 851
 
9.3%
Uppercase Letter
ValueCountFrequency (%)
Y 1268
45.6%
D 851
30.6%
N 664
23.9%
Other Punctuation
ValueCountFrequency (%)
' 851
100.0%
Space Separator
ValueCountFrequency (%)
851
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11940
87.5%
Common 1702
 
12.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 2366
19.8%
n 1702
14.3%
Y 1268
10.6%
e 1268
10.6%
s 1268
10.6%
D 851
 
7.1%
t 851
 
7.1%
k 851
 
7.1%
w 851
 
7.1%
N 664
 
5.6%
Common
ValueCountFrequency (%)
' 851
50.0%
851
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13642
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 2366
17.3%
n 1702
12.5%
Y 1268
9.3%
e 1268
9.3%
s 1268
9.3%
D 851
 
6.2%
' 851
 
6.2%
t 851
 
6.2%
851
 
6.2%
k 851
 
6.2%
Other values (2) 1515
11.1%
Distinct10
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size31.9 KiB
Cement Block
1837 
Brick
1621 
I am not aware of that
299 
Cabook
191 
Wood / Takaran / Asbestos
 
49
Other values (5)
 
66

Length

Max length25
Median length22
Mean length9.8624169
Min length3

Characters and Unicode

Total characters40,071
Distinct characters30
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowBrick
2nd rowBrick
3rd rowBrick
4th rowCement Block
5th rowI am not aware of that

Common Values

Value | Count | Frequency (%)
Cement Block | 1837 | 45.2%
Brick | 1621 | 39.9%
I am not aware of that | 299 | 7.4%
Cabook | 191 | 4.7%
Wood / Takaran / Asbestos | 49 | 1.2%
Pressed soil blocks | 29 | 0.7%
Stones/Cube stones | 19 | 0.5%
Other | 9 | 0.2%
Mud | 8 | 0.2%
Metal Sheet | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
cement 1837
24.0%
block 1837
24.0%
brick 1621
21.1%
i 299
 
3.9%
am 299
 
3.9%
not 299
 
3.9%
aware 299
 
3.9%
of 299
 
3.9%
that 299
 
3.9%
cabook 191
 
2.5%
Other values (13) 389
 
5.1%

Most occurring characters

ValueCountFrequency (%)
e 4149
10.4%
k 3727
 
9.3%
3606
 
9.0%
c 3487
 
8.7%
B 3458
 
8.6%
o 3060
 
7.6%
t 2832
 
7.1%
n 2223
 
5.5%
m 2136
 
5.3%
C 2047
 
5.1%
Other values (20) 9346
23.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 30330
75.7%
Uppercase Letter 6018
 
15.0%
Space Separator 3606
 
9.0%
Other Punctuation 117
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 4149
13.7%
k 3727
12.3%
c 3487
11.5%
o 3060
10.1%
t 2832
9.3%
n 2223
7.3%
m 2136
7.0%
r 2007
6.6%
l 1896
6.3%
i 1650
 
5.4%
Other values (8) 3163
10.4%
Uppercase Letter
ValueCountFrequency (%)
B 3458
57.5%
C 2047
34.0%
I 299
 
5.0%
W 49
 
0.8%
T 49
 
0.8%
A 49
 
0.8%
P 29
 
0.5%
S 20
 
0.3%
O 9
 
0.1%
M 9
 
0.1%
Space Separator
ValueCountFrequency (%)
3606
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 36348
90.7%
Common 3723
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 4149
11.4%
k 3727
10.3%
c 3487
9.6%
B 3458
9.5%
o 3060
8.4%
t 2832
 
7.8%
n 2223
 
6.1%
m 2136
 
5.9%
C 2047
 
5.6%
r 2007
 
5.5%
Other values (18) 7222
19.9%
Common
ValueCountFrequency (%)
3606
96.9%
/ 117
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40071
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 4149
10.4%
k 3727
 
9.3%
3606
 
9.0%
c 3487
 
8.7%
B 3458
 
8.6%
o 3060
 
7.6%
t 2832
 
7.1%
n 2223
 
5.5%
m 2136
 
5.3%
C 2047
 
5.1%
Other values (20) 9346
23.3%

main_material_used_for_roof_of_the_house
Categorical

Imbalance  Missing 

Distinct8
Distinct (%)0.3%
Missing1280
Missing (%)31.5%
Memory size31.9 KiB
Asbestos
1665 
Concrete
577 
Tile
494 
Takaran
 
19
Metal Sheet
 
14
Other values (3)
 
14

Length

Max length14
Median length8
Mean length7.298958
Min length4

Characters and Unicode

Total characters20,313
Distinct characters21
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowAsbestos
2nd rowAsbestos
3rd rowTile
4th rowConcrete
5th rowConcrete

Common Values

Value | Count | Frequency (%)
Asbestos | 1665 | 41.0%
Concrete | 577 | 14.2%
Tile | 494 | 12.2%
Takaran | 19 | 0.5%
Metal Sheet | 14 | 0.3%
Other | 8 | 0.2%
Plastic sheets | 5 | 0.1%
Tent | 1 | < 0.1%
(Missing) | 1280 | 31.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
asbestos 1665
59.4%
concrete 577
 
20.6%
tile 494
 
17.6%
takaran 19
 
0.7%
metal 14
 
0.5%
sheet 14
 
0.5%
other 8
 
0.3%
plastic 5
 
0.2%
sheets 5
 
0.2%
tent 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
s 5010
24.7%
e 3374
16.6%
t 2289
11.3%
o 2242
11.0%
A 1665
 
8.2%
b 1665
 
8.2%
r 604
 
3.0%
n 597
 
2.9%
c 582
 
2.9%
C 577
 
2.8%
Other values (11) 1708
 
8.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17497
86.1%
Uppercase Letter 2797
 
13.8%
Space Separator 19
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 5010
28.6%
e 3374
19.3%
t 2289
13.1%
o 2242
12.8%
b 1665
 
9.5%
r 604
 
3.5%
n 597
 
3.4%
c 582
 
3.3%
l 513
 
2.9%
i 499
 
2.9%
Other values (3) 122
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 1665
59.5%
C 577
 
20.6%
T 514
 
18.4%
M 14
 
0.5%
S 14
 
0.5%
O 8
 
0.3%
P 5
 
0.2%
Space Separator
ValueCountFrequency (%)
19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20294
99.9%
Common 19
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 5010
24.7%
e 3374
16.6%
t 2289
11.3%
o 2242
11.0%
A 1665
 
8.2%
b 1665
 
8.2%
r 604
 
3.0%
n 597
 
2.9%
c 582
 
2.9%
C 577
 
2.8%
Other values (10) 1689
 
8.3%
Common
ValueCountFrequency (%)
19
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20313
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 5010
24.7%
e 3374
16.6%
t 2289
11.3%
o 2242
11.0%
A 1665
 
8.2%
b 1665
 
8.2%
r 604
 
3.0%
n 597
 
2.9%
c 582
 
2.9%
C 577
 
2.8%
Other values (11) 1708
 
8.4%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size4.1 KiB
False
3859 
True
 
204
ValueCountFrequency (%)
False 3859
95.0%
True 204
 
5.0%
Distinct7
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size31.9 KiB
O/L or A/L pending / Passed
1879 
Schooling up to Grade 6 - 9
672 
Diploma with O/L or A/L (Non graduate)
563 
Graduate / Post-Grads / Degree level professional qualification
528 
Other professional certificates with O/L or A/L / Part qualification (Non graduate)
272 
Other values (2)
 
149

Length

Max length83
Median length27
Mean length36.543441
Min length10

Characters and Unicode

Total characters148,476
Distinct characters38
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowO/L or A/L pending / Passed
2nd rowDiploma with O/L or A/L (Non graduate)
3rd rowGraduate / Post-Grads / Degree level professional qualification
4th rowOther professional certificates with O/L or A/L / Part qualification (Non graduate)
5th rowSchooling up to Grade 6 - 9

Common Values

Value | Count | Frequency (%)
O/L or A/L pending / Passed | 1879 | 46.2%
Schooling up to Grade 6 - 9 | 672 | 16.5%
Diploma with O/L or A/L (Non graduate) | 563 | 13.9%
Graduate / Post-Grads / Degree level professional qualification | 528 | 13.0%
Other professional certificates with O/L or A/L / Part qualification (Non graduate) | 272 | 6.7%
Primary Education | 125 | 3.1%
Illiterate | 24 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
3879
14.0%
or 2714
 
9.8%
o/l 2714
 
9.8%
a/l 2714
 
9.8%
pending 1879
 
6.8%
passed 1879
 
6.8%
graduate 1363
 
4.9%
non 835
 
3.0%
with 835
 
3.0%
professional 800
 
2.9%
Other values (17) 8069
29.1%

Most occurring characters

ValueCountFrequency (%)
23618
15.9%
e 10097
 
6.8%
a 9586
 
6.5%
o 9181
 
6.2%
/ 8635
 
5.8%
i 7967
 
5.4%
r 7695
 
5.2%
n 6990
 
4.7%
s 6686
 
4.5%
d 6446
 
4.3%
Other values (28) 51575
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 93602
63.0%
Space Separator 23618
 
15.9%
Uppercase Letter 18407
 
12.4%
Other Punctuation 8635
 
5.8%
Decimal Number 1344
 
0.9%
Dash Punctuation 1200
 
0.8%
Open Punctuation 835
 
0.6%
Close Punctuation 835
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 10097
10.8%
a 9586
10.2%
o 9181
9.8%
i 7967
8.5%
r 7695
 
8.2%
n 6990
 
7.5%
s 6686
 
7.1%
d 6446
 
6.9%
t 5459
 
5.8%
l 3939
 
4.2%
Other values (11) 19556
20.9%
Uppercase Letter
ValueCountFrequency (%)
L 5428
29.5%
O 2986
16.2%
P 2804
15.2%
A 2714
14.7%
G 1728
 
9.4%
D 1091
 
5.9%
N 835
 
4.5%
S 672
 
3.7%
E 125
 
0.7%
I 24
 
0.1%
Decimal Number
ValueCountFrequency (%)
6 672
50.0%
9 672
50.0%
Space Separator
ValueCountFrequency (%)
23618
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 8635
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1200
100.0%
Open Punctuation
ValueCountFrequency (%)
( 835
100.0%
Close Punctuation
ValueCountFrequency (%)
) 835
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 112009
75.4%
Common 36467
 
24.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 10097
 
9.0%
a 9586
 
8.6%
o 9181
 
8.2%
i 7967
 
7.1%
r 7695
 
6.9%
n 6990
 
6.2%
s 6686
 
6.0%
d 6446
 
5.8%
t 5459
 
4.9%
L 5428
 
4.8%
Other values (21) 36474
32.6%
Common
ValueCountFrequency (%)
23618
64.8%
/ 8635
 
23.7%
- 1200
 
3.3%
( 835
 
2.3%
) 835
 
2.3%
6 672
 
1.8%
9 672
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 148476
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
23618
15.9%
e 10097
 
6.8%
a 9586
 
6.5%
o 9181
 
6.2%
/ 8635
 
5.8%
i 7967
 
5.4%
r 7695
 
5.2%
n 6990
 
4.7%
s 6686
 
4.5%
d 6446
 
4.3%
Other values (28) 51575
34.7%

occupation_of_the_chief_wage_earner
Categorical

High correlation 

Distinct19
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size31.9 KiB
Skilled Worker
1239 
Small Businessman / Self employed (Non professional)
483 
Manager / Professional
452 
Clerk / Salesman grades
398 
Unskilled Worker
334 
Other values (14)
1157 

Length

Max length52
Median length43
Mean length24.075068
Min length12

Characters and Unicode

Total characters97,817
Distinct characters52
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)0.1%

Sample

1st rowSkilled Worker
2nd rowUnskilled Worker
3rd rowMiddle and Senior executive
4th row1-9 Employed
5th rowSkilled Worker

Common Values

Value | Count | Frequency (%)
Skilled Worker | 1239 | 30.5%
Small Businessman / Self employed (Non professional) | 483 | 11.9%
Manager / Professional | 452 | 11.1%
Clerk / Salesman grades | 398 | 9.8%
Unskilled Worker | 334 | 8.2%
Self employed (Professional) - No employees | 262 | 6.4%
Junior executive / Executive | 251 | 6.2%
Middle and Senior executive | 237 | 5.8%
1-9 Employed | 160 | 3.9%
Supervisor grades | 153 | 3.8%
Other values (9) | 94 | 2.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1879
13.5%
worker 1589
 
11.4%
skilled 1239
 
8.9%
professional 1197
 
8.6%
employed 921
 
6.6%
self 745
 
5.4%
executive 739
 
5.3%
grades 551
 
4.0%
non 483
 
3.5%
small 483
 
3.5%
Other values (34) 4068
29.3%

Most occurring characters

ValueCountFrequency (%)
e 12577
12.9%
9831
 
10.1%
l 8319
 
8.5%
r 6720
 
6.9%
o 6687
 
6.8%
s 5545
 
5.7%
i 4950
 
5.1%
a 4698
 
4.8%
n 4629
 
4.7%
d 3760
 
3.8%
Other values (42) 30101
30.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 75099
76.8%
Space Separator 9831
 
10.1%
Uppercase Letter 8974
 
9.2%
Other Punctuation 1607
 
1.6%
Close Punctuation 745
 
0.8%
Open Punctuation 745
 
0.8%
Dash Punctuation 435
 
0.4%
Decimal Number 362
 
0.4%
Math Symbol 16
 
< 0.1%
Other Number 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12577
16.7%
l 8319
11.1%
r 6720
8.9%
o 6687
8.9%
s 5545
7.4%
i 4950
 
6.6%
a 4698
 
6.3%
n 4629
 
6.2%
d 3760
 
5.0%
k 3560
 
4.7%
Other values (14) 13654
18.2%
Uppercase Letter
ValueCountFrequency (%)
S 3255
36.3%
W 1589
17.7%
N 745
 
8.3%
P 714
 
8.0%
M 689
 
7.7%
B 536
 
6.0%
E 427
 
4.8%
C 398
 
4.4%
U 334
 
3.7%
J 251
 
2.8%
Other values (5) 36
 
0.4%
Decimal Number
ValueCountFrequency (%)
1 180
49.7%
9 160
44.2%
0 16
 
4.4%
2 3
 
0.8%
5 3
 
0.8%
Dash Punctuation
ValueCountFrequency (%)
- 432
99.3%
3
 
0.7%
Space Separator
ValueCountFrequency (%)
9831
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1607
100.0%
Close Punctuation
ValueCountFrequency (%)
) 745
100.0%
Open Punctuation
ValueCountFrequency (%)
( 745
100.0%
Math Symbol
ValueCountFrequency (%)
+ 16
100.0%
Other Number
ValueCountFrequency (%)
½ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 84073
85.9%
Common 13744
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12577
15.0%
l 8319
 
9.9%
r 6720
 
8.0%
o 6687
 
8.0%
s 5545
 
6.6%
i 4950
 
5.9%
a 4698
 
5.6%
n 4629
 
5.5%
d 3760
 
4.5%
k 3560
 
4.2%
Other values (29) 22628
26.9%
Common
ValueCountFrequency (%)
9831
71.5%
/ 1607
 
11.7%
) 745
 
5.4%
( 745
 
5.4%
- 432
 
3.1%
1 180
 
1.3%
9 160
 
1.2%
0 16
 
0.1%
+ 16
 
0.1%
3
 
< 0.1%
Other values (3) 9
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 97811
> 99.9%
Punctuation 3
 
< 0.1%
None 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12577
12.9%
9831
 
10.1%
l 8319
 
8.5%
r 6720
 
6.9%
o 6687
 
6.8%
s 5545
 
5.7%
i 4950
 
5.1%
a 4698
 
4.8%
n 4629
 
4.7%
d 3760
 
3.8%
Other values (40) 30095
30.8%
Punctuation
ValueCountFrequency (%)
3
100.0%
None
ValueCountFrequency (%)
½ 3
100.0%

socio_economic_class
Categorical

High correlation 

Distinct5
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size31.9 KiB
SEC C
1485 
SEC B
868 
SEC A
786 
SEC D
669 
SEC E
255 

Length

Max length5
Median length5
Mean length5
Min length5

Characters and Unicode

Total characters20,315
Distinct characters7
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSEC C
2nd rowSEC D
3rd rowSEC A
4th rowSEC A
5th rowSEC D

Common Values

Value | Count | Frequency (%)
SEC C | 1485 | 36.5%
SEC B | 868 | 21.4%
SEC A | 786 | 19.3%
SEC D | 669 | 16.5%
SEC E | 255 | 6.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sec 4063
50.0%
c 1485
 
18.3%
b 868
 
10.7%
a 786
 
9.7%
d 669
 
8.2%
e 255
 
3.1%

Most occurring characters

Value Count Frequency (%)
C 5548 27.3%
E 4318 21.3%
S 4063 20.0%
(space) 4063 20.0%
B 868 4.3%
A 786 3.9%
D 669 3.3%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 16252 80.0%
Space Separator 4063 20.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
C 5548 34.1%
E 4318 26.6%
S 4063 25.0%
B 868 5.3%
A 786 4.8%
D 669 4.1%
Space Separator
Value Count Frequency (%)
(space) 4063 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 16252 80.0%
Common 4063 20.0%

Most frequent character per script

Latin
Value Count Frequency (%)
C 5548 34.1%
E 4318 26.6%
S 4063 25.0%
B 868 5.3%
A 786 4.8%
D 669 4.1%
Common
Value Count Frequency (%)
(space) 4063 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 20315 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
C 5548 27.3%
E 4318 21.3%
S 4063 20.0%
(space) 4063 20.0%
B 868 4.3%
A 786 3.9%
D 669 3.3%

total_monthly_expenditure_of_last_month
Real number (ℝ)

Missing 

Distinct 85
Distinct (%) 2.2%
Missing 135
Missing (%) 3.3%
Infinite 0
Infinite (%) 0.0%
Mean 71327.253
Minimum 5000
Maximum 275000
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 31.9 KiB

Quantile statistics

Minimum 5000
5-th percentile 20000
Q1 40000
median 60000
Q3 100000
95-th percentile 150000
Maximum 275000
Range 270000
Interquartile range (IQR) 60000

Descriptive statistics

Standard deviation 44311.381
Coefficient of variation (CV) 0.62124054
Kurtosis 2.7624431
Mean 71327.253
Median Absolute Deviation (MAD) 20000
Skewness 1.5241384
Sum 2.8017345 × 10^8
Variance 1.9634985 × 10^9
Monotonicity Not monotonic
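
Several of the figures above are simple functions of one another, so they can be cross-checked. The sketch below does this with the values reported for this variable. It assumes the sum is taken over the non-missing rows only (4,063 observations minus 135 missing); the variable names are illustrative.

```python
# Figures copied from the report for total_monthly_expenditure_of_last_month.
n_total, n_missing = 4063, 135
mean, std = 71327.253, 44311.381
q1, q3 = 40000, 100000
minimum, maximum = 5000, 275000

n_valid = n_total - n_missing   # rows that actually hold a value
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
total = mean * n_valid          # sum over non-missing rows (assumption)
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

print(f"CV={cv:.6f} variance={variance:.4e} sum={total:.4e} "
      f"IQR={iqr} range={value_range}")
```

Up to rounding, the derived values agree with the reported CV (0.62124054), variance (about 1.9635 × 10^9), sum (about 2.8017 × 10^8), IQR (60000) and range (270000).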
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
50000 535 13.2%
100000 498 12.3%
60000 417 10.3%
40000 294 7.2%
30000 249 6.1%
80000 223 5.5%
70000 218 5.4%
150000 191 4.7%
75000 158 3.9%
35000 127 3.1%
Other values (75) 1018 25.1%
(Missing) 135 3.3%

Value Count Frequency (%)
5000 6 0.1%
6000 2 < 0.1%
7000 1 < 0.1%
7500 2 < 0.1%
8000 3 0.1%
10000 36 0.9%
11000 1 < 0.1%
12000 10 0.2%
13000 1 < 0.1%
15000 64 1.6%

Value Count Frequency (%)
275000 2 < 0.1%
270000 1 < 0.1%
250000 31 0.8%
230000 3 0.1%
225000 2 < 0.1%
220000 1 < 0.1%
215000 1 < 0.1%
200000 113 2.8%
180000 6 0.1%
175000 6 0.1%

type_of_electricity_meter
Categorical

High correlation 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 31.9 KiB
Smart meter
2186 
Non smart meter
1877 

Length

Max length 15
Median length 11
Mean length 12.847896
Min length 11
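
These length statistics follow directly from the two category values and their counts. As a small check, the sketch below rebuilds them with Python's standard `statistics` module, using counts copied from this report; the variable names are illustrative.

```python
from statistics import mean, median

# Value counts copied from the report for type_of_electricity_meter.
counts = {"Smart meter": 2186, "Non smart meter": 1877}

# One string length per observation.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_chars = sum(lengths)
print(max(lengths), median(lengths), round(mean(lengths), 6),
      min(lengths), total_chars)
```

The maximum (15), median (11), mean (about 12.847896) and minimum (11) match the figures above, and the character total of 52,201 matches the count reported for this variable.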

Characters and Unicode

Total characters 52,201
Distinct characters 11
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Non smart meter
2nd row Non smart meter
3rd row Smart meter
4th row Smart meter
5th row Smart meter

Common Values

Value Count Frequency (%)
Smart meter 2186 53.8%
Non smart meter 1877 46.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
smart 4063 40.6%
meter 4063 40.6%
non 1877 18.8%

Most occurring characters

Value Count Frequency (%)
m 8126 15.6%
t 8126 15.6%
r 8126 15.6%
e 8126 15.6%
(space) 5940 11.4%
a 4063 7.8%
S 2186 4.2%
N 1877 3.6%
o 1877 3.6%
n 1877 3.6%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 42198 80.8%
Space Separator 5940 11.4%
Uppercase Letter 4063 7.8%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
m 8126 19.3%
t 8126 19.3%
r 8126 19.3%
e 8126 19.3%
a 4063 9.6%
o 1877 4.4%
n 1877 4.4%
s 1877 4.4%
Uppercase Letter
Value Count Frequency (%)
S 2186 53.8%
N 1877 46.2%
Space Separator
Value Count Frequency (%)
(space) 5940 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 46261 88.6%
Common 5940 11.4%

Most frequent character per script

Latin
Value Count Frequency (%)
m 8126 17.6%
t 8126 17.6%
r 8126 17.6%
e 8126 17.6%
a 4063 8.8%
S 2186 4.7%
N 1877 4.1%
o 1877 4.1%
n 1877 4.1%
s 1877 4.1%
Common
Value Count Frequency (%)
(space) 5940 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 52201 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
m 8126 15.6%
t 8126 15.6%
r 8126 15.6%
e 8126 15.6%
(space) 5940 11.4%
a 4063 7.8%
S 2186 4.2%
N 1877 3.6%
o 1877 3.6%
n 1877 3.6%

Interactions

[Figure: 36 pairwise interaction plots]

Correlations

Columns 1 to 24 follow the same variable order as the numbered rows.

variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
1 any_constructions_or_renovations_in_the_household | 1.000 | 0.064 | 0.150 | 0.000 | 0.150 | 0.062 | 0.058 | 0.000 | 0.000 | 0.041 | 0.071 | 0.063 | 0.000 | 0.000 | 0.000 | 0.036 | 0.009 | 0.078 | 0.000 | 0.035 | 0.000 | 0.000 | 0.047 | 0.101
2 availability_of_certificate_of_compliance | 0.064 | 1.000 | 0.285 | 0.182 | 0.000 | 0.240 | 0.213 | 0.279 | 0.213 | 0.005 | 0.117 | 0.268 | 0.078 | 0.022 | 0.238 | 0.197 | 0.041 | 0.278 | 0.217 | 0.154 | 0.000 | 0.125 | 0.228 | 0.435
3 built_year_of_the_house | 0.150 | 0.285 | 1.000 | 0.000 | 0.020 | 0.079 | 0.037 | 0.068 | 0.036 | 0.039 | 0.139 | 0.290 | 0.000 | 0.048 | 0.172 | 0.034 | 0.012 | 0.442 | 0.023 | 0.036 | 0.093 | 0.075 | 0.146 | 0.221
4 charged_method_for_rent_for_electricity | 0.000 | 0.182 | 0.000 | 1.000 | 0.000 | 0.076 | 0.032 | 0.000 | 0.000 | 0.000 | 0.000 | 0.129 | 0.000 | 0.115 | 0.107 | 0.000 | 0.000 | 0.267 | 0.030 | 0.000 | 0.000 | 0.047 | 0.000 | 0.000
5 charging_method_of_renters_for_electricity | 0.150 | 0.000 | 0.020 | 0.000 | 1.000 | 0.171 | 0.000 | 0.000 | 0.156 | 0.075 | 0.000 | 0.131 | 0.000 | 0.163 | 0.577 | 0.013 | 0.361 | 1.000 | 0.103 | 0.043 | 0.000 | 0.000 | 0.200 | 0.000
6 electricity_provider_csc_area | 0.062 | 0.240 | 0.079 | 0.076 | 0.171 | 1.000 | 0.144 | 0.000 | 0.177 | 0.028 | 0.158 | 0.125 | 0.057 | 0.058 | 0.193 | 0.105 | 0.075 | 0.108 | 0.177 | 0.089 | 0.172 | 0.502 | 0.124 | 0.181
7 floor_area | 0.058 | 0.213 | 0.037 | 0.032 | 0.000 | 0.144 | 1.000 | 0.114 | 0.140 | 0.000 | 0.027 | 0.097 | 0.087 | 0.026 | 0.403 | 0.123 | 0.000 | 0.030 | 0.176 | 0.278 | 0.030 | 0.155 | 0.141 | 0.150
8 floor_which_house_located | 0.000 | 0.279 | 0.068 | 0.000 | 0.000 | 0.000 | 0.114 | 1.000 | 0.154 | 0.000 | 0.000 | 0.000 | 0.014 | 0.039 | 0.015 | 0.137 | 1.000 | 0.000 | 0.116 | 0.073 | NaN | 0.118 | 0.373 | 0.000
9 highest_level_of_education_of_the_chief_wage_earner | 0.000 | 0.213 | 0.036 | 0.000 | 0.156 | 0.177 | 0.140 | 0.154 | 1.000 | 0.042 | 0.049 | 0.101 | 0.021 | 0.048 | 0.159 | 0.306 | 0.000 | 0.036 | 0.650 | 0.155 | 0.159 | 0.176 | 0.138 | 0.156
10 is_there_business_carried_out_in_the_household | 0.041 | 0.005 | 0.039 | 0.000 | 0.075 | 0.028 | 0.000 | 0.000 | 0.042 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.174 | 0.000 | 0.000 | 0.052 | 0.000 | 1.000 | 0.008 | 0.000 | 0.000
11 main_material_used_for_roof_of_the_house | 0.071 | 0.117 | 0.139 | 0.000 | 0.000 | 0.158 | 0.027 | 0.000 | 0.049 | 0.000 | 1.000 | 0.168 | 0.103 | 0.000 | 0.181 | 0.028 | 0.106 | 0.000 | 0.057 | 0.000 | 0.000 | 0.160 | 0.151 | 0.066
12 main_material_used_for_walls_of_the_house | 0.063 | 0.268 | 0.290 | 0.129 | 0.131 | 0.125 | 0.097 | 0.000 | 0.101 | 0.000 | 0.168 | 1.000 | 0.000 | 0.012 | 0.262 | 0.085 | 0.000 | 0.272 | 0.134 | 0.053 | 0.000 | 0.086 | 0.151 | 0.222
13 no_of_electricity_meters | 0.000 | 0.078 | 0.000 | 0.000 | 0.000 | 0.057 | 0.087 | 0.014 | 0.021 | 0.000 | 0.103 | 0.000 | 1.000 | -0.035 | 0.100 | 0.070 | 0.000 | 0.000 | 0.048 | 0.048 | 0.051 | 0.101 | 0.158 | 0.063
14 no_of_household_members | 0.000 | 0.022 | 0.048 | 0.115 | 0.163 | 0.058 | 0.026 | 0.039 | 0.048 | 0.000 | 0.000 | 0.012 | -0.035 | 1.000 | 0.018 | 0.029 | 0.098 | 0.021 | 0.063 | 0.275 | 0.000 | 0.000 | 0.045 | 0.029
15 no_of_storeys | 0.000 | 0.238 | 0.172 | 0.107 | 0.577 | 0.193 | 0.403 | 0.015 | 0.159 | 0.000 | 0.181 | 0.262 | 0.100 | 0.018 | 1.000 | 0.168 | 0.000 | 0.114 | 0.157 | 0.388 | 1.000 | 0.138 | 0.352 | 0.327
16 occupation_of_the_chief_wage_earner | 0.036 | 0.197 | 0.034 | 0.000 | 0.013 | 0.105 | 0.123 | 0.137 | 0.306 | 0.174 | 0.028 | 0.085 | 0.070 | 0.029 | 0.168 | 1.000 | 0.000 | 0.033 | 0.625 | 0.139 | 0.242 | 0.167 | 0.115 | 0.124
17 occupy_renters_boarders | 0.009 | 0.041 | 0.012 | 0.000 | 0.361 | 0.075 | 0.000 | 1.000 | 0.000 | 0.000 | 0.106 | 0.000 | 0.000 | 0.098 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
18 own_the_house_or_living_on_rent | 0.078 | 0.278 | 0.442 | 0.267 | 1.000 | 0.108 | 0.030 | 0.000 | 0.036 | 0.000 | 0.000 | 0.272 | 0.000 | 0.021 | 0.114 | 0.033 | 1.000 | 1.000 | 0.039 | 0.003 | 0.000 | 0.033 | 0.063 | 0.261
19 socio_economic_class | 0.000 | 0.217 | 0.023 | 0.030 | 0.103 | 0.177 | 0.176 | 0.116 | 0.650 | 0.052 | 0.057 | 0.134 | 0.048 | 0.063 | 0.157 | 0.625 | 0.000 | 0.039 | 1.000 | 0.218 | 0.128 | 0.192 | 0.184 | 0.168
20 total_monthly_expenditure_of_last_month | 0.035 | 0.154 | 0.036 | 0.000 | 0.043 | 0.089 | 0.278 | 0.073 | 0.155 | 0.000 | 0.000 | 0.053 | 0.048 | 0.275 | 0.388 | 0.139 | 0.000 | 0.003 | 0.218 | 1.000 | 0.000 | 0.148 | 0.086 | 0.103
21 type_of_business | 0.000 | 0.000 | 0.093 | 0.000 | 0.000 | 0.172 | 0.030 | NaN | 0.159 | 1.000 | 0.000 | 0.000 | 0.051 | 0.000 | 1.000 | 0.242 | 0.000 | 0.000 | 0.128 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
22 type_of_electricity_meter | 0.000 | 0.125 | 0.075 | 0.047 | 0.000 | 0.502 | 0.155 | 0.118 | 0.176 | 0.008 | 0.160 | 0.086 | 0.101 | 0.000 | 0.138 | 0.167 | 0.000 | 0.033 | 0.192 | 0.148 | 0.000 | 1.000 | 0.220 | 0.101
23 type_of_house | 0.047 | 0.228 | 0.146 | 0.000 | 0.200 | 0.124 | 0.141 | 0.373 | 0.138 | 0.000 | 0.151 | 0.151 | 0.158 | 0.045 | 0.352 | 0.115 | 0.000 | 0.063 | 0.184 | 0.086 | 0.000 | 0.220 | 1.000 | 0.211
24 whom_or_how_the_house_was_designed | 0.101 | 0.435 | 0.221 | 0.000 | 0.000 | 0.181 | 0.150 | 0.000 | 0.156 | 0.000 | 0.066 | 0.222 | 0.063 | 0.029 | 0.327 | 0.124 | 0.000 | 0.261 | 0.168 | 0.103 | 0.000 | 0.101 | 0.211 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | household_ID | no_of_electricity_meters | electricity_provider_csc_area | own_the_house_or_living_on_rent | occupy_renters_boarders | awareness_of_electricity_consumption_of_renters | built_year_of_the_house | type_of_house | floor_which_house_located | no_of_storeys | floor_area | no_of_household_members | charging_method_of_renters_for_electricity | charged_method_for_rent_for_electricity | is_there_business_carried_out_in_the_household | type_of_business | whom_or_how_the_house_was_designed | availability_of_certificate_of_compliance | main_material_used_for_walls_of_the_house | main_material_used_for_roof_of_the_house | any_constructions_or_renovations_in_the_household | highest_level_of_education_of_the_chief_wage_earner | occupation_of_the_chief_wage_earner | socio_economic_class | total_monthly_expenditure_of_last_month | type_of_electricity_meter
0 | ID0001 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2000-2009 | Single House - Double Floor | NaN | NaN | 1500.0 | 4 | NaN | NaN | No | NaN | The house is designed by a certified architect. | No | Brick | Asbestos | No | O/L or A/L pending / Passed | Skilled Worker | SEC C | 35000.0 | Non smart meter
1 | ID0002 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | Before 1980 | Single House - Single Floor | NaN | NaN | 440.0 | 3 | NaN | NaN | No | NaN | This is a house provided by the government. | Yes | Brick | Asbestos | No | Diploma with O/L or A/L (Non graduate) | Unskilled Worker | SEC D | 40000.0 | Non smart meter
2 | ID0003 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 1980-1989 | Single House - Single Floor | NaN | NaN | 2500.0 | 4 | NaN | NaN | No | NaN | The house is designed by a certified architect. | No | Brick | Tile | No | Graduate / Post-Grads / Degree level professional qualification | Middle and Senior executive | SEC A | 250000.0 | Smart meter
3 | ID0004 | 1 | BORALASGAMUWA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Single House - Double Floor | NaN | NaN | 2600.0 | 4 | NaN | NaN | No | NaN | The house is designed by a certified architect. | No | Cement Block | Concrete | No | Other professional certificates with O/L or A/L / Part qualification (Non graduate) | 1-9 Employed | SEC A | 100000.0 | Smart meter
4 | ID0005 | 1 | KOLONNAWA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Flat | 10.0 | 1.0 | 480.0 | 2 | NaN | NaN | No | NaN | The house is designed by a certified architect. | Yes | I am not aware of that | Concrete | No | Schooling up to Grade 6 - 9 | Skilled Worker | SEC D | 60000.0 | Smart meter
5 | ID0006 | 1 | KOLONNAWA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Flat | 1.0 | 2.0 | 440.0 | 6 | NaN | NaN | No | NaN | This is a house provided by the government. | Yes | Cement Block | Concrete | No | Schooling up to Grade 6 - 9 | Unskilled Worker | SEC E | 100000.0 | Smart meter
6 | ID0007 | 1 | KOLONNAWA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Flat | 2.0 | 1.0 | 480.0 | 4 | NaN | NaN | No | NaN | This is a house provided by the government. | Yes | Cement Block | Concrete | No | O/L or A/L pending / Passed | Small Businessman / Self employed (Non professional) | SEC C | 60000.0 | Smart meter
7 | ID0008 | 2 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Single House - Double Floor | NaN | NaN | 1400.0 | 5 | NaN | NaN | No | NaN | The house is designed by a certified architect. | Yes | Cement Block | Asbestos | No | Other professional certificates with O/L or A/L / Part qualification (Non graduate) | Clerk / Salesman grades | SEC B | 150000.0 | Smart meter
8 | ID0009 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2000-2009 | Single House - Single Floor | NaN | NaN | 350.0 | 2 | NaN | NaN | No | NaN | The house is designed by a certified architect. | No | Brick | Asbestos | No | O/L or A/L pending / Passed | Skilled Worker | SEC C | 15000.0 | Non smart meter
9 | ID0010 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 1990-1999 | Single House - Single Floor | NaN | NaN | 1000.0 | 7 | NaN | NaN | No | NaN | The house is designed by a certified architect. | No | Brick | Tile | No | Schooling up to Grade 6 - 9 | Unskilled Worker | SEC E | 50000.0 | Non smart meter
(index) | household_ID | no_of_electricity_meters | electricity_provider_csc_area | own_the_house_or_living_on_rent | occupy_renters_boarders | awareness_of_electricity_consumption_of_renters | built_year_of_the_house | type_of_house | floor_which_house_located | no_of_storeys | floor_area | no_of_household_members | charging_method_of_renters_for_electricity | charged_method_for_rent_for_electricity | is_there_business_carried_out_in_the_household | type_of_business | whom_or_how_the_house_was_designed | availability_of_certificate_of_compliance | main_material_used_for_walls_of_the_house | main_material_used_for_roof_of_the_house | any_constructions_or_renovations_in_the_household | highest_level_of_education_of_the_chief_wage_earner | occupation_of_the_chief_wage_earner | socio_economic_class | total_monthly_expenditure_of_last_month | type_of_electricity_meter
4053 | ID4054 | 1 | NEGOMBO | No, I am living on rent and the rent is paid by me or a household member. | NaN | NaN | 1980-1989 | Single House - Single Floor | NaN | NaN | 1250.0 | 6 | NaN | You pay the full amount of the electricity bill. | No | NaN | NaN | NaN | Brick | NaN | No | O/L or A/L pending / Passed | Skilled Worker | SEC C | 70000.0 | Smart meter
4054 | ID4055 | 1 | NEGOMBO | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2000-2009 | Single House - Double Floor | NaN | NaN | 1500.0 | 3 | NaN | NaN | No | NaN | NaN | NaN | Brick | NaN | No | O/L or A/L pending / Passed | Middle and Senior executive | SEC B | 60000.0 | Smart meter
4055 | ID4056 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2000-2009 | Single House - Single Floor | NaN | NaN | 2400.0 | 3 | NaN | NaN | No | NaN | NaN | NaN | Cement Block | NaN | No | Schooling up to Grade 6 - 9 | Unskilled Worker | SEC E | NaN | Non smart meter
4056 | ID4057 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | Before 1980 | Single House - Single Floor | NaN | NaN | 2400.0 | 2 | NaN | NaN | No | NaN | NaN | NaN | Cabook | NaN | No | O/L or A/L pending / Passed | Skilled Worker | SEC C | 25000.0 | Non smart meter
4057 | ID4058 | 1 | GALLE | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 1990-1999 | Single House - Single Floor | NaN | NaN | 3000.0 | 6 | NaN | NaN | No | NaN | NaN | NaN | Cement Block | NaN | No | Diploma with O/L or A/L (Non graduate) | Skilled Worker | SEC C | 8000.0 | Non smart meter
4058 | ID4059 | 1 | NUGEGODA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | In 2020 or After 2020 | Single House - Double Floor | NaN | NaN | 400.0 | 4 | NaN | NaN | No | NaN | NaN | NaN | Brick | NaN | No | Graduate / Post-Grads / Degree level professional qualification | Clerk / Salesman grades | SEC B | 150000.0 | Smart meter
4059 | ID4060 | 1 | HIKKADUWA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | Before 1980 | Single House - Single Floor | NaN | NaN | 3000.0 | 1 | NaN | NaN | No | NaN | NaN | NaN | Pressed soil blocks | NaN | No | O/L or A/L pending / Passed | Boutique owner | SEC B | 50000.0 | Non smart meter
4060 | ID4061 | 1 | WATTALA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2000-2009 | Single House - Single Floor | NaN | NaN | 680.0 | 2 | NaN | NaN | No | NaN | NaN | NaN | Cement Block | NaN | No | O/L or A/L pending / Passed | Unskilled Worker | SEC D | 20000.0 | Non smart meter
4061 | ID4062 | 1 | ALUTHGAMA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 1980-1989 | Single House - Single Floor | NaN | NaN | 700.0 | 2 | NaN | NaN | No | NaN | NaN | NaN | Cabook | NaN | No | O/L or A/L pending / Passed | Self employed (Professional) - No employees | SEC B | 30000.0 | Non smart meter
4062 | ID4063 | 1 | ALUTHGAMA | Yes, I or a household member owns it. | I don't occupy any of the above. | NaN | 2010-2019 | Single House - Double Floor | NaN | NaN | 2400.0 | 5 | NaN | NaN | No | NaN | NaN | NaN | Cement Block | NaN | No | Schooling up to Grade 6 - 9 | Skilled Worker | SEC D | 100000.0 | Non smart meter