Overview

Dataset statistics

Number of variables: 26
Number of observations: 4,063
Missing cells: 27,643
Missing cells (%): 26.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 825.4 KiB
Average record size in memory: 208.0 B
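
The derived figures in this block can be cross-checked from the raw counts. The sketch below is only a sanity check on the reported numbers, not the profiler's own code; it assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 26
n_observations = 4_063
missing_cells = 27_643
total_size_kib = 825.4

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells:,}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both derived values round to the figures shown in the report (26.2% and 208.0 B).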

Variable types

Text: 1
Numeric: 6
Categorical: 17
Boolean: 2

Alerts

awareness_of_electricity_consumption_of_renters has constant value "I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc." [Constant]
charging_method_of_renters_for_electricity is highly overall correlated with no_of_storeys and 1 other field [High correlation]
electricity_provider_csc_area is highly overall correlated with type_of_electricity_meter [High correlation]
floor_which_house_located is highly overall correlated with occupy_renters_boarders [High correlation]
highest_level_of_education_of_the_chief_wage_earner is highly overall correlated with socio_economic_class [High correlation]
is_there_business_carried_out_in_the_household is highly overall correlated with type_of_business [High correlation]
no_of_storeys is highly overall correlated with charging_method_of_renters_for_electricity and 1 other field [High correlation]
occupation_of_the_chief_wage_earner is highly overall correlated with socio_economic_class [High correlation]
occupy_renters_boarders is highly overall correlated with floor_which_house_located and 1 other field [High correlation]
own_the_house_or_living_on_rent is highly overall correlated with charging_method_of_renters_for_electricity and 1 other field [High correlation]
socio_economic_class is highly overall correlated with highest_level_of_education_of_the_chief_wage_earner and 1 other field [High correlation]
type_of_business is highly overall correlated with is_there_business_carried_out_in_the_household and 1 other field [High correlation]
type_of_electricity_meter is highly overall correlated with electricity_provider_csc_area [High correlation]
own_the_house_or_living_on_rent is highly imbalanced (68.5%) [Imbalance]
occupy_renters_boarders is highly imbalanced (86.2%) [Imbalance]
type_of_house is highly imbalanced (59.8%) [Imbalance]
charged_method_for_rent_for_electricity is highly imbalanced (79.8%) [Imbalance]
is_there_business_carried_out_in_the_household is highly imbalanced (73.2%) [Imbalance]
main_material_used_for_roof_of_the_house is highly imbalanced (50.4%) [Imbalance]
any_constructions_or_renovations_in_the_household is highly imbalanced (71.3%) [Imbalance]
occupy_renters_boarders has 536 (13.2%) missing values [Missing]
awareness_of_electricity_consumption_of_renters has 3959 (97.4%) missing values [Missing]
floor_which_house_located has 3970 (97.7%) missing values [Missing]
no_of_storeys has 3814 (93.9%) missing values [Missing]
charging_method_of_renters_for_electricity has 3959 (97.4%) missing values [Missing]
charged_method_for_rent_for_electricity has 3527 (86.8%) missing values [Missing]
type_of_business has 3877 (95.4%) missing values [Missing]
whom_or_how_the_house_was_designed has 1280 (31.5%) missing values [Missing]
availability_of_certificate_of_compliance has 1280 (31.5%) missing values [Missing]
main_material_used_for_roof_of_the_house has 1280 (31.5%) missing values [Missing]
total_monthly_expenditure_of_last_month has 135 (3.3%) missing values [Missing]
household_ID has unique values [Unique]
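
The imbalance percentages in the alerts are consistent with one minus the normalised Shannon entropy of the category counts. That is an inference from the numbers in this report rather than a statement about the profiler's internals, so the following is a minimal sketch under that assumption. The helper name `imbalance_score` is hypothetical, and returning 0.0 for a single class is a convention chosen for this sketch.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts (0 = balanced, near 1 = one dominant class)."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        return 0.0  # convention of this sketch for a single class
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return 1 - entropy / math.log2(k)


# Category counts copied from the variable sections of this report.
own_or_rent = [3527, 482, 50, 4]   # own_the_house_or_living_on_rent
renters = [3423, 72, 32]           # occupy_renters_boarders (non-missing rows)

print(f"{imbalance_score(own_or_rent):.1%}")
print(f"{imbalance_score(renters):.1%}")
```

Under this assumption the two scores round to the 68.5% and 86.2% reported in the alerts.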

Reproduction

Analysis started: 2024-12-06 05:54:12.129767
Analysis finished: 2024-12-06 05:54:18.907451
Duration: 6.78 seconds
Software version: ydata-profiling v4.11.0
Download configuration: config.json

Variables

household_ID
Text

Unique 

Distinct: 4063
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 24,378
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
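
As an illustration of these character properties, the sketch below counts characters and Unicode general categories with Python's standard `unicodedata` module. It assumes the IDs run consecutively from ID0001 to ID4063, which the sample rows and the distinct count suggest but the report does not state outright.

```python
import unicodedata
from collections import Counter

# Assumption: household_ID takes the values ID0001 ... ID4063.
ids = [f"ID{i:04d}" for i in range(1, 4064)]

# Count every character across all IDs.
chars = Counter("".join(ids))

# Aggregate the character counts by Unicode general category:
# 'Nd' is Decimal Number, 'Lu' is Uppercase Letter.
categories = Counter()
for ch, n in chars.items():
    categories[unicodedata.category(ch)] += n

print("total characters:", sum(chars.values()))
print("distinct characters:", len(chars))
print("categories:", dict(categories))
```

With that assumption the totals agree with the report: 24,378 characters, 12 distinct characters, and two categories.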

Unique

Unique: 4,063
Unique (%): 100.0%

Sample

1st row: ID0001
2nd row: ID0002
3rd row: ID0003
4th row: ID0004
5th row: ID0005

Value | Count | Frequency (%)
id0039 | 1 | < 0.1%
id4063 | 1 | < 0.1%
id0001 | 1 | < 0.1%
id0002 | 1 | < 0.1%
id0003 | 1 | < 0.1%
id0004 | 1 | < 0.1%
id0005 | 1 | < 0.1%
id0006 | 1 | < 0.1%
id0007 | 1 | < 0.1%
id0008 | 1 | < 0.1%
Other values (4053) | 4053 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
I | 4063 | 16.7%
D | 4063 | 16.7%
0 | 2277 | 9.3%
3 | 2217 | 9.1%
2 | 2217 | 9.1%
1 | 2217 | 9.1%
4 | 1280 | 5.3%
5 | 1216 | 5.0%
6 | 1210 | 5.0%
7 | 1206 | 4.9%
Other values (2) | 2412 | 9.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 16252 | 66.7%
Uppercase Letter | 8126 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2277 | 14.0%
3 | 2217 | 13.6%
2 | 2217 | 13.6%
1 | 2217 | 13.6%
4 | 1280 | 7.9%
5 | 1216 | 7.5%
6 | 1210 | 7.4%
7 | 1206 | 7.4%
8 | 1206 | 7.4%
9 | 1206 | 7.4%

Uppercase Letter
Value | Count | Frequency (%)
I | 4063 | 50.0%
D | 4063 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16252 | 66.7%
Latin | 8126 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2277 | 14.0%
3 | 2217 | 13.6%
2 | 2217 | 13.6%
1 | 2217 | 13.6%
4 | 1280 | 7.9%
5 | 1216 | 7.5%
6 | 1210 | 7.4%
7 | 1206 | 7.4%
8 | 1206 | 7.4%
9 | 1206 | 7.4%

Latin
Value | Count | Frequency (%)
I | 4063 | 50.0%
D | 4063 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 24378 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
I | 4063 | 16.7%
D | 4063 | 16.7%
0 | 2277 | 9.3%
3 | 2217 | 9.1%
2 | 2217 | 9.1%
1 | 2217 | 9.1%
4 | 1280 | 5.3%
5 | 1216 | 5.0%
6 | 1210 | 5.0%
7 | 1206 | 4.9%
Other values (2) | 2412 | 9.9%

no_of_electricity_meters
Real number (ℝ)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0762983
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 7
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.31628519
Coefficient of variation (CV): 0.29386387
Kurtosis: 51.158272
Mean: 1.0762983
Median Absolute Deviation (MAD): 0
Skewness: 5.6452849
Sum: 4373
Variance: 0.10003632
Monotonicity: Not monotonic
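
Several of these descriptive statistics can be recomputed from the value counts listed for this variable. The sketch below uses only the Python standard library and assumes the reported variance is the sample variance (n - 1 denominator), which is what the figures are consistent with; it is a cross-check, not the profiler's implementation.

```python
import math
import statistics

# Value counts copied from the frequency table for no_of_electricity_meters.
value_counts = {1: 3799, 2: 225, 3: 36, 4: 1, 5: 1, 7: 1}

# Expand the counts back into one observation per household.
values = [v for v, n in value_counts.items() for _ in range(n)]

total = sum(values)
mean = statistics.fmean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation

print(f"sum={total} mean={mean:.7f} median={median}")
print(f"variance={variance:.8f} std={std:.8f} cv={cv:.8f}")
```

The recomputed sum, mean, variance, standard deviation and CV agree with the reported values to the printed precision.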
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
1 | 3799 | 93.5%
2 | 225 | 5.5%
3 | 36 | 0.9%
5 | 1 | < 0.1%
7 | 1 | < 0.1%
4 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 3799 | 93.5%
2 | 225 | 5.5%
3 | 36 | 0.9%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
7 | 1 | < 0.1%

Value | Count | Frequency (%)
7 | 1 | < 0.1%
5 | 1 | < 0.1%
4 | 1 | < 0.1%
3 | 36 | 0.9%
2 | 225 | 5.5%
1 | 3799 | 93.5%

electricity_provider_csc_area
Categorical

High correlation 

Distinct: 23
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB

Value | Count
MORATUWA NORTH | 533
MORATUWA SOUTH | 370
PANADURA | 357
GALLE | 216
KESELWATTA | 206
Other values (18) | 2381

Length

Max length: 14
Median length: 11
Mean length: 9.6475511
Min length: 5

Characters and Unicode

Total characters: 39,198
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GALLE
2nd row: GALLE
3rd row: GALLE
4th row: BORALASGAMUWA
5th row: KOLONNAWA

Common Values

Value | Count | Frequency (%)
MORATUWA NORTH | 533 | 13.1%
MORATUWA SOUTH | 370 | 9.1%
PANADURA | 357 | 8.8%
GALLE | 216 | 5.3%
KESELWATTA | 206 | 5.1%
MAHARAGAMA | 202 | 5.0%
PAYAGALA | 196 | 4.8%
KALUTARA | 189 | 4.7%
HIKKADUWA | 163 | 4.0%
ALUTHGAMA | 158 | 3.9%
Other values (13) | 1473 | 36.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
moratuwa 903
18.2%
north 533
 
10.7%
south 370
 
7.5%
panadura 357
 
7.2%
galle 216
 
4.3%
keselwatta 206
 
4.1%
maharagama 202
 
4.1%
payagala 196
 
3.9%
kalutara 189
 
3.8%
hikkaduwa 163
 
3.3%
Other values (14) 1631
32.8%

Most occurring characters

ValueCountFrequency (%)
A 10055
25.7%
T 3417
 
8.7%
O 2914
 
7.4%
U 2568
 
6.6%
R 2454
 
6.3%
M 2125
 
5.4%
L 1849
 
4.7%
W 1788
 
4.6%
N 1693
 
4.3%
H 1565
 
4.0%
Other values (12) 8770
22.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 38065
97.1%
Space Separator 903
 
2.3%
Dash Punctuation 230
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 10055
26.4%
T 3417
 
9.0%
O 2914
 
7.7%
U 2568
 
6.7%
R 2454
 
6.4%
M 2125
 
5.6%
L 1849
 
4.9%
W 1788
 
4.7%
N 1693
 
4.4%
H 1565
 
4.1%
Other values (10) 7637
20.1%
Space Separator
ValueCountFrequency (%)
903
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 230
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 38065
97.1%
Common 1133
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 10055
26.4%
T 3417
 
9.0%
O 2914
 
7.7%
U 2568
 
6.7%
R 2454
 
6.4%
M 2125
 
5.6%
L 1849
 
4.9%
W 1788
 
4.7%
N 1693
 
4.4%
H 1565
 
4.1%
Other values (10) 7637
20.1%
Common
ValueCountFrequency (%)
903
79.7%
- 230
 
20.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39198
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 10055
25.7%
T 3417
 
8.7%
O 2914
 
7.4%
U 2568
 
6.6%
R 2454
 
6.3%
M 2125
 
5.4%
L 1849
 
4.7%
W 1788
 
4.6%
N 1693
 
4.3%
H 1565
 
4.0%
Other values (12) 8770
22.4%

own_the_house_or_living_on_rent
Categorical

High correlation  Imbalance 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB

Value | Count
Yes, I or a household member owns it. | 3527
No, I am living on rent and the rent is paid by me or a household member. | 482
No, I or any household member does not own or rent this household. We occupy this household without any payment of rent. | 50
No, I am living on rent and the rent is paid by the employer. | 4

Length

Max length: 120
Median length: 37
Mean length: 42.315777
Min length: 37

Characters and Unicode

Total characters: 171,929
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes, I or a household member owns it.
2nd row: Yes, I or a household member owns it.
3rd row: Yes, I or a household member owns it.
4th row: Yes, I or a household member owns it.
5th row: Yes, I or a household member owns it.

Common Values

Value | Count | Frequency (%)
Yes, I or a household member owns it. | 3527 | 86.8%
No, I am living on rent and the rent is paid by me or a household member. | 482 | 11.9%
No, I or any household member does not own or rent this household. We occupy this household without any payment of rent. | 50 | 1.2%
No, I am living on rent and the rent is paid by the employer. | 4 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
household 4159
11.1%
or 4109
10.9%
i 4063
10.8%
member 4059
10.8%
a 4009
10.7%
yes 3527
9.4%
owns 3527
9.4%
it 3527
9.4%
rent 1072
 
2.9%
no 536
 
1.4%
Other values (20) 4978
13.3%

Most occurring characters

ValueCountFrequency (%)
33503
19.5%
e 18006
 
10.5%
o 17280
 
10.1%
s 11849
 
6.9%
r 9244
 
5.4%
m 9140
 
5.3%
h 8958
 
5.2%
n 6307
 
3.7%
i 5621
 
3.3%
a 5617
 
3.3%
Other values (18) 46404
27.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 122074
71.0%
Space Separator 33503
 
19.5%
Other Punctuation 8176
 
4.8%
Uppercase Letter 8176
 
4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 18006
14.8%
o 17280
14.2%
s 11849
9.7%
r 9244
 
7.6%
m 9140
 
7.5%
h 8958
 
7.3%
n 6307
 
5.2%
i 5621
 
4.6%
a 5617
 
4.6%
t 5389
 
4.4%
Other values (11) 24663
20.2%
Uppercase Letter
ValueCountFrequency (%)
I 4063
49.7%
Y 3527
43.1%
N 536
 
6.6%
W 50
 
0.6%
Other Punctuation
ValueCountFrequency (%)
. 4113
50.3%
, 4063
49.7%
Space Separator
ValueCountFrequency (%)
33503
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 130250
75.8%
Common 41679
 
24.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 18006
13.8%
o 17280
13.3%
s 11849
 
9.1%
r 9244
 
7.1%
m 9140
 
7.0%
h 8958
 
6.9%
n 6307
 
4.8%
i 5621
 
4.3%
a 5617
 
4.3%
t 5389
 
4.1%
Other values (15) 32839
25.2%
Common
ValueCountFrequency (%)
33503
80.4%
. 4113
 
9.9%
, 4063
 
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 171929
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
33503
19.5%
e 18006
 
10.5%
o 17280
 
10.1%
s 11849
 
6.9%
r 9244
 
5.4%
m 9140
 
5.3%
h 8958
 
5.2%
n 6307
 
3.7%
i 5621
 
3.3%
a 5617
 
3.3%
Other values (18) 46404
27.0%

occupy_renters_boarders
Categorical

High correlation  Imbalance  Missing 

Distinct: 3
Distinct (%): 0.1%
Missing: 536
Missing (%): 13.2%
Memory size: 31.9 KiB

Value | Count
I don't occupy any of the above. | 3423
Renters / boarders who are living in your annexe or any other attached place, maintaining separate living conditions but share the same electricity meter. | 72
Boarders who live in your house using a room/s that is attached to your living conditions. | 32

Length

Max length: 154
Median length: 32
Mean length: 35.016728
Min length: 32

Characters and Unicode

Total characters: 123,504
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I don't occupy any of the above.
2nd row: I don't occupy any of the above.
3rd row: I don't occupy any of the above.
4th row: I don't occupy any of the above.
5th row: I don't occupy any of the above.

Common Values

Value | Count | Frequency (%)
I don't occupy any of the above. | 3423 | 84.2%
Renters / boarders who are living in your annexe or any other attached place, maintaining separate living conditions but share the same electricity meter. | 72 | 1.8%
Boarders who live in your house using a room/s that is attached to your living conditions. | 32 | 0.8%
(Missing) | 536 | 13.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
any 3495
13.3%
the 3495
13.3%
don't 3423
13.1%
i 3423
13.1%
occupy 3423
13.1%
of 3423
13.1%
above 3423
13.1%
living 176
 
0.7%
your 136
 
0.5%
in 104
 
0.4%
Other values (26) 1680
6.4%

Most occurring characters

ValueCountFrequency (%)
22674
18.4%
o 14516
11.8%
e 8270
 
6.7%
a 7942
 
6.4%
t 7902
 
6.4%
n 7870
 
6.4%
c 7270
 
5.9%
y 7126
 
5.8%
h 3911
 
3.2%
d 3735
 
3.0%
Other values (20) 32288
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90177
73.0%
Space Separator 22674
 
18.4%
Other Punctuation 7126
 
5.8%
Uppercase Letter 3527
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 14516
16.1%
e 8270
9.2%
a 7942
8.8%
t 7902
8.8%
n 7870
8.7%
c 7270
8.1%
y 7126
 
7.9%
h 3911
 
4.3%
d 3735
 
4.1%
u 3695
 
4.1%
Other values (12) 17940
19.9%
Other Punctuation
ValueCountFrequency (%)
. 3527
49.5%
' 3423
48.0%
/ 104
 
1.5%
, 72
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
I 3423
97.1%
R 72
 
2.0%
B 32
 
0.9%
Space Separator
ValueCountFrequency (%)
22674
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 93704
75.9%
Common 29800
 
24.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 14516
15.5%
e 8270
 
8.8%
a 7942
 
8.5%
t 7902
 
8.4%
n 7870
 
8.4%
c 7270
 
7.8%
y 7126
 
7.6%
h 3911
 
4.2%
d 3735
 
4.0%
u 3695
 
3.9%
Other values (15) 21467
22.9%
Common
ValueCountFrequency (%)
22674
76.1%
. 3527
 
11.8%
' 3423
 
11.5%
/ 104
 
0.3%
, 72
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 123504
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22674
18.4%
o 14516
11.8%
e 8270
 
6.7%
a 7942
 
6.4%
t 7902
 
6.4%
n 7870
 
6.4%
c 7270
 
5.9%
y 7126
 
5.8%
h 3911
 
3.2%
d 3735
 
3.0%
Other values (20) 32288
26.1%

awareness_of_electricity_consumption_of_renters
Categorical

Constant  Missing 

Distinct: 1
Distinct (%): 1.0%
Missing: 3959
Missing (%): 97.4%
Memory size: 31.9 KiB

Value | Count
I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc. | 104

Length

Max length: 218
Median length: 218
Mean length: 218
Min length: 218

Characters and Unicode

Total characters: 22,672
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
2nd row: I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
3rd row: I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
4th row: I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.
5th row: I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc.

Common Values

Value | Count | Frequency (%)
I know all the details about the electricity consumption of the renters/ boarders; i.e.; the appliances they use and the number of hours they use each appliance, the times they keep the lights and fans switched on etc. | 104 | 2.6%
(Missing) | 3959 | 97.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
the 728
18.4%
they 312
 
7.9%
of 208
 
5.3%
use 208
 
5.3%
and 208
 
5.3%
all 104
 
2.6%
details 104
 
2.6%
electricity 104
 
2.6%
know 104
 
2.6%
about 104
 
2.6%
Other values (17) 1768
44.7%

Most occurring characters

ValueCountFrequency (%)
3848
17.0%
e 2912
12.8%
t 2080
 
9.2%
h 1456
 
6.4%
a 1248
 
5.5%
s 1248
 
5.5%
n 1144
 
5.0%
i 1040
 
4.6%
o 936
 
4.1%
c 832
 
3.7%
Other values (17) 5928
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17992
79.4%
Space Separator 3848
 
17.0%
Other Punctuation 728
 
3.2%
Uppercase Letter 104
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2912
16.2%
t 2080
11.6%
h 1456
 
8.1%
a 1248
 
6.9%
s 1248
 
6.9%
n 1144
 
6.4%
i 1040
 
5.8%
o 936
 
5.2%
c 832
 
4.6%
l 728
 
4.0%
Other values (11) 4368
24.3%
Other Punctuation
ValueCountFrequency (%)
. 312
42.9%
; 208
28.6%
/ 104
 
14.3%
, 104
 
14.3%
Space Separator
ValueCountFrequency (%)
3848
100.0%
Uppercase Letter
ValueCountFrequency (%)
I 104
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 18096
79.8%
Common 4576
 
20.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2912
16.1%
t 2080
11.5%
h 1456
 
8.0%
a 1248
 
6.9%
s 1248
 
6.9%
n 1144
 
6.3%
i 1040
 
5.7%
o 936
 
5.2%
c 832
 
4.6%
l 728
 
4.0%
Other values (12) 4472
24.7%
Common
ValueCountFrequency (%)
3848
84.1%
. 312
 
6.8%
; 208
 
4.5%
/ 104
 
2.3%
, 104
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 22672
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3848
17.0%
e 2912
12.8%
t 2080
 
9.2%
h 1456
 
6.4%
a 1248
 
5.5%
s 1248
 
5.5%
n 1144
 
5.0%
i 1040
 
4.6%
o 936
 
4.1%
c 832
 
3.7%
Other values (17) 5928
26.1%
Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.9 KiB

Value | Count
2000-2009 | 918
2010-2019 | 758
Before 1980 | 740
1990-1999 | 615
1980-1989 | 482
Other values (2) | 550

Length

Max length: 21
Median length: 9
Mean length: 10.108787
Min length: 9

Characters and Unicode

Total characters: 41,072
Distinct characters: 20
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2000-2009
2nd row: Before 1980
3rd row: 1980-1989
4th row: 2010-2019
5th row: 2010-2019

Common Values

Value | Count | Frequency (%)
2000-2009 | 918 | 22.6%
2010-2019 | 758 | 18.7%
Before 1980 | 740 | 18.2%
1990-1999 | 615 | 15.1%
1980-1989 | 482 | 11.9%
Don't know | 325 | 8.0%
In 2020 or After 2020 | 225 | 5.5%

Length

Histogram of lengths of the category

Common Values (Plot)
