Overview

Brought to you by YData

Dataset statistics

Number of variables: 16
Number of observations: 16,270
Missing cells: 45,187
Missing cells (%): 17.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.0 MiB
Average record size in memory: 128.0 B

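The missing-cell percentage is the missing count divided by every cell in the table (16 variables x 16,270 observations = 260,320 cells). The seven per-column counts listed under Alerts add up exactly to the reported 45,187, so no other column appears to have missing values. The sketch below checks both figures using only numbers copied from this report.

```python
# Counts copied from this report: "Dataset statistics" and the "Missing" alerts.
N_VARIABLES = 16
N_OBSERVATIONS = 16_270
MISSING_BY_COLUMN = {
    "current_attendance_in_any_education_instituition": 418,
    "highest_level_of_education": 775,
    "main_activity_engaged_in": 2133,
    "main_occupation": 9902,
    "daily_wage_owner_or_not": 10090,
    "employment_status_of_the_main_occupation": 9902,
    "member_went_out_for_work_or_not_during_last_week": 11967,
}


def missing_cell_percentage(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Missing cells as a percentage of all (variables x observations) cells."""
    return 100.0 * missing_cells / (n_variables * n_observations)


if __name__ == "__main__":
    missing_cells = sum(MISSING_BY_COLUMN.values())
    print(missing_cells)
    print(f"{missing_cell_percentage(missing_cells, N_VARIABLES, N_OBSERVATIONS):.1f}%")
```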
Variable types

Text: 1
Categorical: 12
Numeric: 2
Boolean: 1

Alerts

employment_status_of_the_main_occupation is highly overall correlated with main_activity_engaged_in [High correlation]
ethnicity is highly overall correlated with religion [High correlation]
gender is highly overall correlated with relationship_to_the_head_of_household [High correlation]
main_activity_engaged_in is highly overall correlated with employment_status_of_the_main_occupation [High correlation]
relationship_to_the_head_of_household is highly overall correlated with gender [High correlation]
religion is highly overall correlated with ethnicity [High correlation]
ethnicity is highly imbalanced (69.9%) [Imbalance]
current_attendance_in_any_education_instituition is highly imbalanced (56.1%) [Imbalance]
current_attendance_in_any_education_instituition has 418 (2.6%) missing values [Missing]
highest_level_of_education has 775 (4.8%) missing values [Missing]
main_activity_engaged_in has 2133 (13.1%) missing values [Missing]
main_occupation has 9902 (60.9%) missing values [Missing]
daily_wage_owner_or_not has 10090 (62.0%) missing values [Missing]
employment_status_of_the_main_occupation has 9902 (60.9%) missing values [Missing]
member_went_out_for_work_or_not_during_last_week has 11967 (73.6%) missing values [Missing]
no_of_hours_stayed_at_home_during_last_week has 699 (4.3%) zeros [Zeros]

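Each "High correlation" alert above pairs two categorical variables, for example gender and relationship_to_the_head_of_household. This excerpt does not say which statistic or threshold triggered them. For categorical pairs, one common measure is Cramér's V, computed from a contingency table of counts. The sketch below implements the plain version without bias correction, using only the standard library. It illustrates the idea; it is not ydata-profiling's exact computation, and the toy tables are hypothetical.

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[int]]) -> float:
    """Bias-uncorrected Cramér's V for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical toy tables, not counts from this survey.
    print(cramers_v([[10, 0], [0, 10]]))  # 1.0: each row category fixes the column
    print(cramers_v([[5, 5], [5, 5]]))    # 0.0: no association
```

A value near 1 means one variable almost determines the other. That would fit the report's pairing of gender with relationship_to_the_head_of_household, whose categories such as "Wife/Husband" are partly gender-specific in practice.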
Reproduction

Analysis started: 2024-12-06 05:54:26.223750
Analysis finished: 2024-12-06 05:54:28.327003
Duration: 2.1 seconds
Software version: ydata-profiling v4.11.0

Variables

Distinct: 4063
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 97,620
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

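The report does not say how it assigns these categories. Python's standard `unicodedata.category` returns the same two-letter Unicode general categories (for example "Lu" and "Nd"). The sketch below counts characters per category and maps the codes that appear in this report to their long names. For values shaped like ID0001, it reproduces the 2:1 split between Decimal Number and Uppercase Letter shown for the first variable.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts(["ID0001", "ID0001", "ID0002"]))
```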
Unique

Unique: 186
Unique (%): 1.1%

Sample

1st row: ID0001
2nd row: ID0001
3rd row: ID0001
4th row: ID0001
5th row: ID0002
Value | Count | Frequency (%)
id0349 | 13 | 0.1%
id3438 | 13 | 0.1%
id0849 | 12 | 0.1%
id1781 | 12 | 0.1%
id2880 | 12 | 0.1%
id0939 | 12 | 0.1%
id3013 | 11 | 0.1%
id0699 | 11 | 0.1%
id2896 | 11 | 0.1%
id2341 | 11 | 0.1%
Other values (4053) | 16152 | 99.3%

Most occurring characters

ValueCountFrequency (%)
I 16270
16.7%
D 16270
16.7%
0 9274
9.5%
1 8926
9.1%
2 8804
9.0%
3 8734
8.9%
4 5068
 
5.2%
8 5009
 
5.1%
6 4859
 
5.0%
5 4827
 
4.9%
Other values (2) 9579
9.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 65080
66.7%
Uppercase Letter 32540
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 9274
14.3%
1 8926
13.7%
2 8804
13.5%
3 8734
13.4%
4 5068
7.8%
8 5009
7.7%
6 4859
7.5%
5 4827
7.4%
7 4802
7.4%
9 4777
7.3%
Uppercase Letter
ValueCountFrequency (%)
I 16270
50.0%
D 16270
50.0%

Most occurring scripts

ValueCountFrequency (%)
Common 65080
66.7%
Latin 32540
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 9274
14.3%
1 8926
13.7%
2 8804
13.5%
3 8734
13.4%
4 5068
7.8%
8 5009
7.7%
6 4859
7.5%
5 4827
7.4%
7 4802
7.4%
9 4777
7.3%
Latin
ValueCountFrequency (%)
I 16270
50.0%
D 16270
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 97620
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 16270
16.7%
D 16270
16.7%
0 9274
9.5%
1 8926
9.1%
2 8804
9.0%
3 8734
8.9%
4 5068
 
5.2%
8 5009
 
5.1%
6 4859
 
5.0%
5 4827
 
4.9%
Other values (2) 9579
9.8%

member_ID
Categorical

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
I1: 4063
I2: 3877
I3: 3275
I4: 2457
I5: 1443
Other values (8): 1155

Length

Max length: 3
Median length: 2
Mean length: 2.0027658
Min length: 2

Characters and Unicode

Total characters: 32,585
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I1
2nd row: I2
3rd row: I3
4th row: I4
5th row: I1

Common Values

Value | Count | Frequency (%)
I1 | 4063 | 25.0%
I2 | 3877 | 23.8%
I3 | 3275 | 20.1%
I4 | 2457 | 15.1%
I5 | 1443 | 8.9%
I6 | 671 | 4.1%
I7 | 264 | 1.6%
I8 | 120 | 0.7%
I9 | 55 | 0.3%
I10 | 26 | 0.2%
Other values (3) | 19 | 0.1%

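These member_ID counts can be read as a household-size distribution under one assumption that this excerpt does not state: members are numbered I1, I2, ... within each household. In that case, the count of Ik is the number of households with at least k members. The difference between consecutive counts then gives the number of households with exactly k members.

The data are consistent with this reading. I1 occurs 4,063 times, matching the 4,063 distinct values of the first variable (ID0001, ...). Also, 4,063 - 3,877 = 186 single-member households, which matches that variable's 186 unique values. The helper name and constants below are illustrative.

```python
from typing import Dict

# member_ID counts from the Common Values table above. I11-I13 are grouped
# there as "Other values (3)", so sizes are only resolved up to 9 members.
MEMBER_ID_COUNTS = {1: 4063, 2: 3877, 3: 3275, 4: 2457, 5: 1443,
                    6: 671, 7: 264, 8: 120, 9: 55, 10: 26}


def households_by_exact_size(at_least: Dict[int, int]) -> Dict[int, int]:
    """Convert 'households with >= k members' into 'households with exactly k'.

    Assumption (not stated in the report): members are numbered consecutively
    from I1 within each household, so the count of Ik equals the number of
    households that have at least k members.
    """
    return {
        k: at_least[k] - at_least[k + 1]
        for k in sorted(at_least)
        if k + 1 in at_least
    }


if __name__ == "__main__":
    for size, households in households_by_exact_size(MEMBER_ID_COUNTS).items():
        print(f"{size} member(s): {households} households")
```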
Length

Histogram of lengths of the category
ValueCountFrequency (%)
i1 4063
25.0%
i2 3877
23.8%
i3 3275
20.1%
i4 2457
15.1%
i5 1443
 
8.9%
i6 671
 
4.1%
i7 264
 
1.6%
i8 120
 
0.7%
i9 55
 
0.3%
i10 26
 
0.2%
Other values (3) 19
 
0.1%

Most occurring characters

ValueCountFrequency (%)
I 16270
49.9%
1 4119
 
12.6%
2 3883
 
11.9%
3 3277
 
10.1%
4 2457
 
7.5%
5 1443
 
4.4%
6 671
 
2.1%
7 264
 
0.8%
8 120
 
0.4%
9 55
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 16315
50.1%
Uppercase Letter 16270
49.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 4119
25.2%
2 3883
23.8%
3 3277
20.1%
4 2457
15.1%
5 1443
 
8.8%
6 671
 
4.1%
7 264
 
1.6%
8 120
 
0.7%
9 55
 
0.3%
0 26
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
I 16270
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 16315
50.1%
Latin 16270
49.9%

Most frequent character per script

Common
ValueCountFrequency (%)
1 4119
25.2%
2 3883
23.8%
3 3277
20.1%
4 2457
15.1%
5 1443
 
8.8%
6 671
 
4.1%
7 264
 
1.6%
8 120
 
0.7%
9 55
 
0.3%
0 26
 
0.2%
Latin
ValueCountFrequency (%)
I 16270
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32585
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 16270
49.9%
1 4119
 
12.6%
2 3883
 
11.9%
3 3277
 
10.1%
4 2457
 
7.5%
5 1443
 
4.4%
6 671
 
2.1%
7 264
 
0.8%
8 120
 
0.4%
9 55
 
0.2%

age
Real number (ℝ)

Distinct: 97
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.39287
Minimum: 0
Maximum: 98
Zeros: 149
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 5
Q1: 19
Median: 38
Q3: 56
95th percentile: 75
Maximum: 98
Range: 98
Interquartile range (IQR): 37

Descriptive statistics

Standard deviation: 22.075172
Coefficient of variation (CV): 0.57498103
Kurtosis: -0.99725102
Mean: 38.39287
Median Absolute Deviation (MAD): 18
Skewness: 0.14195376
Sum: 624652
Variance: 487.31322
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
19 | 282 | 1.7%
17 | 278 | 1.7%
23 | 276 | 1.7%
15 | 267 | 1.6%
18 | 266 | 1.6%
20 | 263 | 1.6%
16 | 256 | 1.6%
42 | 252 | 1.5%
22 | 251 | 1.5%
45 | 245 | 1.5%
Other values (87) | 13634 | 83.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 149 | 0.9%
1 | 131 | 0.8%
2 | 138 | 0.8%
3 | 186 | 1.1%
4 | 171 | 1.1%
5 | 177 | 1.1%
6 | 163 | 1.0%
7 | 164 | 1.0%
8 | 196 | 1.2%
9 | 198 | 1.2%

Maximum 10 values
Value | Count | Frequency (%)
98 | 1 | < 0.1%
96 | 1 | < 0.1%
95 | 3 | < 0.1%
93 | 8 | < 0.1%
92 | 5 | < 0.1%
91 | 7 | < 0.1%
90 | 17 | 0.1%
89 | 20 | 0.1%
88 | 16 | 0.1%
87 | 14 | 0.1%

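Most of the age statistics above can be cross-checked with the standard library. The sketch below assumes pandas-style conventions: sample standard deviation (n - 1 denominator), linearly interpolated quartiles, and MAD measured around the median. These are assumptions about how the report was computed, not something it states. The toy input is hypothetical. The final lines only confirm that the reported figures agree with each other (Sum / count = Mean, std / Mean = CV, std squared = Variance, Q3 - Q1 = IQR).

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics in the style of the report's numeric summary."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # "inclusive" matches linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mad": mad,
        "range": max(values) - min(values),
    }


if __name__ == "__main__":
    print(describe([1, 2, 3, 4, 5]))  # hypothetical toy data
    # Internal consistency of the reported age figures:
    print(624652 / 16270)        # Sum / count, compare with Mean 38.39287
    print(22.075172 / 38.39287)  # std / mean, compare with CV 0.57498103
    print(22.075172 ** 2)        # std squared, compare with Variance 487.31322
    print(56 - 19)               # Q3 - Q1, compare with IQR 37
```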
relationship_to_the_head_of_household
Categorical

High correlation 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Son/daughter: 5654
Head of the household: 4012
Wife/Husband: 3226
Parents of the head of the Household/ spouse: 1198
Other relative: 685
Other values (7): 1495

Length

Max length: 44
Median length: 12
Mean length: 17.506884
Min length: 5

Characters and Unicode

Total characters: 284,837
Distinct characters: 34
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Head of the household
2nd row: Wife/Husband
3rd row: Son/daughter
4th row: Son-in-law/Daughter in law
5th row: Head of the household

Common Values

Value | Count | Frequency (%)
Son/daughter | 5654 | 34.8%
Head of the household | 4012 | 24.7%
Wife/Husband | 3226 | 19.8%
Parents of the head of the Household/ spouse | 1198 | 7.4%
Other relative | 685 | 4.2%
Grandson/ Granddaughter | 666 | 4.1%
Son-in-law/Daughter in law | 434 | 2.7%
Boarder | 237 | 1.5%
Domestic servant/driver/watcher | 101 | 0.6%
Other | 52 | 0.3%
Other values (2) | 5 | < 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
of 6408
16.4%
the 6408
16.4%
son/daughter 5654
14.5%
head 5210
13.4%
household 5210
13.4%
wife/husband 3226
8.3%
parents 1198
 
3.1%
spouse 1198
 
3.1%
other 737
 
1.9%
relative 685
 
1.8%
Other values (13) 3086
7.9%

Most occurring characters

ValueCountFrequency (%)
e 31961
 
11.2%
o 25125
 
8.8%
h 24420
 
8.6%
22750
 
8.0%
d 21639
 
7.6%
a 19715
 
6.9%
u 16391
 
5.8%
t 16090
 
5.6%
n 13486
 
4.7%
s 12904
 
4.5%
Other values (24) 80356
28.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 228043
80.1%
Space Separator 22750
 
8.0%
Uppercase Letter 21794
 
7.7%
Other Punctuation 11382
 
4.0%
Dash Punctuation 868
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 31961
14.0%
o 25125
11.0%
h 24420
10.7%
d 21639
9.5%
a 19715
8.6%
u 16391
7.2%
t 16090
7.1%
n 13486
 
5.9%
s 12904
 
5.7%
r 11587
 
5.1%
Other values (11) 34725
15.2%
Uppercase Letter
ValueCountFrequency (%)
H 8436
38.7%
S 6088
27.9%
W 3226
 
14.8%
G 1332
 
6.1%
P 1198
 
5.5%
O 737
 
3.4%
D 537
 
2.5%
B 237
 
1.1%
R 3
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 11380
> 99.9%
' 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
22750
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 868
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 249837
87.7%
Common 35000
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 31961
12.8%
o 25125
10.1%
h 24420
9.8%
d 21639
 
8.7%
a 19715
 
7.9%
u 16391
 
6.6%
t 16090
 
6.4%
n 13486
 
5.4%
s 12904
 
5.2%
r 11587
 
4.6%
Other values (20) 56519
22.6%
Common
ValueCountFrequency (%)
22750
65.0%
/ 11380
32.5%
- 868
 
2.5%
' 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 284837
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 31961
 
11.2%
o 25125
 
8.8%
h 24420
 
8.6%
22750
 
8.0%
d 21639
 
7.6%
a 19715
 
6.9%
u 16391
 
5.8%
t 16090
 
5.6%
n 13486
 
4.7%
s 12904
 
4.5%
Other values (24) 80356
28.2%

gender
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Female: 8386
Male: 7884

Length

Max length: 6
Median length: 6
Mean length: 5.0308543
Min length: 4

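The length figures are count-weighted statistics over all rows. For gender, 8,386 values of length 6 ("Female") and 7,884 of length 4 ("Male") give a mean of 81,852 / 16,270 ≈ 5.0308543, where 81,852 is the reported total character count, and a median of 6. The sketch below computes the same figures from a value-to-count mapping without expanding it into one row per observation. Using the mean of the two middle values for an even row count is an assumption that matches the pandas convention.

```python
from typing import Dict


def length_stats(value_counts: Dict[str, int]) -> Dict[str, float]:
    """Min/median/mean/max string length, weighting each value by its count."""
    # Sorted (length, count) pairs let us find order statistics without expanding rows.
    pairs = sorted((len(value), count) for value, count in value_counts.items())
    n = sum(count for _, count in pairs)

    def nth_smallest(k: int) -> int:
        """Length of the k-th shortest row (0-indexed)."""
        seen = 0
        for length, count in pairs:
            seen += count
            if k < seen:
                return length
        raise IndexError(k)

    if n % 2:
        median = float(nth_smallest(n // 2))
    else:
        median = (nth_smallest(n // 2 - 1) + nth_smallest(n // 2)) / 2
    total = sum(length * count for length, count in pairs)
    return {"min": pairs[0][0], "median": median, "mean": total / n, "max": pairs[-1][0]}


if __name__ == "__main__":
    print(length_stats({"Female": 8386, "Male": 7884}))
```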
Characters and Unicode

Total characters: 81,852
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Female
3rd row: Male
4th row: Female
5th row: Male

Common Values

Value | Count | Frequency (%)
Female | 8386 | 51.5%
Male | 7884 | 48.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female 8386
51.5%
male 7884
48.5%

Most occurring characters

ValueCountFrequency (%)
e 24656
30.1%
a 16270
19.9%
l 16270
19.9%
F 8386
 
10.2%
m 8386
 
10.2%
M 7884
 
9.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 65582
80.1%
Uppercase Letter 16270
 
19.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 24656
37.6%
a 16270
24.8%
l 16270
24.8%
m 8386
 
12.8%
Uppercase Letter
ValueCountFrequency (%)
F 8386
51.5%
M 7884
48.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 81852
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 24656
30.1%
a 16270
19.9%
l 16270
19.9%
F 8386
 
10.2%
m 8386
 
10.2%
M 7884
 
9.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 81852
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 24656
30.1%
a 16270
19.9%
l 16270
19.9%
F 8386
 
10.2%
m 8386
 
10.2%
M 7884
 
9.6%

ethnicity
Categorical

High correlation, Imbalance

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Sinhala: 13560
Sri Lankan Moor/Muslim: 1968
Sri Lankan Tamil: 572
Indian Tamil: 82
Malay: 42
Other values (2): 46

Length

Max length: 22
Median length: 7
Mean length: 9.1491088
Min length: 5

Characters and Unicode

Total characters: 148,856
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Sinhala
2nd row: Sinhala
3rd row: Sinhala
4th row: Sinhala
5th row: Sinhala

Common Values

Value | Count | Frequency (%)
Sinhala | 13560 | 83.3%
Sri Lankan Moor/Muslim | 1968 | 12.1%
Sri Lankan Tamil | 572 | 3.5%
Indian Tamil | 82 | 0.5%
Malay | 42 | 0.3%
Burgher | 32 | 0.2%
Other | 14 | 0.1%

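The "Imbalance" alerts (69.9% for ethnicity, 56.1% for current_attendance_in_any_education_instituition) fit an entropy-based score. The score is one minus the Shannon entropy of the category frequencies, divided by its maximum possible value, log2 of the number of categories. That definition is an assumption about how the report computes the score; the report does not state it. Still, this standard-library sketch reproduces both reported percentages from the counts in this report, with missing values excluded.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """0 for a perfectly uniform distribution, approaching 1 as one class dominates."""
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        return 0.0
    total = sum(nonzero)
    entropy = -sum((c / total) * math.log2(c / total) for c in nonzero)
    return 1.0 - entropy / math.log2(len(nonzero))


if __name__ == "__main__":
    ethnicity = [13560, 1968, 572, 82, 42, 32, 14]
    attendance = [11435, 2964, 560, 298, 233, 191, 105, 66]  # "(Missing)" left out
    print(f"ethnicity: {imbalance_score(ethnicity):.1%}")    # 69.9%
    print(f"attendance: {imbalance_score(attendance):.1%}")  # 56.1%
```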
Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
sinhala 13560
63.3%
sri 2540
 
11.9%
lankan 2540
 
11.9%
moor/muslim 1968
 
9.2%
tamil 654
 
3.1%
indian 82
 
0.4%
malay 42
 
0.2%
burgher 32
 
0.1%
other 14
 
0.1%

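The lower-case word table above fits a simple tokenisation: lower-case each value, split it on whitespace, and count each token once per row. For example, "sri" and "lankan" each appear 2,540 times because both "Sri Lankan Moor/Muslim" (1,968) and "Sri Lankan Tamil" (572) contain them. Likewise, "tamil" appears 654 times (572 + 82).

The sketch below also strips leading and trailing ASCII punctuation from each token. That would explain forms such as "g.c.e.(a/l" in the highest_level_of_education word table. Both that step and the procedure as a whole are assumptions about how the report tokenises text.

```python
import string
from collections import Counter
from typing import Dict


def word_counts(value_counts: Dict[str, int]) -> Counter:
    """Token counts, weighting each distinct value by how many rows contain it."""
    counts: Counter = Counter()
    for value, n in value_counts.items():
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += n
    return counts


if __name__ == "__main__":
    ethnicity = {"Sinhala": 13560, "Sri Lankan Moor/Muslim": 1968, "Sri Lankan Tamil": 572,
                 "Indian Tamil": 82, "Malay": 42, "Burgher": 32, "Other": 14}
    for word, n in word_counts(ethnicity).most_common():
        print(word, n)
```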
Most occurring characters

ValueCountFrequency (%)
a 33020
22.2%
n 18804
12.6%
i 18804
12.6%
l 16224
10.9%
S 16100
10.8%
h 13606
9.1%
5162
 
3.5%
r 4586
 
3.1%
M 3978
 
2.7%
o 3936
 
2.6%
Other values (15) 14636
9.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 118326
79.5%
Uppercase Letter 23400
 
15.7%
Space Separator 5162
 
3.5%
Other Punctuation 1968
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 33020
27.9%
n 18804
15.9%
i 18804
15.9%
l 16224
13.7%
h 13606
11.5%
r 4586
 
3.9%
o 3936
 
3.3%
m 2622
 
2.2%
k 2540
 
2.1%
u 2000
 
1.7%
Other values (6) 2184
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
S 16100
68.8%
M 3978
 
17.0%
L 2540
 
10.9%
T 654
 
2.8%
I 82
 
0.4%
B 32
 
0.1%
O 14
 
0.1%
Space Separator
ValueCountFrequency (%)
5162
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1968
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 141726
95.2%
Common 7130
 
4.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 33020
23.3%
n 18804
13.3%
i 18804
13.3%
l 16224
11.4%
S 16100
11.4%
h 13606
9.6%
r 4586
 
3.2%
M 3978
 
2.8%
o 3936
 
2.8%
m 2622
 
1.9%
Other values (13) 10046
 
7.1%
Common
ValueCountFrequency (%)
5162
72.4%
/ 1968
 
27.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 148856
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 33020
22.2%
n 18804
12.6%
i 18804
12.6%
l 16224
10.9%
S 16100
10.8%
h 13606
9.1%
5162
 
3.5%
r 4586
 
3.1%
M 3978
 
2.7%
o 3936
 
2.6%
Other values (15) 14636
9.8%

religion
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Buddhism: 10807
Roman Catholicism: 2442
Islam: 2053
Other Christian denominations: 547
Hinduism: 407
Other values (2): 14

Length

Max length: 29
Median length: 8
Mean length: 9.6779348
Min length: 5

Characters and Unicode

Total characters: 157,460
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Buddhism
2nd row: Buddhism
3rd row: Buddhism
4th row: Buddhism
5th row: Buddhism

Common Values

Value | Count | Frequency (%)
Buddhism | 10807 | 66.4%
Roman Catholicism | 2442 | 15.0%
Islam | 2053 | 12.6%
Other Christian denominations | 547 | 3.4%
Hinduism | 407 | 2.5%
Other | 8 | < 0.1%
No religion | 6 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
buddhism 10807
54.5%
roman 2442
 
12.3%
catholicism 2442
 
12.3%
islam 2053
 
10.4%
other 555
 
2.8%
christian 547
 
2.8%
denominations 547
 
2.8%
hinduism 407
 
2.1%
no 6
 
< 0.1%
religion 6
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
d 22568
14.3%
i 18705
11.9%
m 18698
11.9%
s 16803
10.7%
h 14351
9.1%
u 11214
7.1%
B 10807
6.9%
a 8031
 
5.1%
o 5990
 
3.8%
n 5043
 
3.2%
Other values (13) 25250
16.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 134659
85.5%
Uppercase Letter 19259
 
12.2%
Space Separator 3542
 
2.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 22568
16.8%
i 18705
13.9%
m 18698
13.9%
s 16803
12.5%
h 14351
10.7%
u 11214
8.3%
a 8031
 
6.0%
o 5990
 
4.4%
n 5043
 
3.7%
l 4501
 
3.3%
Other values (5) 8755
 
6.5%
Uppercase Letter
ValueCountFrequency (%)
B 10807
56.1%
C 2989
 
15.5%
R 2442
 
12.7%
I 2053
 
10.7%
O 555
 
2.9%
H 407
 
2.1%
N 6
 
< 0.1%
Space Separator
ValueCountFrequency (%)
3542
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 153918
97.8%
Common 3542
 
2.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 22568
14.7%
i 18705
12.2%
m 18698
12.1%
s 16803
10.9%
h 14351
9.3%
u 11214
7.3%
B 10807
7.0%
a 8031
 
5.2%
o 5990
 
3.9%
n 5043
 
3.3%
Other values (12) 21708
14.1%
Common
ValueCountFrequency (%)
3542
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 157460
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
d 22568
14.3%
i 18705
11.9%
m 18698
11.9%
s 16803
10.7%
h 14351
9.1%
u 11214
7.1%
B 10807
6.9%
a 8031
 
5.1%
o 5990
 
3.8%
n 5043
 
3.2%
Other values (13) 25250
16.0%

marital_status
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Currently married (registered): 7417
Never married: 6330
Currently married (customary): 1311
Widowed: 893
Other: 138
Other values (4): 181

Length

Max length: 33
Median length: 30
Mean length: 21.736755
Min length: 5

Characters and Unicode

Total characters: 353,657
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Currently married (registered)
2nd row: Currently married (registered)
3rd row: Currently married (registered)
4th row: Currently married (registered)
5th row: Currently married (registered)

Common Values

Value | Count | Frequency (%)
Currently married (registered) | 7417 | 45.6%
Never married | 6330 | 38.9%
Currently married (customary) | 1311 | 8.1%
Widowed | 893 | 5.5%
Other | 138 | 0.8%
Separated (not legally) | 64 | 0.4%
Not married but lives as a Family | 52 | 0.3%
Divorced | 44 | 0.3%
Legally separated | 21 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
married 15110
37.3%
currently 8728
21.5%
registered 7417
18.3%
never 6330
15.6%
customary 1311
 
3.2%
widowed 893
 
2.2%
other 138
 
0.3%
not 116
 
0.3%
separated 85
 
0.2%
legally 85
 
0.2%
Other values (6) 304
 
0.8%

Most occurring characters

ValueCountFrequency (%)
r 70418
19.9%
e 60131
17.0%
d 24442
 
6.9%
24247
 
6.9%
i 23568
 
6.7%
t 17847
 
5.0%
a 16832
 
4.8%
m 16473
 
4.7%
y 10176
 
2.9%
u 10091
 
2.9%
Other values (21) 79432
22.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 295504
83.6%
Space Separator 24247
 
6.9%
Uppercase Letter 16322
 
4.6%
Open Punctuation 8792
 
2.5%
Close Punctuation 8792
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 70418
23.8%
e 60131
20.3%
d 24442
 
8.3%
i 23568
 
8.0%
t 17847
 
6.0%
a 16832
 
5.7%
m 16473
 
5.6%
y 10176
 
3.4%
u 10091
 
3.4%
l 9066
 
3.1%
Other values (10) 36460
12.3%
Uppercase Letter
ValueCountFrequency (%)
C 8728
53.5%
N 6382
39.1%
W 893
 
5.5%
O 138
 
0.8%
S 64
 
0.4%
F 52
 
0.3%
D 44
 
0.3%
L 21
 
0.1%
Space Separator
ValueCountFrequency (%)
24247
100.0%
Open Punctuation
ValueCountFrequency (%)
( 8792
100.0%
Close Punctuation
ValueCountFrequency (%)
) 8792
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 311826
88.2%
Common 41831
 
11.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 70418
22.6%
e 60131
19.3%
d 24442
 
7.8%
i 23568
 
7.6%
t 17847
 
5.7%
a 16832
 
5.4%
m 16473
 
5.3%
y 10176
 
3.3%
u 10091
 
3.2%
l 9066
 
2.9%
Other values (18) 52782
16.9%
Common
ValueCountFrequency (%)
24247
58.0%
( 8792
 
21.0%
) 8792
 
21.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 353657
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 70418
19.9%
e 60131
17.0%
d 24442
 
6.9%
24247
 
6.9%
i 23568
 
6.7%
t 17847
 
5.0%
a 16832
 
4.8%
m 16473
 
4.7%
y 10176
 
2.9%
u 10091
 
2.9%
Other values (21) 79432
22.5%

current_attendance_in_any_education_instituition
Categorical

Imbalance, Missing

Distinct: 8
Distinct (%): 0.1%
Missing: 418
Missing (%): 2.6%
Memory size: 127.2 KiB
Does not attend: 11435
School: 2964
University: 560
Preschool: 298
Other educational institution: 233
Other values (3): 362

Length

Max length: 34
Median length: 15
Mean length: 13.564219
Min length: 6

Characters and Unicode

Total characters: 215,020
Distinct characters: 34
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does not attend
2nd row: Does not attend
3rd row: Does not attend
4th row: Does not attend
5th row: Does not attend

Common Values

Value | Count | Frequency (%)
Does not attend | 11435 | 70.3%
School | 2964 | 18.2%
University | 560 | 3.4%
Preschool | 298 | 1.8%
Other educational institution | 233 | 1.4%
Vocational/Technical Institution | 191 | 1.2%
Pending results G.C.E. (O.L / A.L) | 105 | 0.6%
Still a toddler | 66 | 0.4%
(Missing) | 418 | 2.6%

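The Frequency (%) column is computed over all 16,270 rows, and missing values get their own "(Missing)" row. So "Does not attend" is 11,435 / 16,270 = 70.3%, whereas dividing by the 15,852 non-missing rows would give about 72.1%. The sketch below illustrates that convention. The function name is hypothetical, and the order (non-missing values by descending count, then "(Missing)") mirrors the tables in this report.

```python
from typing import Dict, List, Optional, Tuple


def frequency_table(values: List[Optional[str]]) -> List[Tuple[str, int, float]]:
    """(value, count, percent of all rows), with None reported as "(Missing)"."""
    total = len(values)
    counts: Dict[str, int] = {}
    for v in values:
        key = "(Missing)" if v is None else v
        counts[key] = counts.get(key, 0) + 1
    # Non-missing values first, by descending count; "(Missing)" last.
    rows = sorted(counts.items(), key=lambda kv: (kv[0] == "(Missing)", -kv[1]))
    return [(key, n, 100.0 * n / total) for key, n in rows]


if __name__ == "__main__":
    # Hypothetical toy column, not survey rows.
    for row in frequency_table(["School", "School", None, "University"]):
        print(row)
```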
Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
does 11435
28.6%
not 11435
28.6%
attend 11435
28.6%
school 2964
 
7.4%
university 560
 
1.4%
institution 424
 
1.1%
preschool 298
 
0.7%
other 233
 
0.6%
educational 233
 
0.6%
vocational/technical 191
 
0.5%
Other values (9) 828
 
2.1%

Most occurring characters

ValueCountFrequency (%)
t 37031
17.2%
o 30499
14.2%
n 25103
11.7%
e 24661
11.5%
24184
11.2%
s 12927
 
6.0%
a 12540
 
5.8%
d 11905
 
5.5%
D 11435
 
5.3%
l 4180
 
1.9%
Other values (24) 20555
9.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 172836
80.4%
Space Separator 24184
 
11.2%
Uppercase Letter 16969
 
7.9%
Other Punctuation 821
 
0.4%
Open Punctuation 105
 
< 0.1%
Close Punctuation 105
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 37031
21.4%
o 30499
17.6%
n 25103
14.5%
e 24661
14.3%
s 12927
 
7.5%
a 12540
 
7.3%
d 11905
 
6.9%
l 4180
 
2.4%
c 4068
 
2.4%
h 3686
 
2.1%
Other values (6) 6236
 
3.6%
Uppercase Letter
ValueCountFrequency (%)
D 11435
67.4%
S 3030
 
17.9%
U 560
 
3.3%
P 403
 
2.4%
O 338
 
2.0%
L 210
 
1.2%
V 191
 
1.1%
T 191
 
1.1%
I 191
 
1.1%
C 105
 
0.6%
Other values (3) 315
 
1.9%
Other Punctuation
ValueCountFrequency (%)
. 525
63.9%
/ 296
36.1%
Space Separator
ValueCountFrequency (%)
24184
100.0%
Open Punctuation
ValueCountFrequency (%)
( 105
100.0%
Close Punctuation
ValueCountFrequency (%)
) 105
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 189805
88.3%
Common 25215
 
11.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 37031
19.5%
o 30499
16.1%
n 25103
13.2%
e 24661
13.0%
s 12927
 
6.8%
a 12540
 
6.6%
d 11905
 
6.3%
D 11435
 
6.0%
l 4180
 
2.2%
c 4068
 
2.1%
Other values (19) 15456
8.1%
Common
ValueCountFrequency (%)
24184
95.9%
. 525
 
2.1%
/ 296
 
1.2%
( 105
 
0.4%
) 105
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 215020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 37031
17.2%
o 30499
14.2%
n 25103
11.7%
e 24661
11.5%
24184
11.2%
s 12927
 
6.0%
a 12540
 
5.8%
d 11905
 
5.5%
D 11435
 
5.3%
l 4180
 
1.9%
Other values (24) 20555
9.6%

highest_level_of_education
Categorical

Missing 

Distinct: 20
Distinct (%): 0.1%
Missing: 775
Missing (%): 4.8%
Memory size: 127.2 KiB
Passed G.C.E.(A/L) or equivalent: 2989
Passed G.C.E.(O/L): 2697
Passed Grade 10: 2484
Passed Degree / Diploma: 1333
Passed Grade 12: 1219
Other values (15): 4773

Length

Max length: 104
Median length: 35
Mean length: 20.53314
Min length: 3

Characters and Unicode

Total characters: 318,161
Distinct characters: 47
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Passed G.C.E.(A/L) or equivalent
2nd row: Passed G.C.E.(A/L) or equivalent
3rd row: Passed post Graduate Degree / Diploma
4th row: Passed post Graduate Degree / Diploma
5th row: Passed Grade 6

Common Values

Value | Count | Frequency (%)
Passed G.C.E.(A/L) or equivalent | 2989 | 18.4%
Passed G.C.E.(O/L) | 2697 | 16.6%
Passed Grade 10 | 2484 | 15.3%
Passed Degree / Diploma | 1333 | 8.2%
Passed Grade 12 | 1219 | 7.5%
Passed Grade 8 | 772 | 4.7%
Passed Grade 9 | 638 | 3.9%
Passed Grade 5 | 567 | 3.5%
Passed Grade 7 | 509 | 3.1%
Passed Grade 6 | 407 | 2.5%
Other values (10) | 1880 | 11.6%
(Missing) | 775 | 4.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
passed 15010
30.1%
grade 7737
15.5%
g.c.e.(a/l 2989
 
6.0%
or 2989
 
6.0%
equivalent 2989
 
6.0%
g.c.e.(o/l 2697
 
5.4%
10 2484
 
5.0%
1610
 
3.2%
degree 1568
 
3.1%
diploma 1568
 
3.1%
Other values (28) 8170
16.4%

Most occurring characters

ValueCountFrequency (%)
e 35229
 
11.1%
34316
 
10.8%
s 30417
 
9.6%
a 29400
 
9.2%
d 23453
 
7.4%
. 17058
 
5.4%
P 15094
 
4.7%
G 14306
 
4.5%
r 13261
 
4.2%
/ 7601
 
2.4%
Other values (37) 98026
30.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 177616
55.8%
Uppercase Letter 57786
 
18.2%
Space Separator 34316
 
10.8%
Other Punctuation 24659
 
7.8%
Decimal Number 11440
 
3.6%
Close Punctuation 6172
 
1.9%
Open Punctuation 6172
 
1.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 35229
19.8%
s 30417
17.1%
a 29400
16.6%
d 23453
13.2%
r 13261
 
7.5%
i 6643
 
3.7%
o 5806
 
3.3%
l 5709
 
3.2%
n 5136
 
2.9%
t 4477
 
2.5%
Other values (11) 18085
10.2%
Uppercase Letter
ValueCountFrequency (%)
P 15094
26.1%
G 14306
24.8%
E 6052
10.5%
C 5686
 
9.8%
L 5686
 
9.8%
A 3313
 
5.7%
D 3220
 
5.6%
O 2697
 
4.7%
S 868
 
1.5%
Q 648
 
1.1%
Decimal Number
ValueCountFrequency (%)
1 4007
35.0%
0 2484
21.7%
2 1452
 
12.7%
8 772
 
6.7%
9 638
 
5.6%
5 567
 
5.0%
7 509
 
4.4%
6 407
 
3.6%
4 327
 
2.9%
3 277
 
2.4%
Other Punctuation
ValueCountFrequency (%)
. 17058
69.2%
/ 7601
30.8%
Space Separator
ValueCountFrequency (%)
34316
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6172
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6172
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 235402
74.0%
Common 82759
 
26.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 35229
15.0%
s 30417
12.9%
a 29400
12.5%
d 23453
10.0%
P 15094
 
6.4%
G 14306
 
6.1%
r 13261
 
5.6%
i 6643
 
2.8%
E 6052
 
2.6%
o 5806
 
2.5%
Other values (22) 55741
23.7%
Common
ValueCountFrequency (%)
34316
41.5%
. 17058
20.6%
/ 7601
 
9.2%
) 6172
 
7.5%
( 6172
 
7.5%
1 4007
 
4.8%
0 2484
 
3.0%
2 1452
 
1.8%
8 772
 
0.9%
9 638
 
0.8%
Other values (5) 2087
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 318161
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 35229
 
11.1%
34316
 
10.8%
s 30417
 
9.6%
a 29400
 
9.2%
d 23453
 
7.4%
. 17058
 
5.4%
P 15094
 
4.7%
G 14306
 
4.5%
r 13261
 
4.2%
/ 7601
 
2.4%
Other values (37) 98026
30.8%

main_activity_engaged_in
Categorical

High correlation, Missing

Distinct: 10
Distinct (%): 0.1%
Missing: 2133
Missing (%): 13.1%
Memory size: 127.2 KiB
Engaged in economic activity/ currently employed/ engaged in own business: 5473
Household activities: 3465
Student: 2513
Too old / Disable/ unable to work: 859
Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business): 707
Other values (5): 1120

Length

Max length: 221
Median length: 169
Mean length: 54.345052
Min length: 5

Characters and Unicode

Total characters: 768,276
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business)
2nd row: Household activities
3rd row: Engaged in economic activity/ currently employed/ engaged in own business
4th row: Engaged in economic activity/ currently employed/ engaged in own business
5th row: Retired - Obtaining government/semi-government pension payment and currently not engaged in economic activity (not employed elsewhere or not engaged in any own business)

Common Values

Value | Count | Frequency (%)
Engaged in economic activity/ currently employed/ engaged in own business | 5473 | 33.6%
Household activities | 3465 | 21.3%
Student | 2513 | 15.4%
Too old / Disable/ unable to work | 859 | 5.3%
Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | 707 | 4.3%
Retired - Obtaining government/semi-government pension payment and currently not engaged in economic activity (not employed elsewhere or not engaged in any own business) | 364 | 2.2%
Retired from the private/semi-government sector and does not obtain any pension payment | 266 | 1.6%
Seeking for and available to work | 215 | 1.3%
Received other pension payments | 159 | 1.0%
Other | 116 | 0.7%
(Missing) | 2133 | 13.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
engaged 13088
 
12.4%
in 13088
 
12.4%
economic 6544
 
6.2%
activity 6544
 
6.2%
currently 6544
 
6.2%
employed 6544
 
6.2%
own 6544
 
6.2%
business 6544
 
6.2%
household 3465
 
3.3%
activities 3465
 
3.3%
Other values (36) 33339
31.5%

Most occurring characters

ValueCountFrequency (%)
91572
11.9%
e 90925
 
11.8%
n 74726
 
9.7%
i 61486
 
8.0%
o 47723
 
6.2%
t 44499
 
5.8%
s 34844
 
4.5%
a 32862
 
4.3%
c 31480
 
4.1%
g 30577
 
4.0%
Other values (24) 227582
29.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 642086 | 83.6%
Space Separator | 91572 | 11.9%
Other Punctuation | 15415 | 2.0%
Uppercase Letter | 15360 | 2.0%
Dash Punctuation | 1701 | 0.2%
Close Punctuation | 1071 | 0.1%
Open Punctuation | 1071 | 0.1%
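Each character belongs to a Unicode General Category, and this table aggregates characters by those categories. Python's standard `unicodedata.category` returns the two-letter codes (`Ll`, `Zs`, `Po`, and so on). The name mapping below covers only the categories that appear in this report and was written by hand for illustration:

```python
import unicodedata
from collections import Counter

# Hand-written names for the General Category codes seen in this report
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(text):
    """Count characters per Unicode General Category (unmapped codes are kept as-is)."""
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in text
    )

print(category_counts("Retired - Obtaining (he/she)"))
```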

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 90925 | 14.2%
n | 74726 | 11.6%
i | 61486 | 9.6%
o | 47723 | 7.4%
t | 44499 | 6.9%
s | 34844 | 5.4%
a | 32862 | 5.1%
c | 31480 | 4.9%
g | 30577 | 4.8%
d | 30490 | 4.7%
Other values (12) | 162474 | 25.3%

Uppercase Letter
Value | Count | Frequency (%)
E | 5473 | 35.6%
H | 3465 | 22.6%
S | 2728 | 17.8%
R | 1496 | 9.7%
T | 859 | 5.6%
D | 859 | 5.6%
O | 480 | 3.1%

Space Separator
Value | Count | Frequency (%)
(space) | 91572 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 15415 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1701 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1071 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1071 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 657446 | 85.6%
Common | 110830 | 14.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 90925 | 13.8%
n | 74726 | 11.4%
i | 61486 | 9.4%
o | 47723 | 7.3%
t | 44499 | 6.8%
s | 34844 | 5.3%
a | 32862 | 5.0%
c | 31480 | 4.8%
g | 30577 | 4.7%
d | 30490 | 4.6%
Other values (19) | 177834 | 27.0%

Common
Value | Count | Frequency (%)
(space) | 91572 | 82.6%
/ | 15415 | 13.9%
- | 1701 | 1.5%
) | 1071 | 1.0%
( | 1071 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 768276 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 91572 | 11.9%
e | 90925 | 11.8%
n | 74726 | 9.7%
i | 61486 | 8.0%
o | 47723 | 6.2%
t | 44499 | 5.8%
s | 34844 | 4.5%
a | 32862 | 4.3%
c | 31480 | 4.1%
g | 30577 | 4.0%
Other values (24) | 227582 | 29.6%

main_occupation
Categorical

Missing 

Distinct: 11
Distinct (%): 0.2%
Missing: 9902
Missing (%): 60.9%
Memory size: 127.2 KiB

Service worker and shop and market sales worker | 1777
Professional | 1272
Elementary occupation | 605
Clerk | 527
Technician and associate professional | 501
Other values (6) | 1686
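The two percentages in this summary use different denominators. Missing (%) is relative to all 16,270 rows. Distinct (%) appears to be relative to the non-missing rows only: 11 / 6,368 ≈ 0.17% rounds to the 0.2% shown, whereas 11 / 16,270 would be below 0.1%. A small sketch of both calculations under that reading (the helper name is hypothetical):

```python
def distinct_and_missing_pct(n_rows, n_missing, n_distinct):
    """Return (distinct %, missing %), assuming Distinct (%) excludes missing rows."""
    n_present = n_rows - n_missing
    distinct_pct = 100 * n_distinct / n_present
    missing_pct = 100 * n_missing / n_rows
    return distinct_pct, missing_pct

# main_occupation figures from the report
d, m = distinct_and_missing_pct(n_rows=16270, n_missing=9902, n_distinct=11)
print(f"Distinct (%): {d:.1f}%  Missing (%): {m:.1f}%")
```

The same reading also fits daily_wage_owner_or_not: 2 / 6,180 is shown as "< 0.1%".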

Length

Max length: 47
Median length: 39
Mean length: 29.174466
Min length: 5

Characters and Unicode

Total characters: 185,783
Distinct characters: 32
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Related to forces
2nd row: Professional
3rd row: Professional
4th row: Craft and related worker
5th row: Clerk

Common Values

Value | Count | Frequency (%)
Service worker and shop and market sales worker | 1777 | 10.9%
Professional | 1272 | 7.8%
Elementary occupation | 605 | 3.7%
Clerk | 527 | 3.2%
Technician and associate professional | 501 | 3.1%
Legislator, senior official, and manager | 469 | 2.9%
Skilled agricultural and fishery worker | 333 | 2.0%
No occupation | 296 | 1.8%
Craft and related worker | 271 | 1.7%
Plant and machine operator and assembler | 245 | 1.5%
(Missing) | 9902 | 60.9%

Length

[Figure: histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
and | 5618 | 21.1%
worker | 4158 | 15.6%
service | 1777 | 6.7%
shop | 1777 | 6.7%
market | 1777 | 6.7%
sales | 1777 | 6.7%
professional | 1773 | 6.7%
occupation | 901 | 3.4%
elementary | 605 | 2.3%
clerk | 527 | 2.0%
Other values (18) | 5911 | 22.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 20233 | 10.9%
e | 19589 | 10.5%
r | 18530 | 10.0%
a | 18090 | 9.7%
o | 14121 | 7.6%
s | 11712 | 6.3%
n | 11327 | 6.1%
i | 9074 | 4.9%
l | 7785 | 4.2%
k | 6795 | 3.7%
Other values (22) | 48527 | 26.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 158244 | 85.2%
Space Separator | 20233 | 10.9%
Uppercase Letter | 6368 | 3.4%
Other Punctuation | 938 | 0.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 19589 | 12.4%
r | 18530 | 11.7%
a | 18090 | 11.4%
o | 14121 | 8.9%
s | 11712 | 7.4%
n | 11327 | 7.2%
i | 9074 | 5.7%
l | 7785 | 4.9%
k | 6795 | 4.3%
d | 6294 | 4.0%
Other values (12) | 34927 | 22.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 2110 | 33.1%
P | 1517 | 23.8%
C | 798 | 12.5%
E | 605 | 9.5%
T | 501 | 7.9%
L | 469 | 7.4%
N | 296 | 4.6%
R | 72 | 1.1%

Space Separator
Value | Count | Frequency (%)
(space) | 20233 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 938 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 164612 | 88.6%
Common | 21171 | 11.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 19589 | 11.9%
r | 18530 | 11.3%
a | 18090 | 11.0%
o | 14121 | 8.6%
s | 11712 | 7.1%
n | 11327 | 6.9%
i | 9074 | 5.5%
l | 7785 | 4.7%
k | 6795 | 4.1%
d | 6294 | 3.8%
Other values (20) | 41295 | 25.1%

Common
Value | Count | Frequency (%)
(space) | 20233 | 95.6%
, | 938 | 4.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 185783 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 20233 | 10.9%
e | 19589 | 10.5%
r | 18530 | 10.0%
a | 18090 | 9.7%
o | 14121 | 7.6%
s | 11712 | 6.3%
n | 11327 | 6.1%
i | 9074 | 4.9%
l | 7785 | 4.2%
k | 6795 | 3.7%
Other values (22) | 48527 | 26.1%

daily_wage_owner_or_not
Boolean

Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 10090
Missing (%): 62.0%
Memory size: 31.9 KiB

False | 4064
True | 2116
(Missing) | 10090

Value | Count | Frequency (%)
False | 4064 | 25.0%
True | 2116 | 13.0%
(Missing) | 10090 | 62.0%
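The sample rows at the end of the report show this column as the strings "Yes" and "No", yet the profile treats it as Boolean with the values True and False. A sketch of such a mapping, assuming only "Yes"/"No" (any case) and missing values occur; the helper name `to_bool` is hypothetical:

```python
from collections import Counter

def to_bool(value):
    """Map 'Yes'/'No' strings to True/False; missing values stay None."""
    if value is None:
        return None
    mapping = {"yes": True, "no": False}
    return mapping[value.strip().lower()]  # raises KeyError for unexpected labels

raw = ["No", "Yes", None, "No"]  # toy values, not the survey data
counts = Counter(to_bool(v) for v in raw)
print(counts[False], counts[True], counts[None])
```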

employment_status_of_the_main_occupation
Categorical

High correlation, Missing

Distinct: 6
Distinct (%): 0.1%
Missing: 9902
Missing (%): 60.9%
Memory size: 127.2 KiB

Private sector employee | 3698
Own account worker | 1084
Government employee | 859
Employer | 415
Semi government employee | 159

Length

Max length: 26
Median length: 23
Mean length: 20.7288
Min length: 8
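These length statistics can be reproduced from the value counts alone, because every occurrence of a category contributes its full length. The sketch below uses the six employment-status categories and counts listed in this section. It assumes missing rows are excluded, which is consistent with Mean length = 132,001 / 6,368:

```python
import statistics

value_counts = {
    "Private sector employee": 3698,
    "Own account worker": 1084,
    "Government employee": 859,
    "Employer": 415,
    "Semi government employee": 159,
    "Contributing family worker": 153,
}

# One length per non-missing row
lengths = [len(text) for text, count in value_counts.items() for _ in range(count)]

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", round(statistics.fmean(lengths), 4))
print("Min length:", min(lengths))
print("Total characters:", sum(lengths))
```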

Characters and Unicode

Total characters: 132,001
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Government employee
2nd row: Government employee
3rd row: Government employee
4th row: Private sector employee
5th row: Government employee

Common Values

Value | Count | Frequency (%)
Private sector employee | 3698 | 22.7%
Own account worker | 1084 | 6.7%
Government employee | 859 | 5.3%
Employer | 415 | 2.6%
Semi government employee | 159 | 1.0%
Contributing family worker | 153 | 0.9%
(Missing) | 9902 | 60.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Most occurring words

Value | Count | Frequency (%)
employee | 4716 | 27.1%
private | 3698 | 21.2%
sector | 3698 | 21.2%
worker | 1237 | 7.1%
own | 1084 | 6.2%
account | 1084 | 6.2%
government | 1018 | 5.8%
employer | 415 | 2.4%
semi | 159 | 0.9%
contributing | 153 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
e | 25391 | 19.2%
o | 12321 | 9.3%
r | 11456 | 8.7%
(space) | 11047 | 8.4%
t | 9804 | 7.4%
m | 6461 | 4.9%
c | 5866 | 4.4%
l | 5284 | 4.0%
y | 5284 | 4.0%
p | 5131 | 3.9%
Other values (17) | 33956 | 25.7%
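Character counts are built the same way: each category's characters are counted once per occurrence of that category. Using the same six employment-status categories, a `Counter` reproduces the top rows of this table, for example 25,391 for `e` and 11,047 spaces. The weighting-by-count approach is an illustration, not the library's code:

```python
from collections import Counter

value_counts = {
    "Private sector employee": 3698,
    "Own account worker": 1084,
    "Government employee": 859,
    "Employer": 415,
    "Semi government employee": 159,
    "Contributing family worker": 153,
}

chars = Counter()
for text, count in value_counts.items():
    for ch in text:
        chars[ch] += count  # weight each character by how often the category occurs

total = sum(chars.values())
for ch, n in chars.most_common(3):
    label = "(space)" if ch == " " else ch
    print(f"{label} | {n} | {100 * n / total:.1f}%")
```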

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 114586 | 86.8%
Space Separator | 11047 | 8.4%
Uppercase Letter | 6368 | 4.8%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 25391 | 22.2%
o | 12321 | 10.8%
r | 11456 | 10.0%
t | 9804 | 8.6%
m | 6461 | 5.6%
c | 5866 | 5.1%
l | 5284 | 4.6%
y | 5284 | 4.6%
p | 5131 | 4.5%
a | 4935 | 4.3%
Other values (10) | 22653 | 19.8%

Uppercase Letter
Value | Count | Frequency (%)
P | 3698 | 58.1%
O | 1084 | 17.0%
G | 859 | 13.5%
E | 415 | 6.5%
S | 159 | 2.5%
C | 153 | 2.4%

Space Separator
Value | Count | Frequency (%)
(space) | 11047 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 120954 | 91.6%
Common | 11047 | 8.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 25391 | 21.0%
o | 12321 | 10.2%
r | 11456 | 9.5%
t | 9804 | 8.1%
m | 6461 | 5.3%
c | 5866 | 4.8%
l | 5284 | 4.4%
y | 5284 | 4.4%
p | 5131 | 4.2%
a | 4935 | 4.1%
Other values (16) | 29021 | 24.0%

Common
Value | Count | Frequency (%)
(space) | 11047 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 132001 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 25391 | 19.2%
o | 12321 | 9.3%
r | 11456 | 8.7%
(space) | 11047 | 8.4%
t | 9804 | 7.4%
m | 6461 | 4.9%
c | 5866 | 4.4%
l | 5284 | 4.0%
y | 5284 | 4.0%
p | 5131 | 3.9%
Other values (17) | 33956 | 25.7%

no_of_hours_stayed_at_home_during_last_week
Numeric

Zeros

Distinct: 326
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 124.84774
Minimum: 0
Maximum: 168
Zeros: 699
Zeros (%): 4.3%
Negative: 0
Negative (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 10
Q1: 96
Median: 140
Q3: 168
95th percentile: 168
Maximum: 168
Range: 168
Interquartile range (IQR): 72
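Quantiles such as Q1 and the 5th percentile need an interpolation rule when the requested position falls between two observations. In this column Q3, the 95th percentile and the maximum all equal 168 because 33.1% of rows take that value. The sketch below uses linear interpolation between the two nearest order statistics, which is the pandas/NumPy default. Whether the report used exactly this rule is an assumption:

```python
import math

def percentile(values, p):
    """Linear-interpolation percentile for 0 <= p <= 1 (the pandas/NumPy default rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p          # fractional rank into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

hours = [0, 10, 96, 140, 168]             # toy values, not the survey data
q1, q3 = percentile(hours, 0.25), percentile(hours, 0.75)
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
print("5th percentile:", percentile(hours, 0.05))
```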

Descriptive statistics

Standard deviation: 47.768165
Coefficient of variation (CV): 0.38261138
Kurtosis: 0.29131003
Mean: 124.84774
Median Absolute Deviation (MAD): 28
Skewness: -1.0383694
Sum: 2031272.7
Variance: 2281.7976
Monotonicity: Not monotonic
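Several of these figures are derived from one another. The CV is the standard deviation divided by the mean (47.768165 / 124.84774 ≈ 0.3826), the variance is the square of the standard deviation, and the mean is Sum / n (2,031,272.7 / 16,270). The MAD is the median absolute deviation from the median. A sketch using the standard library, assuming a sample (n - 1) standard deviation as pandas uses by default:

```python
import statistics

def dispersion(values):
    """Mean, sample std (n - 1), variance, CV and median absolute deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)        # sample standard deviation
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return {"mean": mean, "std": std, "variance": std ** 2,
            "cv": std / mean, "mad": mad}

print(dispersion([0, 10, 96, 140, 168]))  # toy values

# Consistency checks on the report's own figures
print(round(47.768165 / 124.84774, 6))    # coefficient of variation
print(round(47.768165 ** 2, 4))           # variance
print(round(2031272.7 / 16270, 5))        # mean = sum / n
```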
[Figure: histogram with fixed-size bins (bins=50)]

Common values

Value | Count | Frequency (%)
168 | 5380 | 33.1%
84 | 761 | 4.7%
0 | 699 | 4.3%
160 | 449 | 2.8%
150 | 439 | 2.7%
120 | 422 | 2.6%
140 | 360 | 2.2%
100 | 333 | 2.0%
108 | 301 | 1.9%
96 | 282 | 1.7%
Other values (316) | 6844 | 42.1%
Minimum 10 values

Value | Count | Frequency (%)
0 | 699 | 4.3%
0.142 | 1 | < 0.1%
0.147 | 1 | < 0.1%
0.159 | 1 | < 0.1%
0.168 | 1 | < 0.1%
0.25 | 1 | < 0.1%
0.3 | 1 | < 0.1%
1 | 13 | 0.1%
2 | 13 | 0.1%
2.3 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
168 | 5380 | 33.1%
167.5 | 1 | < 0.1%
167.3 | 1 | < 0.1%
167.25 | 1 | < 0.1%
167 | 36 | 0.2%
166.5 | 1 | < 0.1%
166 | 87 | 0.5%
165.9 | 1 | < 0.1%
165.75 | 1 | < 0.1%
165.7 | 1 | < 0.1%
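The extreme-value tables list the smallest and largest distinct values with their counts. This is how fractional entries such as 0.142 hours surface even though each occurs only once. A sketch, assuming a simple value count followed by sorting on the value (the helper name is hypothetical):

```python
from collections import Counter

def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    ordered = sorted(counts.items())      # sort by value, ascending
    return ordered[:k], ordered[::-1][:k]

smallest, largest = extreme_values([0, 0, 0.142, 1, 1, 167.5, 168, 168, 168], k=3)
print(smallest)
print(largest)
```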

member_went_out_for_work_or_not_during_last_week
Categorical

Missing

Distinct: 3
Distinct (%): 0.1%
Missing: 11967
Missing (%): 73.6%
Memory size: 127.2 KiB

Yes, went daily during working days | 2764
No, worked from home | 857
Yes, went on most of the days | 682

Length

Max length: 35
Median length: 35
Mean length: 31.061585
Min length: 20

Characters and Unicode

Total characters: 133,658
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No, worked from home
2nd row: Yes, went daily during working days
3rd row: Yes, went daily during working days
4th row: Yes, went on most of the days
5th row: No, worked from home

Common Values

Value | Count | Frequency (%)
Yes, went daily during working days | 2764 | 17.0%
No, worked from home | 857 | 5.3%
Yes, went on most of the days | 682 | 4.2%
(Missing) | 11967 | 73.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Most occurring words

Value | Count | Frequency (%)
yes | 3446 | 13.9%
went | 3446 | 13.9%
days | 3446 | 13.9%
daily | 2764 | 11.2%
during | 2764 | 11.2%
working | 2764 | 11.2%
no | 857 | 3.5%
worked | 857 | 3.5%
from | 857 | 3.5%
home | 857 | 3.5%
Other values (4) | 2728 | 11.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 20483 | 15.3%
d | 9831 | 7.4%
n | 9656 | 7.2%
e | 9288 | 6.9%
i | 8292 | 6.2%
o | 8238 | 6.2%
s | 7574 | 5.7%
r | 7242 | 5.4%
w | 7067 | 5.3%
a | 6210 | 4.6%
Other values (12) | 39777 | 29.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 104569 | 78.2%
Space Separator | 20483 | 15.3%
Other Punctuation | 4303 | 3.2%
Uppercase Letter | 4303 | 3.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
d | 9831 | 9.4%
n | 9656 | 9.2%
e | 9288 | 8.9%
i | 8292 | 7.9%
o | 8238 | 7.9%
s | 7574 | 7.2%
r | 7242 | 6.9%
w | 7067 | 6.8%
a | 6210 | 5.9%
y | 6210 | 5.9%
Other values (8) | 24961 | 23.9%

Uppercase Letter
Value | Count | Frequency (%)
Y | 3446 | 80.1%
N | 857 | 19.9%

Space Separator
Value | Count | Frequency (%)
(space) | 20483 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 4303 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 108872 | 81.5%
Common | 24786 | 18.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
d | 9831 | 9.0%
n | 9656 | 8.9%
e | 9288 | 8.5%
i | 8292 | 7.6%
o | 8238 | 7.6%
s | 7574 | 7.0%
r | 7242 | 6.7%
w | 7067 | 6.5%
a | 6210 | 5.7%
y | 6210 | 5.7%
Other values (10) | 29264 | 26.9%

Common
Value | Count | Frequency (%)
(space) | 20483 | 82.6%
, | 4303 | 17.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 133658 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 20483 | 15.3%
d | 9831 | 7.4%
n | 9656 | 7.2%
e | 9288 | 6.9%
i | 8292 | 6.2%
o | 8238 | 6.2%
s | 7574 | 5.7%
r | 7242 | 5.4%
w | 7067 | 5.3%
a | 6210 | 4.6%
Other values (12) | 39777 | 29.8%

Interactions

[Figures: pairwise interaction plots of the numeric variables]

Correlations

[Figure: correlation heatmap]

Columns 1 to 15 follow the same order as the rows.

# | Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
1 | age | 1.000 | 0.398 | 0.125 | 0.156 | 0.051 | 0.059 | 0.302 | 0.357 | 0.087 | 0.319 | 0.253 | 0.159 | 0.142 | 0.346 | 0.045
2 | current_attendance_in_any_education_instituition | 0.398 | 1.000 | 0.077 | 0.049 | 0.033 | 0.037 | 0.257 | 0.327 | 0.059 | 0.276 | 0.222 | 0.038 | 0.096 | 0.280 | 0.030
3 | daily_wage_owner_or_not | 0.125 | 0.077 | 1.000 | 0.428 | 0.107 | 0.135 | 0.331 | 0.123 | 0.285 | 0.087 | 0.107 | 0.118 | 0.125 | 0.113 | 0.112
4 | employment_status_of_the_main_occupation | 0.156 | 0.049 | 0.428 | 1.000 | 0.060 | 0.147 | 0.156 | 0.527 | 0.339 | 0.113 | 0.122 | 0.217 | 0.112 | 0.145 | 0.075
5 | ethnicity | 0.051 | 0.033 | 0.107 | 0.060 | 1.000 | 0.000 | 0.064 | 0.051 | 0.047 | 0.036 | 0.055 | 0.000 | 0.020 | 0.063 | 0.512
6 | gender | 0.059 | 0.037 | 0.135 | 0.147 | 0.000 | 1.000 | 0.042 | 0.473 | 0.209 | 0.162 | 0.479 | 0.124 | 0.283 | 0.526 | 0.000
7 | highest_level_of_education | 0.302 | 0.257 | 0.331 | 0.156 | 0.064 | 0.042 | 1.000 | 0.143 | 0.210 | 0.115 | 0.097 | 0.106 | 0.066 | 0.123 | 0.065
8 | main_activity_engaged_in | 0.357 | 0.327 | 0.123 | 0.527 | 0.051 | 0.473 | 0.143 | 1.000 | 0.454 | 0.259 | 0.226 | 0.177 | 0.199 | 0.307 | 0.053
9 | main_occupation | 0.087 | 0.059 | 0.285 | 0.339 | 0.047 | 0.209 | 0.210 | 0.454 | 1.000 | 0.074 | 0.062 | 0.182 | 0.067 | 0.088 | 0.054
10 | marital_status | 0.319 | 0.276 | 0.087 | 0.113 | 0.036 | 0.162 | 0.115 | 0.259 | 0.074 | 1.000 | 0.240 | 0.073 | 0.070 | 0.331 | 0.033
11 | member_ID | 0.253 | 0.222 | 0.107 | 0.122 | 0.055 | 0.479 | 0.097 | 0.226 | 0.062 | 0.240 | 1.000 | 0.017 | 0.083 | 0.401 | 0.049
12 | member_went_out_for_work_or_not_during_last_week | 0.159 | 0.038 | 0.118 | 0.217 | 0.000 | 0.124 | 0.106 | 0.177 | 0.182 | 0.073 | 0.017 | 1.000 | 0.420 | 0.121 | 0.047
13 | no_of_hours_stayed_at_home_during_last_week | 0.142 | 0.096 | 0.125 | 0.112 | 0.020 | 0.283 | 0.066 | 0.199 | 0.067 | 0.070 | 0.083 | 0.420 | 1.000 | 0.125 | 0.033
14 | relationship_to_the_head_of_household | 0.346 | 0.280 | 0.113 | 0.145 | 0.063 | 0.526 | 0.123 | 0.307 | 0.088 | 0.331 | 0.401 | 0.121 | 0.125 | 1.000 | 0.071
15 | religion | 0.045 | 0.030 | 0.112 | 0.075 | 0.512 | 0.000 | 0.065 | 0.053 | 0.054 | 0.033 | 0.049 | 0.047 | 0.033 | 0.071 | 1.000
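The Alerts section flags exactly the three pairs whose coefficient exceeds 0.5: employment status and main activity (0.527), gender and relationship to the head of household (0.526), and ethnicity and religion (0.512). The 0.479 and 0.473 pairs are not flagged, which suggests a 0.5 threshold. For two categorical columns, the profiler's automatic correlation is likely based on Cramér's V (an assumption). The sketch below computes the textbook, uncorrected Cramér's V from a contingency table and applies a 0.5 threshold. The report's implementation may apply a bias correction, so treat this as an illustration rather than a reproduction:

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table (list of rows of counts).

    Assumes no row or column total is zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

def high_pairs(pairs, threshold=0.5):
    """Return variable pairs whose coefficient exceeds the (assumed) alert threshold."""
    return [pair for pair, value in pairs.items() if value > threshold]

print(cramers_v([[10, 0], [0, 10]]))   # perfect association
print(cramers_v([[5, 5], [5, 5]]))     # independence

# A few off-diagonal entries copied from the correlation matrix
report_pairs = {
    ("employment_status_of_the_main_occupation", "main_activity_engaged_in"): 0.527,
    ("gender", "relationship_to_the_head_of_household"): 0.526,
    ("ethnicity", "religion"): 0.512,
    ("gender", "member_ID"): 0.479,
    ("gender", "main_activity_engaged_in"): 0.473,
}
print(high_pairs(report_pairs))
```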

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
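The nullity matrix makes co-occurring gaps visible. In this dataset, main_occupation and employment_status_of_the_main_occupation both have 9,902 missing values, and in every sample row shown they are missing together. Equal counts alone do not prove that the same rows are affected, but the check is easy to run. A sketch, assuming rows are dictionaries with `None` for missing values; the helper names are illustrative:

```python
def missing_summary(rows, columns):
    """Missing count and percentage per column, relative to all rows."""
    n = len(rows)
    summary = {}
    for c in columns:
        missing = sum(1 for r in rows if r[c] is None)
        summary[c] = (missing, round(100 * missing / n, 1))
    return summary

def same_missing_pattern(rows, a, b):
    """True if column a is missing exactly on the rows where column b is missing."""
    return all((r[a] is None) == (r[b] is None) for r in rows)

OCC, STATUS = "main_occupation", "employment_status_of_the_main_occupation"
rows = [  # toy records shaped like the sample rows
    {OCC: "Professional", STATUS: "Government employee"},
    {OCC: None, STATUS: None},
    {OCC: "Clerk", STATUS: "Government employee"},
    {OCC: None, STATUS: None},
]
print(missing_summary(rows, [OCC, STATUS]))
print(same_missing_pattern(rows, OCC, STATUS))
```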

Sample

First rows

# | household_ID | member_ID | age | relationship_to_the_head_of_household | gender | ethnicity | religion | marital_status | current_attendance_in_any_education_instituition | highest_level_of_education | main_activity_engaged_in | main_occupation | daily_wage_owner_or_not | employment_status_of_the_main_occupation | no_of_hours_stayed_at_home_during_last_week | member_went_out_for_work_or_not_during_last_week
0 | ID0001 | I1 | 71 | Head of the household | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(A/L) or equivalent | Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | Related to forces | No | Government employee | 168.0 | No, worked from home
1 | ID0001 | I2 | 66 | Wife/Husband | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(A/L) or equivalent | Household activities | NaN | NaN | NaN | 168.0 | NaN
2 | ID0001 | I3 | 32 | Son/daughter | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed post Graduate Degree / Diploma | Engaged in economic activity/ currently employed/ engaged in own business | Professional | No | Government employee | 70.0 | Yes, went daily during working days
3 | ID0001 | I4 | 30 | Son-in-law/Daughter in law | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed post Graduate Degree / Diploma | Engaged in economic activity/ currently employed/ engaged in own business | Professional | No | Government employee | 150.0 | Yes, went daily during working days
4 | ID0002 | I1 | 85 | Head of the household | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed Grade 6 | Retired - Obtaining government/semi-government pension payment and currently not engaged in economic activity (not employed elsewhere or not engaged in any own business) | NaN | NaN | NaN | 168.0 | NaN
5 | ID0002 | I2 | 66 | Parents of the head of the Household/ spouse | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(A/L) or equivalent | Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | Craft and related worker | No | Private sector employee | 0.0 | Yes, went on most of the days
6 | ID0002 | I3 | 59 | Son/daughter | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(A/L) or equivalent | Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | Clerk | No | Government employee | 168.0 | No, worked from home
7 | ID0003 | I1 | 44 | Head of the household | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed Degree / Diploma | Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | Professional | No | Government employee | 100.0 | Yes, went daily during working days
8 | ID0003 | I2 | 41 | Wife/Husband | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed post Graduate Degree / Diploma | Retired and obtaining government/semi-government pension payment and is currently engaged in economic activity (employed elsewhere other than the place where he/she is receiving the pension from / engaged in own business) | Professional | No | Government employee | 100.0 | Yes, went daily during working days
9 | ID0003 | I3 | 74 | Parents of the head of the Household/ spouse | Female | Sinhala | Buddhism | Widowed | Does not attend | Passed Degree / Diploma | Retired - Obtaining government/semi-government pension payment and currently not engaged in economic activity (not employed elsewhere or not engaged in any own business) | NaN | NaN | NaN | 168.0 | NaN

Last rows

# | household_ID | member_ID | age | relationship_to_the_head_of_household | gender | ethnicity | religion | marital_status | current_attendance_in_any_education_instituition | highest_level_of_education | main_activity_engaged_in | main_occupation | daily_wage_owner_or_not | employment_status_of_the_main_occupation | no_of_hours_stayed_at_home_during_last_week | member_went_out_for_work_or_not_during_last_week
16260 | ID4060 | I1 | 78 | Head of the household | Female | Sinhala | Buddhism | Never married | Does not attend | Passed Grade 10 | Household activities | NaN | NaN | NaN | 130.0 | NaN
16261 | ID4061 | I1 | 82 | Head of the household | Female | Sinhala | Buddhism | Widowed | Does not attend | Passed G.C.E.(O/L) | Household activities | NaN | NaN | NaN | 168.0 | NaN
16262 | ID4061 | I2 | 53 | Son/daughter | Male | Sinhala | Buddhism | Never married | Does not attend | Passed G.C.E.(O/L) | Engaged in economic activity/ currently employed/ engaged in own business | Elementary occupation | Yes | Private sector employee | 70.0 | NaN
16263 | ID4062 | I1 | 73 | Head of the household | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(O/L) | Engaged in economic activity/ currently employed/ engaged in own business | Service worker and shop and market sales worker | Yes | Own account worker | 168.0 | NaN
16264 | ID4062 | I2 | 66 | Wife/Husband | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(O/L) | Household activities | NaN | NaN | NaN | 168.0 | NaN
16265 | ID4063 | I1 | 62 | Head of the household | Female | Sinhala | Buddhism | Widowed | Does not attend | Passed Grade 10 | Household activities | NaN | NaN | NaN | 48.0 | NaN
16266 | ID4063 | I2 | 49 | Son-in-law/Daughter in law | Male | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed Grade 10 | Engaged in economic activity/ currently employed/ engaged in own business | Service worker and shop and market sales worker | Yes | Private sector employee | 120.0 | NaN
16267 | ID4063 | I3 | 42 | Son/daughter | Female | Sinhala | Buddhism | Currently married (registered) | Does not attend | Passed G.C.E.(A/L) or equivalent | Household activities | NaN | NaN | NaN | 48.0 | NaN
16268 | ID4063 | I4 | 37 | Son/daughter | Male | Sinhala | Buddhism | Never married | Does not attend | Passed Grade 10 | Household activities | NaN | NaN | NaN | 168.0 | NaN
16269 | ID4063 | I5 | 36 | Son/daughter | Male | Sinhala | Buddhism | Never married | Does not attend | Passed Grade 10 | Engaged in economic activity/ currently employed/ engaged in own business | Technician and associate professional | No | Private sector employee | 0.0 | NaN