Overview
Dataset statistics
Number of variables | 16 |
---|---|
Number of observations | 16,270 |
Missing cells | 45,187 |
Missing cells (%) | 17.4% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 2.0 MiB |
Average record size in memory | 128.0 B |
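The derived figures in this table can be checked from the raw counts. The sketch below assumes that "Missing cells (%)" is the share of missing cells among all cells (variables × observations) and that total memory is roughly the average record size times the row count; the variable names are illustrative and not taken from the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 16_270
missing_cells = 45_187
avg_record_bytes = 128.0

# Assumption: "Missing cells (%)" is missing cells over all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is about average record size times row count.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"approx. size in memory: {total_mib:.1f} MiB")
```

Under these assumptions the results round to the reported 17.4% and 2.0 MiB.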
Variable types
Text | 1 |
---|---|
Categorical | 12 |
Numeric | 2 |
Boolean | 1 |
employment_status_of_the_main_occupation is highly overall correlated with main_activity_engaged_in | High correlation |
ethnicity is highly overall correlated with religion | High correlation |
gender is highly overall correlated with relationship_to_the_head_of_household | High correlation |
main_activity_engaged_in is highly overall correlated with employment_status_of_the_main_occupation | High correlation |
relationship_to_the_head_of_household is highly overall correlated with gender | High correlation |
religion is highly overall correlated with ethnicity | High correlation |
ethnicity is highly imbalanced (69.9%) | Imbalance |
current_attendance_in_any_education_instituition is highly imbalanced (56.1%) | Imbalance |
current_attendance_in_any_education_instituition has 418 (2.6%) missing values | Missing |
highest_level_of_education has 775 (4.8%) missing values | Missing |
main_activity_engaged_in has 2133 (13.1%) missing values | Missing |
main_occupation has 9902 (60.9%) missing values | Missing |
daily_wage_owner_or_not has 10090 (62.0%) missing values | Missing |
employment_status_of_the_main_occupation has 9902 (60.9%) missing values | Missing |
member_went_out_for_work_or_not_during_last_week has 11967 (73.6%) missing values | Missing |
no_of_hours_stayed_at_home_during_last_week has 699 (4.3%) zeros | Zeros |
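The report does not say how the imbalance percentage is computed. One plausible reading, offered here only as an assumption, is one minus the Shannon entropy of the class distribution normalised by its maximum, log2(k) for k classes. The sketch applies that formula to the ethnicity counts listed in that variable's Common Values table; `imbalance_score` is a hypothetical helper name, not a ydata-profiling function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 for one dominant class.

    Assumed formula; the report itself does not state its definition.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # undefined for a single class; 0.0 chosen by convention here
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts from the ethnicity "Common Values" table (7 categories, 16,270 rows).
ethnicity_counts = [13560, 1968, 572, 82, 42, 32, 14]
score = imbalance_score(ethnicity_counts)
print(f"ethnicity imbalance: {score:.1%}")
```

With these counts the assumed formula lands close to the 69.9% shown in the alert, which suggests, but does not prove, that the report uses a comparable definition.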
Reproduction
Analysis started | 2024-12-06 05:54:26.223750 |
---|---|
Analysis finished | 2024-12-06 05:54:28.327003 |
Duration | 2.1 seconds |
Software version | ydata-profiling v4.11.0 |
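The reported duration is simply the difference between the two timestamps above. A minimal check using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-12-06 05:54:26.223750")
finished = datetime.fromisoformat("2024-12-06 05:54:28.327003")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"duration: {duration_s:.1f} seconds")
```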
Variables
household_ID
Text
Distinct | 4063 |
---|---|
Distinct (%) | 25.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Value | Count | Frequency (%) |
id0349 | 13 | 0.1% |
id3438 | 13 | 0.1% |
id0849 | 12 | 0.1% |
id1781 | 12 | 0.1% |
id2880 | 12 | 0.1% |
id0939 | 12 | 0.1% |
id3013 | 11 | 0.1% |
id0699 | 11 | 0.1% |
id2896 | 11 | 0.1% |
id2341 | 11 | 0.1% |
Other values (4053) | 16152 | 99.3% |
Most occurring characters
Value | Count | Frequency (%) |
I | 16270 | |
D | 16270 | |
0 | 9274 | |
1 | 8926 | |
2 | 8804 | |
3 | 8734 | |
4 | 5068 | 5.2% |
8 | 5009 | 5.1% |
6 | 4859 | 5.0% |
5 | 4827 | 4.9% |
Other values (2) | 9579 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 65080 | |
Uppercase Letter | 32540 |
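The category names in this table match Unicode general categories ("Decimal Number" is `Nd`, "Uppercase Letter" is `Lu`). The sketch below shows one way such a tally could be produced with Python's standard `unicodedata` module. It is not taken from the profiler's source, and the sample IDs assume an upper-case `ID` prefix followed by four digits, as the character counts suggest.

```python
from collections import Counter
import unicodedata

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two illustrative IDs; the upper-case "ID" prefix is an assumption.
sample_ids = ["ID0349", "ID3438"]
counts = category_counts(sample_ids)
print(counts["Nd"], counts["Lu"])  # digits, upper-case letters
```

With two letters and four digits per ID, 16,270 rows give 32,540 upper-case letters and 65,080 decimal digits, which agrees with the table.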
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 9274 | |
1 | 8926 | |
2 | 8804 | |
3 | 8734 | |
4 | 5068 | |
8 | 5009 | |
6 | 4859 | |
5 | 4827 | |
7 | 4802 | |
9 | 4777 |
Uppercase Letter
Value | Count | Frequency (%) |
I | 16270 | |
D | 16270 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 65080 | |
Latin | 32540 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 9274 | |
1 | 8926 | |
2 | 8804 | |
3 | 8734 | |
4 | 5068 | |
8 | 5009 | |
6 | 4859 | |
5 | 4827 | |
7 | 4802 | |
9 | 4777 |
Latin
Value | Count | Frequency (%) |
I | 16270 | |
D | 16270 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 97620 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
I | 16270 | |
D | 16270 | |
0 | 9274 | |
1 | 8926 | |
2 | 8804 | |
3 | 8734 | |
4 | 5068 | 5.2% |
8 | 5009 | 5.1% |
6 | 4859 | 5.0% |
5 | 4827 | 4.9% |
Other values (2) | 9579 |
member_ID
Categorical
Distinct | 13 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Common Values
Value | Count | Frequency (%) |
I1 | 4063 | 25.0% |
I2 | 3877 | 23.8% |
I3 | 3275 | 20.1% |
I4 | 2457 | 15.1% |
I5 | 1443 | 8.9% |
I6 | 671 | 4.1% |
I7 | 264 | 1.6% |
I8 | 120 | 0.7% |
I9 | 55 | 0.3% |
I10 | 26 | 0.2% |
Other values (3) | 19 | 0.1% |
Words
Value | Count | Frequency (%) |
i1 | 4063 | |
i2 | 3877 | |
i3 | 3275 | |
i4 | 2457 | |
i5 | 1443 | 8.9% |
i6 | 671 | 4.1% |
i7 | 264 | 1.6% |
i8 | 120 | 0.7% |
i9 | 55 | 0.3% |
i10 | 26 | 0.2% |
Other values (3) | 19 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
I | 16270 | |
1 | 4119 | 12.6% |
2 | 3883 | 11.9% |
3 | 3277 | 10.1% |
4 | 2457 | 7.5% |
5 | 1443 | 4.4% |
6 | 671 | 2.1% |
7 | 264 | 0.8% |
8 | 120 | 0.4% |
9 | 55 | 0.2% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 16315 | |
Uppercase Letter | 16270 |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 4119 | |
2 | 3883 | |
3 | 3277 | |
4 | 2457 | |
5 | 1443 | 8.8% |
6 | 671 | 4.1% |
7 | 264 | 1.6% |
8 | 120 | 0.7% |
9 | 55 | 0.3% |
0 | 26 | 0.2% |
Uppercase Letter
Value | Count | Frequency (%) |
I | 16270 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 16315 | |
Latin | 16270 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 4119 | |
2 | 3883 | |
3 | 3277 | |
4 | 2457 | |
5 | 1443 | 8.8% |
6 | 671 | 4.1% |
7 | 264 | 1.6% |
8 | 120 | 0.7% |
9 | 55 | 0.3% |
0 | 26 | 0.2% |
Latin
Value | Count | Frequency (%) |
I | 16270 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 32585 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
I | 16270 | |
1 | 4119 | 12.6% |
2 | 3883 | 11.9% |
3 | 3277 | 10.1% |
4 | 2457 | 7.5% |
5 | 1443 | 4.4% |
6 | 671 | 2.1% |
7 | 264 | 0.8% |
8 | 120 | 0.4% |
9 | 55 | 0.2% |
age
Real number (ℝ)
Distinct | 97 |
---|---|
Distinct (%) | 0.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 38.39287 |
Minimum | 0 |
---|---|
Maximum | 98 |
Zeros | 149 |
Zeros (%) | 0.9% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 127.2 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 5 |
Q1 | 19 |
median | 38 |
Q3 | 56 |
95-th percentile | 75 |
Maximum | 98 |
Range | 98 |
Interquartile range (IQR) | 37 |
Descriptive statistics
Standard deviation | 22.075172 |
---|---|
Coefficient of variation (CV) | 0.57498103 |
Kurtosis | -0.99725102 |
Mean | 38.39287 |
Median Absolute Deviation (MAD) | 18 |
Skewness | 0.14195376 |
Sum | 624652 |
Variance | 487.31322 |
Monotonicity | Not monotonic |
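Several of the statistics above are tied together by simple identities, so they can be cross-checked from a few of the reported values. The sketch uses only numbers printed in this section; variable names are illustrative.

```python
# Values copied from the age statistics tables.
n = 16_270            # number of observations (no missing ages)
total = 624_652       # Sum
std = 22.075172       # Standard deviation
q1, q3 = 19, 56       # Q1 and Q3
minimum, maximum = 0, 98

mean = total / n               # Mean = Sum / n
variance = std ** 2            # Variance = standard deviation squared
cv = std / mean                # Coefficient of variation = std / mean
iqr = q3 - q1                  # Interquartile range
value_range = maximum - minimum

print(f"mean={mean:.5f} variance={variance:.3f} cv={cv:.5f}")
print(f"iqr={iqr} range={value_range}")
```

The recomputed values agree with the reported mean, variance, CV, IQR and range to the precision shown, up to rounding of the inputs.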
Common values
Value | Count | Frequency (%) |
19 | 282 | 1.7% |
17 | 278 | 1.7% |
23 | 276 | 1.7% |
15 | 267 | 1.6% |
18 | 266 | 1.6% |
20 | 263 | 1.6% |
16 | 256 | 1.6% |
42 | 252 | 1.5% |
22 | 251 | 1.5% |
45 | 245 | 1.5% |
Other values (87) | 13634 | 83.8% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 149 | 0.9% |
1 | 131 | 0.8% |
2 | 138 | 0.8% |
3 | 186 | 1.1% |
4 | 171 | 1.1% |
5 | 177 | 1.1% |
6 | 163 | 1.0% |
7 | 164 | 1.0% |
8 | 196 | 1.2% |
9 | 198 | 1.2% |
Maximum 10 values
Value | Count | Frequency (%) |
98 | 1 | < 0.1% |
96 | 1 | < 0.1% |
95 | 3 | < 0.1% |
93 | 8 | < 0.1% |
92 | 5 | < 0.1% |
91 | 7 | < 0.1% |
90 | 17 | 0.1% |
89 | 20 | 0.1% |
88 | 16 | 0.1% |
87 | 14 | 0.1% |
relationship_to_the_head_of_household
Categorical
High correlation 
Distinct | 12 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Length
Max length | 44 |
---|---|
Median length | 12 |
Mean length | 17.506884 |
Min length | 5 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Head of the household |
---|---|
2nd row | Wife/Husband |
3rd row | Son/daughter |
4th row | Son-in-law/Daughter in law |
5th row | Head of the household |
Common Values
Value | Count | Frequency (%) |
Son/daughter | 5654 | 34.8% |
Head of the household | 4012 | 24.7% |
Wife/Husband | 3226 | 19.8% |
Parents of the head of the Household/ spouse | 1198 | 7.4% |
Other relative | 685 | 4.2% |
Grandson/ Granddaughter | 666 | 4.1% |
Son-in-law/Daughter in law | 434 | 2.7% |
Boarder | 237 | 1.5% |
Domestic servant/driver/watcher | 101 | 0.6% |
Other | 52 | 0.3% |
Other values (2) | 5 | < 0.1% |
Words
Value | Count | Frequency (%) |
of | 6408 | |
the | 6408 | |
son/daughter | 5654 | |
head | 5210 | |
household | 5210 | |
wife/husband | 3226 | |
parents | 1198 | 3.1% |
spouse | 1198 | 3.1% |
other | 737 | 1.9% |
relative | 685 | 1.8% |
Other values (13) | 3086 |
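The word counts in this table can be reproduced from the category counts if each value is lower-cased, split on whitespace, and stripped of leading and trailing punctuation. That tokenisation rule is an assumption inferred from the numbers, not something the report states, and `tokenize` is a hypothetical helper.

```python
from collections import Counter
import string

def tokenize(value):
    """Lower-case, split on whitespace, strip punctuation at token edges."""
    tokens = (t.strip(string.punctuation) for t in value.lower().split())
    return [t for t in tokens if t]

# Three categories and their counts from the "Common Values" table.
category_counts = {
    "Head of the household": 4012,
    "Parents of the head of the Household/ spouse": 1198,
    "Son/daughter": 5654,
}

word_counts = Counter()
for value, count in category_counts.items():
    for token in tokenize(value):
        word_counts[token] += count

print(word_counts.most_common(5))
```

Under this rule "of" and "the" each occur once per "Head of the household" row and twice per "Parents of the head of the Household/ spouse" row, giving 4012 + 2 × 1198 = 6408, which matches the table; "head" and "household" come out at 5210.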
Most occurring characters
Value | Count | Frequency (%) |
e | 31961 | 11.2% |
o | 25125 | 8.8% |
h | 24420 | 8.6% |
(space) | 22750 | 8.0% |
d | 21639 | 7.6% |
a | 19715 | 6.9% |
u | 16391 | 5.8% |
t | 16090 | 5.6% |
n | 13486 | 4.7% |
s | 12904 | 4.5% |
Other values (24) | 80356 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 228043 | |
Space Separator | 22750 | 8.0% |
Uppercase Letter | 21794 | 7.7% |
Other Punctuation | 11382 | 4.0% |
Dash Punctuation | 868 | 0.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 31961 | |
o | 25125 | |
h | 24420 | |
d | 21639 | |
a | 19715 | |
u | 16391 | |
t | 16090 | |
n | 13486 | 5.9% |
s | 12904 | 5.7% |
r | 11587 | 5.1% |
Other values (11) | 34725 |
Uppercase Letter
Value | Count | Frequency (%) |
H | 8436 | |
S | 6088 | |
W | 3226 | 14.8% |
G | 1332 | 6.1% |
P | 1198 | 5.5% |
O | 737 | 3.4% |
D | 537 | 2.5% |
B | 237 | 1.1% |
R | 3 | < 0.1% |
Other Punctuation
Value | Count | Frequency (%) |
/ | 11380 | |
' | 2 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 22750 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 868 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 249837 | |
Common | 35000 | 12.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 31961 | |
o | 25125 | |
h | 24420 | |
d | 21639 | 8.7% |
a | 19715 | 7.9% |
u | 16391 | 6.6% |
t | 16090 | 6.4% |
n | 13486 | 5.4% |
s | 12904 | 5.2% |
r | 11587 | 4.6% |
Other values (20) | 56519 |
Common
Value | Count | Frequency (%) |
(space) | 22750 | |
/ | 11380 | |
- | 868 | 2.5% |
' | 2 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 284837 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 31961 | 11.2% |
o | 25125 | 8.8% |
h | 24420 | 8.6% |
(space) | 22750 | 8.0% |
d | 21639 | 7.6% |
a | 19715 | 6.9% |
u | 16391 | 5.8% |
t | 16090 | 5.6% |
n | 13486 | 4.7% |
s | 12904 | 4.5% |
Other values (24) | 80356 |
gender
Categorical
High correlation 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Common Values
Value | Count | Frequency (%) |
Female | 8386 | 51.5% |
Male | 7884 | 48.5% |
Words
Value | Count | Frequency (%) |
female | 8386 | |
male | 7884 |
Most occurring characters
Value | Count | Frequency (%) |
e | 24656 | |
a | 16270 | |
l | 16270 | |
F | 8386 | 10.2% |
m | 8386 | 10.2% |
M | 7884 | 9.6% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 65582 | |
Uppercase Letter | 16270 | 19.9% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 24656 | |
a | 16270 | |
l | 16270 | |
m | 8386 | 12.8% |
Uppercase Letter
Value | Count | Frequency (%) |
F | 8386 | |
M | 7884 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 81852 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 24656 | |
a | 16270 | |
l | 16270 | |
F | 8386 | 10.2% |
m | 8386 | 10.2% |
M | 7884 | 9.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 81852 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 24656 | |
a | 16270 | |
l | 16270 | |
F | 8386 | 10.2% |
m | 8386 | 10.2% |
M | 7884 | 9.6% |
ethnicity
Categorical
High correlation, Imbalance
Distinct | 7 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Common Values
Value | Count | Frequency (%) |
Sinhala | 13560 | 83.3% |
Sri Lankan Moor/Muslim | 1968 | 12.1% |
Sri Lankan Tamil | 572 | 3.5% |
Indian Tamil | 82 | 0.5% |
Malay | 42 | 0.3% |
Burgher | 32 | 0.2% |
Other | 14 | 0.1% |
Words
Value | Count | Frequency (%) |
sinhala | 13560 | |
sri | 2540 | 11.9% |
lankan | 2540 | 11.9% |
moor/muslim | 1968 | 9.2% |
tamil | 654 | 3.1% |
indian | 82 | 0.4% |
malay | 42 | 0.2% |
burgher | 32 | 0.1% |
other | 14 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
a | 33020 | |
n | 18804 | |
i | 18804 | |
l | 16224 | |
S | 16100 | |
h | 13606 | |
(space) | 5162 | 3.5% |
r | 4586 | 3.1% |
M | 3978 | 2.7% |
o | 3936 | 2.6% |
Other values (15) | 14636 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 118326 | |
Uppercase Letter | 23400 | 15.7% |
Space Separator | 5162 | 3.5% |
Other Punctuation | 1968 | 1.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 33020 | |
n | 18804 | |
i | 18804 | |
l | 16224 | |
h | 13606 | |
r | 4586 | 3.9% |
o | 3936 | 3.3% |
m | 2622 | 2.2% |
k | 2540 | 2.1% |
u | 2000 | 1.7% |
Other values (6) | 2184 | 1.8% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 16100 | |
M | 3978 | 17.0% |
L | 2540 | 10.9% |
T | 654 | 2.8% |
I | 82 | 0.4% |
B | 32 | 0.1% |
O | 14 | 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 5162 |
Other Punctuation
Value | Count | Frequency (%) |
/ | 1968 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 141726 | |
Common | 7130 | 4.8% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 33020 | |
n | 18804 | |
i | 18804 | |
l | 16224 | |
S | 16100 | |
h | 13606 | |
r | 4586 | 3.2% |
M | 3978 | 2.8% |
o | 3936 | 2.8% |
m | 2622 | 1.9% |
Other values (13) | 10046 | 7.1% |
Common
Value | Count | Frequency (%) |
(space) | 5162 | |
/ | 1968 | 27.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 148856 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 33020 | |
n | 18804 | |
i | 18804 | |
l | 16224 | |
S | 16100 | |
h | 13606 | |
(space) | 5162 | 3.5% |
r | 4586 | 3.1% |
M | 3978 | 2.7% |
o | 3936 | 2.6% |
Other values (15) | 14636 |
religion
Categorical
High correlation 
Distinct | 7 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Common Values
Value | Count | Frequency (%) |
Buddhism | 10807 | 66.4% |
Roman Catholicism | 2442 | 15.0% |
Islam | 2053 | 12.6% |
Other Christian denominations | 547 | 3.4% |
Hinduism | 407 | 2.5% |
Other | 8 | < 0.1% |
No religion | 6 | < 0.1% |
Words
Value | Count | Frequency (%) |
buddhism | 10807 | |
roman | 2442 | 12.3% |
catholicism | 2442 | 12.3% |
islam | 2053 | 10.4% |
other | 555 | 2.8% |
christian | 547 | 2.8% |
denominations | 547 | 2.8% |
hinduism | 407 | 2.1% |
no | 6 | < 0.1% |
religion | 6 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
d | 22568 | |
i | 18705 | |
m | 18698 | |
s | 16803 | |
h | 14351 | |
u | 11214 | |
B | 10807 | |
a | 8031 | 5.1% |
o | 5990 | 3.8% |
n | 5043 | 3.2% |
Other values (13) | 25250 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 134659 | |
Uppercase Letter | 19259 | 12.2% |
Space Separator | 3542 | 2.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
d | 22568 | |
i | 18705 | |
m | 18698 | |
s | 16803 | |
h | 14351 | |
u | 11214 | |
a | 8031 | 6.0% |
o | 5990 | 4.4% |
n | 5043 | 3.7% |
l | 4501 | 3.3% |
Other values (5) | 8755 | 6.5% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 10807 | |
C | 2989 | 15.5% |
R | 2442 | 12.7% |
I | 2053 | 10.7% |
O | 555 | 2.9% |
H | 407 | 2.1% |
N | 6 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 3542 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 153918 | |
Common | 3542 | 2.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
d | 22568 | |
i | 18705 | |
m | 18698 | |
s | 16803 | |
h | 14351 | |
u | 11214 | |
B | 10807 | |
a | 8031 | 5.2% |
o | 5990 | 3.9% |
n | 5043 | 3.3% |
Other values (12) | 21708 |
Common
Value | Count | Frequency (%) |
(space) | 3542 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 157460 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
d | 22568 | |
i | 18705 | |
m | 18698 | |
s | 16803 | |
h | 14351 | |
u | 11214 | |
B | 10807 | |
a | 8031 | 5.1% |
o | 5990 | 3.8% |
n | 5043 | 3.2% |
Other values (13) | 25250 |
marital_status
Categorical
Distinct | 9 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 127.2 KiB |
Length
Max length | 33 |
---|---|
Median length | 30 |
Mean length | 21.736755 |
Min length | 5 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Currently married (registered) |
---|---|
2nd row | Currently married (registered) |
3rd row | Currently married (registered) |
4th row | Currently married (registered) |
5th row | Currently married (registered) |
Common Values
Value | Count | Frequency (%) |
Currently married (registered) | 7417 | 45.6% |
Never married | 6330 | 38.9% |
Currently married (customary) | 1311 | 8.1% |
Widowed | 893 | 5.5% |
Other | 138 | 0.8% |
Separated (not legally) | 64 | 0.4% |
Not married but lives as a Family | 52 | 0.3% |
Divorced | 44 | 0.3% |
Legally separated | 21 | 0.1% |