246.IN-DEPTH DEMOGRAPHIC AND SOCIOECONOMIC ANALYSIS OF 100+ LIVE INDIAN CITIZENS USING SAS PROCEDURES: PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC SQL | ADVANCED MACROS FOR REAL-WORLD DATA INSIGHTS
- Get link
- X
- Other Apps
IN-DEPTH DEMOGRAPHIC AND SOCIOECONOMIC ANALYSIS OF 100+ LIVE INDIAN CITIZENS USING SAS PROCEDURES: PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC SQL | ADVANCED MACROS FOR REAL-WORLD DATA INSIGHTS
/* Creating and analyzing data of live persons from India */
Step 1: Dataset Creation
options nocenter;
data Indian_Live_People;
length Name $15 Gender $6 City $15 State $15 Region $12 Occupation $15 MaritalStatus $10 Education $20 Language $10;
do PersonID = 1 to 20;
Name = catt('Person', put(PersonID, 3.));
Age = 18 + int(ranuni(0)*50); /* Ages 18 to 68 */
Gender = scan('Male Female Other', ceil(ranuni(0)*3));
City = scan('Hyderabad Chennai Mumbai Delhi Kolkata Bangalore Jaipur Lucknow Patna Bhopal', ceil(ranuni(0)*10));
State = scan('Telangana TamilNadu Maharashtra Delhi WestBengal Karnataka Rajasthan UP Bihar MP', ceil(ranuni(0)*10));
Region = scan('South South West North East South West North East Central', ceil(ranuni(0)*10));
Occupation = scan('Engineer Doctor Farmer Student Artist Driver Teacher Lawyer Accountant Police', ceil(ranuni(0)*10));
Income = 8000 + int(ranuni(0)*92000); /* Range: 8000 to 100000 */
MaritalStatus = scan('Single Married Divorced Widowed', ceil(ranuni(0)*4));
Education = scan('10th 12th BSc MSc BTech MTech MBBS MBA PhD Diploma', ceil(ranuni(0)*10));
Language = scan('Hindi Telugu Tamil Bengali Kannada Malayalam Marathi Urdu Punjabi Assamese', ceil(ranuni(0)*10));
output;
end;
run;
proc print;run;
Output:
Obs | Name | Gender | City | State | Region | Occupation | MaritalStatus | Education | Language | PersonID | Age | Income |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Person 1 | Male | Patna | Telangana | East | Police | Married | MTech | Bengali | 1 | 28 | 38265 |
2 | Person 2 | Male | Hyderabad | MP | West | Lawyer | Divorced | 10th | Tamil | 2 | 20 | 72950 |
3 | Person 3 | Other | Bangalore | MP | South | Artist | Married | BTech | Malayalam | 3 | 35 | 20778 |
4 | Person 4 | Female | Chennai | UP | East | Student | Married | MTech | Malayalam | 4 | 22 | 77156 |
5 | Person 5 | Other | Patna | Bihar | West | Accountant | Single | 12th | Telugu | 5 | 63 | 46538 |
6 | Person 6 | Female | Mumbai | Rajasthan | South | Artist | Single | MSc | Bengali | 6 | 43 | 17816 |
7 | Person 7 | Other | Patna | UP | East | Driver | Single | 12th | Bengali | 7 | 43 | 56047 |
8 | Person 8 | Male | Lucknow | Delhi | East | Police | Widowed | BSc | Telugu | 8 | 49 | 42725 |
9 | Person 9 | Other | Lucknow | Bihar | West | Driver | Married | PhD | Assamese | 9 | 50 | 21427 |
10 | Person 10 | Male | Lucknow | TamilNadu | South | Teacher | Divorced | Diploma | Punjabi | 10 | 48 | 57333 |
11 | Person 11 | Male | Delhi | Maharashtra | West | Engineer | Divorced | 12th | Kannada | 11 | 44 | 18315 |
12 | Person 12 | Male | Bhopal | Telangana | Central | Artist | Widowed | MSc | Malayalam | 12 | 61 | 30434 |
13 | Person 13 | Male | Kolkata | WestBengal | Central | Police | Divorced | MBA | Bengali | 13 | 33 | 18839 |
14 | Person 14 | Female | Hyderabad | Telangana | Central | Doctor | Divorced | BSc | Telugu | 14 | 56 | 80211 |
15 | Person 15 | Other | Patna | Delhi | South | Engineer | Widowed | 12th | Urdu | 15 | 66 | 83725 |
16 | Person 16 | Male | Patna | Karnataka | West | Farmer | Divorced | BTech | Kannada | 16 | 53 | 66285 |
17 | Person 17 | Other | Jaipur | Delhi | East | Farmer | Divorced | MTech | Malayalam | 17 | 67 | 29173 |
18 | Person 18 | Female | Chennai | Delhi | East | Farmer | Widowed | 10th | Telugu | 18 | 65 | 29256 |
19 | Person 19 | Female | Hyderabad | MP | South | Police | Married | Diploma | Malayalam | 19 | 63 | 14395 |
20 | Person 20 | Male | Kolkata | Bihar | North | Farmer | Single | MSc | Hindi | 20 | 67 | 31117 |
Step 2: View the Raw Dataset
proc print data=Indian_Live_People (obs=25);
title "Sample of Live Persons Dataset in India";
run;
Output:
Sample of Live Persons Dataset in India
Obs | Name | Gender | City | State | Region | Occupation | MaritalStatus | Education | Language | PersonID | Age | Income |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Person 1 | Male | Patna | Telangana | East | Police | Married | MTech | Bengali | 1 | 28 | 38265 |
2 | Person 2 | Male | Hyderabad | MP | West | Lawyer | Divorced | 10th | Tamil | 2 | 20 | 72950 |
3 | Person 3 | Other | Bangalore | MP | South | Artist | Married | BTech | Malayalam | 3 | 35 | 20778 |
4 | Person 4 | Female | Chennai | UP | East | Student | Married | MTech | Malayalam | 4 | 22 | 77156 |
5 | Person 5 | Other | Patna | Bihar | West | Accountant | Single | 12th | Telugu | 5 | 63 | 46538 |
6 | Person 6 | Female | Mumbai | Rajasthan | South | Artist | Single | MSc | Bengali | 6 | 43 | 17816 |
7 | Person 7 | Other | Patna | UP | East | Driver | Single | 12th | Bengali | 7 | 43 | 56047 |
8 | Person 8 | Male | Lucknow | Delhi | East | Police | Widowed | BSc | Telugu | 8 | 49 | 42725 |
9 | Person 9 | Other | Lucknow | Bihar | West | Driver | Married | PhD | Assamese | 9 | 50 | 21427 |
10 | Person 10 | Male | Lucknow | TamilNadu | South | Teacher | Divorced | Diploma | Punjabi | 10 | 48 | 57333 |
11 | Person 11 | Male | Delhi | Maharashtra | West | Engineer | Divorced | 12th | Kannada | 11 | 44 | 18315 |
12 | Person 12 | Male | Bhopal | Telangana | Central | Artist | Widowed | MSc | Malayalam | 12 | 61 | 30434 |
13 | Person 13 | Male | Kolkata | WestBengal | Central | Police | Divorced | MBA | Bengali | 13 | 33 | 18839 |
14 | Person 14 | Female | Hyderabad | Telangana | Central | Doctor | Divorced | BSc | Telugu | 14 | 56 | 80211 |
15 | Person 15 | Other | Patna | Delhi | South | Engineer | Widowed | 12th | Urdu | 15 | 66 | 83725 |
16 | Person 16 | Male | Patna | Karnataka | West | Farmer | Divorced | BTech | Kannada | 16 | 53 | 66285 |
17 | Person 17 | Other | Jaipur | Delhi | East | Farmer | Divorced | MTech | Malayalam | 17 | 67 | 29173 |
18 | Person 18 | Female | Chennai | Delhi | East | Farmer | Widowed | 10th | Telugu | 18 | 65 | 29256 |
19 | Person 19 | Female | Hyderabad | MP | South | Police | Married | Diploma | Malayalam | 19 | 63 | 14395 |
20 | Person 20 | Male | Kolkata | Bihar | North | Farmer | Single | MSc | Hindi | 20 | 67 | 31117 |
Step 3: Frequency Analysis
proc freq data=Indian_Live_People;
tables Gender Region Occupation MaritalStatus / nocum nopercent;
title "Frequency Distribution of Categorical Variables";
run;
Output:
Frequency Distribution of Categorical Variables
The FREQ Procedure
Gender | Frequency |
---|---|
Female | 5 |
Male | 9 |
Other | 6 |
Region | Frequency |
---|---|
Central | 3 |
East | 6 |
North | 1 |
South | 5 |
West | 5 |
Occupation | Frequency |
---|---|
Accountant | 1 |
Artist | 3 |
Doctor | 1 |
Driver | 2 |
Engineer | 2 |
Farmer | 4 |
Lawyer | 1 |
Police | 4 |
Student | 1 |
Teacher | 1 |
MaritalStatus | Frequency |
---|---|
Divorced | 7 |
Married | 5 |
Single | 4 |
Widowed | 4 |
Step 4: Summary Statistics Using PROC MEANS
proc means data=Indian_Live_People n mean min max std;
var Age Income;
title "Summary Statistics for Age and Income";
run;
Output:
Summary Statistics for Age and Income
The MEANS Procedure
Variable | N | Mean | Minimum | Maximum | Std Dev |
---|---|---|---|---|---|
Age Income | 20 20 | 48.8000000 42639.25 | 20.0000000 14395.00 | 67.0000000 83725.00 | 15.1400480 23329.52 |
Step 5: Sorting and Filtering Using PROC SORT
proc sort data=Indian_Live_People out=SortedPeople;
by descending Income;
run;
proc print data=SortedPeople (obs=10);
title "Top 10 Earners in the Dataset";
run;
Output:
Top 10 Earners in the Dataset
Obs | Name | Gender | City | State | Region | Occupation | MaritalStatus | Education | Language | PersonID | Age | Income |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Person 15 | Other | Patna | Delhi | South | Engineer | Widowed | 12th | Urdu | 15 | 66 | 83725 |
2 | Person 14 | Female | Hyderabad | Telangana | Central | Doctor | Divorced | BSc | Telugu | 14 | 56 | 80211 |
3 | Person 4 | Female | Chennai | UP | East | Student | Married | MTech | Malayalam | 4 | 22 | 77156 |
4 | Person 2 | Male | Hyderabad | MP | West | Lawyer | Divorced | 10th | Tamil | 2 | 20 | 72950 |
5 | Person 16 | Male | Patna | Karnataka | West | Farmer | Divorced | BTech | Kannada | 16 | 53 | 66285 |
6 | Person 10 | Male | Lucknow | TamilNadu | South | Teacher | Divorced | Diploma | Punjabi | 10 | 48 | 57333 |
7 | Person 7 | Other | Patna | UP | East | Driver | Single | 12th | Bengali | 7 | 43 | 56047 |
8 | Person 5 | Other | Patna | Bihar | West | Accountant | Single | 12th | Telugu | 5 | 63 | 46538 |
9 | Person 8 | Male | Lucknow | Delhi | East | Police | Widowed | BSc | Telugu | 8 | 49 | 42725 |
10 | Person 1 | Male | Patna | Telangana | East | Police | Married | MTech | Bengali | 1 | 28 | 38265 |
Step 6: Data Selection Using PROC SQL
proc sql;
select Name, Age, Income, Occupation, State
from Indian_Live_People
where Income > 50000 and Age between 25 and 45
order by Income desc;
quit;
Output:
Name | Age | Income | Occupation | State |
---|---|---|---|---|
Person 7 | 43 | 56047 | Driver | UP |
Step 7: Macro for Regional Summary
%macro RegionStats(region_name);
title "Region-wise Summary for ®ion_name Region";
proc means data=Indian_Live_People noprint;
where Region = "®ion_name";
var Age Income;
output out=RegionSummary_®ion_name mean=AvgAge AvgIncome;
run;
proc print data=RegionSummary_®ion_name;
run;
%mend RegionStats;
%RegionStats(South)
Output:
Region-wise Summary for South Region
Obs | _TYPE_ | _FREQ_ | AvgAge | AvgIncome |
---|---|---|---|---|
1 | 0 | 5 | 51 | 38809.4 |
%RegionStats(North)
Output:
Region-wise Summary for North Region
Obs | _TYPE_ | _FREQ_ | AvgAge | AvgIncome |
---|---|---|---|---|
1 | 0 | 1 | 67 | 31117 |
%RegionStats(West)
Output:
Region-wise Summary for West Region
Obs | _TYPE_ | _FREQ_ | AvgAge | AvgIncome |
---|---|---|---|---|
1 | 0 | 5 | 46 | 45103 |
Step 8: Group Analysis Using PROC SQL
proc sql;
select Occupation, Gender, count(*) as Count, avg(Income) as Avg_Income
from Indian_Live_People
group by Occupation, Gender
order by Avg_Income desc;
quit;
Output:
Occupation | Gender | Count | Avg_Income |
---|---|---|---|
Engineer | Other | 1 | 83725 |
Doctor | Female | 1 | 80211 |
Student | Female | 1 | 77156 |
Lawyer | Male | 1 | 72950 |
Teacher | Male | 1 | 57333 |
Farmer | Male | 2 | 48701 |
Accountant | Other | 1 | 46538 |
Driver | Other | 2 | 38737 |
Police | Male | 3 | 33276.33 |
Artist | Male | 1 | 30434 |
Farmer | Female | 1 | 29256 |
Farmer | Other | 1 | 29173 |
Artist | Other | 1 | 20778 |
Engineer | Male | 1 | 18315 |
Artist | Female | 1 | 17816 |
Police | Female | 1 | 14395 |
Step 9: Income Distribution Buckets
data Buckets;
set Indian_Live_People;
length IncomeBracket $15;
if Income < 20000 then IncomeBracket = 'Low';
else if Income < 50000 then IncomeBracket = 'Middle';
else if Income < 80000 then IncomeBracket = 'High';
else IncomeBracket = 'Very High';
run;
proc print;run;
Output:
Obs | Name | Gender | City | State | Region | Occupation | MaritalStatus | Education | Language | PersonID | Age | Income | IncomeBracket |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Person 1 | Male | Patna | Telangana | East | Police | Married | MTech | Bengali | 1 | 28 | 38265 | Middle |
2 | Person 2 | Male | Hyderabad | MP | West | Lawyer | Divorced | 10th | Tamil | 2 | 20 | 72950 | High |
3 | Person 3 | Other | Bangalore | MP | South | Artist | Married | BTech | Malayalam | 3 | 35 | 20778 | Middle |
4 | Person 4 | Female | Chennai | UP | East | Student | Married | MTech | Malayalam | 4 | 22 | 77156 | High |
5 | Person 5 | Other | Patna | Bihar | West | Accountant | Single | 12th | Telugu | 5 | 63 | 46538 | Middle |
6 | Person 6 | Female | Mumbai | Rajasthan | South | Artist | Single | MSc | Bengali | 6 | 43 | 17816 | Low |
7 | Person 7 | Other | Patna | UP | East | Driver | Single | 12th | Bengali | 7 | 43 | 56047 | High |
8 | Person 8 | Male | Lucknow | Delhi | East | Police | Widowed | BSc | Telugu | 8 | 49 | 42725 | Middle |
9 | Person 9 | Other | Lucknow | Bihar | West | Driver | Married | PhD | Assamese | 9 | 50 | 21427 | Middle |
10 | Person 10 | Male | Lucknow | TamilNadu | South | Teacher | Divorced | Diploma | Punjabi | 10 | 48 | 57333 | High |
11 | Person 11 | Male | Delhi | Maharashtra | West | Engineer | Divorced | 12th | Kannada | 11 | 44 | 18315 | Low |
12 | Person 12 | Male | Bhopal | Telangana | Central | Artist | Widowed | MSc | Malayalam | 12 | 61 | 30434 | Middle |
13 | Person 13 | Male | Kolkata | WestBengal | Central | Police | Divorced | MBA | Bengali | 13 | 33 | 18839 | Low |
14 | Person 14 | Female | Hyderabad | Telangana | Central | Doctor | Divorced | BSc | Telugu | 14 | 56 | 80211 | Very High |
15 | Person 15 | Other | Patna | Delhi | South | Engineer | Widowed | 12th | Urdu | 15 | 66 | 83725 | Very High |
16 | Person 16 | Male | Patna | Karnataka | West | Farmer | Divorced | BTech | Kannada | 16 | 53 | 66285 | High |
17 | Person 17 | Other | Jaipur | Delhi | East | Farmer | Divorced | MTech | Malayalam | 17 | 67 | 29173 | Middle |
18 | Person 18 | Female | Chennai | Delhi | East | Farmer | Widowed | 10th | Telugu | 18 | 65 | 29256 | Middle |
19 | Person 19 | Female | Hyderabad | MP | South | Police | Married | Diploma | Malayalam | 19 | 63 | 14395 | Low |
20 | Person 20 | Male | Kolkata | Bihar | North | Farmer | Single | MSc | Hindi | 20 | 67 | 31117 | Middle |
proc freq data=Buckets;
tables IncomeBracket;
title "Income Bracket Distribution";
run;
Output:
Income Bracket Distribution
The FREQ Procedure
IncomeBracket | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
High | 5 | 25.00 | 5 | 25.00 |
Low | 4 | 20.00 | 9 | 45.00 |
Middle | 9 | 45.00 | 18 | 90.00 |
Very High | 2 | 10.00 | 20 | 100.00 |
Step 10: Educational Background vs. Income
proc sql;
select Education, count(*) as People, avg(Income) as AverageIncome
from Indian_Live_People
group by Education
order by AverageIncome desc;
quit;
Output:
Education | People | AverageIncome |
---|---|---|
BSc | 2 | 61468 |
12th | 4 | 51156.25 |
10th | 2 | 51103 |
MTech | 3 | 48198 |
BTech | 2 | 43531.5 |
Diploma | 2 | 35864 |
MSc | 3 | 26455.67 |
PhD | 1 | 21427 |
MBA | 1 | 18839 |
Step 11: Custom Macro for Filtering
%macro FilterPeople(min_income=30000, min_age=25);
proc sql;
select Name, Occupation, Age, Income
from Indian_Live_People
where Income >= &min_income and Age >= &min_age
order by Income desc;
quit;
%mend;
%FilterPeople(min_income=50000, min_age=30)
Output:
Name | Occupation | Age | Income |
---|---|---|---|
Person 15 | Engineer | 66 | 83725 |
Person 14 | Doctor | 56 | 80211 |
Person 16 | Farmer | 53 | 66285 |
Person 10 | Teacher | 48 | 57333 |
Person 7 | Driver | 43 | 56047 |
Step 12: Frequency by City Using PROC FREQ
proc freq data=Indian_Live_People;
tables City;
title "City-wise Count of Individuals";
run;
City-wise Count of Individuals
The FREQ Procedure
City | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
Bangalore | 1 | 5.00 | 1 | 5.00 |
Bhopal | 1 | 5.00 | 2 | 10.00 |
Chennai | 2 | 10.00 | 4 | 20.00 |
Delhi | 1 | 5.00 | 5 | 25.00 |
Hyderabad | 3 | 15.00 | 8 | 40.00 |
Jaipur | 1 | 5.00 | 9 | 45.00 |
Kolkata | 2 | 10.00 | 11 | 55.00 |
Lucknow | 3 | 15.00 | 14 | 70.00 |
Mumbai | 1 | 5.00 | 15 | 75.00 |
Patna | 5 | 25.00 | 20 | 100.00 |
- Get link
- X
- Other Apps
Comments
Post a Comment