222.ORGANIZATIONAL DATA MANAGEMENT AND SEGMENTATION IN SAS USING PROC PRINT | PROC SQL | PROC MEANS | PROC FREQ | FOR DATA CLEANING VARIABLE ENRICHMENT CONDITIONAL UPDATES | DATA MERGING AND MISSING VALUE ANALYSIS
ORGANIZATIONAL DATA MANAGEMENT AND SEGMENTATION IN SAS USING PROC PRINT | PROC SQL | PROC MEANS | PROC FREQ | FOR DATA CLEANING VARIABLE ENRICHMENT CONDITIONAL UPDATES | DATA MERGING AND MISSING VALUE ANALYSIS
STEP 1 : CREATING THE DATASET
DATA NEWDATA;
INPUT ID NAME:&$:15. AGE CLASSTYPE$ SCHOOLTYPE:$15. OFFICETYPE:$15. DEPTTYPE:$18.;
DATALINES;
1 Priya Sharma 34 Class_A Public Regional Science
2 Rohan Verma 41 Class_B Private Central Mathematics
3 Meera Pillai 29 Class_C Public HeadOffice Literature
4 Arjun Mehta 38 Class_A Private Zonal Physics
5 Sneha Reddy 31 Class_D Public Regional Computer_Science
6 Kabir Iqbal 45 Class_B Government Central History
7 Isha Rao 36 Class_C Private HeadOffice Biology
8 Nikhil Thomas 40 Class_A Public Zonal Economics
9 Ananya Das 28 Class_D Government Regional Sociology
10 Vinay Kulkarni 33 Class_B Private Central Commerce
;
RUN;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE |
---|---|---|---|---|---|---|---|
1 | 1 | Priya Sharma | 34 | Class_A | Public | Regional | Science |
2 | 2 | Rohan Verma | 41 | Class_B | Private | Central | Mathematics |
3 | 3 | Meera Pillai | 29 | Class_C | Public | HeadOffice | Literature |
4 | 4 | Arjun Mehta | 38 | Class_A | Private | Zonal | Physics |
5 | 5 | Sneha Reddy | 31 | Class_D | Public | Regional | Computer_Science |
6 | 6 | Kabir Iqbal | 45 | Class_B | Government | Central | History |
7 | 7 | Isha Rao | 36 | Class_C | Private | HeadOffice | Biology |
8 | 8 | Nikhil Thomas | 40 | Class_A | Public | Zonal | Economics |
9 | 9 | Ananya Das | 28 | Class_D | Government | Regional | Sociology |
10 | 10 | Vinay Kulkarni | 33 | Class_B | Private | Central | Commerce |
STEP 2 : USING ASSINGMENT STATEMENT CREATINE VARIABLES
DATA NEW01;
RETAIN ID FIRST_NAME LAST_NAME AGE CLASSTYPE SCHOOLTYPE OFFICETYPE DEPTTYPE;
SET NEWDATA;
FIRST_NAME = SCAN(NAME,1);
LAST_NAME = SCAN(NAME,2);
DROP NAME;
AGE = AGE+1;
RUN;
PROC PRINT;
RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE |
---|---|---|---|---|---|---|---|---|
1 | 1 | Priya | Sharma | 35 | Class_A | Public | Regional | Science |
2 | 2 | Rohan | Verma | 42 | Class_B | Private | Central | Mathematics |
3 | 3 | Meera | Pillai | 30 | Class_C | Public | HeadOffice | Literature |
4 | 4 | Arjun | Mehta | 39 | Class_A | Private | Zonal | Physics |
5 | 5 | Sneha | Reddy | 32 | Class_D | Public | Regional | Computer_Science |
6 | 6 | Kabir | Iqbal | 46 | Class_B | Government | Central | History |
7 | 7 | Isha | Rao | 37 | Class_C | Private | HeadOffice | Biology |
8 | 8 | Nikhil | Thomas | 41 | Class_A | Public | Zonal | Economics |
9 | 9 | Ananya | Das | 29 | Class_D | Government | Regional | Sociology |
10 | 10 | Vinay | Kulkarni | 34 | Class_B | Private | Central | Commerce |
STEP 3 :CREATING 3 DIFFERENT DATASETS FROM ONE DATASET
DATA PRIVATE
PUBLIC
GOVT;
SET NEW01;
IF SCHOOLTYPE = "Private" THEN OUTPUT PRIVATE;
ELSE IF SCHOOLTYPE = "Public" THEN OUTPUT PUBLIC;
ELSE OUTPUT GOVT;
RUN;
PROC PRINT DATA=PRIVATE;
RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE |
---|---|---|---|---|---|---|---|---|
1 | 2 | Rohan | Verma | 42 | Class_B | Private | Central | Mathematics |
2 | 4 | Arjun | Mehta | 39 | Class_A | Private | Zonal | Physics |
3 | 7 | Isha | Rao | 37 | Class_C | Private | HeadOffice | Biology |
4 | 10 | Vinay | Kulkarni | 34 | Class_B | Private | Central | Commerce |
PROC PRINT DATA=PUBLIC;
RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE |
---|---|---|---|---|---|---|---|---|
1 | 1 | Priya | Sharma | 35 | Class_A | Public | Regional | Science |
2 | 3 | Meera | Pillai | 30 | Class_C | Public | HeadOffice | Literature |
3 | 5 | Sneha | Reddy | 32 | Class_D | Public | Regional | Computer_Science |
4 | 8 | Nikhil | Thomas | 41 | Class_A | Public | Zonal | Economics |
PROC PRINT DATA=GOVT;
RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE |
---|---|---|---|---|---|---|---|---|
1 | 6 | Kabir | Iqbal | 46 | Class_B | Government | Central | History |
2 | 9 | Ananya | Das | 29 | Class_D | Government | Regional | Sociology |
PROC SQL;
ALTER TABLE PRIVATE ADD FEES NUM FORMAT = COMMA12.;
QUIT;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | FEES |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | Rohan | Verma | 42 | Class_B | Private | Central | Mathematics | . |
2 | 4 | Arjun | Mehta | 39 | Class_A | Private | Zonal | Physics | . |
3 | 7 | Isha | Rao | 37 | Class_C | Private | HeadOffice | Biology | . |
4 | 10 | Vinay | Kulkarni | 34 | Class_B | Private | Central | Commerce | . |
PROC SQL;
ALTER TABLE PUBLIC ADD EMAIL CHAR (30);
RUN;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Priya | Sharma | 35 | Class_A | Public | Regional | Science | |
2 | 3 | Meera | Pillai | 30 | Class_C | Public | HeadOffice | Literature | |
3 | 5 | Sneha | Reddy | 32 | Class_D | Public | Regional | Computer_Science | |
4 | 8 | Nikhil | Thomas | 41 | Class_A | Public | Zonal | Economics |
PROC SQL;
ALTER TABLE GOVT ADD STATUS CHAR(15);
QUIT;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | STATUS |
---|---|---|---|---|---|---|---|---|---|
1 | 6 | Kabir | Iqbal | 46 | Class_B | Government | Central | History | |
2 | 9 | Ananya | Das | 29 | Class_D | Government | Regional | Sociology |
STEP 5 : UPDATING THE ALTERED TABLE
PROC SQL;
UPDATE PRIVATE
SET FEES = CASE
WHEN ID = 2 THEN 300000
WHEN ID = 4 THEN 500000
WHEN ID = 7 THEN 320000
WHEN ID = 10 THEN 400000
ELSE FEES
END;
QUIT;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | FEES |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | Rohan | Verma | 42 | Class_B | Private | Central | Mathematics | 300,000 |
2 | 4 | Arjun | Mehta | 39 | Class_A | Private | Zonal | Physics | 500,000 |
3 | 7 | Isha | Rao | 37 | Class_C | Private | HeadOffice | Biology | 320,000 |
4 | 10 | Vinay | Kulkarni | 34 | Class_B | Private | Central | Commerce | 400,000 |
PROC SQL;
UPDATE PUBLIC
SET EMAIL = CASE
WHEN ID = 1 THEN "Priya.Sharma@gmail.com"
WHEN ID = 3 THEN "Meera.Pillai@gmail.com"
WHEN ID = 5 THEN "Sneha.Reddy@gmail.com"
WHEN ID = 8 THEN "Nikhil.THomas@gmail.com"
ELSE EMAIL
END;
QUIT;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Priya | Sharma | 35 | Class_A | Public | Regional | Science | Priya.Sharma@gmail.com |
2 | 3 | Meera | Pillai | 30 | Class_C | Public | HeadOffice | Literature | Meera.Pillai@gmail.com |
3 | 5 | Sneha | Reddy | 32 | Class_D | Public | Regional | Computer_Science | Sneha.Reddy@gmail.com |
4 | 8 | Nikhil | Thomas | 41 | Class_A | Public | Zonal | Economics | Nikhil.THomas@gmail.com |
UPDATE GOVT
SET STATUS = CASE
WHEN ID = 6 THEN "ACTIVE"
WHEN ID = 9 THEN ""
ELSE STATUS
END;
QUIT;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | STATUS |
---|---|---|---|---|---|---|---|---|---|
1 | 6 | Kabir | Iqbal | 46 | Class_B | Government | Central | History | ACTIVE |
2 | 9 | Ananya | Das | 29 | Class_D | Government | Regional | Sociology |
DATA JOIN;
MERGE PRIVATE
PUBLIC
GOVT;
BY ID;
RUN;
PROC PRINT;RUN;
OUTPUT:
Obs | ID | FIRST_NAME | LAST_NAME | AGE | CLASSTYPE | SCHOOLTYPE | OFFICETYPE | DEPTTYPE | FEES | STATUS | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | Priya | Sharma | 35 | Class_A | Public | Regional | Science | . | Priya.Sharma@gmail.com | |
2 | 2 | Rohan | Verma | 42 | Class_B | Private | Central | Mathematics | 300,000 | ||
3 | 3 | Meera | Pillai | 30 | Class_C | Public | HeadOffice | Literature | . | Meera.Pillai@gmail.com | |
4 | 4 | Arjun | Mehta | 39 | Class_A | Private | Zonal | Physics | 500,000 | ||
5 | 5 | Sneha | Reddy | 32 | Class_D | Public | Regional | Computer_Science | . | Sneha.Reddy@gmail.com | |
6 | 6 | Kabir | Iqbal | 46 | Class_B | Government | Central | History | . | ACTIVE | |
7 | 7 | Isha | Rao | 37 | Class_C | Private | HeadOffice | Biology | 320,000 | ||
8 | 8 | Nikhil | Thomas | 41 | Class_A | Public | Zonal | Economics | . | Nikhil.THomas@gmail.com | |
9 | 9 | Ananya | Das | 29 | Class_D | Government | Regional | Sociology | . | ||
10 | 10 | Vinay | Kulkarni | 34 | Class_B | Private | Central | Commerce | 400,000 |
OUTPUT:
Variable | N Miss | ||||||
---|---|---|---|---|---|---|---|
|
|
OUTPUT:
Frequency | Percent | Cumulative Frequency |
Cumulative Percent | |
---|---|---|---|---|
6 | 60.00 | 6 | 60.00 | |
Meera.Pillai@gmail.com | 1 | 10.00 | 7 | 70.00 |
Nikhil.THomas@gmail.com | 1 | 10.00 | 8 | 80.00 |
Priya.Sharma@gmail.com | 1 | 10.00 | 9 | 90.00 |
Sneha.Reddy@gmail.com | 1 | 10.00 | 10 | 100.00 |
OUTPUT:
STATUS | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
9 | 90.00 | 9 | 90.00 | |
ACTIVE | 1 | 10.00 | 10 | 100.00 |
Comments
Post a Comment