COMPREHENSIVE SAS TUTORIAL ON NAMKEEN DATA CREATION AND ANALYSIS USING DATA AND PUT, MERGE,PROC SQL, PROC COMPARE, PROC REPORT, PROC TABULATE, AND MACROS
option nocenter;
1: Creating the Base Datasets
/*Macro 1: Data Simulation*/
%macro namkeen_simulate;
/* === NORTH REGION NAMKEEN DATA === */
filename north temp;
data _null_;
file north;
put "101 Haldiram Bhujia 45 200 4.7";
put "102 Bikaji AlooBhujia 40 180 4.5";
put "103 Bikanervala Navratan 55 250 4.6";
put "104 PepsiCo LaysMasala 20 100 4.2";
put "105 Balaji RatlamiSev 35 150 4.4";
put "106 Pratap Gathiya 30 200 4.1";
run;
data work.namkeen_north;
infile north;
input SnackID Brand :$20. Type :$20. Price Weight_g Rating;
run;
proc print data=work.namkeen_north noobs;
title "NAMKEEN PRODUCTS - NORTH REGION";
run;
/* === SOUTH REGION NAMKEEN DATA === */
filename south temp;
data _null_;
file south;
put "107 Aachi Mixture 50 220 4.3";
put "108 GRB KaraBoondi 45 200 4.5";
put "109 MTR Chakli 60 250 4.6";
put "110 Maiyas Nippattu 55 230 4.2";
put "111 Lays ClassicSalt 20 100 4.1";
put "112 Haldiram MoongDal 40 150 4.7";
run;
data work.namkeen_south;
infile south;
input SnackID Brand :$20. Type :$20. Price Weight_g Rating;
run;
proc print data=work.namkeen_south noobs;
title "NAMKEEN PRODUCTS - SOUTH REGION";
run;
%mend namkeen_simulate;
%namkeen_simulate;
OUTPUT:
| SnackID | Brand | Type | Price | Weight_g | Rating |
|---|---|---|---|---|---|
| 101 | Haldiram | Bhujia | 45 | 200 | 4.7 |
| 102 | Bikaji | AlooBhujia | 40 | 180 | 4.5 |
| 103 | Bikanervala | Navratan | 55 | 250 | 4.6 |
| 104 | PepsiCo | LaysMasala | 20 | 100 | 4.2 |
| 105 | Balaji | RatlamiSev | 35 | 150 | 4.4 |
| 106 | Pratap | Gathiya | 30 | 200 | 4.1 |
| SnackID | Brand | Type | Price | Weight_g | Rating |
|---|---|---|---|---|---|
| 107 | Aachi | Mixture | 50 | 220 | 4.3 |
| 108 | GRB | KaraBoondi | 45 | 200 | 4.5 |
| 109 | MTR | Chakli | 60 | 250 | 4.6 |
| 110 | Maiyas | Nippattu | 55 | 230 | 4.2 |
| 111 | Lays | ClassicSalt | 20 | 100 | 4.1 |
| 112 | Haldiram | MoongDal | 40 | 150 | 4.7 |
2: Combining Datasets
/*2.1 Using SET Statement*/
data all_namkeen_set;
set namkeen_north namkeen_south;
run;
proc print data=all_namkeen_set;
title "NAMKEEN DATA COMBINED USING SET STATEMENT";
run;
OUTPUT:
| Obs | SnackID | Brand | Type | Price | Weight_g | Rating |
|---|---|---|---|---|---|---|
| 1 | 101 | Haldiram | Bhujia | 45 | 200 | 4.7 |
| 2 | 102 | Bikaji | AlooBhujia | 40 | 180 | 4.5 |
| 3 | 103 | Bikanervala | Navratan | 55 | 250 | 4.6 |
| 4 | 104 | PepsiCo | LaysMasala | 20 | 100 | 4.2 |
| 5 | 105 | Balaji | RatlamiSev | 35 | 150 | 4.4 |
| 6 | 106 | Pratap | Gathiya | 30 | 200 | 4.1 |
| 7 | 107 | Aachi | Mixture | 50 | 220 | 4.3 |
| 8 | 108 | GRB | KaraBoondi | 45 | 200 | 4.5 |
| 9 | 109 | MTR | Chakli | 60 | 250 | 4.6 |
| 10 | 110 | Maiyas | Nippattu | 55 | 230 | 4.2 |
| 11 | 111 | Lays | ClassicSalt | 20 | 100 | 4.1 |
| 12 | 112 | Haldiram | MoongDal | 40 | 150 | 4.7 |
/*2.2 Using MERGE Statement*/
Before merging, we’ll create two supporting datasets:
namkeen_price → contains price & weight
namkeen_rating → contains rating
data namkeen_price;
set all_namkeen_set(keep=SnackID Brand Type Price Weight_g);
run;
proc print data=namkeen_price;
run;
OUTPUT:
| Obs | SnackID | Brand | Type | Price | Weight_g |
|---|---|---|---|---|---|
| 1 | 101 | Haldiram | Bhujia | 45 | 200 |
| 2 | 102 | Bikaji | AlooBhujia | 40 | 180 |
| 3 | 103 | Bikanervala | Navratan | 55 | 250 |
| 4 | 104 | PepsiCo | LaysMasala | 20 | 100 |
| 5 | 105 | Balaji | RatlamiSev | 35 | 150 |
| 6 | 106 | Pratap | Gathiya | 30 | 200 |
| 7 | 107 | Aachi | Mixture | 50 | 220 |
| 8 | 108 | GRB | KaraBoondi | 45 | 200 |
| 9 | 109 | MTR | Chakli | 60 | 250 |
| 10 | 110 | Maiyas | Nippattu | 55 | 230 |
| 11 | 111 | Lays | ClassicSalt | 20 | 100 |
| 12 | 112 | Haldiram | MoongDal | 40 | 150 |
data namkeen_rating;
set all_namkeen_set(keep=SnackID Rating);
run;
proc print data=namkeen_rating;
run;
OUTPUT:
| Obs | SnackID | Rating |
|---|---|---|
| 1 | 101 | 4.7 |
| 2 | 102 | 4.5 |
| 3 | 103 | 4.6 |
| 4 | 104 | 4.2 |
| 5 | 105 | 4.4 |
| 6 | 106 | 4.1 |
| 7 | 107 | 4.3 |
| 8 | 108 | 4.5 |
| 9 | 109 | 4.6 |
| 10 | 110 | 4.2 |
| 11 | 111 | 4.1 |
| 12 | 112 | 4.7 |
proc sort data=namkeen_price Out=namkeen_price_sorted;
by SnackID;
run;
proc print data=namkeen_price_sorted;
run;
OUTPUT:
| Obs | SnackID | Brand | Type | Price | Weight_g |
|---|---|---|---|---|---|
| 1 | 101 | Haldiram | Bhujia | 45 | 200 |
| 2 | 102 | Bikaji | AlooBhujia | 40 | 180 |
| 3 | 103 | Bikanervala | Navratan | 55 | 250 |
| 4 | 104 | PepsiCo | LaysMasala | 20 | 100 |
| 5 | 105 | Balaji | RatlamiSev | 35 | 150 |
| 6 | 106 | Pratap | Gathiya | 30 | 200 |
| 7 | 107 | Aachi | Mixture | 50 | 220 |
| 8 | 108 | GRB | KaraBoondi | 45 | 200 |
| 9 | 109 | MTR | Chakli | 60 | 250 |
| 10 | 110 | Maiyas | Nippattu | 55 | 230 |
| 11 | 111 | Lays | ClassicSalt | 20 | 100 |
| 12 | 112 | Haldiram | MoongDal | 40 | 150 |
proc sort data=namkeen_rating Out=namkeen_rating_sorted;
by SnackID;
run;
proc print data=namkeen_rating_sorted;
run;
OUTPUT:
| Obs | SnackID | Rating |
|---|---|---|
| 1 | 101 | 4.7 |
| 2 | 102 | 4.5 |
| 3 | 103 | 4.6 |
| 4 | 104 | 4.2 |
| 5 | 105 | 4.4 |
| 6 | 106 | 4.1 |
| 7 | 107 | 4.3 |
| 8 | 108 | 4.5 |
| 9 | 109 | 4.6 |
| 10 | 110 | 4.2 |
| 11 | 111 | 4.1 |
| 12 | 112 | 4.7 |
data merged_namkeen;
merge namkeen_price_sorted(in=a) namkeen_rating_sorted(in=b);
by SnackID;
if a and b;
run;
proc print data=merged_namkeen;
title "NAMKEEN DATA COMBINED USING MERGE STATEMENT";
run;
OUTPUT:
| Obs | SnackID | Brand | Type | Price | Weight_g | Rating |
|---|---|---|---|---|---|---|
| 1 | 101 | Haldiram | Bhujia | 45 | 200 | 4.7 |
| 2 | 102 | Bikaji | AlooBhujia | 40 | 180 | 4.5 |
| 3 | 103 | Bikanervala | Navratan | 55 | 250 | 4.6 |
| 4 | 104 | PepsiCo | LaysMasala | 20 | 100 | 4.2 |
| 5 | 105 | Balaji | RatlamiSev | 35 | 150 | 4.4 |
| 6 | 106 | Pratap | Gathiya | 30 | 200 | 4.1 |
| 7 | 107 | Aachi | Mixture | 50 | 220 | 4.3 |
| 8 | 108 | GRB | KaraBoondi | 45 | 200 | 4.5 |
| 9 | 109 | MTR | Chakli | 60 | 250 | 4.6 |
| 10 | 110 | Maiyas | Nippattu | 55 | 230 | 4.2 |
| 11 | 111 | Lays | ClassicSalt | 20 | 100 | 4.1 |
| 12 | 112 | Haldiram | MoongDal | 40 | 150 | 4.7 |
/*2.3 Using PROC SQL Join*/
proc sql;
create table namkeen_sql as
select a.SnackID, a.Brand, a.Type, a.Price, a.Weight_g, b.Rating
from namkeen_price a
inner join namkeen_rating b
on a.SnackID = b.SnackID;
quit;
proc print data=namkeen_sql;
title "NAMKEEN DATA COMBINED USING PROC SQL JOIN";
run;
OUTPUT:
| Obs | SnackID | Brand | Type | Price | Weight_g | Rating |
|---|---|---|---|---|---|---|
| 1 | 101 | Haldiram | Bhujia | 45 | 200 | 4.7 |
| 2 | 102 | Bikaji | AlooBhujia | 40 | 180 | 4.5 |
| 3 | 103 | Bikanervala | Navratan | 55 | 250 | 4.6 |
| 4 | 104 | PepsiCo | LaysMasala | 20 | 100 | 4.2 |
| 5 | 105 | Balaji | RatlamiSev | 35 | 150 | 4.4 |
| 6 | 106 | Pratap | Gathiya | 30 | 200 | 4.1 |
| 7 | 107 | Aachi | Mixture | 50 | 220 | 4.3 |
| 8 | 108 | GRB | KaraBoondi | 45 | 200 | 4.5 |
| 9 | 109 | MTR | Chakli | 60 | 250 | 4.6 |
| 10 | 110 | Maiyas | Nippattu | 55 | 230 | 4.2 |
| 11 | 111 | Lays | ClassicSalt | 20 | 100 | 4.1 |
| 12 | 112 | Haldiram | MoongDal | 40 | 150 | 4.7 |
3: QC Check – PROC COMPARE
proc compare base=merged_namkeen compare=namkeen_sql briefsummary;
title "QC CHECK: COMPARISON BETWEEN MERGE AND PROC SQL OUTPUTS";
run;
OUTPUT:
The COMPARE Procedure
Comparison of WORK.MERGED_NAMKEEN with WORK.NAMKEEN_SQL
(Method=EXACT)
NOTE: No unequal values were found. All values compared are exactly equal. 4: TLF-style Summary Tables
/*4.1 Listing: PROC REPORT*/
proc report data=all_namkeen_set nowd headline headskip;
columns SnackID Brand Type Price Weight_g Rating;
define SnackID / order "Snack ID";
define Brand / display "Brand Name";
define Type / display "Namkeen Type";
define Price / display "Price (₹)";
define Weight_g / display "Weight (g)";
define Rating / display "Customer Rating";
title "LISTING REPORT OF NAMKEEN PRODUCTS IN INDIA";
run;
OUTPUT:
| Snack ID | Brand Name | Namkeen Type | Price (₹) | Weight (g) | Customer Rating |
|---|---|---|---|---|---|
| 101 | Haldiram | Bhujia | 45 | 200 | 4.7 |
| 102 | Bikaji | AlooBhujia | 40 | 180 | 4.5 |
| 103 | Bikanervala | Navratan | 55 | 250 | 4.6 |
| 104 | PepsiCo | LaysMasala | 20 | 100 | 4.2 |
| 105 | Balaji | RatlamiSev | 35 | 150 | 4.4 |
| 106 | Pratap | Gathiya | 30 | 200 | 4.1 |
| 107 | Aachi | Mixture | 50 | 220 | 4.3 |
| 108 | GRB | KaraBoondi | 45 | 200 | 4.5 |
| 109 | MTR | Chakli | 60 | 250 | 4.6 |
| 110 | Maiyas | Nippattu | 55 | 230 | 4.2 |
| 111 | Lays | ClassicSalt | 20 | 100 | 4.1 |
| 112 | Haldiram | MoongDal | 40 | 150 | 4.7 |
/*4.2 Table: PROC TABULATE Summary*/
proc tabulate data=all_namkeen_set;
class Brand Type;
var Price Rating;
table Brand*Type,
(Price Rating)*(mean min max);
title "SUMMARY TABLE OF PRICE AND RATING BY BRAND AND TYPE";
run;
OUTPUT:
| Price | Rating | ||||||
|---|---|---|---|---|---|---|---|
| Mean | Min | Max | Mean | Min | Max | ||
| Brand | Type | 50.00 | 50.00 | 50.00 | 4.30 | 4.30 | 4.30 |
| Aachi | Mixture | ||||||
| Balaji | RatlamiSev | 35.00 | 35.00 | 35.00 | 4.40 | 4.40 | 4.40 |
| Bikaji | AlooBhujia | 40.00 | 40.00 | 40.00 | 4.50 | 4.50 | 4.50 |
| Bikanervala | Navratan | 55.00 | 55.00 | 55.00 | 4.60 | 4.60 | 4.60 |
| GRB | KaraBoondi | 45.00 | 45.00 | 45.00 | 4.50 | 4.50 | 4.50 |
| Haldiram | Bhujia | 45.00 | 45.00 | 45.00 | 4.70 | 4.70 | 4.70 |
| MoongDal | 40.00 | 40.00 | 40.00 | 4.70 | 4.70 | 4.70 | |
| Lays | ClassicSalt | 20.00 | 20.00 | 20.00 | 4.10 | 4.10 | 4.10 |
| MTR | Chakli | 60.00 | 60.00 | 60.00 | 4.60 | 4.60 | 4.60 |
| Maiyas | Nippattu | 55.00 | 55.00 | 55.00 | 4.20 | 4.20 | 4.20 |
| PepsiCo | LaysMasala | 20.00 | 20.00 | 20.00 | 4.20 | 4.20 | 4.20 |
| Pratap | Gathiya | 30.00 | 30.00 | 30.00 | 4.10 | 4.10 | 4.10 |
5: Automation with Reporting Macro
/*Macro 2: Summary Reporting*/
%macro namkeen_report(ds=);
%put Generating summary report for dataset: &ds;
proc means data=&ds n mean min max;
var Price Weight_g Rating;
class Brand;
title "PROC MEANS SUMMARY FOR &ds";
run;
proc freq data=&ds;
tables Type / nocum nopercent;
title "FREQUENCY OF NAMKEEN TYPES IN &ds";
run;
proc sgplot data=&ds;
vbar Brand / response=Price stat=mean datalabel;
title "AVERAGE PRICE BY BRAND - &ds";
run;
proc sgplot data=&ds;
vbar Type / response=Rating stat=mean datalabel;
title "AVERAGE CUSTOMER RATING BY NAMKEEN TYPE - &ds";
run;
%mend namkeen_report;
%namkeen_report(ds=all_namkeen_set);
OUTPUT:
The MEANS Procedure
| Brand | N Obs | Variable | N | Mean | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Aachi | 1 | Price Weight_g Rating | 1 1 1 | 50.0000000 220.0000000 4.3000000 | 50.0000000 220.0000000 4.3000000 | 50.0000000 220.0000000 4.3000000 |
| Balaji | 1 | Price Weight_g Rating | 1 1 1 | 35.0000000 150.0000000 4.4000000 | 35.0000000 150.0000000 4.4000000 | 35.0000000 150.0000000 4.4000000 |
| Bikaji | 1 | Price Weight_g Rating | 1 1 1 | 40.0000000 180.0000000 4.5000000 | 40.0000000 180.0000000 4.5000000 | 40.0000000 180.0000000 4.5000000 |
| Bikanervala | 1 | Price Weight_g Rating | 1 1 1 | 55.0000000 250.0000000 4.6000000 | 55.0000000 250.0000000 4.6000000 | 55.0000000 250.0000000 4.6000000 |
| GRB | 1 | Price Weight_g Rating | 1 1 1 | 45.0000000 200.0000000 4.5000000 | 45.0000000 200.0000000 4.5000000 | 45.0000000 200.0000000 4.5000000 |
| Haldiram | 2 | Price Weight_g Rating | 2 2 2 | 42.5000000 175.0000000 4.7000000 | 40.0000000 150.0000000 4.7000000 | 45.0000000 200.0000000 4.7000000 |
| Lays | 1 | Price Weight_g Rating | 1 1 1 | 20.0000000 100.0000000 4.1000000 | 20.0000000 100.0000000 4.1000000 | 20.0000000 100.0000000 4.1000000 |
| MTR | 1 | Price Weight_g Rating | 1 1 1 | 60.0000000 250.0000000 4.6000000 | 60.0000000 250.0000000 4.6000000 | 60.0000000 250.0000000 4.6000000 |
| Maiyas | 1 | Price Weight_g Rating | 1 1 1 | 55.0000000 230.0000000 4.2000000 | 55.0000000 230.0000000 4.2000000 | 55.0000000 230.0000000 4.2000000 |
| PepsiCo | 1 | Price Weight_g Rating | 1 1 1 | 20.0000000 100.0000000 4.2000000 | 20.0000000 100.0000000 4.2000000 | 20.0000000 100.0000000 4.2000000 |
| Pratap | 1 | Price Weight_g Rating | 1 1 1 | 30.0000000 200.0000000 4.1000000 | 30.0000000 200.0000000 4.1000000 | 30.0000000 200.0000000 4.1000000 |
The FREQ Procedure
| Type | Frequency |
|---|---|
| AlooBhujia | 1 |
| Bhujia | 1 |
| Chakli | 1 |
| ClassicSalt | 1 |
| Gathiya | 1 |
| KaraBoondi | 1 |
| LaysMasala | 1 |
| Mixture | 1 |
| MoongDal | 1 |
| Navratan | 1 |
| Nippattu | 1 |
| RatlamiSev | 1 |
No comments:
Post a Comment