292.Which Breakfast Is The Healthiest And Most Popular? — Using Sas Data Analytics

Which Breakfast Is The Healthiest And Most Popular? — Using Sas Data Analytics


/*CREATING A DATASET OF REALWORLD BREAKFAST ITEMS SAS*/

1) Create dataset 

options nocenter;

data work.realbreakfast;

  length ItemID $6 ItemName $40 Cuisine $20 Type $8;

  input ItemID $ ItemName : $40. Cuisine : $20. Type : $8. Calories PriceINR 

        PrepTimeMin;

  datalines;

B001 Idli SouthIndian Veg 120 30 12

B002 Dosa SouthIndian Veg 220 45 15

B003 Poha Maharashtrian Veg 260 35 10

B004 Paratha NorthIndian Veg 350 50 20

B005 EggBhurji Indian NonVeg 200 40 12

B006 BreadJam Continental Veg 180 25 5

B007 Upma SouthIndian Veg 240 30 10

B008 AlooParatha NorthIndian Veg 420 55 22

B009 PuriSabji NorthIndian Veg 380 60 18

B010 Omelette Continental NonVeg 220 35 8

B011 MasalaDosa SouthIndian Veg 260 55 18

B012 OatsPorridge Continental Veg 150 40 7

;

run;

proc print;run;

Output:

ObsItemIDItemNameCuisineTypeCaloriesPriceINRPrepTimeMin
1B001IdliSouthIndianVeg1203012
2B002DosaSouthIndianVeg2204515
3B003PohaMaharashtrianVeg2603510
4B004ParathaNorthIndianVeg3505020
5B005EggBhurjiIndianNonVeg2004012
6B006BreadJamContinentalVeg180255
7B007UpmaSouthIndianVeg2403010
8B008AlooParathaNorthIndianVeg4205522
9B009PuriSabjiNorthIndianVeg3806018
10B010OmeletteContinentalNonVeg220358
11B011MasalaDosaSouthIndianVeg2605518
12B012OatsPorridgeContinentalVeg150407


 2) Add formatted price and calorie category 

purpose: Create human-friendly categories for calories. 

proc format;

  value calcat

    low - 199 = 'Low'

    200 - 299 = 'Moderate'

    300 - high = 'High';

run;

Log:

NOTE: Format CALCAT has been output.

purpose: create formatted price column for display

purpose: attach calorie category 

Here we already know that Put Function 

Purpose: Converts numeric values to character values.

data work.realbreakfast_fmt;

  set work.realbreakfast;

  PriceINR_f = put(PriceINR, dollar8.);  

  CalorieCat = put(Calories, calcat.);  

run;

proc print;run;

Output:

ObsItemIDItemNameCuisineTypeCaloriesPriceINRPrepTimeMinPriceINR_fCalorieCat
1B001IdliSouthIndianVeg1203012$30Low
2B002DosaSouthIndianVeg2204515$45Moderate
3B003PohaMaharashtrianVeg2603510$35Moderate
4B004ParathaNorthIndianVeg3505020$50High
5B005EggBhurjiIndianNonVeg2004012$40Moderate
6B006BreadJamContinentalVeg180255$25Low
7B007UpmaSouthIndianVeg2403010$30Moderate
8B008AlooParathaNorthIndianVeg4205522$55High
9B009PuriSabjiNorthIndianVeg3806018$60High
10B010OmeletteContinentalNonVeg220358$35Moderate
11B011MasalaDosaSouthIndianVeg2605518$55Moderate
12B012OatsPorridgeContinentalVeg150407$40Low


3) PROC PRINT: show dataset 

purpose: Display the dataset in readable tabular form. 

proc print data=work.realbreakfast_fmt(obs=8) noobs label;

  title "Realworld Breakfast Items - Raw View";

  var ItemID ItemName Cuisine Type Calories CalorieCat PriceINR_f PrepTimeMin;

  label PriceINR_f = "Price (INR)"

        CalorieCat = "Calorie Category";

run;

Output:

Realworld Breakfast Items - Raw View

ItemIDItemNameCuisineTypeCaloriesCalorie CategoryPrice (INR)PrepTimeMin
B001IdliSouthIndianVeg120Low$3012
B002DosaSouthIndianVeg220Moderate$4515
B003PohaMaharashtrianVeg260Moderate$3510
B004ParathaNorthIndianVeg350High$5020
B005EggBhurjiIndianNonVeg200Moderate$4012
B006BreadJamContinentalVeg180Low$255
B007UpmaSouthIndianVeg240Moderate$3010
B008AlooParathaNorthIndianVeg420High$5522

4) PROC MEANS: numeric summaries 

purpose: Compute central tendency and spread for numeric variables. 

proc means data=work.realbreakfast n mean std min max maxdec=1;

  var Calories PriceINR PrepTimeMin;

  title "Numeric Summary for Calories, Price and Prep Time";

run;

Output:

Numeric Summary for Calories, Price and Prep Time

The MEANS Procedure

VariableNMeanStd DevMinimumMaximum
Calories
PriceINR
PrepTimeMin
12
12
12
250.0
41.7
13.1
91.6
11.3
5.5
120.0
25.0
5.0
420.0
60.0
22.0

5) PROC FREQ: categorical distributions 

purpose: Show counts across categorical variables. 

proc freq data=work.realbreakfast_fmt;

  tables Cuisine Type CalorieCat / nocum nopercent;

  title "Frequency counts by Cuisine, Type and Calorie Category";

run;

Output:

Frequency counts by Cuisine, Type and Calorie Category

The FREQ Procedure

CuisineFrequency
Continental3
Indian1
Maharashtrian1
NorthIndian3
SouthIndian4
TypeFrequency
NonVeg2
Veg10
CalorieCatFrequency
High3
Low3
Moderate6

6) PROC SQL: aggregated summaries and joins 

purpose: Use SQL to group by cuisine and compute aggregated metrics. 

proc sql;

  create table work.by_cuisine as

  select Cuisine,

         count(*) as N,

         mean(Calories) format=8.1 as MeanCalories,

         mean(PriceINR) format=8.1 as MeanPrice,

         mean(PrepTimeMin) format=8.1 as MeanPrep

  from work.realbreakfast

  group by Cuisine

  order by MeanPrice desc;

quit;

proc print;run;

Output:

ObsCuisineNMeanCaloriesMeanPriceMeanPrep
1NorthIndian3383.355.020.0
2Indian1200.040.012.0
3SouthIndian4210.040.013.8
4Maharashtrian1260.035.010.0
5Continental3183.333.36.7


7) PROC SORT: prepare for reporting 

purpose: Sort dataset by descending calories for top-calorie reporting. 

proc sort data=work.realbreakfast out=work.realbreakfast_sorted;

  by descending Calories;

run;

proc print data=work.realbreakfast_sorted;

run;

Output:

ObsItemIDItemNameCuisineTypeCaloriesPriceINRPrepTimeMin
1B008AlooParathaNorthIndianVeg4205522
2B009PuriSabjiNorthIndianVeg3806018
3B004ParathaNorthIndianVeg3505020
4B003PohaMaharashtrianVeg2603510
5B011MasalaDosaSouthIndianVeg2605518
6B007UpmaSouthIndianVeg2403010
7B002DosaSouthIndianVeg2204515
8B010OmeletteContinentalNonVeg220358
9B005EggBhurjiIndianNonVeg2004012
10B006BreadJamContinentalVeg180255
11B012OatsPorridgeContinentalVeg150407
12B001IdliSouthIndianVeg1203012


8) Macro: summarize by Type (Veg/NonVeg) using PROC MEANS and PROC SQL 

purpose: Macro to generate numeric summary and top-calorie list for a given Type. 

%macro summary_by_type(type=);

  %put NOTE: Starting summary for &type;

  proc means data=work.realbreakfast(where=(upcase(Type)=upcase("&type"))) n mean std min max maxdec=1;

    var Calories PriceINR PrepTimeMin;

    title "Summary for &type Items";

  run;


  proc sql;

    create table work.top_&type as

    select ItemID, ItemName, Calories, PriceINR, PrepTimeMin

    from work.realbreakfast

    where upcase(Type)=upcase("&type")

    order by Calories desc;

  quit;

%mend;


%summary_by_type(type=Veg);

Log:

NOTE: Starting summary for Veg
NOTE: There were 10 observations read from the data set WORK.REALBREAKFAST.
WHERE UPCASE(Type)='VEG';

Output:

Summary for Veg Items

The MEANS Procedure

VariableNMeanStd DevMinimumMaximum
Calories
PriceINR
PrepTimeMin
10
10
10
258.0
42.5
13.7
99.0
12.3
5.8
120.0
25.0
5.0
420.0
60.0
22.0

%summary_by_type(type=NonVeg);

Log:

NOTE: Starting summary for NonVeg
NOTE: There were 2 observations read from the data set WORK.REALBREAKFAST.
WHERE UPCASE(Type)='NONVEG';

Output:

Summary for NonVeg Items

The MEANS Procedure

VariableNMeanStd DevMinimumMaximum
Calories
PriceINR
PrepTimeMin
2
2
2
210.0
37.5
10.0
14.1
3.5
2.8
200.0
35.0
8.0
220.0
40.0
12.0

9) PROC REPORT: fancy printable report 

purpose: Produce a formatted, report-style listing for presentation. 

proc report data=work.realbreakfast_fmtnowd headline spacing=1;

  column ItemID ItemName Cuisine Type Calories PriceINR_f PrepTimeMin CalorieCat;

  define ItemID / group 'ID' width=6;

  define ItemName / display 'Breakfast Item' width=30;

  define Cuisine / group 'Cuisine';

  define Type / group 'Type';

  define Calories / analysis mean 'Calories' format=8.;

  define PriceINR_f / display 'Price (INR)';

  define PrepTimeMin / analysis mean 'Prep Time (min)';

  define CalorieCat / display 'Cal Cat';

  title "Formatted Breakfast Items Report";

run;

Output:

Formatted Breakfast Items Report

IDBreakfast ItemCuisineTypeCaloriesPrice (INR)Prep Time (min)Cal Cat
B001IdliSouthIndianVeg120$3012Low
B002DosaSouthIndianVeg220$4515Moderate
B003PohaMaharashtrianVeg260$3510Moderate
B004ParathaNorthIndianVeg350$5020High
B005EggBhurjiIndianNonVeg200$4012Moderate
B006BreadJamContinentalVeg180$255Low
B007UpmaSouthIndianVeg240$3010Moderate
B008AlooParathaNorthIndianVeg420$5522High
B009PuriSabjiNorthIndianVeg380$6018High
B010OmeletteContinentalNonVeg220$358Moderate
B011MasalaDosaSouthIndianVeg260$5518Moderate
B012OatsPorridgeContinentalVeg150$407Low

10) PROC SQL: select items with calories > 300 (high calorie items) 

purpose: Create a table of high-calorie breakfast items. 

proc sql;

  create table work.high_cal as

  select * from work.realbreakfast

  where Calories > 300

  order by Calories desc;

quit;

proc print;run;

Output:

ObsItemIDItemNameCuisineTypeCaloriesPriceINRPrepTimeMin
1B008AlooParathaNorthIndianVeg4205522
2B009PuriSabjiNorthIndianVeg3806018
3B004ParathaNorthIndianVeg3505020


11) Simple QC checks: duplicate ItemID and missing values 

purpose: Basic QC checks to identify duplicates and missing numeric values. 

proc sql;

  select 'DUPLICATE_ITEMIDS' as CheckType, ItemID, count(*) as CNT

  from work.realbreakfast

  group by ItemID

  having count(*) > 1;

quit;

Log:

NOTE: No rows were selected.

proc means data=work.realbreakfast n nmiss;

  var Calories PriceINR PrepTimeMin;

  title "Missing-value check for numeric fields";

run;

Output:

Missing-value check for numeric fields

The MEANS Procedure

VariableNN Miss
Calories
PriceINR
PrepTimeMin
12
12
12
0
0
0


To Visit My Previous E-Commerce Dataset:Click Here
To Visit My Previous Length,Input,Retain Statements:Click Here
To Visit My Previous Urban Traffic Dataset:Click Here
To Visit My Previous Home Energy Consumption Dataset:Click Here







Follow Us On : 


 


--->FOLLOW OUR BLOG FOR MORE INFORMATION.
--->PLEASE DO COMMENTS AND SHARE OUR BLOG.


Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study