201.DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE | MACROS IN SAS

DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE | MACROS IN SAS


/*A dataset of different types of oils*/

Step 1: Create the Dataset

data Oils_Dataset;

    length  Oil_ID 8 Oil_Name $20 Category $15 Origin $15 Usage_Type $12;

    input Oil_ID Oil_Name $ Category $ Origin $ Smoke_Point Usage_Type $ 

          Price_Per_Litre Health_Rating;

    datalines;

1 Olive_Oil Cooking Italy 190 Cooking 850 9

2 Coconut_Oil Cooking India 175 Cooking 700 7

3 Canola_Oil Cooking Canada 205 Cooking 650 8

4 Almond_Oil Cosmetic USA 220 Skincare 1200 6

5 Argan_Oil Cosmetic Morocco 215 Haircare 1400 7

6 Sunflower_Oil Cooking Ukraine 225 Cooking 600 8

7 Sesame_Oil Cooking China 210 Cooking 780 7

8 Mustard_Oil Cooking India 200 Cooking 500 6

9 Avocado_Oil Cooking Mexico 250 Cooking 1300 9

10 Jojoba_Oil Cosmetic USA 260 Skincare 1500 6

11 Flaxseed_Oil Cooking Canada 190 Cooking 850 7

12 Palm_Oil Cooking Malaysia 230 Cooking 550 5

13 Grapeseed_Oil Cooking France 215 Cooking 700 7

14 Castor_Oil Medicinal India 250 Laxative 400 5

15 Neem_Oil Medicinal India 210 Pesticide 300 4

16 Tea_Tree_Oil Medicinal Australia 190 Antifungal 1300 8

17 Rice_Bran_Oil Cooking Japan 245 Cooking 900 8

18 Wheatgerm_Oil Cosmetic Germany 210 Haircare 1100 7

19 Hempseed_Oil Cooking USA 170 Cooking 1050 7

20 Safflower_Oil Cooking USA 225 Cooking 650 6

;

run;

proc print;run;

Output:

Obs Oil_ID Oil_Name Category Origin Usage_Type Smoke_Point Price_Per_Litre Health_Rating
1 1 Olive_Oil Cooking Italy Cooking 190 850 9
2 2 Coconut_Oil Cooking India Cooking 175 700 7
3 3 Canola_Oil Cooking Canada Cooking 205 650 8
4 4 Almond_Oil Cosmetic USA Skincare 220 1200 6
5 5 Argan_Oil Cosmetic Morocco Haircare 215 1400 7
6 6 Sunflower_Oil Cooking Ukraine Cooking 225 600 8
7 7 Sesame_Oil Cooking China Cooking 210 780 7
8 8 Mustard_Oil Cooking India Cooking 200 500 6
9 9 Avocado_Oil Cooking Mexico Cooking 250 1300 9
10 10 Jojoba_Oil Cosmetic USA Skincare 260 1500 6
11 11 Flaxseed_Oil Cooking Canada Cooking 190 850 7
12 12 Palm_Oil Cooking Malaysia Cooking 230 550 5
13 13 Grapeseed_Oil Cooking France Cooking 215 700 7
14 14 Castor_Oil Medicinal India Laxative 250 400 5
15 15 Neem_Oil Medicinal India Pesticide 210 300 4
16 16 Tea_Tree_Oil Medicinal Australia Antifungal 190 1300 8
17 17 Rice_Bran_Oil Cooking Japan Cooking 245 900 8
18 18 Wheatgerm_Oil Cosmetic Germany Haircare 210 1100 7
19 19 Hempseed_Oil Cooking USA Cooking 170 1050 7
20 20 Safflower_Oil Cooking USA Cooking 225 650 6


Step 2: Display the Dataset — PROC PRINT

proc print data=Oils_Dataset label noobs;

    title "Complete Oils Dataset";

run;

Output:

Complete Oils Dataset

Oil_ID Oil_Name Category Origin Usage_Type Smoke_Point Price_Per_Litre Health_Rating
1 Olive_Oil Cooking Italy Cooking 190 850 9
2 Coconut_Oil Cooking India Cooking 175 700 7
3 Canola_Oil Cooking Canada Cooking 205 650 8
4 Almond_Oil Cosmetic USA Skincare 220 1200 6
5 Argan_Oil Cosmetic Morocco Haircare 215 1400 7
6 Sunflower_Oil Cooking Ukraine Cooking 225 600 8
7 Sesame_Oil Cooking China Cooking 210 780 7
8 Mustard_Oil Cooking India Cooking 200 500 6
9 Avocado_Oil Cooking Mexico Cooking 250 1300 9
10 Jojoba_Oil Cosmetic USA Skincare 260 1500 6
11 Flaxseed_Oil Cooking Canada Cooking 190 850 7
12 Palm_Oil Cooking Malaysia Cooking 230 550 5
13 Grapeseed_Oil Cooking France Cooking 215 700 7
14 Castor_Oil Medicinal India Laxative 250 400 5
15 Neem_Oil Medicinal India Pesticide 210 300 4
16 Tea_Tree_Oil Medicinal Australia Antifungal 190 1300 8
17 Rice_Bran_Oil Cooking Japan Cooking 245 900 8
18 Wheatgerm_Oil Cosmetic Germany Haircare 210 1100 7
19 Hempseed_Oil Cooking USA Cooking 170 1050 7
20 Safflower_Oil Cooking USA Cooking 225 650 6

Step 3: Sort by Category & Price — PROC SORT

proc sort data=Oils_Dataset out=Sorted_Oils;

    by Category descending Price_Per_Litre;

run;

proc print;run;

Output:

Obs Oil_ID Oil_Name Category Origin Usage_Type Smoke_Point Price_Per_Litre Health_Rating
1 9 Avocado_Oil Cooking Mexico Cooking 250 1300 9
2 19 Hempseed_Oil Cooking USA Cooking 170 1050 7
3 17 Rice_Bran_Oil Cooking Japan Cooking 245 900 8
4 1 Olive_Oil Cooking Italy Cooking 190 850 9
5 11 Flaxseed_Oil Cooking Canada Cooking 190 850 7
6 7 Sesame_Oil Cooking China Cooking 210 780 7
7 2 Coconut_Oil Cooking India Cooking 175 700 7
8 13 Grapeseed_Oil Cooking France Cooking 215 700 7
9 3 Canola_Oil Cooking Canada Cooking 205 650 8
10 20 Safflower_Oil Cooking USA Cooking 225 650 6
11 6 Sunflower_Oil Cooking Ukraine Cooking 225 600 8
12 12 Palm_Oil Cooking Malaysia Cooking 230 550 5
13 8 Mustard_Oil Cooking India Cooking 200 500 6
14 10 Jojoba_Oil Cosmetic USA Skincare 260 1500 6
15 5 Argan_Oil Cosmetic Morocco Haircare 215 1400 7
16 4 Almond_Oil Cosmetic USA Skincare 220 1200 6
17 18 Wheatgerm_Oil Cosmetic Germany Haircare 210 1100 7
18 16 Tea_Tree_Oil Medicinal Australia Antifungal 190 1300 8
19 14 Castor_Oil Medicinal India Laxative 250 400 5
20 15 Neem_Oil Medicinal India Pesticide 210 300 4

Step 4: Frequency Distribution — PROC FREQ

proc freq data=Oils_Dataset;

    tables Category Origin Usage_Type / nocum;

    title "Frequency of Oil Categories and Origins";

run;

Output:

Frequency of Oil Categories and Origins

The FREQ Procedure

Category Frequency Percent
Cooking 13 65.00
Cosmetic 4 20.00
Medicinal 3 15.00

Origin Frequency Percent
Australia 1 5.00
Canada 2 10.00
China 1 5.00
France 1 5.00
Germany 1 5.00
India 4 20.00
Italy 1 5.00
Japan 1 5.00
Malaysia 1 5.00
Mexico 1 5.00
Morocco 1 5.00
USA 4 20.00
Ukraine 1 5.00

Usage_Type Frequency Percent
Antifungal 1 5.00
Cooking 13 65.00
Haircare 2 10.00
Laxative 1 5.00
Pesticide 1 5.00
Skincare 2 10.00

Step 5: Statistical Summary — PROC MEANS

proc means data=Oils_Dataset min max mean median std maxdec=2;

    var Price_Per_Litre Health_Rating Smoke_Point;

    title "Descriptive Statistics for Numerical Variables";

run;

Output:

Descriptive Statistics for Numerical Variables

The MEANS Procedure

Variable Minimum Maximum Mean Median Std Dev
Price_Per_Litre
Health_Rating
Smoke_Point
300.00
4.00
170.00
1500.00
9.00
260.00
864.00
6.85
214.25
815.00
7.00
212.50
345.15
1.31
24.83

Step 6: Distribution & Outliers — PROC UNIVARIATE

proc univariate data=Oils_Dataset;

    var Price_Per_Litre;

    histogram / normal;

    inset mean std min max;

    title "Univariate Analysis of Oil Price";

run;

Output:


Step 7: 

Univariate Analysis of Oil Price

The UNIVARIATE Procedure
Variable: Price_Per_Litre

Moments
N 20 Sum Weights 20
Mean 864 Sum Observations 17280
Std Deviation 345.152903 Variance 119130.526
Skewness 0.32520808 Kurtosis -0.8880889
Uncorrected SS 17193400 Corrected SS 2263480
Coeff Variation 39.9482526 Std Error Mean 77.1785353

Basic Statistical Measures
Location Variability
Mean 864.0000 Std Deviation 345.15290
Median 815.0000 Variance 119131
Mode 650.0000 Range 1200
    Interquartile Range 525.00000

Note: The mode displayed is the smallest of 4 modes with a count of 2.


Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 11.19482 Pr > |t| <.0001
Sign M 10 Pr >= |M| <.0001
Signed Rank S 105 Pr >= |S| <.0001

Quantiles (Definition 5)
Level Quantile
100% Max 1500
99% 1500
95% 1450
90% 1350
75% Q3 1150
50% Median 815
25% Q1 625
10% 450
5% 350
1% 300
0% Min 300

Extreme Observations
Lowest Highest
Value Obs Value Obs
300 15 1200 4
400 14 1300 9
500 8 1300 16
550 12 1400 5
600 6 1500 10

Univariate Analysis of Oil Price

The UNIVARIATE Procedure
Fitted Normal Distribution for Price_Per_Litre

Parameters for Normal Distribution
Parameter Symbol Estimate
Mean Mu 864
Std Dev Sigma 345.1529

Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.13266059 Pr > D >0.150
Cramer-von Mises W-Sq 0.05699813 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.32913203 Pr > A-Sq >0.250

Quantiles for Normal Distribution
Percent Quantile
Observed Estimated
1.0 300.000 61.0543
5.0 350.000 296.2740
10.0 450.000 421.6688
25.0 625.000 631.1979
50.0 815.000 864.0000
75.0 1150.000 1096.8021
90.0 1350.000 1306.3312
95.0 1450.000 1431.7260
99.0 1500.000 1666.9457

Report with Grouping — PROC REPORT

proc report data=Oils_Dataset nowd;

    column Category Origin Price_Per_Litre Health_Rating;

    define Category / group;

    define Origin / group;

    define Price_Per_Litre / analysis mean format=8.2;

    define Health_Rating / analysis mean format=8.2;

    title "Mean Price and Health Rating by Category and Origin";

run;

Output:

Mean Price and Health Rating by Category and Origin

Category Origin Price_Per_Litre Health_Rating
Cooking Canada 750.00 7.50
  China 780.00 7.00
  France 700.00 7.00
  India 600.00 6.50
  Italy 850.00 9.00
  Japan 900.00 8.00
  Malaysia 550.00 5.00
  Mexico 1300.00 9.00
  USA 850.00 6.50
  Ukraine 600.00 8.00
Cosmetic Germany 1100.00 7.00
  Morocco 1400.00 7.00
  USA 1350.00 6.00
Medicinal Australia 1300.00 8.00
  India 350.00 4.50


Step 8: Plotting — PROC SGPLOT

proc sgplot data=Oils_Dataset;

    vbar Category / response=Price_Per_Litre stat=mean;

    title "Average Price per Litre by Oil Category";

run;

Log:

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           0.71 seconds
      cpu time            0.09 seconds

NOTE: Listing image output written to SGPlot1.png.
NOTE: There were 20 observations read from the data set WORK.OILS_DATASET.

proc sgplot data=Oils_Dataset;

    scatter x=Smoke_Point y=Health_Rating / group=Category;

    title "Health Rating vs Smoke Point by Category";

run;

Output:

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           0.42 seconds
      cpu time            0.07 seconds

NOTE: Listing image output written to SGPlot3.png.
NOTE: There were 20 observations read from the data set WORK.OILS_DATASET.

Step 9: SQL Querying — PROC SQL

proc sql;

    select Oil_Name, Category, Origin, Price_Per_Litre

    from Oils_Dataset

    where Price_Per_Litre > 1000;

quit;

Output:


Oil_Name Category Origin Price_Per_Litre
Almond_Oil Cosmetic USA 1200
Argan_Oil Cosmetic Morocco 1400
Avocado_Oil Cooking Mexico 1300
Jojoba_Oil Cosmetic USA 1500
Tea_Tree_Oil Medicinal Australia 1300
Wheatgerm_Oil Cosmetic Germany 1100
Hempseed_Oil Cooking USA 1050

proc sql;

    select Category, avg(Health_Rating) as Avg_Health_Rating

    from Oils_Dataset

    group by Category;

quit;

Output:

Category Avg_Health_Rating
Cooking 7.230769
Cosmetic 6.5
Medicinal 5.666667

Step 10: Macro Programming

Macro to Generate Summary for Any Variable

%macro summarize(var);

    proc means data=Oils_Dataset mean std min max;

        var &var;

        title "Summary Statistics for &var";

    run;

%mend summarize;


%summarize(Price_Per_Litre);

Output:

Summary Statistics for Price_Per_Litre

The MEANS Procedure

Analysis Variable : Price_Per_Litre
Mean Std Dev Minimum Maximum
864.0000000 345.1529028 300.0000000 1500.00

%summarize(Smoke_Point);

Output:

Summary Statistics for Smoke_Point

The MEANS Procedure

Analysis Variable : Smoke_Point
Mean Std Dev Minimum Maximum
214.2500000 24.8296830 170.0000000 260.0000000


Macro with PROC SQL inside

%macro filter_price(price);

    proc sql;

        select Oil_Name, Price_Per_Litre

        from Oils_Dataset

        where Price_Per_Litre > &price;

    quit;

%mend filter_price;


%filter_price(1000);

Output:

Oil_Name Price_Per_Litre
Almond_Oil 1200
Argan_Oil 1400
Avocado_Oil 1300
Jojoba_Oil 1500
Tea_Tree_Oil 1300
Wheatgerm_Oil 1100
Hempseed_Oil 1050

Step 11: Transpose — PROC TRANSPOSE

Transpose Health Ratings

proc transpose data=Oils_Dataset out=Transposed_Health(drop=_NAME_);

    id Oil_Name;

    var Health_Rating;

run;

proc print data=Transposed_Health;

    title "Transposed Health Ratings";

run;

Output:

Transposed Health Ratings

Obs Olive_Oil Coconut_Oil Canola_Oil Almond_Oil Argan_Oil Sunflower_Oil Sesame_Oil Mustard_Oil Avocado_Oil Jojoba_Oil Flaxseed_Oil Palm_Oil Grapeseed_Oil Castor_Oil Neem_Oil Tea_Tree_Oil Rice_Bran_Oil Wheatgerm_Oil Hempseed_Oil Safflower_Oil
1 9 7 8 6 7 8 7 6 9 6 7 5 7 5 4 8 8 7 7 6

Step 12: Custom Format for Rating

proc format;

    value healthfmt 

        1 - 4 = 'Poor'

        5 - 6 = 'Moderate'

        7 - 8 = 'Good'

        9 - 10 = 'Excellent';

run;


proc freq data=Oils_Dataset;

    tables Health_Rating;

    format Health_Rating healthfmt.;

    title "Formatted Health Rating Frequency";

run;

Output:

Formatted Health Rating Frequency

The FREQ Procedure

Health_Rating Frequency Percent Cumulative
Frequency
Cumulative
Percent
Poor 1 5.00 1 5.00
Moderate 6 30.00 7 35.00
Good 11 55.00 18 90.00
Excellent 2 10.00 20 100.00




To Visit My Previous Online Streaming Flatform Dataset:Click Here
To Visit My Previous Statistical Evaluation Of Clinical Trials:Click Here
To Visit My Previous Unlocking Retail Insights Dataset:Click Here
To Visit My Previous Sas Interview Questions-1:Click Here





--->PLEASE FOLLOW OUR BLOG FOR MORE INFORMATION.
--->PLEASE DO COMMENTS AND SHARE OUR BLOG.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE


Comments