201.DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE

DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE | MACROS IN SAS

/*A dataset of different types of oils*/

Step 1: Create the Dataset

data Oils_Dataset;

length Oil_ID 8 Oil_Name $20 Category $15 Origin $15 Usage_Type $12;

input Oil_ID Oil_Name $ Category $ Origin $ Smoke_Point Usage_Type $

Price_Per_Litre Health_Rating;

datalines;

1 Olive_Oil Cooking Italy 190 Cooking 850 9

2 Coconut_Oil Cooking India 175 Cooking 700 7

3 Canola_Oil Cooking Canada 205 Cooking 650 8

4 Almond_Oil Cosmetic USA 220 Skincare 1200 6

5 Argan_Oil Cosmetic Morocco 215 Haircare 1400 7

6 Sunflower_Oil Cooking Ukraine 225 Cooking 600 8

7 Sesame_Oil Cooking China 210 Cooking 780 7

8 Mustard_Oil Cooking India 200 Cooking 500 6

9 Avocado_Oil Cooking Mexico 250 Cooking 1300 9

10 Jojoba_Oil Cosmetic USA 260 Skincare 1500 6

11 Flaxseed_Oil Cooking Canada 190 Cooking 850 7

12 Palm_Oil Cooking Malaysia 230 Cooking 550 5

13 Grapeseed_Oil Cooking France 215 Cooking 700 7

14 Castor_Oil Medicinal India 250 Laxative 400 5

15 Neem_Oil Medicinal India 210 Pesticide 300 4

16 Tea_Tree_Oil Medicinal Australia 190 Antifungal 1300 8

17 Rice_Bran_Oil Cooking Japan 245 Cooking 900 8

18 Wheatgerm_Oil Cosmetic Germany 210 Haircare 1100 7

19 Hempseed_Oil Cooking USA 170 Cooking 1050 7

20 Safflower_Oil Cooking USA 225 Cooking 650 6

;

run;

proc print;run;

Output:

Obs	Oil_ID	Oil_Name	Category	Origin	Usage_Type	Smoke_Point	Price_Per_Litre	Health_Rating
1	1	Olive_Oil	Cooking	Italy	Cooking	190	850	9
2	2	Coconut_Oil	Cooking	India	Cooking	175	700	7
3	3	Canola_Oil	Cooking	Canada	Cooking	205	650	8
4	4	Almond_Oil	Cosmetic	USA	Skincare	220	1200	6
5	5	Argan_Oil	Cosmetic	Morocco	Haircare	215	1400	7
6	6	Sunflower_Oil	Cooking	Ukraine	Cooking	225	600	8
7	7	Sesame_Oil	Cooking	China	Cooking	210	780	7
8	8	Mustard_Oil	Cooking	India	Cooking	200	500	6
9	9	Avocado_Oil	Cooking	Mexico	Cooking	250	1300	9
10	10	Jojoba_Oil	Cosmetic	USA	Skincare	260	1500	6
11	11	Flaxseed_Oil	Cooking	Canada	Cooking	190	850	7
12	12	Palm_Oil	Cooking	Malaysia	Cooking	230	550	5
13	13	Grapeseed_Oil	Cooking	France	Cooking	215	700	7
14	14	Castor_Oil	Medicinal	India	Laxative	250	400	5
15	15	Neem_Oil	Medicinal	India	Pesticide	210	300	4
16	16	Tea_Tree_Oil	Medicinal	Australia	Antifungal	190	1300	8
17	17	Rice_Bran_Oil	Cooking	Japan	Cooking	245	900	8
18	18	Wheatgerm_Oil	Cosmetic	Germany	Haircare	210	1100	7
19	19	Hempseed_Oil	Cooking	USA	Cooking	170	1050	7
20	20	Safflower_Oil	Cooking	USA	Cooking	225	650	6

Step 2: Display the Dataset — PROC PRINT

proc print data=Oils_Dataset label noobs;

title "Complete Oils Dataset";

run;

Output:

Complete Oils Dataset

Oil_ID	Oil_Name	Category	Origin	Usage_Type	Smoke_Point	Price_Per_Litre	Health_Rating
1	Olive_Oil	Cooking	Italy	Cooking	190	850	9
2	Coconut_Oil	Cooking	India	Cooking	175	700	7
3	Canola_Oil	Cooking	Canada	Cooking	205	650	8
4	Almond_Oil	Cosmetic	USA	Skincare	220	1200	6
5	Argan_Oil	Cosmetic	Morocco	Haircare	215	1400	7
6	Sunflower_Oil	Cooking	Ukraine	Cooking	225	600	8
7	Sesame_Oil	Cooking	China	Cooking	210	780	7
8	Mustard_Oil	Cooking	India	Cooking	200	500	6
9	Avocado_Oil	Cooking	Mexico	Cooking	250	1300	9
10	Jojoba_Oil	Cosmetic	USA	Skincare	260	1500	6
11	Flaxseed_Oil	Cooking	Canada	Cooking	190	850	7
12	Palm_Oil	Cooking	Malaysia	Cooking	230	550	5
13	Grapeseed_Oil	Cooking	France	Cooking	215	700	7
14	Castor_Oil	Medicinal	India	Laxative	250	400	5
15	Neem_Oil	Medicinal	India	Pesticide	210	300	4
16	Tea_Tree_Oil	Medicinal	Australia	Antifungal	190	1300	8
17	Rice_Bran_Oil	Cooking	Japan	Cooking	245	900	8
18	Wheatgerm_Oil	Cosmetic	Germany	Haircare	210	1100	7
19	Hempseed_Oil	Cooking	USA	Cooking	170	1050	7
20	Safflower_Oil	Cooking	USA	Cooking	225	650	6

Step 3: Sort by Category & Price — PROC SORT

proc sort data=Oils_Dataset out=Sorted_Oils;

by Category descending Price_Per_Litre;

run;

proc print;run;

Output:

Obs	Oil_ID	Oil_Name	Category	Origin	Usage_Type	Smoke_Point	Price_Per_Litre	Health_Rating
1	9	Avocado_Oil	Cooking	Mexico	Cooking	250	1300	9
2	19	Hempseed_Oil	Cooking	USA	Cooking	170	1050	7
3	17	Rice_Bran_Oil	Cooking	Japan	Cooking	245	900	8
4	1	Olive_Oil	Cooking	Italy	Cooking	190	850	9
5	11	Flaxseed_Oil	Cooking	Canada	Cooking	190	850	7
6	7	Sesame_Oil	Cooking	China	Cooking	210	780	7
7	2	Coconut_Oil	Cooking	India	Cooking	175	700	7
8	13	Grapeseed_Oil	Cooking	France	Cooking	215	700	7
9	3	Canola_Oil	Cooking	Canada	Cooking	205	650	8
10	20	Safflower_Oil	Cooking	USA	Cooking	225	650	6
11	6	Sunflower_Oil	Cooking	Ukraine	Cooking	225	600	8
12	12	Palm_Oil	Cooking	Malaysia	Cooking	230	550	5
13	8	Mustard_Oil	Cooking	India	Cooking	200	500	6
14	10	Jojoba_Oil	Cosmetic	USA	Skincare	260	1500	6
15	5	Argan_Oil	Cosmetic	Morocco	Haircare	215	1400	7
16	4	Almond_Oil	Cosmetic	USA	Skincare	220	1200	6
17	18	Wheatgerm_Oil	Cosmetic	Germany	Haircare	210	1100	7
18	16	Tea_Tree_Oil	Medicinal	Australia	Antifungal	190	1300	8
19	14	Castor_Oil	Medicinal	India	Laxative	250	400	5
20	15	Neem_Oil	Medicinal	India	Pesticide	210	300	4

Step 4: Frequency Distribution — PROC FREQ

proc freq data=Oils_Dataset;

tables Category Origin Usage_Type / nocum;

title "Frequency of Oil Categories and Origins";

run;

Output:

Frequency of Oil Categories and Origins

The FREQ Procedure

Category	Frequency	Percent
Cooking	13	65.00
Cosmetic	4	20.00
Medicinal	3	15.00

Origin	Frequency	Percent
Australia	1	5.00
Canada	2	10.00
China	1	5.00
France	1	5.00
Germany	1	5.00
India	4	20.00
Italy	1	5.00
Japan	1	5.00
Malaysia	1	5.00
Mexico	1	5.00
Morocco	1	5.00
USA	4	20.00
Ukraine	1	5.00

Usage_Type	Frequency	Percent
Antifungal	1	5.00
Cooking	13	65.00
Haircare	2	10.00
Laxative	1	5.00
Pesticide	1	5.00
Skincare	2	10.00

Step 5: Statistical Summary — PROC MEANS

proc means data=Oils_Dataset min max mean median std maxdec=2;

var Price_Per_Litre Health_Rating Smoke_Point;

title "Descriptive Statistics for Numerical Variables";

run;

Output:

Descriptive Statistics for Numerical Variables

The MEANS Procedure

Price_Per_Litre
Health_Rating
Smoke_Point

300.00

4.00

170.00

1500.00

9.00

260.00

864.00

6.85

214.25

815.00

7.00

212.50

345.15

1.31

24.83

Step 6: Distribution & Outliers — PROC UNIVARIATE

proc univariate data=Oils_Dataset;

var Price_Per_Litre;

histogram / normal;

inset mean std min max;

title "Univariate Analysis of Oil Price";

run;

Output:

Step 7:

Univariate Analysis of Oil Price

The UNIVARIATE Procedure

Variable: Price_Per_Litre

Moments
N	20	Sum Weights	20
Mean	864	Sum Observations	17280
Std Deviation	345.152903	Variance	119130.526
Skewness	0.32520808	Kurtosis	-0.8880889
Uncorrected SS	17193400	Corrected SS	2263480
Coeff Variation	39.9482526	Std Error Mean	77.1785353

Basic Statistical Measures
Location		Variability
Mean	864.0000	Std Deviation	345.15290
Median	815.0000	Variance	119131
Mode	650.0000	Range	1200
		Interquartile Range	525.00000

Note: The mode displayed is the smallest of 4 modes with a count of 2.

Tests for Location: Mu0=0
Test	Statistic		p Value
Student's t	t	11.19482	Pr > \|t\|	<.0001
Sign	M	10	Pr >= \|M\|	<.0001
Signed Rank	S	105	Pr >= \|S\|	<.0001

Quantiles (Definition 5)
Level	Quantile
100% Max	1500
99%	1500
95%	1450
90%	1350
75% Q3	1150
50% Median	815
25% Q1	625
10%	450
5%	350
1%	300
0% Min	300

Extreme Observations
Lowest		Highest
Value	Obs	Value	Obs
300	15	1200	4
400	14	1300	9
500	8	1300	16
550	12	1400	5
600	6	1500	10

Univariate Analysis of Oil Price

The UNIVARIATE Procedure

Fitted Normal Distribution for Price_Per_Litre

Parameters for Normal Distribution
Parameter	Symbol	Estimate
Mean	Mu	864
Std Dev	Sigma	345.1529

Goodness-of-Fit Tests for Normal Distribution
Test	Statistic		p Value
Kolmogorov-Smirnov	D	0.13266059	Pr > D	>0.150
Cramer-von Mises	W-Sq	0.05699813	Pr > W-Sq	>0.250
Anderson-Darling	A-Sq	0.32913203	Pr > A-Sq	>0.250

Quantiles for Normal Distribution
Percent	Quantile
Percent	Observed	Estimated
1.0	300.000	61.0543
5.0	350.000	296.2740
10.0	450.000	421.6688
25.0	625.000	631.1979
50.0	815.000	864.0000
75.0	1150.000	1096.8021
90.0	1350.000	1306.3312
95.0	1450.000	1431.7260
99.0	1500.000	1666.9457

Report with Grouping — PROC REPORT

proc report data=Oils_Dataset nowd;

column Category Origin Price_Per_Litre Health_Rating;

define Category / group;

define Origin / group;

define Price_Per_Litre / analysis mean format=8.2;

define Health_Rating / analysis mean format=8.2;

title "Mean Price and Health Rating by Category and Origin";

run;

Output:

Mean Price and Health Rating by Category and Origin

Category	Origin	Price_Per_Litre	Health_Rating
Cooking	Canada	750.00	7.50
	China	780.00	7.00
	France	700.00	7.00
	India	600.00	6.50
	Italy	850.00	9.00
	Japan	900.00	8.00
	Malaysia	550.00	5.00
	Mexico	1300.00	9.00
	USA	850.00	6.50
	Ukraine	600.00	8.00
Cosmetic	Germany	1100.00	7.00
	Morocco	1400.00	7.00
	USA	1350.00	6.00
Medicinal	Australia	1300.00	8.00
	India	350.00	4.50

Step 8: Plotting — PROC SGPLOT

proc sgplot data=Oils_Dataset;

vbar Category / response=Price_Per_Litre stat=mean;

title "Average Price per Litre by Oil Category";

run;

Log:

NOTE: PROCEDURE SGPLOT used (Total process time):

real time 0.71 seconds

cpu time 0.09 seconds

NOTE: Listing image output written to SGPlot1.png.

NOTE: There were 20 observations read from the data set WORK.OILS_DATASET.

proc sgplot data=Oils_Dataset;

scatter x=Smoke_Point y=Health_Rating / group=Category;

title "Health Rating vs Smoke Point by Category";

run;

Output:

NOTE: PROCEDURE SGPLOT used (Total process time):

real time 0.42 seconds

cpu time 0.07 seconds

NOTE: Listing image output written to SGPlot3.png.

NOTE: There were 20 observations read from the data set WORK.OILS_DATASET.

Step 9: SQL Querying — PROC SQL

proc sql;

select Oil_Name, Category, Origin, Price_Per_Litre

from Oils_Dataset

where Price_Per_Litre > 1000;

quit;

Output:

Oil_Name	Category	Origin	Price_Per_Litre
Almond_Oil	Cosmetic	USA	1200
Argan_Oil	Cosmetic	Morocco	1400
Avocado_Oil	Cooking	Mexico	1300
Jojoba_Oil	Cosmetic	USA	1500
Tea_Tree_Oil	Medicinal	Australia	1300
Wheatgerm_Oil	Cosmetic	Germany	1100
Hempseed_Oil	Cooking	USA	1050

proc sql;

select Category, avg(Health_Rating) as Avg_Health_Rating

from Oils_Dataset

group by Category;

quit;

Output:

Category	Avg_Health_Rating
Cooking	7.230769
Cosmetic	6.5
Medicinal	5.666667

Step 10: Macro Programming

Macro to Generate Summary for Any Variable

%macro summarize(var);

proc means data=Oils_Dataset mean std min max;

var &var;

title "Summary Statistics for &var";

run;

%mend summarize;

%summarize(Price_Per_Litre);

Output:

Summary Statistics for Price_Per_Litre

The MEANS Procedure

Analysis Variable : Price_Per_Litre
Mean	Std Dev	Minimum	Maximum
864.0000000	345.1529028	300.0000000	1500.00

%summarize(Smoke_Point);

Output:

Summary Statistics for Smoke_Point

The MEANS Procedure

Analysis Variable : Smoke_Point
Mean	Std Dev	Minimum	Maximum
214.2500000	24.8296830	170.0000000	260.0000000

Macro with PROC SQL inside

%macro filter_price(price);

proc sql;

select Oil_Name, Price_Per_Litre

from Oils_Dataset

where Price_Per_Litre > &price;

quit;

%mend filter_price;

%filter_price(1000);

Output:

Oil_Name	Price_Per_Litre
Almond_Oil	1200
Argan_Oil	1400
Avocado_Oil	1300
Jojoba_Oil	1500
Tea_Tree_Oil	1300
Wheatgerm_Oil	1100
Hempseed_Oil	1050

Step 11: Transpose — PROC TRANSPOSE

Transpose Health Ratings

proc transpose data=Oils_Dataset out=Transposed_Health(drop=_NAME_);

id Oil_Name;

var Health_Rating;

run;

proc print data=Transposed_Health;

title "Transposed Health Ratings";

run;

Output:

Transposed Health Ratings

Obs	Olive_Oil	Coconut_Oil	Canola_Oil	Almond_Oil	Argan_Oil	Sunflower_Oil	Sesame_Oil	Mustard_Oil	Avocado_Oil	Jojoba_Oil	Flaxseed_Oil	Palm_Oil	Grapeseed_Oil	Castor_Oil	Neem_Oil	Tea_Tree_Oil	Rice_Bran_Oil	Wheatgerm_Oil	Hempseed_Oil	Safflower_Oil
1	9	7	8	6	7	8	7	6	9	6	7	5	7	5	4	8	8	7	7	6

Step 12: Custom Format for Rating

proc format;

value healthfmt

1 - 4 = 'Poor'

5 - 6 = 'Moderate'

7 - 8 = 'Good'

9 - 10 = 'Excellent';

run;

proc freq data=Oils_Dataset;

tables Health_Rating;

format Health_Rating healthfmt.;

title "Formatted Health Rating Frequency";

run;

Output:

Formatted Health Rating Frequency

The FREQ Procedure

Health_Rating	Frequency	Percent	Cumulative Frequency	Cumulative Percent
Poor	1	5.00	1	5.00
Moderate	6	30.00	7	35.00
Good	11	55.00	18	90.00
Excellent	2	10.00	20	100.00

Search This Blog

SAS Learning Hub

201.DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE | MACROS IN SAS

DETAILED ANALYSIS OF DIFFERENT TYPES OF OILS USING PROC PRINT | PROC SORT | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC REPORT | PROC SGPLOT | PROC SQL | PROC TRANSPOSE | MACROS IN SAS

Comments

Post a Comment

Popular posts from this blog

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study

383.Which Pharma Distributors Are High-Risk? Can SAS Detect Delays, Temperature Violations, and Fraud?