186.ANALYZING A UNIQUE WATER BOTTLES DATASET IN SAS | USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC FORMAT | PROC REPORT | PROC GCHART | PROC UNIVARIATE | PROC SGPLOT | PROC CONTENTS | PROC DATASETS | PROC TRANSPOSE | PROC TABULATE FOR COMPREHENSIVE DATA EXPLORATION | SUMMARY STATISTICS | GRAPHICAL VISUALIZATION | AND SQL-BASED INSIGHTS
- Get link
- X
- Other Apps
ANALYZING A UNIQUE WATER BOTTLES DATASET IN SAS | USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC FORMAT | PROC REPORT | PROC GCHART | PROC UNIVARIATE | PROC SGPLOT | PROC CONTENTS | PROC DATASETS | PROC TRANSPOSE | PROC TABULATE FOR COMPREHENSIVE DATA EXPLORATION | SUMMARY STATISTICS | GRAPHICAL VISUALIZATION | AND SQL-BASED INSIGHTS
/*A detailed SAS project focused on a dataset involving different types of water bottles.*/
/*Creating a unique water bottle dataset*/
Step 1: Data Preparation in SAS
data water_bottles;
infile datalines dlm=' ' dsd truncover;
input Bottle_ID
Brand :$20.
Material :$20.
Capacity_ml
Color :$15.
Insulated :$3.
Price
Stock
Country :$20.;
datalines;
1 AquaPure Plastic 1000 Blue Yes 10.99 150 USA
2 ChillMate Stainless_Steel 750 Black Yes 15.50 75 Germany
3 EcoJug Glass 500 Transparent No 12.00 200 India
4 HydroBoost Plastic 2000 Green Yes 14.25 100 USA
5 Glacier Stainless_Steel 1000 Silver Yes 18.99 50 Japan
6 ChillMate Aluminum 750 Red No 11.49 30 China
7 AquaPure Glass 600 Blue No 13.75 120 India
8 HydroBoost Plastic 1500 Orange Yes 9.99 85 USA
9 EcoJug Stainless_Steel 500 White Yes 16.49 60 Germany
10 Glacier Plastic 2000 Grey No 8.99 45 China
11 AquaPure Aluminum 1000 Yellow No 10.50 95 India
12 EcoJug Plastic 600 Green Yes 9.75 40 USA
13 ChillMate Glass 750 Black No 14.20 25 Japan
14 HydroBoost Aluminum 500 Blue No 10.00 65 Germany
15 Glacier Stainless_Steel 2000 Silver Yes 22.99 15 China
16 AquaPure Plastic 1500 White No 7.99 210 India
17 ChillMate Plastic 750 Red No 8.49 120 USA
18 EcoJug Glass 1000 Transparent No 15.49 90 India
19 HydroBoost Plastic 500 Black Yes 6.99 200 USA
20 Glacier Aluminum 1000 Grey Yes 13.00 80 Japan
;
run;
Step 2: Basic PROC Exploration
2.1:To View the Data
proc print data=water_bottles noobs label;
title "Water Bottles Dataset";
run;
Output:
Water Bottles Dataset |
Bottle_ID | Brand | Material | Capacity_ml | Color | Insulated | Price | Stock | Country |
---|---|---|---|---|---|---|---|---|
1 | AquaPure | Plastic | 1000 | Blue | Yes | 10.99 | 150 | USA |
2 | ChillMate | Stainless_Steel | 750 | Black | Yes | 15.50 | 75 | Germany |
3 | EcoJug | Glass | 500 | Transparent | No | 12.00 | 200 | India |
4 | HydroBoost | Plastic | 2000 | Green | Yes | 14.25 | 100 | USA |
5 | Glacier | Stainless_Steel | 1000 | Silver | Yes | 18.99 | 50 | Japan |
6 | ChillMate | Aluminum | 750 | Red | No | 11.49 | 30 | China |
7 | AquaPure | Glass | 600 | Blue | No | 13.75 | 120 | India |
8 | HydroBoost | Plastic | 1500 | Orange | Yes | 9.99 | 85 | USA |
9 | EcoJug | Stainless_Steel | 500 | White | Yes | 16.49 | 60 | Germany |
10 | Glacier | Plastic | 2000 | Grey | No | 8.99 | 45 | China |
11 | AquaPure | Aluminum | 1000 | Yellow | No | 10.50 | 95 | India |
12 | EcoJug | Plastic | 600 | Green | Yes | 9.75 | 40 | USA |
13 | ChillMate | Glass | 750 | Black | No | 14.20 | 25 | Japan |
14 | HydroBoost | Aluminum | 500 | Blue | No | 10.00 | 65 | Germany |
15 | Glacier | Stainless_Steel | 2000 | Silver | Yes | 22.99 | 15 | China |
16 | AquaPure | Plastic | 1500 | White | No | 7.99 | 210 | India |
17 | ChillMate | Plastic | 750 | Red | No | 8.49 | 120 | USA |
18 | EcoJug | Glass | 1000 | Transparent | No | 15.49 | 90 | India |
19 | HydroBoost | Plastic | 500 | Black | Yes | 6.99 | 200 | USA |
20 | Glacier | Aluminum | 1000 | Grey | Yes | 13.00 | 80 | Japan |
2.2:Dataset Structure
proc contents data=water_bottles;
title "Dataset Structure for Water Bottles";
run;
Output:
Dataset Structure for Water
Bottles |
Data Set Name | WORK.WATER_BOTTLES | Observations | 20 |
---|---|---|---|
Member Type | DATA | Variables | 9 |
Engine | V9 | Indexes | 0 |
Created | 14/09/2015 00:14:33 | Observation Length | 112 |
Last Modified | 14/09/2015 00:14:33 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | wlatin1 Western (Windows) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 584 |
Obs in First Data Page | 20 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD10640_DESKTOP-QFAA4KV_\water_bottles.sas7bdat |
Release Created | 9.0401M2 |
Host Created | X64_8HOME |
Alphabetic List of Variables and Attributes | |||
---|---|---|---|
# | Variable | Type | Len |
1 | Bottle_ID | Num | 8 |
2 | Brand | Char | 20 |
4 | Capacity_ml | Num | 8 |
5 | Color | Char | 15 |
9 | Country | Char | 20 |
6 | Insulated | Char | 3 |
3 | Material | Char | 20 |
7 | Price | Num | 8 |
8 | Stock | Num | 8 |
2.3: PROC SORT and PROC FREQ
proc sort data=water_bottles out=sorted_material;
by Material;
run;
proc freq data=sorted_material;
tables Material;
title "Frequency of Bottle Materials";
run;
Output:
Frequency of Bottle
Materials |
Material | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Aluminum | 4 | 20.00 | 4 | 20.00 |
Glass | 4 | 20.00 | 8 | 40.00 |
Plastic | 8 | 40.00 | 16 | 80.00 |
Stainless_Steel | 4 | 20.00 | 20 | 100.00 |
Step 3: Aggregated Analysis
3.1: Summary Statistics by Material
proc means data=water_bottles n mean min max maxdec=2;
class Material;
var Price Capacity_ml;
title "Summary Stats by Bottle Material";
run;
Output:
Summary Stats by Bottle
Material |
Material | N Obs | Variable | N | Mean | Minimum | Maximum | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Aluminum | 4 |
|
|
|
|
| ||||||||||
Glass | 4 |
|
|
|
|
| ||||||||||
Plastic | 8 |
|
|
|
|
| ||||||||||
Stainless_Steel | 4 |
|
|
|
|
|
Step 4: Using PROC SQL for Advanced Queries
4.1: Average Price by Brand
proc sql;
title "Average Price of Bottles by Brand";
select Brand, mean(Price) as Avg_Price format=dollar8.2
from water_bottles
group by Brand
order by Avg_Price desc;
quit;
Output:
Average Price of Bottles by
Brand |
Brand | Avg_Price |
---|---|
Glacier | $15.99 |
EcoJug | $13.43 |
ChillMate | $12.42 |
AquaPure | $10.81 |
HydroBoost | $10.31 |
4.2: Top 5 Most Expensive Bottles
proc sql outobs=5;
title "Top 5 Expensive Bottles";
select * from water_bottles
order by Price desc;
quit;
Output:
Top 5 Expensive Bottles |
Bottle_ID | Brand | Material | Capacity_ml | Color | Insulated | Price | Stock | Country |
---|---|---|---|---|---|---|---|---|
15 | Glacier | Stainless_Steel | 2000 | Silver | Yes | 22.99 | 15 | China |
5 | Glacier | Stainless_Steel | 1000 | Silver | Yes | 18.99 | 50 | Japan |
9 | EcoJug | Stainless_Steel | 500 | White | Yes | 16.49 | 60 | Germany |
2 | ChillMate | Stainless_Steel | 750 | Black | Yes | 15.5 | 75 | Germany |
18 | EcoJug | Glass | 1000 | Transparent | No | 15.49 | 90 | India |
4.3: Total Stock Value by Country
proc sql;
title "Total Stock Value by Country";
select Country, sum(Stock * Price) as Total_Stock_Value format=dollar10.2
from water_bottles
group by Country;
quit;
Output:
Total Stock Value by
Country |
Country | Total_Stock_Value |
---|---|
China | $1,094.10 |
Germany | $2,801.90 |
India | $8,119.50 |
Japan | $2,344.50 |
USA | $6,729.45 |
Step 5: Using SAS Macros for Repeated Analysis
Define Macro
%macro analyze_by_material(mat=);
proc sql;
title "Analysis for Material: &mat";
select Brand, Capacity_ml, Price, Stock
from water_bottles
where Material="&mat";
quit;
proc means data=water_bottles(where=(Material="&mat")) n mean min max maxdec=2;
var Price Capacity_ml Stock;
title "Summary for &mat Bottles";
run;
%mend;
Call the Macro
%analyze_by_material(mat=Plastic)
Output:
Summary for Plastic
Bottles |
Variable | N | Mean | Minimum | Maximum | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
%analyze_by_material(mat=Glass)
Output:
Summary for Glass
Bottles |
Variable | N | Mean | Minimum | Maximum | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
%analyze_by_material(mat=Aluminum)
Output:
Summary for Aluminum
Bottles |
Variable | N | Mean | Minimum | Maximum | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
Step 6: Custom Format and Labeling
proc format;
value $insfmt 'Yes' = 'Insulated'
'No' = 'Not Insulated';
run;
proc print data=water_bottles;
format Insulated $insfmt.;
title "Water Bottles with Custom Insulation Labels";
run;
Output:
Water Bottles with Custom Insulation Labels |
Obs | Bottle_ID | Brand | Material | Capacity_ml | Color | Insulated | Price | Stock | Country |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | AquaPure | Plastic | 1000 | Blue | Insulated | 10.99 | 150 | USA |
2 | 2 | ChillMate | Stainless_Steel | 750 | Black | Insulated | 15.50 | 75 | Germany |
3 | 3 | EcoJug | Glass | 500 | Transparent | Not Insulated | 12.00 | 200 | India |
4 | 4 | HydroBoost | Plastic | 2000 | Green | Insulated | 14.25 | 100 | USA |
5 | 5 | Glacier | Stainless_Steel | 1000 | Silver | Insulated | 18.99 | 50 | Japan |
6 | 6 | ChillMate | Aluminum | 750 | Red | Not Insulated | 11.49 | 30 | China |
7 | 7 | AquaPure | Glass | 600 | Blue | Not Insulated | 13.75 | 120 | India |
8 | 8 | HydroBoost | Plastic | 1500 | Orange | Insulated | 9.99 | 85 | USA |
9 | 9 | EcoJug | Stainless_Steel | 500 | White | Insulated | 16.49 | 60 | Germany |
10 | 10 | Glacier | Plastic | 2000 | Grey | Not Insulated | 8.99 | 45 | China |
11 | 11 | AquaPure | Aluminum | 1000 | Yellow | Not Insulated | 10.50 | 95 | India |
12 | 12 | EcoJug | Plastic | 600 | Green | Insulated | 9.75 | 40 | USA |
13 | 13 | ChillMate | Glass | 750 | Black | Not Insulated | 14.20 | 25 | Japan |
14 | 14 | HydroBoost | Aluminum | 500 | Blue | Not Insulated | 10.00 | 65 | Germany |
15 | 15 | Glacier | Stainless_Steel | 2000 | Silver | Insulated | 22.99 | 15 | China |
16 | 16 | AquaPure | Plastic | 1500 | White | Not Insulated | 7.99 | 210 | India |
17 | 17 | ChillMate | Plastic | 750 | Red | Not Insulated | 8.49 | 120 | USA |
18 | 18 | EcoJug | Glass | 1000 | Transparent | Not Insulated | 15.49 | 90 | India |
19 | 19 | HydroBoost | Plastic | 500 | Black | Insulated | 6.99 | 200 | USA |
20 | 20 | Glacier | Aluminum | 1000 | Grey | Insulated | 13.00 | 80 | Japan |
Step 7: Creating a Flag Variable
data flagged_bottles;
set water_bottles;
if Price > 15 then High_Priced = "Yes";
else High_Priced = "No";
run;
proc freq data=flagged_bottles;
tables High_Priced;
title "High Priced Bottles";
run;
Output:
High Priced Bottles |
High_Priced | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
No | 15 | 75.00 | 15 | 75.00 |
Yes | 5 | 25.00 | 20 | 100.00 |
Step 8: Dataset Split – By Insulation
data insulated non_insulated;
set water_bottles;
if Insulated = "Yes" then output insulated;
else output non_insulated;
run;
proc print data=insulated;
title "Insulated Bottles";
run;
Output:
Insulated Bottles |
Obs | Bottle_ID | Brand | Material | Capacity_ml | Color | Insulated | Price | Stock | Country |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | AquaPure | Plastic | 1000 | Blue | Yes | 10.99 | 150 | USA |
2 | 2 | ChillMate | Stainless_Steel | 750 | Black | Yes | 15.50 | 75 | Germany |
3 | 4 | HydroBoost | Plastic | 2000 | Green | Yes | 14.25 | 100 | USA |
4 | 5 | Glacier | Stainless_Steel | 1000 | Silver | Yes | 18.99 | 50 | Japan |
5 | 8 | HydroBoost | Plastic | 1500 | Orange | Yes | 9.99 | 85 | USA |
6 | 9 | EcoJug | Stainless_Steel | 500 | White | Yes | 16.49 | 60 | Germany |
7 | 12 | EcoJug | Plastic | 600 | Green | Yes | 9.75 | 40 | USA |
8 | 15 | Glacier | Stainless_Steel | 2000 | Silver | Yes | 22.99 | 15 | China |
9 | 19 | HydroBoost | Plastic | 500 | Black | Yes | 6.99 | 200 | USA |
10 | 20 | Glacier | Aluminum | 1000 | Grey | Yes | 13.00 | 80 | Japan |
proc print data=non_insulated;
title "Non-Insulated Bottles";
run;
Output:
Non-Insulated Bottles |
Obs | Bottle_ID | Brand | Material | Capacity_ml | Color | Insulated | Price | Stock | Country |
---|---|---|---|---|---|---|---|---|---|
1 | 3 | EcoJug | Glass | 500 | Transparent | No | 12.00 | 200 | India |
2 | 6 | ChillMate | Aluminum | 750 | Red | No | 11.49 | 30 | China |
3 | 7 | AquaPure | Glass | 600 | Blue | No | 13.75 | 120 | India |
4 | 10 | Glacier | Plastic | 2000 | Grey | No | 8.99 | 45 | China |
5 | 11 | AquaPure | Aluminum | 1000 | Yellow | No | 10.50 | 95 | India |
6 | 13 | ChillMate | Glass | 750 | Black | No | 14.20 | 25 | Japan |
7 | 14 | HydroBoost | Aluminum | 500 | Blue | No | 10.00 | 65 | Germany |
8 | 16 | AquaPure | Plastic | 1500 | White | No | 7.99 | 210 | India |
9 | 17 | ChillMate | Plastic | 750 | Red | No | 8.49 | 120 | USA |
10 | 18 | EcoJug | Glass | 1000 | Transparent | No | 15.49 | 90 | India |
Step 9: Exporting the Dataset
proc export data=water_bottles
outfile="/mnt/data/water_bottles.csv"
dbms=csv
replace;
run;
- Get link
- X
- Other Apps
Comments
Post a Comment