186.ANALYZING A UNIQUE WATER BOTTLES DATASET IN SAS | USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC FORMAT | PROC REPORT | PROC GCHART | PROC UNIVARIATE | PROC SGPLOT | PROC CONTENTS | PROC DATASETS | PROC TRANSPOSE | PROC TABULATE FOR COMPREHENSIVE DATA EXPLORATION | SUMMARY STATISTICS | GRAPHICAL VISUALIZATION | AND SQL-BASED INSIGHTS

ANALYZING A UNIQUE WATER BOTTLES DATASET IN SAS | USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC FORMAT | PROC REPORT | PROC GCHART | PROC UNIVARIATE | PROC SGPLOT | PROC CONTENTS | PROC DATASETS | PROC TRANSPOSE | PROC TABULATE FOR COMPREHENSIVE DATA EXPLORATION | SUMMARY STATISTICS | GRAPHICAL VISUALIZATION | AND SQL-BASED INSIGHTS

/*A detailed SAS project focused on a dataset involving different types of water bottles.*/

/*Creating a unique water bottle dataset*/

Step 1: Data Preparation in SAS

data water_bottles;

    infile datalines dlm=' ' dsd truncover;

    input Bottle_ID 

          Brand :$20. 

          Material :$20. 

          Capacity_ml 

          Color :$15. 

          Insulated :$3. 

          Price 

          Stock 

          Country :$20.;

    datalines;

1 AquaPure Plastic 1000 Blue Yes 10.99 150 USA

2 ChillMate Stainless_Steel 750 Black Yes 15.50 75 Germany

3 EcoJug Glass 500 Transparent No 12.00 200 India

4 HydroBoost Plastic 2000 Green Yes 14.25 100 USA

5 Glacier Stainless_Steel 1000 Silver Yes 18.99 50 Japan

6 ChillMate Aluminum 750 Red No 11.49 30 China

7 AquaPure Glass 600 Blue No 13.75 120 India

8 HydroBoost Plastic 1500 Orange Yes 9.99 85 USA

9 EcoJug Stainless_Steel 500 White Yes 16.49 60 Germany

10 Glacier Plastic 2000 Grey No 8.99 45 China

11 AquaPure Aluminum 1000 Yellow No 10.50 95 India

12 EcoJug Plastic 600 Green Yes 9.75 40 USA

13 ChillMate Glass 750 Black No 14.20 25 Japan

14 HydroBoost Aluminum 500 Blue No 10.00 65 Germany

15 Glacier Stainless_Steel 2000 Silver Yes 22.99 15 China

16 AquaPure Plastic 1500 White No 7.99 210 India

17 ChillMate Plastic 750 Red No 8.49 120 USA

18 EcoJug Glass 1000 Transparent No 15.49 90 India

19 HydroBoost Plastic 500 Black Yes 6.99 200 USA

20 Glacier Aluminum 1000 Grey Yes 13.00 80 Japan

;

run;

Step 2: Basic PROC Exploration

2.1:To View the Data

proc print data=water_bottles noobs label;

    title "Water Bottles Dataset";

run;

Output:

                                                                              Water Bottles Dataset

Bottle_ID Brand Material Capacity_ml Color Insulated Price Stock Country
1 AquaPure Plastic 1000 Blue Yes 10.99 150 USA
2 ChillMate Stainless_Steel 750 Black Yes 15.50 75 Germany
3 EcoJug Glass 500 Transparent No 12.00 200 India
4 HydroBoost Plastic 2000 Green Yes 14.25 100 USA
5 Glacier Stainless_Steel 1000 Silver Yes 18.99 50 Japan
6 ChillMate Aluminum 750 Red No 11.49 30 China
7 AquaPure Glass 600 Blue No 13.75 120 India
8 HydroBoost Plastic 1500 Orange Yes 9.99 85 USA
9 EcoJug Stainless_Steel 500 White Yes 16.49 60 Germany
10 Glacier Plastic 2000 Grey No 8.99 45 China
11 AquaPure Aluminum 1000 Yellow No 10.50 95 India
12 EcoJug Plastic 600 Green Yes 9.75 40 USA
13 ChillMate Glass 750 Black No 14.20 25 Japan
14 HydroBoost Aluminum 500 Blue No 10.00 65 Germany
15 Glacier Stainless_Steel 2000 Silver Yes 22.99 15 China
16 AquaPure Plastic 1500 White No 7.99 210 India
17 ChillMate Plastic 750 Red No 8.49 120 USA
18 EcoJug Glass 1000 Transparent No 15.49 90 India
19 HydroBoost Plastic 500 Black Yes 6.99 200 USA
20 Glacier Aluminum 1000 Grey Yes 13.00 80 Japan


 2.2:Dataset Structure

 proc contents data=water_bottles;

    title "Dataset Structure for Water Bottles";

run;

Output:

                                                                          Dataset Structure for Water Bottles

                                                                              The CONTENTS Procedure

Data Set Name WORK.WATER_BOTTLES Observations 20
Member Type DATA Variables 9
Engine V9 Indexes 0
Created 14/09/2015 00:14:33 Observation Length 112
Last Modified 14/09/2015 00:14:33 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 584
Obs in First Data Page 20
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD10640_DESKTOP-QFAA4KV_\water_bottles.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len
1 Bottle_ID Num 8
2 Brand Char 20
4 Capacity_ml Num 8
5 Color Char 15
9 Country Char 20
6 Insulated Char 3
3 Material Char 20
7 Price Num 8
8 Stock Num 8


 2.3: PROC SORT and PROC FREQ

 proc sort data=water_bottles out=sorted_material;

    by Material;

run;

proc freq data=sorted_material;

    tables Material;

    title "Frequency of Bottle Materials";

run;

Output:

                                                       Frequency of Bottle Materials

                                                               The FREQ Procedure

Material Frequency Percent Cumulative
Frequency
Cumulative
Percent
Aluminum 4 20.00 4 20.00
Glass 4 20.00 8 40.00
Plastic 8 40.00 16 80.00
Stainless_Steel 4 20.00 20 100.00


Step 3: Aggregated Analysis

 3.1: Summary Statistics by Material

proc means data=water_bottles n mean min max maxdec=2;

    class Material;

    var Price Capacity_ml;

    title "Summary Stats by Bottle Material";

run;

Output:

                                                        Summary Stats by Bottle Material

                                                               The MEANS Procedure

Material N Obs Variable N Mean Minimum Maximum
Aluminum 4
Price
Capacity_ml
4
4
11.25
812.50
10.00
500.00
13.00
1000.00
Glass 4
Price
Capacity_ml
4
4
13.86
712.50
12.00
500.00
15.49
1000.00
Plastic 8
Price
Capacity_ml
8
8
9.68
1231.25
6.99
500.00
14.25
2000.00
Stainless_Steel 4
Price
Capacity_ml
4
4
18.49
1062.50
15.50
500.00
22.99
2000.00


Step 4: Using PROC SQL for Advanced Queries

 4.1: Average Price by Brand

proc sql;

    title "Average Price of Bottles by Brand";

    select Brand, mean(Price) as Avg_Price format=dollar8.2

    from water_bottles

    group by Brand

    order by Avg_Price desc;

quit;

Output:

                                                         Average Price of Bottles by Brand

Brand Avg_Price
Glacier $15.99
EcoJug $13.43
ChillMate $12.42
AquaPure $10.81
HydroBoost $10.31

 

4.2: Top 5 Most Expensive Bottles

proc sql outobs=5;

    title "Top 5 Expensive Bottles";

    select * from water_bottles

    order by Price desc;

quit;

Output:

                                                                       Top 5 Expensive Bottles

Bottle_ID Brand Material Capacity_ml Color Insulated Price Stock Country
15 Glacier Stainless_Steel 2000 Silver Yes 22.99 15 China
5 Glacier Stainless_Steel 1000 Silver Yes 18.99 50 Japan
9 EcoJug Stainless_Steel 500 White Yes 16.49 60 Germany
2 ChillMate Stainless_Steel 750 Black Yes 15.5 75 Germany
18 EcoJug Glass 1000 Transparent No 15.49 90 India


4.3: Total Stock Value by Country

proc sql;

    title "Total Stock Value by Country";

    select Country, sum(Stock * Price) as Total_Stock_Value format=dollar10.2

    from water_bottles

    group by Country;

quit;

Output:

                                                             Total Stock Value by Country

Country Total_Stock_Value
China $1,094.10
Germany $2,801.90
India $8,119.50
Japan $2,344.50
USA $6,729.45


Step 5: Using SAS Macros for Repeated Analysis

Define Macro

%macro analyze_by_material(mat=);

proc sql;

  title "Analysis for Material: &mat";

  select Brand, Capacity_ml, Price, Stock

  from water_bottles

  where Material="&mat";

quit;


proc means data=water_bottles(where=(Material="&mat")) n mean min max maxdec=2;

  var Price Capacity_ml Stock;

  title "Summary for &mat Bottles";

  run;

%mend;


Call the Macro

%analyze_by_material(mat=Plastic)

Output:

                                                             Summary for Plastic Bottles

                                                                The MEANS Procedure

Variable N Mean Minimum Maximum
Price
Capacity_ml
Stock
8
8
8
9.68
1231.25
118.75
6.99
500.00
40.00
14.25
2000.00
210.00

%analyze_by_material(mat=Glass)

Output:

                                                             Summary for Glass Bottles

                                                                The MEANS Procedure

Variable N Mean Minimum Maximum
Price
Capacity_ml
Stock
4
4
4
13.86
712.50
108.75
12.00
500.00
25.00
15.49
1000.00
200.00


%analyze_by_material(mat=Aluminum)

Output:

                                                          Summary for Aluminum Bottles

                                                              The MEANS Procedure    

Variable N Mean Minimum Maximum
Price
Capacity_ml
Stock
4
4
4
11.25
812.50
67.50
10.00
500.00
30.00
13.00
1000.00
95.00


Step 6: Custom Format and Labeling

proc format;

    value $insfmt 'Yes' = 'Insulated'

                   'No' = 'Not Insulated';

run;


proc print data=water_bottles;

    format Insulated $insfmt.;

    title "Water Bottles with Custom Insulation Labels";

run;

Output:

                                                          Water Bottles with Custom Insulation Labels

Obs Bottle_ID Brand Material Capacity_ml Color Insulated Price Stock Country
1 1 AquaPure Plastic 1000 Blue Insulated 10.99 150 USA
2 2 ChillMate Stainless_Steel 750 Black Insulated 15.50 75 Germany
3 3 EcoJug Glass 500 Transparent Not Insulated 12.00 200 India
4 4 HydroBoost Plastic 2000 Green Insulated 14.25 100 USA
5 5 Glacier Stainless_Steel 1000 Silver Insulated 18.99 50 Japan
6 6 ChillMate Aluminum 750 Red Not Insulated 11.49 30 China
7 7 AquaPure Glass 600 Blue Not Insulated 13.75 120 India
8 8 HydroBoost Plastic 1500 Orange Insulated 9.99 85 USA
9 9 EcoJug Stainless_Steel 500 White Insulated 16.49 60 Germany
10 10 Glacier Plastic 2000 Grey Not Insulated 8.99 45 China
11 11 AquaPure Aluminum 1000 Yellow Not Insulated 10.50 95 India
12 12 EcoJug Plastic 600 Green Insulated 9.75 40 USA
13 13 ChillMate Glass 750 Black Not Insulated 14.20 25 Japan
14 14 HydroBoost Aluminum 500 Blue Not Insulated 10.00 65 Germany
15 15 Glacier Stainless_Steel 2000 Silver Insulated 22.99 15 China
16 16 AquaPure Plastic 1500 White Not Insulated 7.99 210 India
17 17 ChillMate Plastic 750 Red Not Insulated 8.49 120 USA
18 18 EcoJug Glass 1000 Transparent Not Insulated 15.49 90 India
19 19 HydroBoost Plastic 500 Black Insulated 6.99 200 USA
20 20 Glacier Aluminum 1000 Grey Insulated 13.00 80 Japan


Step 7: Creating a Flag Variable

data flagged_bottles;

    set water_bottles;

    if Price > 15 then High_Priced = "Yes";

    else High_Priced = "No";

run;


proc freq data=flagged_bottles;

    tables High_Priced;

    title "High Priced Bottles";

run;

Output:

                                                                 High Priced Bottles

                                                                The FREQ Procedure

High_Priced Frequency Percent Cumulative
Frequency
Cumulative
Percent
No 15 75.00 15 75.00
Yes 5 25.00 20 100.00


Step 8: Dataset Split – By Insulation

data insulated non_insulated;

    set water_bottles;

    if Insulated = "Yes" then output insulated;

    else output non_insulated;

run;


proc print data=insulated;

    title "Insulated Bottles";

run;

Output:

                                                                          Insulated Bottles

Obs Bottle_ID Brand Material Capacity_ml Color Insulated Price Stock Country
1 1 AquaPure Plastic 1000 Blue Yes 10.99 150 USA
2 2 ChillMate Stainless_Steel 750 Black Yes 15.50 75 Germany
3 4 HydroBoost Plastic 2000 Green Yes 14.25 100 USA
4 5 Glacier Stainless_Steel 1000 Silver Yes 18.99 50 Japan
5 8 HydroBoost Plastic 1500 Orange Yes 9.99 85 USA
6 9 EcoJug Stainless_Steel 500 White Yes 16.49 60 Germany
7 12 EcoJug Plastic 600 Green Yes 9.75 40 USA
8 15 Glacier Stainless_Steel 2000 Silver Yes 22.99 15 China
9 19 HydroBoost Plastic 500 Black Yes 6.99 200 USA
10 20 Glacier Aluminum 1000 Grey Yes 13.00 80 Japan


proc print data=non_insulated;

    title "Non-Insulated Bottles";

run;

Output:

                                                                      Non-Insulated Bottles

Obs Bottle_ID Brand Material Capacity_ml Color Insulated Price Stock Country
1 3 EcoJug Glass 500 Transparent No 12.00 200 India
2 6 ChillMate Aluminum 750 Red No 11.49 30 China
3 7 AquaPure Glass 600 Blue No 13.75 120 India
4 10 Glacier Plastic 2000 Grey No 8.99 45 China
5 11 AquaPure Aluminum 1000 Yellow No 10.50 95 India
6 13 ChillMate Glass 750 Black No 14.20 25 Japan
7 14 HydroBoost Aluminum 500 Blue No 10.00 65 Germany
8 16 AquaPure Plastic 1500 White No 7.99 210 India
9 17 ChillMate Plastic 750 Red No 8.49 120 USA
10 18 EcoJug Glass 1000 Transparent No 15.49 90 India


Step 9: Exporting the Dataset

proc export data=water_bottles

    outfile="/mnt/data/water_bottles.csv"

    dbms=csv

    replace;

run;


PRACTICE AND COMMENT YOUR CODE: 

--->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE

Comments