153.UNLOCKING RETAIL INSIGHTS: A COMPREHENSIVE GUIDE TO ANALYZING ONLINE TRANSACTION DATA USING SAS PROCEDURES

 UNLOCKING RETAIL INSIGHTS: A COMPREHENSIVE GUIDE TO ANALYZING ONLINE TRANSACTION DATA USING SAS PROCEDURES


/*Create a unique dataset and apply various SAS procedures to analyze and manipulate the data */

Dataset: Online Retail Transactions

Variables:

Transaction_ID: Unique identifier for each transaction

Customer_ID: Unique identifier for each customer

Product_ID: Unique identifier for each product

Product_Category: Category to which the product belongs (e.g., Electronics, Clothing, Home & Kitchen)

Quantity: Number of units purchased

Price: Price per unit

Transaction_Date: Date of the transaction

Customer_Location: Geographic location of the customer


Creating the Dataset in SAS

/*First, we'll create the dataset using the DATA step in SAS:*/

data online_retail;

    input Transaction_ID $ Customer_ID $ Product_ID $ Product_Category $ Quantity Price Transaction_Date :date9. Customer_Location $15.;

    format Transaction_Date date9.;

    datalines;

T001 C001 P001 Electronics 2 299.99 01JAN2025 Hyderabad

T002 C002 P002 Clothing 1 49.99 02JAN2025 Mumbai

T003 C001 P003 Home_Kitchen 3 19.99 03JAN2025 Hyderabad

T004 C003 P001 Electronics 1 299.99 04JAN2025 Delhi

T005 C004 P004 Clothing 2 39.99 05JAN2025 Bangalore

T006 C002 P005 Home_Kitchen 1 24.99 06JAN2025 Mumbai

T007 C005 P006 Electronics 1 199.99 07JAN2025 Chennai

T008 C006 P007 Clothing 4 29.99 08JAN2025 Kolkata

T009 C007 P008 Home_Kitchen 2 14.99 09JAN2025 Pune

T010 C008 P009 Electronics 1 499.99 10JAN2025 Ahmedabad

;

run;

proc print data=online_retail;

    title 'Online Retail Transactions';

run;

Output:

                                                                              Online Retail Transactions

Obs Transaction_ID Customer_ID Product_ID Product_Category Quantity Price Transaction_Date Customer_Location
1 T001 C001 P001 Electron 2 299.99 01JAN2025 Hyderabad
2 T002 C002 P002 Clothing 1 49.99 02JAN2025 Mumbai
3 T003 C001 P003 Home_Kit 3 19.99 03JAN2025 Hyderabad
4 T004 C003 P001 Electron 1 299.99 04JAN2025 Delhi
5 T005 C004 P004 Clothing 2 39.99 05JAN2025 Bangalore
6 T006 C002 P005 Home_Kit 1 24.99 06JAN2025 Mumbai
7 T007 C005 P006 Electron 1 199.99 07JAN2025 Chennai
8 T008 C006 P007 Clothing 4 29.99 08JAN2025 Kolkata
9 T009 C007 P008 Home_Kit 2 14.99 09JAN2025 Pune
10 T010 C008 P009 Electron 1 499.99 10JAN2025 Ahmedabad


1.PROC CONTENTS: Dataset Metadata

proc contents data=online_retail;

    title 'Dataset Metadata';

run;

Explanation: Provides metadata about the dataset, including variable names, types, and formats.

Output:

                                                                    Dataset Metadata

                                                       The CONTENTS Procedure

Data Set Name WORK.ONLINE_RETAIL Observations 10
Member Type DATA Variables 8
Engine V9 Indexes 0
Created 14/09/2015 00:22:51 Observation Length 72
Last Modified 14/09/2015 00:22:51 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 908
Obs in First Data Page 10
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_\online_retail.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len Format
2 Customer_ID Char 8  
8 Customer_Location Char 15  
6 Price Num 8  
4 Product_Category Char 8  
3 Product_ID Char 8  
5 Quantity Num 8  
7 Transaction_Date Num 8 DATE9.
1 Transaction_ID Char 8  


2.PROC SORT: Sorting the Dataset

proc sort data=online_retail out=sorted_data;

    by Customer_Location;

run;

proc print;run;

Explanation: Sorts the dataset by Customer_Location.

Output:

Obs Transaction_ID Customer_ID Product_ID Product_Category Quantity Price Transaction_Date Customer_Location
1 T010 C008 P009 Electron 1 499.99 10JAN2025 Ahmedabad
2 T005 C004 P004 Clothing 2 39.99 05JAN2025 Bangalore
3 T007 C005 P006 Electron 1 199.99 07JAN2025 Chennai
4 T004 C003 P001 Electron 1 299.99 04JAN2025 Delhi
5 T001 C001 P001 Electron 2 299.99 01JAN2025 Hyderabad
6 T003 C001 P003 Home_Kit 3 19.99 03JAN2025 Hyderabad
7 T008 C006 P007 Clothing 4 29.99 08JAN2025 Kolkata
8 T002 C002 P002 Clothing 1 49.99 02JAN2025 Mumbai
9 T006 C002 P005 Home_Kit 1 24.99 06JAN2025 Mumbai
10 T009 C007 P008 Home_Kit 2 14.99 09JAN2025 Pune


3.PROC FREQ: Frequency Analysis

proc freq data=online_retail;

    tables Product_Category Customer_Location;

    title 'Frequency Analysis of Product Categories and Customer Locations';

run;

Explanation: Generates frequency tables for Product_Category and Customer_Location.

Output:

                          Frequency Analysis of Product Categories and Customer Locations

                                                       The FREQ Procedure

Product_Category Frequency Percent Cumulative
Frequency
Cumulative
Percent
Clothing 3 30.00 3 30.00
Electron 4 40.00 7 70.00
Home_Kit 3 30.00 10 100.00


Customer_Location Frequency Percent Cumulative
Frequency
Cumulative
Percent
Ahmedabad 1 10.00 1 10.00
Bangalore 1 10.00 2 20.00
Chennai 1 10.00 3 30.00
Delhi 1 10.00 4 40.00
Hyderabad 2 20.00 6 60.00
Kolkata 1 10.00 7 70.00
Mumbai 2 20.00 9 90.00
Pune 1 10.00 10 100.00


4.PROC MEANS: Descriptive Statistics

proc means data=online_retail n mean std min max;

    var Quantity Price;

    title 'Descriptive Statistics for Quantity and Price';

run;

Explanation: Calculates descriptive statistics for Quantity and Price.

Output:

                                              Descriptive Statistics for Quantity and Price

                                                          The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum
Quantity
Price
10
10
1.8000000
147.9900000
1.0327956
169.0529174
1.0000000
14.9900000
4.0000000
499.9900000


5.PROC UNIVARIATE: Detailed Statistical Analysis

proc univariate data=online_retail;

    var Price;

    histogram Price;

    title 'Univariate Analysis of Price';

run;

Explanation: Provides detailed statistics and a histogram for the Price variable.

Output:

                                                            Univariate Analysis of Price

                                                          The UNIVARIATE Procedure

                                                                     Variable: Price

Moments
N 10 Sum Weights 10
Mean 147.99 Sum Observations 1479.9
Std Deviation 169.052917 Variance 28578.8889
Skewness 1.16783226 Kurtosis 0.37971014
Uncorrected SS 476220.401 Corrected SS 257210
Coeff Variation 114.232663 Std Error Mean 53.4592264


Basic Statistical Measures
Location Variability
Mean 147.9900 Std Deviation 169.05292
Median 44.9900 Variance 28579
Mode 299.9900 Range 485.00000
    Interquartile Range 275.00000


Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 2.768278 Pr > |t| 0.0218
Sign M 5 Pr >= |M| 0.0020
Signed Rank S 27.5 Pr >= |S| 0.0020


Quantiles (Definition 5)
Level Quantile
100% Max 499.99
99% 499.99
95% 499.99
90% 399.99
75% Q3 299.99
50% Median 44.99
25% Q1 24.99
10% 17.49
5% 14.99
1% 14.99
0% Min 14.99


Extreme Observations
Lowest Highest
Value Obs Value Obs
14.99 9 49.99 2
19.99 3 199.99 7
24.99 6 299.99 1
29.99 8 299.99 4
39.99 5 499.99 10


6.PROC GCHART: Bar Chart of Product Categories

proc gchart data=online_retail;

    vbar Product_Category / sumvar=Quantity type=sum;

    title 'Total Quantity Sold per Product Category';

run;

Explanation: Creates a vertical bar chart showing the total quantity sold per product category.


7.PROC SQL: SQL Queries in SAS

proc sql;

    select Customer_ID, sum(Quantity*Price) as Total_Spent

    from online_retail

    group by Customer_ID

    order by Total_Spent desc;

quit;

Explanation: Calculates the total amount spent by each customer.

Output:

Customer_ID Total_Spent
C001 659.95
C008 499.99
C003 299.99
C005 199.99
C006 119.96
C004 79.98
C002 74.98
C007 29.98


8.PROC FORMAT: Custom Formats

proc format;

    value $catfmt

        'Electronics' = 'Elec'

        'Clothing' = 'Cloth'

        'Home_Kitchen' = 'Home';

run;


proc print data=online_retail;

    format Product_Category $catfmt.;

    title 'Dataset with Formatted Product Categories';

run;

Explanation: Applies custom formats to Product_Category.

Output:

                                                                Dataset with Formatted Product Categories

Obs Transaction_ID Customer_ID Product_ID Product_Category Quantity Price Transaction_Date Customer_Location
1 T001 C001 P001 Elect 2 299.99 01JAN2025 Hyderabad
2 T002 C002 P002 Cloth 1 49.99 02JAN2025 Mumbai
3 T003 C001 P003 Home_ 3 19.99 03JAN2025 Hyderabad
4 T004 C003 P001 Elect 1 299.99 04JAN2025 Delhi
5 T005 C004 P004 Cloth 2 39.99 05JAN2025 Bangalore
6 T006 C002 P005 Home_ 1 24.99 06JAN2025 Mumbai
7 T007 C005 P006 Elect 1 199.99 07JAN2025 Chennai
8 T008 C006 P007 Cloth 4 29.99 08JAN2025 Kolkata
9 T009 C007 P008 Home_ 2 14.99 09JAN2025 Pune
10 T010 C008 P009 Elect 1 499.99 10JAN2025 Ahmedabad


9.PROC REPORT: Customized Reports

proc report data=online_retail nowd;

    column Customer_Location Product_Category Quantity;

    define Customer_Location / group;

    define Product_Category / group;

    define Quantity / sum;

    title 'Report: Quantity Sold by Location and Category';

run;

Explanation: Generates a report showing the sum of quantities sold by location and product category.

Output

                                            Report: Quantity Sold by Location and Category

Customer_Location Product_Category Quantity
Ahmedabad Electron 1
Bangalore Clothing 2
Chennai Electron 1
Delhi Electron 1
Hyderabad Electron 2
  Home_Kit 3
Kolkata Clothing 4
Mumbai Clothing 1
  Home_Kit 1
Pune Home_Kit 2


10.PROC SGPLOT: Scatter Plot

proc sgplot data=online_retail;

    scatter x=Quantity y=Price / group=Product_Category;

    title 'Scatter Plot of Quantity vs. Price by Product Category';

run;

Explanation: Creates a scatter plot of Quantity vs. Price, grouped by Product_Category.


11.PROC TRANSPOSE: Reshaping the Dataset

proc transpose data=online_retail out=transposed_data;

    by Customer_ID notsorted;

    id Product_Category;

    var Quantity;

run;

proc print;run;

Explanation: Transposes the dataset to have product categories as columns.

Output:

Obs Customer_ID _NAME_ Electron Clothing Home_Kit
1 C001 Quantity 2 . .
2 C002 Quantity . 1 .
3 C001 Quantity . . 3
4 C003 Quantity 1 . .
5 C004 Quantity . 2 .
6 C002 Quantity . . 1
7 C005 Quantity 1 . .
8 C006 Quantity . 4 .
9 C007 Quantity . . 2
10 C008 Quantity 1 . .


12.PROC SORT with NODUPKEY: Removing Duplicates

proc sort data=online_retail nodupkey out=unique_transactions;

    by Transaction_ID;

run;

proc print;run;

Explanation: Removes duplicate transactions based on Transaction_ID.

Output:

Obs Transaction_ID Customer_ID Product_ID Product_Category Quantity Price Transaction_Date Customer_Location
1 T001 C001 P001 Electron 2 299.99 01JAN2025 Hyderabad
2 T002 C002 P002 Clothing 1 49.99 02JAN2025 Mumbai
3 T003 C001 P003 Home_Kit 3 19.99 03JAN2025 Hyderabad
4 T004 C003 P001 Electron 1 299.99 04JAN2025 Delhi
5 T005 C004 P004 Clothing 2 39.99 05JAN2025 Bangalore
6 T006 C002 P005 Home_Kit 1 24.99 06JAN2025 Mumbai
7 T007 C005 P006 Electron 1 199.99 07JAN2025 Chennai
8 T008 C006 P007 Clothing 4 29.99 08JAN2025 Kolkata
9 T009 C007 P008 Home_Kit 2 14.99 09JAN2025 Pune
10 T010 C008 P009 Electron 1 499.99 10JAN2025 Ahmedabad


13.PROC DATASETS: Managing Datasets

proc datasets library=work;

    copy out=backup;

    select online_retail;

run;

Explanation: Copies the online_retail dataset to a backup library.

Output:

Directory
Libref WORK
Engine V9
Physical Name C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_


# Name Member Type File Size Last Modified
1 FORMATS CATALOG 17408 14/04/2025 09:42:19
2 ONLINE_RETAIL DATA 131072 14/04/2025 09:23:54
3 SORTED_DATA DATA 131072 14/04/2025 09:38:20
4 TRANSPOSED_DATA DATA 131072 14/04/2025 09:46:23
5 UNIQUE_TRANSACTIONS DATA 131072 14/04/2025 09:47:21


14.PROC EXPORT: Exporting Data

proc export data=online_retail

    outfile='C:\Users\YourUsername\Documents\online_retail.csv'

    dbms = csv

replace;

run;

Explanation: Export the csv file to out of the sas environment.


PRACTICE AND COMMENT YOUR CODE: 

-->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

TO FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

Comments