153.UNLOCKING RETAIL INSIGHTS: A COMPREHENSIVE GUIDE TO ANALYZING ONLINE TRANSACTION DATA USING SAS PROCEDURES
UNLOCKING RETAIL INSIGHTS: A COMPREHENSIVE GUIDE TO ANALYZING ONLINE TRANSACTION DATA USING SAS PROCEDURES
/*Create a unique dataset and apply various SAS procedures to analyze and manipulate the data */
Dataset: Online Retail Transactions
Variables:
Transaction_ID: Unique identifier for each transaction
Customer_ID: Unique identifier for each customer
Product_ID: Unique identifier for each product
Product_Category: Category to which the product belongs (e.g., Electronics, Clothing, Home & Kitchen)
Quantity: Number of units purchased
Price: Price per unit
Transaction_Date: Date of the transaction
Customer_Location: Geographic location of the customer
Creating the Dataset in SAS
/*First, we'll create the dataset using the DATA step in SAS:*/
data online_retail;
input Transaction_ID $ Customer_ID $ Product_ID $ Product_Category $ Quantity Price Transaction_Date :date9. Customer_Location $15.;
format Transaction_Date date9.;
datalines;
T001 C001 P001 Electronics 2 299.99 01JAN2025 Hyderabad
T002 C002 P002 Clothing 1 49.99 02JAN2025 Mumbai
T003 C001 P003 Home_Kitchen 3 19.99 03JAN2025 Hyderabad
T004 C003 P001 Electronics 1 299.99 04JAN2025 Delhi
T005 C004 P004 Clothing 2 39.99 05JAN2025 Bangalore
T006 C002 P005 Home_Kitchen 1 24.99 06JAN2025 Mumbai
T007 C005 P006 Electronics 1 199.99 07JAN2025 Chennai
T008 C006 P007 Clothing 4 29.99 08JAN2025 Kolkata
T009 C007 P008 Home_Kitchen 2 14.99 09JAN2025 Pune
T010 C008 P009 Electronics 1 499.99 10JAN2025 Ahmedabad
;
run;
proc print data=online_retail;
title 'Online Retail Transactions';
run;
Output:
Online Retail
Transactions |
Obs | Transaction_ID | Customer_ID | Product_ID | Product_Category | Quantity | Price | Transaction_Date | Customer_Location |
---|---|---|---|---|---|---|---|---|
1 | T001 | C001 | P001 | Electron | 2 | 299.99 | 01JAN2025 | Hyderabad |
2 | T002 | C002 | P002 | Clothing | 1 | 49.99 | 02JAN2025 | Mumbai |
3 | T003 | C001 | P003 | Home_Kit | 3 | 19.99 | 03JAN2025 | Hyderabad |
4 | T004 | C003 | P001 | Electron | 1 | 299.99 | 04JAN2025 | Delhi |
5 | T005 | C004 | P004 | Clothing | 2 | 39.99 | 05JAN2025 | Bangalore |
6 | T006 | C002 | P005 | Home_Kit | 1 | 24.99 | 06JAN2025 | Mumbai |
7 | T007 | C005 | P006 | Electron | 1 | 199.99 | 07JAN2025 | Chennai |
8 | T008 | C006 | P007 | Clothing | 4 | 29.99 | 08JAN2025 | Kolkata |
9 | T009 | C007 | P008 | Home_Kit | 2 | 14.99 | 09JAN2025 | Pune |
10 | T010 | C008 | P009 | Electron | 1 | 499.99 | 10JAN2025 | Ahmedabad |
1.PROC CONTENTS: Dataset Metadata
proc contents data=online_retail;
title 'Dataset Metadata';
run;
Explanation: Provides metadata about the dataset, including variable names, types, and formats.
Output:
Dataset Metadata |
Data Set Name | WORK.ONLINE_RETAIL | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 8 |
Engine | V9 | Indexes | 0 |
Created | 14/09/2015 00:22:51 | Observation Length | 72 |
Last Modified | 14/09/2015 00:22:51 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | wlatin1 Western (Windows) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 908 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_\online_retail.sas7bdat |
Release Created | 9.0401M2 |
Host Created | X64_8HOME |
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
2 | Customer_ID | Char | 8 | |
8 | Customer_Location | Char | 15 | |
6 | Price | Num | 8 | |
4 | Product_Category | Char | 8 | |
3 | Product_ID | Char | 8 | |
5 | Quantity | Num | 8 | |
7 | Transaction_Date | Num | 8 | DATE9. |
1 | Transaction_ID | Char | 8 |
2.PROC SORT: Sorting the Dataset
proc sort data=online_retail out=sorted_data;
by Customer_Location;
run;
proc print;run;
Explanation: Sorts the dataset by Customer_Location.
Output:
Obs | Transaction_ID | Customer_ID | Product_ID | Product_Category | Quantity | Price | Transaction_Date | Customer_Location |
---|---|---|---|---|---|---|---|---|
1 | T010 | C008 | P009 | Electron | 1 | 499.99 | 10JAN2025 | Ahmedabad |
2 | T005 | C004 | P004 | Clothing | 2 | 39.99 | 05JAN2025 | Bangalore |
3 | T007 | C005 | P006 | Electron | 1 | 199.99 | 07JAN2025 | Chennai |
4 | T004 | C003 | P001 | Electron | 1 | 299.99 | 04JAN2025 | Delhi |
5 | T001 | C001 | P001 | Electron | 2 | 299.99 | 01JAN2025 | Hyderabad |
6 | T003 | C001 | P003 | Home_Kit | 3 | 19.99 | 03JAN2025 | Hyderabad |
7 | T008 | C006 | P007 | Clothing | 4 | 29.99 | 08JAN2025 | Kolkata |
8 | T002 | C002 | P002 | Clothing | 1 | 49.99 | 02JAN2025 | Mumbai |
9 | T006 | C002 | P005 | Home_Kit | 1 | 24.99 | 06JAN2025 | Mumbai |
10 | T009 | C007 | P008 | Home_Kit | 2 | 14.99 | 09JAN2025 | Pune |
3.PROC FREQ: Frequency Analysis
proc freq data=online_retail;
tables Product_Category Customer_Location;
title 'Frequency Analysis of Product Categories and Customer Locations';
run;
Explanation: Generates frequency tables for Product_Category and Customer_Location.
Output:
Frequency Analysis of Product Categories and Customer
Locations |
Product_Category | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Clothing | 3 | 30.00 | 3 | 30.00 |
Electron | 4 | 40.00 | 7 | 70.00 |
Home_Kit | 3 | 30.00 | 10 | 100.00 |
Customer_Location | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Ahmedabad | 1 | 10.00 | 1 | 10.00 |
Bangalore | 1 | 10.00 | 2 | 20.00 |
Chennai | 1 | 10.00 | 3 | 30.00 |
Delhi | 1 | 10.00 | 4 | 40.00 |
Hyderabad | 2 | 20.00 | 6 | 60.00 |
Kolkata | 1 | 10.00 | 7 | 70.00 |
Mumbai | 2 | 20.00 | 9 | 90.00 |
Pune | 1 | 10.00 | 10 | 100.00 |
4.PROC MEANS: Descriptive Statistics
proc means data=online_retail n mean std min max;
var Quantity Price;
title 'Descriptive Statistics for Quantity and Price';
run;
Explanation: Calculates descriptive statistics for Quantity and Price.
Output:
Descriptive Statistics for Quantity and
Price |
Variable | N | Mean | Std Dev | Minimum | Maximum | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
5.PROC UNIVARIATE: Detailed Statistical Analysis
proc univariate data=online_retail;
var Price;
histogram Price;
title 'Univariate Analysis of Price';
run;
Explanation: Provides detailed statistics and a histogram for the Price variable.
Output:
Univariate Analysis of
Price |
Moments | |||
---|---|---|---|
N | 10 | Sum Weights | 10 |
Mean | 147.99 | Sum Observations | 1479.9 |
Std Deviation | 169.052917 | Variance | 28578.8889 |
Skewness | 1.16783226 | Kurtosis | 0.37971014 |
Uncorrected SS | 476220.401 | Corrected SS | 257210 |
Coeff Variation | 114.232663 | Std Error Mean | 53.4592264 |
Basic Statistical Measures | |||
---|---|---|---|
Location | Variability | ||
Mean | 147.9900 | Std Deviation | 169.05292 |
Median | 44.9900 | Variance | 28579 |
Mode | 299.9900 | Range | 485.00000 |
Interquartile Range | 275.00000 |
Tests for Location: Mu0=0 | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Student's t | t | 2.768278 | Pr > |t| | 0.0218 |
Sign | M | 5 | Pr >= |M| | 0.0020 |
Signed Rank | S | 27.5 | Pr >= |S| | 0.0020 |
Quantiles (Definition 5) | |
---|---|
Level | Quantile |
100% Max | 499.99 |
99% | 499.99 |
95% | 499.99 |
90% | 399.99 |
75% Q3 | 299.99 |
50% Median | 44.99 |
25% Q1 | 24.99 |
10% | 17.49 |
5% | 14.99 |
1% | 14.99 |
0% Min | 14.99 |
Extreme Observations | |||
---|---|---|---|
Lowest | Highest | ||
Value | Obs | Value | Obs |
14.99 | 9 | 49.99 | 2 |
19.99 | 3 | 199.99 | 7 |
24.99 | 6 | 299.99 | 1 |
29.99 | 8 | 299.99 | 4 |
39.99 | 5 | 499.99 | 10 |
6.PROC GCHART: Bar Chart of Product Categories
proc gchart data=online_retail;
vbar Product_Category / sumvar=Quantity type=sum;
title 'Total Quantity Sold per Product Category';
run;
Explanation: Creates a vertical bar chart showing the total quantity sold per product category.
7.PROC SQL: SQL Queries in SAS
proc sql;
select Customer_ID, sum(Quantity*Price) as Total_Spent
from online_retail
group by Customer_ID
order by Total_Spent desc;
quit;
Explanation: Calculates the total amount spent by each customer.
Output:
Customer_ID | Total_Spent |
---|---|
C001 | 659.95 |
C008 | 499.99 |
C003 | 299.99 |
C005 | 199.99 |
C006 | 119.96 |
C004 | 79.98 |
C002 | 74.98 |
C007 | 29.98 |
8.PROC FORMAT: Custom Formats
proc format;
value $catfmt
'Electronics' = 'Elec'
'Clothing' = 'Cloth'
'Home_Kitchen' = 'Home';
run;
proc print data=online_retail;
format Product_Category $catfmt.;
title 'Dataset with Formatted Product Categories';
run;
Explanation: Applies custom formats to Product_Category.
Output:
Dataset with Formatted Product
Categories |
Obs | Transaction_ID | Customer_ID | Product_ID | Product_Category | Quantity | Price | Transaction_Date | Customer_Location |
---|---|---|---|---|---|---|---|---|
1 | T001 | C001 | P001 | Elect | 2 | 299.99 | 01JAN2025 | Hyderabad |
2 | T002 | C002 | P002 | Cloth | 1 | 49.99 | 02JAN2025 | Mumbai |
3 | T003 | C001 | P003 | Home_ | 3 | 19.99 | 03JAN2025 | Hyderabad |
4 | T004 | C003 | P001 | Elect | 1 | 299.99 | 04JAN2025 | Delhi |
5 | T005 | C004 | P004 | Cloth | 2 | 39.99 | 05JAN2025 | Bangalore |
6 | T006 | C002 | P005 | Home_ | 1 | 24.99 | 06JAN2025 | Mumbai |
7 | T007 | C005 | P006 | Elect | 1 | 199.99 | 07JAN2025 | Chennai |
8 | T008 | C006 | P007 | Cloth | 4 | 29.99 | 08JAN2025 | Kolkata |
9 | T009 | C007 | P008 | Home_ | 2 | 14.99 | 09JAN2025 | Pune |
10 | T010 | C008 | P009 | Elect | 1 | 499.99 | 10JAN2025 | Ahmedabad |
9.PROC REPORT: Customized Reports
proc report data=online_retail nowd;
column Customer_Location Product_Category Quantity;
define Customer_Location / group;
define Product_Category / group;
define Quantity / sum;
title 'Report: Quantity Sold by Location and Category';
run;
Explanation: Generates a report showing the sum of quantities sold by location and product category.
Output
Report: Quantity Sold by Location and
Category |
Customer_Location | Product_Category | Quantity |
---|---|---|
Ahmedabad | Electron | 1 |
Bangalore | Clothing | 2 |
Chennai | Electron | 1 |
Delhi | Electron | 1 |
Hyderabad | Electron | 2 |
Home_Kit | 3 | |
Kolkata | Clothing | 4 |
Mumbai | Clothing | 1 |
Home_Kit | 1 | |
Pune | Home_Kit | 2 |
10.PROC SGPLOT: Scatter Plot
proc sgplot data=online_retail;
scatter x=Quantity y=Price / group=Product_Category;
title 'Scatter Plot of Quantity vs. Price by Product Category';
run;
Explanation: Creates a scatter plot of Quantity vs. Price, grouped by Product_Category.
11.PROC TRANSPOSE: Reshaping the Dataset
proc transpose data=online_retail out=transposed_data;
by Customer_ID notsorted;
id Product_Category;
var Quantity;
run;
proc print;run;
Explanation: Transposes the dataset to have product categories as columns.
Output:
Obs | Customer_ID | _NAME_ | Electron | Clothing | Home_Kit |
---|---|---|---|---|---|
1 | C001 | Quantity | 2 | . | . |
2 | C002 | Quantity | . | 1 | . |
3 | C001 | Quantity | . | . | 3 |
4 | C003 | Quantity | 1 | . | . |
5 | C004 | Quantity | . | 2 | . |
6 | C002 | Quantity | . | . | 1 |
7 | C005 | Quantity | 1 | . | . |
8 | C006 | Quantity | . | 4 | . |
9 | C007 | Quantity | . | . | 2 |
10 | C008 | Quantity | 1 | . | . |
12.PROC SORT with NODUPKEY: Removing Duplicates
proc sort data=online_retail nodupkey out=unique_transactions;
by Transaction_ID;
run;
proc print;run;
Explanation: Removes duplicate transactions based on Transaction_ID.
Output:
Obs | Transaction_ID | Customer_ID | Product_ID | Product_Category | Quantity | Price | Transaction_Date | Customer_Location |
---|---|---|---|---|---|---|---|---|
1 | T001 | C001 | P001 | Electron | 2 | 299.99 | 01JAN2025 | Hyderabad |
2 | T002 | C002 | P002 | Clothing | 1 | 49.99 | 02JAN2025 | Mumbai |
3 | T003 | C001 | P003 | Home_Kit | 3 | 19.99 | 03JAN2025 | Hyderabad |
4 | T004 | C003 | P001 | Electron | 1 | 299.99 | 04JAN2025 | Delhi |
5 | T005 | C004 | P004 | Clothing | 2 | 39.99 | 05JAN2025 | Bangalore |
6 | T006 | C002 | P005 | Home_Kit | 1 | 24.99 | 06JAN2025 | Mumbai |
7 | T007 | C005 | P006 | Electron | 1 | 199.99 | 07JAN2025 | Chennai |
8 | T008 | C006 | P007 | Clothing | 4 | 29.99 | 08JAN2025 | Kolkata |
9 | T009 | C007 | P008 | Home_Kit | 2 | 14.99 | 09JAN2025 | Pune |
10 | T010 | C008 | P009 | Electron | 1 | 499.99 | 10JAN2025 | Ahmedabad |
13.PROC DATASETS: Managing Datasets
proc datasets library=work;
copy out=backup;
select online_retail;
run;
Explanation: Copies the online_retail dataset to a backup library.
Output:
Directory | |
---|---|
Libref | WORK |
Engine | V9 |
Physical Name | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_ |
Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15212_DESKTOP-QFAA4KV_ |
# | Name | Member Type | File Size | Last Modified |
---|---|---|---|---|
1 | FORMATS | CATALOG | 17408 | 14/04/2025 09:42:19 |
2 | ONLINE_RETAIL | DATA | 131072 | 14/04/2025 09:23:54 |
3 | SORTED_DATA | DATA | 131072 | 14/04/2025 09:38:20 |
4 | TRANSPOSED_DATA | DATA | 131072 | 14/04/2025 09:46:23 |
5 | UNIQUE_TRANSACTIONS | DATA | 131072 | 14/04/2025 09:47:21 |
14.PROC EXPORT: Exporting Data
proc export data=online_retail
outfile='C:\Users\YourUsername\Documents\online_retail.csv'
dbms = csv
replace;
run;
Explanation: Export the csv file to out of the sas environment.
Comments
Post a Comment