E-COMMERCE ANALYTICS PROJECT USING DATA STEP | PROC SQL | PROC SUMMARY | PROC RANK | PROC SGPLOT WITH MACROS AND DATE FORMATS LIKE INTCK & INTNX
options nocenter;
Step 1: Create a raw e-commerce dataset
data ecommerce_raw;
length Product_Type $20 Brand $15 User_Age_Group $7;
format Order_Date Launch_Date date9.;
input Product_ID Product_Type $ Brand $ Sales Rating Return_Rate User_Age_Group $
Order_Date :date9. Launch_Date :date9.;
datalines;
1 Smartphone BrandX 95000 4.7 2.5 25-34 15JAN2025 01NOV2024
2 Laptop BrandY 120000 4.5 3.0 25-34 20JAN2025 15OCT2024
3 Headphones BrandZ 45000 4.2 5.0 18-24 05JAN2025 20SEP2024
4 Smartwatch BrandX 60000 4.0 4.5 18-24 10JAN2025 10NOV2024
5 Shoes BrandA 38000 3.8 6.5 35-44 18JAN2025 01AUG2024
6 Clothing BrandB 52000 4.1 7.0 35-44 25JAN2025 05JUL2024
7 Groceries BrandC 30000 4.6 1.5 45+ 02JAN2025 01JAN2024
8 Beauty BrandD 42000 4.3 3.5 25-34 12JAN2025 10JUL2024
9 Books BrandE 25000 4.8 1.0 18-24 08JAN2025 01JAN2023
10 Furniture BrandF 135000 4.4 2.0 35-44 28JAN2025 15SEP2024
11 Tablet BrandY 78000 4.2 3.2 25-34 22JAN2025 10DEC2024
12 GameConsole BrandG 110000 4.9 2.8 18-24 30JAN2025 01NOV2024
;
run;
proc print data=ecommerce_raw;
run;
OUTPUT:
| Obs | Product_Type | Brand | User_Age_Group | Order_Date | Launch_Date | Product_ID | Sales | Rating | Return_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Smartphone | BrandX | 25-34 | 15JAN2025 | 01NOV2024 | 1 | 95000 | 4.7 | 2.5 |
| 2 | Laptop | BrandY | 25-34 | 20JAN2025 | 15OCT2024 | 2 | 120000 | 4.5 | 3.0 |
| 3 | Headphones | BrandZ | 18-24 | 05JAN2025 | 20SEP2024 | 3 | 45000 | 4.2 | 5.0 |
| 4 | Smartwatch | BrandX | 18-24 | 10JAN2025 | 10NOV2024 | 4 | 60000 | 4.0 | 4.5 |
| 5 | Shoes | BrandA | 35-44 | 18JAN2025 | 01AUG2024 | 5 | 38000 | 3.8 | 6.5 |
| 6 | Clothing | BrandB | 35-44 | 25JAN2025 | 05JUL2024 | 6 | 52000 | 4.1 | 7.0 |
| 7 | Groceries | BrandC | 45+ | 02JAN2025 | 01JAN2024 | 7 | 30000 | 4.6 | 1.5 |
| 8 | Beauty | BrandD | 25-34 | 12JAN2025 | 10JUL2024 | 8 | 42000 | 4.3 | 3.5 |
| 9 | Books | BrandE | 18-24 | 08JAN2025 | 01JAN2023 | 9 | 25000 | 4.8 | 1.0 |
| 10 | Furniture | BrandF | 35-44 | 28JAN2025 | 15SEP2024 | 10 | 135000 | 4.4 | 2.0 |
| 11 | Tablet | BrandY | 25-34 | 22JAN2025 | 10DEC2024 | 11 | 78000 | 4.2 | 3.2 |
| 12 | GameConsole | BrandG | 18-24 | 30JAN2025 | 01NOV2024 | 12 | 110000 | 4.9 | 2.8 |
Step 2: Macro to add date-based metrics
%macro add_date_metrics(in=, out=, analysis_date=);
data &out.;
set &in.;
format Order_Date Launch_Date Return_Window_End date9.;
/* Days between launch and order (product maturity) */
Days_Since_Launch = intck('day', Launch_Date, Order_Date);
/* Months between launch and order */
Months_Between_Launch_Order = intck('month', Launch_Date, Order_Date);
/* 30-day return window end date using INTNX */
Return_Window_End = intnx('day', Order_Date, 30, 'same');
/* Days since order relative to analysis date */
Days_Since_Order = intck('day', Order_Date, &analysis_date);
run;
proc print data=&out.;
run;
%mend;
%let analysis_date = '31JAN2025'd;
%add_date_metrics(in=ecommerce_raw,out=ecommerce_dates,analysis_date=&analysis_date);
OUTPUT:
| Obs | Product_Type | Brand | User_Age_Group | Order_Date | Launch_Date | Product_ID | Sales | Rating | Return_Rate | Return_Window_End | Days_Since_Launch | Months_Between_Launch_Order | Days_Since_Order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Smartphone | BrandX | 25-34 | 15JAN2025 | 01NOV2024 | 1 | 95000 | 4.7 | 2.5 | 14FEB2025 | 75 | 2 | 16 |
| 2 | Laptop | BrandY | 25-34 | 20JAN2025 | 15OCT2024 | 2 | 120000 | 4.5 | 3.0 | 19FEB2025 | 97 | 3 | 11 |
| 3 | Headphones | BrandZ | 18-24 | 05JAN2025 | 20SEP2024 | 3 | 45000 | 4.2 | 5.0 | 04FEB2025 | 107 | 4 | 26 |
| 4 | Smartwatch | BrandX | 18-24 | 10JAN2025 | 10NOV2024 | 4 | 60000 | 4.0 | 4.5 | 09FEB2025 | 61 | 2 | 21 |
| 5 | Shoes | BrandA | 35-44 | 18JAN2025 | 01AUG2024 | 5 | 38000 | 3.8 | 6.5 | 17FEB2025 | 170 | 5 | 13 |
| 6 | Clothing | BrandB | 35-44 | 25JAN2025 | 05JUL2024 | 6 | 52000 | 4.1 | 7.0 | 24FEB2025 | 204 | 6 | 6 |
| 7 | Groceries | BrandC | 45+ | 02JAN2025 | 01JAN2024 | 7 | 30000 | 4.6 | 1.5 | 01FEB2025 | 367 | 12 | 29 |
| 8 | Beauty | BrandD | 25-34 | 12JAN2025 | 10JUL2024 | 8 | 42000 | 4.3 | 3.5 | 11FEB2025 | 186 | 6 | 19 |
| 9 | Books | BrandE | 18-24 | 08JAN2025 | 01JAN2023 | 9 | 25000 | 4.8 | 1.0 | 07FEB2025 | 738 | 24 | 23 |
| 10 | Furniture | BrandF | 35-44 | 28JAN2025 | 15SEP2024 | 10 | 135000 | 4.4 | 2.0 | 27FEB2025 | 135 | 4 | 3 |
| 11 | Tablet | BrandY | 25-34 | 22JAN2025 | 10DEC2024 | 11 | 78000 | 4.2 | 3.2 | 21FEB2025 | 43 | 1 | 9 |
| 12 | GameConsole | BrandG | 18-24 | 30JAN2025 | 01NOV2024 | 12 | 110000 | 4.9 | 2.8 | 01MAR2025 | 90 | 2 | 1 |
Step 3: Macro for market segmentation
%macro market_segment(in=, out=);
data &out.;
set &in.;
length Market_Segment $15;
if Sales >= 100000 and Return_Rate < 3 then Market_Segment = 'Premium Loyal';
else if Sales >= 100000 and Return_Rate >= 3 then Market_Segment = 'Premium Risky';
else if 60000 <= Sales < 100000 then Market_Segment = 'Mid-Market';
else Market_Segment = 'Value Segment';
run;
proc print data=&out.;
run;
%mend;
%market_segment(in=ecommerce_dates,out=ecommerce_final);
OUTPUT:
| Obs | Product_Type | Brand | User_Age_Group | Order_Date | Launch_Date | Product_ID | Sales | Rating | Return_Rate | Return_Window_End | Days_Since_Launch | Months_Between_Launch_Order | Days_Since_Order | Market_Segment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Smartphone | BrandX | 25-34 | 15JAN2025 | 01NOV2024 | 1 | 95000 | 4.7 | 2.5 | 14FEB2025 | 75 | 2 | 16 | Mid-Market |
| 2 | Laptop | BrandY | 25-34 | 20JAN2025 | 15OCT2024 | 2 | 120000 | 4.5 | 3.0 | 19FEB2025 | 97 | 3 | 11 | Premium Risky |
| 3 | Headphones | BrandZ | 18-24 | 05JAN2025 | 20SEP2024 | 3 | 45000 | 4.2 | 5.0 | 04FEB2025 | 107 | 4 | 26 | Value Segment |
| 4 | Smartwatch | BrandX | 18-24 | 10JAN2025 | 10NOV2024 | 4 | 60000 | 4.0 | 4.5 | 09FEB2025 | 61 | 2 | 21 | Mid-Market |
| 5 | Shoes | BrandA | 35-44 | 18JAN2025 | 01AUG2024 | 5 | 38000 | 3.8 | 6.5 | 17FEB2025 | 170 | 5 | 13 | Value Segment |
| 6 | Clothing | BrandB | 35-44 | 25JAN2025 | 05JUL2024 | 6 | 52000 | 4.1 | 7.0 | 24FEB2025 | 204 | 6 | 6 | Value Segment |
| 7 | Groceries | BrandC | 45+ | 02JAN2025 | 01JAN2024 | 7 | 30000 | 4.6 | 1.5 | 01FEB2025 | 367 | 12 | 29 | Value Segment |
| 8 | Beauty | BrandD | 25-34 | 12JAN2025 | 10JUL2024 | 8 | 42000 | 4.3 | 3.5 | 11FEB2025 | 186 | 6 | 19 | Value Segment |
| 9 | Books | BrandE | 18-24 | 08JAN2025 | 01JAN2023 | 9 | 25000 | 4.8 | 1.0 | 07FEB2025 | 738 | 24 | 23 | Value Segment |
| 10 | Furniture | BrandF | 35-44 | 28JAN2025 | 15SEP2024 | 10 | 135000 | 4.4 | 2.0 | 27FEB2025 | 135 | 4 | 3 | Premium Loyal |
| 11 | Tablet | BrandY | 25-34 | 22JAN2025 | 10DEC2024 | 11 | 78000 | 4.2 | 3.2 | 21FEB2025 | 43 | 1 | 9 | Mid-Market |
| 12 | GameConsole | BrandG | 18-24 | 30JAN2025 | 01NOV2024 | 12 | 110000 | 4.9 | 2.8 | 01MAR2025 | 90 | 2 | 1 | Premium Loyal |
Step 4: PROC SQL – Summaries by age group and market segment
proc sql;
create table seg_age_summary as
select User_Age_Group,
Market_Segment,
count(*) as Product_Count,
mean(Sales) as Avg_Sales format=comma12.,
mean(Rating) as Avg_Rating format=8.2,
mean(Return_Rate) as Avg_Return_Rate format=8.2
from ecommerce_final
group by User_Age_Group, Market_Segment
order by User_Age_Group, Market_Segment;
quit;
proc print data=seg_age_summary;
run;
OUTPUT:
| Obs | User_Age_Group | Market_Segment | Product_Count | Avg_Sales | Avg_Rating | Avg_Return_Rate |
|---|---|---|---|---|---|---|
| 1 | 18-24 | Mid-Market | 1 | 60,000 | 4.00 | 4.50 |
| 2 | 18-24 | Premium Loyal | 1 | 110,000 | 4.90 | 2.80 |
| 3 | 18-24 | Value Segment | 2 | 35,000 | 4.50 | 3.00 |
| 4 | 25-34 | Mid-Market | 2 | 86,500 | 4.45 | 2.85 |
| 5 | 25-34 | Premium Risky | 1 | 120,000 | 4.50 | 3.00 |
| 6 | 25-34 | Value Segment | 1 | 42,000 | 4.30 | 3.50 |
| 7 | 35-44 | Premium Loyal | 1 | 135,000 | 4.40 | 2.00 |
| 8 | 35-44 | Value Segment | 2 | 45,000 | 3.95 | 6.75 |
| 9 | 45+ | Value Segment | 1 | 30,000 | 4.60 | 1.50 |
/* Another SQL summary: Product type level metrics */
proc sql;
create table product_type_summary as
select Product_Type,
count(*) as Product_Count,
sum(Sales) as Total_Sales format=comma12.,
mean(Rating) as Avg_Rating format=8.2,
mean(Return_Rate) as Avg_Return_Rate format=8.2,
mean(Days_Since_Launch) as Avg_Days_Since_Launch format=8.2
from ecommerce_final
group by Product_Type
order by Total_Sales desc;
quit;
proc print data=product_type_summary;
run;
OUTPUT:
| Obs | Product_Type | Product_Count | Total_Sales | Avg_Rating | Avg_Return_Rate | Avg_Days_Since_Launch |
|---|---|---|---|---|---|---|
| 1 | Furniture | 1 | 135,000 | 4.40 | 2.00 | 135.00 |
| 2 | Laptop | 1 | 120,000 | 4.50 | 3.00 | 97.00 |
| 3 | GameConsole | 1 | 110,000 | 4.90 | 2.80 | 90.00 |
| 4 | Smartphone | 1 | 95,000 | 4.70 | 2.50 | 75.00 |
| 5 | Tablet | 1 | 78,000 | 4.20 | 3.20 | 43.00 |
| 6 | Smartwatch | 1 | 60,000 | 4.00 | 4.50 | 61.00 |
| 7 | Clothing | 1 | 52,000 | 4.10 | 7.00 | 204.00 |
| 8 | Headphones | 1 | 45,000 | 4.20 | 5.00 | 107.00 |
| 9 | Beauty | 1 | 42,000 | 4.30 | 3.50 | 186.00 |
| 10 | Shoes | 1 | 38,000 | 3.80 | 6.50 | 170.00 |
| 11 | Groceries | 1 | 30,000 | 4.60 | 1.50 | 367.00 |
| 12 | Books | 1 | 25,000 | 4.80 | 1.00 | 738.00 |
Step 5: PROC SUMMARY – Brand level summary
proc summary data=ecommerce_final nway;
class Brand;
var Sales Rating Return_Rate Days_Since_Launch;
output out=brand_summary
n(Sales) = Count_Products
sum(Sales) = Total_Sales
mean(Sales) = Avg_Sales
mean(Rating) = Avg_Rating
mean(Return_Rate) = Avg_Return_Rate
mean(Days_Since_Launch) = Avg_Days_Since_Launch;
run;
proc print data=brand_summary;
run;
OUTPUT:
| Obs | Brand | _TYPE_ | _FREQ_ | Count_Products | Total_Sales | Avg_Sales | Avg_Rating | Avg_Return_Rate | Avg_Days_Since_Launch |
|---|---|---|---|---|---|---|---|---|---|
| 1 | BrandA | 1 | 1 | 1 | 38000 | 38000 | 3.80 | 6.5 | 170 |
| 2 | BrandB | 1 | 1 | 1 | 52000 | 52000 | 4.10 | 7.0 | 204 |
| 3 | BrandC | 1 | 1 | 1 | 30000 | 30000 | 4.60 | 1.5 | 367 |
| 4 | BrandD | 1 | 1 | 1 | 42000 | 42000 | 4.30 | 3.5 | 186 |
| 5 | BrandE | 1 | 1 | 1 | 25000 | 25000 | 4.80 | 1.0 | 738 |
| 6 | BrandF | 1 | 1 | 1 | 135000 | 135000 | 4.40 | 2.0 | 135 |
| 7 | BrandG | 1 | 1 | 1 | 110000 | 110000 | 4.90 | 2.8 | 90 |
| 8 | BrandX | 1 | 2 | 2 | 155000 | 77500 | 4.35 | 3.5 | 68 |
| 9 | BrandY | 1 | 2 | 2 | 198000 | 99000 | 4.35 | 3.1 | 70 |
| 10 | BrandZ | 1 | 1 | 1 | 45000 | 45000 | 4.20 | 5.0 | 107 |
Step 6: PROC RANK – Ranking by Sales and Rating
proc rank data=ecommerce_final out=ecommerce_ranked ties=low descending;
var Sales Rating;
ranks Rank_Sales Rank_Rating;
run;
proc print data=ecommerce_ranked;
run;
OUTPUT:
| Obs | Product_Type | Brand | User_Age_Group | Order_Date | Launch_Date | Product_ID | Sales | Rating | Return_Rate | Return_Window_End | Days_Since_Launch | Months_Between_Launch_Order | Days_Since_Order | Market_Segment | Rank_Sales | Rank_Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Smartphone | BrandX | 25-34 | 15JAN2025 | 01NOV2024 | 1 | 95000 | 4.7 | 2.5 | 14FEB2025 | 75 | 2 | 16 | Mid-Market | 4 | 3 |
| 2 | Laptop | BrandY | 25-34 | 20JAN2025 | 15OCT2024 | 2 | 120000 | 4.5 | 3.0 | 19FEB2025 | 97 | 3 | 11 | Premium Risky | 2 | 5 |
| 3 | Headphones | BrandZ | 18-24 | 05JAN2025 | 20SEP2024 | 3 | 45000 | 4.2 | 5.0 | 04FEB2025 | 107 | 4 | 26 | Value Segment | 8 | 8 |
| 4 | Smartwatch | BrandX | 18-24 | 10JAN2025 | 10NOV2024 | 4 | 60000 | 4.0 | 4.5 | 09FEB2025 | 61 | 2 | 21 | Mid-Market | 6 | 11 |
| 5 | Shoes | BrandA | 35-44 | 18JAN2025 | 01AUG2024 | 5 | 38000 | 3.8 | 6.5 | 17FEB2025 | 170 | 5 | 13 | Value Segment | 10 | 12 |
| 6 | Clothing | BrandB | 35-44 | 25JAN2025 | 05JUL2024 | 6 | 52000 | 4.1 | 7.0 | 24FEB2025 | 204 | 6 | 6 | Value Segment | 7 | 10 |
| 7 | Groceries | BrandC | 45+ | 02JAN2025 | 01JAN2024 | 7 | 30000 | 4.6 | 1.5 | 01FEB2025 | 367 | 12 | 29 | Value Segment | 11 | 4 |
| 8 | Beauty | BrandD | 25-34 | 12JAN2025 | 10JUL2024 | 8 | 42000 | 4.3 | 3.5 | 11FEB2025 | 186 | 6 | 19 | Value Segment | 9 | 7 |
| 9 | Books | BrandE | 18-24 | 08JAN2025 | 01JAN2023 | 9 | 25000 | 4.8 | 1.0 | 07FEB2025 | 738 | 24 | 23 | Value Segment | 12 | 2 |
| 10 | Furniture | BrandF | 35-44 | 28JAN2025 | 15SEP2024 | 10 | 135000 | 4.4 | 2.0 | 27FEB2025 | 135 | 4 | 3 | Premium Loyal | 1 | 6 |
| 11 | Tablet | BrandY | 25-34 | 22JAN2025 | 10DEC2024 | 11 | 78000 | 4.2 | 3.2 | 21FEB2025 | 43 | 1 | 9 | Mid-Market | 5 | 8 |
| 12 | GameConsole | BrandG | 18-24 | 30JAN2025 | 01NOV2024 | 12 | 110000 | 4.9 | 2.8 | 01MAR2025 | 90 | 2 | 1 | Premium Loyal | 3 | 1 |
7a: Bar chart – Total sales by product type
proc sgplot data=ecommerce_final;
vbar Product_Type / response=Sales stat=sum datalabel;
yaxis label="Total Sales";
xaxis label="Product Type";
title "Total Sales by Product Type";
run;
OUTPUT:
Step 7b: Scatter plot – Sales vs Rating by age group
proc sgplot data=ecommerce_final;
scatter x=Rating y=Sales / group=User_Age_Group datalabel=Product_Type;
xaxis label="Average Rating";
yaxis label="Sales";
title "Sales vs Rating by User Age Group";
run;
No comments:
Post a Comment