407.How many months of active lifecycle does each service have, and can date functions like INTCK and INTNX predict the next performance review cycle accurately?
Subscription Services Performance, Utilization & Fraud Analytics Using SAS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA | SET | LENGTH | FORMAT | INPUT | DATALINES | IF–THEN–ELSE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | INTCK | INTNX | PROC SORT | PROC SQL | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC CORR | PROC SGPLOT | PROC APPEND | PROC TRANSPOSE | MERGE | PROC DATASETS | DELETE | %MACRO | %MEND
TABLE OF CONTENTS
1.
Introduction
2.
Business Context
3.
Project Objectives
4.
Dataset Design
5.
Raw Data Creation (With Intentional Errors)
6.
Identifying and Explaining Errors
7.
Corrected Full-Length SAS Code
8.
Utilization Classification Using Macros
9.
Fraud Detection Logic Using Macros
10. Statistical
Analysis Using Procedures
11. Correlation
& Distribution Analysis
12. Visualization
Using PROC SGPLOT
13. Dataset
Transformation (APPEND, MERGE, TRANSPOSE)
14. Data
Cleaning Using Character & Numeric Functions
15. Date
Handling Using MDY, INTCK, INTNX
16. Business
Insights
17. 21
Key Points About This Project
18. Conclusion
1. INTRODUCTION
Subscription-based services are everywhere
today — Netflix, Spotify, SaaS platforms, online education platforms, cloud
services, and telecom providers.
For any subscription business, key questions
are:
·
How many subscribers are active?
·
What is churn rate?
·
Are customers engaged?
·
Is fraud happening?
·
Which services are high retention?
·
Which ones need strategic action?
This project simulates a subscription services
dataset and performs deep analytics using SAS.
2. BUSINESS CONTEXT
A company operates multiple subscription-based
services across industries:
·
Streaming
·
EdTech
·
SaaS
·
Cloud Storage
·
Fitness Apps
·
Gaming Platforms
Management wants:
·
Performance evaluation
·
Utilization classification
·
Fraud risk detection
·
Retention strategy planning
·
Statistical distribution analysis
3. PROJECT OBJECTIVES
· Create
structured dataset
·
Apply data cleaning
·
Use date functions for lifecycle
analysis
·
Classify utilization level
·
Detect fraud patterns
·
Perform statistical summary
·
Correlation analysis
·
Visualization
·
Transform datasets
·
Practice SAS programming techniques
4. DATASET DESIGN
Variables:
|
Variable |
Type |
Description |
|
Service_Name |
Character |
Name of service |
|
Subscribers |
Numeric |
Total subscribers |
|
Monthly_Fee |
Numeric |
Monthly fee (USD) |
|
Churn_Rate |
Numeric |
% churn |
|
Engagement_Score |
Numeric |
0–100 scale |
|
Support_Tickets |
Numeric |
Monthly tickets |
|
Retention_Level |
Character |
High/Medium/Low |
|
Fraud_Flag |
Numeric |
0/1 |
|
Launch_Date |
Date |
Service launch |
|
Last_Review_Date |
Date |
Last evaluation |
|
Revenue |
Numeric |
Derived |
5. RAW DATA CREATION (INTENTIONAL ERRORS INCLUDED)
data subscription_raw;
length Service_Name $30 Retention_Level $10;
format Launch_Date Last_Review_Date date9.;
input Service_Name $ Subscribers Monthly_Fee Churn_Rate Engagement_Score Support_Tickets
Retention_Level $ Fraud_Flag Launch_Date :date9. Last_Review_Date :date9.;
datalines;
Netflix 500000 15.99 5 88 1200 High 0 01JAN2020 15JAN2025
Spotify 420000 9.99 7 80 900 Medium 0 15FEB2019 12JAN2025
EdTechPro 150000 29.99 12 65 1100 Low 1 10MAR2021 10JAN2025
CloudBox 200000 19.99 4 90 400 High 0 05APR2018 01JAN2025
GameWorld 180000 14.99 8 75 600 Medium 0 11MAY2020 10JAN2025
FitLife 120000 25.99 15 60 1300 Low 1 20JUN2022 05JAN2025
StreamPlus 300000 18.99 6 85 700 High 0 15JUL2019 02JAN2025
LearnNow 100000 22.99 18 55 1500 Low 1 01AUG2023 01JAN2025
MusicMax 250000 11.99 9 78 650 Medium 0 10SEP2020 03JAN2025
SecureVPN 80000 7.99 20 50 2000 Low 1 05OCT2024 05JAN2025
BookClub 60000 12.99 3 92 200 High 0 01NOV2017 10JAN2025
CloudBox 210000 19.99 4 91 390 High 0 05APR2018 01JAN2025
;
run;
proc print data=subscription_raw;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 |
| 2 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 |
| 3 | EdTechPro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 1 |
| 4 | CloudBox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 |
| 5 | GameWorld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 |
| 6 | FitLife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 1 |
| 7 | StreamPlus | High | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 |
| 8 | LearnNow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 1 |
| 9 | MusicMax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 |
| 10 | SecureVPN | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 |
| 11 | BookClub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 |
| 12 | CloudBox | High | 05APR2018 | 01JAN2025 | 210000 | 19.99 | 4 | 91 | 390 | 0 |
6. IDENTIFYING
INTENTIONAL ERRORS
❌ ERROR 1: Duplicate Service (CloudBox)
- Duplicate record present.
- Causes problems in
aggregation.
❌ ERROR 2: Retention_Level manually entered
- Could be inconsistent.
- Better to derive from churn
rate.
❌ ERROR 3: Fraud_Flag manually entered
- Should be derived logically.
7. CORRECTED FULL-LENGTH SAS CODE
STEP 1: CLEAN DATA
proc sort data=subscription_raw nodupkey out=subscription_clean;
by Service_Name;
run;
proc print data=subscription_clean;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BookClub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 |
| 2 | CloudBox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 |
| 3 | EdTechPro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 1 |
| 4 | FitLife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 1 |
| 5 | GameWorld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 |
| 6 | LearnNow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 1 |
| 7 | MusicMax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 |
| 9 | SecureVPN | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 |
| 11 | StreamPlus | High | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 |
·
Removes duplicate services.
·
NODUPKEY ensures unique service names.
STEP 2: DERIVE ADDITIONAL VARIABLES USING SET + FUNCTIONS
data subscription_final;
set subscription_clean;
Service_Name = propcase(strip(Service_Name));
Revenue = Subscribers * Monthly_Fee;
Months_Active = intck('month', Launch_Date, Last_Review_Date);
Next_Review = intnx('month', Last_Review_Date, 3, 'same');
format Next_Review date9.;
if Churn_Rate <=5 then Retention_Level='High';
else if Churn_Rate <=10 then Retention_Level='Medium';
else Retention_Level='Low';
run;
proc print data=subscription_final;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 1 | 4498500 | 46 | 10APR2025 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 1 | 3118800 | 31 | 05APR2025 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 1 | 2299000 | 17 | 01APR2025 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 |
Functions Used:
|
Function |
Purpose |
|
strip |
remove spaces |
|
propcase |
proper case formatting |
|
intck |
count months |
|
intnx |
calculate next date |
|
numeric multiplication |
revenue calculation |
8. UTILIZATION CLASSIFICATION MACRO
%macro utilization_class;
data subscription_util;
set subscription_final;
length Util_Class $10.;
Utilization_Percentage = (Engagement_Score/100)*100;
if Utilization_Percentage >=85 then Util_Class='Very High';
else if Utilization_Percentage >=70 then Util_Class='High';
else if Utilization_Percentage >=50 then Util_Class='Moderate';
else Util_Class='Low';
run;
proc print data=subscription_util;
run;
%mend;
%utilization_class;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 1 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 1 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 1 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
9. FRAUD DETECTION MACRO
%macro fraud_detection;
data subscription_fraud;
set subscription_util;
if Churn_Rate >15 and Engagement_Score <60 and Support_Tickets >1500
then Fraud_Flag=1;
else Fraud_Flag=0;
run;
proc print data=subscription_fraud;
run;
%mend;
%fraud_detection;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
Fraud logic:
·
High churn (>15%)
·
Low engagement (<60)
·
High support tickets (>1500)
10. PROC SQL ANALYSIS
proc sql;
create table revenue_summary as
select Service_Name,
Revenue,
Subscribers,
Churn_Rate
from subscription_fraud
order by Revenue desc;
quit;
proc print data=revenue_summary;
run;
| Obs | Service_Name | Revenue | Subscribers | Churn_Rate |
|---|---|---|---|---|
| 1 | Netflix | 7995000 | 500000 | 5 |
| 2 | Streamplus | 5697000 | 300000 | 6 |
| 3 | Edtechpro | 4498500 | 150000 | 12 |
| 4 | Spotify | 4195800 | 420000 | 7 |
| 5 | Cloudbox | 3998000 | 200000 | 4 |
| 6 | Fitlife | 3118800 | 120000 | 15 |
| 7 | Musicmax | 2997500 | 250000 | 9 |
| 8 | Gameworld | 2698200 | 180000 | 8 |
| 9 | Learnnow | 2299000 | 100000 | 18 |
| 10 | Bookclub | 779400 | 60000 | 3 |
| 11 | Securevpn | 639200 | 80000 | 20 |
11. PROC FREQ
proc freq data=subscription_fraud;
tables Retention_Level Fraud_Flag Util_Class;
run;
The FREQ Procedure
| Retention_Level | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| High | 3 | 27.27 | 3 | 27.27 |
| Low | 4 | 36.36 | 7 | 63.64 |
| Medium | 4 | 36.36 | 11 | 100.00 |
| Fraud_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 10 | 90.91 | 10 | 90.91 |
| 1 | 1 | 9.09 | 11 | 100.00 |
| Util_Class | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| High | 3 | 27.27 | 3 | 27.27 |
| Moderate | 4 | 36.36 | 7 | 63.64 |
| Very High | 4 | 36.36 | 11 | 100.00 |
12. PROC MEANS
proc means data=subscription_fraud mean sum min max;
var Subscribers Revenue Churn_Rate Engagement_Score;
run;
The MEANS Procedure
| Variable | Mean | Sum | Minimum | Maximum |
|---|---|---|---|---|
Subscribers Revenue Churn_Rate Engagement_Score | 214545.45 3537854.55 9.7272727 74.3636364 | 2360000.00 38916400.00 107.0000000 818.0000000 | 60000.00 639200.00 3.0000000 50.0000000 | 500000.00 7995000.00 20.0000000 92.0000000 |
13. PROC UNIVARIATE
proc univariate data=subscription_fraud;
var Revenue;
histogram Revenue;
run;
The UNIVARIATE Procedure
Variable: Revenue
| Moments | |||
|---|---|---|---|
| N | 11 | Sum Weights | 11 |
| Mean | 3537854.55 | Sum Observations | 38916400 |
| Std Deviation | 2116935.53 | Variance | 4.48142E12 |
| Skewness | 0.66750697 | Kurtosis | 0.84450611 |
| Uncorrected SS | 1.82495E14 | Corrected SS | 4.48142E13 |
| Coeff Variation | 59.8367032 | Std Error Mean | 638280.077 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 3537855 | Std Deviation | 2116936 |
| Median | 3118800 | Variance | 4.48142E12 |
| Mode | . | Range | 7355800 |
| Interquartile Range | 2199500 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 5.542793 | Pr > |t| | 0.0002 |
| Sign | M | 5.5 | Pr >= |M| | 0.0010 |
| Signed Rank | S | 33 | Pr >= |S| | 0.0010 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 7995000 |
| 99% | 7995000 |
| 95% | 7995000 |
| 90% | 5697000 |
| 75% Q3 | 4498500 |
| 50% Median | 3118800 |
| 25% Q1 | 2299000 |
| 10% | 779400 |
| 5% | 639200 |
| 1% | 639200 |
| 0% Min | 639200 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 639200 | 9 | 3998000 | 2 |
| 779400 | 1 | 4195800 | 10 |
| 2299000 | 6 | 4498500 | 3 |
| 2698200 | 5 | 5697000 | 11 |
| 2997500 | 7 | 7995000 | 8 |
The UNIVARIATE Procedure
14. PROC CORR
proc corr data=subscription_fraud;
var Subscribers Revenue Churn_Rate Engagement_Score;
run;
The CORR Procedure
| 4 Variables: | Subscribers Revenue Churn_Rate Engagement_Score |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Subscribers | 11 | 214545 | 142082 | 2360000 | 60000 | 500000 |
| Revenue | 11 | 3537855 | 2116936 | 38916400 | 639200 | 7995000 |
| Churn_Rate | 11 | 9.72727 | 5.76352 | 107.00000 | 3.00000 | 20.00000 |
| Engagement_Score | 11 | 74.36364 | 14.70560 | 818.00000 | 50.00000 | 92.00000 |
| Pearson Correlation Coefficients, N = 11 Prob > |r| under H0: Rho=0 | ||||
|---|---|---|---|---|
| Subscribers | Revenue | Churn_Rate | Engagement_Score | |
| Subscribers | 1.00000 | 0.84148 0.0012 | -0.49413 0.1224 | 0.50597 0.1123 |
| Revenue | 0.84148 0.0012 | 1.00000 | -0.43603 0.1800 | 0.43409 0.1822 |
| Churn_Rate | -0.49413 0.1224 | -0.43603 0.1800 | 1.00000 | -0.98979 <.0001 |
| Engagement_Score | 0.50597 0.1123 | 0.43409 0.1822 | -0.98979 <.0001 | 1.00000 |
15. PROC SGPLOT
proc sgplot data=subscription_fraud;
scatter x=Subscribers y=Revenue;
run;
16. APPEND EXAMPLE
proc append base=subscription_fraud
data=subscription_fraud force;
run;
proc print data=subscription_fraud;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
| 12 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 13 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 14 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 15 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 16 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 17 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 18 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 19 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 20 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 21 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 22 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
What Happens Here?
When you append a dataset to itself:
·
SAS reads subscription_fraud
·
Then appends it back to subscription_fraud
·
Result → The dataset doubles in size
If original observations = 11
After append → 22 observations
Why This Is a Problem
·
❌
Creates duplicate records
·
❌ Inflates revenue totals
·
❌ Distorts churn rate statistics
·
❌ Corrupts fraud detection results
·
❌ Impacts PROC MEANS, FREQ, CORR output
In subscription analytics, this would falsely show:
·
Double subscribers
·
Double revenue
·
Wrong fraud count
proc sort data=subscription_fraud nodupkey;by Service_Name;run;
proc print data=subscription_fraud;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
·
Removes duplicate services.
·
NODUPKEY ensures unique service names.
STEP 1: Appending a NEW month's data
data subscription_newmonth;
set subscription_fraud;
Subscribers = Subscribers + 5000;
Last_Review_Date = intnx('month', Last_Review_Date, 1, 'same');
format Last_Review_Date date9.;
run;
proc print data=subscription_newmonth;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10FEB2025 | 65000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01FEB2025 | 205000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10FEB2025 | 155000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05FEB2025 | 125000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10FEB2025 | 185000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01FEB2025 | 105000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03FEB2025 | 255000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15FEB2025 | 505000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05FEB2025 | 85000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12FEB2025 | 425000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02FEB2025 | 305000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
1.Now we are appending NEW records
proc append base=subscription_fraud
data=subscription_newmonth;
run;
proc print data=subscription_fraud;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 3 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 4 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 5 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 6 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 7 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 8 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 9 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 10 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 11 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
| 12 | Bookclub | High | 01NOV2017 | 10FEB2025 | 65000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 13 | Cloudbox | High | 05APR2018 | 01FEB2025 | 205000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 14 | Edtechpro | Low | 10MAR2021 | 10FEB2025 | 155000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 15 | Fitlife | Low | 20JUN2022 | 05FEB2025 | 125000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 16 | Gameworld | Medium | 11MAY2020 | 10FEB2025 | 185000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 17 | Learnnow | Low | 01AUG2023 | 01FEB2025 | 105000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 18 | Musicmax | Medium | 10SEP2020 | 03FEB2025 | 255000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 19 | Netflix | High | 01JAN2020 | 15FEB2025 | 505000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 20 | Securevpn | Low | 05OCT2024 | 05FEB2025 | 85000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 21 | Spotify | Medium | 15FEB2019 | 12FEB2025 | 425000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 22 | Streamplus | Medium | 15JUL2019 | 02FEB2025 | 305000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
When
Is Self-Append Acceptable?
Very rare cases:
·
Stress testing
·
Performance benchmarking
·
Simulating large datasets
17. TRANSPOSE
proc transpose data=subscription_fraud out=subscription_trans;
var Revenue Subscribers;
run;
proc print data=subscription_trans;
run;
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 | COL19 | COL20 | COL21 | COL22 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Revenue | 779400 | 3998000 | 4498500 | 3118800 | 2698200 | 2299000 | 2997500 | 7995000 | 639200 | 4195800 | 5697000 | 779400 | 3998000 | 4498500 | 3118800 | 2698200 | 2299000 | 2997500 | 7995000 | 639200 | 4195800 | 5697000 |
| 2 | Subscribers | 60000 | 200000 | 150000 | 120000 | 180000 | 100000 | 250000 | 500000 | 80000 | 420000 | 300000 | 65000 | 205000 | 155000 | 125000 | 185000 | 105000 | 255000 | 505000 | 85000 | 425000 | 305000 |
18. MERGE EXAMPLE
data extra_info;
input Service_Name $20. Region $;
datalines;
Netflix US
Spotify EU
;
run;
proc print data=extra_info;
run;
| Obs | Service_Name | Region |
|---|---|---|
| 1 | Netflix US | Spotify |
proc sort data=subscription_fraud; by Service_Name; run;
proc print data=subscription_fraud;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 2 | Bookclub | High | 01NOV2017 | 10FEB2025 | 65000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 |
| 3 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 4 | Cloudbox | High | 05APR2018 | 01FEB2025 | 205000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 |
| 5 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 6 | Edtechpro | Low | 10MAR2021 | 10FEB2025 | 155000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 |
| 7 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 8 | Fitlife | Low | 20JUN2022 | 05FEB2025 | 125000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 |
| 9 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 10 | Gameworld | Medium | 11MAY2020 | 10FEB2025 | 185000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 |
| 11 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 12 | Learnnow | Low | 01AUG2023 | 01FEB2025 | 105000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 |
| 13 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 14 | Musicmax | Medium | 10SEP2020 | 03FEB2025 | 255000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 |
| 15 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 16 | Netflix | High | 01JAN2020 | 15FEB2025 | 505000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 |
| 17 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 18 | Securevpn | Low | 05OCT2024 | 05FEB2025 | 85000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 |
| 19 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 20 | Spotify | Medium | 15FEB2019 | 12FEB2025 | 425000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 |
| 21 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
| 22 | Streamplus | Medium | 15JUL2019 | 02FEB2025 | 305000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
proc sort data=extra_info; by Service_Name; run;
proc print data=extra_info;
run;
| Obs | Service_Name | Region |
|---|---|---|
| 1 | Netflix US | Spotify |
data merged_data;
merge subscription_fraud extra_info;
by Service_Name;
run;
proc print data=merged_data;
run;
| Obs | Service_Name | Retention_Level | Launch_Date | Last_Review_Date | Subscribers | Monthly_Fee | Churn_Rate | Engagement_Score | Support_Tickets | Fraud_Flag | Revenue | Months_Active | Next_Review | Util_Class | Utilization_Percentage | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bookclub | High | 01NOV2017 | 10JAN2025 | 60000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 | |
| 2 | Bookclub | High | 01NOV2017 | 10FEB2025 | 65000 | 12.99 | 3 | 92 | 200 | 0 | 779400 | 86 | 10APR2025 | Very High | 92 | |
| 3 | Cloudbox | High | 05APR2018 | 01JAN2025 | 200000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 | |
| 4 | Cloudbox | High | 05APR2018 | 01FEB2025 | 205000 | 19.99 | 4 | 90 | 400 | 0 | 3998000 | 81 | 01APR2025 | Very High | 90 | |
| 5 | Edtechpro | Low | 10MAR2021 | 10JAN2025 | 150000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 | |
| 6 | Edtechpro | Low | 10MAR2021 | 10FEB2025 | 155000 | 29.99 | 12 | 65 | 1100 | 0 | 4498500 | 46 | 10APR2025 | Moderate | 65 | |
| 7 | Fitlife | Low | 20JUN2022 | 05JAN2025 | 120000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 | |
| 8 | Fitlife | Low | 20JUN2022 | 05FEB2025 | 125000 | 25.99 | 15 | 60 | 1300 | 0 | 3118800 | 31 | 05APR2025 | Moderate | 60 | |
| 9 | Gameworld | Medium | 11MAY2020 | 10JAN2025 | 180000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 | |
| 10 | Gameworld | Medium | 11MAY2020 | 10FEB2025 | 185000 | 14.99 | 8 | 75 | 600 | 0 | 2698200 | 56 | 10APR2025 | High | 75 | |
| 11 | Learnnow | Low | 01AUG2023 | 01JAN2025 | 100000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 | |
| 12 | Learnnow | Low | 01AUG2023 | 01FEB2025 | 105000 | 22.99 | 18 | 55 | 1500 | 0 | 2299000 | 17 | 01APR2025 | Moderate | 55 | |
| 13 | Musicmax | Medium | 10SEP2020 | 03JAN2025 | 250000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 | |
| 14 | Musicmax | Medium | 10SEP2020 | 03FEB2025 | 255000 | 11.99 | 9 | 78 | 650 | 0 | 2997500 | 52 | 03APR2025 | High | 78 | |
| 15 | Netflix | High | 01JAN2020 | 15JAN2025 | 500000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 | |
| 16 | Netflix | High | 01JAN2020 | 15FEB2025 | 505000 | 15.99 | 5 | 88 | 1200 | 0 | 7995000 | 60 | 15APR2025 | Very High | 88 | |
| 17 | Netflix US | . | . | . | . | . | . | . | . | . | . | . | . | Spotify | ||
| 18 | Securevpn | Low | 05OCT2024 | 05JAN2025 | 80000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 | |
| 19 | Securevpn | Low | 05OCT2024 | 05FEB2025 | 85000 | 7.99 | 20 | 50 | 2000 | 1 | 639200 | 3 | 05APR2025 | Moderate | 50 | |
| 20 | Spotify | Medium | 15FEB2019 | 12JAN2025 | 420000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 | |
| 21 | Spotify | Medium | 15FEB2019 | 12FEB2025 | 425000 | 9.99 | 7 | 80 | 900 | 0 | 4195800 | 71 | 12APR2025 | High | 80 | |
| 22 | Streamplus | Medium | 15JUL2019 | 02JAN2025 | 300000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 | |
| 23 | Streamplus | Medium | 15JUL2019 | 02FEB2025 | 305000 | 18.99 | 6 | 85 | 700 | 0 | 5697000 | 66 | 02APR2025 | Very High | 85 |
19. DELETE TEMP DATASETS
proc datasets library=work;
delete subscription_raw subscription_clean;
quit;
20. BUSINESS INSIGHTS
·
High engagement leads to high revenue.
·
High churn correlates negatively with
retention.
·
Fraud risk detected in high support
& high churn cases.
·
Netflix & Cloud services show
strongest performance.
21.18 Key Points About This Project
·
Designed a realistic subscription
services dataset with 12+ observations.
·
Included core KPIs: Subscribers,
Monthly_Fee, Churn_Rate, Engagement_Score, Support_Tickets.
·
Derived Revenue dynamically instead of
storing static values.
·
Introduced intentional duplicate records
to simulate real-world data errors.
·
Removed duplicates using PROC SORT
NODUPKEY.
·
Standardized character variables using STRIP, PROPCASE, UPCASE, LOWCASE.
·
Used INTCK to calculate Months_Active.
·
Used INTNX to project Next_Review_Date.
·
Derived Retention_Level based on churn
thresholds.
·
Built Utilization classification using a
reusable SAS macro.
·
Implemented fraud detection logic using
conditional rules inside a macro.
·
Generated business summaries using PROC SQL.
·
Performed descriptive statistics with PROC MEANS.
·
Analyzed distribution patterns using PROC UNIVARIATE.
·
Evaluated variable relationships using PROC CORR.
·
Created performance visualizations using
PROC SGPLOT.
·
Demonstrated dataset manipulation using SET, MERGE, APPEND, and TRANSPOSE.
·
Applied PROC DATASETS DELETE
for workspace cleanup and memory management.
Conclusion:
This project demonstrates how SAS can be used
to perform end-to-end analytics in a subscription-based business environment.
Starting from raw data creation, we handled duplicate records, inconsistent
character values, and derived business variables using structured programming techniques.
The project highlights the importance of data validation and controlled
transformation before performing statistical analysis.
By applying macros, we automated utilization
classification and fraud detection logic, making the solution scalable and
reusable. Statistical procedures such as PROC MEANS and PROC UNIVARIATE
provided insights into central tendency and distribution behavior, while PROC
CORR helped understand the strength of relationships between engagement,
revenue, and churn rate. Visualization using PROC SGPLOT enhanced
interpretability of performance metrics.
The project mirrors real-world analytics tasks
performed in SaaS, OTT, fintech, and telecom industries. It demonstrates strong
understanding of DATA step processing, SQL integration, macro programming, date
handling functions, and dataset transformations such as APPEND, MERGE, and
TRANSPOSE.
Overall, this project strengthens practical
SAS programming skills and prepares a programmer for real-time data analysis,
performance monitoring, and fraud detection scenarios in enterprise
environments.
SAS INTERVIEW QUESTIONS
1.What is the difference between DATA step and PROC step in
SAS?
2.What is the difference between LENGTH and FORMAT statements?
3.What is the Program Data Vector (PDV)?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent ONLINE PLATFORM data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment