407.How many months of active lifecycle does each service have, and can date functions like INTCK and INTNX predict the next performance review cycle accurately?

Subscription Services Performance, Utilization & Fraud Analytics Using SAS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA | SET | LENGTH | FORMAT | INPUT | DATALINES | IF–THEN–ELSE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | COALESCE | INTCK | INTNX | PROC SORT | PROC SQL | PROC FREQ | PROC MEANS | PROC UNIVARIATE | PROC CORR | PROC SGPLOT | PROC APPEND | PROC TRANSPOSE | MERGE | PROC DATASETS | DELETE | %MACRO | %MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

TABLE OF CONTENTS

1.     Introduction

2.     Business Context

3.     Project Objectives

4.     Dataset Design

5.     Raw Data Creation (With Intentional Errors)

6.     Identifying and Explaining Errors

7.     Corrected Full-Length SAS Code

8.     Utilization Classification Using Macros

9.     Fraud Detection Logic Using Macros

10.  Statistical Analysis Using Procedures

11.  Correlation & Distribution Analysis

12.  Visualization Using PROC SGPLOT

13.  Dataset Transformation (APPEND, MERGE, TRANSPOSE)

14.  Data Cleaning Using Character & Numeric Functions

15.  Date Handling Using MDY, INTCK, INTNX

16.  Business Insights

17.  21 Key Points About This Project

18.  Conclusion

1. INTRODUCTION

Subscription-based services are everywhere today — Netflix, Spotify, SaaS platforms, online education platforms, cloud services, and telecom providers.

For any subscription business, key questions are:

·       How many subscribers are active?

·       What is churn rate?

·       Are customers engaged?

·       Is fraud happening?

·       Which services are high retention?

·       Which ones need strategic action?

This project simulates a subscription services dataset and performs deep analytics using SAS.

2. BUSINESS CONTEXT

A company operates multiple subscription-based services across industries:

·       Streaming

·       EdTech

·       SaaS

·       Cloud Storage

·       Fitness Apps

·       Gaming Platforms

Management wants:

·       Performance evaluation

·       Utilization classification

·       Fraud risk detection

·       Retention strategy planning

·       Statistical distribution analysis

3. PROJECT OBJECTIVES

·  Create structured dataset

·  Apply data cleaning

·  Use date functions for lifecycle analysis

·  Classify utilization level

·  Detect fraud patterns

·  Perform statistical summary

·  Correlation analysis

·  Visualization

·  Transform datasets

·  Practice SAS programming techniques

4. DATASET DESIGN

Variables:

Variable

Type

Description

Service_Name

Character

Name of service

Subscribers

Numeric

Total subscribers

Monthly_Fee

Numeric

Monthly fee (USD)

Churn_Rate

Numeric

% churn

Engagement_Score

Numeric

0–100 scale

Support_Tickets

Numeric

Monthly tickets

Retention_Level

Character

High/Medium/Low

Fraud_Flag

Numeric

0/1

Launch_Date

Date

Service launch

Last_Review_Date

Date

Last evaluation

Revenue

Numeric

Derived

5. RAW DATA CREATION (INTENTIONAL ERRORS INCLUDED)

data subscription_raw;

 length Service_Name $30 Retention_Level $10;

 format Launch_Date Last_Review_Date date9.;

 input Service_Name $ Subscribers Monthly_Fee Churn_Rate Engagement_Score Support_Tickets

       Retention_Level $ Fraud_Flag Launch_Date :date9. Last_Review_Date :date9.;

datalines;

Netflix 500000 15.99 5 88 1200 High 0 01JAN2020 15JAN2025

Spotify 420000 9.99 7 80 900 Medium 0 15FEB2019 12JAN2025

EdTechPro 150000 29.99 12 65 1100 Low 1 10MAR2021 10JAN2025

CloudBox 200000 19.99 4 90 400 High 0 05APR2018 01JAN2025

GameWorld 180000 14.99 8 75 600 Medium 0 11MAY2020 10JAN2025

FitLife 120000 25.99 15 60 1300 Low 1 20JUN2022 05JAN2025

StreamPlus 300000 18.99 6 85 700 High 0 15JUL2019 02JAN2025

LearnNow 100000 22.99 18 55 1500 Low 1 01AUG2023 01JAN2025

MusicMax 250000 11.99 9 78 650 Medium 0 10SEP2020 03JAN2025

SecureVPN 80000 7.99 20 50 2000 Low 1 05OCT2024 05JAN2025

BookClub 60000 12.99 3 92 200 High 0 01NOV2017 10JAN2025

CloudBox 210000 19.99 4 91 390 High 0 05APR2018 01JAN2025

;

run;

proc print data=subscription_raw;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_Flag
1NetflixHigh01JAN202015JAN202550000015.9958812000
2SpotifyMedium15FEB201912JAN20254200009.997809000
3EdTechProLow10MAR202110JAN202515000029.99126511001
4CloudBoxHigh05APR201801JAN202520000019.994904000
5GameWorldMedium11MAY202010JAN202518000014.998756000
6FitLifeLow20JUN202205JAN202512000025.99156013001
7StreamPlusHigh15JUL201902JAN202530000018.996857000
8LearnNowLow01AUG202301JAN202510000022.99185515001
9MusicMaxMedium10SEP202003JAN202525000011.999786500
10SecureVPNLow05OCT202405JAN2025800007.99205020001
11BookClubHigh01NOV201710JAN20256000012.993922000
12CloudBoxHigh05APR201801JAN202521000019.994913900

6. IDENTIFYING INTENTIONAL ERRORS

❌ ERROR 1: Duplicate Service (CloudBox)

  • Duplicate record present.
  • Causes problems in aggregation.

❌ ERROR 2: Retention_Level manually entered

  • Could be inconsistent.
  • Better to derive from churn rate.

❌ ERROR 3: Fraud_Flag manually entered

  • Should be derived logically.

7. CORRECTED FULL-LENGTH SAS CODE

STEP 1: CLEAN DATA

proc sort data=subscription_raw nodupkey out=subscription_clean;

 by Service_Name;

run;

proc print data=subscription_clean;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_Flag
1BookClubHigh01NOV201710JAN20256000012.993922000
2CloudBoxHigh05APR201801JAN202520000019.994904000
3EdTechProLow10MAR202110JAN202515000029.99126511001
4FitLifeLow20JUN202205JAN202512000025.99156013001
5GameWorldMedium11MAY202010JAN202518000014.998756000
6LearnNowLow01AUG202301JAN202510000022.99185515001
7MusicMaxMedium10SEP202003JAN202525000011.999786500
8NetflixHigh01JAN202015JAN202550000015.9958812000
9SecureVPNLow05OCT202405JAN2025800007.99205020001
10SpotifyMedium15FEB201912JAN20254200009.997809000
11StreamPlusHigh15JUL201902JAN202530000018.996857000

·  Removes duplicate services.

·  NODUPKEY ensures unique service names.

STEP 2: DERIVE ADDITIONAL VARIABLES USING SET + FUNCTIONS

data subscription_final;

set subscription_clean;

Service_Name = propcase(strip(Service_Name));

Revenue = Subscribers * Monthly_Fee;

Months_Active = intck('month', Launch_Date, Last_Review_Date);

Next_Review = intnx('month', Last_Review_Date, 3, 'same');

format Next_Review date9.;

if Churn_Rate <=5 then Retention_Level='High';

else if Churn_Rate <=10 then Retention_Level='Medium';

else Retention_Level='Low';

run;

proc print data=subscription_final;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_Review
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025
3EdtechproLow10MAR202110JAN202515000029.9912651100144985004610APR2025
4FitlifeLow20JUN202205JAN202512000025.9915601300131188003105APR2025
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025
6LearnnowLow01AUG202301JAN202510000022.9918551500122990001701APR2025
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025

Functions Used:

Function

Purpose

strip

remove spaces

propcase

proper case formatting

intck

count months

intnx

calculate next date

numeric multiplication

revenue calculation

8. UTILIZATION CLASSIFICATION MACRO

%macro utilization_class;

data subscription_util;

set subscription_final;

length Util_Class $10.;

Utilization_Percentage = (Engagement_Score/100)*100;

if Utilization_Percentage >=85 then Util_Class='Very High';

else if Utilization_Percentage >=70 then Util_Class='High';

else if Utilization_Percentage >=50 then Util_Class='Moderate';

else Util_Class='Low';

run;

proc print data=subscription_util;

run;

%mend;


%utilization_class;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110JAN202515000029.9912651100144985004610APR2025Moderate65
4FitlifeLow20JUN202205JAN202512000025.9915601300131188003105APR2025Moderate60
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301JAN202510000022.9918551500122990001701APR2025Moderate55
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85

9. FRAUD DETECTION MACRO

%macro fraud_detection;

data subscription_fraud;

set subscription_util;

if Churn_Rate >15 and Engagement_Score <60 and Support_Tickets >1500 

then Fraud_Flag=1;

else Fraud_Flag=0;

run;

proc print data=subscription_fraud;

run;

%mend;


%fraud_detection;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
4FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85

Fraud logic:

·  High churn (>15%)

·  Low engagement (<60)

·  High support tickets (>1500)

10. PROC SQL ANALYSIS

proc sql;

create table revenue_summary as

select Service_Name,

       Revenue,

       Subscribers,

       Churn_Rate

from subscription_fraud

order by Revenue desc;

quit;

proc print data=revenue_summary;

run;

ObsService_NameRevenueSubscribersChurn_Rate
1Netflix79950005000005
2Streamplus56970003000006
3Edtechpro449850015000012
4Spotify41958004200007
5Cloudbox39980002000004
6Fitlife311880012000015
7Musicmax29975002500009
8Gameworld26982001800008
9Learnnow229900010000018
10Bookclub779400600003
11Securevpn6392008000020

11. PROC FREQ

proc freq data=subscription_fraud;

 tables Retention_Level Fraud_Flag Util_Class;

run;

The FREQ Procedure

Retention_LevelFrequencyPercentCumulative
Frequency
Cumulative
Percent
High327.27327.27
Low436.36763.64
Medium436.3611100.00
Fraud_FlagFrequencyPercentCumulative
Frequency
Cumulative
Percent
01090.911090.91
119.0911100.00
Util_ClassFrequencyPercentCumulative
Frequency
Cumulative
Percent
High327.27327.27
Moderate436.36763.64
Very High436.3611100.00

12. PROC MEANS

proc means data=subscription_fraud mean sum min max;

  var Subscribers Revenue Churn_Rate Engagement_Score;

run;

The MEANS Procedure

VariableMeanSumMinimumMaximum
Subscribers
Revenue
Churn_Rate
Engagement_Score
214545.45
3537854.55
9.7272727
74.3636364
2360000.00
38916400.00
107.0000000
818.0000000
60000.00
639200.00
3.0000000
50.0000000
500000.00
7995000.00
20.0000000
92.0000000

13. PROC UNIVARIATE

proc univariate data=subscription_fraud;

  var Revenue;

  histogram Revenue;

run;

The UNIVARIATE Procedure

Variable: Revenue

Moments
N11Sum Weights11
Mean3537854.55Sum Observations38916400
Std Deviation2116935.53Variance4.48142E12
Skewness0.66750697Kurtosis0.84450611
Uncorrected SS1.82495E14Corrected SS4.48142E13
Coeff Variation59.8367032Std Error Mean638280.077
Basic Statistical Measures
LocationVariability
Mean3537855Std Deviation2116936
Median3118800Variance4.48142E12
Mode.Range7355800
  Interquartile Range2199500
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt5.542793Pr > |t|0.0002
SignM5.5Pr >= |M|0.0010
Signed RankS33Pr >= |S|0.0010
Quantiles (Definition 5)
LevelQuantile
100% Max7995000
99%7995000
95%7995000
90%5697000
75% Q34498500
50% Median3118800
25% Q12299000
10%779400
5%639200
1%639200
0% Min639200
Extreme Observations
LowestHighest
ValueObsValueObs
639200939980002
7794001419580010
2299000644985003
26982005569700011
2997500779950008

The UNIVARIATE Procedure

Histogram for Revenue

14. PROC CORR

proc corr data=subscription_fraud;

  var Subscribers Revenue Churn_Rate Engagement_Score;

run;

The CORR Procedure

4 Variables:Subscribers Revenue Churn_Rate Engagement_Score
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Subscribers11214545142082236000060000500000
Revenue1135378552116936389164006392007995000
Churn_Rate119.727275.76352107.000003.0000020.00000
Engagement_Score1174.3636414.70560818.0000050.0000092.00000
Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho=0
 SubscribersRevenueChurn_RateEngagement_Score
Subscribers
1.00000
 
0.84148
0.0012
-0.49413
0.1224
0.50597
0.1123
Revenue
0.84148
0.0012
1.00000
 
-0.43603
0.1800
0.43409
0.1822
Churn_Rate
-0.49413
0.1224
-0.43603
0.1800
1.00000
 
-0.98979
<.0001
Engagement_Score
0.50597
0.1123
0.43409
0.1822
-0.98979
<.0001
1.00000
 

15. PROC SGPLOT

proc sgplot data=subscription_fraud;

  scatter x=Subscribers y=Revenue;

run;

The SGPlot Procedure

16. APPEND EXAMPLE

proc append base=subscription_fraud 

            data=subscription_fraud force;

run;

proc print data=subscription_fraud;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
4FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85
12BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
13CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
14EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
15FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
16GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
17LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
18MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
19NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
20SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
21SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
22StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85

What Happens Here?

When you append a dataset to itself:

·  SAS reads subscription_fraud

·  Then appends it back to subscription_fraud

·  Result → The dataset doubles in size

If original observations = 11
After append → 22 observations

Why This Is a Problem

·    Creates duplicate records

·  ❌ Inflates revenue totals

·  ❌ Distorts churn rate statistics

·  ❌ Corrupts fraud detection results

·  ❌ Impacts PROC MEANS, FREQ, CORR output

In subscription analytics, this would falsely show:

·  Double subscribers

·  Double revenue

·  Wrong fraud count

proc sort data=subscription_fraud nodupkey;by Service_Name;run;

proc print data=subscription_fraud;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
4FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85

·  Removes duplicate services.

·  NODUPKEY ensures unique service names.

STEP 1: Appending a NEW month's data

data subscription_newmonth;

set subscription_fraud;

Subscribers = Subscribers + 5000;

Last_Review_Date = intnx('month', Last_Review_Date, 1, 'same');

format Last_Review_Date date9.;

run;

proc print data=subscription_newmonth;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710FEB20256500012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801FEB202520500019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110FEB202515500029.9912651100044985004610APR2025Moderate65
4FitlifeLow20JUN202205FEB202512500025.9915601300031188003105APR2025Moderate60
5GameworldMedium11MAY202010FEB202518500014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301FEB202510500022.9918551500022990001701APR2025Moderate55
7MusicmaxMedium10SEP202003FEB202525500011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015FEB202550500015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405FEB2025850007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912FEB20254250009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902FEB202530500018.99685700056970006602APR2025Very High85

1.Now we are appending NEW records

proc append base=subscription_fraud 

            data=subscription_newmonth;

run;

proc print data=subscription_fraud;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
3EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
4FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
5GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
6LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
7MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
8NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
9SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
10SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
11StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85
12BookclubHigh01NOV201710FEB20256500012.9939220007794008610APR2025Very High92
13CloudboxHigh05APR201801FEB202520500019.99490400039980008101APR2025Very High90
14EdtechproLow10MAR202110FEB202515500029.9912651100044985004610APR2025Moderate65
15FitlifeLow20JUN202205FEB202512500025.9915601300031188003105APR2025Moderate60
16GameworldMedium11MAY202010FEB202518500014.99875600026982005610APR2025High75
17LearnnowLow01AUG202301FEB202510500022.9918551500022990001701APR2025Moderate55
18MusicmaxMedium10SEP202003FEB202525500011.99978650029975005203APR2025High78
19NetflixHigh01JAN202015FEB202550500015.995881200079950006015APR2025Very High88
20SecurevpnLow05OCT202405FEB2025850007.99205020001639200305APR2025Moderate50
21SpotifyMedium15FEB201912FEB20254250009.99780900041958007112APR2025High80
22StreamplusMedium15JUL201902FEB202530500018.99685700056970006602APR2025Very High85

When Is Self-Append Acceptable?

Very rare cases:

·  Stress testing

·  Performance benchmarking

·  Simulating large datasets

17. TRANSPOSE

proc transpose data=subscription_fraud out=subscription_trans;

  var Revenue Subscribers;

run;

proc print data=subscription_trans;

run;

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15COL16COL17COL18COL19COL20COL21COL22
1Revenue779400399800044985003118800269820022990002997500799500063920041958005697000779400399800044985003118800269820022990002997500799500063920041958005697000
2Subscribers60000200000150000120000180000100000250000500000800004200003000006500020500015500012500018500010500025500050500085000425000305000

18. MERGE EXAMPLE

data extra_info;

input Service_Name $20. Region $;

datalines;

Netflix US

Spotify EU

;

run;

proc print data=extra_info;

run;

ObsService_NameRegion
1Netflix USSpotify

proc sort data=subscription_fraud; by Service_Name; run;

proc print data=subscription_fraud;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_Percentage
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92
2BookclubHigh01NOV201710FEB20256500012.9939220007794008610APR2025Very High92
3CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90
4CloudboxHigh05APR201801FEB202520500019.99490400039980008101APR2025Very High90
5EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65
6EdtechproLow10MAR202110FEB202515500029.9912651100044985004610APR2025Moderate65
7FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60
8FitlifeLow20JUN202205FEB202512500025.9915601300031188003105APR2025Moderate60
9GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75
10GameworldMedium11MAY202010FEB202518500014.99875600026982005610APR2025High75
11LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55
12LearnnowLow01AUG202301FEB202510500022.9918551500022990001701APR2025Moderate55
13MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78
14MusicmaxMedium10SEP202003FEB202525500011.99978650029975005203APR2025High78
15NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88
16NetflixHigh01JAN202015FEB202550500015.995881200079950006015APR2025Very High88
17SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50
18SecurevpnLow05OCT202405FEB2025850007.99205020001639200305APR2025Moderate50
19SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80
20SpotifyMedium15FEB201912FEB20254250009.99780900041958007112APR2025High80
21StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85
22StreamplusMedium15JUL201902FEB202530500018.99685700056970006602APR2025Very High85

proc sort data=extra_info; by Service_Name; run;

proc print data=extra_info;

run;

ObsService_NameRegion
1Netflix USSpotify

data merged_data;

merge subscription_fraud extra_info;

by Service_Name;

run;

proc print data=merged_data;

run;

ObsService_NameRetention_LevelLaunch_DateLast_Review_DateSubscribersMonthly_FeeChurn_RateEngagement_ScoreSupport_TicketsFraud_FlagRevenueMonths_ActiveNext_ReviewUtil_ClassUtilization_PercentageRegion
1BookclubHigh01NOV201710JAN20256000012.9939220007794008610APR2025Very High92 
2BookclubHigh01NOV201710FEB20256500012.9939220007794008610APR2025Very High92 
3CloudboxHigh05APR201801JAN202520000019.99490400039980008101APR2025Very High90 
4CloudboxHigh05APR201801FEB202520500019.99490400039980008101APR2025Very High90 
5EdtechproLow10MAR202110JAN202515000029.9912651100044985004610APR2025Moderate65 
6EdtechproLow10MAR202110FEB202515500029.9912651100044985004610APR2025Moderate65 
7FitlifeLow20JUN202205JAN202512000025.9915601300031188003105APR2025Moderate60 
8FitlifeLow20JUN202205FEB202512500025.9915601300031188003105APR2025Moderate60 
9GameworldMedium11MAY202010JAN202518000014.99875600026982005610APR2025High75 
10GameworldMedium11MAY202010FEB202518500014.99875600026982005610APR2025High75 
11LearnnowLow01AUG202301JAN202510000022.9918551500022990001701APR2025Moderate55 
12LearnnowLow01AUG202301FEB202510500022.9918551500022990001701APR2025Moderate55 
13MusicmaxMedium10SEP202003JAN202525000011.99978650029975005203APR2025High78 
14MusicmaxMedium10SEP202003FEB202525500011.99978650029975005203APR2025High78 
15NetflixHigh01JAN202015JAN202550000015.995881200079950006015APR2025Very High88 
16NetflixHigh01JAN202015FEB202550500015.995881200079950006015APR2025Very High88 
17Netflix US ........... .Spotify
18SecurevpnLow05OCT202405JAN2025800007.99205020001639200305APR2025Moderate50 
19SecurevpnLow05OCT202405FEB2025850007.99205020001639200305APR2025Moderate50 
20SpotifyMedium15FEB201912JAN20254200009.99780900041958007112APR2025High80 
21SpotifyMedium15FEB201912FEB20254250009.99780900041958007112APR2025High80 
22StreamplusMedium15JUL201902JAN202530000018.99685700056970006602APR2025Very High85 
23StreamplusMedium15JUL201902FEB202530500018.99685700056970006602APR2025Very High85 

19. DELETE TEMP DATASETS

proc datasets library=work;

delete subscription_raw subscription_clean;

quit;

NOTE: Deleting WORK.SUBSCRIPTION_RAW (memtype=DATA).
NOTE: Deleting WORK.SUBSCRIPTION_CLEAN (memtype=DATA).

20BUSINESS INSIGHTS

·  High engagement leads to high revenue.

·  High churn correlates negatively with retention.

·  Fraud risk detected in high support & high churn cases.

·  Netflix & Cloud services show strongest performance.

21.18 Key Points About This Project

·  Designed a realistic subscription services dataset with 12+ observations.

·  Included core KPIs: Subscribers, Monthly_Fee, Churn_Rate, Engagement_Score, Support_Tickets.

·  Derived Revenue dynamically instead of storing static values.

·  Introduced intentional duplicate records to simulate real-world data errors.

·  Removed duplicates using PROC SORT NODUPKEY.

·  Standardized character variables using STRIP, PROPCASE, UPCASE, LOWCASE.

·  Used INTCK to calculate Months_Active.

·  Used INTNX to project Next_Review_Date.

·  Derived Retention_Level based on churn thresholds.

·  Built Utilization classification using a reusable SAS macro.

·  Implemented fraud detection logic using conditional rules inside a macro.

·  Generated business summaries using PROC SQL.

·  Performed descriptive statistics with PROC MEANS.

·  Analyzed distribution patterns using PROC UNIVARIATE.

·  Evaluated variable relationships using PROC CORR.

·  Created performance visualizations using PROC SGPLOT.

·  Demonstrated dataset manipulation using SET, MERGE, APPEND, and TRANSPOSE.

·  Applied PROC DATASETS DELETE for workspace cleanup and memory management.

Conclusion:

This project demonstrates how SAS can be used to perform end-to-end analytics in a subscription-based business environment. Starting from raw data creation, we handled duplicate records, inconsistent character values, and derived business variables using structured programming techniques. The project highlights the importance of data validation and controlled transformation before performing statistical analysis.

By applying macros, we automated utilization classification and fraud detection logic, making the solution scalable and reusable. Statistical procedures such as PROC MEANS and PROC UNIVARIATE provided insights into central tendency and distribution behavior, while PROC CORR helped understand the strength of relationships between engagement, revenue, and churn rate. Visualization using PROC SGPLOT enhanced interpretability of performance metrics.

The project mirrors real-world analytics tasks performed in SaaS, OTT, fintech, and telecom industries. It demonstrates strong understanding of DATA step processing, SQL integration, macro programming, date handling functions, and dataset transformations such as APPEND, MERGE, and TRANSPOSE.

Overall, this project strengthens practical SAS programming skills and prepares a programmer for real-time data analysis, performance monitoring, and fraud detection scenarios in enterprise environments.


SAS INTERVIEW QUESTIONS

1.What is the difference between DATA step and PROC step in SAS?

2.What is the difference between LENGTH and FORMAT statements?

3.What is the Program Data Vector (PDV)?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent ONLINE PLATFORM data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy

Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study