381.Can SAS Uncover Call Drops, Network Congestion, and Fraud from Telecom Call Records?

Can SAS Uncover Call Drops, Network Congestion, and Fraud from Telecom Call Records?


HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL |  PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | PROC DATASETS DELETE

1. Introduction – Why Telecom Call Data Analytics Matters

Every second, millions of phone calls, video calls, roaming sessions, and data connections are created across telecom networks worldwide. Each of these interactions generates a Call Detail Record (CDR) containing critical information such as caller identity, call duration, network load, drop rate, and billing amount.

Telecom companies like Airtel, Jio, Vodafone, AT&T, and Verizon rely heavily on these records to monitor:

·       Network congestion and capacity

·       Call quality and drop rates

·       Revenue generation and leakage

·       Fraudulent calling behavior

·       Regional performance and infrastructure health

With growing 5G adoption and increasing digital traffic, analyzing call records has become a mission-critical business function. Even a small percentage of undetected call drops or fraud can lead to massive financial losses and customer dissatisfaction.

This project simulates a real-world telecom analytics environment using SAS, where call records are processed, cleaned, analyzed, and converted into business-ready insights.


2. What This Telecom Analytics Project Simulates

This SAS project replicates the complete data pipeline used inside telecom companies. It is designed to behave like a real operational system where:

·       Millions of daily calls are recorded

·       Network congestion is monitored

·       Revenue is calculated per region

·       Fraud is detected using business rules

·       Performance is summarized for management

The dataset includes realistic telecom variables such as:

·       Caller ID – Identifies each subscriber

·       Region – Geographic location of the call

·       Call Duration – Network usage in minutes

·       Call Type – Voice, Data, Roaming, or International

·       Network Load – Network congestion level

·       Drop Rate – Quality of service indicator

·       Revenue – Money earned from the call

·       Call Date & Billing Month – Time-based analysis

Using these variables, the project demonstrates how telecom operators make data-driven decisions to optimize performance and protect revenue.


3. How This Project Reflects Real Industry Use Cases

The logic implemented in this project closely resembles what is used in:

·       Telecom billing systems

·       Fraud monitoring platforms

·       Network operations centers

·       Regulatory reporting systems

·       Customer experience analytics


4. Why SAS Is Used for Telecom Analytics

SAS is widely used in the telecom industry because it provides:

·       High-performance data processing

·       Powerful SQL-based analytics

·       Advanced statistical procedures

·       Strong reporting and visualization

·       Reliable automation through macros

Telecom companies trust SAS for handling billions of call records with accuracy, speed, and auditability. This project demonstrates how SAS can be used to build a complete telecom analytics solution from raw data to executive-ready insights.


5. Who Should Study This Project

This tutorial is ideal for:

·       SAS learners and programmers

·       Data analysts and data scientists

·       Telecom analytics professionals

·       Interview preparation candidates

·       Blog readers learning real-world analytics


6. BUSINESS CONTEXT

Telecom operators manage millions of calls per day.
Each call generates a Call Detail Record (CDR) with:

Field

Purpose

Caller ID

Who made the call

Region

Where call originated

Call Duration

Network usage

Call Type

Voice / Data / Roaming / International

Network Load

Congestion

Drop Rate

Call failures

Revenue

Billing impact

Call Date

Time intelligence

Telecom companies must:

• Control network congestion
• Detect revenue leakage & fraud
• Improve customer experience
• Optimize tower & spectrum usage


7. SAS PROGRAM

STEP-1 — Create Telecom Call Dataset

data telecom_raw;

    length Caller_ID $6 Region $10 Call_Type $15;

    format Call_Date date9.;

    input Caller_ID $ Region $ Call_Date : date9. Call_Duration Network_Load Drop_Rate 

          Revenue Call_Type;

    datalines;

C001 South 01JAN2025 15 78 2.5 120 Voice

C002 North 02JAN2025 22 82 3.1 210 Data

C003 East 03JAN2025 5 45 0.9 50 Voice

C004 West 04JAN2025 35 95 6.5 350 International

C005 South 05JAN2025 18 70 1.8 160 Voice

C006 North 06JAN2025 60 98 8.2 600 Roaming

C007 East 07JAN2025 12 55 1.2 100 Data

C008 West 08JAN2025 42 90 5.4 420 International

C009 South 09JAN2025 8 40 0.7 60 Voice

C010 North 10JAN2025 55 92 7.8 560 Roaming

C011 East 11JAN2025 25 75 3.5 230 Data

C012 West 12JAN2025 48 88 4.9 480 International

C013 South 13JAN2025 10 50 1.0 80 Voice

C014 North 14JAN2025 65 97 8.9 650 Roaming

C015 East 15JAN2025 20 72 2.2 190 Data

C016 West 16JAN2025 38 85 4.5 390 International

;

run;

proc print data=telecom_raw;

run;

OUTPUT:

ObsCaller_IDRegionCall_TypeCall_DateCall_DurationNetwork_LoadDrop_RateRevenue
1C001SouthVoice01JAN202515782.5120
2C002NorthData02JAN202522823.1210
3C003EastVoice03JAN20255450.950
4C004WestInternational04JAN202535956.5350
5C005SouthVoice05JAN202518701.8160
6C006NorthRoaming06JAN202560988.2600
7C007EastData07JAN202512551.2100
8C008WestInternational08JAN202542905.4420
9C009SouthVoice09JAN20258400.760
10C010NorthRoaming10JAN202555927.8560
11C011EastData11JAN202525753.5230
12C012WestInternational12JAN202548884.9480
13C013SouthVoice13JAN202510501.080
14C014NorthRoaming14JAN202565978.9650
15C015EastData15JAN202520722.2190
16C016WestInternational16JAN202538854.5390

• Simulates telecom call records
• Each row = one call
• Date handled via DATE9
• Call type classification
• Revenue tracking


STEP-2 — Clean & Standardize Text

data telecom_clean;

    set telecom_raw;

    Caller_ID = strip(upcase(Caller_ID));

    Region = propcase(trim(Region));

    Call_Type = propcase(Call_Type);

run;

proc print data=telecom_clean;

run;

OUTPUT:

ObsCaller_IDRegionCall_TypeCall_DateCall_DurationNetwork_LoadDrop_RateRevenue
1C001SouthVoice01JAN202515782.5120
2C002NorthData02JAN202522823.1210
3C003EastVoice03JAN20255450.950
4C004WestInternational04JAN202535956.5350
5C005SouthVoice05JAN202518701.8160
6C006NorthRoaming06JAN202560988.2600
7C007EastData07JAN202512551.2100
8C008WestInternational08JAN202542905.4420
9C009SouthVoice09JAN20258400.760
10C010NorthRoaming10JAN202555927.8560
11C011EastData11JAN202525753.5230
12C012WestInternational12JAN202548884.9480
13C013SouthVoice13JAN202510501.080
14C014NorthRoaming14JAN202565978.9650
15C015EastData15JAN202520722.2190
16C016WestInternational16JAN202538854.5390

• Removes spaces
• Standardizes case
• Improves joins
• Avoids duplicates
• Production readiness


STEP-3 — Add Billing Month Using INTNX

data telecom_dates;

    set telecom_clean;

    Bill_Month = intnx('month', Call_Date, 0, 'b');

    format Bill_Month monyy7.;

run;

proc print data=telecom_dates;

run;

OUTPUT:

ObsCaller_IDRegionCall_TypeCall_DateCall_DurationNetwork_LoadDrop_RateRevenueBill_Month
1C001SouthVoice01JAN202515782.5120JAN2025
2C002NorthData02JAN202522823.1210JAN2025
3C003EastVoice03JAN20255450.950JAN2025
4C004WestInternational04JAN202535956.5350JAN2025
5C005SouthVoice05JAN202518701.8160JAN2025
6C006NorthRoaming06JAN202560988.2600JAN2025
7C007EastData07JAN202512551.2100JAN2025
8C008WestInternational08JAN202542905.4420JAN2025
9C009SouthVoice09JAN20258400.760JAN2025
10C010NorthRoaming10JAN202555927.8560JAN2025
11C011EastData11JAN202525753.5230JAN2025
12C012WestInternational12JAN202548884.9480JAN2025
13C013SouthVoice13JAN202510501.080JAN2025
14C014NorthRoaming14JAN202565978.9650JAN2025
15C015EastData15JAN202520722.2190JAN2025
16C016WestInternational16JAN202538854.5390JAN2025

• Groups by month
• Enables trends
• Supports billing
• Works with PROC SQL
• Date normalization


STEP-4 — Network Utilization Macro

%macro utilization(input=, output=);

data &output;

    set &input;

    length Utilization $10;

    if Network_Load < 60 then Utilization="Low";

    else if Network_Load < 85 then Utilization="Medium";

    else Utilization="High";

run;

proc print data=&output;

run;

%mend;


%utilization(input=telecom_dates, output=telecom_util);

OUTPUT:

ObsCaller_IDRegionCall_TypeCall_DateCall_DurationNetwork_LoadDrop_RateRevenueBill_MonthUtilization
1C001SouthVoice01JAN202515782.5120JAN2025Medium
2C002NorthData02JAN202522823.1210JAN2025Medium
3C003EastVoice03JAN20255450.950JAN2025Low
4C004WestInternational04JAN202535956.5350JAN2025High
5C005SouthVoice05JAN202518701.8160JAN2025Medium
6C006NorthRoaming06JAN202560988.2600JAN2025High
7C007EastData07JAN202512551.2100JAN2025Low
8C008WestInternational08JAN202542905.4420JAN2025High
9C009SouthVoice09JAN20258400.760JAN2025Low
10C010NorthRoaming10JAN202555927.8560JAN2025High
11C011EastData11JAN202525753.5230JAN2025Medium
12C012WestInternational12JAN202548884.9480JAN2025High
13C013SouthVoice13JAN202510501.080JAN2025Low
14C014NorthRoaming14JAN202565978.9650JAN2025High
15C015EastData15JAN202520722.2190JAN2025Medium
16C016WestInternational16JAN202538854.5390JAN2025High

• Categorizes traffic
• Reusable logic
• Business rules
• Automation
• Network optimization


STEP-5 — Fraud Risk Macro

%macro fraud(input=, output=);

data &output;

    set &input;

    length Fraud_Flag $5;

    if Call_Duration > 50 and Revenue > 500 then Fraud_Flag="YES";

    else Fraud_Flag="NO";

run;

proc print data=&output;

run;

%mend;


%fraud(input=telecom_util, output=telecom_fraud);

OUTPUT:

ObsCaller_IDRegionCall_TypeCall_DateCall_DurationNetwork_LoadDrop_RateRevenueBill_MonthUtilizationFraud_Flag
1C001SouthVoice01JAN202515782.5120JAN2025MediumNO
2C002NorthData02JAN202522823.1210JAN2025MediumNO
3C003EastVoice03JAN20255450.950JAN2025LowNO
4C004WestInternational04JAN202535956.5350JAN2025HighNO
5C005SouthVoice05JAN202518701.8160JAN2025MediumNO
6C006NorthRoaming06JAN202560988.2600JAN2025HighYES
7C007EastData07JAN202512551.2100JAN2025LowNO
8C008WestInternational08JAN202542905.4420JAN2025HighNO
9C009SouthVoice09JAN20258400.760JAN2025LowNO
10C010NorthRoaming10JAN202555927.8560JAN2025HighYES
11C011EastData11JAN202525753.5230JAN2025MediumNO
12C012WestInternational12JAN202548884.9480JAN2025HighNO
13C013SouthVoice13JAN202510501.080JAN2025LowNO
14C014NorthRoaming14JAN202565978.9650JAN2025HighYES
15C015EastData15JAN202520722.2190JAN2025MediumNO
16C016WestInternational16JAN202538854.5390JAN2025HighNO

• Finds abnormal calls
• Uses duration + revenue
• Flags risky records
• Telecom risk model
• Audit support


STEP-6 — PROC SQL Revenue Summary

proc sql;

create table revenue_summary as

select Region,

       sum(Revenue) as Total_Revenue,

       avg(Drop_Rate) as Avg_Drop

from telecom_fraud

group by Region;

quit;

proc print data=revenue_summary;

run;

OUTPUT:

ObsRegionTotal_RevenueAvg_Drop
1East5701.950
2North20207.000
3South4201.500
4West16405.325

• Summarizes data
• Region comparison
• Business reporting
• KPI generation
• SQL analytics


STEP-7 — PROC FREQ

proc freq data=telecom_fraud;

 tables Fraud_Flag*Call_Type;

run;

OUTPUT:

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of Fraud_Flag by Call_Type
Fraud_FlagCall_Type
DataInternationalRoamingVoiceTotal
NO
4
25.00
30.77
100.00
4
25.00
30.77
100.00
0
0.00
0.00
0.00
5
31.25
38.46
100.00
13
81.25
 
 
YES
0
0.00
0.00
0.00
0
0.00
0.00
0.00
3
18.75
100.00
100.00
0
0.00
0.00
0.00
3
18.75
 
 
Total
4
25.00
4
25.00
3
18.75
5
31.25
16
100.00

• Cross analysis
• Fraud by type
• Monitoring tool
• Decision support
• Risk insight

STEP-8 — PROC MEANS

proc means data=telecom_fraud mean max min;

 var Call_Duration Network_Load Revenue Drop_Rate;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMaximumMinimum
Call_Duration
Network_Load
Revenue
Drop_Rate
29.8750000
75.7500000
290.6250000
3.9437500
65.0000000
98.0000000
650.0000000
8.9000000
5.0000000
40.0000000
50.0000000
0.7000000

• Central tendency
• Extremes
• Network quality
• Revenue spread
• Operations control

STEP-9 — PROC UNIVARIATE

proc univariate data=telecom_fraud;

 var Revenue Call_Duration;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Revenue

Moments
N16Sum Weights16
Mean290.625Sum Observations4650
Std Deviation203.583521Variance41446.25
Skewness0.49019766Kurtosis-1.1828857
Uncorrected SS1973100Corrected SS621693.75
Coeff Variation70.0502438Std Error Mean50.8958802
Basic Statistical Measures
LocationVariability
Mean290.6250Std Deviation203.58352
Median220.0000Variance41446
Mode.Range600.00000
  Interquartile Range340.00000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt5.710187Pr > |t|<.0001
SignM8Pr >= |M|<.0001
Signed RankS68Pr >= |S|<.0001
Quantiles (Definition 5)
LevelQuantile
100% Max650
99%650
95%650
90%600
75% Q3450
50% Median220
25% Q1110
10%60
5%50
1%50
0% Min50
Extreme Observations
LowestHighest
ValueObsValueObs
5034208
60948012
801356010
10076006
120165014

The UNIVARIATE Procedure

Variable: Call_Duration

Moments
N16Sum Weights16
Mean29.875Sum Observations478
Std Deviation19.4897409Variance379.85
Skewness0.5041715Kurtosis-1.0629722
Uncorrected SS19978Corrected SS5697.75
Coeff Variation65.2376264Std Error Mean4.87243522
Basic Statistical Measures
LocationVariability
Mean29.87500Std Deviation19.48974
Median23.50000Variance379.85000
Mode.Range60.00000
  Interquartile Range31.50000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt6.131431Pr > |t|<.0001
SignM8Pr >= |M|<.0001
Signed RankS68Pr >= |S|<.0001
Quantiles (Definition 5)
LevelQuantile
100% Max65.0
99%65.0
95%65.0
90%60.0
75% Q345.0
50% Median23.5
25% Q113.5
10%8.0
5%5.0
1%5.0
0% Min5.0
Extreme Observations
LowestHighest
ValueObsValueObs
53428
894812
10135510
127606
1516514

• Identifies anomalies
• Detects skewness
• Risk analysis
• Revenue leakage
• Performance monitoring

STEP-10 — PROC CORR

proc corr data=telecom_fraud;

 var Call_Duration Network_Load Revenue Drop_Rate;

run;

OUTPUT:

The CORR Procedure

4 Variables:Call_Duration Network_Load Revenue Drop_Rate
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Call_Duration1629.8750019.48974478.000005.0000065.00000
Network_Load1675.7500019.02455121240.0000098.00000
Revenue16290.62500203.58352465050.00000650.00000
Drop_Rate163.943752.7572963.100000.700008.90000
Pearson Correlation Coefficients, N = 16
Prob > |r| under H0: Rho=0
 Call_DurationNetwork_LoadRevenueDrop_Rate
Call_Duration
1.00000
 
0.87194
<.0001
0.99907
<.0001
0.96639
<.0001
Network_Load
0.87194
<.0001
1.00000
 
0.86809
<.0001
0.89278
<.0001
Revenue
0.99907
<.0001
0.86809
<.0001
1.00000
 
0.96597
<.0001
Drop_Rate
0.96639
<.0001
0.89278
<.0001
0.96597
<.0001
1.00000
 

• Dependency check
• Congestion impact
• Drop analysis
• Financial link
• Engineering decisions

STEP-11 — PROC SGPLOT

proc sgplot data=telecom_fraud;

 scatter x=Network_Load y=Drop_Rate;

run;

OUTPUT:

The SGPlot Procedure

• Congestion vs drops
• Easy interpretation
• Executive dashboards
• Operations insight
• Quality tracking


STEP-12 — PROC TRANSPOSE

proc transpose data=revenue_summary out=rev_wide;

 by Region;

 id Region;

 var Total_Revenue;

run;

proc print data=rev_wide;

run;

OUTPUT:

ObsRegion_NAME_EastNorthSouthWest
1EastTotal_Revenue570...
2NorthTotal_Revenue.2020..
3SouthTotal_Revenue..420.
4WestTotal_Revenue...1640

• Converts rows to columns
• BI tools ready
• Excel export
• Reporting layout
• Management view


STEP-13 — PROC DATASETS DELETE

proc datasets lib=work nolist;

 delete telecom_raw telecom_clean telecom_dates;

quit;

OUTPUT:

NOTE: Deleting WORK.TELECOM_RAW (memtype=DATA).

• Removes temp tables
• Optimizes workspace
• Professional coding
• Faster SAS
• Production hygiene


FINAL CONCLUSION

This Telecom Analytics project replicates real telecom operations systems where companies:

• Monitor call traffic
• Optimize network capacity
• Detect fraud
• Protect revenue
• Improve customer experience


INTERVIEW QUESTIONS FOR YOU

1.What is the Program Data Vector (PDV)?

2.What is the difference between LENGTH and FORMAT in SAS?

3.What is the use of the KEEP and DROP statements?


About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Telecom data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

SAS Programmer Interviews

SAS Programmer Job Seekers

SAS Analysts


Follow Us On : 


 


--->Follow our blog for more SAS-based analytics projects and industry data models.

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study