389.Is Your City’s Water Really Safe?A Complete SAS Monitoring System

Is Your City’s Water Really Safe?A Complete SAS Monitoring System

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL |  PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | APPEND | PROC SORT | MERGE | PROC DATASETS DELETE | DATA FUNCTIONS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

INTRODUCTION

Water treatment plants play a critical role in maintaining public health and environmental safety. Every day, millions of liters of water are processed, treated, filtered, and supplied to households, industries, and hospitals.

In modern organizations, this process is no longer managed only by engineers and technicians. It is heavily supported by data analytics systems that monitor plant performance, contamination levels, operational efficiency, energy consumption, and regulatory compliance.

The goal of the project is to show how a data analyst or SAS programmer can:

·       Build a complete dataset from scratch

·       Apply business rules and transformations

·       Perform statistical analysis

·       Detect anomalies or fraud

·       Generate meaningful insights for decision-makers

The project is designed in a way that is easy to understand, interview-friendly, and aligned with how real companies use SAS in environmental and infrastructure analytics.


TABLE OF CONTENTS

1.     Business Background

2.     Dataset Design and Variables

3.     Data Creation Using DATA Step

4.     Data Cleaning and Standardization

5.     Date Intelligence and Time Calculations

6.     SQL-Based Business Queries

7.     Descriptive Statistics (PROC MEANS)

8.     Distribution Analysis (PROC UNIVARIATE)

9.     Frequency Analysis (PROC FREQ)

10.  Correlation Study (PROC CORR)

11.  Visualization Using PROC SGPLOT

12.  Utilization Classification Macro

13.  Fraud / Anomaly Detection Macro

14.  Data Engineering Operations

15.  Dataset Recreation and Appending

16.  Real-World Business Benefits

17.  Conclusion


BUSINESS BACKGROUND

Government agencies, municipal corporations, and private utility companies must continuously track the operational health of water treatment plants.

Some of the common business questions they face are:

·       Which plants are producing the highest volume of water?

·       Are contamination levels within safe limits?

·       Which plants consume excessive energy?

·       Are there any plants showing suspicious performance data?

·       Are regulatory compliance scores improving or declining?

This project models exactly these questions using SAS.


DATASET DESIGN

The dataset represents daily operational metrics of water treatment plants.

Variable

Business Meaning

Plant_Name

Name of water plant

City

City location

State

State location

Profit

Financial performance

Capacity_MLD

Water capacity in million liters

Contaminant_Level

Pollution index

Treatment_Efficiency

Effectiveness of purification

Energy_Usage

Power consumption

Compliance_Score

Regulatory rating

Report_Date

Monitoring date

These variables are commonly found in utility dashboards and compliance systems.


1. Business Context

Water treatment plants are responsible for converting raw water into safe drinking water. Governments and private companies continuously monitor:

  • How much water is produced?
  • How efficient is treatment?
  • Is contamination within limits?
  • Are energy costs rising?
  • Is any plant showing suspicious or abnormal data?

This project simulates how a Data Analyst / SAS Programmer would build a monitoring system for multiple water plants across India.

This type of analysis is used in:

  • Smart Cities
  • Pollution Control Boards
  • Municipal Corporations
  • Environmental Compliance Audits

2. Dataset Creation

data water_plants;

    input Plant_Name:$18. City:$12. State:$12. Profit Capacity_MLD Contaminant_Level 

          Treatment_Efficiency Energy_Usage Compliance_Score Report_Date :date9.;

    format Report_Date date9.;

datalines;

KrishnaPlant Hyderabad Telangana 500000 120 3.2 92 450 95 01JAN2025

GodavariUnit Warangal Telangana 420000 100 4.5 89 430 90 05JAN2025

CauveryHub Chennai TamilNadu 610000 150 2.8 94 480 97 10JAN2025

GangaStation Patna Bihar 300000 90 6.2 85 410 82 12JAN2025

YamunaWorks Delhi Delhi 700000 180 3.5 91 520 96 15JAN2025

NarmadaPlant Bhopal MP 380000 95 5.0 87 400 88 18JAN2025

Sabarmati Ahmedabad Gujarat 450000 110 4.1 90 435 92 20JAN2025

MahanadiUnit Cuttack Odisha 290000 85 6.5 83 390 80 22JAN2025

PeriyarWorks Kochi Kerala 340000 100 3.8 93 420 95 25JAN2025

Brahmaputra Assam Assam 310000 88 5.9 86 405 85 28JAN2025

Tungabhadra Bellary Karnataka 360000 105 4.4 89 415 90 01FEB2025

BhimaPlant Pune Maharashtra 520000 140 3.1 94 470 96 05FEB2025

KaveriSouth Trichy TamilNadu 430000 115 4.7 88 440 91 10FEB2025

Hooghly Kolkata WestBengal 480000 130 3.9 92 460 94 15FEB2025

LuniWorks Jodhpur Rajasthan 260000 75 7.2 80 370 78 18FEB2025

ChambalUnit Kota Rajasthan 300000 90 6.1 85 395 84 22FEB2025

TeestaPlant Siliguri WestBengal 310000 85 5.5 87 400 86 25FEB2025

IndusHub Amritsar Punjab 390000 100 4.0 90 420 92 01MAR2025

SutlejUnit Ludhiana Punjab 410000 110 3.6 91 430 94 05MAR2025

PennarWorks Nellore AP 350000 95 4.9 88 415 90 10MAR2025

;

run;

proc print data=water_plants;

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_Date
1KrishnaPlantHyderabadTelangana5000001203.2924509501JAN2025
2GodavariUnitWarangalTelangana4200001004.5894309005JAN2025
3CauveryHubChennaiTamilNadu6100001502.8944809710JAN2025
4GangaStationPatnaBihar300000906.2854108212JAN2025
5YamunaWorksDelhiDelhi7000001803.5915209615JAN2025
6NarmadaPlantBhopalMP380000955.0874008818JAN2025
7SabarmatiAhmedabadGujarat4500001104.1904359220JAN2025
8MahanadiUnitCuttackOdisha290000856.5833908022JAN2025
9PeriyarWorksKochiKerala3400001003.8934209525JAN2025
10BrahmaputraAssamAssam310000885.9864058528JAN2025
11TungabhadraBellaryKarnataka3600001054.4894159001FEB2025
12BhimaPlantPuneMaharashtra5200001403.1944709605FEB2025
13KaveriSouthTrichyTamilNadu4300001154.7884409110FEB2025
14HooghlyKolkataWestBengal4800001303.9924609415FEB2025
15LuniWorksJodhpurRajasthan260000757.2803707818FEB2025
16ChambalUnitKotaRajasthan300000906.1853958422FEB2025
17TeestaPlantSiliguriWestBengal310000855.5874008625FEB2025
18IndusHubAmritsarPunjab3900001004.0904209201MAR2025
19SutlejUnitLudhianaPunjab4100001103.6914309405MAR2025
20PennarWorksNelloreAP350000954.9884159010MAR2025

The DATA step is used to simulate raw operational data.

·       Manual data simulation

·       Date formatting

·       Mixed numeric and character variables


3. Character Cleaning

data water_clean;

    set water_plants;

    Plant_Name = propcase(strip(Plant_Name));

    City = upcase(strip(City));

    State = lowcase(strip(State));

run;

proc print data=water_clean;

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_Date
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025
4GangastationPATNAbihar300000906.2854108212JAN2025
5YamunaworksDELHIdelhi7000001803.5915209615JAN2025
6NarmadaplantBHOPALmp380000955.0874008818JAN2025
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN2025
8MahanadiunitCUTTACKodisha290000856.5833908022JAN2025
9PeriyarworksKOCHIkerala3400001003.8934209525JAN2025
10BrahmaputraASSAMassam310000885.9864058528JAN2025
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB2025
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB2025
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB2025
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB2025
15LuniworksJODHPURrajasthan260000757.2803707818FEB2025
16ChambalunitKOTArajasthan300000906.1853958422FEB2025
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB2025
18IndushubAMRITSARpunjab3900001004.0904209201MAR2025
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR2025
20PennarworksNELLOREap350000954.9884159010MAR2025

Functions like:

·       STRIP

·       TRIM

·       PROPCASE

·       UPCASE

·       LOWCASE

are used to standardize text values.

·       Avoid duplicate records due to spelling differences

·       Improve grouping accuracy in reports

·       Ensure clean joins and merges


4. Date Calculations

data water_dates;

    set water_clean;

    Days_From_Start = intck('day','01JAN2025'd, Report_Date);

    Next_Audit = intnx('month', Report_Date, 3);

run;

proc print data=water_dates; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_Audit
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832
4GangastationPATNAbihar300000906.2854108212JAN20251123832
5YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832
6NarmadaplantBHOPALmp380000955.0874008818JAN20251723832
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832
8MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832
9PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832
10BrahmaputraASSAMassam310000885.9864058528JAN20252723832
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862
15LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862
16ChambalunitKOTArajasthan300000906.1853958422FEB20255223862
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862
18IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893
20PennarworksNELLOREap350000954.9884159010MAR20256823893

Functions used:

·       MDY

·       INTCK

·       INTNX

These allow:

·       Time-based comparisons

·       Audit scheduling

·       Trend analysis

This is critical in real systems where time is the main dimension.


5. PROC SQL – Business Queries

proc sql;

    select State,

           avg(Treatment_Efficiency) as Avg_Efficiency,

           sum(Profit) as Total_Profit

    from water_dates

    group by State;

quit;

OUTPUT:

StateAvg_EfficiencyTotal_Profit
ap88350000
assam86310000
bihar85300000
delhi91700000
gujarat90450000
karnataka89360000
kerala93340000
maharashtra94520000
mp87380000
odisha83290000
punjab90.5800000
rajasthan82.5560000
tamilnadu911040000
telangana90.5920000
westbengal89.5790000

Used for:

·       Aggregation

·       Grouping

·       Summarization


·       Average efficiency by state

·       Total profit by region

This shows how SAS integrates SQL-style thinking into analytics.


6. PROC MEANS

proc means data=water_dates mean min max;

    var Capacity_MLD Contaminant_Level Treatment_Efficiency Energy_Usage;

run;

OUTPUT:

The MEANS Procedure

VariableMeanMinimumMaximum
Capacity_MLD
Contaminant_Level
Treatment_Efficiency
Energy_Usage
108.1500000
4.6450000
88.7000000
427.7500000
75.0000000
2.8000000
80.0000000
370.0000000
180.0000000
7.2000000
94.0000000
520.0000000

Used to compute:

·       Mean

·       Minimum

·       Maximum

This gives a quick health check of the system.

Managers use this to:

·       Identify extreme values

·       Spot underperforming plants


7. PROC UNIVARIATE

proc univariate data=water_dates;

    var Contaminant_Level;

    histogram;

run;

OUTPUT:

The UNIVARIATE Procedure

Variable: Contaminant_Level

Moments
N20Sum Weights20
Mean4.645Sum Observations92.9
Std Deviation1.24286384Variance1.54471053
Skewness0.47131104Kurtosis-0.7097626
Uncorrected SS460.87Corrected SS29.3495
Coeff Variation26.7570256Std Error Mean0.2779128
Basic Statistical Measures
LocationVariability
Mean4.645000Std Deviation1.24286
Median4.450000Variance1.54471
Mode.Range4.40000
  Interquartile Range2.00000
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt16.71388Pr > |t|<.0001
SignM10Pr >= |M|<.0001
Signed RankS105Pr >= |S|<.0001
Quantiles (Definition 5)
LevelQuantile
100% Max7.20
99%7.20
95%6.85
90%6.35
75% Q35.70
50% Median4.45
25% Q13.70
10%3.15
5%2.95
1%2.80
0% Min2.80
Extreme Observations
LowestHighest
ValueObsValueObs
2.835.910
3.1126.116
3.216.24
3.556.58
3.6197.215

The UNIVARIATE Procedure

Histogram for Contaminant_Level

Used for:

·       Distribution analysis

·       Outlier detection

This helps detect:

·       Abnormally high contamination

·       Unusual efficiency values

8. PROC CORR

proc corr data=water_dates;

    var Energy_Usage Treatment_Efficiency Compliance_Score;

run;

OUTPUT:

The CORR Procedure

3 Variables:Energy_Usage Treatment_Efficiency Compliance_Score
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
Energy_Usage20427.7500035.111668555370.00000520.00000
Treatment_Efficiency2088.700003.70064177480.0000094.00000
Compliance_Score2089.750005.61834179578.0000097.00000
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
 Energy_UsageTreatment_EfficiencyCompliance_Score
Energy_Usage
1.00000
 
0.77832
<.0001
0.82275
<.0001
Treatment_Efficiency
0.77832
<.0001
1.00000
 
0.97332
<.0001
Compliance_Score
0.82275
<.0001
0.97332
<.0001
1.00000
 

Used to understand relationships such as:

·       Does high energy usage reduce efficiency?

·       Does contamination impact compliance?

This introduces analytical thinking beyond simple reporting.

9. PROC SGPLOT

proc sgplot data=water_dates;

    scatter x=Contaminant_Level y=Treatment_Efficiency;

run;

OUTPUT:

The SGPlot Procedure

Visualization makes:

·       Patterns visible

·       Outliers obvious

·       Trends easy to explain to management

Graphs are essential for:

·       Board presentations

·       Government reviews

·       Public dashboards


10. Utilization Classification Macro

%macro utilization;

data utilization;

    set water_dates;

length utilization $10.;

    if Capacity_MLD > 130 then Utilization = "High";

    else if 90 <= Capacity_MLD <= 130 then Utilization = "Medium";

    else Utilization = "Low";

run;

proc print data=utilization; 

run;

%mend;


%utilization;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_Auditutilization
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832Medium
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832Medium
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832High
4GangastationPATNAbihar300000906.2854108212JAN20251123832Medium
5YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832High
6NarmadaplantBHOPALmp380000955.0874008818JAN20251723832Medium
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832Medium
8MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832Low
9PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832Medium
10BrahmaputraASSAMassam310000885.9864058528JAN20252723832Low
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862Medium
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862High
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862Medium
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862Medium
15LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862Low
16ChambalunitKOTArajasthan300000906.1853958422FEB20255223862Medium
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862Low
18IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893Medium
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893Medium
20PennarworksNELLOREap350000954.9884159010MAR20256823893Medium

This macro classifies plants into:

·       High

·       Medium

·       Low utilization

Business purpose:

·       Capacity planning

·       Resource allocation

·       Infrastructure investment decisions


11. Fraud Detection Macro

%macro fraud_check;

data fraud;

    set water_dates;

    if Treatment_Efficiency > 95 and Contaminant_Level > 6 then Flag = "Suspicious";

    else Flag = "Normal";

run;

proc print data=fraud; 

run;

%mend;


%fraud_check;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_AuditFlag
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832Normal
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832Normal
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832Normal
4GangastationPATNAbihar300000906.2854108212JAN20251123832Normal
5YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832Normal
6NarmadaplantBHOPALmp380000955.0874008818JAN20251723832Normal
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832Normal
8MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832Normal
9PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832Normal
10BrahmaputraASSAMassam310000885.9864058528JAN20252723832Normal
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862Normal
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862Normal
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862Normal
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862Normal
15LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862Normal
16ChambalunitKOTArajasthan300000906.1853958422FEB20255223862Normal
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862Normal
18IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893Normal
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893Normal
20PennarworksNELLOREap350000954.9884159010MAR20256823893Normal

Flags plants when:

·       Efficiency is very high

·       But contamination is also high

This represents:

·       Data manipulation

·       Sensor failure

·       Reporting issues

This type of logic is used in:

·       Smart cities

·       Banking fraud

·       Healthcare monitoring


12. PROC FREQ

proc freq data=fraud;

    tables Flag;

run;

OUTPUT:

The FREQ Procedure

FlagFrequencyPercentCumulative
Frequency
Cumulative
Percent
Normal20100.0020100.00

The project uses:

·       SET

·       MERGE

·       APPEND

·       TRANSPOSE

·       PROC DATASETS DELETE

These simulate:

·       Batch processing

·       Historical data storage

·       Cleanup of temporary datasets

This is exactly how enterprise SAS jobs are built.

13. TRANSPOSE

proc transpose data=water_dates out=water_t;

    var Profit Energy_Usage;

    id Plant_Name;

run;

proc print data=water_t; 

run;

OUTPUT:

Obs_NAME_KrishnaplantGodavariunitCauveryhubGangastationYamunaworksNarmadaplantSabarmatiMahanadiunitPeriyarworksBrahmaputraTungabhadraBhimaplantKaverisouthHooghlyLuniworksChambalunitTeestaplantIndushubSutlejunitPennarworks
1Profit500000420000610000300000700000380000450000290000340000310000360000520000430000480000260000300000310000390000410000350000
2Energy_Usage450430480410520400435390420405415470440460370395400420430415
Transpose used for Rows to Columns,Columns To Rows.

14. APPEND

data water_copy;

    set water_dates;

run;

proc print data=water_copy; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_Audit
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832
4GangastationPATNAbihar300000906.2854108212JAN20251123832
5YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832
6NarmadaplantBHOPALmp380000955.0874008818JAN20251723832
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832
8MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832
9PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832
10BrahmaputraASSAMassam310000885.9864058528JAN20252723832
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862
15LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862
16ChambalunitKOTArajasthan300000906.1853958422FEB20255223862
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862
18IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893
20PennarworksNELLOREap350000954.9884159010MAR20256823893

proc append base=water_dates 

            data=water_copy;

run;

proc print data=water_dates; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_Audit
1KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832
2GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832
4GangastationPATNAbihar300000906.2854108212JAN20251123832
5YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832
6NarmadaplantBHOPALmp380000955.0874008818JAN20251723832
7SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832
8MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832
9PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832
10BrahmaputraASSAMassam310000885.9864058528JAN20252723832
11TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862
12BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862
13KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862
14HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862
15LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862
16ChambalunitKOTArajasthan300000906.1853958422FEB20255223862
17TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862
18IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893
19SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893
20PennarworksNELLOREap350000954.9884159010MAR20256823893
21KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832
22GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832
23CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832
24GangastationPATNAbihar300000906.2854108212JAN20251123832
25YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832
26NarmadaplantBHOPALmp380000955.0874008818JAN20251723832
27SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832
28MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832
29PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832
30BrahmaputraASSAMassam310000885.9864058528JAN20252723832
31TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862
32BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862
33KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862
34HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862
35LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862
36ChambalunitKOTArajasthan300000906.1853958422FEB20255223862
37TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862
38IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893
39SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893
40PennarworksNELLOREap350000954.9884159010MAR20256823893

15. MERGE

proc sort data=utilization;by Plant_Name;run;

proc print data=utilization; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_Auditutilization
1BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862High
2BrahmaputraASSAMassam310000885.9864058528JAN20252723832Low
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832High
4ChambalunitKOTArajasthan300000906.1853958422FEB20255223862Medium
5GangastationPATNAbihar300000906.2854108212JAN20251123832Medium
6GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832Medium
7HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862Medium
8IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893Medium
9KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862Medium
10KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832Medium
11LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862Low
12MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832Low
13NarmadaplantBHOPALmp380000955.0874008818JAN20251723832Medium
14PennarworksNELLOREap350000954.9884159010MAR20256823893Medium
15PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832Medium
16SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832Medium
17SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893Medium
18TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862Low
19TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862Medium
20YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832High

proc sort data=fraud;by Plant_Name;run;

proc print data=fraud; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_AuditFlag
1BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862Normal
2BrahmaputraASSAMassam310000885.9864058528JAN20252723832Normal
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832Normal
4ChambalunitKOTArajasthan300000906.1853958422FEB20255223862Normal
5GangastationPATNAbihar300000906.2854108212JAN20251123832Normal
6GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832Normal
7HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862Normal
8IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893Normal
9KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862Normal
10KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832Normal
11LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862Normal
12MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832Normal
13NarmadaplantBHOPALmp380000955.0874008818JAN20251723832Normal
14PennarworksNELLOREap350000954.9884159010MAR20256823893Normal
15PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832Normal
16SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832Normal
17SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893Normal
18TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862Normal
19TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862Normal
20YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832Normal

data merged;

    merge utilization 

          fraud;

    by Plant_Name;

run;

proc print data=merged; 

run;

OUTPUT:

ObsPlant_NameCityStateProfitCapacity_MLDContaminant_LevelTreatment_EfficiencyEnergy_UsageCompliance_ScoreReport_DateDays_From_StartNext_AuditutilizationFlag
1BhimaplantPUNEmaharashtra5200001403.1944709605FEB20253523862HighNormal
2BrahmaputraASSAMassam310000885.9864058528JAN20252723832LowNormal
3CauveryhubCHENNAItamilnadu6100001502.8944809710JAN2025923832HighNormal
4ChambalunitKOTArajasthan300000906.1853958422FEB20255223862MediumNormal
5GangastationPATNAbihar300000906.2854108212JAN20251123832MediumNormal
6GodavariunitWARANGALtelangana4200001004.5894309005JAN2025423832MediumNormal
7HooghlyKOLKATAwestbengal4800001303.9924609415FEB20254523862MediumNormal
8IndushubAMRITSARpunjab3900001004.0904209201MAR20255923893MediumNormal
9KaverisouthTRICHYtamilnadu4300001154.7884409110FEB20254023862MediumNormal
10KrishnaplantHYDERABADtelangana5000001203.2924509501JAN2025023832MediumNormal
11LuniworksJODHPURrajasthan260000757.2803707818FEB20254823862LowNormal
12MahanadiunitCUTTACKodisha290000856.5833908022JAN20252123832LowNormal
13NarmadaplantBHOPALmp380000955.0874008818JAN20251723832MediumNormal
14PennarworksNELLOREap350000954.9884159010MAR20256823893MediumNormal
15PeriyarworksKOCHIkerala3400001003.8934209525JAN20252423832MediumNormal
16SabarmatiAHMEDABADgujarat4500001104.1904359220JAN20251923832MediumNormal
17SutlejunitLUDHIANApunjab4100001103.6914309405MAR20256323893MediumNormal
18TeestaplantSILIGURIwestbengal310000855.5874008625FEB20255523862LowNormal
19TungabhadraBELLARYkarnataka3600001054.4894159001FEB20253123862MediumNormal
20YamunaworksDELHIdelhi7000001803.5915209615JAN20251423832HighNormal
Sort is used to order in Ascending or Descending.
Merge is used to Join the datasets.

16. PROC DATASETS DELETE

proc datasets library=work;

    delete water_t;

quit;

LOG:

NOTE: Deleting WORK.WATER_T (memtype=DATA).

This is used to delete the datasets.

This Project Demonstrates

This single project demonstrates:

Skill

Covered

Real Business Problem

YES

Data Creation

YES

SQL Analytics

YES

Statistical Profiling

YES

Correlation

YES

Visualization

YES

Macros

YES

Fraud Logic

YES

Date Intelligence

YES

Data Engineering

YES

Cleanup

YES


Real-Life Mapping

Variable

Real Meaning

Capacity_MLD

Water output capacity

Contaminant_Level

Pollution index

Treatment_Efficiency

Quality of filtration

Energy_Usage

Cost driver

Compliance_Score

Government rating

Fraud Flag

Possible data manipulation


REAL-WORLD BUSINESS BENEFITS

If implemented in real life, this system would help:

·       Improve water quality

·       Reduce energy costs

·       Detect operational fraud

·       Ensure regulatory compliance

·       Support smart city initiatives

It directly supports:

·       Environmental sustainability

·       Public health protection

·       Government transparency


Why Companies Love This Kind of Project

Because it shows:

·       You understand business, not just syntax

·       You know how to build systems

·       You think like a consultant

·       You can explain things simply

·       You can design end-to-end pipelines


CONCLUSION

This SAS project on Water Treatment Plants Analytics shows how data can be transformed into meaningful business insights.

Instead of focusing only on syntax, the project focuses on:

·       Real business problems

·       Real analytical workflows

·       Real enterprise logic

It demonstrates that a SAS programmer is not just someone who writes code, but someone who:

·       Understands the domain

·       Designs analytical systems

·       Supports business decisions



INTERVIEW QUESTIONS FOR YOU

·  What is PROC FORMAT ?

·  What is PROC REPORT ?

·  What is PROC SGPLOT ?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Water Plant data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and smart cities

·  EV and energy industry professionals

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy



Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study