411.Can We Detect And Correct Fraudulent And Erroneous Air Quality Sensor Network Data Using Only Proc Sql In Sas?Air Quality Sensor Network Fraud Detection And Error Correction Using PROC SQL

Air Quality Sensor Network Fraud Detection And Error Correction Using PROC SQL

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —PROC SQL | CREATE TABLE | INSERT INTO | SELECT | FROM | WHERE | CASE WHEN THEN ELSE END | COALESCE | STRIP | TRIM | PROPCASE | UPCASE | LOWCASE | CAT | CATX | CALCULATED | INTCK | INTNX | MDY | TODAY | FORMAT | GROUP BY | HAVING | ORDER BY | UNION ALL | JOIN | LEFT JOIN | ON | AVG | SUM | COUNT | MIN | MAX | DISTINCT | CREATE TABLE AS SELECT | ALTER TABLE | ADD COLUMN | UPDATE | SET | DELETE FROM | IS NULL | IS NOT NULL | BETWEEN | LIKE | %MACRO | %MEND | PROC DATASETS DELETE | QUIT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Air quality sensor networks are widely used in smart cities, environmental monitoring, and industrial compliance systems. These sensors continuously measure pollutants such as PM2.5 (fine particulate matter) and PM10 (coarse particulate matter). Governments use this information to issue health warnings, regulate industries, and protect public health.

However, real-world sensor data is rarely perfect. Many problems occur, such as:

  • Incorrect calibration
  • Missing values
  • Negative or impossible readings
  • Incorrect dates
  • Fraudulent manipulation
  • Inconsistent text formats
  • Duplicate records
  • Sensors reporting unrealistic accuracy

A SAS programmer must detect, correct, and validate such issues before analysis.

In this project, we will build a complete Air Quality Sensor Network analytics system using only PROC SQL We will:

  • Create a dataset with 15+ observations
  • Add intentional errors
  • Detect errors using SQL
  • Correct errors using SQL functions
  • Create utilization classification
  • Detect fraud using macros
  • Derive dates using MDY, INTCK, INTNX logic
  • Perform append, merge, transpose using SQL
  • Use numeric and character functions
  • Delete temporary datasets
  • Explain every line clearly

Table Of Contents

  1. Business Context
  2. Dataset Design
  3. Raw Dataset Creation With Intentional Errors
  4. Identifying Errors Using PROC SQL
  5. Correcting Errors Using PROC SQL Functions
  6. Character Functions Explanation
  7. Numeric Functions Explanation
  8. Date Derivations Using MDY, INTCK, INTNX
  9. Utilization Classification Logic
  10. Fraud Detection Using Macro + SQL
  11. Append Using PROC SQL
  12. Merge Using PROC SQL
  13. Transpose Using PROC SQL
  14. Statistical Preparation Tables
  15. PROC DATASETS DELETE Cleanup
  16. Final Clean Dataset
  17. Conclusion

1. Business Context

Suppose the National Air Monitoring Authority installed sensors in Indian cities:

Delhi
Mumbai
Hyderabad
Chennai
Kolkata
Pune

Each sensor reports:

  • Pollution levels
  • Calibration status
  • Accuracy
  • Battery level
  • Service dates

Problems reported:

  • Some sensors show negative pollution
  • Some sensors show accuracy >100%
  • Some sensors not calibrated
  • Some sensors inactive for long time

Goal:

Build fraud detection and quality validation system using PROC SQL

2. Dataset Structure Design

Variables:

Variable

Description

Sensor_ID

Unique sensor identifier

City

Sensor location

PM2_5

Fine particle level

PM10

Coarse particle level

Calibration_Status

Calibration condition

Data_Gaps

Missing data days

Accuracy_Score

Accuracy percentage

Install_Date

Installation date

Service_Date

Last service date

Battery_Level

Battery %

Signal_Strength

Signal quality %

3. Raw Dataset Creation With Intentional Errors

PROC SQL;

CREATE TABLE air_quality_raw

(Sensor_ID CHAR(6),

City CHAR(20),

PM2_5 NUM,

PM10 NUM,

Calibration_Status CHAR(20),

Data_Gaps NUM,

Accuracy_Score NUM,

Install_Date DATE,

Service_Date DATE,

Battery_Level NUM,

Signal_Strength NUM);


INSERT INTO air_quality_raw

VALUES('S001','delhi',250,400,'notcalibrated',15,55,'01JAN2023'd,'01JAN2024'd,45,60);


INSERT INTO air_quality_raw

VALUES('S002','Mumbai ',35,60,'Calibrated',2,98,'15FEB2023'd,'15JAN2024'd,90,85);


INSERT INTO air_quality_raw

VALUES('S003','CHENNAI',-10,80,'Calibrated',5,88,'10MAR2023'd,'10FEB2024'd,75,70);


INSERT INTO air_quality_raw

VALUES('S004','Kolkata',120,.,'Calibrated',25,40,'01APR2023'd,'01JAN2024'd,30,40);


INSERT INTO air_quality_raw

VALUES('S005','Hyderabad',80,120,'unknown',0,105,'05MAY2023'd,'05JAN2024'd,95,90);


INSERT INTO air_quality_raw

VALUES('S006','Pune',60,95,'Calibrated',50,65,'07JUN2023'd,'07JAN2024'd,50,55);


INSERT INTO air_quality_raw

VALUES('S007','Delhi',300,500,'Calibrated',3,92,'01JUL2023'd,'01FEB2024'd,88,88);


INSERT INTO air_quality_raw

VALUES('S008','mumbai',45,70,'NotCalibrated',40,45,'01AUG2023'd,'01FEB2024'd,35,45);


INSERT INTO air_quality_raw

VALUES('S009','Chennai',55,85,'Calibrated',1,99,'01SEP2023'd,'01FEB2024'd,92,91);


INSERT INTO air_quality_raw

VALUES('S010','Kolkata',65,110,'Calibrated',10,85,'01OCT2023'd,'01FEB2024'd,80,75);


INSERT INTO air_quality_raw

VALUES('S011','Hyderabad',75,115,'Calibrated',0,97,'01NOV2023'd,'01FEB2024'd,95,95);


INSERT INTO air_quality_raw

VALUES('S012','Pune',85,140,'Calibrated',60,35,'01DEC2023'd,'01FEB2024'd,25,35);


INSERT INTO air_quality_raw

VALUES('S013','Delhi',95,150,'Calibrated',5,93,'01JAN2024'd,'01FEB2024'd,90,88);


INSERT INTO air_quality_raw

VALUES('S014','Mumbai',105,170,'Calibrated',7,89,'01JAN2024'd,'01FEB2024'd,85,82);


INSERT INTO air_quality_raw

VALUES('S015','Chennai',115,180,'Calibrated',9,87,'01JAN2024'd,'01FEB2024'd,82,80);


QUIT;

proc print data=air_quality_raw;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_Strength
1S001delhi250400notcalibrated155501JAN2301JAN244560
2S002Mumbai3560Calibrated29815FEB2315JAN249085
3S003CHENNAI-1080Calibrated58810MAR2310FEB247570
4S004Kolkata120.Calibrated254001APR2301JAN243040
5S005Hyderabad80120unknown010505MAY2305JAN249590
6S006Pune6095Calibrated506507JUN2307JAN245055
7S007Delhi300500Calibrated39201JUL2301FEB248888
8S008mumbai4570NotCalibrated404501AUG2301FEB243545
9S009Chennai5585Calibrated19901SEP2301FEB249291
10S010Kolkata65110Calibrated108501OCT2301FEB248075
11S011Hyderabad75115Calibrated09701NOV2301FEB249595
12S012Pune85140Calibrated603501DEC2301FEB242535
13S013Delhi95150Calibrated59301JAN2401FEB249088
14S014Mumbai105170Calibrated78901JAN2401FEB248582
15S015Chennai115180Calibrated98701JAN2401FEB248280

Explanation Line-by-Line

PROC SQL
→ Starts SQL procedure

CREATE TABLE
→ Creates dataset

CHAR / NUM
→ Defines variable types

INSERT INTO
→ Adds observations

Intentional Errors Present

Error 1
City inconsistent case
delhi, Delhi, DELHI

Error 2
Trailing space
Mumbai(space)

Error 3
Negative PM2_5
-10 impossible

Error 4
Missing PM10

Error 5
Accuracy >100

Error 6
Calibration inconsistent

unknown
NotCalibrated
notcalibrated

Error 7
Excessive Data gaps

4. Identify Errors Using PROC SQL

PROC SQL;

SELECT *

FROM air_quality_raw

WHERE PM2_5 <0 OR PM10 IS NULL OR Accuracy_Score>100;

QUIT;

OUTPUT:

Sensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_Strength
S003CHENNAI-1080Calibrated58810MAR2310FEB247570
S004Kolkata120.Calibrated254001APR2301JAN243040
S005Hyderabad80120unknown010505MAY2305JAN249590

5. Correct Dataset Using PROC SQL

PROC SQL;

CREATE TABLE air_quality_clean AS

SELECT

STRIP(Sensor_ID) AS Sensor_ID,

PROPCASE(STRIP(City)) AS City,


CASE

WHEN PM2_5<0 THEN . ELSE PM2_5 END AS PM2_5,


COALESCE(PM10,PM2_5*1.5) AS PM10,

UPCASE(STRIP(Calibration_Status)) AS Calibration_Status,

Data_Gaps,


CASE

WHEN Accuracy_Score>100 THEN 100 ELSE Accuracy_Score END AS Accuracy_Score,


Install_Date FORMAT=DATE9.,

Service_Date FORMAT=DATE9.,

Battery_Level,

Signal_Strength

FROM air_quality_raw;

QUIT;

proc print data=air_quality_clean;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_Strength
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN20244560
2S002Mumbai3560CALIBRATED29815FEB202315JAN20249085
3S003Chennai.80CALIBRATED58810MAR202310FEB20247570
4S004Kolkata120180CALIBRATED254001APR202301JAN20243040
5S005Hyderabad80120UNKNOWN010005MAY202305JAN20249590
6S006Pune6095CALIBRATED506507JUN202307JAN20245055
7S007Delhi300500CALIBRATED39201JUL202301FEB20248888
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB20243545
9S009Chennai5585CALIBRATED19901SEP202301FEB20249291
10S010Kolkata65110CALIBRATED108501OCT202301FEB20248075
11S011Hyderabad75115CALIBRATED09701NOV202301FEB20249595
12S012Pune85140CALIBRATED603501DEC202301FEB20242535
13S013Delhi95150CALIBRATED59301JAN202401FEB20249088
14S014Mumbai105170CALIBRATED78901JAN202401FEB20248582
15S015Chennai115180CALIBRATED98701JAN202401FEB20248280

WHERE filters bad records

Explanation of Character Functions

STRIP
removes spaces

PROPCASE
Delhi correct format

UPCASE
CALIBRATED standard format

COALESCE
replaces missing value

Example

PM10 missing
replace PM2.5 *1.5

Explanation Numeric Logic

CASE
conditional correction

Accuracy >100
set to 100

Negative pollution
set missing

6. Date Derivations Using INTCK and INTNX

PROC SQL;

CREATE TABLE air_quality_dates AS

SELECT *,

INTCK('DAY',Install_Date,Service_Date) AS Service_Days,

INTNX('MONTH',Service_Date,6,'SAME') FORMAT=DATE9. AS Next_Service_Date

FROM air_quality_clean;

QUIT;

proc print data=air_quality_dates;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_Date
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024

Explanation

INTCK
difference between dates

INTNX
adds future date

7. Utilization Classification

PROC SQL;

CREATE TABLE utilization AS

SELECT *,

CASE

WHEN Data_Gaps<=5 THEN 'HIGH'

WHEN Data_Gaps<=20 THEN 'MEDIUM' ELSE 'LOW'

END AS Utilization

FROM air_quality_dates;

QUIT;

proc print data=utilization;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_DateUtilization
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024MEDIUM
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024HIGH
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024HIGH
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024LOW
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024HIGH
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024LOW
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024HIGH
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024LOW
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024HIGH
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024MEDIUM
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024HIGH
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024LOW
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024HIGH
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024MEDIUM
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024MEDIUM

Logic:

Low gaps = high utilization

Explanation

CASE
creates classification

8. Fraud Detection Macro

%MACRO FRAUD_CHECK;

PROC SQL;

CREATE TABLE fraud_detection AS

SELECT *,

CASE

WHEN Accuracy_Score<60 AND Data_Gaps>30 THEN 1

WHEN Calibration_Status NE 'CALIBRATED' THEN 1

ELSE 0

END AS Fraud_Flag

FROM utilization;

QUIT;

proc print data=fraud_detection;

run;

%MEND;


%FRAUD_CHECK;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_DateUtilizationFraud_Flag
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024MEDIUM1
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024HIGH0
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024HIGH0
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024LOW0
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024HIGH1
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024LOW0
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024HIGH0
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024LOW1
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024HIGH0
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024MEDIUM0
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024HIGH0
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024LOW1
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024HIGH0
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024MEDIUM0
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024MEDIUM0

Fraud indicators

Low accuracy
Not calibrated
High data gaps

Explanation

Macro reusable logic

Fraud_Flag

1 fraud
0 normal

9. Append Example Using PROC SQL

proc sql;

create table new_sensor as

select

    'S016' as Sensor_ID length=5,

    'Delhi' as City length=10,

    80 as PM2_5,

    120 as PM10,

    'CALIBRATED' as Calibration_Status length=12,

    3 as Data_Gaps,

    95 as Accuracy_Score,

    today() as Install_Date format=date9.,

    today() as Service_Date format=date9.,

    90 as Battery_Level,

    90 as Signal_Strength

from work.fraud_detection(obs=1);/*By Creating Dummy Record*/ 

quit;

proc print data=new_sensor;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_Strength
1S016Delhi80120CALIBRATED39525FEB202625FEB20269090

PROC SQL;

CREATE TABLE combined AS

SELECT * FROM fraud_detection

UNION ALL

SELECT * FROM new_sensor;

QUIT;

proc print data=combined;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_DateUtilizationFraud_Flag
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024MEDIUM1
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024HIGH0
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024HIGH0
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024LOW0
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024HIGH1
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024LOW0
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024HIGH0
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024LOW1
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024HIGH0
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024MEDIUM0
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024HIGH0
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024LOW1
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024HIGH0
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024MEDIUM0
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024MEDIUM0
16S016Delhi80120CALIBRATED39525FEB202625FEB20269090.. .

10. Merge Example Using PROC SQL

Maintenance table

PROC SQL;

CREATE TABLE maintenance

(Sensor_ID CHAR(6),

Technician CHAR(20));


INSERT INTO maintenance

VALUES('S001','Ravi');


INSERT INTO maintenance

VALUES('S002','Amit');


QUIT;

proc print data=maintenance;

run;

OUTPUT:

ObsSensor_IDTechnician
1S001Ravi
2S002Amit

Merge

PROC SQL;

CREATE TABLE merged AS

SELECT

A.*,

B.Technician


FROM combined A LEFT JOIN 

     maintenance B


ON A.Sensor_ID=B.Sensor_ID;

QUIT;

proc print data=merged;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_DateUtilizationFraud_FlagTechnician
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024MEDIUM1Ravi
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024HIGH0Amit
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024HIGH0 
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024LOW0 
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024HIGH1 
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024LOW0 
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024HIGH0 
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024LOW1 
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024HIGH0 
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024MEDIUM0 
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024HIGH0 
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024LOW1 
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024HIGH0 
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024MEDIUM0 
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024MEDIUM0 
16S016Delhi80120CALIBRATED39525FEB202625FEB20269090.. . 

Explanation

LEFT JOIN
merge datasets

11. Transpose Example Using PROC SQL

PROC SQL;

CREATE TABLE city_summary AS

SELECT

City,

AVG(PM2_5) AS AVG_PM25,

AVG(PM10) AS AVG_PM10

FROM merged

GROUP BY City;

QUIT;

proc print data=city_summary;

run;

OUTPUT:

ObsCityAVG_PM25AVG_PM10
1Chennai85.000115.0
2Delhi181.250292.5
3Hyderabad77.500117.5
4Kolkata92.500145.0
5Mumbai61.667100.0
6Pune72.500117.5

Explanation

GROUP BY
aggregation

12. Create Statistical Ready Dataset

PROC SQL;

CREATE TABLE final_dataset AS

SELECT *,

PM10-PM2_5 AS Pollution_Difference

FROM merged;

QUIT;

proc print data=final_dataset;

run;

OUTPUT:

ObsSensor_IDCityPM2_5PM10Calibration_StatusData_GapsAccuracy_ScoreInstall_DateService_DateBattery_LevelSignal_StrengthService_DaysNext_Service_DateUtilizationFraud_FlagTechnicianPollution_Difference
1S001Delhi250400NOTCALIBRATED155501JAN202301JAN2024456036501JUL2024MEDIUM1Ravi150
2S002Mumbai3560CALIBRATED29815FEB202315JAN2024908533415JUL2024HIGH0Amit25
3S003Chennai.80CALIBRATED58810MAR202310FEB2024757033710AUG2024HIGH0 .
4S004Kolkata120180CALIBRATED254001APR202301JAN2024304027501JUL2024LOW0 60
5S005Hyderabad80120UNKNOWN010005MAY202305JAN2024959024505JUL2024HIGH1 40
6S006Pune6095CALIBRATED506507JUN202307JAN2024505521407JUL2024LOW0 35
7S007Delhi300500CALIBRATED39201JUL202301FEB2024888821501AUG2024HIGH0 200
8S008Mumbai4570NOTCALIBRATED404501AUG202301FEB2024354518401AUG2024LOW1 25
9S009Chennai5585CALIBRATED19901SEP202301FEB2024929115301AUG2024HIGH0 30
10S010Kolkata65110CALIBRATED108501OCT202301FEB2024807512301AUG2024MEDIUM0 45
11S011Hyderabad75115CALIBRATED09701NOV202301FEB202495959201AUG2024HIGH0 40
12S012Pune85140CALIBRATED603501DEC202301FEB202425356201AUG2024LOW1 55
13S013Delhi95150CALIBRATED59301JAN202401FEB202490883101AUG2024HIGH0 55
14S014Mumbai105170CALIBRATED78901JAN202401FEB202485823101AUG2024MEDIUM0 65
15S015Chennai115180CALIBRATED98701JAN202401FEB202482803101AUG2024MEDIUM0 65
16S016Delhi80120CALIBRATED39525FEB202625FEB20269090.. . 40

13. Delete Temporary Tables

PROC DATASETS LIB=WORK;

DELETE

air_quality_raw

air_quality_clean

air_quality_dates

utilization;

QUIT;

LOG:

NOTE: Deleting WORK.AIR_QUALITY_RAW (memtype=DATA).NOTE: Deleting WORK.AIR_QUALITY_CLEAN (memtype=DATA).NOTE: Deleting WORK.AIR_QUALITY_DATES (memtype=DATA).NOTE: Deleting WORK.UTILIZATION (memtype=DATA).

Explanation

Deletes unused tables

Improves performance

Final Dataset Contains

Sensor_ID
City
PM2_5
PM10
Calibration_Status
Accuracy_Score
Utilization
Fraud_Flag
Service_Days
Next_Service_Date

Interview Questions Answered

1.Question
How detect fraud?

Answer
CASE logic

2.Question
How merge datasets?

Answer
JOIN

3.Question
How clean text?

Answer

STRIP

PROPCASE

4.Question
How handle missing numeric?

Answer

COALESCE

Benefits

1.Improves data quality
2.Detects faulty sensors
3.Supports government decisions
4.Prevents fraud

Conclusion

This project demonstrated a complete real-world air quality sensor network analytics system using only PROC SQL in SAS. We created a dataset with intentional errors such as negative pollution values, missing PM10 readings, inconsistent calibration status, and accuracy scores exceeding 100 percent. Using SQL functions such as STRIP, PROPCASE, UPCASE, COALESCE, and CASE, we cleaned and standardized the data.

We derived service intervals using INTCK and future service schedules using INTNX. We classified utilization based on data gaps and implemented fraud detection using reusable macro logic. We simulated append operations using UNION ALL, merge operations using JOIN, and transpose-style summaries using GROUP BY aggregation.

Finally, we removed temporary datasets to maintain a clean SAS environment.

This approach reflects real industry workflows used in:

Environmental monitoring
Clinical trials
Bank fraud detection
IoT analytics

and demonstrates strong PROC SQL expertise required for SAS programmer interviews.


SAS INTERVIEW QUESTIONS

1. How do you insert records? 

2. What is PROC SQL in SAS? 

3. How do you create a dataset using PROC SQL?


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent AIR QUALITY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Which Country Truly Dominates the Olympics? – A Complete SAS Medal Efficiency Analytics Project

2.Which Airports Are Really the Busiest? – An End-to-End SAS Airport Traffic Analytics Project

3.Can Data Predict Election Outcomes? – A Complete SAS Voting Analytics Project

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------








Comments

Popular posts from this blog

409.Can We Build a Reliable Emergency Services Analytics & Fraud Detection System in SAS While Identifying and Fixing Intentional Errors?

397.If a satellite has excellent signal strength but very high latency, can it still deliver good quality communication? Why or why not?A Sas Study

401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study