424.Can Advanced SAS Programming Detect Fraud And Correct Data Errors In Public Library Utilization Systems?

Smart Data Validation and Fraud Detection in Public Library Utilization Systems Using Advanced SAS Techniques

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | INPUT | LENGTH | FORMAT | IF THEN ELSE | MDY | INTCK | INTNX | COALESCE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MACRO | MEND

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1. Introduction

Public libraries play an important role in the knowledge ecosystem of cities. They provide access to books, digital content, community learning spaces, and research materials. Modern libraries maintain extensive databases that track information such as the number of books available, member registrations, digital usage metrics, operating costs, and service performance scores.

However, like any large-scale information system, library datasets may contain data quality problems. These issues may arise due to:

  • manual data entry mistakes
  • system synchronization errors
  • missing values
  • duplicated records
  • fraud in reporting library statistics
  • incorrect date formats
  • inconsistent capitalization or character formatting

If these issues are not detected and corrected, they can lead to incorrect decisions regarding funding allocation, staffing requirements, and service improvements.

SAS programming provides powerful tools to identify and correct these problems through:

  • data step programming
  • character and numeric functions
  • date processing
  • dataset merging
  • macro automation
  • fraud detection logic

In this project, we simulate a Public Library Management dataset that intentionally includes multiple errors. Then we use SAS programming techniques to detect and correct these issues.

The final goal is to demonstrate how advanced SAS techniques can help maintain accurate, reliable, and fraud-resistant library data systems.

2. Business Context

Cities operate multiple public libraries to support community education. Each library tracks key operational indicators such as:

  • number of books available
  • number of registered members
  • digital service usage
  • operational costs
  • service quality scores
  • membership fees

Government funding decisions are often based on these metrics.

However, if data is manipulated or entered incorrectly, it can cause:

  • overfunding inefficient libraries
  • underfunding high-performing libraries
  • fraudulent reporting of library performance
  • incorrect strategic planning

Therefore, maintaining clean, validated datasets is essential for library management.

SAS programming is widely used in analytics environments because it provides:

  • strong data validation tools
  • reliable data transformations
  • automated quality checks
  • fraud detection algorithms

This project demonstrates a simplified simulation of such a system.

3. Table of Contents

  1. Introduction
  2. Business Context
  3. Raw Dataset Creation
  4. Intentional Error Injection
  5. Character Data Cleaning
  6. Numeric Validation
  7. Date Format Handling
  8. Utilization Classification
  9. Fraud Detection Macro
  10. Dataset Merging
  11. Dataset Appending
  12. Dataset Transposing
  13. Final Dataset Creation
  14. Dataset Deletion
  15. Analytical Insights
  16. Project Summary
  17. Conclusion

1. Creating Raw Library Dataset With Intentional Errors

data library_raw;

length Library_Name $25 City $20 Utilization_Class $15;

input Library_Name $ City $ Books_Count Members Digital_Usage 

      Operating_Cost Service_Score Percentage Fees Join_Date :ddmmyy10.;

format Join_Date ddmmyy10.;

datalines;

central_library hyderabad 50000 1500 900 200000 8.5 85 200 12/05/2020

city_library Hyderabad 45000 1400 800 190000 8.0 80 180 15/06/2020

knowledge_hub vizag 30000 900 500 150000 7.2 72 150 10/07/2021

digital_library chennai 40000 1100 700 175000 7.8 78 170 20/08/2021

reading_center Bangalore 35000 1000 600 165000 7.5 75 160 11/09/2021

community_library pune 20000 800 450 120000 6.8 68 120 05/10/2021

metro_library delhi 60000 1800 1200 250000 9.0 90 220 17/11/2022

urban_library kolkata 42000 1300 750 185000 8.1 81 190 22/12/2022

study_zone hyderabad -30000 900 600 160000 7.3 73 160 01/01/2023

smart_library bangalore 38000 1100 700 -170000 7.6 76 170 14/02/2023

city_readers chennai 0 850 550 150000 7.0 70 150 20/03/2023

digital_hub vizag 45000 . 900 200000 8.3 83 200 05/04/2023

;

run;

proc print data=library_raw;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_Date
1central_libraryhyderabad 5000015009002000008.58520012/05/2020
2city_libraryHyderabad 4500014008001900008.08018015/06/2020
3knowledge_hubvizag 300009005001500007.27215010/07/2021
4digital_librarychennai 4000011007001750007.87817020/08/2021
5reading_centerBangalore 3500010006001650007.57516011/09/2021
6community_librarypune 200008004501200006.86812005/10/2021
7metro_librarydelhi 60000180012002500009.09022017/11/2022
8urban_librarykolkata 4200013007501850008.18119022/12/2022
9study_zonehyderabad -300009006001600007.37316001/01/2023
10smart_librarybangalore 380001100700-1700007.67617014/02/2023
11city_readerschennai 08505501500007.07015020/03/2023
12digital_hubvizag 45000.9002000008.38320005/04/2023

Explanation

This DATA step creates the raw dataset representing multiple public libraries across different Indian cities.

The dataset contains the following variables:

Library_Name
City
Books_Count
Members
Digital_Usage
Operating_Cost
Service_Score
Percentage
Fees
Join_Date

However, the dataset intentionally contains multiple data errors.

Examples include:

·  negative values in Books_Count

·  negative operating cost

·  missing values

·  zero book counts

·  inconsistent capitalization in city names

These errors will later be corrected using SAS programming techniques.

Errors Introduced Intentionally

·  Negative Books_Count (-30000)

·  Negative Operating_Cost (-170000)

·  Missing Members value

·  Zero Books_Count

·  inconsistent city capitalization

·  potential fraud in digital usage

These are typical data quality problems in real systems.

Why This Code Is Important

This code establishes the foundation dataset used throughout the project.

It simulates real-world scenarios where raw data contains inaccuracies.

Without a raw dataset containing errors, we cannot demonstrate data cleaning and fraud detection logic.

Key Points

·  DATA step creates a new dataset

·  LENGTH statement defines character variable sizes

·  INPUT statement reads raw data

·  FORMAT applies date display format

·  DATALINES loads the data

·  Join_Date uses DDMMYY format

·  Dataset contains intentional errors

·  Missing values represented using "."

·  Negative values simulate data entry mistakes

·  Dataset acts as starting point for cleaning

2. Standardizing Character Variables

data library_clean_char;

set library_raw;

Library_Name = propcase(Library_Name);

City = propcase(strip(City));

run;

proc print data=library_clean_char;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_Date
1Central_libraryHyderabad 5000015009002000008.58520012/05/2020
2City_libraryHyderabad 4500014008001900008.08018015/06/2020
3Knowledge_hubVizag 300009005001500007.27215010/07/2021
4Digital_libraryChennai 4000011007001750007.87817020/08/2021
5Reading_centerBangalore 3500010006001650007.57516011/09/2021
6Community_libraryPune 200008004501200006.86812005/10/2021
7Metro_libraryDelhi 60000180012002500009.09022017/11/2022
8Urban_libraryKolkata 4200013007501850008.18119022/12/2022
9Study_zoneHyderabad -300009006001600007.37316001/01/2023
10Smart_libraryBangalore 380001100700-1700007.67617014/02/2023
11City_readersChennai 08505501500007.07015020/03/2023
12Digital_hubVizag 45000.9002000008.38320005/04/2023

Explanation

Character variables often contain inconsistencies such as:

·  extra spaces

·  mixed capitalization

·  inconsistent naming formats

STRIP

Removes leading and trailing spaces.

PROPCASE

Converts text to proper case.

Example:

hyderabad → Hyderabad

Why It Is Used

Clean character variables ensure consistent grouping and reporting.

For example:

"Hyderabad"
"hyderabad"

would otherwise be treated as two different cities.

Key Points

·  SET reads existing dataset

·  STRIP removes extra spaces

·  PROPCASE standardizes capitalization

·  Improves data consistency

·  Prevents grouping errors

·  Helps in reporting

·  Improves data quality

·  Supports merging operations

·  Removes formatting inconsistencies

·  Standardizes textual variables

3. Using Character Functions

data library_char_functions;

set library_clean_char;

City_Upper = upcase(City);

City_Lower = lowcase(City);

Library_Label = catx('-',Library_Name,City);

run;

proc print data=library_char_functions;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_Label
1Central_libraryHyderabad 5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad
2City_libraryHyderabad 4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad
3Knowledge_hubVizag 300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag
4Digital_libraryChennai 4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai
5Reading_centerBangalore 3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore
6Community_libraryPune 200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune
7Metro_libraryDelhi 60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi
8Urban_libraryKolkata 4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata
9Study_zoneHyderabad -300009006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad
10Smart_libraryBangalore 380001100700-1700007.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore
11City_readersChennai 08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai
12Digital_hubVizag 45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag

Explanation

This program demonstrates various character manipulation functions.

UPCASE converts text to uppercase.

LOWCASE converts text to lowercase.

CATX concatenates values with a delimiter.

Example result:

Central_Library-Hyderabad

These functions help create new identifiers and standardized variables.

Key Points

·  UPCASE converts to uppercase

·  LOWCASE converts to lowercase

·  CATX concatenates with delimiter

·  Helps build unique identifiers

·  Useful in reporting systems

·  Prevents case-sensitive mismatches

·  Supports analytics grouping

·  Improves readability

·  Creates derived variables

·  Demonstrates character function usage

4. Numeric Error Detection

data library_numeric_check;

set library_char_functions;

if Books_Count < 0 then Books_Count = .;

if Operating_Cost < 0 then Operating_Cost = .;

run;

proc print data=library_numeric_check;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_Label
1Central_libraryHyderabad 5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad
2City_libraryHyderabad 4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad
3Knowledge_hubVizag 300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag
4Digital_libraryChennai 4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai
5Reading_centerBangalore 3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore
6Community_libraryPune 200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune
7Metro_libraryDelhi 60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi
8Urban_libraryKolkata 4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata
9Study_zoneHyderabad .9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad
10Smart_libraryBangalore 380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore
11City_readersChennai 08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai
12Digital_hubVizag 45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag

Explanation

Numeric validation ensures that variables contain logical values.

Books count cannot be negative.

Operating cost cannot be negative.

If such values exist, they are converted to missing values.

Why It Is Used

This prevents invalid numeric values from affecting calculations.

For example:

negative costs would distort financial summaries.

Key Points

·  Numeric validation ensures logical values

·  Negative values replaced with missing

·  Improves data accuracy

·  Prevents calculation errors

·  Supports reliable analytics

·  Handles data entry mistakes

·  Ensures realistic business metrics

·  Protects financial reporting

·  Maintains dataset integrity

·  Prepares dataset for further processing

5. Creating Standardized Date Variables Using MDY Function

data library_dates;

set library_numeric_check;

Open_Date = mdy(1,1,2020);

Years_Active = intck('year', Open_Date, Join_Date);

format Open_Date date9.;

run;

proc print data=library_dates;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_Active
1Central_libraryHyderabad 5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN20200
2City_libraryHyderabad 4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN20200
3Knowledge_hubVizag 300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN20201
4Digital_libraryChennai 4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN20201
5Reading_centerBangalore 3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN20201
6Community_libraryPune 200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN20201
7Metro_libraryDelhi 60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN20202
8Urban_libraryKolkata 4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN20202
9Study_zoneHyderabad .9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN20203
10Smart_libraryBangalore 380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN20203
11City_readersChennai 08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN20203
12Digital_hubVizag 45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN20203

Explanation

Dates are extremely important in operational datasets. Libraries track when members join, when services start, and how long systems have been active.

In this code we create a standard opening date for libraries and calculate how many years the library has been active.

Functions Used

MDY

Creates a SAS date value using:

MDY(month, day, year)

Example:

mdy(1,1,2020)

creates the date 01JAN2020.

INTCK

Calculates the number of intervals between two dates.

INTCK(interval,start_date,end_date)

Example:

INTCK('year', Open_Date, Join_Date)

This calculates the number of years between the library opening date and the join date.

Why This Code Is Important

Libraries often analyze service performance based on years of operation.

For example:

Older libraries may have:

·  higher membership

·  larger book collections

·  higher operating costs

This code enables those analyses.

Key Points

·  MDY creates SAS date values

·  Open_Date represents library start date

·  INTCK calculates time differences

·  Years_Active measures operational duration

·  FORMAT improves readability of dates

·  Date variables support trend analysis

·  Helps compare library performance over time

·  Enables lifecycle analytics

·  Important for operational reports

·  Date functions are widely used in SAS

6. Using INTNX For Future Planning Dates

data library_planning;

set library_dates;

Next_Audit_Date = intnx('year', Join_Date, 1, 'same');

format Next_Audit_Date date9.;

run;

proc print data=library_planning;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_Date
1Central_libraryHyderabad 5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY2021
2City_libraryHyderabad 4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN2021
3Knowledge_hubVizag 300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL2022
4Digital_libraryChennai 4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG2022
5Reading_centerBangalore 3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP2022
6Community_libraryPune 200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT2022
7Metro_libraryDelhi 60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV2023
8Urban_libraryKolkata 4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC2023
9Study_zoneHyderabad .9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN2024
10Smart_libraryBangalore 380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB2024
11City_readersChennai 08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR2024
12Digital_hubVizag 45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024

Explanation

Libraries conduct periodic audits of their operations and services.

This code calculates the next audit date, which is scheduled one year after the join date.

INTNX Function

INTNX moves a date forward or backward by a specified interval.

INTNX(interval,start_date,increment)

Example:

INTNX('year', Join_Date, 1)

This moves the date forward by one year.

Alignment Option

The option 'same' ensures that the same day and month are preserved.

Why This Code Is Used

Organizations need automated ways to schedule audits and inspections.

Using INTNX allows SAS to automatically generate future operational dates.

Key Points

·  INTNX shifts dates forward or backward

·  Used for scheduling future events

·  Helps plan library audits

·  Maintains consistent scheduling intervals

·  Prevents manual date calculations

·  Works with months, quarters, and years

·  Useful in financial forecasting

·  Important for operational planning

·  Reduces scheduling errors

·  Supports automated reporting systems

7. Utilization Classification

data library_utilization;

set library_planning;

Usage_Ratio = Digital_Usage / Members;

if Usage_Ratio >= 0.6 then Utilization_Class="High";

else if Usage_Ratio >=0.5 then Utilization_Class="Medium";

else Utilization_Class="Low";

run;

proc print data=library_utilization;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_Ratio
1Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.60000
2City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.57143
3Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.55556
4Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.63636
5Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.60000
6Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.56250
7Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.66667
8Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.57692
9Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.66667
10Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.63636
11City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.64706
12Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.

Explanation

Library managers often measure how actively members use digital services.

We calculate a usage ratio.

Usage_Ratio = Digital_Usage / Members

This measures how many members actively use digital services.

Classification Logic

Ratio

Classification

>=0.6

High

>=0.5

Medium

<0.4

Low

Why This Is Important

This classification helps libraries understand how effectively digital resources are being used.

Libraries with low utilization may require:

·  digital awareness campaigns

·  training programs

·  improved infrastructure

Key Points

·  Usage_Ratio measures digital activity

·  IF-THEN logic creates classification

·  High utilization indicates strong digital engagement

·  Medium indicates moderate usage

·  Low indicates underutilization

·  Helps improve digital strategy

·  Supports performance benchmarking

·  Enables service optimization

·  Helps management decision-making

·  Improves operational efficiency

8. Fraud Detection Logic

data library_fraud_flag;

set library_utilization;

Fraud_Flag = 0;

if Digital_Usage > Members then Fraud_Flag = 1;

if Books_Count > 50000 then Fraud_Flag = 1;

if Percentage > 85 then Fraud_Flag = 1;

run;

proc print data=library_fraud_flag;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_Flag
1Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000
2City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430
3Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560
4Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360
5Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000
6Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500
7Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671
8Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920
9Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670
10Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360
11City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060
12Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1

Explanation

Some libraries may manipulate statistics to obtain additional funding.

Examples include:

·  reporting more digital usage than members

·  inflating book counts

·  exaggerating service scores

This code identifies suspicious records.

Fraud Conditions

Condition 1

Digital_Usage > Members

This is logically impossible.

Condition 2

Books_Count > 50000

This may indicate inflated inventory numbers.

Condition 3

Percentage > 100

Fraud_Flag

0 = Normal record

1 = Suspicious record

Why This Code Is Important

Fraud detection is a critical part of modern data analytics systems.

Automated detection helps organizations:

·  prevent manipulation

·  ensure fair funding

·  maintain transparency

Key Points

·  Fraud_Flag identifies suspicious records

·  Logical validation ensures data integrity

·  Impossible values trigger flags

·  Helps auditors investigate anomalies

·  Prevents statistical manipulation

·  Supports regulatory compliance

·  Enables automated monitoring

·  Improves governance

·  Protects financial resources

·  Strengthens data reliability

9. Creating Fraud Detection Macro

%macro fraud_check(data=);

data fraud_result;

set &data;

length Fraud_Level $10.;

Fraud_Level="Normal";

if Fraud_Flag=1 then Fraud_Level="High Risk";

run;

proc print data=fraud_result;

run;

%mend;


%fraud_check(data=library_fraud_flag);

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_Level
1Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal
2City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal
3Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal
4Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal
5Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal
6Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal
7Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk
8Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal
9Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal
10Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal
11City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal
12Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk

Explanation

Macros allow SAS programmers to automate repetitive tasks.

Instead of writing fraud detection code repeatedly, we define a reusable macro.

Macro Components

%MACRO

Defines the macro.

&data

Macro variable representing input dataset.

%MEND

Ends the macro definition.

Why Macros Are Important

Macros are heavily used in enterprise SAS environments because they:

·  automate workflows

·  reduce code duplication

·  improve efficiency

Key Points

·  Macros automate SAS programs

·  %MACRO defines a reusable program

·  &data passes dataset names

·  %MEND ends macro definition

·  Reduces repetitive coding

·  Improves scalability

·  Used heavily in production systems

·  Enhances maintainability

·  Supports parameterized programs

·  Essential SAS programming skill

10. Creating Additional Library Dataset

data library_finance;

length Library_Name $25 Maintenance_Cost 8 Staff_Count 8;

input Library_Name $ Maintenance_Cost Staff_Count;

datalines;

Central_Library 50000 45

City_Library 45000 40

Knowledge_Hub 30000 30

Digital_Library 38000 35

Reading_Center 35000 32

Community_Library 20000 20

Metro_Library 60000 50

Urban_Library 42000 38

Study_Zone 31000 25

Smart_Library 39000 33

City_Readers 27000 22

Digital_Hub 41000 36

;

run;

proc print data=library_finance;

run;

OUTPUT:

ObsLibrary_NameMaintenance_CostStaff_Count
1Central_Library5000045
2City_Library4500040
3Knowledge_Hub3000030
4Digital_Library3800035
5Reading_Center3500032
6Community_Library2000020
7Metro_Library6000050
8Urban_Library4200038
9Study_Zone3100025
10Smart_Library3900033
11City_Readers2700022
12Digital_Hub4100036

Explanation

This code creates a second dataset representing financial information about libraries.

Variables include:

Library_Name
Maintenance_Cost
Staff_Count

This dataset simulates information coming from a financial management system.

In real organizations, operational data and financial data are usually stored in separate databases.

Why This Code Is Important

To perform meaningful analysis, data from multiple systems must be combined.

This dataset will later be merged with the operational library dataset.

Key Points

·  DATA step creates a new dataset

·  LENGTH defines variable attributes

·  INPUT reads financial data

·  DATALINES loads observations

·  Dataset represents financial metrics

·  Simulates external data source

·  Contains maintenance costs

·  Includes staff counts

·  Prepares dataset for merging

·  Demonstrates multi-source data environment

11. Combining Datasets Using MERGE

proc sort data=fraud_result;

by Library_Name;

run;

proc print data=fraud_result;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_Level
1Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal
2City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal
3City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal
4Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal
5Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk
6Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal
7Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal
8Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk
9Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal
10Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal
11Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal
12Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal

proc sort data=library_finance;

by Library_Name;

run;

proc print data=library_finance;

run;

OUTPUT:

ObsLibrary_NameMaintenance_CostStaff_Count
1Central_Library5000045
2City_Library4500040
3City_Readers2700022
4Community_Library2000020
5Digital_Hub4100036
6Digital_Library3800035
7Knowledge_Hub3000030
8Metro_Library6000050
9Reading_Center3500032
10Smart_Library3900033
11Study_Zone3100025
12Urban_Library4200038

data library_merged;

merge fraud_result library_finance;

by Library_Name;

run;

proc print data=library_merged;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_LevelMaintenance_CostStaff_Count
1Central_Library  ........   ..... 5000045
2Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal..
3City_Library  ........   ..... 4500040
4City_Readers  ........   ..... 2700022
5City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal..
6City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal..
7Community_Library  ........   ..... 2000020
8Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal..
9Digital_Hub  ........   ..... 4100036
10Digital_Library  ........   ..... 3800035
11Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk..
12Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal..
13Knowledge_Hub  ........   ..... 3000030
14Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal..
15Metro_Library  ........   ..... 6000050
16Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk..
17Reading_Center  ........   ..... 3500032
18Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal..
19Smart_Library  ........   ..... 3900033
20Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal..
21Study_Zone  ........   ..... 3100025
22Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal..
23Urban_Library  ........   ..... 4200038
24Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal..

Explanation

The MERGE statement combines datasets based on a common variable.

Here the common variable is:

Library_Name

Before merging, both datasets must be sorted by the BY variable.

Why Sorting Is Required

SAS requires datasets to be sorted before performing a merge using BY statements.

If sorting is skipped, SAS may produce incorrect results.

Result

The merged dataset now contains:

Operational Data
+
Financial Data

This allows analysts to study relationships between library usage and financial resources.

Key Points

·  PROC SORT organizes datasets

·  BY statement defines merge key

·  MERGE combines datasets

·  Sorting ensures correct matching

·  Enables multi-source integration

·  Adds financial variables

·  Creates richer dataset

·  Supports advanced analysis

·  Common technique in data engineering

·  Critical skill for SAS programmers

12. Using SET Statement

data library_copy;

 set library_merged;

run;

proc print data=library_copy;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_LevelMaintenance_CostStaff_Count
1Central_Library  ........   ..... 5000045
2Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal..
3City_Library  ........   ..... 4500040
4City_Readers  ........   ..... 2700022
5City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal..
6City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal..
7Community_Library  ........   ..... 2000020
8Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal..
9Digital_Hub  ........   ..... 4100036
10Digital_Library  ........   ..... 3800035
11Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk..
12Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal..
13Knowledge_Hub  ........   ..... 3000030
14Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal..
15Metro_Library  ........   ..... 6000050
16Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk..
17Reading_Center  ........   ..... 3500032
18Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal..
19Smart_Library  ........   ..... 3900033
20Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal..
21Study_Zone  ........   ..... 3100025
22Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal..
23Urban_Library  ........   ..... 4200038
24Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal..

Explanation

The SET statement reads observations from an existing dataset and creates a new dataset.

Here we create a copy of the merged dataset.

Why This Is Useful

Sometimes analysts want to:

·  preserve the original dataset

·  create working copies for transformations

The SET statement allows safe data manipulation.

Key Points

·  SET reads existing datasets

·  Creates new dataset copies

·  Prevents modification of original data

·  Useful for data transformations

·  Supports sequential data processing

·  Allows derived variable creation

·  Enables data pipeline design

·  Maintains data safety

·  Commonly used in SAS workflows

·  Fundamental SAS concept

13. Appending New Records

data new_library;

length Library_Name $25 City $20 Utilization_Class $15;

input Library_Name $ City $ Books_Count Members Digital_Usage 

      Operating_Cost Service_Score Percentage Fees Join_Date :ddmmyy10.;

datalines;

Future_Library Hyderabad 30000 1000 600 150000 7.5 75 150 01/05/2023

;

run;

proc print data=new_library;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_Date
1Future_LibraryHyderabad 3000010006001500007.57515023131

proc append base=library_copy 

            data=new_library;

run;

proc print data=library_copy;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_LevelMaintenance_CostStaff_Count
1Central_Library  ........   ..... 5000045
2Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal..
3City_Library  ........   ..... 4500040
4City_Readers  ........   ..... 2700022
5City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal..
6City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal..
7Community_Library  ........   ..... 2000020
8Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal..
9Digital_Hub  ........   ..... 4100036
10Digital_Library  ........   ..... 3800035
11Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk..
12Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal..
13Knowledge_Hub  ........   ..... 3000030
14Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal..
15Metro_Library  ........   ..... 6000050
16Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk..
17Reading_Center  ........   ..... 3500032
18Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal..
19Smart_Library  ........   ..... 3900033
20Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal..
21Study_Zone  ........   ..... 3100025
22Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal..
23Urban_Library  ........   ..... 4200038
24Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal..
25Future_LibraryHyderabad 3000010006001500007.57515001/05/2023   ..... ..
 Explanation

Sometimes new data arrives after a dataset is already created.

The PROC APPEND procedure allows new records to be added to an existing dataset.

Instead of rebuilding the entire dataset, SAS simply adds the new observations.

Why APPEND Is Important

In production systems, datasets grow continuously.

APPEND provides a fast and efficient way to add new records.

Key Points

·  PROC APPEND adds records to dataset

·  BASE specifies existing dataset

·  DATA specifies new dataset

·  Faster than rebuilding dataset

·  Efficient for large datasets

·  Used in ETL pipelines

·  Supports incremental data loading

·  Prevents unnecessary recomputation

·  Widely used in production systems

·  Important SAS procedure

14. Using COALESCE Function

data library_coalesce;

set library_copy;

Members_Final = coalesce(Members,0);

run;

proc print data=library_coalesce;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_FlagFraud_LevelMaintenance_CostStaff_CountMembers_Final
1Central_Library  ........   ..... 50000450
2Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad01JAN2020012MAY20210.600000Normal..1500
3City_Library  ........   ..... 45000400
4City_Readers  ........   ..... 27000220
5City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad01JAN2020015JUN20210.571430Normal..1400
6City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai01JAN2020320MAR20240.647060Normal..850
7Community_Library  ........   ..... 20000200
8Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune01JAN2020105OCT20220.562500Normal..800
9Digital_Hub  ........   ..... 41000360
10Digital_Library  ........   ..... 38000350
11Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag01JAN2020305APR2024.1High Risk..0
12Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai01JAN2020120AUG20220.636360Normal..1100
13Knowledge_Hub  ........   ..... 30000300
14Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag01JAN2020110JUL20220.555560Normal..900
15Metro_Library  ........   ..... 60000500
16Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi01JAN2020217NOV20230.666671High Risk..1800
17Reading_Center  ........   ..... 35000320
18Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore01JAN2020111SEP20220.600000Normal..1000
19Smart_Library  ........   ..... 39000330
20Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore01JAN2020314FEB20240.636360Normal..1100
21Study_Zone  ........   ..... 31000250
22Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad01JAN2020301JAN20240.666670Normal..900
23Urban_Library  ........   ..... 42000380
24Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata01JAN2020222DEC20230.576920Normal..1300
25Future_LibraryHyderabad 3000010006001500007.57515001/05/2023   ..... ..1000

Explanation

The COALESCE function replaces missing values with the first non-missing value.

In this example:

If Members is missing, it will be replaced with 0.

Why This Is Important

Missing values can cause problems in calculations.

For example:

Average membership calculations may become incorrect.

Replacing missing values ensures stable analysis results.

Key Points

·  COALESCE handles missing values

·  Returns first non-missing value

·  Improves dataset completeness

·  Prevents calculation errors

·  Useful in large datasets

·  Helps data validation

·  Supports reliable analytics

·  Reduces null value problems

·  Commonly used numeric function

·  Essential for data cleaning

15. Transposing Dataset

proc transpose data=library_coalesce out=library_transpose;

var Books_Count Members Digital_Usage;

run;

proc print data=library_transpose;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10COL11COL12COL13COL14COL15COL16COL17COL18COL19COL20COL21COL22COL23COL24COL25
1Books_Count.50000..450000.20000..4500040000.30000.60000.35000.38000...4200030000
2Members.1500..1400850.800...1100.900.1800.1000.1100.900.13001000
3Digital_Usage.900..800550.450..900700.500.1200.600.700.600.750600

Explanation

Sometimes datasets must be restructured for reporting or visualization.

PROC TRANSPOSE converts rows into columns.

Example transformation:

Variable

  Value

Books_Count

Members

Digital_Usage

becomes:

| Column1 | Column2 | Column3 |

Why This Is Important

Transposing data helps when creating:

·  reports

·  dashboards

·  statistical summaries

Key Points

·  PROC TRANSPOSE restructures datasets

·  Converts rows to columns

·  Useful for reports

·  Helps data visualization

·  Simplifies analysis

·  Supports pivot-style datasets

·  Enables flexible formatting

·  Used in reporting systems

·  Useful for summary tables

·  Important SAS procedure

16. Deleting Temporary Datasets

proc datasets library=work;

delete library_copy library_transpose new_library;

run;

quit;

LOG:

NOTE: Deleting WORK.LIBRARY_COPY (memtype=DATA).
NOTE: Deleting WORK.LIBRARY_TRANSPOSE (memtype=DATA).
NOTE: Deleting WORK.NEW_LIBRARY (memtype=DATA).

Explanation

During data processing, many temporary datasets are created.

These datasets consume memory and storage.

PROC DATASETS allows programmers to delete unnecessary datasets.

Why This Is Important

Efficient memory management is essential in large data environments.

Deleting unused datasets keeps the SAS workspace clean and efficient.

Key Points

·  PROC DATASETS manages datasets

·  DELETE removes datasets

·  WORK library stores temporary datasets

·  Helps memory management

·  Keeps environment clean

·  Improves processing efficiency

·  Reduces clutter

·  Important in production workflows

·  Prevents storage overflow

·  Essential data housekeeping step

17. Complete Corrected SAS Program

data library_final;

set library_raw;

Library_Name = propcase(strip(Library_Name));

City = propcase(strip(City));

City_Upper = upcase(City);

City_Lower = lowcase(City);

Library_Label = catx('-',Library_Name,City);

if Books_Count < 0 then Books_Count = .;

if Operating_Cost < 0 then Operating_Cost = .;

Members_Final = coalesce(Members,0);

Open_Date = mdy(1,1,2020);

Years_Active = intck('year', Open_Date, Join_Date);

Next_Audit_Date = intnx('year', Join_Date, 1, 'same');

Usage_Ratio = Digital_Usage / Members_Final;

if Usage_Ratio >=0.6 then Utilization_Class="High";

else if Usage_Ratio >=0.5 then Utilization_Class="Medium";

else Utilization_Class="Low";

Fraud_Flag = 0;

if Digital_Usage > Members_Final then Fraud_Flag=1;

if Books_Count > 50000 then Fraud_Flag=1;

if Percentage > 85 then Fraud_Flag=1;

format Open_Date Next_Audit_Date date9.;

run;

proc print data=library_final;

run;

OUTPUT:

ObsLibrary_NameCityUtilization_ClassBooks_CountMembersDigital_UsageOperating_CostService_ScorePercentageFeesJoin_DateCity_UpperCity_LowerLibrary_LabelMembers_FinalOpen_DateYears_ActiveNext_Audit_DateUsage_RatioFraud_Flag
1Central_libraryHyderabadHigh5000015009002000008.58520012/05/2020HYDERABADhyderabadCentral_library-Hyderabad150001JAN2020012MAY20210.600000
2City_libraryHyderabadMedium4500014008001900008.08018015/06/2020HYDERABADhyderabadCity_library-Hyderabad140001JAN2020015JUN20210.571430
3Knowledge_hubVizagMedium300009005001500007.27215010/07/2021VIZAGvizagKnowledge_hub-Vizag90001JAN2020110JUL20220.555560
4Digital_libraryChennaiHigh4000011007001750007.87817020/08/2021CHENNAIchennaiDigital_library-Chennai110001JAN2020120AUG20220.636360
5Reading_centerBangaloreHigh3500010006001650007.57516011/09/2021BANGALOREbangaloreReading_center-Bangalore100001JAN2020111SEP20220.600000
6Community_libraryPuneMedium200008004501200006.86812005/10/2021PUNEpuneCommunity_library-Pune80001JAN2020105OCT20220.562500
7Metro_libraryDelhiHigh60000180012002500009.09022017/11/2022DELHIdelhiMetro_library-Delhi180001JAN2020217NOV20230.666671
8Urban_libraryKolkataMedium4200013007501850008.18119022/12/2022KOLKATAkolkataUrban_library-Kolkata130001JAN2020222DEC20230.576920
9Study_zoneHyderabadHigh.9006001600007.37316001/01/2023HYDERABADhyderabadStudy_zone-Hyderabad90001JAN2020301JAN20240.666670
10Smart_libraryBangaloreHigh380001100700.7.67617014/02/2023BANGALOREbangaloreSmart_library-Bangalore110001JAN2020314FEB20240.636360
11City_readersChennaiHigh08505501500007.07015020/03/2023CHENNAIchennaiCity_readers-Chennai85001JAN2020320MAR20240.647060
12Digital_hubVizagLow45000.9002000008.38320005/04/2023VIZAGvizagDigital_hub-Vizag001JAN2020305APR2024.1

Intentional Errors Introduced In The Raw Dataset

When the raw dataset library_raw was created earlier, several errors were intentionally inserted to simulate real-world data problems.

Below are the errors.

Error 1 — Negative Books Count

Example:

Study_Zone  -30000

Problem:

A library cannot have a negative number of books.

Reason for error:

Manual data entry mistake or corrupted system record.

Correction logic:

if Books_Count < 0 then Books_Count = .;

This replaces invalid values with missing values.

Error 2 — Negative Operating Cost

Example:

Smart_Library  Operating_Cost = -170000

Problem:

Operating costs cannot be negative.

Possible reason:

Financial system import error.

Correction:

if Operating_Cost < 0 then Operating_Cost = .;

Error 3 — Missing Members Value

Example:

Digital_Hub Members = .

Problem:

Membership information missing.

Correction using COALESCE:

Members_Final = coalesce(Members,0);

This replaces missing values with 0.

Error 4 — Zero Books Count

Example:

City_Readers Books_Count = 0

Problem:

Libraries normally contain at least some books.

Possible reason:

Data import failure.

Correction logic may flag such records for manual verification.

Error 5 — Inconsistent City Capitalization

Examples:

hyderabad
Hyderabad
HYDERABAD

Problem:

Inconsistent text formatting.

Correction:

City = propcase(strip(City));

This standardizes city names.

Error 6 — Fraudulent Usage Numbers

Example condition:

Digital_Usage > Members

Problem:

Digital usage cannot exceed total membership.

Correction logic:

Flag suspicious records using Fraud_Flag variable.

Final Dataset Utilization

The library_final dataset can be used for several analyses.

Examples:

Library Performance Analysis

Compare service scores across cities.

Cost Efficiency Analysis

Evaluate operating costs relative to membership size.

Digital Engagement Analysis

Measure digital usage levels.

Fraud Detection

Identify suspicious records.

Strategic Planning

Plan future library expansion based on utilization patterns.

Explanation

This code performs complete data cleaning and fraud detection.

The main steps include:

·  Character standardization

·  Numeric error correction

·  Missing value handling

·  Date variable creation

·  Utilization classification

·  Fraud detection

This integrated program represents a real-world SAS data processing pipeline.

Why This Final Code Is Important

In real analytics environments, SAS programmers must combine multiple tasks in a single workflow:

·  data cleaning

·  validation

·  transformation

·  fraud detection

This final program demonstrates how those steps can be integrated into one process.

18.20 Key Points About This Project

·  The project simulates a public library data management system.

·  A raw dataset with intentional errors was created.

·  Errors included negative values and missing data.

·  Character variables were standardized using PROPCASE and STRIP.

·  Text case conversions were performed using UPCASE and LOWCASE.

·  New identifiers were created using CATX.

·  Numeric validation removed invalid values.

·  Missing membership values were handled using COALESCE.

·  Date variables were generated using MDY.

·  Library age was calculated using INTCK.

·  Future audit dates were scheduled using INTNX.

·  Digital service usage ratios were calculated.

·  Libraries were classified into utilization categories.

·  Fraud detection rules were implemented.

·  Suspicious records were flagged automatically.

·  Multiple datasets were combined using MERGE.

·  Additional records were added using PROC APPEND.

·  Data structure was reshaped using PROC TRANSPOSE.

·  Temporary datasets were removed using PROC DATASETS DELETE.

·  The final dataset supports advanced library analytics.

Project Summary

This project demonstrates how SAS programming can be used to manage and validate operational data in public library systems.

The project began with a raw dataset containing several intentional errors to simulate real-world data quality problems. These included negative numeric values, missing membership counts, inconsistent capitalization, and suspicious digital usage statistics.

Through a series of SAS programs, the dataset was cleaned and transformed using a combination of:

·  character functions

·  numeric validation rules

·  date functions

·  dataset merging

·  macros

·  fraud detection logic

The final dataset provides a reliable foundation for analyzing library operations, evaluating digital engagement, monitoring operational costs, and identifying suspicious reporting patterns.

This type of data pipeline is common in modern analytics environments where data quality and governance are critical for accurate decision-making.

Conclusion

Accurate data is essential for the effective management of public services such as libraries. Poor data quality can lead to incorrect funding decisions, inefficient resource allocation, and misleading performance evaluations.

This project illustrated how advanced SAS programming techniques can be applied to detect and correct errors in operational datasets. By combining data cleaning techniques, date processing functions, character manipulation functions, and automated fraud detection rules, SAS programmers can transform unreliable raw data into a trustworthy analytical dataset.

The approach demonstrated in this project mirrors real-world analytics workflows used by organizations to maintain data integrity and transparency. With proper validation processes and automated monitoring systems, institutions can ensure that their operational data remains accurate, reliable, and suitable for strategic decision-making.

SAS INTERVIEW QUESTIONS

1. What is a Cartesian join?

Answer:
A Cartesian join occurs when every row of one table joins with every row of another table.

2. What is DISTINCT used for?

Answer:
DISTINCT removes duplicate rows from query results in Sql.

3. What is the difference between UNION and OUTER UNION?

Answer:
UNION removes duplicates, while OUTER UNION keeps all rows including duplicates.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent PUBLIC LIBRARY data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?