424.Can Advanced SAS Programming Detect Fraud And Correct Data Errors In Public Library Utilization Systems?
Smart Data Validation and Fraud Detection in Public Library Utilization Systems Using Advanced SAS Techniques
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | SET | MERGE | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | INPUT | LENGTH | FORMAT | IF THEN ELSE | MDY | INTCK | INTNX | COALESCE | STRIP | TRIM | CAT | CATX | PROPCASE | UPCASE | LOWCASE | MACRO | MEND
1. Introduction
Public
libraries play an important role in the knowledge ecosystem of cities. They
provide access to books, digital content, community learning spaces, and
research materials. Modern libraries maintain extensive databases that track
information such as the number of books available, member registrations,
digital usage metrics, operating costs, and service performance scores.
However,
like any large-scale information system, library datasets may contain data
quality problems. These issues may arise due to:
- manual data entry mistakes
- system synchronization
errors
- missing values
- duplicated records
- fraud in reporting library
statistics
- incorrect date formats
- inconsistent capitalization
or character formatting
If these
issues are not detected and corrected, they can lead to incorrect decisions
regarding funding allocation, staffing requirements, and service improvements.
SAS
programming provides powerful tools to identify and correct these problems
through:
- data step programming
- character and numeric
functions
- date processing
- dataset merging
- macro automation
- fraud detection logic
In this
project, we simulate a Public Library Management dataset that
intentionally includes multiple errors. Then we use SAS programming techniques
to detect and correct these issues.
The final
goal is to demonstrate how advanced SAS techniques can help maintain accurate,
reliable, and fraud-resistant library data systems.
2. Business Context
Cities
operate multiple public libraries to support community education. Each library
tracks key operational indicators such as:
- number of books available
- number of registered members
- digital service usage
- operational costs
- service quality scores
- membership fees
Government
funding decisions are often based on these metrics.
However,
if data is manipulated or entered incorrectly, it can cause:
- overfunding inefficient
libraries
- underfunding high-performing
libraries
- fraudulent reporting of
library performance
- incorrect strategic planning
Therefore,
maintaining clean, validated datasets is essential for library
management.
SAS
programming is widely used in analytics environments because it provides:
- strong data validation tools
- reliable data
transformations
- automated quality checks
- fraud detection algorithms
This
project demonstrates a simplified simulation of such a system.
3. Table of Contents
- Introduction
- Business Context
- Raw Dataset Creation
- Intentional Error Injection
- Character Data Cleaning
- Numeric Validation
- Date Format Handling
- Utilization Classification
- Fraud Detection Macro
- Dataset Merging
- Dataset Appending
- Dataset Transposing
- Final Dataset Creation
- Dataset Deletion
- Analytical Insights
- Project Summary
- Conclusion
1. Creating Raw Library Dataset With Intentional Errors
data library_raw;
length Library_Name $25 City $20 Utilization_Class $15;
input Library_Name $ City $ Books_Count Members Digital_Usage
Operating_Cost Service_Score Percentage Fees Join_Date :ddmmyy10.;
format Join_Date ddmmyy10.;
datalines;
central_library hyderabad 50000 1500 900 200000 8.5 85 200 12/05/2020
city_library Hyderabad 45000 1400 800 190000 8.0 80 180 15/06/2020
knowledge_hub vizag 30000 900 500 150000 7.2 72 150 10/07/2021
digital_library chennai 40000 1100 700 175000 7.8 78 170 20/08/2021
reading_center Bangalore 35000 1000 600 165000 7.5 75 160 11/09/2021
community_library pune 20000 800 450 120000 6.8 68 120 05/10/2021
metro_library delhi 60000 1800 1200 250000 9.0 90 220 17/11/2022
urban_library kolkata 42000 1300 750 185000 8.1 81 190 22/12/2022
study_zone hyderabad -30000 900 600 160000 7.3 73 160 01/01/2023
smart_library bangalore 38000 1100 700 -170000 7.6 76 170 14/02/2023
city_readers chennai 0 850 550 150000 7.0 70 150 20/03/2023
digital_hub vizag 45000 . 900 200000 8.3 83 200 05/04/2023
;
run;
proc print data=library_raw;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | central_library | hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | |
| 2 | city_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | |
| 3 | knowledge_hub | vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | |
| 4 | digital_library | chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | |
| 5 | reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | |
| 6 | community_library | pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | |
| 7 | metro_library | delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | |
| 8 | urban_library | kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | |
| 9 | study_zone | hyderabad | -30000 | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | |
| 10 | smart_library | bangalore | 38000 | 1100 | 700 | -170000 | 7.6 | 76 | 170 | 14/02/2023 | |
| 11 | city_readers | chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | |
| 12 | digital_hub | vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 |
Explanation
This DATA step creates the raw dataset representing multiple public
libraries across different Indian cities.
The dataset contains the following variables:
Library_Name
City
Books_Count
Members
Digital_Usage
Operating_Cost
Service_Score
Percentage
Fees
Join_Date
However, the dataset intentionally contains multiple data errors.
Examples include:
·
negative values in Books_Count
·
negative operating cost
·
missing values
·
zero book counts
·
inconsistent capitalization in city
names
These errors will later be corrected using SAS programming techniques.
Errors Introduced Intentionally
·
Negative Books_Count (-30000)
·
Negative Operating_Cost (-170000)
·
Missing Members value
·
Zero Books_Count
·
inconsistent city capitalization
·
potential fraud in digital usage
These are typical data quality problems in real systems.
Why This Code Is Important
This code establishes the foundation dataset used
throughout the project.
It simulates real-world scenarios where raw data contains inaccuracies.
Without a raw dataset containing errors, we cannot demonstrate data
cleaning and fraud detection logic.
Key Points
· DATA step creates a new dataset
· LENGTH statement defines character variable
sizes
· INPUT statement reads raw data
· FORMAT applies date display format
· DATALINES loads the data
· Join_Date uses DDMMYY format
· Dataset contains intentional errors
· Missing values represented using
"."
· Negative values simulate data entry mistakes
· Dataset acts as starting point for cleaning
2. Standardizing Character Variables
data library_clean_char;
set library_raw;
Library_Name = propcase(Library_Name);
City = propcase(strip(City));
run;
proc print data=library_clean_char;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | |
| 2 | City_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | |
| 3 | Knowledge_hub | Vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | |
| 4 | Digital_library | Chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | |
| 5 | Reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | |
| 6 | Community_library | Pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | |
| 7 | Metro_library | Delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | |
| 8 | Urban_library | Kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | |
| 9 | Study_zone | Hyderabad | -30000 | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | |
| 10 | Smart_library | Bangalore | 38000 | 1100 | 700 | -170000 | 7.6 | 76 | 170 | 14/02/2023 | |
| 11 | City_readers | Chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | |
| 12 | Digital_hub | Vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 |
Explanation
Character variables often contain inconsistencies such as:
· extra spaces
· mixed capitalization
· inconsistent naming formats
STRIP
Removes leading and trailing spaces.
PROPCASE
Converts text to proper case.
Example:
hyderabad → Hyderabad
Why It Is Used
Clean character variables ensure consistent grouping and reporting.
For example:
"Hyderabad"
"hyderabad"
would otherwise be treated as two different cities.
Key Points
· SET reads existing dataset
· STRIP removes extra spaces
· PROPCASE standardizes capitalization
· Improves data consistency
· Prevents grouping errors
· Helps in reporting
· Improves data quality
· Supports merging operations
· Removes formatting inconsistencies
· Standardizes textual variables
3. Using Character Functions
data library_char_functions;
set library_clean_char;
City_Upper = upcase(City);
City_Lower = lowcase(City);
Library_Label = catx('-',Library_Name,City);
run;
proc print data=library_char_functions;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | |
| 2 | City_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | |
| 3 | Knowledge_hub | Vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | |
| 4 | Digital_library | Chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | |
| 5 | Reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | |
| 6 | Community_library | Pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | |
| 7 | Metro_library | Delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | |
| 8 | Urban_library | Kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | |
| 9 | Study_zone | Hyderabad | -30000 | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | |
| 10 | Smart_library | Bangalore | 38000 | 1100 | 700 | -170000 | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | |
| 11 | City_readers | Chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | |
| 12 | Digital_hub | Vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag |
Explanation
This
program demonstrates various character manipulation functions.
UPCASE
converts text to uppercase.
LOWCASE
converts text to lowercase.
CATX
concatenates values with a delimiter.
Example
result:
Central_Library-Hyderabad
These
functions help create new identifiers and standardized variables.
Key Points
· UPCASE converts to uppercase
· LOWCASE converts to lowercase
· CATX concatenates with delimiter
· Helps build unique identifiers
· Useful in reporting systems
· Prevents case-sensitive mismatches
· Supports analytics grouping
· Improves readability
· Creates derived variables
4. Numeric Error Detection
data library_numeric_check;
set library_char_functions;
if Books_Count < 0 then Books_Count = .;
if Operating_Cost < 0 then Operating_Cost = .;
run;
proc print data=library_numeric_check;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | |
| 2 | City_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | |
| 3 | Knowledge_hub | Vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | |
| 4 | Digital_library | Chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | |
| 5 | Reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | |
| 6 | Community_library | Pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | |
| 7 | Metro_library | Delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | |
| 8 | Urban_library | Kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | |
| 9 | Study_zone | Hyderabad | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | |
| 10 | Smart_library | Bangalore | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | |
| 11 | City_readers | Chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | |
| 12 | Digital_hub | Vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag |
Explanation
Numeric
validation ensures that variables contain logical values.
Books
count cannot be negative.
Operating
cost cannot be negative.
If such
values exist, they are converted to missing values.
Why It Is Used
This
prevents invalid numeric values from affecting calculations.
For
example:
negative
costs would distort financial summaries.
Key Points
· Numeric validation ensures logical values
· Negative values replaced with missing
· Improves data accuracy
· Prevents calculation errors
· Supports reliable analytics
· Handles data entry mistakes
· Ensures realistic business metrics
· Protects financial reporting
· Maintains dataset integrity
· Prepares dataset for further processing
5. Creating Standardized Date Variables Using MDY Function
data library_dates;
set library_numeric_check;
Open_Date = mdy(1,1,2020);
Years_Active = intck('year', Open_Date, Join_Date);
format Open_Date date9.;
run;
proc print data=library_dates;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | |
| 2 | City_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | |
| 3 | Knowledge_hub | Vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | |
| 4 | Digital_library | Chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | |
| 5 | Reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | |
| 6 | Community_library | Pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | |
| 7 | Metro_library | Delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | |
| 8 | Urban_library | Kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | |
| 9 | Study_zone | Hyderabad | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | |
| 10 | Smart_library | Bangalore | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | |
| 11 | City_readers | Chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | |
| 12 | Digital_hub | Vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 |
Explanation
Dates are extremely important in operational datasets. Libraries track when
members join, when services start, and how long systems have been active.
In this code we create a standard opening date for libraries
and calculate how many years the library has been active.
Functions Used
MDY
Creates a SAS date value using:
MDY(month, day, year)
Example:
mdy(1,1,2020)
creates the date 01JAN2020.
INTCK
Calculates the number of intervals between two dates.
INTCK(interval,start_date,end_date)
Example:
INTCK('year', Open_Date, Join_Date)
This calculates the number of years between the
library opening date and the join date.
Why This Code Is Important
Libraries often analyze service performance based on years of
operation.
For example:
Older libraries may have:
· higher membership
· larger book collections
· higher operating costs
This code enables those
analyses.
Key Points
· MDY creates SAS date values
· Open_Date represents library start date
· INTCK calculates time differences
· Years_Active measures operational duration
· FORMAT improves readability of dates
· Date variables support trend analysis
· Helps compare library performance over time
· Enables lifecycle analytics
· Important for operational reports
· Date functions are widely used in SAS
6. Using INTNX For Future Planning Dates
data library_planning;
set library_dates;
Next_Audit_Date = intnx('year', Join_Date, 1, 'same');
format Next_Audit_Date date9.;
run;
proc print data=library_planning;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | |
| 2 | City_library | Hyderabad | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | |
| 3 | Knowledge_hub | Vizag | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | |
| 4 | Digital_library | Chennai | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | |
| 5 | Reading_center | Bangalore | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | |
| 6 | Community_library | Pune | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | |
| 7 | Metro_library | Delhi | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | |
| 8 | Urban_library | Kolkata | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | |
| 9 | Study_zone | Hyderabad | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | |
| 10 | Smart_library | Bangalore | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | |
| 11 | City_readers | Chennai | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | |
| 12 | Digital_hub | Vizag | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 |
Explanation
Libraries conduct periodic audits of their operations and services.
This code calculates the next audit date, which is
scheduled one year after the join date.
INTNX Function
INTNX moves a date forward or backward by a specified interval.
INTNX(interval,start_date,increment)
Example:
INTNX('year', Join_Date, 1)
This moves the date forward by one year.
Alignment Option
The option 'same' ensures
that the same day and month are preserved.
Why This Code Is Used
Organizations need automated ways to schedule audits and inspections.
Using INTNX allows SAS to automatically generate future operational
dates.
Key Points
· INTNX shifts dates forward or backward
· Used for scheduling future events
· Helps plan library audits
· Maintains consistent scheduling intervals
· Prevents manual date calculations
· Works with months, quarters, and years
· Useful in financial forecasting
· Important for operational planning
· Reduces scheduling errors
· Supports automated reporting systems
7. Utilization Classification
data library_utilization;
set library_planning;
Usage_Ratio = Digital_Usage / Members;
if Usage_Ratio >= 0.6 then Utilization_Class="High";
else if Usage_Ratio >=0.5 then Utilization_Class="Medium";
else Utilization_Class="Low";
run;
proc print data=library_utilization;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 |
| 2 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 |
| 3 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 |
| 4 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 |
| 5 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 |
| 6 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 |
| 7 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 |
| 8 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 |
| 9 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 |
| 10 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 |
| 11 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 |
| 12 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . |
Explanation
Library managers often measure how actively members use digital services.
We calculate a usage ratio.
Usage_Ratio = Digital_Usage / Members
This measures how many
members actively use digital services.
Classification Logic
|
Ratio |
Classification |
|
>=0.6 |
High |
|
>=0.5 |
Medium |
|
<0.4 |
Low |
Why This Is Important
This classification helps libraries understand how effectively digital resources
are being used.
Libraries with low utilization may require:
· digital awareness campaigns
· training programs
· improved infrastructure
Key Points
· Usage_Ratio measures digital activity
· IF-THEN logic creates classification
· High utilization indicates strong digital
engagement
· Medium indicates moderate usage
· Low indicates underutilization
· Helps improve digital strategy
· Supports performance benchmarking
· Enables service optimization
· Helps management decision-making
8. Fraud Detection Logic
data library_fraud_flag;
set library_utilization;
Fraud_Flag = 0;
if Digital_Usage > Members then Fraud_Flag = 1;
if Books_Count > 50000 then Fraud_Flag = 1;
if Percentage > 85 then Fraud_Flag = 1;
run;
proc print data=library_fraud_flag;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 |
| 2 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 |
| 3 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 |
| 4 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 |
| 5 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 |
| 6 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 |
| 7 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 |
| 8 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 |
| 9 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 |
| 10 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 |
| 11 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 |
| 12 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 |
Explanation
Some libraries may manipulate statistics to obtain additional funding.
Examples include:
· reporting more digital usage than members
· inflating book counts
· exaggerating service scores
This code identifies
suspicious records.
Fraud Conditions
Condition 1
Digital_Usage >
Members
This is logically impossible.
Condition 2
Books_Count > 50000
This may indicate inflated inventory numbers.
Condition 3
Percentage > 100
Fraud_Flag
0 = Normal record
1 = Suspicious record
Why This Code Is Important
Fraud detection is a critical part of modern data analytics systems.
Automated detection helps organizations:
· prevent manipulation
· ensure fair funding
· maintain transparency
Key Points
· Fraud_Flag identifies suspicious records
· Logical validation ensures data integrity
· Impossible values trigger flags
· Helps auditors investigate anomalies
· Prevents statistical manipulation
· Supports regulatory compliance
· Enables automated monitoring
· Improves governance
· Protects financial resources
· Strengthens data reliability
9. Creating Fraud Detection Macro
%macro fraud_check(data=);
data fraud_result;
set &data;
length Fraud_Level $10.;
Fraud_Level="Normal";
if Fraud_Flag=1 then Fraud_Level="High Risk";
run;
proc print data=fraud_result;
run;
%mend;
%fraud_check(data=library_fraud_flag);
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal |
| 2 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal |
| 3 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal |
| 4 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal |
| 5 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal |
| 6 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal |
| 7 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk |
| 8 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal |
| 9 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal |
| 10 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal |
| 11 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal |
| 12 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk |
Explanation
Macros allow SAS programmers to automate repetitive tasks.
Instead of writing fraud detection code repeatedly, we define a reusable
macro.
Macro Components
%MACRO
Defines the macro.
&data
Macro variable representing input dataset.
%MEND
Ends the macro definition.
Why Macros Are Important
Macros are heavily used in enterprise SAS environments
because they:
· automate workflows
· reduce code duplication
· improve efficiency
Key
Points
· Macros automate SAS programs
· %MACRO defines a reusable program
· &data passes dataset names
· %MEND ends macro definition
· Reduces repetitive coding
· Improves scalability
· Used heavily in production systems
· Enhances maintainability
· Supports parameterized programs
· Essential SAS programming skill
10. Creating Additional Library Dataset
data library_finance;
length Library_Name $25 Maintenance_Cost 8 Staff_Count 8;
input Library_Name $ Maintenance_Cost Staff_Count;
datalines;
Central_Library 50000 45
City_Library 45000 40
Knowledge_Hub 30000 30
Digital_Library 38000 35
Reading_Center 35000 32
Community_Library 20000 20
Metro_Library 60000 50
Urban_Library 42000 38
Study_Zone 31000 25
Smart_Library 39000 33
City_Readers 27000 22
Digital_Hub 41000 36
;
run;
proc print data=library_finance;
run;
OUTPUT:
| Obs | Library_Name | Maintenance_Cost | Staff_Count |
|---|---|---|---|
| 1 | Central_Library | 50000 | 45 |
| 2 | City_Library | 45000 | 40 |
| 3 | Knowledge_Hub | 30000 | 30 |
| 4 | Digital_Library | 38000 | 35 |
| 5 | Reading_Center | 35000 | 32 |
| 6 | Community_Library | 20000 | 20 |
| 7 | Metro_Library | 60000 | 50 |
| 8 | Urban_Library | 42000 | 38 |
| 9 | Study_Zone | 31000 | 25 |
| 10 | Smart_Library | 39000 | 33 |
| 11 | City_Readers | 27000 | 22 |
| 12 | Digital_Hub | 41000 | 36 |
Explanation
This code
creates a second dataset representing financial information about libraries.
Variables
include:
Library_Name
Maintenance_Cost
Staff_Count
This
dataset simulates information coming from a financial management system.
In real
organizations, operational data and financial data are usually stored in separate
databases.
Why This Code Is Important
To
perform meaningful analysis, data from multiple systems must be combined.
This
dataset will later be merged with the operational library dataset.
Key Points
· DATA step creates a new dataset
· LENGTH defines variable attributes
· INPUT reads financial data
· DATALINES loads observations
· Dataset represents financial metrics
· Simulates external data source
· Contains maintenance costs
· Includes staff counts
· Prepares dataset for merging
· Demonstrates multi-source data environment
11. Combining Datasets Using MERGE
proc sort data=fraud_result;
by Library_Name;
run;
proc print data=fraud_result;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal |
| 2 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal |
| 3 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal |
| 4 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal |
| 5 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk |
| 6 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal |
| 7 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal |
| 8 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk |
| 9 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal |
| 10 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal |
| 11 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal |
| 12 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal |
proc sort data=library_finance;
by Library_Name;
run;
proc print data=library_finance;
run;
OUTPUT:
| Obs | Library_Name | Maintenance_Cost | Staff_Count |
|---|---|---|---|
| 1 | Central_Library | 50000 | 45 |
| 2 | City_Library | 45000 | 40 |
| 3 | City_Readers | 27000 | 22 |
| 4 | Community_Library | 20000 | 20 |
| 5 | Digital_Hub | 41000 | 36 |
| 6 | Digital_Library | 38000 | 35 |
| 7 | Knowledge_Hub | 30000 | 30 |
| 8 | Metro_Library | 60000 | 50 |
| 9 | Reading_Center | 35000 | 32 |
| 10 | Smart_Library | 39000 | 33 |
| 11 | Study_Zone | 31000 | 25 |
| 12 | Urban_Library | 42000 | 38 |
data library_merged;
merge fraud_result library_finance;
by Library_Name;
run;
proc print data=library_merged;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level | Maintenance_Cost | Staff_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 50000 | 45 | ||||||
| 2 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal | . | . |
| 3 | City_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 45000 | 40 | ||||||
| 4 | City_Readers | . | . | . | . | . | . | . | . | . | . | . | . | . | 27000 | 22 | ||||||
| 5 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal | . | . |
| 6 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal | . | . |
| 7 | Community_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 20000 | 20 | ||||||
| 8 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal | . | . |
| 9 | Digital_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 41000 | 36 | ||||||
| 10 | Digital_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 38000 | 35 | ||||||
| 11 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk | . | . |
| 12 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal | . | . |
| 13 | Knowledge_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 30000 | 30 | ||||||
| 14 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal | . | . |
| 15 | Metro_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 60000 | 50 | ||||||
| 16 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk | . | . |
| 17 | Reading_Center | . | . | . | . | . | . | . | . | . | . | . | . | . | 35000 | 32 | ||||||
| 18 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal | . | . |
| 19 | Smart_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 39000 | 33 | ||||||
| 20 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal | . | . |
| 21 | Study_Zone | . | . | . | . | . | . | . | . | . | . | . | . | . | 31000 | 25 | ||||||
| 22 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal | . | . |
| 23 | Urban_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 42000 | 38 | ||||||
| 24 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal | . | . |
Explanation
The MERGE statement combines datasets based on a common
variable.
Here the common variable is:
Library_Name
Before
merging, both datasets must be sorted by the BY variable.
Why Sorting Is Required
SAS
requires datasets to be sorted before performing a merge using BY statements.
If
sorting is skipped, SAS may produce incorrect results.
Result
The
merged dataset now contains:
Operational
Data
+
Financial Data
This
allows analysts to study relationships between library usage and financial
resources.
Key Points
· PROC SORT organizes datasets
· BY statement defines merge key
· MERGE combines datasets
· Sorting ensures correct matching
· Enables multi-source integration
· Adds financial variables
· Creates richer dataset
· Supports advanced analysis
· Common technique in data engineering
· Critical skill for SAS programmers
12. Using SET Statement
data library_copy;
set library_merged;
run;
proc print data=library_copy;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level | Maintenance_Cost | Staff_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 50000 | 45 | ||||||
| 2 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal | . | . |
| 3 | City_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 45000 | 40 | ||||||
| 4 | City_Readers | . | . | . | . | . | . | . | . | . | . | . | . | . | 27000 | 22 | ||||||
| 5 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal | . | . |
| 6 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal | . | . |
| 7 | Community_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 20000 | 20 | ||||||
| 8 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal | . | . |
| 9 | Digital_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 41000 | 36 | ||||||
| 10 | Digital_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 38000 | 35 | ||||||
| 11 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk | . | . |
| 12 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal | . | . |
| 13 | Knowledge_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 30000 | 30 | ||||||
| 14 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal | . | . |
| 15 | Metro_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 60000 | 50 | ||||||
| 16 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk | . | . |
| 17 | Reading_Center | . | . | . | . | . | . | . | . | . | . | . | . | . | 35000 | 32 | ||||||
| 18 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal | . | . |
| 19 | Smart_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 39000 | 33 | ||||||
| 20 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal | . | . |
| 21 | Study_Zone | . | . | . | . | . | . | . | . | . | . | . | . | . | 31000 | 25 | ||||||
| 22 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal | . | . |
| 23 | Urban_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 42000 | 38 | ||||||
| 24 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal | . | . |
Explanation
The SET
statement reads observations from an existing dataset and creates a new
dataset.
Here we
create a copy of the merged dataset.
Why This Is Useful
Sometimes
analysts want to:
· preserve the original dataset
· create working copies for transformations
The SET statement allows safe data manipulation.
Key Points
· SET reads existing datasets
· Creates new dataset copies
· Prevents modification of original data
· Useful for data transformations
· Supports sequential data processing
· Allows derived variable creation
· Enables data pipeline design
· Maintains data safety
· Commonly used in SAS workflows
· Fundamental SAS concept
13. Appending New Records
data new_library;
length Library_Name $25 City $20 Utilization_Class $15;
input Library_Name $ City $ Books_Count Members Digital_Usage
Operating_Cost Service_Score Percentage Fees Join_Date :ddmmyy10.;
datalines;
Future_Library Hyderabad 30000 1000 600 150000 7.5 75 150 01/05/2023
;
run;
proc print data=new_library;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Future_Library | Hyderabad | 30000 | 1000 | 600 | 150000 | 7.5 | 75 | 150 | 23131 |
proc append base=library_copy
data=new_library;
run;
proc print data=library_copy;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level | Maintenance_Cost | Staff_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 50000 | 45 | ||||||
| 2 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal | . | . |
| 3 | City_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 45000 | 40 | ||||||
| 4 | City_Readers | . | . | . | . | . | . | . | . | . | . | . | . | . | 27000 | 22 | ||||||
| 5 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal | . | . |
| 6 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal | . | . |
| 7 | Community_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 20000 | 20 | ||||||
| 8 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal | . | . |
| 9 | Digital_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 41000 | 36 | ||||||
| 10 | Digital_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 38000 | 35 | ||||||
| 11 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk | . | . |
| 12 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal | . | . |
| 13 | Knowledge_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 30000 | 30 | ||||||
| 14 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal | . | . |
| 15 | Metro_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 60000 | 50 | ||||||
| 16 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk | . | . |
| 17 | Reading_Center | . | . | . | . | . | . | . | . | . | . | . | . | . | 35000 | 32 | ||||||
| 18 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal | . | . |
| 19 | Smart_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 39000 | 33 | ||||||
| 20 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal | . | . |
| 21 | Study_Zone | . | . | . | . | . | . | . | . | . | . | . | . | . | 31000 | 25 | ||||||
| 22 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal | . | . |
| 23 | Urban_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 42000 | 38 | ||||||
| 24 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal | . | . |
| 25 | Future_Library | Hyderabad | 30000 | 1000 | 600 | 150000 | 7.5 | 75 | 150 | 01/05/2023 | . | . | . | . | . | . | . |
Sometimes
new data arrives after a dataset is already created.
The PROC
APPEND procedure allows new records to be added to an existing dataset.
Instead
of rebuilding the entire dataset, SAS simply adds the new observations.
Why APPEND Is Important
In
production systems, datasets grow continuously.
APPEND
provides a fast and efficient way to add new records.
Key Points
· PROC APPEND adds records to dataset
· BASE specifies existing dataset
· DATA specifies new dataset
· Faster than rebuilding dataset
· Efficient for large datasets
· Used in ETL pipelines
· Supports incremental data loading
· Prevents unnecessary recomputation
· Widely used in production systems
· Important SAS procedure
14. Using COALESCE Function
data library_coalesce;
set library_copy;
Members_Final = coalesce(Members,0);
run;
proc print data=library_coalesce;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag | Fraud_Level | Maintenance_Cost | Staff_Count | Members_Final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 50000 | 45 | 0 | ||||||
| 2 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 | Normal | . | . | 1500 |
| 3 | City_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 45000 | 40 | 0 | ||||||
| 4 | City_Readers | . | . | . | . | . | . | . | . | . | . | . | . | . | 27000 | 22 | 0 | ||||||
| 5 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 | Normal | . | . | 1400 |
| 6 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 | Normal | . | . | 850 |
| 7 | Community_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 20000 | 20 | 0 | ||||||
| 8 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 | Normal | . | . | 800 |
| 9 | Digital_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 41000 | 36 | 0 | ||||||
| 10 | Digital_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 38000 | 35 | 0 | ||||||
| 11 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 01JAN2020 | 3 | 05APR2024 | . | 1 | High Risk | . | . | 0 |
| 12 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 | Normal | . | . | 1100 |
| 13 | Knowledge_Hub | . | . | . | . | . | . | . | . | . | . | . | . | . | 30000 | 30 | 0 | ||||||
| 14 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 | Normal | . | . | 900 |
| 15 | Metro_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 60000 | 50 | 0 | ||||||
| 16 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 | High Risk | . | . | 1800 |
| 17 | Reading_Center | . | . | . | . | . | . | . | . | . | . | . | . | . | 35000 | 32 | 0 | ||||||
| 18 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 | Normal | . | . | 1000 |
| 19 | Smart_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 39000 | 33 | 0 | ||||||
| 20 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 | Normal | . | . | 1100 |
| 21 | Study_Zone | . | . | . | . | . | . | . | . | . | . | . | . | . | 31000 | 25 | 0 | ||||||
| 22 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 | Normal | . | . | 900 |
| 23 | Urban_Library | . | . | . | . | . | . | . | . | . | . | . | . | . | 42000 | 38 | 0 | ||||||
| 24 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 | Normal | . | . | 1300 |
| 25 | Future_Library | Hyderabad | 30000 | 1000 | 600 | 150000 | 7.5 | 75 | 150 | 01/05/2023 | . | . | . | . | . | . | . | 1000 |
Explanation
The COALESCE
function replaces missing values with the first non-missing value.
In this
example:
If Members
is missing, it will be replaced with 0.
Why This Is Important
Missing
values can cause problems in calculations.
For
example:
Average
membership calculations may become incorrect.
Replacing
missing values ensures stable analysis results.
Key Points
·
COALESCE handles missing values
·
Returns first non-missing value
·
Improves dataset completeness
·
Prevents calculation errors
·
Useful in large datasets
·
Helps data validation
·
Supports reliable analytics
·
Reduces null value problems
·
Commonly used numeric function
·
Essential for data cleaning
15. Transposing Dataset
proc transpose data=library_coalesce out=library_transpose;
var Books_Count Members Digital_Usage;
run;
proc print data=library_transpose;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 | COL18 | COL19 | COL20 | COL21 | COL22 | COL23 | COL24 | COL25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Books_Count | . | 50000 | . | . | 45000 | 0 | . | 20000 | . | . | 45000 | 40000 | . | 30000 | . | 60000 | . | 35000 | . | 38000 | . | . | . | 42000 | 30000 |
| 2 | Members | . | 1500 | . | . | 1400 | 850 | . | 800 | . | . | . | 1100 | . | 900 | . | 1800 | . | 1000 | . | 1100 | . | 900 | . | 1300 | 1000 |
| 3 | Digital_Usage | . | 900 | . | . | 800 | 550 | . | 450 | . | . | 900 | 700 | . | 500 | . | 1200 | . | 600 | . | 700 | . | 600 | . | 750 | 600 |
Explanation
Sometimes
datasets must be restructured for reporting or visualization.
PROC
TRANSPOSE converts rows into columns.
Example
transformation:
|
Variable |
Value |
|
Books_Count |
|
|
Members |
|
|
Digital_Usage |
becomes:
| Column1
| Column2 | Column3 |
Why This Is Important
Transposing
data helps when creating:
·
reports
·
dashboards
·
statistical summaries
Key Points
·
PROC TRANSPOSE restructures datasets
·
Converts rows to columns
·
Useful for reports
·
Helps data visualization
·
Simplifies analysis
·
Supports pivot-style datasets
·
Enables flexible formatting
·
Used in reporting systems
·
Useful for summary tables
·
Important SAS procedure
16. Deleting Temporary Datasets
proc datasets library=work;
delete library_copy library_transpose new_library;
run;
quit;
LOG:
Explanation
During
data processing, many temporary datasets are created.
These
datasets consume memory and storage.
PROC
DATASETS allows programmers to delete unnecessary datasets.
Why This Is Important
Efficient
memory management is essential in large data environments.
Deleting
unused datasets keeps the SAS workspace clean and efficient.
Key Points
·
PROC DATASETS manages datasets
·
DELETE removes datasets
·
WORK library stores temporary datasets
·
Helps memory management
·
Keeps environment clean
·
Improves processing efficiency
·
Reduces clutter
·
Important in production workflows
·
Prevents storage overflow
·
Essential data housekeeping step
17. Complete Corrected SAS Program
data library_final;
set library_raw;
Library_Name = propcase(strip(Library_Name));
City = propcase(strip(City));
City_Upper = upcase(City);
City_Lower = lowcase(City);
Library_Label = catx('-',Library_Name,City);
if Books_Count < 0 then Books_Count = .;
if Operating_Cost < 0 then Operating_Cost = .;
Members_Final = coalesce(Members,0);
Open_Date = mdy(1,1,2020);
Years_Active = intck('year', Open_Date, Join_Date);
Next_Audit_Date = intnx('year', Join_Date, 1, 'same');
Usage_Ratio = Digital_Usage / Members_Final;
if Usage_Ratio >=0.6 then Utilization_Class="High";
else if Usage_Ratio >=0.5 then Utilization_Class="Medium";
else Utilization_Class="Low";
Fraud_Flag = 0;
if Digital_Usage > Members_Final then Fraud_Flag=1;
if Books_Count > 50000 then Fraud_Flag=1;
if Percentage > 85 then Fraud_Flag=1;
format Open_Date Next_Audit_Date date9.;
run;
proc print data=library_final;
run;
OUTPUT:
| Obs | Library_Name | City | Utilization_Class | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Percentage | Fees | Join_Date | City_Upper | City_Lower | Library_Label | Members_Final | Open_Date | Years_Active | Next_Audit_Date | Usage_Ratio | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Central_library | Hyderabad | High | 50000 | 1500 | 900 | 200000 | 8.5 | 85 | 200 | 12/05/2020 | HYDERABAD | hyderabad | Central_library-Hyderabad | 1500 | 01JAN2020 | 0 | 12MAY2021 | 0.60000 | 0 |
| 2 | City_library | Hyderabad | Medium | 45000 | 1400 | 800 | 190000 | 8.0 | 80 | 180 | 15/06/2020 | HYDERABAD | hyderabad | City_library-Hyderabad | 1400 | 01JAN2020 | 0 | 15JUN2021 | 0.57143 | 0 |
| 3 | Knowledge_hub | Vizag | Medium | 30000 | 900 | 500 | 150000 | 7.2 | 72 | 150 | 10/07/2021 | VIZAG | vizag | Knowledge_hub-Vizag | 900 | 01JAN2020 | 1 | 10JUL2022 | 0.55556 | 0 |
| 4 | Digital_library | Chennai | High | 40000 | 1100 | 700 | 175000 | 7.8 | 78 | 170 | 20/08/2021 | CHENNAI | chennai | Digital_library-Chennai | 1100 | 01JAN2020 | 1 | 20AUG2022 | 0.63636 | 0 |
| 5 | Reading_center | Bangalore | High | 35000 | 1000 | 600 | 165000 | 7.5 | 75 | 160 | 11/09/2021 | BANGALORE | bangalore | Reading_center-Bangalore | 1000 | 01JAN2020 | 1 | 11SEP2022 | 0.60000 | 0 |
| 6 | Community_library | Pune | Medium | 20000 | 800 | 450 | 120000 | 6.8 | 68 | 120 | 05/10/2021 | PUNE | pune | Community_library-Pune | 800 | 01JAN2020 | 1 | 05OCT2022 | 0.56250 | 0 |
| 7 | Metro_library | Delhi | High | 60000 | 1800 | 1200 | 250000 | 9.0 | 90 | 220 | 17/11/2022 | DELHI | delhi | Metro_library-Delhi | 1800 | 01JAN2020 | 2 | 17NOV2023 | 0.66667 | 1 |
| 8 | Urban_library | Kolkata | Medium | 42000 | 1300 | 750 | 185000 | 8.1 | 81 | 190 | 22/12/2022 | KOLKATA | kolkata | Urban_library-Kolkata | 1300 | 01JAN2020 | 2 | 22DEC2023 | 0.57692 | 0 |
| 9 | Study_zone | Hyderabad | High | . | 900 | 600 | 160000 | 7.3 | 73 | 160 | 01/01/2023 | HYDERABAD | hyderabad | Study_zone-Hyderabad | 900 | 01JAN2020 | 3 | 01JAN2024 | 0.66667 | 0 |
| 10 | Smart_library | Bangalore | High | 38000 | 1100 | 700 | . | 7.6 | 76 | 170 | 14/02/2023 | BANGALORE | bangalore | Smart_library-Bangalore | 1100 | 01JAN2020 | 3 | 14FEB2024 | 0.63636 | 0 |
| 11 | City_readers | Chennai | High | 0 | 850 | 550 | 150000 | 7.0 | 70 | 150 | 20/03/2023 | CHENNAI | chennai | City_readers-Chennai | 850 | 01JAN2020 | 3 | 20MAR2024 | 0.64706 | 0 |
| 12 | Digital_hub | Vizag | Low | 45000 | . | 900 | 200000 | 8.3 | 83 | 200 | 05/04/2023 | VIZAG | vizag | Digital_hub-Vizag | 0 | 01JAN2020 | 3 | 05APR2024 | . | 1 |
Intentional Errors Introduced In The Raw Dataset
When the raw dataset library_raw was created earlier, several errors
were intentionally inserted to simulate real-world data problems.
Below are the errors.
Error
1 — Negative Books Count
Example:
Study_Zone -30000
Problem:
A library cannot have a negative number of
books.
Reason for error:
Manual data entry mistake or corrupted system
record.
Correction logic:
if Books_Count < 0 then Books_Count = .;
This replaces invalid values with missing values.
Error
2 — Negative Operating Cost
Example:
Smart_Library Operating_Cost = -170000
Problem:
Operating costs cannot be negative.
Possible reason:
Financial system import error.
Correction:
if Operating_Cost < 0 then Operating_Cost = .;
Error
3 — Missing Members Value
Example:
Digital_Hub Members = .
Problem:
Membership information missing.
Correction using COALESCE:
Members_Final = coalesce(Members,0);
This replaces missing values with 0.
Error
4 — Zero Books Count
Example:
City_Readers Books_Count = 0
Problem:
Libraries normally contain at least some
books.
Possible reason:
Data import failure.
Correction logic may flag such records for manual verification.
Error
5 — Inconsistent City Capitalization
Examples:
hyderabadHyderabadHYDERABAD
Problem:
Inconsistent text formatting.
Correction:
City = propcase(strip(City));
This standardizes city names.
Error
6 — Fraudulent Usage Numbers
Example condition:
Digital_Usage > Members
Problem:
Digital usage cannot exceed total membership.
Correction logic:
Flag suspicious records using Fraud_Flag variable.
Final Dataset Utilization
The library_final
dataset can be used for several analyses.
Examples:
Library Performance Analysis
Compare service scores across cities.
Cost Efficiency Analysis
Evaluate operating costs relative to
membership size.
Digital Engagement Analysis
Measure digital usage levels.
Fraud Detection
Identify suspicious records.
Strategic Planning
Plan future library expansion based on utilization patterns.
Explanation
This code performs complete data cleaning and fraud detection.
The main steps include:
·
Character standardization
·
Numeric error correction
·
Missing value handling
·
Date variable creation
·
Utilization classification
·
Fraud detection
This
integrated program represents a real-world SAS data processing pipeline.
Why This Final Code Is
Important
In real
analytics environments, SAS programmers must combine multiple tasks in a single
workflow:
·
data cleaning
·
validation
·
transformation
·
fraud detection
This final program demonstrates how those steps can be integrated into one
process.
·
The project simulates a public library
data management system.
·
A raw dataset with intentional errors
was created.
·
Errors included negative values and
missing data.
·
Character variables were standardized
using PROPCASE and STRIP.
·
Text case conversions were performed
using UPCASE and LOWCASE.
·
New identifiers were created using CATX.
·
Numeric validation removed invalid
values.
·
Missing membership values were handled
using COALESCE.
·
Date variables were generated using MDY.
·
Library age was calculated using INTCK.
·
Future audit dates were scheduled using
INTNX.
·
Digital service usage ratios were
calculated.
·
Libraries were classified into
utilization categories.
·
Fraud detection rules were implemented.
·
Suspicious records were flagged
automatically.
·
Multiple datasets were combined using
MERGE.
·
Additional records were added using PROC
APPEND.
·
Data structure was reshaped using PROC
TRANSPOSE.
·
Temporary datasets were removed using
PROC DATASETS DELETE.
·
The final dataset supports advanced
library analytics.
Project Summary
This project demonstrates how SAS
programming can be used to manage and validate operational data in public
library systems.
The project began with a raw dataset
containing several intentional errors to simulate real-world data quality
problems. These included negative numeric values, missing membership counts,
inconsistent capitalization, and suspicious digital usage statistics.
Through a series of SAS programs, the dataset
was cleaned and transformed using a combination of:
·
character functions
·
numeric validation rules
·
date functions
·
dataset merging
·
macros
·
fraud detection logic
The final dataset provides a reliable foundation for analyzing library
operations, evaluating digital engagement, monitoring operational costs, and
identifying suspicious reporting patterns.
This type of data pipeline is common in modern
analytics environments where data quality and governance are critical for
accurate decision-making.
Conclusion
Accurate data is essential for the effective
management of public services such as libraries. Poor data quality can lead to
incorrect funding decisions, inefficient resource allocation, and misleading
performance evaluations.
This project illustrated how advanced SAS
programming techniques can be applied to detect and correct errors in
operational datasets. By combining data cleaning techniques, date processing
functions, character manipulation functions, and automated fraud detection
rules, SAS programmers can transform unreliable raw data into a trustworthy
analytical dataset.
The approach demonstrated in this project
mirrors real-world analytics workflows used by organizations to maintain data
integrity and transparency. With proper validation processes and automated
monitoring systems, institutions can ensure that their operational data remains
accurate, reliable, and suitable for strategic decision-making.
SAS INTERVIEW QUESTIONS
1.
What is a Cartesian join?
Answer:
A Cartesian join occurs when every row of
one table joins with every row of another table.
2.
What is DISTINCT used for?
Answer:
DISTINCT removes duplicate rows from
query results in Sql.
3.
What is the difference between UNION and OUTER UNION?
Answer:
UNION removes duplicates, while OUTER UNION keeps all rows including duplicates.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent PUBLIC LIBRARY data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment