388.From Booth to Ballot: A Complete SAS Analytics System for Election Monitoring
From Booth to Ballot: A Complete SAS Analytics System for Election Monitoring
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL | PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | APPEND | PROC SORT | MERGE | PROC DATASETS DELETE | DATA FUNCTIONS
INTRODUCTION
In modern elections, managing voting booths efficiently is very important to ensure fair and smooth voting. Election authorities need to know how many voters are registered, how many actually voted, how much time voters waited, and whether there were any equipment problems at each booth.
This project is created to analyze voting booth data using SAS. The main aim is to understand voter turnout, identify poorly performing booths, detect possible fraud, and support better planning for future elections.
By using different SAS procedures like PROC SQL, PROC MEANS, PROC FREQ, PROC UNIVARIATE, PROC CORR, and PROC SGPLOT, along with macros and date functions, this project shows how real election data can be cleaned, analyzed, and converted into useful business insights for decision making.
The final outcome of this project helps election officials improve voter experience, reduce waiting time, prevent misuse of voting systems, and increase public trust in the election process.
Table Of Contents
- Business Context
- Project Contents
- Dataset Creation
- Data Enrichment
- PROC SQL Analytics
- PROC FREQ / MEANS / UNIVARIATE
- Correlation Analysis
- Visualization (SGPLOT)
- Utilization Classification Macro
- Fraud Detection Macro
- Advanced Date Logic
- Transpose / Merge / Append
- Character & Numeric Functions
- PROC DATASETS Cleanup
- Final Business Interpretation
1. BUSINESS CONTEXT
Election commissions want to monitor voting booth performance to ensure:
- Voter turnout is healthy
- Waiting times are reasonable
- Equipment issues are minimal
- Fraud or abnormal behavior is detected
- Booth utilization is optimized
This project simulates a national election monitoring system where data analysts use SAS to:
- Track turnout trends
- Classify booth utilization
- Detect suspicious booths
- Produce management dashboards
2. PROJECT CONTENTS
Here We Use:
- PROC SQL – data creation & analytics
- PROC MEANS / UNIVARIATE – statistics
- PROC FREQ – categorical analysis
- PROC CORR – relationships
- PROC SGPLOT – visualization
- Macros – utilization & fraud logic
- Date functions – MDY, INTCK, INTNX
- Character functions – STRIP, CATX, PROPCASE
- Numeric functions – ROUND, CEIL, FLOOR
- SET / MERGE / APPEND / TRANSPOSE
- PROC DATASETS DELETE – cleanup
3. DATASET CREATION
data voting_booths;
input Booth_ID $ Region $ Registered_Voters Votes_Cast
Waiting_Time Equipment_Issues Election_Date :date9.;
format Election_Date date9.;
datalines;
B001 North 1200 950 18 2 01JAN2025
B002 North 1500 1450 25 5 01JAN2025
B003 South 900 870 12 1 01JAN2025
B004 South 1100 600 40 7 01JAN2025
B005 East 1300 1250 15 0 01JAN2025
B006 East 1400 900 35 6 01JAN2025
B007 West 1000 980 10 0 01JAN2025
B008 West 1600 800 50 9 01JAN2025
B009 Central 1800 1700 20 2 01JAN2025
B010 Central 2000 1950 22 1 01JAN2025
B011 North 1700 800 55 10 01JAN2025
B012 South 950 930 14 0 01JAN2025
B013 East 1250 400 60 12 01JAN2025
B014 West 1400 1350 18 1 01JAN2025
B015 Central 1600 1580 16 0 01JAN2025
B016 North 1000 990 11 0 01JAN2025
;
run;
proc print data=voting_booths;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date |
|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 |
4. DATA ENRICHMENT
data voting_enriched;
set voting_booths;
Turnout_Rate = round((Votes_Cast/Registered_Voters)*100, 0.01);
Booth_Label = catx('-', strip(Region), Booth_ID);
run;
proc print data=voting_enriched;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label |
|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 |
5. PROC SQL ANALYTICS
proc sql;
create table region_summary as
select Region,
count(*) as Booth_Count,
avg(Turnout_Rate) as Avg_Turnout,
avg(Waiting_Time) as Avg_Wait,
sum(Equipment_Issues) as Total_Issues
from voting_enriched
group by Region;
quit;
proc print data=region_summary;
run;
OUTPUT:
| Obs | Region | Booth_Count | Avg_Turnout | Avg_Wait | Total_Issues |
|---|---|---|---|---|---|
| 1 | Central | 3 | 96.8967 | 19.3333 | 3 |
| 2 | East | 3 | 64.1467 | 36.6667 | 18 |
| 3 | North | 4 | 80.4750 | 27.2500 | 17 |
| 4 | South | 3 | 83.0367 | 22.0000 | 8 |
| 5 | West | 3 | 81.4767 | 26.0000 | 10 |
6. PROC FREQ
proc freq data=voting_enriched;
tables Region Equipment_Issues;
run;
OUTPUT:
The FREQ Procedure
| Region | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Central | 3 | 18.75 | 3 | 18.75 |
| East | 3 | 18.75 | 6 | 37.50 |
| North | 4 | 25.00 | 10 | 62.50 |
| South | 3 | 18.75 | 13 | 81.25 |
| West | 3 | 18.75 | 16 | 100.00 |
| Equipment_Issues | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 5 | 31.25 | 5 | 31.25 |
| 1 | 3 | 18.75 | 8 | 50.00 |
| 2 | 2 | 12.50 | 10 | 62.50 |
| 5 | 1 | 6.25 | 11 | 68.75 |
| 6 | 1 | 6.25 | 12 | 75.00 |
| 7 | 1 | 6.25 | 13 | 81.25 |
| 9 | 1 | 6.25 | 14 | 87.50 |
| 10 | 1 | 6.25 | 15 | 93.75 |
| 12 | 1 | 6.25 | 16 | 100.00 |
7. PROC MEANS
proc means data=voting_enriched mean min max std;
var Votes_Cast Waiting_Time Turnout_Rate;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum | Std Dev |
|---|---|---|---|---|
Votes_Cast Waiting_Time Turnout_Rate | 1093.75 26.3125000 81.1606250 | 400.0000000 10.0000000 32.0000000 | 1950.00 60.0000000 99.0000000 | 417.1790183 16.4751075 23.2674446 |
8. PROC UNIVARIATE
proc univariate data=voting_enriched;
var Turnout_Rate;
histogram;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Turnout_Rate
| Moments | |||
|---|---|---|---|
| N | 16 | Sum Weights | 16 |
| Mean | 81.160625 | Sum Observations | 1298.57 |
| Std Deviation | 23.2674446 | Variance | 541.37398 |
| Skewness | -1.03252 | Kurtosis | -0.5143293 |
| Uncorrected SS | 113513.363 | Corrected SS | 8120.60969 |
| Coeff Variation | 28.6683902 | Std Error Mean | 5.81686116 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 81.16063 | Std Deviation | 23.26744 |
| Median | 96.29000 | Variance | 541.37398 |
| Mode | 96.67000 | Range | 67.00000 |
| Interquartile Range | 38.27500 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 13.95265 | Pr > |t| | <.0001 |
| Sign | M | 8 | Pr >= |M| | <.0001 |
| Signed Rank | S | 68 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 99.000 |
| 99% | 99.000 |
| 95% | 99.000 |
| 90% | 98.750 |
| 75% Q3 | 97.695 |
| 50% Median | 96.290 |
| 25% Q1 | 59.420 |
| 10% | 47.060 |
| 5% | 32.000 |
| 1% | 32.000 |
| 0% Min | 32.000 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 32.00 | 13 | 97.50 | 10 |
| 47.06 | 11 | 97.89 | 12 |
| 50.00 | 8 | 98.00 | 7 |
| 54.55 | 4 | 98.75 | 15 |
| 64.29 | 6 | 99.00 | 16 |
The UNIVARIATE Procedure
9. CORRELATION
proc corr data=voting_enriched;
var Waiting_Time Equipment_Issues Turnout_Rate;
run;
OUTPUT:
The CORR Procedure
| 3 Variables: | Waiting_Time Equipment_Issues Turnout_Rate |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Waiting_Time | 16 | 26.31250 | 16.47511 | 421.00000 | 10.00000 | 60.00000 |
| Equipment_Issues | 16 | 3.50000 | 4.08248 | 56.00000 | 0 | 12.00000 |
| Turnout_Rate | 16 | 81.16063 | 23.26744 | 1299 | 32.00000 | 99.00000 |
| Pearson Correlation Coefficients, N = 16 Prob > |r| under H0: Rho=0 | |||
|---|---|---|---|
| Waiting_Time | Equipment_Issues | Turnout_Rate | |
| Waiting_Time | 1.00000 | 0.98078 <.0001 | -0.95566 <.0001 |
| Equipment_Issues | 0.98078 <.0001 | 1.00000 | -0.94863 <.0001 |
| Turnout_Rate | -0.95566 <.0001 | -0.94863 <.0001 | 1.00000 |
10. VISUALIZATION
proc sgplot data=voting_enriched;
scatter x=Waiting_Time y=Turnout_Rate;
run;
OUTPUT:
11. UTILIZATION CLASSIFICATION MACRO
%macro utilization;
data voting_util;
set voting_enriched;
length Utilization $12.;
if Turnout_Rate >= 90 then Utilization='High';
else if Turnout_Rate >= 70 then Utilization='Medium';
else Utilization='Low';
run;
proc print data=voting_util;
run;
%mend;
%utilization;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High |
12. FRAUD DETECTION MACRO
%macro fraud_check;
data fraud_flags;
set voting_util;
if Turnout_Rate > 98 and Equipment_Issues > 5 then Fraud_Flag='Yes';
else if Waiting_Time > 50 and Votes_Cast < 500 then Fraud_Flag='Yes';
else Fraud_Flag='No';
run;
proc print data=fraud_flags;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No |
13. DATE FUNCTIONS
data election_dates;
set fraud_flags;
Next_Election = intnx('year', Election_Date, 5);
Days_Since = intck('day', Election_Date, today());
run;
proc print data=election_dates;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag | Next_Election | Days_Since |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No | 25568 | 396 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No | 25568 | 396 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No | 25568 | 396 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No | 25568 | 396 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No | 25568 | 396 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No | 25568 | 396 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No | 25568 | 396 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No | 25568 | 396 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No | 25568 | 396 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No | 25568 | 396 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No | 25568 | 396 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No | 25568 | 396 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes | 25568 | 396 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No | 25568 | 396 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No | 25568 | 396 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No | 25568 | 396 |
14. TRANSPOSE
proc transpose data=region_summary out=region_transposed;
by Region NotSorted;
run;
proc print data=region_transposed;
run;
OUTPUT:
| Obs | Region | _NAME_ | COL1 |
|---|---|---|---|
| 1 | Central | Booth_Count | 3.0000 |
| 2 | Central | Avg_Turnout | 96.8967 |
| 3 | Central | Avg_Wait | 19.3333 |
| 4 | Central | Total_Issues | 3.0000 |
| 5 | East | Booth_Count | 3.0000 |
| 6 | East | Avg_Turnout | 64.1467 |
| 7 | East | Avg_Wait | 36.6667 |
| 8 | East | Total_Issues | 18.0000 |
| 9 | North | Booth_Count | 4.0000 |
| 10 | North | Avg_Turnout | 80.4750 |
| 11 | North | Avg_Wait | 27.2500 |
| 12 | North | Total_Issues | 17.0000 |
| 13 | South | Booth_Count | 3.0000 |
| 14 | South | Avg_Turnout | 83.0367 |
| 15 | South | Avg_Wait | 22.0000 |
| 16 | South | Total_Issues | 8.0000 |
| 17 | West | Booth_Count | 3.0000 |
| 18 | West | Avg_Turnout | 81.4767 |
| 19 | West | Avg_Wait | 26.0000 |
| 20 | West | Total_Issues | 10.0000 |
15. MERGE & APPEND
proc sort data=voting_util;by Booth_ID;run;
proc print data=voting_util;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High |
proc sort data=fraud_flags;by Booth_ID;run;
proc print data=fraud_flags;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No |
data final_merge;
merge voting_util
fraud_flags;
by Booth_ID;
run;
proc print data=election_dates;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag | Next_Election | Days_Since |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No | 25568 | 396 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No | 25568 | 396 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No | 25568 | 396 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No | 25568 | 396 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No | 25568 | 396 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No | 25568 | 396 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No | 25568 | 396 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No | 25568 | 396 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No | 25568 | 396 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No | 25568 | 396 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No | 25568 | 396 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No | 25568 | 396 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes | 25568 | 396 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No | 25568 | 396 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No | 25568 | 396 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No | 25568 | 396 |
proc append base=voting_booths
data=fraud_flags(where=(Fraud_Flag='Yes')) force;
run;
proc print data=election_dates;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag | Next_Election | Days_Since |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No | 25568 | 396 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No | 25568 | 396 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No | 25568 | 396 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No | 25568 | 396 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No | 25568 | 396 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No | 25568 | 396 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No | 25568 | 396 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No | 25568 | 396 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No | 25568 | 396 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No | 25568 | 396 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No | 25568 | 396 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No | 25568 | 396 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes | 25568 | 396 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No | 25568 | 396 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No | 25568 | 396 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No | 25568 | 396 |
16. CHARACTER FUNCTIONS
data char_demo;
set final_merge;
Region_Upper = upcase(Region);
Region_Lower = lowcase(Region);
Region_Proper = propcase(Region);
run;
proc print data=char_demo;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag | Region_Upper | Region_Lower | Region_Proper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No | NORTH | north | North |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No | NORTH | north | North |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No | SOUTH | south | South |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No | SOUTH | south | South |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No | EAST | east | East |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No | EAST | east | East |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No | WEST | west | West |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No | WEST | west | West |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No | CENTRAL | central | Central |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No | CENTRAL | central | Central |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No | NORTH | north | North |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No | SOUTH | south | South |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes | EAST | east | East |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No | WEST | west | West |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No | CENTRAL | central | Central |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No | NORTH | north | North |
17. NUMERIC FUNCTIONS
data numeric_demo;
set final_merge;
Wait_Hours = round(Waiting_Time/60,0.01);
Issues_Ceil = ceil(Equipment_Issues/2);
run;
proc print data=numeric_demo;
run;
OUTPUT:
| Obs | Booth_ID | Region | Registered_Voters | Votes_Cast | Waiting_Time | Equipment_Issues | Election_Date | Turnout_Rate | Booth_Label | Utilization | Fraud_Flag | Wait_Hours | Issues_Ceil |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B001 | North | 1200 | 950 | 18 | 2 | 01JAN2025 | 79.17 | North-B001 | Medium | No | 0.30 | 1 |
| 2 | B002 | North | 1500 | 1450 | 25 | 5 | 01JAN2025 | 96.67 | North-B002 | High | No | 0.42 | 3 |
| 3 | B003 | South | 900 | 870 | 12 | 1 | 01JAN2025 | 96.67 | South-B003 | High | No | 0.20 | 1 |
| 4 | B004 | South | 1100 | 600 | 40 | 7 | 01JAN2025 | 54.55 | South-B004 | Low | No | 0.67 | 4 |
| 5 | B005 | East | 1300 | 1250 | 15 | 0 | 01JAN2025 | 96.15 | East-B005 | High | No | 0.25 | 0 |
| 6 | B006 | East | 1400 | 900 | 35 | 6 | 01JAN2025 | 64.29 | East-B006 | Low | No | 0.58 | 3 |
| 7 | B007 | West | 1000 | 980 | 10 | 0 | 01JAN2025 | 98.00 | West-B007 | High | No | 0.17 | 0 |
| 8 | B008 | West | 1600 | 800 | 50 | 9 | 01JAN2025 | 50.00 | West-B008 | Low | No | 0.83 | 5 |
| 9 | B009 | Central | 1800 | 1700 | 20 | 2 | 01JAN2025 | 94.44 | Central-B009 | High | No | 0.33 | 1 |
| 10 | B010 | Central | 2000 | 1950 | 22 | 1 | 01JAN2025 | 97.50 | Central-B010 | High | No | 0.37 | 1 |
| 11 | B011 | North | 1700 | 800 | 55 | 10 | 01JAN2025 | 47.06 | North-B011 | Low | No | 0.92 | 5 |
| 12 | B012 | South | 950 | 930 | 14 | 0 | 01JAN2025 | 97.89 | South-B012 | High | No | 0.23 | 0 |
| 13 | B013 | East | 1250 | 400 | 60 | 12 | 01JAN2025 | 32.00 | East-B013 | Low | Yes | 1.00 | 6 |
| 14 | B014 | West | 1400 | 1350 | 18 | 1 | 01JAN2025 | 96.43 | West-B014 | High | No | 0.30 | 1 |
| 15 | B015 | Central | 1600 | 1580 | 16 | 0 | 01JAN2025 | 98.75 | Central-B015 | High | No | 0.27 | 0 |
| 16 | B016 | North | 1000 | 990 | 11 | 0 | 01JAN2025 | 99.00 | North-B016 | High | No | 0.18 | 0 |
18. PROC DATASETS DELETE
proc datasets library=work nolist;
delete char_demo numeric_demo;
quit;
LOG:
FINAL BUSINESS INTERPRETATION
This system allows election authorities to:
1. Identify high performing booths
2. Detect fraud-prone locations
3. Optimize staff allocation
4. Reduce waiting time
5. Improve voter satisfaction
6. Schedule future elections
7. Generate automated dashboards
THIS PROJECT COVERED
This single project demonstrates:
Skill | Covered |
Data Creation | YES |
SQL Analytics | YES |
Statistics | YES |
Visualization | YES |
Macros | YES |
Fraud Logic | YES |
Date Handling | YES |
Cleaning | YES |
Reporting | YES |
Real Business Use | YES |
CONCLUSION
This project successfully demonstrates how voting booth data can be analyzed using SAS to improve election management. By studying variables like registered voters, votes cast, waiting time, equipment issues, and turnout rate, we can clearly understand how each booth is performing.
Using different SAS procedures such as PROC SQL, PROC MEANS, PROC FREQ, PROC UNIVARIATE, PROC CORR, and PROC SGPLOT, the data was converted into meaningful reports and visualizations. Macros were used to classify booth utilization and to detect possible fraud, which helps in identifying risky or underperforming booths.
Overall, this project shows that SAS is a powerful tool for election analytics. It helps authorities take better decisions, reduce operational problems, improve voter satisfaction, and ensure a fair and transparent election system.
INTERVIEW QUESTIONS FOR YOU
· What is the difference between PROC MEANS and PROC SUMMARY?
· What is PROC FREQ used for?
· What is PROC UNIVARIATE?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent Election data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
· EV and energy industry professionals
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment