428.How Do SAS and R Complement Each Other in Detecting, Cleaning, and Transforming Complex Sensor Fusion Vehicle Data?
Advanced SAS and R Integration for Intelligent Error Detection and Transformation in Sensor Fusion Vehicle Data
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SAS STATEMENTS USED
DATA STEP |
LENGTH | INPUT | DATALINES | SET | MERGE | PROC SORT | PROC PRINT | PROC
CONTENTS | PROC FREQ | PROC TRANSPOSE | PROC APPEND | PROC DATASETS DELETE |
MACRO / %MACRO / %MEND | NUMERIC FUNCTIONS | CHARACTER FUNCTIONS
R EQUIVALENT STATEMENTS USED
data.frame()
/ tibble() | col_types / structure() | scan() / read.table() | inline data
creation (c(), data.frame()) | rbind() / bind_rows() | merge() / dplyr::join
functions | order() / dplyr::arrange() | print() / View() | str() / glimpse() |
table() / count() | t() / pivot_longer() / pivot_wider() | bind_rows() | rm() |
functions() | numeric functions (mean(), sum(), max()) | character functions
(trimws(), toupper(), tolower(), paste(), paste0())
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Introduction
Sensor
fusion is a critical component in modern autonomous and semi-autonomous
vehicles. It integrates data from multiple sensors such as LiDAR, Radar, and
Cameras to make real-time driving decisions. However, real-world sensor data is
noisy, inconsistent, and error-prone.
In this
project, we will:
- Create a Sensor Fusion
dataset
- Introduce intentional
errors
- Detect and fix errors using SAS
programming
- Apply data engineering
techniques
- Use macros for fraud
detection
- Perform transformations
using:
- SET, MERGE, APPEND
- PROC TRANSPOSE
- Date functions (MDY, INTCK,
INTNX)
- Character & numeric
functions
- Mirror each step in R
Table of Contents
- Business Context
- Dataset Design
- Raw Data Creation (with intentional
errors)
- Error Identification
- Data Cleaning
- Derived Variables
- Date Handling
- Utilization Classification
- Dataset Merging
- Appending Data
- Transposing Data
- Fraud Detection Macro
- Functions Usage
- Dataset Cleanup
- Final Dataset
- SAS vs R Key Points
- Summary
- Conclusion
Business Context
Automotive
companies collect millions of sensor records daily. Problems include:
- Incorrect accuracy values
(>100%)
- Missing sensor readings
- Invalid dates
- Fraudulent manipulation of
performance metrics
Goal:
Build a robust data pipeline that detects and fixes these issues.
1. Create Raw Dataset (With Intentional Errors)
SAS Code
data sensor_raw;
length Vehicle_ID $10 Date $12;
input Vehicle_ID $ Lidar_Accuracy Radar_Accuracy Camera_Confidence
Detection_Latency Obstacle_Error_Rate Fees Utilization Date $;
datalines;
V001 98 95 0.89 120 2.1 500 85 01-15-2024
V002 105 88 0.92 140 1.5 450 90 02-30-2024
V003 97 . 0.85 110 3.2 600 80 03-10-2024
V004 96 92 1.2 130 2.5 700 88 04-12-2024
V005 94 90 0.88 -10 2.0 550 75 05-20-2024
;
run;
LOG:
R Code
Errors Introduced
·
Lidar_Accuracy > 100 (invalid)
·
Radar_Accuracy missing
·
Camera_Confidence > 1
·
Negative latency
·
Invalid date (Feb 30)
2. Detect Errors
SAS Code
proc print data=sensor_raw;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization |
|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 |
| 2 | V002 | 02-30-2024 | 105 | 88 | 0.92 | 140 | 1.5 | 450 | 90 |
| 3 | V003 | 03-10-2024 | 97 | . | 0.85 | 110 | 3.2 | 600 | 80 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.20 | 130 | 2.5 | 700 | 88 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | -10 | 2.0 | 550 | 75 |
R Code
Why Used
·
Initial inspection
·
Helps identify anomalies
3. Fix Numeric Errors
SAS Code
data sensor_clean1;
set sensor_raw;
if Lidar_Accuracy > 100 then Lidar_Accuracy = 100;
if Camera_Confidence > 1 then Camera_Confidence = 1;
if Detection_Latency < 0 then Detection_Latency = .;
if Radar_Accuracy = . then Radar_Accuracy = 85;
run;
proc print data=sensor_clean1;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization |
|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 |
R Code
Key Points
·
Data correction logic
·
Handling missing values
·
Ensures realistic ranges
4. Date Conversion (MDY)
SAS Code
data sensor_clean2;
set sensor_clean1;
Date_Converted = input(Date, mmddyy10.);
format Date_Converted date9.;
run;
proc print data=sensor_clean2;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 |
R Code
Why Important
·
Converts character to date
·
Required for time analysis
5. INTCK & INTNX
SAS Code
data sensor_dates;
set sensor_clean2;
Today = today();
Days_Diff = intck('day', Date_Converted, Today);
Next_Month = intnx('month', Date_Converted, 1);
format Next_Month date9.;
run;
proc print data=sensor_dates;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 |
R Code
Explanation
·
%m+% months(1) safely adds one month
·
Handles:
·
month-end dates
·
leap years
·
invalid rollover
· INTCK → difference
·
INTNX → shifting dates
6. Character Functions
SAS Code
data sensor_char;
set sensor_dates;
Vehicle_ID_Clean = strip(Vehicle_ID);
Vehicle_Upper = upcase(Vehicle_ID);
Vehicle_Proper = propcase(Vehicle_ID);
Combined = catx('-', Vehicle_ID, put(Lidar_Accuracy, 3.));
run;
proc print data=sensor_char;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 |
R Code
7. Utilization Classification
SAS Code
data sensor_class;
set sensor_char;
length Util_Class $8.;
if Utilization >= 90 then Util_Class="High";
else if Utilization >= 80 then Util_Class="Medium";
else Util_Class="Low";
run;
proc print data=sensor_class;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low |
R Code
8. MERGE Example
SAS Code
data extra;
input Vehicle_ID $ Fees_New;
datalines;
V001 550
V002 480
;
run;
proc print data=extra;
run;
OUTPUT:
| Obs | Vehicle_ID | Fees_New |
|---|---|---|
| 1 | V001 | 550 |
| 2 | V002 | 480 |
proc sort data=sensor_class;by Vehicle_ID;run;
proc print data=sensor_class;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low |
proc sort data=extra;by Vehicle_ID;run;
proc print data=extra;
run;
OUTPUT:
| Obs | Vehicle_ID | Fees_New |
|---|---|---|
| 1 | V001 | 550 |
| 2 | V002 | 480 |
data merged;
merge sensor_class
extra;
by Vehicle_ID;
run;
proc print data=merged;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class | Fees_New |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | 550 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | 480 |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | . |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | . |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | . |
R Code
9. SET Statement
SAS Code
data combined;
set sensor_class
merged;
run;
proc print data=combined;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class | Fees_New |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | . |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | . |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | . |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | . |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | . |
| 6 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | 550 |
| 7 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | 480 |
| 8 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | . |
| 9 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | . |
| 10 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | . |
R Code
Explanation
·
Makes both datasets structurally identical
·
Satisfies rbind( ) requirements
10. APPEND Statement
SAS Code
proc append base=sensor_class
data=extra force;
run;
proc print data=sensor_class;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low |
| 6 | V001 | . | . | . | . | . | . | . | . | . | . | . | ||||||
| 7 | V002 | . | . | . | . | . | . | . | . | . | . | . |
R Code
Explanation
·
Matches columns by name
·
Adds missing columns
·
Fills with NA
11. TRANSPOSE
SAS Code
proc transpose data=sensor_class out=transposed;
var Lidar_Accuracy Radar_Accuracy;
run;
proc print data=transposed;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 |
|---|---|---|---|---|---|---|---|---|
| 1 | Lidar_Accuracy | 98 | 100 | 97 | 96 | 94 | . | . |
| 2 | Radar_Accuracy | 95 | 88 | 85 | 92 | 90 | . | . |
R Code
12. Fraud Detection Macro
SAS Code
%macro fraud_check;
data fraud_flag;
set sensor_class;
if Lidar_Accuracy = 100 and Radar_Accuracy < 50 then Fraud="Yes";
else Fraud="No";
run;
proc print data=fraud_flag;
run;
%mend;
%fraud_check;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class | Fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | No |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | No |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | No |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | No |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | No |
| 6 | V001 | . | . | . | . | . | . | . | . | . | . | . | No | ||||||
| 7 | V002 | . | . | . | . | . | . | . | . | . | . | . | No |
R Code
13. Numeric Functions
SAS Code
data numeric_calc;
set sensor_class;
Avg_Accuracy = mean(Lidar_Accuracy, Radar_Accuracy);
Max_Acc = max(Lidar_Accuracy, Radar_Accuracy);
run;
proc print data=numeric_calc;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class | Avg_Accuracy | Max_Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | 96.5 | 98 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | 94.0 | 100 |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | 91.0 | 97 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | 94.0 | 96 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | 92.0 | 94 |
| 6 | V001 | . | . | . | . | . | . | . | . | . | . | . | . | . | ||||||
| 7 | V002 | . | . | . | . | . | . | . | . | . | . | . | . | . |
R Code
14. PROC DATASETS DELETE
SAS Code
proc datasets library=work;
delete sensor_raw extra;
quit;
LOG:
R Code
15. Final Corrected Dataset (Full Code)
SAS Code
data final_dataset;
set sensor_class;
if Lidar_Accuracy > 100 then Lidar_Accuracy = 100;
if Camera_Confidence > 1 then Camera_Confidence = 1;
Date_Converted = input(Date, mmddyy10.);
format Date_Converted date9.;
Avg_Accuracy = mean(Lidar_Accuracy, Radar_Accuracy);
if Utilization >= 90 then Util_Class="High";
else if Utilization >= 80 then Util_Class="Medium";
else Util_Class="Low";
run;
proc print data=final_dataset;
run;
OUTPUT:
| Obs | Vehicle_ID | Date | Lidar_Accuracy | Radar_Accuracy | Camera_Confidence | Detection_Latency | Obstacle_Error_Rate | Fees | Utilization | Date_Converted | Today | Days_Diff | Next_Month | Vehicle_ID_Clean | Vehicle_Upper | Vehicle_Proper | Combined | Util_Class | Avg_Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V001 | 01-15-2024 | 98 | 95 | 0.89 | 120 | 2.1 | 500 | 85 | 15JAN2024 | 24186 | 796 | 01FEB2024 | V001 | V001 | V001 | V001-98 | Medium | 96.5 |
| 2 | V002 | 02-30-2024 | 100 | 88 | 0.92 | 140 | 1.5 | 450 | 90 | . | 24186 | . | . | V002 | V002 | V002 | V002-100 | High | 94.0 |
| 3 | V003 | 03-10-2024 | 97 | 85 | 0.85 | 110 | 3.2 | 600 | 80 | 10MAR2024 | 24186 | 741 | 01APR2024 | V003 | V003 | V003 | V003-97 | Medium | 91.0 |
| 4 | V004 | 04-12-2024 | 96 | 92 | 1.00 | 130 | 2.5 | 700 | 88 | 12APR2024 | 24186 | 708 | 01MAY2024 | V004 | V004 | V004 | V004-96 | Medium | 94.0 |
| 5 | V005 | 05-20-2024 | 94 | 90 | 0.88 | . | 2.0 | 550 | 75 | 20MAY2024 | 24186 | 670 | 01JUN2024 | V005 | V005 | V005 | V005-94 | Low | 92.0 |
| 6 | V001 | . | . | . | . | . | . | . | . | . | . | . | Low | . | |||||
| 7 | V002 | . | . | . | . | . | . | . | . | . | . | . | Low | . |
R Code
SAS vs R — 15 Key Points
1. Data Ingestion
·
SAS:
DATA step,
INFILE,
PROC IMPORT
provide structured ingestion with strong typing
·
R: read.csv( ), readr, data.table::fread( )
offer flexible but less strict ingestion
2. Schema Enforcement
·
SAS:
Fixed metadata (length, type) ensures consistency
·
R:
Dynamic typing; schema inconsistencies may silently propagate
3. Handling Missing Values
·
SAS:
Missing numeric = . handled uniformly across procedures
·
R:
Uses NA,
requires explicit handling (na.rm=TRUE)
4. Error Detection (Outliers)
·
SAS:
IF conditions,
PROC
UNIVARIATE, PROC
MEANS
·
R:
summary(),
boxplot.stats(),
dplyr::filter()
5. Data Validation
·
SAS:
Strong rule-based validation in DATA step
·
R:
Requires manual checks or packages like validate,
assertthat
6. Data Correction
·
SAS:
Inline correction using IF-THEN logic in DATA step
·
R:
Vectorized replacement using indexing or mutate()
7. Date Handling
·
SAS:
MDY,
INTCK,
INTNX
are built-in and robust
·
R:
Base date + lubridate
needed for equivalent flexibility
8. Row-wise Processing
·
SAS:
Native row-by-row execution (DATA step)
·
R:
Vectorized; row-wise requires apply(
) or rowwise(
)
9. Dataset Appending
·
SAS:
SET
automatically aligns variables
·
R:
rbind( )
fails unless structure matches; bind_rows(
) needed
10. Merging Datasets
·
SAS:
MERGE BY with sorted datasets
·
R:
merge( ) or dplyr::left_join(
)
more flexible
11. Transpose Operations
·
SAS:
PROC TRANSPOSE
(simple, structured)
·
R:
t( ),
pivot_longer( ),
pivot_wider( )
(more flexible but verbose)
12. String Handling
·
SAS:
Functions like STRIP,
TRIM,
UPCASE,
PROPCASE
·
R:
trimws( ),
toupper( ),
tolower( ),
stringr
package
13. Numeric Computation
·
SAS:
MEAN,
SUM,
MAX
handle missing automatically
·
R:
Requires na.rm=TRUE
explicitly
14. Automation (Macros vs Functions)
·
SAS:
Macro language (%MACRO)
for dynamic code generation
·
R:
Functions and loops; more powerful but less declarative
15. Error Logging & Debugging
·
SAS:
Built-in log with warnings, notes, errors
·
R:
Console-based; debugging requires traceback(
), debug( )
Summary
Both SAS
and R are capable of detecting, analyzing, and fixing errors in sensor
fusion datasets, but their approaches differ fundamentally:
·
SAS
is:
o Structured
o Metadata-driven
o Ideal
for regulated environments (like clinical trials, automotive safety logs)
o Strong
in data validation, traceability, and
reproducibility
·
R
is:
o Flexible
o Developer-driven
o Powerful
for custom analytics and machine learning
o Requires
careful handling of structure and missing values
For sensor fusion data (like LiDAR, Radar,
Camera metrics):
·
SAS ensures data integrity and compliance
·
R enables advanced
analytics and modeling
Conclusion
Advanced SAS programming provides a robust, rule-based framework for
detecting and correcting errors in structured datasets such as sensor fusion
vehicle data. Its strengths lie in consistency,
automatic handling of structure, and built-in validation mechanisms.
In contrast, R offers greater flexibility and analytical power, but demands
stricter developer discipline for handling inconsistencies, especially in real-world
messy data.
Best Practice in Industry:
·
Use SAS
for:
o Data
cleaning
o Regulatory
reporting
o Structured
pipelines
·
Use R
for:
o Advanced
analytics
o Visualization
o AI/ML
modeling
SAS INTERVIEW QUESTIONS
1.
What is the difference between macro variables and dataset variables?
Answer:
Macro variables exist during program
compilation, while dataset variables exist in datasets during execution.
2.
What is the difference between CALL SYMPUT and CALL SYMPUTX?
Answer:
CALL SYMPUT may include extra spaces,
while CALL SYMPUTX removes leading and
trailing spaces.
3.
What is the purpose of %LET?
Answer:
%LET assigns values to macro variables.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent VEHICLE data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics
· Clinical SAS Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment