Transforming India’s Petrol Price History (2000–2026) into Analytical Intelligence with SAS WHERE vs IF, PROC SQL, DATA Step and R

1. Introduction -When One Bad Record Destroys an Entire Analysis

Imagine a national petroleum analytics company preparing a strategic fuel pricing report for the Government of India. The dataset contains petrol prices from 2000–2026 across multiple Indian states. Executives depend on this analysis to forecast inflation, transportation costs, and subsidy planning.

Everything looks professional until one hidden issue changes the story completely.

A petrol price entered as -105 instead of 105.

A state name recorded as "delhi " instead of "DELHI".

A missing year accidentally treated as valid.

A duplicate record inflates the national average.

Suddenly, dashboards become misleading. Regional fuel trends become inaccurate. Business decisions become dangerous.

This is exactly why professional data cleaning matters.

In modern analytics, raw data is rarely perfect. Whether in clinical trials, banking, insurance, or petroleum analytics, data arrives broken, inconsistent, duplicated, and incomplete. The role of a SAS programmer or data analyst is not merely to generate reports it is to establish trust in data.

Two technologies dominate enterprise-grade data preparation:

SAS trusted heavily in pharmaceuticals, banking, and regulatory industries
R powerful for modern data wrangling and visualization

This article demonstrates how to build, clean, validate, filter, and report petrol price datasets using:

SAS DATA Step
PROC SQL
WHERE vs IF filtering
R tidyverse methods
Advanced validation logic
Real-world business rules

We intentionally create messy datasets and solve them professionally.

2. Raw Data Creation in SAS — Building the Messy Dataset

Business Scenario

A petroleum intelligence firm collected petrol price records from 2000–2026 across India. The raw flat file contains:

Missing petrol prices
Duplicate IDs
Invalid negative prices
Wrong date formats
Mixed case state names
NULL text values
Blank region names

Raw Flat File Simulation Using INFILE

filename fueldata temp;

data _null_;

file fueldata;

|Source_System|Remarks";

put "101|Delhi |north|2000|30|12-01-2000|PETROL|SYS_A|VALID";

put "102|MUMBAI|WEST|2001|-32|15-02-2001|PETROL|SYS_B|NEGATIVE";

put "104|Kolkata|EAST|2003|35|31-13-2003|PETROL|SYS_A|BADDATE";

put "105|Delhi |NORTH|2004|40|01-05-2004|PETROL|SYS_A|VALID";

put "105|Delhi |NORTH|2004|40|01-05-2004|PETROL|SYS_A|DUPLICATE";

put "106| Hyderabad|south|2005|45|14-07-2005|PETROL|SYS_B|VALID";

put "107|NULL|WEST|2006|48|18-08-2006|PETROL|SYS_C|BADSTATE";

put "108|Pune| |2007|52|20-09-2007|PETROL|SYS_A|BLANKREGION";

put "109|Bangalore|south|2008|-55|15-10-2008|PETROL|SYS_B|NEGATIVE";

put "110|Mumbai|WEST|2026|105|11-01-2026|PETROL|SYS_C|LATEST";

run;

LOG:

 NOTE: 12 records were written to the file FUELDATA.

       The minimum record length was 54.

       The maximum record length was 87.

Why This Matters

This simulates real enterprise flat-file ingestion.

Key Professional Concepts

FILENAME TEMP creates temporary external files.
PUT statements simulate raw source systems.
Delimiter (|) mimics production feeds.
Intentional data corruption tests cleaning logic.
Regulatory systems often validate these exact issues.

Importing Raw Data

data petrol_raw;

retain Fuel_ID State Region Year Petrol_Price Record_Date

Fuel_Type Source_System Remarks;

infile fueldata dlm='|' dsd firstobs=2;

length State $20 Region $15 Fuel_Type $15 Source_System $15

Record_Date $10 Remarks $20;

input Fuel_ID State $ Region $ Year Petrol_Price $ Record_Date $

Fuel_Type $ Source_System $ Remarks $;

run;

proc print data = petrol_raw;

run;

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Petrol_Price	Record_Date	Fuel_Type	Source_System	Remarks
1	101	Delhi	north	2000	30	12-01-2000	PETROL	SYS_A	VALID
2	102	MUMBAI	WEST	2001	-32	15-02-2001	PETROL	SYS_B	NEGATIVE
3	103	Chennai	south	2002	NULL	22-03-2002	petrol	SYS_C	MISSING
4	104	Kolkata	EAST	2003	35	31-13-2003	PETROL	SYS_A	BADDATE
5	105	Delhi	NORTH	2004	40	01-05-2004	PETROL	SYS_A	VALID
6	105	Delhi	NORTH	2004	40	01-05-2004	PETROL	SYS_A	DUPLICATE
7	106	Hyderabad	south	2005	45	14-07-2005	PETROL	SYS_B	VALID
8	107	NULL	WEST	2006	48	18-08-2006	PETROL	SYS_C	BADSTATE
9	108	Pune		2007	52	20-09-2007	PETROL	SYS_A	BLANKREGION
10	109	Bangalore	south	2008	-55	15-10-2008	PETROL	SYS_B	NEGATIVE
11	110	Mumbai	WEST	2026	105	11-01-2026	PETROL	SYS_C	LATEST

The Truncation Trap — Why LENGTH Must Come First

One of the biggest hidden dangers in SAS is character truncation.

Incorrect:

if State='Delhi' then Region='NORTH';

length Region $15;

Correct:

length Region $15;

if State='Delhi' then Region='NORTH';

Why?

Because SAS determines variable length during first compilation. If Region='N' initially appears, SAS permanently allocates length 1 unless explicitly defined earlier.

This silently destroys data integrity.

DATA STEP Cleaning Logic

data petrol_clean;

set petrol_raw;

retain Fuel_ID State Region Year Petrol_Price Record_Date

Fuel_Type Source_System Remarks;

length State $20 Price_Category $15;

State=upcase(strip(State));

if State='NULL' or State='' then State='UNKNOWN';

Region=upcase(strip(Region));

if Region='' then Region='UNKNOWN';

if Petrol_Price='NULL' then Petrol_Price='';

Petrol_Num=input(Petrol_Price,best12.);

Petrol_Num=abs(Petrol_Num);

month_num=input(scan(Record_Date,2,'-'),best12.);

if month_num >= 1 and month_num <= 12 then

Date_Num =input(Record_Date,ddmmyy10.);

else Date_Num=.;

format Date_Num date9.;

Fuel_Type=upcase(Fuel_Type);

select;

when (Petrol_Num < 40) Price_Category='LOW';

when (40 <= Petrol_Num <= 70) Price_Category='MEDIUM';

otherwise Price_Category='HIGH';

end;

drop Petrol_Price Record_Date;

rename Petrol_Num = Petrol_Price Date_Num = Record_Date ;

run;

proc print data = petrol_clean;

run;

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Fuel_Type	Source_System	Remarks	Price_Category	Petrol_Price	month_num	Record_Date
1	101	DELHI	NORTH	2000	PETROL	SYS_A	VALID	LOW	30	1	12JAN2000
2	102	MUMBAI	WEST	2001	PETROL	SYS_B	NEGATIVE	LOW	32	2	15FEB2001
3	103	CHENNAI	SOUTH	2002	PETROL	SYS_C	MISSING	LOW	.	3	22MAR2002
4	104	KOLKATA	EAST	2003	PETROL	SYS_A	BADDATE	LOW	35	13	.
5	105	DELHI	NORTH	2004	PETROL	SYS_A	VALID	MEDIUM	40	5	01MAY2004
6	105	DELHI	NORTH	2004	PETROL	SYS_A	DUPLICATE	MEDIUM	40	5	01MAY2004
7	106	HYDERABAD	SOUTH	2005	PETROL	SYS_B	VALID	MEDIUM	45	7	14JUL2005
8	107	UNKNOWN	WEST	2006	PETROL	SYS_C	BADSTATE	MEDIUM	48	8	18AUG2006
9	108	PUNE	UNKNOWN	2007	PETROL	SYS_A	BLANKREGION	MEDIUM	52	9	20SEP2007
10	109	BANGALORE	SOUTH	2008	PETROL	SYS_B	NEGATIVE	MEDIUM	55	10	15OCT2008
11	110	MUMBAI	WEST	2026	PETROL	SYS_C	LATEST	HIGH	105	1	11JAN2026

Why This DATA STEP Is Professional

Key Functions Explained

ABS()

Converts negative petrol prices into valid positive values.

Example:
-55 → 55

Useful in:

Banking corrections
Financial normalization
Sensor anomaly handling

INPUT()

Converts character dates into numeric SAS dates.

Date_Num=input(Record_Date,ddmmyy10.);

Without conversion:

Sorting fails
Date arithmetic fails
Reporting becomes unreliable

SELECT-WHEN vs IF-THEN

IF-THEN

Best for:

Complex Boolean conditions
Nested logic
Dynamic calculations

SELECT-WHEN

Best for:

Categorical grouping
Cleaner readability
Faster maintenance

This project uses SELECT-WHEN because price categories are mutually exclusive.

Deduplication Using PROC SORT

proc sort data=petrol_clean out=petrol_nodup nodupkey;

by Fuel_ID;

run;

proc print data = petrol_nodup;

run;

LOG:

 NOTE: There were 11 observations read from the data set WORK.PETROL_CLEAN.

 NOTE: 1 observations with duplicate key values were deleted.

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Fuel_Type	Source_System	Remarks	Price_Category	Petrol_Price	month_num	Record_Date
1	101	DELHI	NORTH	2000	PETROL	SYS_A	VALID	LOW	30	1	12JAN2000
2	102	MUMBAI	WEST	2001	PETROL	SYS_B	NEGATIVE	LOW	32	2	15FEB2001
3	103	CHENNAI	SOUTH	2002	PETROL	SYS_C	MISSING	LOW	.	3	22MAR2002
4	104	KOLKATA	EAST	2003	PETROL	SYS_A	BADDATE	LOW	35	13	.
5	105	DELHI	NORTH	2004	PETROL	SYS_A	VALID	MEDIUM	40	5	01MAY2004
6	106	HYDERABAD	SOUTH	2005	PETROL	SYS_B	VALID	MEDIUM	45	7	14JUL2005
7	107	UNKNOWN	WEST	2006	PETROL	SYS_C	BADSTATE	MEDIUM	48	8	18AUG2006
8	108	PUNE	UNKNOWN	2007	PETROL	SYS_A	BLANKREGION	MEDIUM	52	9	20SEP2007
9	109	BANGALORE	SOUTH	2008	PETROL	SYS_B	NEGATIVE	MEDIUM	55	10	15OCT2008
10	110	MUMBAI	WEST	2026	PETROL	SYS_C	LATEST	HIGH	105	1	11JAN2026

Why PROC SORT NODUPKEY Matters

Duplicate fuel records can inflate:

State averages
Inflation estimates
Government subsidy calculations

NODUPKEY preserves only one unique observation per key.

WHERE vs IF in SAS

Using WHERE

proc print data=petrol_nodup;

where Region='SOUTH';

run;

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Fuel_Type	Source_System	Remarks	Price_Category	Petrol_Price	month_num	Record_Date
3	103	CHENNAI	SOUTH	2002	PETROL	SYS_C	MISSING	LOW	.	3	22MAR2002
6	106	HYDERABAD	SOUTH	2005	PETROL	SYS_B	VALID	MEDIUM	45	7	14JUL2005
9	109	BANGALORE	SOUTH	2008	PETROL	SYS_B	NEGATIVE	MEDIUM	55	10	15OCT2008

WHERE Characteristics

Filters BEFORE reading data
Faster for large datasets
Cannot use newly created variables

Using IF

data south_prices;

set petrol_nodup;

if Region='SOUTH';

run;

proc print data=south_prices;

run;

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Fuel_Type	Source_System	Remarks	Price_Category	Petrol_Price	month_num	Record_Date
1	103	CHENNAI	SOUTH	2002	PETROL	SYS_C	MISSING	LOW	.	3	22MAR2002
2	106	HYDERABAD	SOUTH	2005	PETROL	SYS_B	VALID	MEDIUM	45	7	14JUL2005
3	109	BANGALORE	SOUTH	2008	PETROL	SYS_B	NEGATIVE	MEDIUM	55	10	15OCT2008

IF Characteristics

Filters AFTER reading observations
Can use computed variables
More flexible

Performance Difference

Feature	WHERE	IF
Execution Stage	Before PDV	After PDV
Speed	Faster	Slower
Uses Derived Variables	No	Yes
Best For	Extraction	Transformation

PROC SQL Alternative

proc sql;

create table petrol_sql as

select Fuel_ID,State,Region,Year,Petrol_Price format=8.2,

Price_Category

from petrol_nodup

where Petrol_Price > 50;

quit;

proc print data=petrol_sql;

run;

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Petrol_Price	Price_Category
1	108	PUNE	UNKNOWN	2007	52.00	MEDIUM
2	109	BANGALORE	SOUTH	2008	55.00	MEDIUM
3	110	MUMBAI	WEST	2026	105.00	HIGH

Why PROC SQL Is Powerful

PROC SQL is ideal for:

Joins
Aggregation
Filtering
Enterprise reporting

DATA Step excels at:

Row-wise transformations
Sequential processing
Complex derivations

Professional programmers use both strategically.

3. The R Refinement Layer — Tidyverse Cleaning

library(dplyr)

library(stringr)

library(tidyr)

petrol_raw <- data.frame(

Fuel_ID=c(101,102,103,104,105),

State=c("Delhi ","MUMBAI","NULL","Chennai","Delhi"),

Price=c(30,-32,NA,35,40)

)

OUTPUT:

	Fuel_ID	State	Price
1	101	Delhi	30
2	102	MUMBAI	-32
3	103	NULL	NA
4	104	Chennai	35
5	105	Delhi	40

petrol_clean <- petrol_raw %>%

mutate(

State=str_trim(toupper(State)),

State=ifelse(State=="NULL","UNKNOWN",State),

Price=abs(Price),

Price=replace_na(Price,0)

)

OUTPUT:

	Fuel_ID	State	Price
1	101	DELHI	30
2	102	MUMBAI	32
3	103	UNKNOWN	0
4	104	CHENNAI	35
5	105	DELHI	40

R Functions Explained

R Function	Purpose	SAS Equivalent
mutate()	Create variables	DATA STEP
filter()	Subset rows	WHERE/IF
replace_na()	Handle missing	IF MISSING()
toupper()	Standardization	UPCASE()
str_trim()	Remove spaces	STRIP()

Advanced Regex Cleaning

petrol_clean$State <- gsub("[^A-Z ]","",petrol_clean$State)

OUTPUT:

	Fuel_ID	State	Price
1	101	DELHI	30
2	102	MUMBAI	32
3	103	UNKNOWN	0
4	104	CHENNAI	35
5	105	DELHI	40

petrol_clean$State <- trimws(petrol_clean$State)

OUTPUT:

	Fuel_ID	State	Price
1	101	DELHI	30
2	102	MUMBAI	32
3	103	UNKNOWN	0
4	104	CHENNAI	35
5	105	DELHI	40

Why Regex Cleaning Matters

Real-world systems contain:

Hidden tabs
Extra punctuation
Invalid symbols

Regex removes these systematically.

4. Business Logic & The Missing Value Trap

High-Stakes Loan Approval Scenario

Imagine a bank approving fuel dealership loans.

Eligibility Rule:

Petrol Price > 50

Now imagine missing petrol prices.

In SAS:

data petrol_clean2;

retain Fuel_ID State Region Year Petrol_Price Record_Date

Fuel_Type Source_System Remarks Reject;

set petrol_clean;

if missing(Petrol_Price) then Reject="Review";

else if Petrol_Price > 50 then Reject="Yes";

else Reject="No";

run;

proc print data=petrol_clean2;

run;

Problem:

Missing numeric values are treated as smaller than any number.

Meaning:

. < 50 = TRUE

A missing value accidentally becomes eligible or rejected incorrectly.

This can:

Trigger regulatory violations
Cause financial losses
Destroy audit credibility

Always explicitly check missing values:

if missing(Petrol_Num) then Reject='REVIEW';

Key Point

SET imports variable attributes immediately.

Therefore:

RETAIN after SET may not reorder variables properly
LENGTH, FORMAT, ATTRIB, RETAIN are best placed before SET

OUTPUT:


Obs	Fuel_ID	State	Region	Year	Petrol_Price	Record_Date	Fuel_Type	Source_System	Remarks	Reject	Price_Category	month_num
1	101	DELHI	NORTH	2000	30	12JAN2000	PETROL	SYS_A	VALID	No	LOW	1
2	102	MUMBAI	WEST	2001	32	15FEB2001	PETROL	SYS_B	NEGATIVE	No	LOW	2
3	103	CHENNAI	SOUTH	2002	.	22MAR2002	PETROL	SYS_C	MISSING	Review	LOW	3
4	104	KOLKATA	EAST	2003	35	.	PETROL	SYS_A	BADDATE	No	LOW	13
5	105	DELHI	NORTH	2004	40	01MAY2004	PETROL	SYS_A	VALID	No	MEDIUM	5
6	105	DELHI	NORTH	2004	40	01MAY2004	PETROL	SYS_A	DUPLICATE	No	MEDIUM	5
7	106	HYDERABAD	SOUTH	2005	45	14JUL2005	PETROL	SYS_B	VALID	No	MEDIUM	7
8	107	UNKNOWN	WEST	2006	48	18AUG2006	PETROL	SYS_C	BADSTATE	No	MEDIUM	8
9	108	PUNE	UNKNOWN	2007	52	20SEP2007	PETROL	SYS_A	BLANKREGION	Yes	MEDIUM	9
10	109	BANGALORE	SOUTH	2008	55	15OCT2008	PETROL	SYS_B	NEGATIVE	Yes	MEDIUM	10
11	110	MUMBAI	WEST	2026	105	11JAN2026	PETROL	SYS_C	LATEST	Yes	HIGH	1

5. 20 Golden Rules for Professional SAS Projects

Always define LENGTH before assignments.
Never trust raw imported data.
Standardize character casing immediately.
Remove duplicates before aggregation.
Validate dates rigorously.
Use WHERE for faster filtering.
Use IF for derived-variable logic.
Never overwrite raw datasets.
Preserve audit copies.
Use formats consistently.
Validate missing values explicitly.
Document derivation rules.
Use meaningful variable names.
Avoid hardcoded business logic.
Validate source-to-target mapping.
Test edge-case scenarios.
Use PROC CONTENTS frequently.
Normalize text using STRIP().
Prefer SELECT-WHEN for categories.
Always create QC validation reports.

Extended Reporting & Aggregation

proc summary data=petrol_nodup nway;

class Region;

var Petrol_price;

output out=region_summary mean=Avg_Price

max=Max_Price min=Min_Price;

run;

proc print data=region_summary;

run;

OUTPUT:


Obs	Region	_TYPE_	_FREQ_	Avg_Price	Max_Price	Min_Price
1	EAST	1	1	35.0000	35	35
2	NORTH	1	2	35.0000	40	30
3	SOUTH	1	3	50.0000	55	45
4	UNKNOWN	1	1	52.0000	52	52
5	WEST	1	3	61.6667	105	32

Professional Reporting

proc report data=region_summary nowd;

columns Region Avg_Price Max_Price Min_Price;

define Region / group;

define Avg_Price / analysis;

define Max_Price / analysis;

define Min_Price / analysis;

title "Regional Petrol Price Summary Report";

run;

OUTPUT:

Regional Petrol Price Summary Report


Region	Avg_Price	Max_Price	Min_Price
EAST	35	35	35
NORTH	35	40	30
SOUTH	50	55	45
UNKNOWN	52	52	52
WEST	61.666667	105	32

Why PROC REPORT Is Enterprise Preferred

PROC REPORT provides:

Controlled layouts
Regulatory formatting
Executive-ready outputs
Dynamic summarization

Widely used in:

Clinical TLFs
Financial reporting
Government analytics

6. 20 Additional Data Cleaning Best Practices

Validate SDTM compliance before submission.
Maintain Define.xml traceability.
Use controlled terminology consistently.
Preserve original raw values.
Validate unit consistency.
Create automated QC macros.
Avoid manual spreadsheet edits.
Reconcile cross-domain mismatches.
Verify subject uniqueness.
Validate treatment dates carefully.
Maintain audit logs.
Track derivation origins.
Use metadata-driven programming.
Validate protocol deviations.
Perform double programming QC.
Use checksum comparisons.
Detect impossible date sequences.
Standardize categorical variables.
Document transformation assumptions.
Ensure reproducibility across environments.

7. Business Logic Behind Data Cleaning

Data cleaning is not cosmetic it directly impacts decision-making.

Suppose petrol prices contain negative values. Without correction, average price calculations become distorted. Governments may incorrectly estimate inflation trends. Oil companies may misprice supply contracts.

Missing values are equally dangerous. A blank patient age in a clinical trial could accidentally qualify an ineligible participant. Similarly, missing petrol pricing records could distort national fuel subsidy calculations.

Date correction is critical because incorrect timelines affect forecasting. If a 2026 record is accidentally stored as 2006, trend analysis becomes unreliable.

Salary normalization in banking, age validation in healthcare, and price standardization in petroleum analytics all follow the same principle:

Reliable decisions require reliable data.

That is why professional SAS programmers prioritize:

Validation
Standardization
Auditability
Reproducibility

before any reporting begins.

8. 20 Sharp Key Insights

Dirty data leads to wrong conclusions.
Standardization ensures reproducibility.
Missing values are hidden business risks.
WHERE filtering improves performance.
IF filtering improves flexibility.
PROC SQL simplifies aggregation.
DATA Step excels at transformations.
Duplicate records inflate metrics.
LENGTH prevents truncation disasters.
Regex cleaning removes hidden corruption.
Audit trails protect regulatory trust.
Validation improves business credibility.
ABS() corrects numerical anomalies.
INPUT() enables true date analytics.
PROC REPORT creates executive-ready outputs.
Null handling prevents logical failures.
Text normalization improves joins.
QC checks reduce compliance risk.
Structured programming improves maintenance.
Reliable analytics begin with clean data.

9. Summary

SAS and R both provide exceptional capabilities for enterprise-grade data cleaning, but each has distinct strengths.

SAS dominates highly regulated industries such as pharmaceuticals, banking, insurance, and government analytics because of its:

Stability
Auditability
Regulatory trust
Structured processing

Its DATA Step architecture makes row-wise transformations highly efficient. Features like WHERE processing, PROC SORT, PROC SQL, and PROC REPORT allow scalable enterprise workflows.

Meanwhile, R excels in:

Modern wrangling
Visualization
Open-source flexibility
Rapid exploratory analysis

Packages such as dplyr, tidyr, and stringr simplify complex transformations using readable syntax.

In professional environments, the strongest analysts understand both ecosystems.

SAS provides enterprise-grade reliability.

R provides analytical flexibility.

Together, they create scalable, accurate, and maintainable analytics pipelines capable of handling real-world messy data.

From petrol pricing intelligence to clinical trial validation, the core principle remains unchanged:

Clean data drives trustworthy decisions.

10. Conclusion

The journey from raw petrol price records to professional analytical intelligence demonstrates a critical truth about modern data science:

Analytics is only as reliable as the quality of the underlying data.

A single duplicate record can inflate averages.

A missing value can invalidate eligibility decisions.

A truncated character variable can silently corrupt reporting.

These are not theoretical problems they occur daily across banking systems, healthcare platforms, petroleum analytics, and government reporting infrastructures.

This project showcased how professional programmers use:

SAS DATA Step
PROC SQL
WHERE vs IF logic
PROC SORT
PROC REPORT
R tidyverse pipelines

to transform chaotic datasets into structured, validated, business-ready intelligence.

More importantly, we explored the reasoning behind each operation.

Professional data cleaning is not merely technical coding.

It is:

Risk management
Business protection
Regulatory defense
Decision assurance

The distinction between amateur and enterprise-grade analytics often lies not in dashboards, but in how rigorously the underlying data was validated.

Organizations trust analysts who:

Preserve audit trails
Anticipate edge cases
Handle missing values safely
Document transformations clearly
Build reproducible pipelines

Whether you work in clinical trials, petroleum analytics, banking, or insurance, mastering structured data cleaning frameworks will always remain one of the most valuable skills in analytics engineering.

Clean data is not a luxury.

It is the foundation of trustworthy intelligence.

11. Interview Questions & Answers

1. Why is WHERE faster than IF in SAS?

Answer:

WHERE filters observations before entering the Program Data Vector (PDV), reducing I/O operations. IF evaluates after data is read into memory.

2. When would you prefer DATA Step over PROC SQL?

Answer:

Use DATA Step for sequential row-level transformations and complex derivations. Use PROC SQL for joins, aggregations, and relational operations.

3. How does SAS treat missing numeric values?

Answer:

SAS treats missing numeric values as smaller than any valid number. This can create logical errors if not explicitly handled.

4. Explain a real-world duplicate record issue.

Answer:

Duplicate patient or petrol records can inflate averages, distort KPIs, and produce misleading executive reports. PROC SORT NODUPKEY helps resolve this.

5. What is the SAS equivalent of mutate() in R?

Answer:

The SAS DATA Step performs variable creation and modification similarly to mutate() in R.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. Here we learn about PETROL DATA.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics and smart cities
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.From Dead Mobile Brands to Clean Data Mastery Using INPUT and PUT in SAS

2.Hidden Errors in Movie Data and How SAS LENGTH & FORMAT Bring Clarity

3.Why Even Small Data Errors Can Collapse Enterprise Analytics
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Search This Blog

SAS Learning Hub

India’s Fuel Economy Decoded: Advanced Petrol Price Cleansing and Reporting with SAS and R

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact | Privacy Policy

Comments

Post a Comment