India’s Fuel Economy Decoded: Advanced Petrol Price Cleansing and Reporting with SAS and R

Transforming India’s Petrol Price History (2000–2026) into Analytical Intelligence with SAS WHERE vs IF, PROC SQL, DATA Step and R

1. Introduction — When One Bad Record Destroys an Entire Analysis

Imagine a national petroleum analytics company preparing a strategic fuel pricing report for the Government of India. The dataset contains petrol prices from 2000–2026 across multiple Indian states. Executives depend on this analysis to forecast inflation, transportation costs, and subsidy planning.

Everything looks professional until one hidden issue changes the story completely.

A petrol price entered as -105 instead of 105.

A state name recorded as "delhi " instead of "DELHI".

A missing year accidentally treated as valid.

A duplicate record inflates the national average.

Suddenly, dashboards become misleading. Regional fuel trends become inaccurate. Business decisions become dangerous.

This is exactly why professional data cleaning matters.

In modern analytics, raw data is rarely perfect. Whether in clinical trials, banking, insurance, or petroleum analytics, data arrives broken, inconsistent, duplicated, and incomplete. The role of a SAS programmer or data analyst is not merely to generate reports; it is to establish trust in the data.

Two technologies dominate enterprise-grade data preparation:

  • SAS, trusted heavily in pharmaceuticals, banking, and regulated industries
  • R, powerful for modern data wrangling and visualization

This article demonstrates how to build, clean, validate, filter, and report petrol price datasets using:

  • SAS DATA Step
  • PROC SQL
  • WHERE vs IF filtering
  • R tidyverse methods
  • Advanced validation logic
  • Real-world business rules

We intentionally create messy datasets and solve them professionally.

2. Raw Data Creation in SAS — Building the Messy Dataset

Business Scenario

A petroleum intelligence firm collected petrol price records from 2000–2026 across India. The raw flat file contains:

  • Missing petrol prices
  • Duplicate IDs
  • Invalid negative prices
  • Wrong date formats
  • Mixed case state names
  • NULL text values
  • Blank region names

Raw Flat File Simulation Using INFILE

filename fueldata temp;

data _null_;
   file fueldata;
   put "Fuel_ID|State|Region|Year|Petrol_Price|Record_Date|Fuel_Type|Source_System|Remarks";
   put "101|Delhi |north|2000|30|12-01-2000|PETROL|SYS_A|VALID";
   put "102|MUMBAI|WEST|2001|-32|15-02-2001|PETROL|SYS_B|NEGATIVE";
   put "103|Chennai|south|2002|NULL|22-03-2002|petrol|SYS_C|MISSING";
   put "104|Kolkata|EAST|2003|35|31-13-2003|PETROL|SYS_A|BADDATE";
   put "105|Delhi |NORTH|2004|40|01-05-2004|PETROL|SYS_A|VALID";
   put "105|Delhi |NORTH|2004|40|01-05-2004|PETROL|SYS_A|DUPLICATE";
   put "106| Hyderabad|south|2005|45|14-07-2005|PETROL|SYS_B|VALID";
   put "107|NULL|WEST|2006|48|18-08-2006|PETROL|SYS_C|BADSTATE";
   put "108|Pune| |2007|52|20-09-2007|PETROL|SYS_A|BLANKREGION";
   put "109|Bangalore|south|2008|-55|15-10-2008|PETROL|SYS_B|NEGATIVE";
   put "110|Mumbai|WEST|2026|105|11-01-2026|PETROL|SYS_C|LATEST";
run;

LOG:

NOTE: 12 records were written to the file FUELDATA.
The minimum record length was 54.
The maximum record length was 87.

Why This Matters

This simulates real enterprise flat-file ingestion.

Key Professional Concepts

  1. FILENAME TEMP creates temporary external files.
  2. PUT statements simulate raw source systems.
  3. Delimiter (|) mimics production feeds.
  4. Intentional data corruption tests cleaning logic.
  5. Regulatory systems often validate these exact issues.

Importing Raw Data

data petrol_raw;
   retain Fuel_ID State Region Year Petrol_Price Record_Date
          Fuel_Type Source_System Remarks;
   infile fueldata dlm='|' dsd firstobs=2;
   length State $20 Region $15 Fuel_Type $15 Source_System $15
          Record_Date $10 Remarks $20;
   input Fuel_ID State $ Region $ Year Petrol_Price $ Record_Date $
         Fuel_Type $ Source_System $ Remarks $;
run;

proc print data=petrol_raw;
run;

OUTPUT:

Obs  Fuel_ID  State      Region  Year  Petrol_Price  Record_Date  Fuel_Type  Source_System  Remarks
  1      101  Delhi      north   2000  30            12-01-2000   PETROL     SYS_A          VALID
  2      102  MUMBAI     WEST    2001  -32           15-02-2001   PETROL     SYS_B          NEGATIVE
  3      103  Chennai    south   2002  NULL          22-03-2002   petrol     SYS_C          MISSING
  4      104  Kolkata    EAST    2003  35            31-13-2003   PETROL     SYS_A          BADDATE
  5      105  Delhi      NORTH   2004  40            01-05-2004   PETROL     SYS_A          VALID
  6      105  Delhi      NORTH   2004  40            01-05-2004   PETROL     SYS_A          DUPLICATE
  7      106  Hyderabad  south   2005  45            14-07-2005   PETROL     SYS_B          VALID
  8      107  NULL       WEST    2006  48            18-08-2006   PETROL     SYS_C          BADSTATE
  9      108  Pune               2007  52            20-09-2007   PETROL     SYS_A          BLANKREGION
 10      109  Bangalore  south   2008  -55           15-10-2008   PETROL     SYS_B          NEGATIVE
 11      110  Mumbai     WEST    2026  105           11-01-2026   PETROL     SYS_C          LATEST

The Truncation Trap — Why LENGTH Must Come First

One of the biggest hidden dangers in SAS is character truncation.

Incorrect:

if State='Delhi' then Region='NORTH';

length Region $15;

Correct:

length Region $15;

if State='Delhi' then Region='NORTH';

Why?

SAS fixes a character variable's length at compile time, based on the variable's first appearance. If Region='N' appears first, SAS permanently allocates a length of 1 unless the length is explicitly declared earlier.

This silently destroys data integrity.

DATA STEP Cleaning Logic

data petrol_clean;
   set petrol_raw;
   retain Fuel_ID State Region Year Petrol_Price Record_Date
          Fuel_Type Source_System Remarks;
   length State $20 Price_Category $15;

   State = upcase(strip(State));
   if State='NULL' or State='' then State='UNKNOWN';

   Region = upcase(strip(Region));
   if Region='' then Region='UNKNOWN';

   if Petrol_Price='NULL' then Petrol_Price='';
   Petrol_Num = input(Petrol_Price, best12.);
   Petrol_Num = abs(Petrol_Num);

   month_num = input(scan(Record_Date, 2, '-'), best12.);
   if month_num >= 1 and month_num <= 12 then
      Date_Num = input(Record_Date, ddmmyy10.);
   else Date_Num = .;
   format Date_Num date9.;

   Fuel_Type = upcase(Fuel_Type);

   select;
      when (Petrol_Num < 40)        Price_Category='LOW';
      when (40 <= Petrol_Num <= 70) Price_Category='MEDIUM';
      otherwise                     Price_Category='HIGH';
   end;

   drop Petrol_Price Record_Date;
   rename Petrol_Num=Petrol_Price Date_Num=Record_Date;
run;

proc print data=petrol_clean;
run;

OUTPUT:

Obs  Fuel_ID  State      Region   Year  Fuel_Type  Source_System  Remarks      Price_Category  Petrol_Price  month_num  Record_Date
  1      101  DELHI      NORTH    2000  PETROL     SYS_A          VALID        LOW                       30          1  12JAN2000
  2      102  MUMBAI     WEST     2001  PETROL     SYS_B          NEGATIVE     LOW                       32          2  15FEB2001
  3      103  CHENNAI    SOUTH    2002  PETROL     SYS_C          MISSING      LOW                        .          3  22MAR2002
  4      104  KOLKATA    EAST     2003  PETROL     SYS_A          BADDATE      LOW                       35         13  .
  5      105  DELHI      NORTH    2004  PETROL     SYS_A          VALID        MEDIUM                    40          5  01MAY2004
  6      105  DELHI      NORTH    2004  PETROL     SYS_A          DUPLICATE    MEDIUM                    40          5  01MAY2004
  7      106  HYDERABAD  SOUTH    2005  PETROL     SYS_B          VALID        MEDIUM                    45          7  14JUL2005
  8      107  UNKNOWN    WEST     2006  PETROL     SYS_C          BADSTATE     MEDIUM                    48          8  18AUG2006
  9      108  PUNE       UNKNOWN  2007  PETROL     SYS_A          BLANKREGION  MEDIUM                    52          9  20SEP2007
 10      109  BANGALORE  SOUTH    2008  PETROL     SYS_B          NEGATIVE     MEDIUM                    55         10  15OCT2008
 11      110  MUMBAI     WEST     2026  PETROL     SYS_C          LATEST       HIGH                     105          1  11JAN2026

Why This DATA STEP Is Professional

Key Functions Explained

ABS()

Converts negative petrol prices into valid positive values.

Example:
-55 → 55

Useful in:

  • Banking corrections
  • Financial normalization
  • Sensor anomaly handling

INPUT()

Converts character dates into numeric SAS dates.

Date_Num=input(Record_Date,ddmmyy10.);

Without conversion:

  • Sorting fails
  • Date arithmetic fails
  • Reporting becomes unreliable

SELECT-WHEN vs IF-THEN

IF-THEN

Best for:

  • Complex Boolean conditions
  • Nested logic
  • Dynamic calculations

SELECT-WHEN

Best for:

  • Categorical grouping
  • Cleaner readability
  • Faster maintenance

This project uses SELECT-WHEN because price categories are mutually exclusive.
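For contrast, the same banding can be sketched as an IF-THEN/ELSE chain. This sketch assumes it runs inside the cleaning DATA step above, where Petrol_Num already exists; note that a missing Petrol_Num satisfies the first condition, because SAS treats missing as smaller than any number:

if Petrol_Num < 40 then Price_Category='LOW';
else if Petrol_Num <= 70 then Price_Category='MEDIUM';
else Price_Category='HIGH';

Both versions produce identical categories here; SELECT-WHEN simply reads better as the number of bands grows.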

Deduplication Using PROC SORT

proc sort data=petrol_clean out=petrol_nodup nodupkey;
   by Fuel_ID;
run;

proc print data=petrol_nodup;
run;

LOG:

NOTE: There were 11 observations read from the data set WORK.PETROL_CLEAN.
NOTE: 1 observations with duplicate key values were deleted.

OUTPUT:

Obs  Fuel_ID  State      Region   Year  Fuel_Type  Source_System  Remarks      Price_Category  Petrol_Price  month_num  Record_Date
  1      101  DELHI      NORTH    2000  PETROL     SYS_A          VALID        LOW                       30          1  12JAN2000
  2      102  MUMBAI     WEST     2001  PETROL     SYS_B          NEGATIVE     LOW                       32          2  15FEB2001
  3      103  CHENNAI    SOUTH    2002  PETROL     SYS_C          MISSING      LOW                        .          3  22MAR2002
  4      104  KOLKATA    EAST     2003  PETROL     SYS_A          BADDATE      LOW                       35         13  .
  5      105  DELHI      NORTH    2004  PETROL     SYS_A          VALID        MEDIUM                    40          5  01MAY2004
  6      106  HYDERABAD  SOUTH    2005  PETROL     SYS_B          VALID        MEDIUM                    45          7  14JUL2005
  7      107  UNKNOWN    WEST     2006  PETROL     SYS_C          BADSTATE     MEDIUM                    48          8  18AUG2006
  8      108  PUNE       UNKNOWN  2007  PETROL     SYS_A          BLANKREGION  MEDIUM                    52          9  20SEP2007
  9      109  BANGALORE  SOUTH    2008  PETROL     SYS_B          NEGATIVE     MEDIUM                    55         10  15OCT2008
 10      110  MUMBAI     WEST     2026  PETROL     SYS_C          LATEST       HIGH                     105          1  11JAN2026

Why PROC SORT NODUPKEY Matters

Duplicate fuel records can inflate:

  • State averages
  • Inflation estimates
  • Government subsidy calculations

NODUPKEY preserves only one unique observation per key.

WHERE vs IF in SAS

Using WHERE

proc print data=petrol_nodup;
   where Region='SOUTH';
run;

OUTPUT:

Obs  Fuel_ID  State      Region  Year  Fuel_Type  Source_System  Remarks   Price_Category  Petrol_Price  month_num  Record_Date
  3      103  CHENNAI    SOUTH   2002  PETROL     SYS_C          MISSING   LOW                        .          3  22MAR2002
  6      106  HYDERABAD  SOUTH   2005  PETROL     SYS_B          VALID     MEDIUM                    45          7  14JUL2005
  9      109  BANGALORE  SOUTH   2008  PETROL     SYS_B          NEGATIVE  MEDIUM                    55         10  15OCT2008

WHERE Characteristics

  • Filters BEFORE reading data
  • Faster for large datasets
  • Cannot use newly created variables

Using IF

data south_prices;
   set petrol_nodup;
   if Region='SOUTH';
run;

proc print data=south_prices;
run;

OUTPUT:

Obs  Fuel_ID  State      Region  Year  Fuel_Type  Source_System  Remarks   Price_Category  Petrol_Price  month_num  Record_Date
  1      103  CHENNAI    SOUTH   2002  PETROL     SYS_C          MISSING   LOW                        .          3  22MAR2002
  2      106  HYDERABAD  SOUTH   2005  PETROL     SYS_B          VALID     MEDIUM                    45          7  14JUL2005
  3      109  BANGALORE  SOUTH   2008  PETROL     SYS_B          NEGATIVE  MEDIUM                    55         10  15OCT2008

IF Characteristics

  • Filters AFTER reading observations
  • Can use computed variables
  • More flexible

Performance Difference

Feature                  WHERE           IF
Execution Stage          Before PDV      After PDV
Speed                    Faster          Slower
Uses Derived Variables   No              Yes
Best For                 Extraction      Transformation
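The practical consequence of the PDV timing is easy to demonstrate. In this sketch (the Price_Band variable is illustrative), WHERE cannot reference a variable created in the same DATA step, while a subsetting IF can:

data banded;
   set petrol_nodup;
   /* Derived in this step, so invisible to WHERE */
   Price_Band = ifc(Petrol_Price > 50, 'HIGH', 'LOW');
   * where Price_Band = 'HIGH';   /* would fail: not on the input data set */
   if Price_Band = 'HIGH';        /* subsetting IF works */
run;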

PROC SQL Alternative

proc sql;
   create table petrol_sql as
   select Fuel_ID, State, Region, Year,
          Petrol_Price format=8.2, Price_Category
   from petrol_nodup
   where Petrol_Price > 50;
quit;

proc print data=petrol_sql;
run;

OUTPUT:

Obs  Fuel_ID  State      Region   Year  Petrol_Price  Price_Category
  1      108  PUNE       UNKNOWN  2007         52.00  MEDIUM
  2      109  BANGALORE  SOUTH    2008         55.00  MEDIUM
  3      110  MUMBAI     WEST     2026        105.00  HIGH

Why PROC SQL Is Powerful

PROC SQL is ideal for:

  • Joins
  • Aggregation
  • Filtering
  • Enterprise reporting

DATA Step excels at:

  • Row-wise transformations
  • Sequential processing
  • Complex derivations

Professional programmers use both strategically.
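To illustrate the aggregation side, a regional average can be sketched in PROC SQL (the same numbers are produced later in this article with PROC SUMMARY):

proc sql;
   select Region,
          mean(Petrol_Price) as Avg_Price format=8.2
   from petrol_nodup
   group by Region
   order by Region;
quit;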

3. The R Refinement Layer — Tidyverse Cleaning

library(dplyr)
library(stringr)
library(tidyr)

petrol_raw <- data.frame(
  Fuel_ID = c(101, 102, 103, 104, 105),
  State   = c("Delhi ", "MUMBAI", "NULL", "Chennai", "Delhi"),
  Price   = c(30, -32, NA, 35, 40)
)

OUTPUT:

  Fuel_ID   State  Price
1     101  Delhi      30
2     102  MUMBAI    -32
3     103  NULL       NA
4     104  Chennai    35
5     105  Delhi      40

petrol_clean <- petrol_raw %>%
  mutate(
    State = str_trim(toupper(State)),
    State = ifelse(State == "NULL", "UNKNOWN", State),
    Price = abs(Price),
    Price = replace_na(Price, 0)
  )

OUTPUT:

  Fuel_ID   State    Price
1     101  DELHI       30
2     102  MUMBAI      32
3     103  UNKNOWN      0
4     104  CHENNAI     35
5     105  DELHI       40

R Functions Explained

R Function     Purpose            SAS Equivalent
mutate()       Create variables   DATA Step
filter()       Subset rows        WHERE / IF
replace_na()   Handle missing     IF MISSING()
toupper()      Standardization    UPCASE()
str_trim()     Remove spaces      STRIP()

Advanced Regex Cleaning

petrol_clean$State <- gsub("[^A-Z ]", "", petrol_clean$State)

OUTPUT:

  Fuel_ID   State    Price
1     101  DELHI       30
2     102  MUMBAI      32
3     103  UNKNOWN      0
4     104  CHENNAI     35
5     105  DELHI       40


petrol_clean$State <- trimws(petrol_clean$State)

OUTPUT:

  Fuel_ID   State    Price
1     101  DELHI       30
2     102  MUMBAI      32
3     103  UNKNOWN      0
4     104  CHENNAI     35
5     105  DELHI       40

Why Regex Cleaning Matters

Real-world systems contain:

  • Hidden tabs
  • Extra punctuation
  • Invalid symbols

Regex removes these systematically.
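A comparable cleanup can be sketched in SAS with the COMPRESS function and its 'k' (keep) modifier:

/* Keep upper-case letters and spaces; drop everything else */
State = compress(State, ' ', 'ka');

Here the 'a' modifier adds all alphabetic characters to the character list and 'k' switches COMPRESS from deleting those characters to keeping only them.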

4. Business Logic & The Missing Value Trap

High-Stakes Loan Approval Scenario

Imagine a bank approving fuel dealership loans.

Eligibility Rule:

Petrol Price > 50

Now imagine missing petrol prices.

In SAS:

data petrol_clean2;
   retain Fuel_ID State Region Year Petrol_Price Record_Date
          Fuel_Type Source_System Remarks Reject;
   set petrol_clean;
   if missing(Petrol_Price) then Reject = "Review";
   else if Petrol_Price > 50 then Reject = "Yes";
   else Reject = "No";
run;

proc print data=petrol_clean2;
run;

Problem:

SAS treats a missing numeric value as smaller than any valid number.

Meaning:

. < 50 = TRUE

Without an explicit MISSING() check, a record with no petrol price silently falls into the wrong branch and is approved or rejected incorrectly.

This can:

  • Trigger regulatory violations
  • Cause financial losses
  • Destroy audit credibility

Always check missing values explicitly:

if missing(Petrol_Price) then Reject='Review';

Key Point

SET imports variable attributes immediately.

Therefore:

  • RETAIN after SET may not reorder variables properly
  • LENGTH, FORMAT, ATTRIB, RETAIN are best placed before SET
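A minimal sketch of that ordering rule, using the datasets above (the output data set name is illustrative):

data petrol_ordered;
   /* Declared before SET, so these control type, length, and column order */
   length Reject $6;
   retain Fuel_ID State Region Reject;
   set petrol_clean;
run;

Because LENGTH and RETAIN appear before SET, Reject is allocated six characters (enough to hold 'Review' without truncation) and the listed variables lead the column order.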

OUTPUT:

Obs  Fuel_ID  State      Region   Year  Petrol_Price  Record_Date  Fuel_Type  Source_System  Remarks      Reject  Price_Category  month_num
  1      101  DELHI      NORTH    2000            30  12JAN2000    PETROL     SYS_A          VALID        No      LOW                     1
  2      102  MUMBAI     WEST     2001            32  15FEB2001    PETROL     SYS_B          NEGATIVE     No      LOW                     2
  3      103  CHENNAI    SOUTH    2002             .  22MAR2002    PETROL     SYS_C          MISSING      Review  LOW                     3
  4      104  KOLKATA    EAST     2003            35  .            PETROL     SYS_A          BADDATE      No      LOW                    13
  5      105  DELHI      NORTH    2004            40  01MAY2004    PETROL     SYS_A          VALID        No      MEDIUM                  5
  6      105  DELHI      NORTH    2004            40  01MAY2004    PETROL     SYS_A          DUPLICATE    No      MEDIUM                  5
  7      106  HYDERABAD  SOUTH    2005            45  14JUL2005    PETROL     SYS_B          VALID        No      MEDIUM                  7
  8      107  UNKNOWN    WEST     2006            48  18AUG2006    PETROL     SYS_C          BADSTATE     No      MEDIUM                  8
  9      108  PUNE       UNKNOWN  2007            52  20SEP2007    PETROL     SYS_A          BLANKREGION  Yes     MEDIUM                  9
 10      109  BANGALORE  SOUTH    2008            55  15OCT2008    PETROL     SYS_B          NEGATIVE     Yes     MEDIUM                 10
 11      110  MUMBAI     WEST     2026           105  11JAN2026    PETROL     SYS_C          LATEST       Yes     HIGH                    1

5. 20 Golden Rules for Professional SAS Projects

  1. Always define LENGTH before assignments.
  2. Never trust raw imported data.
  3. Standardize character casing immediately.
  4. Remove duplicates before aggregation.
  5. Validate dates rigorously.
  6. Use WHERE for faster filtering.
  7. Use IF for derived-variable logic.
  8. Never overwrite raw datasets.
  9. Preserve audit copies.
  10. Use formats consistently.
  11. Validate missing values explicitly.
  12. Document derivation rules.
  13. Use meaningful variable names.
  14. Avoid hardcoded business logic.
  15. Validate source-to-target mapping.
  16. Test edge-case scenarios.
  17. Use PROC CONTENTS frequently.
  18. Normalize text using STRIP().
  19. Prefer SELECT-WHEN for categories.
  20. Always create QC validation reports.

Extended Reporting & Aggregation

proc summary data=petrol_nodup nway;
   class Region;
   var Petrol_Price;
   output out=region_summary
          mean=Avg_Price max=Max_Price min=Min_Price;
run;

proc print data=region_summary;
run;

OUTPUT:

Obs  Region   _TYPE_  _FREQ_  Avg_Price  Max_Price  Min_Price
  1  EAST          1       1    35.0000         35         35
  2  NORTH         1       2    35.0000         40         30
  3  SOUTH         1       3    50.0000         55         45
  4  UNKNOWN       1       1    52.0000         52         52
  5  WEST          1       3    61.6667        105         32

Professional Reporting

proc report data=region_summary nowd;
   columns Region Avg_Price Max_Price Min_Price;
   define Region    / group;
   define Avg_Price / analysis;
   define Max_Price / analysis;
   define Min_Price / analysis;
   title "Regional Petrol Price Summary Report";
run;

OUTPUT:

Regional Petrol Price Summary Report

Region   Avg_Price  Max_Price  Min_Price
EAST            35         35         35
NORTH           35         40         30
SOUTH           50         55         45
UNKNOWN         52         52         52
WEST     61.666667        105         32

Why PROC REPORT Is Enterprise Preferred

PROC REPORT provides:

  • Controlled layouts
  • Regulatory formatting
  • Executive-ready outputs
  • Dynamic summarization

Widely used in:

  • Clinical TLFs
  • Financial reporting
  • Government analytics

6. 20 Additional Data Cleaning Best Practices

  1. Validate SDTM compliance before submission.
  2. Maintain Define.xml traceability.
  3. Use controlled terminology consistently.
  4. Preserve original raw values.
  5. Validate unit consistency.
  6. Create automated QC macros.
  7. Avoid manual spreadsheet edits.
  8. Reconcile cross-domain mismatches.
  9. Verify subject uniqueness.
  10. Validate treatment dates carefully.
  11. Maintain audit logs.
  12. Track derivation origins.
  13. Use metadata-driven programming.
  14. Validate protocol deviations.
  15. Perform double programming QC.
  16. Use checksum comparisons.
  17. Detect impossible date sequences.
  18. Standardize categorical variables.
  19. Document transformation assumptions.
  20. Ensure reproducibility across environments.

7. Business Logic Behind Data Cleaning

Data cleaning is not cosmetic; it directly impacts decision-making.

Suppose petrol prices contain negative values. Without correction, average price calculations become distorted. Governments may incorrectly estimate inflation trends. Oil companies may misprice supply contracts.

Missing values are equally dangerous. A blank patient age in a clinical trial could accidentally qualify an ineligible participant. Similarly, missing petrol pricing records could distort national fuel subsidy calculations.

Date correction is critical because incorrect timelines affect forecasting. If a 2026 record is accidentally stored as 2006, trend analysis becomes unreliable.

Salary normalization in banking, age validation in healthcare, and price standardization in petroleum analytics all follow the same principle:

Reliable decisions require reliable data.

That is why professional SAS programmers prioritize:

  • Validation
  • Standardization
  • Auditability
  • Reproducibility

before any reporting begins.

8. 20 Sharp Key Insights

  1. Dirty data leads to wrong conclusions.
  2. Standardization ensures reproducibility.
  3. Missing values are hidden business risks.
  4. WHERE filtering improves performance.
  5. IF filtering improves flexibility.
  6. PROC SQL simplifies aggregation.
  7. DATA Step excels at transformations.
  8. Duplicate records inflate metrics.
  9. LENGTH prevents truncation disasters.
  10. Regex cleaning removes hidden corruption.
  11. Audit trails protect regulatory trust.
  12. Validation improves business credibility.
  13. ABS() corrects numerical anomalies.
  14. INPUT() enables true date analytics.
  15. PROC REPORT creates executive-ready outputs.
  16. Null handling prevents logical failures.
  17. Text normalization improves joins.
  18. QC checks reduce compliance risk.
  19. Structured programming improves maintenance.
  20. Reliable analytics begin with clean data.

9. Summary

SAS and R both provide exceptional capabilities for enterprise-grade data cleaning, but each has distinct strengths.

SAS dominates highly regulated industries such as pharmaceuticals, banking, insurance, and government analytics because of its:

  • Stability
  • Auditability
  • Regulatory trust
  • Structured processing

Its DATA Step architecture makes row-wise transformations highly efficient. Features like WHERE processing, PROC SORT, PROC SQL, and PROC REPORT allow scalable enterprise workflows.

Meanwhile, R excels in:

  • Modern wrangling
  • Visualization
  • Open-source flexibility
  • Rapid exploratory analysis

Packages such as dplyr, tidyr, and stringr simplify complex transformations using readable syntax.

In professional environments, the strongest analysts understand both ecosystems.

SAS provides enterprise-grade reliability.

R provides analytical flexibility.

Together, they create scalable, accurate, and maintainable analytics pipelines capable of handling real-world messy data.

From petrol pricing intelligence to clinical trial validation, the core principle remains unchanged:

Clean data drives trustworthy decisions.

10. Conclusion

The journey from raw petrol price records to professional analytical intelligence demonstrates a critical truth about modern data science:

Analytics is only as reliable as the quality of the underlying data.

A single duplicate record can inflate averages.

A missing value can invalidate eligibility decisions.

A truncated character variable can silently corrupt reporting.

These are not theoretical problems; they occur daily across banking systems, healthcare platforms, petroleum analytics, and government reporting infrastructures.

This project showcased how professional programmers use:

  • SAS DATA Step
  • PROC SQL
  • WHERE vs IF logic
  • PROC SORT
  • PROC REPORT
  • R tidyverse pipelines

to transform chaotic datasets into structured, validated, business-ready intelligence.

More importantly, we explored the reasoning behind each operation.

Professional data cleaning is not merely technical coding.

It is:

  • Risk management
  • Business protection
  • Regulatory defense
  • Decision assurance

The distinction between amateur and enterprise-grade analytics often lies not in dashboards, but in how rigorously the underlying data was validated.

Organizations trust analysts who:

  • Preserve audit trails
  • Anticipate edge cases
  • Handle missing values safely
  • Document transformations clearly
  • Build reproducible pipelines

Whether you work in clinical trials, petroleum analytics, banking, or insurance, mastering structured data cleaning frameworks will always remain one of the most valuable skills in analytics engineering.

Clean data is not a luxury.

It is the foundation of trustworthy intelligence.

11. Interview Questions & Answers

1. Why is WHERE faster than IF in SAS?

Answer:

WHERE filters observations before entering the Program Data Vector (PDV), reducing I/O operations. IF evaluates after data is read into memory.

2. When would you prefer DATA Step over PROC SQL?

Answer:

Use DATA Step for sequential row-level transformations and complex derivations. Use PROC SQL for joins, aggregations, and relational operations.

3. How does SAS treat missing numeric values?

Answer:

SAS treats missing numeric values as smaller than any valid number. This can create logical errors if not explicitly handled.

4. Explain a real-world duplicate record issue.

Answer:

Duplicate patient or petrol records can inflate averages, distort KPIs, and produce misleading executive reports. PROC SORT NODUPKEY helps resolve this.

5. What is the SAS equivalent of mutate() in R?

Answer:

The SAS DATA Step performs variable creation and modification similarly to mutate() in R.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only, using simulated petrol price data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

  • Students learning SAS
  • Data analysts building portfolios
  • Professionals preparing for SAS interviews
  • Bloggers writing about analytics and smart cities

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
