When Bombay Meets Mumbai: Transforming Inconsistent City Data into Analytical Gold

Cities Reborn, Data Refined: Turning Chaotic Global City Name Changes into Analytical Intelligence with SAS & R

1. Introduction 

Imagine you’re working on a global pharmaceutical trial. Patient data is coming in from multiple countries: India, the USA, Japan, Russia. Everything looks fine until you run a regional analysis. Suddenly, your output shows Madras and Chennai as two different locations. Bombay appears separately from Mumbai. Worse, some records show “NULL,” others “unknown,” and some are blank.

Now your regional efficacy results are split across duplicate geographies.

That’s not just messy; that’s dangerous.

In clinical trials, business intelligence, or financial systems, bad data leads to:

  • Incorrect aggregations
  • Misleading trends
  • Faulty decision-making
  • Regulatory risks

This is where SAS and R become critical. SAS provides structured, enterprise-grade data processing (ideal for clinical environments), while R offers flexible, modern data wrangling capabilities.

In this project, we simulate a global dataset of old vs new city names, intentionally inject errors, and then clean, transform, and combine datasets using:

  • SAS DATA Step vs PROC SQL (MERGE vs JOIN)
  • R dplyr workflows

2. Raw Data Creation (With Intentional Errors)

SAS Code (Raw Dataset)

data city_raw;
   length Old_Name $20 New_Name $20 Country $15 Region $15
          Update_Date $12 Source $20 Status $10;
   input ID Old_Name $ New_Name $ Country $ Region $ Update_Date $
         Population Source $ Status $;
   datalines;
1 Bombay Mumbai India West 2020-01-01 20000000 govt active
2 Madras Chennai India South 2019-05-10 11000000 govt active
3 Calcutta Kolkata India East 2018-07-21 15000000 govt active
4 NULL Delhi India North 2021-02-01 18000000 private active
5 Peking Beijing China North 2017-06-15 21000000 govt active
6 Saigon HoChiMinh Vietnam South 2016-03-12 9000000 govt active
7 Leningrad StPetersburg Russia West 2015-09-30 5000000 govt active
8 Bombay Mumbai India West 2020-01-01 20000000 govt active
9 -999 Karachi Pakistan South 2018-11-11 14000000 govt inactive
10 Edo Tokyo Japan East wrongdate 13000000 govt active
;
run;

proc print data=city_raw;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population
  1  Bombay     Mumbai        India     West    2020-01-01   govt     active     1    20000000
  2  Madras     Chennai       India     South   2019-05-10   govt     active     2    11000000
  3  Calcutta   Kolkata       India     East    2018-07-21   govt     active     3    15000000
  4  NULL       Delhi         India     North   2021-02-01   private  active     4    18000000
  5  Peking     Beijing       China     North   2017-06-15   govt     active     5    21000000
  6  Saigon     HoChiMinh     Vietnam   South   2016-03-12   govt     active     6     9000000
  7  Leningrad  StPetersburg  Russia    West    2015-09-30   govt     active     7     5000000
  8  Bombay     Mumbai        India     West    2020-01-01   govt     active     8    20000000
  9  -999       Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000
 10  Edo        Tokyo         Japan     East    wrongdate    govt     active    10    13000000

Explanation (SAS Raw Data)

This dataset intentionally includes:

  • Missing values (NULL)
  • Invalid numeric (-999)
  • Wrong date format (wrongdate)
  • Duplicate records (Bombay → Mumbai)
  • Inconsistent naming

This mirrors real-world ingestion issues. Notice the use of LENGTH; it is critical to prevent truncation. Without it, SAS defaults character variables to smaller lengths and silently corrupts strings.
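To see the truncation risk concretely, here is a minimal sketch (the dataset and variable names are ours, not part of the project): with list input and no LENGTH statement, a character variable defaults to 8 bytes.

data trunc_demo;
   input City $;   /* no LENGTH statement: City defaults to $8 */
   datalines;
StPetersburg
;
run;
/* City now holds "StPeters" - the trailing characters were silently dropped */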

R Code – Equivalent Raw Dataset

options(scipen = 999)

city_raw <- data.frame(
  ID = 1:10,
  Old_Name = c("Bombay","Madras","Calcutta","NULL","Peking","Saigon",
               "Leningrad","Bombay","-999","Edo"),
  New_Name = c("Mumbai","Chennai","Kolkata","Delhi","Beijing",
               "HoChiMinh","StPetersburg","Mumbai","Karachi","Tokyo"),
  Country = c("India","India","India","India","China","Vietnam","Russia",
              "India","Pakistan","Japan"),
  Region = c("West","South","East","North","North","South","West","West",
             "South","East"),
  Update_Date = c("2020-01-01","2019-05-10","2018-07-21","2021-02-01",
                  "2017-06-15","2016-03-12","2015-09-30","2020-01-01",
                  "2018-11-11","wrongdate"),
  Population = c(20000000,11000000,15000000,18000000,21000000,9000000,
                 5000000,20000000,14000000,13000000),
  Source = c("govt","govt","govt","private","govt","govt","govt","govt",
             "govt","govt"),
  Status = c("active","active","active","active","active","active","active",
             "active","inactive","active")
)

OUTPUT:

    ID  Old_Name   New_Name      Country   Region  Update_Date  Population  Source   Status
1    1  Bombay     Mumbai        India     West    01-01-2020     20000000  govt     active
2    2  Madras     Chennai       India     South   10-05-2019     11000000  govt     active
3    3  Calcutta   Kolkata       India     East    21-07-2018     15000000  govt     active
4    4  NULL       Delhi         India     North   01-02-2021     18000000  private  active
5    5  Peking     Beijing       China     North   15-06-2017     21000000  govt     active
6    6  Saigon     HoChiMinh     Vietnam   South   12-03-2016      9000000  govt     active
7    7  Leningrad  StPetersburg  Russia    West    30-09-2015      5000000  govt     active
8    8  Bombay     Mumbai        India     West    01-01-2020     20000000  govt     active
9    9  -999       Karachi       Pakistan  South   11-11-2018     14000000  govt     inactive
10  10  Edo        Tokyo         Japan     East    wrongdate      13000000  govt     active

Explanation (R Raw Data)

R’s data.frame() mirrors SAS input but allows more flexibility. However:

  • Strings like "NULL" are not automatically treated as missing
  • "wrongdate" remains a string
  • No strict typing enforcement like SAS

This flexibility is powerful but dangerous if not controlled.

Section 1: Phase 1 – The SAS Engineering Layer

Data Cleaning in SAS

data city_clean;
   set city_raw;

   /* Fix missing names */
   if strip(upcase(Old_Name)) in ("NULL","-999") then Old_Name = "";
   Old_Name = coalescec(Old_Name, "UNKNOWN");

   /* Handle invalid numerics */
   if Population < 0 then Population = abs(Population);

   /* Date conversion (?? suppresses invalid-data notes) */
   New_Date = input(Update_Date, ?? yymmdd10.);
   format New_Date yymmdd10.;

   /* Standardize text */
   Old_Name = upcase(strip(Old_Name));
   New_Name = propcase(strip(New_Name));

   /* Category assignment */
   select (Status);
      when ("active")   Flag = 1;
      when ("inactive") Flag = 0;
      otherwise         Flag = .;
   end;
run;

proc print data=city_clean;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population  New_Date    Flag
  1  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     1    20000000  2020-01-01     1
  2  MADRAS     Chennai       India     South   2019-05-10   govt     active     2    11000000  2019-05-10     1
  3  CALCUTTA   Kolkata       India     East    2018-07-21   govt     active     3    15000000  2018-07-21     1
  4  UNKNOWN    Delhi         India     North   2021-02-01   private  active     4    18000000  2021-02-01     1
  5  PEKING     Beijing       China     North   2017-06-15   govt     active     5    21000000  2017-06-15     1
  6  SAIGON     Hochiminh     Vietnam   South   2016-03-12   govt     active     6     9000000  2016-03-12     1
  7  LENINGRAD  Stpetersburg  Russia    West    2015-09-30   govt     active     7     5000000  2015-09-30     1
  8  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     8    20000000  2020-01-01     1
  9  UNKNOWN    Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000  2018-11-11     0
 10  EDO        Tokyo         Japan     East    wrongdate    govt     active    10    13000000  .              1

Explanation

  • COALESCEC fills missing character values
  • ABS() fixes invalid numeric values
  • INPUT() converts string → date
  • SELECT-WHEN is cleaner than IF-ELSE for categorical logic
  • STRIP + UPCASE ensures standardization
  • Converting "NULL" to blank first is what allows COALESCEC() to supply "UNKNOWN"

This is defensive SAS programming. The optional modifiers on INPUT() control what reaches the log:

  • ? → suppresses the invalid-data note
  • ?? → suppresses the note and also keeps _ERROR_ at 0

👉 If "wrongdate" appears:

  • No log clutter
  • The value safely becomes missing (.)
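As a small demonstration (a sketch with our own dataset names), both versions below return a missing value for a bad date; the difference is only in what reaches the log:

data date_demo;
   length raw $12;
   input raw $;
   d_plain = input(raw, yymmdd10.);     /* "wrongdate": invalid-data note in the log, _ERROR_=1 */
   d_quiet = input(raw, ?? yymmdd10.);  /* "wrongdate": silently missing, _ERROR_ stays 0 */
   format d_plain d_quiet yymmdd10.;
   datalines;
2020-01-01
wrongdate
;
run;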

/* Remove duplicates */
proc sort data=city_clean nodupkey;
   by Old_Name New_Name Country;
run;

proc print data=city_clean;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population  New_Date    Flag
  1  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     1    20000000  2020-01-01     1
  2  CALCUTTA   Kolkata       India     East    2018-07-21   govt     active     3    15000000  2018-07-21     1
  3  EDO        Tokyo         Japan     East    wrongdate    govt     active    10    13000000  .              1
  4  LENINGRAD  Stpetersburg  Russia    West    2015-09-30   govt     active     7     5000000  2015-09-30     1
  5  MADRAS     Chennai       India     South   2019-05-10   govt     active     2    11000000  2019-05-10     1
  6  PEKING     Beijing       China     North   2017-06-15   govt     active     5    21000000  2017-06-15     1
  7  SAIGON     Hochiminh     Vietnam   South   2016-03-12   govt     active     6     9000000  2016-03-12     1
  8  UNKNOWN    Delhi         India     North   2021-02-01   private  active     4    18000000  2021-02-01     1
  9  UNKNOWN    Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000  2018-11-11     0

Explanation

NODUPKEY keeps only the first record for each combination of the BY variables, which is critical for preventing double counting.
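If you also need an audit trail of what was removed, PROC SORT's DUPOUT= option captures the discarded records. A minimal sketch reusing the datasets above:

proc sort data=city_clean nodupkey dupout=dup_audit;
   by Old_Name New_Name Country;
run;

/* dup_audit holds the records NODUPKEY dropped (here, the second
   BOMBAY -> Mumbai row), ready for review or documentation */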

MERGE vs JOIN (SAS)

data region_map;

input Country $ Region_Code $;

datalines;

India IN

China CN

Japan JP

Pakistan PAK

Russia Rus

Vietnam Viet

;

run;

proc print data = region_map;

run;

OUTPUT:

Obs  Country   Region_Code
  1  India     IN
  2  China     CN
  3  Japan     JP
  4  Pakistan  PAK
  5  Russia    Rus
  6  Vietnam   Viet

/* Sorting before merging */
proc sort data=region_map;
   by Country;
run;

proc print data=region_map;
run;

OUTPUT:

Obs  Country   Region_Code
  1  China     CN
  2  India     IN
  3  Japan     JP
  4  Pakistan  PAK
  5  Russia    Rus
  6  Vietnam   Viet

proc sort data=city_clean;
   by Country;
run;

proc print data=city_clean;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population  New_Date    Flag
  1  PEKING     Beijing       China     North   2017-06-15   govt     active     5    21000000  2017-06-15     1
  2  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     1    20000000  2020-01-01     1
  3  CALCUTTA   Kolkata       India     East    2018-07-21   govt     active     3    15000000  2018-07-21     1
  4  MADRAS     Chennai       India     South   2019-05-10   govt     active     2    11000000  2019-05-10     1
  5  UNKNOWN    Delhi         India     North   2021-02-01   private  active     4    18000000  2021-02-01     1
  6  EDO        Tokyo         Japan     East    wrongdate    govt     active    10    13000000  .              1
  7  UNKNOWN    Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000  2018-11-11     0
  8  LENINGRAD  Stpetersburg  Russia    West    2015-09-30   govt     active     7     5000000  2015-09-30     1
  9  SAIGON     Hochiminh     Vietnam   South   2016-03-12   govt     active     6     9000000  2016-03-12     1

/* MERGE */
data merged_data;
   merge city_clean(in=a) region_map(in=b);
   by Country;
   if a;
run;

proc print data=merged_data;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population  New_Date    Flag  Region_Code
  1  PEKING     Beijing       China     North   2017-06-15   govt     active     5    21000000  2017-06-15     1  CN
  2  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     1    20000000  2020-01-01     1  IN
  3  CALCUTTA   Kolkata       India     East    2018-07-21   govt     active     3    15000000  2018-07-21     1  IN
  4  MADRAS     Chennai       India     South   2019-05-10   govt     active     2    11000000  2019-05-10     1  IN
  5  UNKNOWN    Delhi         India     North   2021-02-01   private  active     4    18000000  2021-02-01     1  IN
  6  EDO        Tokyo         Japan     East    wrongdate    govt     active    10    13000000  .              1  JP
  7  UNKNOWN    Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000  2018-11-11     0  PAK
  8  LENINGRAD  Stpetersburg  Russia    West    2015-09-30   govt     active     7     5000000  2015-09-30     1  Rus
  9  SAIGON     Hochiminh     Vietnam   South   2016-03-12   govt     active     6     9000000  2016-03-12     1  Viet

/* PROC SQL JOIN */
proc sql;
   create table joined_data as
   select a.*, b.Region_Code
   from city_clean a
   left join region_map b
      on a.Country = b.Country
   order by ID;
quit;

proc print data=joined_data;
run;

OUTPUT:

Obs  Old_Name   New_Name      Country   Region  Update_Date  Source   Status    ID  Population  New_Date    Flag  Region_Code
  1  BOMBAY     Mumbai        India     West    2020-01-01   govt     active     1    20000000  2020-01-01     1  IN
  2  MADRAS     Chennai       India     South   2019-05-10   govt     active     2    11000000  2019-05-10     1  IN
  3  CALCUTTA   Kolkata       India     East    2018-07-21   govt     active     3    15000000  2018-07-21     1  IN
  4  UNKNOWN    Delhi         India     North   2021-02-01   private  active     4    18000000  2021-02-01     1  IN
  5  PEKING     Beijing       China     North   2017-06-15   govt     active     5    21000000  2017-06-15     1  CN
  6  SAIGON     Hochiminh     Vietnam   South   2016-03-12   govt     active     6     9000000  2016-03-12     1  Viet
  7  LENINGRAD  Stpetersburg  Russia    West    2015-09-30   govt     active     7     5000000  2015-09-30     1  Rus
  8  UNKNOWN    Karachi       Pakistan  South   2018-11-11   govt     inactive   9    14000000  2018-11-11     0  PAK
  9  EDO        Tokyo         Japan     East    wrongdate    govt     active    10    13000000  .              1  JP

Explanation

  • MERGE requires sorted data and works row-wise
  • JOIN is flexible and does not require sorting
  • SQL JOIN is preferred for complex relationships
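One practical consequence of these differences is how you check for unmatched keys. A minimal sketch, reusing the datasets above (both already sorted by Country):

/* MERGE: IN= flags expose records with no match in region_map */
data unmatched;
   merge city_clean(in=a) region_map(in=b);
   by Country;
   if a and not b;   /* keep city rows that found no region code */
run;

/* PROC SQL: the same check as an exception query */
proc sql;
   select Country
   from city_clean
   where Country not in (select Country from region_map);
quit;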

Section 2: Phase 2 – R Refinement Layer

library(dplyr)

city_clean <- city_raw %>%
  mutate(
    Old_Name = ifelse(Old_Name %in% c("NULL","-999"), "UNKNOWN", Old_Name),
    Population = ifelse(Population < 0, abs(Population), Population),
    Update_Date = as.Date(Update_Date, format = "%Y-%m-%d"),
    Old_Name = toupper(trimws(Old_Name)),
    New_Name = tools::toTitleCase(trimws(New_Name)),
    Flag = case_when(
      Status == "active" ~ 1,
      Status == "inactive" ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  distinct(Old_Name, New_Name, .keep_all = TRUE)

OUTPUT:

    ID  Old_Name   New_Name      Country   Region  Update_Date  Population  Source   Status  Flag
1    1  BOMBAY     Mumbai        India     West    01-01-2020     20000000  govt     active     1
2    2  MADRAS     Chennai       India     South   10-05-2019     11000000  govt     active     1
3    3  CALCUTTA   Kolkata       India     East    21-07-2018     15000000  govt     active     1
4    4  UNKNOWN    Delhi         India     North   01-02-2021     18000000  private  active     1
5    5  PEKING     Beijing       China     North   15-06-2017     21000000  govt     active     1
6    6  SAIGON     HoChiMinh     Vietnam   South   12-03-2016      9000000  govt     active     1
7    7  LENINGRAD  StPetersburg  Russia    West    30-09-2015      5000000  govt     active     1
8    9  UNKNOWN    Karachi       Pakistan  South   11-11-2018     14000000  govt     inactive   0
9   10  EDO        Tokyo         Japan     East    NA             13000000  govt     active     1

Logic Bridge (SAS vs R)

  • mutate() = DATA step transformation
  • case_when() = SELECT-WHEN
  • distinct() = PROC SORT NODUPKEY
  • as.Date() = INPUT()
  • trimws() before as.Date() removes hidden spaces and avoids silent conversion failures
  • %>% requires the dplyr (or magrittr) package
  • %in% checks multiple values in one test and is safer than chained OR conditions
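The trimws()-before-as.Date() point has an exact SAS counterpart: stray blanks around a date string can break INPUT(), so STRIP first. A minimal sketch with a hypothetical value:

data _null_;
   raw = "  2020-01-01";                      /* leading blanks, e.g. from a messy CSV */
   clean_date = input(strip(raw), yymmdd10.); /* STRIP removes the blanks before parsing */
   put clean_date= yymmdd10.;
run;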

Section 3: Business Logic & The “Why”

In a financial system, if Mumbai and Bombay are treated separately (see the sketch at the end of this section):

  • Revenue aggregation splits
  • Regional performance becomes inaccurate
  • Decision-makers misinterpret market size

In clinical trials:

  • Duplicate site names distort patient counts
  • Regulatory submissions fail validation

A single missing value treated incorrectly can lead to:

  • Wrong dosage recommendations
  • Financial misreporting
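A minimal, hypothetical sketch of the revenue-splitting problem: the same market reported under two names looks like two markets until the names are standardized.

data sales;
   input City $ Revenue;
   datalines;
Bombay 100
Mumbai 250
;
run;

proc sql;
   /* Unstandardized: the market appears split in two */
   select City, sum(Revenue) as Total
   from sales
   group by City;

   /* Standardized: one city, the true total of 350 */
   select case when City = "Bombay" then "Mumbai" else City end as Std_City,
          sum(Revenue) as Total
   from sales
   group by 1;
quit;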

Section 4: 20 Key Implementation Practices

  1. Always define LENGTH in SAS
  2. Standardize before merging
  3. Never trust raw input
  4. Deduplicate early
  5. Use audit logs
  6. Validate joins
  7. Avoid implicit conversions
  8. Check missing explicitly
  9. Use consistent casing
  10. Separate raw and clean layers
  11. Validate dates
  12. Flag anomalies
  13. Avoid hardcoding
  14. Use reusable macros
  15. Test edge cases
  16. Use metadata-driven design
  17. Validate counts before/after joins (see the macro sketch after this list)
  18. Keep backup datasets
  19. Document transformations
  20. Automate QC checks
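Practices 14, 17, and 20 combine naturally. A minimal macro sketch (the macro and dataset names are ours) that logs a row count before and after any step:

%macro row_count(ds, label);
   %local n;
   proc sql noprint;
      select count(*) into :n trimmed from &ds;
   quit;
   %put NOTE: [QC] &label - &ds has &n row(s);
%mend row_count;

%row_count(city_clean, before join);
/* ... merge or join step ... */
%row_count(joined_data, after join);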

3. Phase 3: Extended SAS Analysis

data file_import;
   infile datalines dlm=',' firstobs=2;    /* FIRSTOBS=2 skips the first data line here */
   input Old_Name $ New_Name $ Population; /* no LENGTH: names default to $8 */
   datalines;
Bombay,Mumbai,20000000
Madras,Chennai,11000000
Calcutta,Kolkata,15000000
Peking,Beijing,21000000
Saigon,HoChiMinh,9000000
Leningrad,StPetersburg,5000000
Bombay,Mumbai,20000000
Edo,Tokyo,13000000
;
run;

proc print data=file_import;
run;

OUTPUT:

Obs  Old_Name  New_Name  Population
  1  Madras    Chennai     11000000
  2  Calcutta  Kolkata     15000000
  3  Peking    Beijing     21000000
  4  Saigon    HoChiMin     9000000
  5  Leningra  StPeters     5000000
  6  Bombay    Mumbai      20000000
  7  Edo       Tokyo       13000000

/* Flag high population */
data flagged;
   set file_import;
   if Population > 15000000 then High_Flag = 1;
   else High_Flag = 0;
run;

proc print data=flagged;
run;

OUTPUT:

Obs  Old_Name  New_Name  Population  High_Flag
  1  Madras    Chennai     11000000          0
  2  Calcutta  Kolkata     15000000          0
  3  Peking    Beijing     21000000          1
  4  Saigon    HoChiMin     9000000          0
  5  Leningra  StPeters     5000000          0
  6  Bombay    Mumbai      20000000          1
  7  Edo       Tokyo       13000000          0

/* Aggregation */
proc means data=flagged;
   var Population;
   class High_Flag;
run;

OUTPUT:

The MEANS Procedure

Analysis Variable : Population

High_Flag  N Obs  N         Mean      Std Dev      Minimum      Maximum
        0      5  5  10600000.00   3847076.81   5000000.00  15000000.00
        1      2  2  20500000.00    707106.78  20000000.00  21000000.00

/* Deduplication */
proc sort data=flagged nodupkey;
   by Old_Name;
run;

proc print data=flagged;
run;

OUTPUT:

Obs  Old_Name  New_Name  Population  High_Flag
  1  Bombay    Mumbai      20000000          1
  2  Calcutta  Kolkata     15000000          0
  3  Edo       Tokyo       13000000          0
  4  Leningra  StPeters     5000000          0
  5  Madras    Chennai     11000000          0
  6  Peking    Beijing     21000000          1
  7  Saigon    HoChiMin     9000000          0

Explanation

This mini-pipeline covers the full cycle:

  • Import
  • Transform
  • Aggregate
  • Deduplicate
  • Report

Two quirks are visible in the output: FIRSTOBS=2 silently dropped the first data line (there is no header row in the DATALINES to skip), and the default $8 character lengths truncated HoChiMinh to HoChiMin and StPetersburg to StPeters, exactly the risk that practice #1 warns about.
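A corrected version of the import fixes both quirks; a sketch (only the first two records are repeated for brevity):

data file_import2;
   length Old_Name $20 New_Name $20;   /* prevents the $8 truncation */
   infile datalines dlm=',';           /* no header row, so no FIRSTOBS=2 */
   input Old_Name $ New_Name $ Population;
   datalines;
Bombay,Mumbai,20000000
Madras,Chennai,11000000
;
run;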

4. 20 Additional Data Cleaning Best Practices

  1. Follow CDISC standards
  2. Maintain SDTM compliance
  3. Validate ADaM derivations
  4. Track lineage
  5. Ensure reproducibility
  6. Maintain audit trails
  7. Validate controlled terminology
  8. Perform double programming
  9. Use version control
  10. Validate joins
  11. Cross-check outputs
  12. Use QC datasets
  13. Document assumptions
  14. Handle partial dates
  15. Normalize units
  16. Validate ranges
  17. Track missing patterns
  18. Ensure traceability
  19. Maintain regulatory logs
  20. Perform independent review

5. Business Logic Behind Data Cleaning

Data cleaning is not cosmetic; it directly impacts decision-making. Missing values are replaced to ensure continuity in analysis. For example, if a patient’s age is missing, imputing it with the median prevents bias in demographic summaries. Similarly, unrealistic values such as a negative population or age must be corrected, for example with ABS(), because they distort statistical distributions.

Date imputation is critical in longitudinal studies. If a visit date is missing, incorrect sequencing may lead to wrong conclusions about treatment efficacy. In financial datasets, salary normalization ensures comparability across regions.

Incorrect handling of missing values (e.g., treating NULL as zero) can significantly bias results. For instance, in clinical trials, a missing lab value treated as zero might falsely indicate abnormality.
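As an illustration of the median-imputation idea, here is a minimal sketch (the patients dataset and Age variable are hypothetical, not part of this project):

/* 1. Compute the median age once */
proc means data=patients noprint;
   var Age;
   output out=med_age median=Age_Median;
run;

/* 2. Fill missing ages with that median */
data patients_imputed;
   if _n_ = 1 then set med_age(keep=Age_Median);
   set patients;
   if missing(Age) then Age = Age_Median;
   drop Age_Median;
run;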

Thus, data cleaning ensures:

  • Accuracy
  • Consistency
  • Regulatory compliance
  • Reliable analytics

6. 20 Sharp Insights

  • Dirty data leads to wrong conclusions
  • Standardization ensures reproducibility
  • Missing ≠ zero
  • Deduplication prevents inflation
  • Dates drive timelines
  • Validation prevents disasters
  • Merge carefully
  • Always audit transformations
  • Clean before analysis
  • Consistency is key
  • Logic errors are silent killers
  • Documentation is power
  • Automation saves time
  • QC is non-negotiable
  • Data lineage matters
  • Format before logic
  • Always test edge cases
  • Clean data builds trust
  • Reproducibility is science
  • Accuracy beats speed

7. Summary

SAS and R both provide powerful frameworks for data cleaning, but they serve different operational philosophies. SAS excels in structured, enterprise-level processing, making it ideal for regulated industries like pharmaceuticals. Its DATA step offers deterministic execution, while PROC SQL enables flexible joins and aggregations.

R, on the other hand, thrives in exploratory and flexible environments. Packages like dplyr simplify transformations, while stringr enhances text cleaning capabilities. The “tidyverse” ecosystem allows analysts to write expressive, readable pipelines.

In this project, we demonstrated:

  • How messy global city name data can distort analytics
  • How SAS handles structured cleaning using DATA steps, SELECT-WHEN, and PROC SORT
  • How R achieves similar outcomes using mutate(), case_when(), and distinct()
  • The difference between MERGE (row-wise, sorted) and JOIN (relational, flexible)

The key takeaway:
SAS ensures reliability and compliance, while R provides agility and speed.

A hybrid approach is often the best strategy in real-world projects.

8. Conclusion

Data is only as valuable as its quality. Whether you are analyzing global city transformations, financial transactions, or clinical trial outcomes, the underlying principle remains the same: clean, structured, and validated data is the foundation of trustworthy analytics.

Through this project, we explored how seemingly simple inconsistencies like “Bombay” vs “Mumbai” can cascade into major analytical errors. These issues are not hypothetical; they occur daily in real-world systems. Without proper cleaning frameworks, organizations risk making flawed decisions that can impact revenue, compliance, and even human lives.

SAS provides a robust, auditable environment where every transformation is controlled and reproducible. This is why it remains the gold standard in clinical and regulatory domains. R complements this by offering flexibility and speed, allowing analysts to iterate quickly and explore data deeply.

The combination of:

  • Structured SAS pipelines
  • Agile R workflows
  • Strong business logic
  • Rigorous validation

creates a powerful data engineering ecosystem.

Ultimately, data cleaning is not a preliminary step; it is a core analytical function. Investing time in cleaning ensures accuracy, builds trust, and enables scalable analytics.

9. Interview Questions

1. Difference between MERGE and JOIN in SAS?

Answer: MERGE works row-wise and requires sorting; JOIN is relational and flexible.

2. How do you handle missing values in SAS?

Answer: Use COALESCEC for character and IF conditions for numeric.

3. R equivalent of SELECT-WHEN?

Answer: case_when()

4. What happens if LENGTH is not defined?

Answer: SAS may truncate variables, causing data loss.

5. Real-world debugging scenario?

Answer: Duplicate city names caused double counting; this was resolved with PROC SORT NODUPKEY plus name standardization.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only, using city name data as the working example.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

  • Students learning SAS
  • Data analysts building portfolios
  • Professionals preparing for SAS interviews
  • Bloggers writing about analytics and smart cities
