A Global Eye Bank Dataset into Trusted Intelligence Using SAS and R

Vision Without Validation Becomes Blind Analytics: Transforming a Global Eye Bank Dataset into Trusted Intelligence Using SAS and R

Introduction:From Corneal Donation Delays to Analytical Failure

Imagine a multinational ophthalmology research consortium preparing a global report on corneal transplant accessibility. The study combines operational records from eye banks across continents to evaluate donor availability, transplant waiting periods, and regional healthcare inequality.

Three weeks before executive presentation, statisticians discover a disaster.

Several eye banks appear twice under slightly different names. Some donor counts are negative because of data entry errors. Missing collection dates cause survival analyses to fail. Regional labels appear as APAC, Apac, Asia-Pacific, and ASIA PACIFIC simultaneously. A few email addresses contain spaces while others are malformed entirely. Some eye banks report impossible utilization rates exceeding 100%.

The dashboard shows misleading trends.

Machine learning models generate incorrect forecasts.

Funding decisions become unreliable.

Regulatory submissions become questionable.

This is not a software problem.

This is a data quality problem.

Experienced Clinical SAS Programmers understand that poor-quality source data can silently destroy statistical conclusions long before model building begins.

In modern healthcare analytics, data cleaning is not preprocessing.

It is risk management.

Global Eye Banks Dataset Overview

Our analytical dataset contains information collected from major eye banks worldwide.

Dataset Variables (9 Variables)

Variable

Description

EYE_BANK_ID

Unique eye bank identifier

EYE_BANK_NAME

Eye bank name

COUNTRY

Country location

REGION_CODE

Geographic region

DONORS_RECEIVED

Annual corneal donors

TRANSPLANTS_DONE

Annual transplants

AVG_WAIT_DAYS

Average waiting days

CONTACT_EMAIL

Operational email

LAST_AUDIT_DATE

Last regulatory audit date

Intentional corruption includes:

  • Duplicate IDs
  • Negative donor values
  • Missing audit dates
  • Invalid email addresses
  • Incorrect region codes
  • Mixed text case
  • Leading spaces
  • Impossible transplant counts
  • NULL strings
  • Invalid date values

1.SAS Raw Dataset Creation with Intentional Errors

data eyebank_raw;

infile datalines dsd dlm=',' truncover;

length eye_bank_id $8 eye_bank_name $45 country $25 region_code $20

       contact_email $60;

informat last_audit_date anydtdte20.;

format last_audit_date date9.;

input eye_bank_id :$8. eye_bank_name :$45. country :$25. region_code :$20.

      donors_received transplants_done avg_wait_days contact_email :$60.

      last_audit_date :anydtdte20.;

datalines;

EB001,LionsEyeBank,India,APAC,450,420,32,support@lions.org,12JAN2025

EB002,VisionTrust,USA,NA,620,590,28,info@visiontrust.org,15FEB2025

EB003,GlobalEye,UK,EUROPE,-45,310,20,admin@globaleye.org,11MAR2025

EB003,GlobalEye,UK,Europe,500,310,20,admin@globaleye.org,11MAR2025

EB004,CorneaCare,India,APAC,520,700,18,NULL,.

EB005,BrightSight,Canada,na,410,390,25,help@brightsight,14APR2025

EB006,FutureVision,Australia,APAC,300,280,22,futurevision.org,30FEB2025

EB007,EyeLife,Germany,EU,290,275,-5,contact@eyelife.org,19MAY2025

EB008,WorldCornea,India,AsiaPacific,610,605,31,world@cornea.org,.

EB009,ClearView,Japan,APAC,510,490,27,clear@view.org,10JUN2025

EB010,VisionTrust,USA,NA,620,590,28,info@visiontrust.org,15FEB2025

EB011,Eye Hope,Brazil,LATAM,400,410,30,hope @eye.org,01JUL2025

EB012,RetinaGift,India,APAC,.,340,40,retina@gift.org,20AUG2025

EB013,NewLight,UAE,MEA,370,355,29,newlight@mail.com,15SEP2025

EB014,OcularBank,France,eu,280,270,23,ocular@bank.org,15OCT2025

EB015,DonorVision,India,APAC,530,520,26,donor@vision.org,22NOV2025

EB016,LifeSight,Kenya,AFRICA,250,240,34,sight@life.org,11DEC2025

EB017,GiftVision,India,APAC,0,230,36,gift@vision.org,.

EB018,HopeCornea,USA,NorthAmerica,550,530,21,hope@cornea.org,15JAN2025

EB019,EyeFuture,India,APAC,460,455,24,eyefuture.com,14FEB2025

EB020,GlobalCornea,Singapore,APAC,390,370,19,global@cornea.org,15MAR2025

EB021,BrightVision,India,APAC,440,435,33,bright@vision.org,01APR2025

EB022,ClearGift,UK,EUROPE,480,470,27,cleargift@eye.org,18MAY2025

;

run;

proc print data=eyebank_raw;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_days
1EB001LionsEyeBankIndiaAPACsupport@lions.org12JAN202545042032
2EB002VisionTrustUSANAinfo@visiontrust.org15FEB202562059028
3EB003GlobalEyeUKEUROPEadmin@globaleye.org11MAR2025-4531020
4EB003GlobalEyeUKEuropeadmin@globaleye.org11MAR202550031020
5EB004CorneaCareIndiaAPACNULL.52070018
6EB005BrightSightCanadanahelp@brightsight14APR202541039025
7EB006FutureVisionAustraliaAPACfuturevision.org.30028022
8EB007EyeLifeGermanyEUcontact@eyelife.org19MAY2025290275-5
9EB008WorldCorneaIndiaAsiaPacificworld@cornea.org.61060531
10EB009ClearViewJapanAPACclear@view.org10JUN202551049027
11EB010VisionTrustUSANAinfo@visiontrust.org15FEB202562059028
12EB011Eye HopeBrazilLATAMhope @eye.org01JUL202540041030
13EB012RetinaGiftIndiaAPACretina@gift.org20AUG2025.34040
14EB013NewLightUAEMEAnewlight@mail.com15SEP202537035529
15EB014OcularBankFranceeuocular@bank.org15OCT202528027023
16EB015DonorVisionIndiaAPACdonor@vision.org22NOV202553052026
17EB016LifeSightKenyaAFRICAsight@life.org11DEC202525024034
18EB017GiftVisionIndiaAPACgift@vision.org.023036
19EB018HopeCorneaUSANorthAmericahope@cornea.org15JAN202555053021
20EB019EyeFutureIndiaAPACeyefuture.com14FEB202546045524
21EB020GlobalCorneaSingaporeAPACglobal@cornea.org15MAR202539037019
22EB021BrightVisionIndiaAPACbright@vision.org01APR202544043533
23EB022ClearGiftUKEUROPEcleargift@eye.org18MAY202548047027

Why LENGTH Must Come First

A common production mistake is defining LENGTH after assignment statements.

eye_bank_name="International Eye Donation Registry";

length eye_bank_name $20;

The variable becomes permanently truncated because SAS allocates memory during compilation.

This creates silent corruption.

R behaves differently because character vectors dynamically allocate memory rather than using fixed-length storage definitions.

Clinical production programmers always define LENGTH before assignment logic.

2.Cleaning Workflow Using DATA Step Programming

data eyebank_clean;

set eyebank_raw;

eye_bank_name=propcase(strip(eye_bank_name));

country=propcase(strip(country));

region_code=upcase(strip(region_code));

contact_email=compress(lowcase(contact_email),' ');

if donors_received <0 then donors_received=abs(donors_received);

if avg_wait_days<0 then avg_wait_days=.;

if missing(last_audit_date) then

last_audit_date=intnx('month',today(),-3,'same');

if index(contact_email,'@')=0 then

contact_email='unknown@placeholder.org';

if transplants_done>donors_received then

transplants_done=donors_received;

if region_code in ('EU','EUROPE') then region_code='EMEA';

select(region_code);

when('NA','NORTHAMERICA') region_code='NA';

when('APAC','ASIAPACIFIC') region_code='APAC';

otherwise;

end;

run;

proc print data=eyebank_clean;;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_days
1EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032
2EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
3EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520
4EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020
5EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018
6EB005BrightsightCanadaNAhelp@brightsight14APR202541039025
7EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022
8EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.
9EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531
10EB009ClearviewJapanAPACclear@view.org10JUN202551049027
11EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
12EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030
13EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40
14EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529
15EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023
16EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026
17EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034
18EB017GiftvisionIndiaAPACgift@vision.org04APR20260036
19EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021
20EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524
21EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019
22EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533
23EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027

Explanation

This workflow combines multiple defensive programming approaches used in production clinical environments. PROPCASE standardizes presentation layers while STRIP removes hidden whitespace corruption. ABS corrects impossible negative operational values without destroying historical traceability. INTNX generates replacement dates using reproducible business rules. INDEX validates email structure quickly without expensive regular expressions. SELECT-WHEN logic improves readability compared with nested IF blocks and simplifies maintenance during production deployment.

3.Deduplication Using PROC SORT

proc sort data=eyebank_clean out=audit_reference nodupkey;

by eye_bank_id;

run;

proc print data=audit_reference;

run;

OUTPUT:

NOTE: There were 23 observations read from the data set WORK.EYEBANK_CLEAN.
NOTE: 1 observations with duplicate key values were deleted.
NOTE: The data set WORK.AUDIT_REFERENCE has 22 observations and 9 variables.

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_days
1EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032
2EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
3EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520
4EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018
5EB005BrightsightCanadaNAhelp@brightsight14APR202541039025
6EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022
7EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.
8EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531
9EB009ClearviewJapanAPACclear@view.org10JUN202551049027
10EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
11EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030
12EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40
13EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529
14EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023
15EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026
16EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034
17EB017GiftvisionIndiaAPACgift@vision.org04APR20260036
18EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021
19EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524
20EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019
21EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533
22EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027

Explanation

PROC SORT with NODUPKEY removes duplicate primary identifiers while preserving first occurrence records. This method is extremely common in SDTM and ADaM preparation pipelines. Analysts must document whether first occurrence retention aligns with business rules because inappropriate deduplication can create regulatory traceability concerns.

4.DATA Step Versus PROC SQL

proc sort data=eyebank_clean;

by eye_bank_id;

run;

proc print data=eyebank_clean;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_days
1EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032
2EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
3EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520
4EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020
5EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018
6EB005BrightsightCanadaNAhelp@brightsight14APR202541039025
7EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022
8EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.
9EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531
10EB009ClearviewJapanAPACclear@view.org10JUN202551049027
11EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028
12EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030
13EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40
14EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529
15EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023
16EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026
17EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034
18EB017GiftvisionIndiaAPACgift@vision.org04APR20260036
19EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021
20EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524
21EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019
22EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533
23EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027

4.1 DATA Step Merge

data audit_reference;

length eye_bank_id $8

       audit_status $20;

input eye_bank_id $ audit_status $;

datalines;

EB001 PASS

EB002 PASS

EB003 REVIEW

EB004 FAIL

EB005 PASS

EB006 REVIEW

EB007 PASS

EB008 REVIEW

EB009 PASS

EB010 PASS

EB011 FAIL

EB012 REVIEW

EB013 PASS

EB014 PASS

EB015 PASS

EB016 REVIEW

EB017 FAIL

EB018 PASS

EB019 REVIEW

EB020 PASS

EB021 PASS

EB022 PASS

;

run;

proc print data=audit_reference;

run;

OUTPUT:

Obseye_bank_idaudit_status
1EB001PASS
2EB002PASS
3EB003REVIEW
4EB004FAIL
5EB005PASS
6EB006REVIEW
7EB007PASS
8EB008REVIEW
9EB009PASS
10EB010PASS
11EB011FAIL
12EB012REVIEW
13EB013PASS
14EB014PASS
15EB015PASS
16EB016REVIEW
17EB017FAIL
18EB018PASS
19EB019REVIEW
20EB020PASS
21EB021PASS
22EB022PASS

data final_ds;

merge eyebank_clean(in=a)

      audit_reference(in=b);

by eye_bank_id;

if a;

run;

proc print data=final_ds;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_daysaudit_status
1EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032PASS
2EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
3EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520REVIEW
4EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020REVIEW
5EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018FAIL
6EB005BrightsightCanadaNAhelp@brightsight14APR202541039025PASS
7EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022REVIEW
8EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.PASS
9EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531REVIEW
10EB009ClearviewJapanAPACclear@view.org10JUN202551049027PASS
11EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
12EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030FAIL
13EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40REVIEW
14EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529PASS
15EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023PASS
16EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026PASS
17EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034REVIEW
18EB017GiftvisionIndiaAPACgift@vision.org04APR20260036FAIL
19EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021PASS
20EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524REVIEW
21EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019PASS
22EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533PASS
23EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027PASS

4.2 PROC SQL Join

proc sql;

create table final_sql as

select a.*,

       b.audit_status

from eyebank_clean as a

left join audit_reference as b

on a.eye_bank_id=b.eye_bank_id;

quit;

proc print data=final_sql;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_daysaudit_status
1EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032PASS
2EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
3EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520REVIEW
4EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020REVIEW
5EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018FAIL
6EB005BrightsightCanadaNAhelp@brightsight14APR202541039025PASS
7EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022REVIEW
8EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.PASS
9EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531REVIEW
10EB009ClearviewJapanAPACclear@view.org10JUN202551049027PASS
11EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
12EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030FAIL
13EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40REVIEW
14EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529PASS
15EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023PASS
16EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026PASS
17EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034REVIEW
18EB017GiftvisionIndiaAPACgift@vision.org04APR20260036FAIL
19EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021PASS
20EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524REVIEW
21EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019PASS
22EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533PASS
23EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027PASS

Comparison

DATA step merges are faster for sorted datasets and provide FIRST./LAST. functionality. PROC SQL offers superior readability for complex joins and aggregations. Enterprise teams typically use both depending on workflow requirements.

5.FIRST. and LAST. Processing

proc sort data=final_ds;

by country;

run;

proc print data=final_ds;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_daysaudit_status
1EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022REVIEW
2EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030FAIL
3EB005BrightsightCanadaNAhelp@brightsight14APR202541039025PASS
4EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023PASS
5EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.PASS
6EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032PASS
7EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018FAIL
8EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531REVIEW
9EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40REVIEW
10EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026PASS
11EB017GiftvisionIndiaAPACgift@vision.org04APR20260036FAIL
12EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524REVIEW
13EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533PASS
14EB009ClearviewJapanAPACclear@view.org10JUN202551049027PASS
15EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034REVIEW
16EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019PASS
17EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529PASS
18EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520REVIEW
19EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020REVIEW
20EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027PASS
21EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
22EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS
23EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021PASS

data country_summary;

set final_ds;

by country;

retain total_donors 0;

if first.country then total_donors=0;

total_donors+donors_received;

if last.country then output;

run;

proc print data=country_summary;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_daysaudit_statustotal_donors
1EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022REVIEW300
2EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030FAIL400
3EB005BrightsightCanadaNAhelp@brightsight14APR202541039025PASS410
4EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023PASS280
5EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.PASS290
6EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533PASS3010
7EB009ClearviewJapanAPACclear@view.org10JUN202551049027PASS510
8EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034REVIEW250
9EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019PASS390
10EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529PASS370
11EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027PASS1025
12EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021PASS1790

Explanation

FIRST. and LAST. variables are among the most powerful SAS capabilities unavailable in standard SQL implementations. These automatic indicators support cumulative calculations, patient visit sequencing, exposure duration calculations, and longitudinal clinical analysis.

6.ARRAY Processing Example

data num_arrays;

set final_ds;

array nums donors_received

           transplants_done

           avg_wait_days;

do i=1 to dim(nums);

if nums(i)<0 then nums(i)=abs(nums(i));

end;

run;

proc print data=num_arrays;

run;

OUTPUT:

Obseye_bank_ideye_bank_namecountryregion_codecontact_emaillast_audit_datedonors_receivedtransplants_doneavg_wait_daysaudit_statusi
1EB006FuturevisionAustraliaAPACunknown@placeholder.org04APR202630028022REVIEW4
2EB011Eye HopeBrazilLATAMhope@eye.org01JUL202540040030FAIL4
3EB005BrightsightCanadaNAhelp@brightsight14APR202541039025PASS4
4EB014OcularbankFranceEMEAocular@bank.org15OCT202528027023PASS4
5EB007EyelifeGermanyEMEAcontact@eyelife.org19MAY2025290275.PASS4
6EB001LionseyebankIndiaAPACsupport@lions.org12JAN202545042032PASS4
7EB004CorneacareIndiaAPACunknown@placeholder.org04APR202652052018FAIL4
8EB008WorldcorneaIndiaAPACworld@cornea.org04APR202661060531REVIEW4
9EB012RetinagiftIndiaAPACretina@gift.org20AUG2025..40REVIEW4
10EB015DonorvisionIndiaAPACdonor@vision.org22NOV202553052026PASS4
11EB017GiftvisionIndiaAPACgift@vision.org04APR20260036FAIL4
12EB019EyefutureIndiaAPACunknown@placeholder.org14FEB202546045524REVIEW4
13EB021BrightvisionIndiaAPACbright@vision.org01APR202544043533PASS4
14EB009ClearviewJapanAPACclear@view.org10JUN202551049027PASS4
15EB016LifesightKenyaAFRICAsight@life.org11DEC202525024034REVIEW4
16EB020GlobalcorneaSingaporeAPACglobal@cornea.org15MAR202539037019PASS4
17EB013NewlightUaeMEAnewlight@mail.com15SEP202537035529PASS4
18EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR2025454520REVIEW4
19EB003GlobaleyeUkEMEAadmin@globaleye.org11MAR202550031020REVIEW4
20EB022CleargiftUkEMEAcleargift@eye.org18MAY202548047027PASS4
21EB002VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS4
22EB010VisiontrustUsaNAinfo@visiontrust.org15FEB202562059028PASS4
23EB018HopecorneaUsaNAhope@cornea.org15JAN202555053021PASS4

Explanation

ARRAY processing reduces repetitive code and improves maintainability. In production SDTM environments, arrays often validate hundreds of laboratory parameters simultaneously.

7.Professional Reporting

proc report data=eyebank_clean nowd;

column country donors_received

       transplants_done

       avg_wait_days;

define country/group;

define donors_received/sum;

define transplants_done/sum;

define avg_wait_days/mean;

run;

OUTPUT:

countrydonors_receivedtransplants_doneavg_wait_days
Australia30028022
Brazil40040030
Canada41039025
France28027023
Germany290275.
India3010295530
Japan51049027
Kenya25024034
Singapore39037019
Uae37035529
Uk102582522.333333
Usa1790171025.666667

Explanation

PROC REPORT remains one of the most powerful reporting procedures in SAS. Clinical TLF generation frequently relies on REPORT because of its flexible compute blocks and formatting options.

8.Reusable Macro Example

%macro check_missing(ds,var);

proc sql;

select count(*) as missing_count

from &ds

where missing(&var);

quit;

%mend;

%check_missing(eyebank_clean,contact_email);

OUTPUT:

missing_count
0

Explanation

Macros improve standardization and reproducibility. Regulatory organizations strongly prefer reusable validated components rather than duplicated business logic.

9.R Raw Dataset

library(tibble)

eyebank_raw <- tibble(

  eye_bank_id = c("EB001","EB002","EB003","EB003","EB004","EB005","EB006",

    "EB007","EB008","EB009","EB010","EB011","EB012","EB013","EB014",

    "EB015","EB016","EB017","EB018","EB019","EB020","EB021","EB022"),

  eye_bank_name = c("LionsEyeBank","VisionTrust","GlobalEye","GlobalEye",

    "CorneaCare","BrightSight","FutureVision","EyeLife","WorldCornea",

    "ClearView","VisionTrust","Eye Hope","RetinaGift","NewLight",

    "OcularBank","DonorVision","LifeSight","GiftVision","HopeCornea",

    "EyeFuture","GlobalCornea","BrightVision","ClearGift"),

  country = c("India","USA","UK","UK","India","Canada","Australia",

    "Germany","India","Japan","USA","Brazil","India","UAE","France",

    "India","Kenya","India","USA","India","Singapore","India","UK"),

  region_code = c("APAC","NA","EUROPE","Europe","APAC","na","APAC",

    "EU","AsiaPacific","APAC","NA","LATAM","APAC","MEA","eu","APAC",

    "AFRICA","APAC","NorthAmerica","APAC","APAC","APAC","EUROPE"),

  donors_received = c(450,620,-45,500,520,410,300,290,610,510,620,400,

                      NA,370,280,530,250,0,550,460,390,440,480),

  transplants_done = c(420,590,310,310,700,390,280,275,605,490,590,410,

                       340,355,270,520,240,230,530,455,370,435,470),

  avg_wait_days = c(32,28,20,20,18,25,22,-5,31,27,28,30,40,29,

                    23,26,34,36,21,24,19,33,27),

  contact_email = c("support@lions.org","info@visiontrust.org",

    "admin@globaleye.org","admin@globaleye.org","NULL","help@brightsight",

    "futurevision.org","contact@eyelife.org","world@cornea.org","clear@view.org",

    "info@visiontrust.org","hope @eye.org","retina@gift.org","newlight@mail.com",

    "ocular@bank.org","donor@vision.org","sight@life.org","gift@vision.org",

    "hope@cornea.org","eyefuture.com","global@cornea.org","bright@vision.org",

    "cleargift@eye.org"),

  last_audit_date = c("12JAN2025","15FEB2025","11MAR2025","11MAR2025",".",

    "14APR2025","30FEB2025","19MAY2025",".","10JUN2025","15FEB2025","01JUL2025",

    "20AUG2025","15SEP2025","15OCT2025","22NOV2025","11DEC2025",".","15JAN2025",

    "14FEB2025","15MAR2025","01APR2025","18MAY2025")

)

OUTPUT:

eye_bank_id

eye_bank_name

country

region_code

donors_received

transplants_done

avg_wait_days

contact_email

last_audit_date

EB001

LionsEyeBank

India

APAC

450

420

32

support@lions.org

12JAN2025

EB002

VisionTrust

USA

NA

620

590

28

info@visiontrust.org

15FEB2025

EB003

GlobalEye

UK

EUROPE

-45

310

20

admin@globaleye.org

11MAR2025

EB003

GlobalEye

UK

Europe

500

310

20

admin@globaleye.org

11MAR2025

EB004

CorneaCare

India

APAC

520

700

18

NULL

.

EB005

BrightSight

Canada

na

410

390

25

help@brightsight

14APR2025

EB006

FutureVision

Australia

APAC

300

280

22

futurevision.org

30FEB2025

EB007

EyeLife

Germany

EU

290

275

-5

contact@eyelife.org

19MAY2025

EB008

WorldCornea

India

AsiaPacific

610

605

31

world@cornea.org

.

EB009

ClearView

Japan

APAC

510

490

27

clear@view.org

10JUN2025

EB010

VisionTrust

USA

NA

620

590

28

info@visiontrust.org

15FEB2025

EB011

Eye Hope

Brazil

LATAM

400

410

30

hope @eye.org

01JUL2025

EB012

RetinaGift

India

APAC

340

40

retina@gift.org

20AUG2025

EB013

NewLight

UAE

MEA

370

355

29

newlight@mail.com

15SEP2025

EB014

OcularBank

France

eu

280

270

23

ocular@bank.org

15OCT2025

EB015

DonorVision

India

APAC

530

520

26

donor@vision.org

22NOV2025

EB016

LifeSight

Kenya

AFRICA

250

240

34

sight@life.org

11DEC2025

EB017

GiftVision

India

APAC

0

230

36

gift@vision.org

.

EB018

HopeCornea

USA

NorthAmerica

550

530

21

hope@cornea.org

15JAN2025

EB019

EyeFuture

India

APAC

460

455

24

eyefuture.com

14FEB2025

EB020

GlobalCornea

Singapore

APAC

390

370

19

global@cornea.org

15MAR2025

EB021

BrightVision

India

APAC

440

435

33

bright@vision.org

01APR2025

EB022

ClearGift

UK

EUROPE

480

470

27

cleargift@eye.org

18MAY2025


10.R Cleaning Layer

library(tidyverse)

library(janitor)

library(lubridate)

eyebank_clean <- eyebank_raw %>%

  clean_names() %>%

  mutate(eye_bank_name=str_to_title(str_trim(eye_bank_name)),

               country=str_to_title(country),

           region_code=str_to_upper(region_code),

       donors_received=abs(donors_received),

         avg_wait_days=if_else(avg_wait_days<0,NA_real_,

                          avg_wait_days),

         contact_email=str_replace_all(contact_email," ",""),

         contact_email=coalesce(contact_email,"unknown@placeholder.org")

  ) %>%

  distinct(eye_bank_id,.keep_all=TRUE)

OUTPUT:

eye_bank_id

eye_bank_name

country

region_code

donors_received

transplants_done

avg_wait_days

contact_email

last_audit_date

EB001

Lionseyebank

India

APAC

450

420

32

support@lions.org

12JAN2025

EB002

Visiontrust

Usa

NA

620

590

28

info@visiontrust.org

15FEB2025

EB003

Globaleye

Uk

EUROPE

45

310

20

admin@globaleye.org

11MAR2025

EB004

Corneacare

India

APAC

520

700

18

NULL

.

EB005

Brightsight

Canada

NA

410

390

25

help@brightsight

14APR2025

EB006

Futurevision

Australia

APAC

300

280

22

futurevision.org

30FEB2025

EB007

Eyelife

Germany

EU

290

275

contact@eyelife.org

19MAY2025

EB008

Worldcornea

India

ASIAPACIFIC

610

605

31

world@cornea.org

.

EB009

Clearview

Japan

APAC

510

490

27

clear@view.org

10JUN2025

EB010

Visiontrust

Usa

NA

620

590

28

info@visiontrust.org

15FEB2025

EB011

Eye Hope

Brazil

LATAM

400

410

30

hope@eye.org

01JUL2025

EB012

Retinagift

India

APAC

340

40

retina@gift.org

20AUG2025

EB013

Newlight

Uae

MEA

370

355

29

newlight@mail.com

15SEP2025

EB014

Ocularbank

France

EU

280

270

23

ocular@bank.org

15OCT2025

EB015

Donorvision

India

APAC

530

520

26

donor@vision.org

22NOV2025

EB016

Lifesight

Kenya

AFRICA

250

240

34

sight@life.org

11DEC2025

EB017

Giftvision

India

APAC

0

230

36

gift@vision.org

.

EB018

Hopecornea

Usa

NORTHAMERICA

550

530

21

hope@cornea.org

15JAN2025

EB019

Eyefuture

India

APAC

460

455

24

eyefuture.com

14FEB2025

EB020

Globalcornea

Singapore

APAC

390

370

19

global@cornea.org

15MAR2025

EB021

Brightvision

India

APAC

440

435

33

bright@vision.org

01APR2025

EB022

Cleargift

Uk

EUROPE

480

470

27

cleargift@eye.org

18MAY2025

Explanation

The tidyverse workflow mirrors SAS logic almost line for line. mutate() behaves similarly to DATA step assignments while case_when() resembles SELECT-WHEN blocks. distinct() mirrors PROC SORT NODUPKEY behavior and coalesce() provides elegant missing value replacement.

Enterprise Validation and Compliance

Clinical data cleaning extends beyond aesthetics.

It supports:

  • SDTM traceability
  • ADaM reproducibility
  • Audit trail requirements
  • QC independence
  • Regulatory transparency
  • Submission readiness

One dangerous SAS behavior involves missing numeric values.

In SAS:

if age<18 then pediatric='Y';

Missing ages satisfy this condition because missing numeric values are treated as smaller than valid numbers.

This subtle behavior has caused real production errors.

Business Logic Behind Cleaning Decisions

Data cleaning is never arbitrary.

Negative donor counts become absolute values because donor registrations cannot be negative in reality. Missing audit dates are imputed using operational assumptions so monitoring reports continue functioning. Region codes are standardized because APAC and AsiaPacific represent identical business concepts but fragment aggregation logic.

Suppose one eye bank reports donor count as -450 due to accidental keyboard entry. Leaving this value unchanged could incorrectly reduce global donor estimates by hundreds of procedures.

Similarly, missing dates can break time-series analysis and trend forecasting.

Text normalization improves grouping consistency.

Email validation supports operational communication workflows.

Healthcare analytics depends on semantic consistency as much as statistical correctness.

Twenty Data Cleaning Best Practices

  1. Preserve raw data permanently.
  2. Never overwrite source files.
  3. Validate identifiers first.
  4. Standardize metadata.
  5. Maintain audit logs.
  6. Document assumptions.
  7. Version control macros.
  8. Separate QC teams.
  9. Validate dates early.
  10. Normalize text variables.
  11. Track imputation rules.
  12. Use reusable functions.
  13. Automate validation reports.
  14. Compare frequencies before deployment.
  15. Protect lineage information.
  16. Monitor truncation risk.
  17. Standardize missing values.
  18. Perform peer review.
  19. Validate business rules.
  20. Maintain reproducible workflows.

Twenty One-Line Insights

  • Dirty data creates expensive business mistakes.
  • Validation logic beats visual inspection.
  • Metadata drives reproducibility.
  • Missing values deserve business context.
  • Duplicates distort reality.
  • Standardization improves trust.
  • Defensive programming saves projects.
  • Clean inputs create reliable AI.
  • Audits reward traceability.
  • Governance reduces surprises.
  • Business rules matter.
  • Automation reduces human error.
  • Documentation improves continuity.
  • SQL is powerful but not magical.
  • DATA step remains irreplaceable.
  • Reusable macros reduce risk.
  • Consistency improves analytics.
  • Data lineage protects organizations.
  • Regulatory trust starts with quality.
  • Reliable decisions require reliable data.

SAS Versus R for Enterprise Cleaning

SAS provides unmatched auditability, metadata management, and production deployment reliability. DATA step programming remains exceptionally efficient for row-wise transformations and longitudinal clinical processing.

R provides flexibility, open-source innovation, and advanced string manipulation capabilities. Tidyverse workflows are highly expressive and often easier for exploratory analytics.

Large pharmaceutical organizations continue using SAS because regulatory ecosystems depend heavily on traceability and reproducibility.

Technology companies often prefer R because of integration with machine learning ecosystems.

The strongest analytical teams use both.

SAS handles governance.

R handles flexibility.

Together they create robust analytical intelligence.

Conclusion

Modern analytics systems depend less on algorithms and more on trustworthy input data.

An advanced predictive model trained on corrupted information simply produces sophisticated errors.

Eye bank operations illustrate this challenge clearly.

Duplicate identifiers distort operational capacity estimates. Missing audit dates interrupt compliance monitoring. Invalid email structures disrupt communication systems. Inconsistent regional classifications create misleading dashboards.

Data cleaning therefore becomes a strategic capability rather than an operational inconvenience.

SAS contributes industrial-grade governance, reproducibility, and audit readiness.

R contributes agility, innovation, and exploratory flexibility.

Organizations that combine both ecosystems gain significant competitive advantages in healthcare analytics, insurance risk modeling, financial fraud detection, and retail intelligence.

The future of analytics does not belong to the organization with the largest data warehouse.

It belongs to the organization with the most trustworthy data.

Because in modern analytics, vision alone is not enough.

Validated vision changes decisions.

Validated decisions change outcomes.

Interview Questions and Answers

1. How would you identify duplicate records in SAS?

Use PROC SORT with NODUPKEY or PROC SQL GROUP BY with HAVING COUNT(*)>1 depending on reporting requirements.

2. Why is LENGTH placement important in SAS?

LENGTH statements must appear before assignments because variable storage allocation occurs during compilation and late declarations create truncation risk.

3. What is the R equivalent of PROC SORT NODUPKEY?

The equivalent approach is distinct() from dplyr.

4. Why are missing values dangerous in SAS?

Missing numeric values are treated as smaller than valid numbers, causing unexpected IF condition behavior.

5. When would you choose PROC SQL over DATA Step?

PROC SQL performs better for complex joins and aggregations while DATA step excels at sequential processing and FIRST./LAST. logic.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. Here we learn about EYE BANK DATA.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and smart cities

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Comments

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS