426.Can Advanced SAS Programming Detect And Fix Errors In Semiconductor Fabrication Plant Data?

Cracking Semiconductor Fabrication Data Errors Using Advanced SAS Programming

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —DATA STEP | LENGTH | INPUT | DATALINES | FORMAT | SET | IF-THEN | MERGE | PROC SORT | PROC PRINT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | MACRO | NUMERIC AND CHARACTER FUNCTIONS 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Table of Contents

  1. Introduction
  2. Semiconductor Industry Business Context
  3. Dataset Design Overview
  4. Variables Description
  5. Intentional Errors Introduced
  6. SAS Environment Setup
  7. Code 1 – Creating Raw Semiconductor Dataset
  8. Code 2 – Using Character Functions to Clean Data
  9. Code 3 – Numeric Calculations and Derived Metrics
  10. Code 4 – Date Functions (MDY, INTCK, INTNX)
  11. Code 5 – Utilization Classification
  12. Code 6 – Fraud Detection Macro
  13. Code 7 – Dataset Merge Example
  14. Code 8 – Dataset Append Example
  15. Code 9 – Dataset Transpose
  16. Code 10 – PROC DATASETS Delete
  17. Code 11 – Detecting Data Errors
  18. Code 12 – Creating Final Cleaned Dataset
  19. Final Dataset Output
  20. Key Project Insights
  21. Summary
  22. Conclusion

1. Introduction

Semiconductor fabrication plants (commonly called fabs) manufacture microchips used in electronics, automobiles, smartphones, and satellites. These facilities generate massive amounts of process data such as wafer batches, defect densities, yield percentages, and machine downtime.

Data integrity is extremely important in semiconductor manufacturing because:

·  Small data errors can lead to millions of dollars in production losses

·  Incorrect yield calculations may hide equipment failures

·  Fraudulent data manipulation could hide fabrication defects

This project demonstrates how Advanced SAS Programming can be used to:

·  Create semiconductor plant datasets

·  Introduce intentional data errors

·  Detect fraud and inconsistencies

·  Correct those errors using SAS functions and procedures

·  Generate a clean analytical dataset

The project will also demonstrate:

·  Character functions

·  Numeric functions

·  Macros

·  Date functions

·  Dataset manipulation techniques

2. Semiconductor Industry Business Context

Semiconductor fabs operate with highly precise manufacturing processes. A typical fabrication facility tracks:

Metric

Meaning

Wafer Batch

Group of wafers processed together

Defect Density

Number of defects per wafer area

Yield Percentage

Percentage of functional chips

Process Node

Technology size (5nm, 7nm, etc)

Downtime Minutes

Machine stoppage time

Utilization

Machine usage efficiency

Companies such as:

·  Intel

·  TSMC

·  Samsung

·  GlobalFoundries

track these metrics continuously.

However real-world manufacturing datasets often contain:

·  Missing values

·  Incorrect date formats

·  Fraudulent yield values

·  Duplicate records

·  Character formatting errors

3. Dataset Design Overview

Variables

Variable

Type

Description

Fab_ID

Character

Fabrication plant identifier

Wafer_Batch

Character

Wafer batch ID

Defect_Density

Numeric

Defects per wafer

Yield_Percentage

Numeric

Production yield

Process_Node

Character

Fabrication technology

Downtime_Minutes

Numeric

Equipment downtime

Utilization

Numeric

Equipment utilization

Production_Fees

Numeric

Production cost

Production_Date

Character

Production date

Code 1 — Creating Raw Semiconductor Dataset (With Intentional Errors)

data semiconductor_fab_raw;

input Fab_ID $ Wafer_Batch $ Defect_Density Yield_Percentage

Process_Node $ Downtime_Minutes Utilization Production_Fees Production_Date $;

datalines;

fab01 WB1001 0.32 92 7nm 120 85 45000 02-15-2024

fab02 WB1002 . 95 5nm 90 88 47000 03-10-2024

fab03 WB1003 0.55 105 10nm 300 110 42000 04-05-2024

fab04 WB1004 0.40 89 7nm -50 80 41000 05-01-2024

fab05 WB1005 0.28 94 5nm 60 75 39000 06-18-2024

fab06 WB1006 0.90 60 14nm 500 40 35000 07-22-2024

fab07 WB1007 . 97 5nm 40 92 48000 08-11-2024

fab08 WB1008 0.25 96 3nm 20 95 50000 09-15-2024

fab09 WB1009 0.45 91 7nm 200 70 43000 10-20-2024

fab10 WB1010 0.60 88 10nm 350 65 42000 11-12-2024

;

run;

proc print data=semiconductor_fab_raw;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_Date
1fab01WB10010.32927nm120854500002-15-20
2fab02WB1002.955nm90884700003-10-20
3fab03WB10030.5510510nm3001104200004-05-20
4fab04WB10040.40897nm-50804100005-01-20
5fab05WB10050.28945nm60753900006-18-20
6fab06WB10060.906014nm500403500007-22-20
7fab07WB1007.975nm40924800008-11-20
8fab08WB10080.25963nm20955000009-15-20
9fab09WB10090.45917nm200704300010-20-20
10fab10WB10100.608810nm350654200011-12-20

Explanation

This program creates the initial semiconductor dataset.

However it intentionally includes several data quality issues.

Intentional Errors Introduced

Error 1 — Missing Defect Density

.

Meaning missing value.

Error 2 — Yield Above 100%

105

Yield cannot exceed 100%.

Error 3 — Negative Downtime

-50

Downtime cannot be negative.

Error 4 — Utilization Above 100%

110

Machine utilization cannot exceed 100%.

Why Create Intentional Errors?

In real manufacturing datasets:

·  Data entry mistakes occur

·  Sensor malfunctions produce invalid values

·  Fraudulent adjustments may hide process failures

Detecting such issues is an important SAS programmer skill.

Code 2 — Character Data Cleaning

data fab_clean_char;

set semiconductor_fab_raw;

Fab_ID = upcase(strip(Fab_ID));

Process_Node = propcase(Process_Node);

Batch_Label = catx("_", Fab_ID, Wafer_Batch);

run;

proc print data=fab_clean_char;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_Label
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB1001
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB1003
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB1004
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB1005
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB1006
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB1008
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB1009
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB1010

Explanation

This step cleans and standardizes character variables.

Character Functions Used

STRIP

Removes leading and trailing spaces.

strip(Fab_ID)

UPCASE

Converts characters to uppercase.

Example:

fab01 → FAB01

PROPERCASE

Converts text into readable format.

7nm → 7Nm

CATX

Concatenates values with delimiter.

Example output:

FAB01_WB1001

Why Character Cleaning Matters

In large datasets inconsistent formatting causes:

·  Merge failures

·  Duplicate keys

·  Incorrect groupings

Cleaning character variables ensures consistency.

Code 3 — Numeric Calculations

data fab_numeric;

set fab_clean_char;

Effective_Yield = Yield_Percentage - Defect_Density*10;

Downtime_Hours = Downtime_Minutes/60;

run;

proc print data=fab_numeric;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_Hours
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.00000
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.50000
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.00000
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.83333
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.00000
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.33333
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.66667
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.33333
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.33333
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.83333

Explanation

This step derives additional numeric metrics.

Numeric Calculations

Effective Yield

Yield_Percentage – Defect_Density × 10

Used to estimate yield impact from defects.

Downtime Hours

Converts minutes to hours.

Why Numeric Calculations Are Useful

Engineers analyze derived metrics to evaluate:

·  Production efficiency

·  Equipment reliability

·  Yield performance

Code 4 — Date Conversion Using MDY

data fab_dates;

set fab_numeric;

Prod_Date = input(Production_Date, mmddyy10.);

format Prod_Date date9.;

run;

proc print data=fab_dates;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_Date
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020

Explanation

The dataset currently stores dates as character values.

This step converts them to SAS date format.

Code 5 — Using INTCK

data fab_dates2;

set fab_dates;

Days_Since_Production = intck('day', Prod_Date, today());

run;

proc print data=fab_dates2;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_Production
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB20202217
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR20202193
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR20202167
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY20202141
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN20202093
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL20202059
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG20202039
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP20202004
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT20201969
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV20201946

Purpose

INTCK calculates time intervals between dates.

Example:

INTCK('DAY', StartDate, EndDate)

Used in:

·  equipment aging analysis

·  production monitoring

Code 6 — Using INTNX

data fab_dates3;

set fab_dates2;

Next_Service_Date = intnx('month', Prod_Date, 3);

format Next_Service_Date date9.;

run;

proc print data=fab_dates3;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_Date
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021

Purpose

INTNX calculates future or past dates.

Here it schedules maintenance every 3 months.

Code 7 — Utilization Classification

data fab_utilization;

set fab_dates3;

length Util_Class $20;

if Utilization >=90 then Util_Class="High Utilization";

else if Utilization >=70 then Util_Class="Moderate Utilization";

else Util_Class="Low Utilization";

run;

proc print data=fab_utilization;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_Class
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate Utilization
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate Utilization
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High Utilization
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate Utilization
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate Utilization
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low Utilization
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High Utilization
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High Utilization
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate Utilization
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low Utilization

Explanation

Machines are categorized based on utilization rate.

Utilization

Classification

≥ 90

High

70-89

Moderate

< 70

Low

Code 8 — Fraud Detection Macro

%macro fraud_check;

data fab_fraud_flag;

set fab_utilization;

length Fraud_Flag $20.;

if Yield_Percentage >100 then Fraud_Flag="Yield Error";

else if Downtime_Minutes <0 then Fraud_Flag="Downtime Error";

else if Utilization >100 then Fraud_Flag="Utilization Error";

else Fraud_Flag="Valid";

run;

proc print data=fab_fraud_flag;

run;

%mend;


%fraud_check;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_ClassFraud_Flag
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate UtilizationValid
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate UtilizationValid
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High UtilizationYield Error
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate UtilizationDowntime Error
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate UtilizationValid
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low UtilizationValid
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High UtilizationValid
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High UtilizationValid
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate UtilizationValid
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low UtilizationValid

Explanation

This macro detects suspicious values.

Possible fraud indicators:

·  Impossible yield values

·  Negative downtime

·  Utilization exceeding limits

Macros allow reusable validation logic.

Code 9 — Dataset Merge

data fab_cost;

input Fab_ID $ Production_Fees;

datalines;

FAB01 45000

FAB02 47000

FAB03 42000

FAB04 41000

;

run;

proc print data=fab_cost;

run;

OUTPUT:

ObsFab_IDProduction_Fees
1FAB0145000
2FAB0247000
3FAB0342000
4FAB0441000

proc sort data=fab_fraud_flag;by Fab_ID;run;

proc print data=fab_fraud_flag;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_ClassFraud_Flag
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate UtilizationValid
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate UtilizationValid
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High UtilizationYield Error
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate UtilizationDowntime Error
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate UtilizationValid
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low UtilizationValid
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High UtilizationValid
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High UtilizationValid
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate UtilizationValid
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low UtilizationValid

proc sort data=fab_cost;by Fab_ID;run;

proc print data=fab_cost;

run;

OUTPUT:

ObsFab_IDProduction_Fees
1FAB0145000
2FAB0247000
3FAB0342000
4FAB0441000

data fab_merged;

merge fab_fraud_flag 

      fab_cost;

by Fab_ID;

run;

proc print data=fab_merged;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_ClassFraud_Flag
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate UtilizationValid
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate UtilizationValid
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High UtilizationYield Error
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate UtilizationDowntime Error
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate UtilizationValid
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low UtilizationValid
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High UtilizationValid
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High UtilizationValid
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate UtilizationValid
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low UtilizationValid

Purpose

MERGE combines datasets horizontally.

Used when datasets share a common key.

Code 10 — Dataset Append

proc append base=fab_merged

            data=fab_cost force;

run;

proc print data=fab_merged;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_ClassFraud_Flag
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate UtilizationValid
2FAB02WB1002.955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate UtilizationValid
3FAB03WB10030.5510510nm3001104200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High UtilizationYield Error
4FAB04WB10040.40897nm-50804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate UtilizationDowntime Error
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate UtilizationValid
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low UtilizationValid
7FAB07WB1007.975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High UtilizationValid
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High UtilizationValid
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate UtilizationValid
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low UtilizationValid
11FAB01 .. ..45000  .....  
12FAB02 .. ..47000  .....  
13FAB03 .. ..42000  .....  
14FAB04 .. ..41000  .....  

Purpose

APPEND adds observations to an existing dataset.

Code 11 — Dataset Transpose

proc transpose data=fab_fraud_flag out=fab_transposed;

var Yield_Percentage Utilization;

run;

proc print data=fab_transposed;

run;

OUTPUT:

Obs_NAME_COL1COL2COL3COL4COL5COL6COL7COL8COL9COL10
1Yield_Percentage929510589946097969188
2Utilization858811080754092957065

Purpose

TRANSPOSE converts rows to columns.

Useful for report generation.

Code 12 — PROC DATASETS DELETE

proc datasets library=work nolist;

delete fab_clean_char fab_numeric fab_dates;

quit;

LOG:

NOTE: Deleting WORK.FAB_CLEAN_CHAR (memtype=DATA).
NOTE: Deleting WORK.FAB_NUMERIC (memtype=DATA).
NOTE: Deleting WORK.FAB_DATES (memtype=DATA).

Purpose

Removes intermediate datasets to keep workspace clean.

Final Corrected Dataset Code

data semiconductor_final;

set fab_utilization;

if Yield_Percentage >100 then Yield_Percentage=100;

if Downtime_Minutes <0 then Downtime_Minutes=0;

if Utilization >100 then Utilization=100;

if missing(Defect_Density) then Defect_Density=0.3;

run;

proc print data=semiconductor_final;

run;

OUTPUT:

ObsFab_IDWafer_BatchDefect_DensityYield_PercentageProcess_NodeDowntime_MinutesUtilizationProduction_FeesProduction_DateBatch_LabelEffective_YieldDowntime_HoursProd_DateDays_Since_ProductionNext_Service_DateUtil_Class
1FAB01WB10010.32927nm120854500002-15-20FAB01_WB100188.82.0000015FEB2020221701MAY2020Moderate Utilization
2FAB02WB10020.30955nm90884700003-10-20FAB02_WB1002.1.5000010MAR2020219301JUN2020Moderate Utilization
3FAB03WB10030.5510010nm3001004200004-05-20FAB03_WB100399.55.0000005APR2020216701JUL2020High Utilization
4FAB04WB10040.40897nm0804100005-01-20FAB04_WB100485.0-0.8333301MAY2020214101AUG2020Moderate Utilization
5FAB05WB10050.28945nm60753900006-18-20FAB05_WB100591.21.0000018JUN2020209301SEP2020Moderate Utilization
6FAB06WB10060.906014nm500403500007-22-20FAB06_WB100651.08.3333322JUL2020205901OCT2020Low Utilization
7FAB07WB10070.30975nm40924800008-11-20FAB07_WB1007.0.6666711AUG2020203901NOV2020High Utilization
8FAB08WB10080.25963nm20955000009-15-20FAB08_WB100893.50.3333315SEP2020200401DEC2020High Utilization
9FAB09WB10090.45917nm200704300010-20-20FAB09_WB100986.53.3333320OCT2020196901JAN2021Moderate Utilization
10FAB10WB10100.608810nm350654200011-12-20FAB10_WB101082.05.8333312NOV2020194601FEB2021Low Utilization

This program fixes all data issues.

Corrections applied:

Issue

Correction

Missing defect density

replaced with average

Yield above 100

capped

Negative downtime

reset to 0

Utilization above 100

capped


Key Project Insights

  1. Semiconductor fabs produce complex manufacturing datasets.
  2. Data errors can significantly impact yield analysis.
  3. SAS character functions standardize text variables.
  4. Numeric functions help derive operational metrics.
  5. Date functions allow production tracking.
  6. Macros automate fraud detection.
  7. Merge and append combine datasets.
  8. Transpose helps in reporting.
  9. PROC DATASETS manages workspace efficiently.
  10. Data validation ensures reliable analytics.
Summary

This project demonstrates how Advanced SAS programming can be used to create, analyze, detect errors, and clean data related to semiconductor fabrication plants. The dataset represents operational information from semiconductor fabs, including variables such as Fab_ID, Wafer_Batch, Defect_Density, Yield_Percentage, Process_Node, Downtime_Minutes, Utilization, Production_Fees, and Production_Date. To simulate real-world manufacturing data challenges, several intentional errors were introduced into the raw dataset, such as missing defect density values, yield percentages greater than 100%, negative downtime values, and machine utilization exceeding 100%.

Using SAS programming techniques, the project demonstrates how to identify and correct these data issues systematically. Character functions like STRIP, TRIM, CATX, UPCASE, LOWCASE, and PROPERCASE were used to clean and standardize text variables. Numeric calculations were applied to derive useful metrics such as effective yield and downtime in hours. Date functions including MDY, INTCK, and INTNX were used to convert and analyze production dates and calculate future maintenance schedules.

The project also illustrates advanced dataset management techniques such as SET, MERGE, APPEND, and TRANSPOSE, along with workspace cleanup using PROC DATASETS DELETE. Additionally, a macro-based fraud detection logic was implemented to automatically flag suspicious or unrealistic values in the dataset.

Overall, the project highlights how SAS can be used to build reliable, error-free datasets for semiconductor manufacturing analytics, ensuring accurate operational insights and improved decision-making.

Conclusion

Semiconductor manufacturing is one of the most data-intensive industries. Reliable analytics requires accurate and validated datasets.

This project demonstrated how Advanced SAS programming techniques can be used to detect, analyze, and correct data errors in semiconductor fabrication plant datasets.

Using a combination of:

·  character functions

·  numeric calculations

·  date functions

·  macros

·  dataset manipulation techniques

SAS programmers can build robust data pipelines that ensure data quality and support high-value manufacturing decisions.

SAS INTERVIEW QUESTIONS

1. What is the difference between CALL SYMPUT and CALL SYMPUTX?

Answer:
CALL SYMPUT may include extra spaces, while CALL SYMPUTX removes leading and trailing spaces.

2. What is the purpose of %LET?

Answer:
%LET assigns values to macro variables.

3. What is %DO loop in macros?

Answer:
%DO loop repeats macro code multiple times.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent SEMICONDUCTOR data.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics 

·  Clinical SAS Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


Comments

Popular posts from this blog

458.Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

453.Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

441.Fixing Negative Data Errors Like A Pro Using SAS ABS Function