399.How can SAS analytics expose risky mining sites that look profitable on paper?
How can SAS analytics expose risky mining sites that look profitable on paper?
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL | PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | PROC SORT | SET | APPEND | PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS
TABLE OF CONTENTS
1.
Introduction
2.
Business Context
3.
Dataset Design & Variables
4.
Raw Mining Data Creation
5.
Data Cleaning & Standardization
6.
Date Handling (MDY, INTNX, INTCK)
7.
Derived Metrics & Index Calculations
8.
Utilization Classification Macro
9.
Environmental Risk & Fraud Detection Logic
10. PROC
SQL Analysis
11. PROC
FREQ Analysis
12. PROC
MEANS Analysis
13. PROC
UNIVARIATE Analysis
14. Correlation
Analysis (PROC CORR)
15. Visualization
(PROC SGPLOT)
16. Data
Reshaping (TRANSPOSE)
17. Dataset
Integration (SET, MERGE, APPEND)
18. Character
& Numeric Functions Usage
19. PROC
DATASETS (Delete / Cleanup)
20. Business
Interpretation
21. Simple
Conclusion
1. INTRODUCTION
Mining is a capital-intensive, risk-heavy, and environmentally sensitive
industry.
Organizations operating mines must constantly balance:
·
Production output
·
Worker safety
·
Environmental compliance
·
Operating cost
·
Profitability
This SAS project demonstrates how a clinical-style analytical mindset
can be applied to industrial mining data,
using enterprise-level SAS techniques.
This project is ideal for:
·
SAS
Programmer interviews
·
Analytics
portfolios
·
Real-world
simulation
·
Advanced
PROC & Macro practice
2. BUSINESS CONTEXT
Mining companies and regulators want to answer
questions like:
·
Which mines are high-risk for accidents?
·
Which sites show possible fraud or misreporting?
·
How does environmental
impact affect profitability?
·
Which mines are under-utilized or over-exploited?
·
Are reported outputs consistent over time?
This project simulates how data analytics teams help leadership
make safety-driven and profit-driven
decisions.
3. DATASET DESIGN & VARIABLES
Core Variables
|
Variable |
Description |
|
Mine_Name |
Name of the
mining site |
|
Mineral_Type |
Type of mineral extracted |
|
Output_Tons |
Annual production |
|
Accident_Rate |
Accidents per 100 workers |
|
Operating_Cost |
Annual cost (in million ₹) |
|
Environmental_Impact |
Score 1–10 |
|
Profitability_Index |
Calculated % |
|
Inspection_Date |
Safety inspection date |
|
Utilization_Pct |
Capacity utilization |
|
Fraud_Flag |
Potential fraud indicator |
4. RAW DATA CREATION
data mining_raw;
length Mine_Name $30 Mineral_Type $15;
input Mine_Name $ Mineral_Type $ Output_Tons Accident_Rate Operating_Cost
Environmental_Impact Utilization_Pct Inspect_Month Inspect_Day Inspect_Year;
Inspection_Date = mdy(Inspect_Month, Inspect_Day, Inspect_Year);
format Inspection_Date date9.;
datalines;
IronPeak Iron 85000 3.2 120 7 78 3 12 2024
CoalVista Coal 92000 4.8 140 8 82 4 18 2024
GoldRidge Gold 45000 2.1 210 6 65 2 05 2024
CopperVale Copper 67000 3.9 160 7 71 1 22 2024
SilverRock Silver 38000 1.8 190 5 60 5 11 2024
ZincField Zinc 72000 4.5 130 8 85 6 19 2024
BauxiteHub Bauxite 81000 3.0 110 6 74 7 14 2024
NickelCore Nickel 54000 2.7 155 7 69 8 09 2024
LithiumBay Lithium 49000 2.4 175 5 62 9 02 2024
ManganesePit Manganese 76000 4.1 125 8 80 10 16 2024
CoalDeep Coal 99000 5.5 150 9 90 11 21 2024
IronFort Iron 87000 3.6 118 7 76 12 08 2024
GoldNest Gold 47000 2.0 205 6 68 12 28 2024
;
run;
proc print data=mining_raw;
run;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | IronPeak | Iron | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 |
| 2 | CoalVista | Coal | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 |
| 3 | GoldRidge | Gold | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 |
| 4 | CopperVale | Copper | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 |
| 5 | SilverRock | Silver | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 |
| 6 | ZincField | Zinc | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 |
| 7 | BauxiteHub | Bauxite | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 |
| 8 | NickelCore | Nickel | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 |
| 9 | LithiumBay | Lithium | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 |
| 10 | ManganesePit | Manganese | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 |
| 11 | CoalDeep | Coal | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 |
| 12 | IronFort | Iron | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 |
| 13 | GoldNest | Gold | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 |
·
length avoids truncation of character
variables
·
mdy() converts numeric date parts into a
valid SAS date
·
format date9. improves readability
·
Simulates real inspection data
·
More than 12 observations, as required
5. DATA CLEANING & STANDARDIZATION
data mining_clean;
set mining_raw;
Mine_Name = propcase(strip(Mine_Name));
Mineral_Type = upcase(strip(Mineral_Type));
Accident_Rate = round(Accident_Rate, 0.1);
Operating_Cost = abs(Operating_Cost);
run;
proc print data=mining_clean;
run;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 |
·
strip() removes unwanted spaces
·
propcase() standardizes naming
·
upcase() ensures grouping consistency
·
round() avoids floating-point noise
6. DERIVED METRICS & PROFITABILITY INDEX
data mining_derived;
set mining_clean;
Profitability_Index =
((Output_Tons * 0.02) - Operating_Cost) / Operating_Cost * 100;
format Profitability_Index 8.2;
run;
proc print data=mining_derived;
run;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date | Profitability_Index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 |
·
Converts production → revenue proxy
·
Normalizes profit as percentage
·
Used in board-level KPIs
7. UTILIZATION CLASSIFICATION MACRO
%macro utilization_class;
data mining_util;
set mining_derived;
length Utilization_Class $20;
if Utilization_Pct < 65 then Utilization_Class = "UNDER UTILIZED";
else if 65 < Utilization_Pct < 80 then Utilization_Class = "OPTIMAL";
else Utilization_Class = "OVER UTILIZED";
run;
proc print data=mining_util;
run;
%mend;
%utilization_class;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date | Profitability_Index | Utilization_Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 | OPTIMAL |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 | OVER UTILIZED |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 | OVER UTILIZED |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 | OPTIMAL |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 | UNDER UTILIZED |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 | OVER UTILIZED |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 | OPTIMAL |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 | OPTIMAL |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 | UNDER UTILIZED |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 | OVER UTILIZED |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 | OVER UTILIZED |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 | OPTIMAL |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 | OPTIMAL |
·
Reusable logic
·
Reduces duplication
·
Interview favorite topic
8. FRAUD DETECTION LOGIC (MACRO)
%macro fraud_flag;
data mining_fraud;
set mining_util;
length Fraud_Flag $15.;
if Accident_Rate > 5 and Utilization_Pct > 85 and Environmental_Impact >= 8
then Fraud_Flag = "POTENTIAL FRAUD";
else Fraud_Flag = "NORMAL";
run;
proc print data=mining_fraud;
run;
%mend;
%fraud_flag;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date | Profitability_Index | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 | OPTIMAL | NORMAL |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 | OVER UTILIZED | NORMAL |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 | OVER UTILIZED | NORMAL |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 | OPTIMAL | NORMAL |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 | UNDER UTILIZED | NORMAL |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 | OVER UTILIZED | NORMAL |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 | OPTIMAL | NORMAL |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 | OPTIMAL | NORMAL |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 | UNDER UTILIZED | NORMAL |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 | OVER UTILIZED | NORMAL |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 | OVER UTILIZED | POTENTIAL FRAUD |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 | OPTIMAL | NORMAL |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 | OPTIMAL | NORMAL |
·
High output + high accidents + high
impact = suspicious
·
Mimics regulatory red-flag rules
9. PROC SQL – SUMMARY ANALYSIS
proc sql;
create table mineral_summary as
select Mineral_Type,
count(*) as Mine_Count,
mean(Output_Tons) as Avg_Output,
mean(Profitability_Index) as Avg_Profit
from mining_fraud
group by Mineral_Type;
quit;
proc print data=mineral_summary;
run;
OUTPUT:
| Obs | Mineral_Type | Mine_Count | Avg_Output | Avg_Profit |
|---|---|---|---|---|
| 1 | BAUXITE | 1 | 81000 | 1372.73 |
| 2 | COAL | 2 | 95500 | 1217.14 |
| 3 | COPPER | 1 | 67000 | 737.50 |
| 4 | GOLD | 2 | 46000 | 343.55 |
| 5 | IRON | 2 | 86000 | 1345.62 |
| 6 | LITHIUM | 1 | 49000 | 460.00 |
| 7 | MANGANESE | 1 | 76000 | 1116.00 |
| 8 | NICKEL | 1 | 54000 | 596.77 |
| 9 | SILVER | 1 | 38000 | 300.00 |
| 10 | ZINC | 1 | 72000 | 1007.69 |
·
Industry-standard
·
Easy joins & aggregations
·
Preferred in ADaM-style logic
10. PROC FREQ – RISK DISTRIBUTION
proc freq data=mining_fraud;
tables Utilization_Class*Fraud_Flag / nocum nopercent;
run;
OUTPUT:
The FREQ Procedure
|
| ||||||||||||||||||||||||||||
·
Pattern detection
·
Compliance dashboards
· Risk monitoring
11. PROC MEANS – STATISTICS
proc means data=mining_fraud mean min max std;
var Output_Tons Accident_Rate Operating_Cost Profitability_Index;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum | Std Dev |
|---|---|---|---|---|
Output_Tons Accident_Rate Operating_Cost Profitability_Index | 68615.38 3.3538462 152.9230769 877.1792646 | 38000.00 1.8000000 110.0000000 300.0000000 | 99000.00 5.5000000 210.0000000 1374.58 | 20122.70 1.1630530 33.5570696 424.9146120 |
·
Identifies outliers
· Helps budgeting decisions
12. PROC UNIVARIATE – DISTRIBUTION CHECK
proc univariate data=mining_fraud;
var Profitability_Index;
histogram Profitability_Index/ normal;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Profitability_Index
| Moments | |||
|---|---|---|---|
| N | 13 | Sum Weights | 13 |
| Mean | 877.179265 | Sum Observations | 11403.3304 |
| Std Deviation | 424.914612 | Variance | 180552.427 |
| Skewness | -0.2184897 | Kurtosis | -1.8131089 |
| Uncorrected SS | 12169394.1 | Corrected SS | 2166629.13 |
| Coeff Variation | 48.4410233 | Std Error Mean | 117.850109 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 877.179 | Std Deviation | 424.91461 |
| Median | 1007.692 | Variance | 180552 |
| Mode | . | Range | 1075 |
| Interquartile Range | 760.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 7.443177 | Pr > |t| | <.0001 |
| Sign | M | 6.5 | Pr >= |M| | 0.0002 |
| Signed Rank | S | 45.5 | Pr >= |S| | 0.0002 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 1374.576 |
| 99% | 1374.576 |
| 95% | 1374.576 |
| 90% | 1372.727 |
| 75% Q3 | 1220.000 |
| 50% Median | 1007.692 |
| 25% Q1 | 460.000 |
| 10% | 328.571 |
| 5% | 300.000 |
| 1% | 300.000 |
| 0% Min | 300.000 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 300.000 | 5 | 1214.29 | 2 |
| 328.571 | 3 | 1220.00 | 11 |
| 358.537 | 13 | 1316.67 | 1 |
| 460.000 | 9 | 1372.73 | 7 |
| 596.774 | 8 | 1374.58 | 12 |
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for Profitability_Index
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate |
| Mean | Mu | 877.1793 |
| Std Dev | Sigma | 424.9146 |
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.17449546 | Pr > D | >0.150 |
| Cramer-von Mises | W-Sq | 0.10067637 | Pr > W-Sq | 0.099 |
| Anderson-Darling | A-Sq | 0.63677192 | Pr > A-Sq | 0.079 |
| Quantiles for Normal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 300.000 | -111.320 |
| 5.0 | 300.000 | 178.257 |
| 10.0 | 328.571 | 332.629 |
| 25.0 | 460.000 | 590.579 |
| 50.0 | 1007.692 | 877.179 |
| 75.0 | 1220.000 | 1163.780 |
| 90.0 | 1372.727 | 1421.729 |
| 95.0 | 1374.576 | 1576.102 |
| 99.0 | 1374.576 | 1865.678 |
·
Detect skewness
· Financial risk assessment
13. PROC CORR – RELATIONSHIP ANALYSIS
proc corr data=mining_fraud;
var Output_Tons Accident_Rate Environmental_Impact Profitability_Index;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Output_Tons Accident_Rate Environmental_Impact Profitability_Index |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Output_Tons | 13 | 68615 | 20123 | 892000 | 38000 | 99000 |
| Accident_Rate | 13 | 3.35385 | 1.16305 | 43.60000 | 1.80000 | 5.50000 |
| Environmental_Impact | 13 | 6.84615 | 1.21423 | 89.00000 | 5.00000 | 9.00000 |
| Profitability_Index | 13 | 877.17926 | 424.91461 | 11403 | 300.00000 | 1375 |
| Pearson Correlation Coefficients, N = 13 Prob > |r| under H0: Rho=0 | ||||
|---|---|---|---|---|
| Output_Tons | Accident_Rate | Environmental_Impact | Profitability_Index | |
| Output_Tons | 1.00000 | 0.85410 0.0002 | 0.77840 0.0017 | 0.94860 <.0001 |
| Accident_Rate | 0.85410 0.0002 | 1.00000 | 0.92099 <.0001 | 0.71199 0.0063 |
| Environmental_Impact | 0.77840 0.0017 | 0.92099 <.0001 | 1.00000 | 0.63839 0.0189 |
| Profitability_Index | 0.94860 <.0001 | 0.71199 0.0063 | 0.63839 0.0189 | 1.00000 |
· Shows trade-off between safety & profit
14. PROC SGPLOT – VISUAL ANALYSIS
proc sgplot data=mining_fraud;
scatter x=Environmental_Impact y=Profitability_Index /
group=Utilization_Class;
run;
OUTPUT:
·
Management-friendly
·
Decision support
15. TRANSPOSE – REPORT FORMAT
proc transpose data=mineral_summary out=mineral_t;
id Mineral_Type;
var Avg_Profit;
run;
proc print data=mineral_t;
run;
OUTPUT:
| Obs | _NAME_ | BAUXITE | COAL | COPPER | GOLD | IRON | LITHIUM | MANGANESE | NICKEL | SILVER | ZINC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Avg_Profit | 1372.73 | 1217.14 | 737.5 | 343.554 | 1345.62 | 460 | 1116 | 596.774 | 300 | 1007.69 |
16. APPEND,SET
data audit_log;
set mining_fraud;
Audit_Year = year(Inspection_Date);
run;
proc print data=audit_log;
run;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date | Profitability_Index | Utilization_Class | Fraud_Flag | Audit_Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 | OPTIMAL | NORMAL | 2024 |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 | OVER UTILIZED | NORMAL | 2024 |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 | OVER UTILIZED | NORMAL | 2024 |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 | OPTIMAL | NORMAL | 2024 |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 | UNDER UTILIZED | NORMAL | 2024 |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 | OVER UTILIZED | NORMAL | 2024 |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 | OPTIMAL | NORMAL | 2024 |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 | OPTIMAL | NORMAL | 2024 |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 | UNDER UTILIZED | NORMAL | 2024 |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 | OVER UTILIZED | NORMAL | 2024 |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 | OVER UTILIZED | POTENTIAL FRAUD | 2024 |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 | OPTIMAL | NORMAL | 2024 |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 | OPTIMAL | NORMAL | 2024 |
proc append base=mining_fraud
data=audit_log force;
run;
proc print data=mining_fraud;
run;
OUTPUT:
| Obs | Mine_Name | Mineral_Type | Output_Tons | Accident_Rate | Operating_Cost | Environmental_Impact | Utilization_Pct | Inspect_Month | Inspect_Day | Inspect_Year | Inspection_Date | Profitability_Index | Utilization_Class | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 | OPTIMAL | NORMAL |
| 2 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 | OVER UTILIZED | NORMAL |
| 3 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 | OVER UTILIZED | NORMAL |
| 4 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 | OPTIMAL | NORMAL |
| 5 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 | UNDER UTILIZED | NORMAL |
| 6 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 | OVER UTILIZED | NORMAL |
| 7 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 | OPTIMAL | NORMAL |
| 8 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 | OPTIMAL | NORMAL |
| 9 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 | UNDER UTILIZED | NORMAL |
| 10 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 | OVER UTILIZED | NORMAL |
| 11 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 | OVER UTILIZED | POTENTIAL FRAUD |
| 12 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 | OPTIMAL | NORMAL |
| 13 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 | OPTIMAL | NORMAL |
| 14 | Ironpeak | IRON | 85000 | 3.2 | 120 | 7 | 78 | 3 | 12 | 2024 | 12MAR2024 | 1316.67 | OPTIMAL | NORMAL |
| 15 | Coalvista | COAL | 92000 | 4.8 | 140 | 8 | 82 | 4 | 18 | 2024 | 18APR2024 | 1214.29 | OVER UTILIZED | NORMAL |
| 16 | Goldridge | GOLD | 45000 | 2.1 | 210 | 6 | 65 | 2 | 5 | 2024 | 05FEB2024 | 328.57 | OVER UTILIZED | NORMAL |
| 17 | Coppervale | COPPER | 67000 | 3.9 | 160 | 7 | 71 | 1 | 22 | 2024 | 22JAN2024 | 737.50 | OPTIMAL | NORMAL |
| 18 | Silverrock | SILVER | 38000 | 1.8 | 190 | 5 | 60 | 5 | 11 | 2024 | 11MAY2024 | 300.00 | UNDER UTILIZED | NORMAL |
| 19 | Zincfield | ZINC | 72000 | 4.5 | 130 | 8 | 85 | 6 | 19 | 2024 | 19JUN2024 | 1007.69 | OVER UTILIZED | NORMAL |
| 20 | Bauxitehub | BAUXITE | 81000 | 3.0 | 110 | 6 | 74 | 7 | 14 | 2024 | 14JUL2024 | 1372.73 | OPTIMAL | NORMAL |
| 21 | Nickelcore | NICKEL | 54000 | 2.7 | 155 | 7 | 69 | 8 | 9 | 2024 | 09AUG2024 | 596.77 | OPTIMAL | NORMAL |
| 22 | Lithiumbay | LITHIUM | 49000 | 2.4 | 175 | 5 | 62 | 9 | 2 | 2024 | 02SEP2024 | 460.00 | UNDER UTILIZED | NORMAL |
| 23 | Manganesepit | MANGANESE | 76000 | 4.1 | 125 | 8 | 80 | 10 | 16 | 2024 | 16OCT2024 | 1116.00 | OVER UTILIZED | NORMAL |
| 24 | Coaldeep | COAL | 99000 | 5.5 | 150 | 9 | 90 | 11 | 21 | 2024 | 21NOV2024 | 1220.00 | OVER UTILIZED | POTENTIAL FRAUD |
| 25 | Ironfort | IRON | 87000 | 3.6 | 118 | 7 | 76 | 12 | 8 | 2024 | 08DEC2024 | 1374.58 | OPTIMAL | NORMAL |
| 26 | Goldnest | GOLD | 47000 | 2.0 | 205 | 6 | 68 | 12 | 28 | 2024 | 28DEC2024 | 358.54 | OPTIMAL | NORMAL |
17. PROC DATASETS – CLEANUP
proc datasets library=work nolist;
delete mining_raw mining_clean;
quit;
LOG:
·
Memory efficiency
·
Production discipline
18. BUSINESS INTERPRETATION
· Coal mines
show higher accident + fraud risk
·
Over-utilized mines reduce long-term
profitability
·
Environmental impact negatively
correlates with profit
·
Macro-based rules allow regulatory automation
19. Conclusion :
This
mining analytics project demonstrates how structured data and SAS-based
analysis can transform raw operational information into meaningful business
insights. By combining production output, accident rates, operating costs,
environmental impact, and profitability measures, the study provides a balanced
view of both performance and risk across mining sites. The use of data steps,
PROC SQL, statistical procedures, macros, and visualization techniques shows
how complex mining operations can be monitored in a systematic and repeatable
way. The analysis highlights that higher output alone does not guarantee better
profitability, especially when safety incidents and environmental impact
increase. It also shows how utilization classification and simple
fraud-detection logic can help identify mines that require closer inspection or
corrective action. Overall, this project emphasizes that data-driven
decision-making supports safer operations, better cost control, regulatory
compliance, and sustainable profitability. Such an analytical approach enables
management to move from reactive problem-solving to proactive planning,
ensuring long-term operational stability and responsible mining practices.
INTERVIEW QUESTIONS FOR YOU
1.What is the difference between a DATA step and a PROC step
in SAS?
2.What is the purpose
of PROC SQL in SAS?
3.What is the
difference between WHERE and IF in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
1.What is the difference between a DATA step and a PROC step
in SAS?
2.What is the purpose
of PROC SQL in SAS?
3.What is the
difference between WHERE and IF in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent MINING SITES data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Follow Us On :
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--->Follow our blog for more SAS-based analytics projects and industry data models.
---> Support Us By Following Our Blog..
To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:
2.Is SDTM DM quality control in SAS the key to avoiding last-minute FDA rejection risks?
3.Can SAS analytics reveal which world tourist places truly attract visitors year after year?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2.Is SDTM DM quality control in SAS the key to avoiding last-minute FDA rejection risks?
3.Can SAS analytics reveal which world tourist places truly attract visitors year after year?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment