401.How Efficient Are Global Data Centers? A Complete SAS Analytics Study
How Efficient Are Global Data Centers? A Complete SAS Analytics Study
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA STEP | PROC SQL | PROC PRINT | PROC SGPLOT | MACROS | PROC CORR | PROC MEANS | PROC FREQ | PROC UNIVARIATE | APPEND PROC TRANSPOSE | PROC DATASETS DELETE | DATA FUNCTIONS
1. INTRODUCTION
Modern data
centers are the backbone of cloud computing, AI, banking, healthcare,
and government systems.
They consume massive power, require efficient cooling, and must ensure zero downtime.
This project simulates global data center operational data and
demonstrates how a SAS programmer
analyzes efficiency, sustainability, risk, and anomalies using:
·
Base SAS
·
PROC SQL
·
Statistical procedures
·
Macros
·
Date intelligence
·
Data quality & fraud detection logic
2. BUSINESS CONTEXT
Why this analysis matters:
·
High Power
Usage Effectiveness (PUE) → Increased operational cost
·
High
Downtime → SLA penalties & revenue loss
·
Low
Sustainability Score → Regulatory & ESG risks
·
Abnormal
metrics → Possible data
manipulation or reporting fraud
Business questions answered:
·
Which data centers are inefficient?
·
Which locations show abnormal behavior?
·
Are sustainability scores aligned with
operational metrics?
·
Can we auto-classify risk and utilization?
3. TABLE OF CONTENTS
1.
Data Creation
2.
Standardization & Formatting
3.
Derived Metrics
4.
Macro-Driven Classification
5.
Statistical Analysis
6.
Fraud Detection Logic
7.
Correlation Analysis
8.
Visualization
9.
Advanced Data Handling (Append, Transpose)
10. Cleanup
& Optimization
11. Conclusion
4. DATASET CREATION (RAW DATA)
data datacenter_raw;
input Center_ID Center_Name $ Country:$12. Power_Usage_PUE Server_Count Cooling_Efficiency
Downtime_Hours Sustainability_Score Install_Date : date9.;
format Install_Date date9.;
datalines;
1 alpha_dc usa 1.4 12000 82 5 78 15JAN2015
2 beta_dc india 1.8 9500 70 18 65 22MAR2016
3 gamma_dc germany 1.3 14000 88 2 85 10JUN2014
4 delta_dc uk 1.6 11000 75 10 72 01DEC2017
5 epsilon_dc japan 1.2 16000 92 1 90 09SEP2013
6 zeta_dc india 2.1 8000 60 30 50 12FEB2018
7 eta_dc usa 1.5 10000 78 7 74 19AUG2016
8 theta_dc canada 1.4 10500 80 6 77 25NOV2015
9 iota_dc australia 1.7 9800 73 12 70 11APR2017
10 kappa_dc singapore 1.3 15000 90 3 88 06JUL2014
11 lambda_dc france 1.6 10800 76 9 73 18OCT2016
12 mu_dc india 2.3 7000 55 40 45 29JAN2019
13 nu_dc netherlands 1.2 15500 93 2 91 03MAY2013
14 xi_dc brazil 1.9 9000 68 22 60 14DEC2018
15 omicron_dc uae 1.5 11500 79 8 75 21JUN2016
;
run;
proc print data=datacenter_raw;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | alpha_dc | usa | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 |
| 2 | 2 | beta_dc | india | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 |
| 3 | 3 | gamma_dc | germany | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 |
| 4 | 4 | delta_dc | uk | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 |
| 5 | 5 | epsilon_ | japan | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 |
| 6 | 6 | zeta_dc | india | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 |
| 7 | 7 | eta_dc | usa | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 |
| 8 | 8 | theta_dc | canada | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 |
| 9 | 9 | iota_dc | australia | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 |
| 10 | 10 | kappa_dc | singapore | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 |
| 11 | 11 | lambda_d | france | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 |
| 12 | 12 | mu_dc | india | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 |
| 13 | 13 | nu_dc | netherlands | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 |
| 14 | 14 | xi_dc | brazil | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 |
| 15 | 15 | omicron_ | uae | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 |
·
DATA
step gives full control over variable types
·
Realistic multi-country, multi-year data
·
Includes both numeric & character variables
·
Dates allow INTCK / INTNX analysis
5. CHARACTER STANDARDIZATION & CLEANUP
data datacenter_clean;
set datacenter_raw;
Center_Name = propcase(strip(Center_Name));
Country = upcase(strip(Country));
Center_Code = catx('_', Country, Center_ID);
run;
proc print data=datacenter_clean;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Alpha_dc | USA | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 | USA_1 |
| 2 | 2 | Beta_dc | INDIA | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 | INDIA_2 |
| 3 | 3 | Gamma_dc | GERMANY | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 | GERMANY_3 |
| 4 | 4 | Delta_dc | UK | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 | UK_4 |
| 5 | 5 | Epsilon_ | JAPAN | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 | JAPAN_5 |
| 6 | 6 | Zeta_dc | INDIA | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 | INDIA_6 |
| 7 | 7 | Eta_dc | USA | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 | USA_7 |
| 8 | 8 | Theta_dc | CANADA | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 | CANADA_8 |
| 9 | 9 | Iota_dc | AUSTRALIA | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 | AUSTRALIA_9 |
| 10 | 10 | Kappa_dc | SINGAPORE | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 | SINGAPORE_10 |
| 11 | 11 | Lambda_d | FRANCE | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 | FRANCE_11 |
| 12 | 12 | Mu_dc | INDIA | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 | INDIA_12 |
| 13 | 13 | Nu_dc | NETHERLANDS | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 | NETHERLANDS_13 |
| 14 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 |
| 15 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 |
|
Function |
Purpose |
|
STRIP |
Removes
leading/trailing spaces |
|
PROPCASE |
Standard
naming |
|
UPCASE |
Country
consistency |
|
CATX |
Safe
concatenation with delimiter |
6. DATE DERIVATIONS (MDY, INTCK, INTNX)
data datacenter_dates;
set datacenter_clean;
Current_Date = today();
Years_Operational = intck('year', Install_Date, Current_Date);
Next_Maintenance = intnx('month', Install_Date, 60, 'same');
format Current_Date Next_Maintenance date9.;
run;
proc print data=datacenter_dates;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code | Current_Date | Years_Operational | Next_Maintenance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Alpha_dc | USA | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 | USA_1 | 15FEB2026 | 11 | 15JAN2020 |
| 2 | 2 | Beta_dc | INDIA | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 | INDIA_2 | 15FEB2026 | 10 | 22MAR2021 |
| 3 | 3 | Gamma_dc | GERMANY | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 | GERMANY_3 | 15FEB2026 | 12 | 10JUN2019 |
| 4 | 4 | Delta_dc | UK | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 | UK_4 | 15FEB2026 | 9 | 01DEC2022 |
| 5 | 5 | Epsilon_ | JAPAN | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 | JAPAN_5 | 15FEB2026 | 13 | 09SEP2018 |
| 6 | 6 | Zeta_dc | INDIA | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 | INDIA_6 | 15FEB2026 | 8 | 12FEB2023 |
| 7 | 7 | Eta_dc | USA | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 | USA_7 | 15FEB2026 | 10 | 19AUG2021 |
| 8 | 8 | Theta_dc | CANADA | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 | CANADA_8 | 15FEB2026 | 11 | 25NOV2020 |
| 9 | 9 | Iota_dc | AUSTRALIA | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 | AUSTRALIA_9 | 15FEB2026 | 9 | 11APR2022 |
| 10 | 10 | Kappa_dc | SINGAPORE | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 | SINGAPORE_10 | 15FEB2026 | 12 | 06JUL2019 |
| 11 | 11 | Lambda_d | FRANCE | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 | FRANCE_11 | 15FEB2026 | 10 | 18OCT2021 |
| 12 | 12 | Mu_dc | INDIA | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 | INDIA_12 | 15FEB2026 | 7 | 29JAN2024 |
| 13 | 13 | Nu_dc | NETHERLANDS | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 | NETHERLANDS_13 | 15FEB2026 | 13 | 03MAY2018 |
| 14 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 |
| 15 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 |
·
Operational age impacts maintenance cost
·
Predictive scheduling
·
Time-based risk modelling
7. UTILIZATION & RISK CLASSIFICATION (MACRO)
%macro classify;
data datacenter_classified;
set datacenter_dates;
length Utilization $12.;
if Server_Count >= 14000 then Utilization = 'HIGH';
else if Server_Count >= 10000 then Utilization = 'MEDIUM';
else Utilization = 'LOW';
length Risk_Flag $12.;
if Downtime_Hours > 25 or Power_Usage_PUE > 2 then Risk_Flag = 'HIGH_RISK';
else Risk_Flag = 'NORMAL';
run;
proc print data=datacenter_classified;
run;
%mend;
%classify;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code | Current_Date | Years_Operational | Next_Maintenance | Utilization | Risk_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Alpha_dc | USA | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 | USA_1 | 15FEB2026 | 11 | 15JAN2020 | MEDIUM | NORMAL |
| 2 | 2 | Beta_dc | INDIA | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 | INDIA_2 | 15FEB2026 | 10 | 22MAR2021 | LOW | NORMAL |
| 3 | 3 | Gamma_dc | GERMANY | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 | GERMANY_3 | 15FEB2026 | 12 | 10JUN2019 | HIGH | NORMAL |
| 4 | 4 | Delta_dc | UK | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 | UK_4 | 15FEB2026 | 9 | 01DEC2022 | MEDIUM | NORMAL |
| 5 | 5 | Epsilon_ | JAPAN | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 | JAPAN_5 | 15FEB2026 | 13 | 09SEP2018 | HIGH | NORMAL |
| 6 | 6 | Zeta_dc | INDIA | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 | INDIA_6 | 15FEB2026 | 8 | 12FEB2023 | LOW | HIGH_RISK |
| 7 | 7 | Eta_dc | USA | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 | USA_7 | 15FEB2026 | 10 | 19AUG2021 | MEDIUM | NORMAL |
| 8 | 8 | Theta_dc | CANADA | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 | CANADA_8 | 15FEB2026 | 11 | 25NOV2020 | MEDIUM | NORMAL |
| 9 | 9 | Iota_dc | AUSTRALIA | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 | AUSTRALIA_9 | 15FEB2026 | 9 | 11APR2022 | LOW | NORMAL |
| 10 | 10 | Kappa_dc | SINGAPORE | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 | SINGAPORE_10 | 15FEB2026 | 12 | 06JUL2019 | HIGH | NORMAL |
| 11 | 11 | Lambda_d | FRANCE | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 | FRANCE_11 | 15FEB2026 | 10 | 18OCT2021 | MEDIUM | NORMAL |
| 12 | 12 | Mu_dc | INDIA | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 | INDIA_12 | 15FEB2026 | 7 | 29JAN2024 | LOW | HIGH_RISK |
| 13 | 13 | Nu_dc | NETHERLANDS | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 | NETHERLANDS_13 | 15FEB2026 | 13 | 03MAY2018 | HIGH | NORMAL |
| 14 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 | LOW | NORMAL |
| 15 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 | MEDIUM | NORMAL |
Why macros?
- Reusable logic
- Standard enterprise practice
- Interview favorite topic
8. FRAUD / ANOMALY DETECTION LOGIC
data fraud_check;
set datacenter_classified;
length Fraud_Flag $12.;
if Sustainability_Score > 85 and Cooling_Efficiency < 70
then Fraud_Flag = 'SUSPECT';
else Fraud_Flag = 'CLEAR';
run;
proc print data=fraud_check;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code | Current_Date | Years_Operational | Next_Maintenance | Utilization | Risk_Flag | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Alpha_dc | USA | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 | USA_1 | 15FEB2026 | 11 | 15JAN2020 | MEDIUM | NORMAL | CLEAR |
| 2 | 2 | Beta_dc | INDIA | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 | INDIA_2 | 15FEB2026 | 10 | 22MAR2021 | LOW | NORMAL | CLEAR |
| 3 | 3 | Gamma_dc | GERMANY | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 | GERMANY_3 | 15FEB2026 | 12 | 10JUN2019 | HIGH | NORMAL | CLEAR |
| 4 | 4 | Delta_dc | UK | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 | UK_4 | 15FEB2026 | 9 | 01DEC2022 | MEDIUM | NORMAL | CLEAR |
| 5 | 5 | Epsilon_ | JAPAN | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 | JAPAN_5 | 15FEB2026 | 13 | 09SEP2018 | HIGH | NORMAL | CLEAR |
| 6 | 6 | Zeta_dc | INDIA | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 | INDIA_6 | 15FEB2026 | 8 | 12FEB2023 | LOW | HIGH_RISK | CLEAR |
| 7 | 7 | Eta_dc | USA | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 | USA_7 | 15FEB2026 | 10 | 19AUG2021 | MEDIUM | NORMAL | CLEAR |
| 8 | 8 | Theta_dc | CANADA | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 | CANADA_8 | 15FEB2026 | 11 | 25NOV2020 | MEDIUM | NORMAL | CLEAR |
| 9 | 9 | Iota_dc | AUSTRALIA | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 | AUSTRALIA_9 | 15FEB2026 | 9 | 11APR2022 | LOW | NORMAL | CLEAR |
| 10 | 10 | Kappa_dc | SINGAPORE | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 | SINGAPORE_10 | 15FEB2026 | 12 | 06JUL2019 | HIGH | NORMAL | CLEAR |
| 11 | 11 | Lambda_d | FRANCE | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 | FRANCE_11 | 15FEB2026 | 10 | 18OCT2021 | MEDIUM | NORMAL | CLEAR |
| 12 | 12 | Mu_dc | INDIA | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 | INDIA_12 | 15FEB2026 | 7 | 29JAN2024 | LOW | HIGH_RISK | CLEAR |
| 13 | 13 | Nu_dc | NETHERLANDS | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 | NETHERLANDS_13 | 15FEB2026 | 13 | 03MAY2018 | HIGH | NORMAL | CLEAR |
| 14 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 | LOW | NORMAL | CLEAR |
| 15 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 | MEDIUM | NORMAL | CLEAR |
High sustainability with poor cooling → possible manipulated ESG reporting
9. STATISTICAL ANALYSIS
PROC MEANS
proc means data=fraud_check mean min max;
var Power_Usage_PUE Downtime_Hours Sustainability_Score;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum |
|---|---|---|---|
Power_Usage_PUE Downtime_Hours Sustainability_Score | 1.5866667 11.6666667 72.8666667 | 1.2000000 1.0000000 45.0000000 | 2.3000000 40.0000000 91.0000000 |
PROC UNIVARIATE
proc univariate data=fraud_check;
var Power_Usage_PUE;
run;
OUTPUT:
The UNIVARIATE Procedure
Variable: Power_Usage_PUE
| Moments | |||
|---|---|---|---|
| N | 15 | Sum Weights | 15 |
| Mean | 1.58666667 | Sum Observations | 23.8 |
| Std Deviation | 0.32484429 | Variance | 0.10552381 |
| Skewness | 0.87308189 | Kurtosis | 0.18800433 |
| Uncorrected SS | 39.24 | Corrected SS | 1.47733333 |
| Coeff Variation | 20.4733793 | Std Error Mean | 0.08387443 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 1.586667 | Std Deviation | 0.32484 |
| Median | 1.500000 | Variance | 0.10552 |
| Mode | 1.200000 | Range | 1.10000 |
| Interquartile Range | 0.50000 | ||
Note: The mode displayed is the smallest of 5 modes with a count of 2.
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 18.91717 | Pr > |t| | <.0001 |
| Sign | M | 7.5 | Pr >= |M| | <.0001 |
| Signed Rank | S | 60 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 2.3 |
| 99% | 2.3 |
| 95% | 2.3 |
| 90% | 2.1 |
| 75% Q3 | 1.8 |
| 50% Median | 1.5 |
| 25% Q1 | 1.3 |
| 10% | 1.2 |
| 5% | 1.2 |
| 1% | 1.2 |
| 0% Min | 1.2 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 1.2 | 13 | 1.7 | 9 |
| 1.2 | 5 | 1.8 | 2 |
| 1.3 | 10 | 1.9 | 14 |
| 1.3 | 3 | 2.1 | 6 |
| 1.4 | 8 | 2.3 | 12 |
10. FREQUENCY ANALYSIS
proc freq data=fraud_check;
tables Country Utilization Risk_Flag Fraud_Flag;
run;
OUTPUT:
The FREQ Procedure
| Country | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| AUSTRALIA | 1 | 6.67 | 1 | 6.67 |
| BRAZIL | 1 | 6.67 | 2 | 13.33 |
| CANADA | 1 | 6.67 | 3 | 20.00 |
| FRANCE | 1 | 6.67 | 4 | 26.67 |
| GERMANY | 1 | 6.67 | 5 | 33.33 |
| INDIA | 3 | 20.00 | 8 | 53.33 |
| JAPAN | 1 | 6.67 | 9 | 60.00 |
| NETHERLANDS | 1 | 6.67 | 10 | 66.67 |
| SINGAPORE | 1 | 6.67 | 11 | 73.33 |
| UAE | 1 | 6.67 | 12 | 80.00 |
| UK | 1 | 6.67 | 13 | 86.67 |
| USA | 2 | 13.33 | 15 | 100.00 |
| Utilization | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HIGH | 4 | 26.67 | 4 | 26.67 |
| LOW | 5 | 33.33 | 9 | 60.00 |
| MEDIUM | 6 | 40.00 | 15 | 100.00 |
| Risk_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HIGH_RISK | 2 | 13.33 | 2 | 13.33 |
| NORMAL | 13 | 86.67 | 15 | 100.00 |
| Fraud_Flag | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| CLEAR | 15 | 100.00 | 15 | 100.00 |
11. CORRELATION ANALYSIS
proc corr data=fraud_check;
var Power_Usage_PUE Downtime_Hours Cooling_Efficiency Sustainability_Score;
run;
OUTPUT:
The CORR Procedure
| 4 Variables: | Power_Usage_PUE Downtime_Hours Cooling_Efficiency Sustainability_Score |
|---|
| Simple Statistics | ||||||
|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
| Power_Usage_PUE | 15 | 1.58667 | 0.32484 | 23.80000 | 1.20000 | 2.30000 |
| Downtime_Hours | 15 | 11.66667 | 11.28632 | 175.00000 | 1.00000 | 40.00000 |
| Cooling_Efficiency | 15 | 77.26667 | 11.12569 | 1159 | 55.00000 | 93.00000 |
| Sustainability_Score | 15 | 72.86667 | 13.58501 | 1093 | 45.00000 | 91.00000 |
| Pearson Correlation Coefficients, N = 15 Prob > |r| under H0: Rho=0 | ||||
|---|---|---|---|---|
| Power_Usage_PUE | Downtime_Hours | Cooling_Efficiency | Sustainability_Score | |
| Power_Usage_PUE | 1.00000 | 0.98062 <.0001 | -0.98516 <.0001 | -0.98939 <.0001 |
| Downtime_Hours | 0.98062 <.0001 | 1.00000 | -0.94637 <.0001 | -0.96372 <.0001 |
| Cooling_Efficiency | -0.98516 <.0001 | -0.94637 <.0001 | 1.00000 | 0.99553 <.0001 |
| Sustainability_Score | -0.98939 <.0001 | -0.96372 <.0001 | 0.99553 <.0001 | 1.00000 |
12. VISUALIZATION – PROC SGPLOT
proc sgplot data=fraud_check;
scatter x=Power_Usage_PUE y=Sustainability_Score;
run;
OUTPUT:
13. APPEND,TRANSPOSE
CREATE A NEW DATASET
data fraud_check_new;
set fraud_check;
if Center_ID in (14,15);
run;
proc print data=fraud_check_new;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code | Current_Date | Years_Operational | Next_Maintenance | Utilization | Risk_Flag | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 | LOW | NORMAL | CLEAR |
| 2 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 | MEDIUM | NORMAL | CLEAR |
·
Simulates incremental data load
·
Common in production environments
APPEND
proc append base=fraud_check
data=fraud_check_new force;
run;
proc print data=fraud_check;
run;
OUTPUT:
| Obs | Center_ID | Center_Name | Country | Power_Usage_PUE | Server_Count | Cooling_Efficiency | Downtime_Hours | Sustainability_Score | Install_Date | Center_Code | Current_Date | Years_Operational | Next_Maintenance | Utilization | Risk_Flag | Fraud_Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Alpha_dc | USA | 1.4 | 12000 | 82 | 5 | 78 | 15JAN2015 | USA_1 | 15FEB2026 | 11 | 15JAN2020 | MEDIUM | NORMAL | CLEAR |
| 2 | 2 | Beta_dc | INDIA | 1.8 | 9500 | 70 | 18 | 65 | 22MAR2016 | INDIA_2 | 15FEB2026 | 10 | 22MAR2021 | LOW | NORMAL | CLEAR |
| 3 | 3 | Gamma_dc | GERMANY | 1.3 | 14000 | 88 | 2 | 85 | 10JUN2014 | GERMANY_3 | 15FEB2026 | 12 | 10JUN2019 | HIGH | NORMAL | CLEAR |
| 4 | 4 | Delta_dc | UK | 1.6 | 11000 | 75 | 10 | 72 | 01DEC2017 | UK_4 | 15FEB2026 | 9 | 01DEC2022 | MEDIUM | NORMAL | CLEAR |
| 5 | 5 | Epsilon_ | JAPAN | 1.2 | 16000 | 92 | 1 | 90 | 09SEP2013 | JAPAN_5 | 15FEB2026 | 13 | 09SEP2018 | HIGH | NORMAL | CLEAR |
| 6 | 6 | Zeta_dc | INDIA | 2.1 | 8000 | 60 | 30 | 50 | 12FEB2018 | INDIA_6 | 15FEB2026 | 8 | 12FEB2023 | LOW | HIGH_RISK | CLEAR |
| 7 | 7 | Eta_dc | USA | 1.5 | 10000 | 78 | 7 | 74 | 19AUG2016 | USA_7 | 15FEB2026 | 10 | 19AUG2021 | MEDIUM | NORMAL | CLEAR |
| 8 | 8 | Theta_dc | CANADA | 1.4 | 10500 | 80 | 6 | 77 | 25NOV2015 | CANADA_8 | 15FEB2026 | 11 | 25NOV2020 | MEDIUM | NORMAL | CLEAR |
| 9 | 9 | Iota_dc | AUSTRALIA | 1.7 | 9800 | 73 | 12 | 70 | 11APR2017 | AUSTRALIA_9 | 15FEB2026 | 9 | 11APR2022 | LOW | NORMAL | CLEAR |
| 10 | 10 | Kappa_dc | SINGAPORE | 1.3 | 15000 | 90 | 3 | 88 | 06JUL2014 | SINGAPORE_10 | 15FEB2026 | 12 | 06JUL2019 | HIGH | NORMAL | CLEAR |
| 11 | 11 | Lambda_d | FRANCE | 1.6 | 10800 | 76 | 9 | 73 | 18OCT2016 | FRANCE_11 | 15FEB2026 | 10 | 18OCT2021 | MEDIUM | NORMAL | CLEAR |
| 12 | 12 | Mu_dc | INDIA | 2.3 | 7000 | 55 | 40 | 45 | 29JAN2019 | INDIA_12 | 15FEB2026 | 7 | 29JAN2024 | LOW | HIGH_RISK | CLEAR |
| 13 | 13 | Nu_dc | NETHERLANDS | 1.2 | 15500 | 93 | 2 | 91 | 03MAY2013 | NETHERLANDS_13 | 15FEB2026 | 13 | 03MAY2018 | HIGH | NORMAL | CLEAR |
| 14 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 | LOW | NORMAL | CLEAR |
| 15 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 | MEDIUM | NORMAL | CLEAR |
| 16 | 14 | Xi_dc | BRAZIL | 1.9 | 9000 | 68 | 22 | 60 | 14DEC2018 | BRAZIL_14 | 15FEB2026 | 8 | 14DEC2023 | LOW | NORMAL | CLEAR |
| 17 | 15 | Omicron_ | UAE | 1.5 | 11500 | 79 | 8 | 75 | 21JUN2016 | UAE_15 | 15FEB2026 | 10 | 21JUN2021 | MEDIUM | NORMAL | CLEAR |
·
fraud_check → historical master table
·
fraud_check_new → new records
·
Rows get physically added
·
Structure mismatch handled by FORCE
TRANSPOSE
proc transpose data=fraud_check out=dc_transposed;
var Power_Usage_PUE Sustainability_Score;
run;
proc print data=dc_transposed;
run;
OUTPUT:
| Obs | _NAME_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 | COL10 | COL11 | COL12 | COL13 | COL14 | COL15 | COL16 | COL17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Power_Usage_PUE | 1.4 | 1.8 | 1.3 | 1.6 | 1.2 | 2.1 | 1.5 | 1.4 | 1.7 | 1.3 | 1.6 | 2.3 | 1.2 | 1.9 | 1.5 | 1.9 | 1.5 |
| 2 | Sustainability_Score | 78.0 | 65.0 | 85.0 | 72.0 | 90.0 | 50.0 | 74.0 | 77.0 | 70.0 | 88.0 | 73.0 | 45.0 | 91.0 | 60.0 | 75.0 | 60.0 | 75.0 |
14. CLEANUP – PROC DATASETS DELETE
proc datasets library=work nolist;
delete datacenter_raw;
quit;
15. Conclusion
This Data Center Analytics project demonstrates
how SAS can be effectively used to transform raw operational data into
meaningful business insights. By integrating DATA steps, PROC SQL, statistical
procedures, macros, and advanced date functions, the analysis evaluates power
efficiency, server utilization, cooling performance, downtime, and
sustainability metrics across global data centers. The structured use of character
and numeric functions ensures data standardization, while correlation and
visualization techniques help uncover relationships between efficiency,
downtime, and sustainability. Overall, the project reflects a realistic
enterprise workflow where data quality, performance measurement, and
operational monitoring are equally important.
From a business and interview perspective, the project highlights how SAS supports decision-making beyond basic reporting. Utilization and risk classification macros enable automation and scalability, while anomaly and fraud-detection logic addresses governance and ESG reliability concerns. Proper use of PROC APPEND, MERGE, SET, and TRANSPOSE mirrors real production environments with incremental data loads and evolving structures. In summary, this analysis showcases SAS as a powerful tool for operational intelligence, risk control, and sustainable data center management in modern organizations.
INTERVIEW QUESTIONS FOR YOU
1. 1.What is PROC MEANS used for in SAS?
2. 2.What is the difference between SET and MERGE
statements?
3. 3.Why are macros used in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
1. 1.What is PROC MEANS used for in SAS?
2. 2.What is the difference between SET and MERGE
statements?
3. 3.Why are macros used in SAS?
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent DATA CENTERS data.
Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS
· Data analysts building portfolios
· Professionals preparing for SAS interviews
· Bloggers writing about analytics and smart cities
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment