DATA ANALYSIS AND VISUALIZATION OF TEAM SIZE, BUDGET, PROJECT TYPE, TOOLS AND REMOTE POLICY ACROSS GLOBAL SOFTWARE OFFICES SOFTWARE OFFICE REQUIREMENTS WORLDWIDE | PROC SQL | PROC MEANS | PROC FREQ | PROC REPORT | PROC TABULATE | PROC SGPLOT | MACROS
/*CREATING A DATASET OF SOFTWARE OFFICE REQUIREMENTS*/
1) Create dataset
options nocenter;
data work.soft_office_req;
infile datalines dsd truncover;
length OfficeName $40 Country $30 Location $30 ProjectType $20 SoftwareTools $80
RemotePolicy $10;
input OfficeID OfficeName $ Country $ Location $ TeamSize ProjectType $ SoftwareTools $
BudgetUSD RemotePolicy $;
datalines;
1,"SiliconCore Labs","United States","San Francisco",120,Product,"Java, Python, AWS, Jira",4200000,Hybrid
2,"NileSoft","Egypt","Cairo",45,Platform,"Node.js, MySQL, Docker",900000,Onsite
3,"Bengaluru Pixels","India","Bengaluru",210,Product,"Java, React, Oracle, Jenkins",1800000,Hybrid
4,"Nordic Codeworks","Sweden","Stockholm",60,Infra,"Go, Kubernetes, Prometheus",1200000,Remote
5,"Andes DevHub","Chile","Santiago",30,App,"Flutter, Firebase, Figma",450000,Onsite
6,"Kyoto Systems","Japan","Kyoto",80,Product,"C++, Python, GitLab",1350000,Hybrid
7,"Praha Solutions","Czech Republic","Prague",55,Platform,"Java, Spring, PostgreSQL",750000,Remote
8,"Lagos Binary","Nigeria","Lagos",25,App,"PHP, Laravel, MySQL",300000,Onsite
9,"Melbourne Cloud","Australia","Melbourne",95,Infra,"AWS, Terraform, Ansible",1600000,Hybrid
10,"Toronto DevOps","Canada","Toronto",70,Infra,"Python, Docker, AWS",1100000,Hybrid
11,"Seoul Innovate","South Korea","Seoul",140,Product,"Kotlin, Java, Android Studio",2500000,Hybrid
12,"Berlin Agile","Germany","Berlin",85,Product,"JavaScript, Node.js, AWS",1450000,Remote
13,"Lisbon Labs","Portugal","Lisbon",40,Platform,"Ruby, Rails, PostgreSQL",600000,Hybrid
14,"Mumbai Matrix","India","Mumbai",160,Product,"SAS, R, Python, Git",2000000,Onsite
;
run;
proc print data=work.soft_office_req label noobs;
run;
Output:
| OfficeName | Country | Location | ProjectType | SoftwareTools | RemotePolicy | OfficeID | TeamSize | BudgetUSD |
|---|---|---|---|---|---|---|---|---|
| SiliconCore Labs | United States | San Francisco | Product | Java, Python, AWS, Jira | Hybrid | 1 | 120 | 4200000 |
| NileSoft | Egypt | Cairo | Platform | Node.js, MySQL, Docker | Onsite | 2 | 45 | 900000 |
| Bengaluru Pixels | India | Bengaluru | Product | Java, React, Oracle, Jenkins | Hybrid | 3 | 210 | 1800000 |
| Nordic Codeworks | Sweden | Stockholm | Infra | Go, Kubernetes, Prometheus | Remote | 4 | 60 | 1200000 |
| Andes DevHub | Chile | Santiago | App | Flutter, Firebase, Figma | Onsite | 5 | 30 | 450000 |
| Kyoto Systems | Japan | Kyoto | Product | C++, Python, GitLab | Hybrid | 6 | 80 | 1350000 |
| Praha Solutions | Czech Republic | Prague | Platform | Java, Spring, PostgreSQL | Remote | 7 | 55 | 750000 |
| Lagos Binary | Nigeria | Lagos | App | PHP, Laravel, MySQL | Onsite | 8 | 25 | 300000 |
| Melbourne Cloud | Australia | Melbourne | Infra | AWS, Terraform, Ansible | Hybrid | 9 | 95 | 1600000 |
| Toronto DevOps | Canada | Toronto | Infra | Python, Docker, AWS | Hybrid | 10 | 70 | 1100000 |
| Seoul Innovate | South Korea | Seoul | Product | Kotlin, Java, Android Studio | Hybrid | 11 | 140 | 2500000 |
| Berlin Agile | Germany | Berlin | Product | JavaScript, Node.js, AWS | Remote | 12 | 85 | 1450000 |
| Lisbon Labs | Portugal | Lisbon | Platform | Ruby, Rails, PostgreSQL | Hybrid | 13 | 40 | 600000 |
| Mumbai Matrix | India | Mumbai | Product | SAS, R, Python, Git | Onsite | 14 | 160 | 2000000 |
Purpose: Sort dataset by Country and TeamSize desc for reporting
proc sort data=work.soft_office_req out=work.soft_office_sorted;
by Country descending TeamSize;
run;
proc print data=work.soft_office_sorted(obs=7) label noobs;
run;
Output:
| OfficeName | Country | Location | ProjectType | SoftwareTools | RemotePolicy | OfficeID | TeamSize | BudgetUSD |
|---|---|---|---|---|---|---|---|---|
| Melbourne Cloud | Australia | Melbourne | Infra | AWS, Terraform, Ansible | Hybrid | 9 | 95 | 1600000 |
| Toronto DevOps | Canada | Toronto | Infra | Python, Docker, AWS | Hybrid | 10 | 70 | 1100000 |
| Andes DevHub | Chile | Santiago | App | Flutter, Firebase, Figma | Onsite | 5 | 30 | 450000 |
| Praha Solutions | Czech Republic | Prague | Platform | Java, Spring, PostgreSQL | Remote | 7 | 55 | 750000 |
| NileSoft | Egypt | Cairo | Platform | Node.js, MySQL, Docker | Onsite | 2 | 45 | 900000 |
| Berlin Agile | Germany | Berlin | Product | JavaScript, Node.js, AWS | Remote | 12 | 85 | 1450000 |
| Bengaluru Pixels | India | Bengaluru | Product | Java, React, Oracle, Jenkins | Hybrid | 3 | 210 | 1800000 |
2) MACRO SECTION
Purpose: Provide reusable code pieces for filtering and summarizing offices
Purpose: Macro %filter_by_country - creates a filtered dataset for a given country
%macro filter_by_country(country=);
%let dsname = soft_office_%sysfunc(compress(&country,%str( )));
proc sql;
create table work.&dsname as
select *
from work.soft_office_req
where upcase(country)=upcase("&country");
quit;
/*Use ATTRN for numeric attribute NOBS */
%let nobs=%sysfunc(attrn(%sysfunc(open(work.&dsname)), nobs));
%put NOTE: Created &dsname with &nobs observations.;
proc print data=work.&dsname;
run;
%mend filter_by_country;
%filter_by_country(country=India);
Log:
Output:
| Obs | OfficeName | Country | Location | ProjectType | SoftwareTools | RemotePolicy | OfficeID | TeamSize | BudgetUSD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Bengaluru Pixels | India | Bengaluru | Product | Java, React, Oracle, Jenkins | Hybrid | 3 | 210 | 1800000 |
| 2 | Mumbai Matrix | India | Mumbai | Product | SAS, R, Python, Git | Onsite | 14 | 160 | 2000000 |
Purpose: Macro %summarize_budget - produce aggregate budget stats by ProjectType
%macro summarize_budget(in_ds=work.soft_office_req, out_ds=work.budget_summary);
proc sql;
create table &out_ds as
select ProjectType,
count(*) as Offices,
sum(BudgetUSD) as TotalBudget format=dollar12.,
mean(BudgetUSD) as AvgBudget format=dollar12.,
median(BudgetUSD) as MedianBudget format=dollar12.
from &in_ds
group by ProjectType;
quit;
proc print;run;
%mend summarize_budget;
%summarize_budget(in_ds=work.soft_office_req, out_ds=work.budget_summary);
Output:
| Obs | ProjectType | Offices | TotalBudget | AvgBudget | MedianBudget |
|---|---|---|---|---|---|
| 1 | App | 2 | $750,000 | $375,000 | $375,000 |
| 2 | Infra | 3 | $3,900,000 | $1,300,000 | $1,200,000 |
| 3 | Platform | 3 | $2,250,000 | $750,000 | $750,000 |
| 4 | Product | 6 | $13,300,000 | $2,216,667 | $1,900,000 |
Purpose: Macro %flag_large_teams - create a flag for large teams based on threshold
%macro flag_large_teams(ds=work.soft_office_req, threshold=100);
data work.flagged;
set &ds;
if TeamSize >= &threshold then LARGE_TEAM='Y'; else LARGE_TEAM='N';
run;
proc print;run;
%mend flag_large_teams;
%flag_large_teams(ds=work.soft_office_req, threshold=100);
Output:
| Obs | OfficeName | Country | Location | ProjectType | SoftwareTools | RemotePolicy | OfficeID | TeamSize | BudgetUSD | LARGE_TEAM |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SiliconCore Labs | United States | San Francisco | Product | Java, Python, AWS, Jira | Hybrid | 1 | 120 | 4200000 | Y |
| 2 | NileSoft | Egypt | Cairo | Platform | Node.js, MySQL, Docker | Onsite | 2 | 45 | 900000 | N |
| 3 | Bengaluru Pixels | India | Bengaluru | Product | Java, React, Oracle, Jenkins | Hybrid | 3 | 210 | 1800000 | Y |
| 4 | Nordic Codeworks | Sweden | Stockholm | Infra | Go, Kubernetes, Prometheus | Remote | 4 | 60 | 1200000 | N |
| 5 | Andes DevHub | Chile | Santiago | App | Flutter, Firebase, Figma | Onsite | 5 | 30 | 450000 | N |
| 6 | Kyoto Systems | Japan | Kyoto | Product | C++, Python, GitLab | Hybrid | 6 | 80 | 1350000 | N |
| 7 | Praha Solutions | Czech Republic | Prague | Platform | Java, Spring, PostgreSQL | Remote | 7 | 55 | 750000 | N |
| 8 | Lagos Binary | Nigeria | Lagos | App | PHP, Laravel, MySQL | Onsite | 8 | 25 | 300000 | N |
| 9 | Melbourne Cloud | Australia | Melbourne | Infra | AWS, Terraform, Ansible | Hybrid | 9 | 95 | 1600000 | N |
| 10 | Toronto DevOps | Canada | Toronto | Infra | Python, Docker, AWS | Hybrid | 10 | 70 | 1100000 | N |
| 11 | Seoul Innovate | South Korea | Seoul | Product | Kotlin, Java, Android Studio | Hybrid | 11 | 140 | 2500000 | Y |
| 12 | Berlin Agile | Germany | Berlin | Product | JavaScript, Node.js, AWS | Remote | 12 | 85 | 1450000 | N |
| 13 | Lisbon Labs | Portugal | Lisbon | Platform | Ruby, Rails, PostgreSQL | Hybrid | 13 | 40 | 600000 | N |
| 14 | Mumbai Matrix | India | Mumbai | Product | SAS, R, Python, Git | Onsite | 14 | 160 | 2000000 | Y |
3) PROC EXAMPLES
Purpose: PROC FREQ - show distribution of ProjectType and RemotePolicy
proc freq data=work.soft_office_req nlevels;
tables ProjectType RemotePolicy / nocum norow nopercent;
run;
Output:
The FREQ Procedure
| Number of Variable Levels | |
|---|---|
| Variable | Levels |
| ProjectType | 4 |
| RemotePolicy | 3 |
| ProjectType | Frequency |
|---|---|
| App | 2 |
| Infra | 3 |
| Platform | 3 |
| Product | 6 |
| RemotePolicy | Frequency |
|---|---|
| Hybrid | 7 |
| Onsite | 4 |
| Remote | 3 |
Purpose: PROC MEANS - provide summary statistics for TeamSize and BudgetUSD
proc means data=work.soft_office_req n mean median std min max maxdec=0;
var TeamSize BudgetUSD;
run;
Output:
The MEANS Procedure
| Variable | N | Mean | Median | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|---|
TeamSize BudgetUSD | 14 14 | 87 1442857 | 75 1275000 | 54 1005534 | 25 300000 | 210 4200000 |
Purpose: PROC TABULATE - create a cross-tab of RemotePolicy by ProjectType with budgets
proc tabulate data=work.soft_office_req missing;
class ProjectType RemotePolicy;
var BudgetUSD TeamSize;
table ProjectType,
RemotePolicy all,
(BudgetUSD*(sum mean) TeamSize*(mean));
run;
Output:
ProjectType App | ||||||||||||||||
| ||||||||||||||||
ProjectType Infra | ||||||||||||||||||||
| ||||||||||||||||||||
ProjectType Platform | ||||||||||||||||||||||||
| ||||||||||||||||||||||||
ProjectType Product | ||||||||||||||||||||||||
| ||||||||||||||||||||||||
Purpose: PROC REPORT - produce a formatted report with derived columns
proc report data=work.flagged nowd;
column OfficeID OfficeName Country Location TeamSize ProjectType
LARGE_TEAM BudgetUSD RemotePolicy;
define BudgetUSD / analysis sum format=dollar12.;
compute before _page_ ;
line '--- SOFTWARE OFFICE REQUIREMENTS REPORT ---';
endcomp;
run;
Output:
| --- SOFTWARE OFFICE REQUIREMENTS REPORT --- | ||||||||
| OfficeID | OfficeName | Country | Location | TeamSize | ProjectType | LARGE_TEAM | BudgetUSD | RemotePolicy |
|---|---|---|---|---|---|---|---|---|
| 1 | SiliconCore Labs | United States | San Francisco | 120 | Product | Y | $4,200,000 | Hybrid |
| 2 | NileSoft | Egypt | Cairo | 45 | Platform | N | $900,000 | Onsite |
| 3 | Bengaluru Pixels | India | Bengaluru | 210 | Product | Y | $1,800,000 | Hybrid |
| 4 | Nordic Codeworks | Sweden | Stockholm | 60 | Infra | N | $1,200,000 | Remote |
| 5 | Andes DevHub | Chile | Santiago | 30 | App | N | $450,000 | Onsite |
| 6 | Kyoto Systems | Japan | Kyoto | 80 | Product | N | $1,350,000 | Hybrid |
| 7 | Praha Solutions | Czech Republic | Prague | 55 | Platform | N | $750,000 | Remote |
| 8 | Lagos Binary | Nigeria | Lagos | 25 | App | N | $300,000 | Onsite |
| 9 | Melbourne Cloud | Australia | Melbourne | 95 | Infra | N | $1,600,000 | Hybrid |
| 10 | Toronto DevOps | Canada | Toronto | 70 | Infra | N | $1,100,000 | Hybrid |
| 11 | Seoul Innovate | South Korea | Seoul | 140 | Product | Y | $2,500,000 | Hybrid |
| 12 | Berlin Agile | Germany | Berlin | 85 | Product | N | $1,450,000 | Remote |
| 13 | Lisbon Labs | Portugal | Lisbon | 40 | Platform | N | $600,000 | Hybrid |
| 14 | Mumbai Matrix | India | Mumbai | 160 | Product | Y | $2,000,000 | Onsite |
data work.tool_long;
set work.soft_office_req;
length Tool $30;
i=1;
do until(scan(softwaretools,i,' ,')='');
Tool = strip(scan(softwaretools,i,','));
if Tool ne '' then output;
i+1;
end;
keep OfficeID OfficeName Tool;
run;
proc print;run;
Output:
| Obs | OfficeName | OfficeID | Tool |
|---|---|---|---|
| 1 | SiliconCore Labs | 1 | Java |
| 2 | SiliconCore Labs | 1 | Python |
| 3 | SiliconCore Labs | 1 | AWS |
| 4 | SiliconCore Labs | 1 | Jira |
| 5 | NileSoft | 2 | Node.js |
| 6 | NileSoft | 2 | MySQL |
| 7 | NileSoft | 2 | Docker |
| 8 | Bengaluru Pixels | 3 | Java |
| 9 | Bengaluru Pixels | 3 | React |
| 10 | Bengaluru Pixels | 3 | Oracle |
| 11 | Bengaluru Pixels | 3 | Jenkins |
| 12 | Nordic Codeworks | 4 | Go |
| 13 | Nordic Codeworks | 4 | Kubernetes |
| 14 | Nordic Codeworks | 4 | Prometheus |
| 15 | Andes DevHub | 5 | Flutter |
| 16 | Andes DevHub | 5 | Firebase |
| 17 | Andes DevHub | 5 | Figma |
| 18 | Kyoto Systems | 6 | C++ |
| 19 | Kyoto Systems | 6 | Python |
| 20 | Kyoto Systems | 6 | GitLab |
| 21 | Praha Solutions | 7 | Java |
| 22 | Praha Solutions | 7 | Spring |
| 23 | Praha Solutions | 7 | PostgreSQL |
| 24 | Lagos Binary | 8 | PHP |
| 25 | Lagos Binary | 8 | Laravel |
| 26 | Lagos Binary | 8 | MySQL |
| 27 | Melbourne Cloud | 9 | AWS |
| 28 | Melbourne Cloud | 9 | Terraform |
| 29 | Melbourne Cloud | 9 | Ansible |
| 30 | Toronto DevOps | 10 | Python |
| 31 | Toronto DevOps | 10 | Docker |
| 32 | Toronto DevOps | 10 | AWS |
| 33 | Seoul Innovate | 11 | Kotlin |
| 34 | Seoul Innovate | 11 | Java |
| 35 | Seoul Innovate | 11 | Android Studio |
| 36 | Berlin Agile | 12 | JavaScript |
| 37 | Berlin Agile | 12 | Node.js |
| 38 | Berlin Agile | 12 | AWS |
| 39 | Lisbon Labs | 13 | Ruby |
| 40 | Lisbon Labs | 13 | Rails |
| 41 | Lisbon Labs | 13 | PostgreSQL |
| 42 | Mumbai Matrix | 14 | SAS |
| 43 | Mumbai Matrix | 14 | R |
| 44 | Mumbai Matrix | 14 | Python |
| 45 | Mumbai Matrix | 14 | Git |
Purpose: PROC SQL - count tool frequency across offices
proc sql;
create table work.tool_counts as
select Tool, count(distinct OfficeID) as OfficesUsing format=8.
from work.tool_long
group by Tool
order by OfficesUsing desc;
quit;
proc print;run;
Output:
| Obs | Tool | OfficesUsing |
|---|---|---|
| 1 | Python | 4 |
| 2 | Java | 4 |
| 3 | AWS | 4 |
| 4 | Node.js | 2 |
| 5 | MySQL | 2 |
| 6 | Docker | 2 |
| 7 | PostgreSQL | 2 |
| 8 | React | 1 |
| 9 | R | 1 |
| 10 | Jenkins | 1 |
| 11 | SAS | 1 |
| 12 | Ansible | 1 |
| 13 | Figma | 1 |
| 14 | Prometheus | 1 |
| 15 | PHP | 1 |
| 16 | GitLab | 1 |
| 17 | Oracle | 1 |
| 18 | Go | 1 |
| 19 | Git | 1 |
| 20 | Laravel | 1 |
| 21 | JavaScript | 1 |
| 22 | Rails | 1 |
| 23 | Firebase | 1 |
| 24 | Ruby | 1 |
| 25 | Jira | 1 |
| 26 | Spring | 1 |
| 27 | Kubernetes | 1 |
| 28 | Android Studio | 1 |
| 29 | C++ | 1 |
| 30 | Flutter | 1 |
| 31 | Kotlin | 1 |
| 32 | Terraform | 1 |
Purpose: PROC SGPLOT - visualize TeamSize distribution
ods graphics on;
proc sgplot data=work.soft_office_req;
histogram TeamSize / nbins=8;
density TeamSize;
xaxis label='Team Size';
title 'Distribution of Team Sizes Across Offices';
run;
ods graphics off;
Output:
Purpose: PROC SQL join example - combine budget summary with office details
proc sql;
create table work.enriched as
select a.*, b.TotalBudget, b.AvgBudget, b.Offices
from work.soft_office_req a left join work.budget_summary b
on a.ProjectType = b.ProjectType;
quit;
proc print;run;
Output:
| Obs | OfficeName | Country | Location | ProjectType | SoftwareTools | RemotePolicy | OfficeID | TeamSize | BudgetUSD | TotalBudget | AvgBudget | Offices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Andes DevHub | Chile | Santiago | App | Flutter, Firebase, Figma | Onsite | 5 | 30 | 450000 | $750,000 | $375,000 | 2 |
| 2 | Lagos Binary | Nigeria | Lagos | App | PHP, Laravel, MySQL | Onsite | 8 | 25 | 300000 | $750,000 | $375,000 | 2 |
| 3 | Toronto DevOps | Canada | Toronto | Infra | Python, Docker, AWS | Hybrid | 10 | 70 | 1100000 | $3,900,000 | $1,300,000 | 3 |
| 4 | Nordic Codeworks | Sweden | Stockholm | Infra | Go, Kubernetes, Prometheus | Remote | 4 | 60 | 1200000 | $3,900,000 | $1,300,000 | 3 |
| 5 | Melbourne Cloud | Australia | Melbourne | Infra | AWS, Terraform, Ansible | Hybrid | 9 | 95 | 1600000 | $3,900,000 | $1,300,000 | 3 |
| 6 | Lisbon Labs | Portugal | Lisbon | Platform | Ruby, Rails, PostgreSQL | Hybrid | 13 | 40 | 600000 | $2,250,000 | $750,000 | 3 |
| 7 | Praha Solutions | Czech Republic | Prague | Platform | Java, Spring, PostgreSQL | Remote | 7 | 55 | 750000 | $2,250,000 | $750,000 | 3 |
| 8 | NileSoft | Egypt | Cairo | Platform | Node.js, MySQL, Docker | Onsite | 2 | 45 | 900000 | $2,250,000 | $750,000 | 3 |
| 9 | Mumbai Matrix | India | Mumbai | Product | SAS, R, Python, Git | Onsite | 14 | 160 | 2000000 | $13,300,000 | $2,216,667 | 6 |
| 10 | Bengaluru Pixels | India | Bengaluru | Product | Java, React, Oracle, Jenkins | Hybrid | 3 | 210 | 1800000 | $13,300,000 | $2,216,667 | 6 |
| 11 | Kyoto Systems | Japan | Kyoto | Product | C++, Python, GitLab | Hybrid | 6 | 80 | 1350000 | $13,300,000 | $2,216,667 | 6 |
| 12 | Berlin Agile | Germany | Berlin | Product | JavaScript, Node.js, AWS | Remote | 12 | 85 | 1450000 | $13,300,000 | $2,216,667 | 6 |
| 13 | SiliconCore Labs | United States | San Francisco | Product | Java, Python, AWS, Jira | Hybrid | 1 | 120 | 4200000 | $13,300,000 | $2,216,667 | 6 |
| 14 | Seoul Innovate | South Korea | Seoul | Product | Kotlin, Java, Android Studio | Hybrid | 11 | 140 | 2500000 | $13,300,000 | $2,216,667 | 6 |
No comments:
Post a Comment