204.ANALYZING SOFTWARE |APPLICATIONS ACROSS CATEGORIES USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC MACRO | SAS POWER | FOR SMART BUSINESS DECISIONS
ANALYZING SOFTWARE |APPLICATIONS ACROSS CATEGORIES USING PROC PRINT | PROC SORT | PROC MEANS | PROC FREQ | PROC SQL | PROC MACRO | SAS POWER | FOR SMART BUSINESS DECISIONS
/*Creating a unique SAS dataset about different types of software applications*/
Step 1: Creating the Software Dataset in SAS
data software_inventory;
infile datalines dlm='' dsd truncover;
input Software_ID $ Software_Name:$15. Category:$15. Version $ Vendor:$25.
License_Type:$15. Platform $ Cost Rating Popularity;
datalines;
S01 Excel Spreadsheet 365 Microsoft Commercial Windows 500 4.5 95
S02 LibreOffice Spreadsheet 7.6 TheDocumentFoundation OpenSource Multi 0 4.2 70
S03 Photoshop Graphic 2024 Adobe Commercial Windows 1200 4.8 88
S04 GIMP Graphic 2.10 GNU OpenSource Multi 0 4.1 60
S05 SAS Analytics 9.4 SAS Commercial Windows 5000 4.9 85
S06 RStudio Analytics 2023 Posit OpenSource Multi 0 4.6 75
S07 MySQL Database 8.0 Oracle OpenSource Multi 0 4.5 78
S08 OracleDB Database 19c Oracle Commercial Multi 2500 4.7 80
S09 Zoom Communication 5.17 Zoom Commercial Multi 250 4.3 90
S10 Teams Communication 2023 Microsoft Commercial Multi 300 4.4 85
S11 Notepad++ Editor 8.5 DonHo OpenSource Windows 0 4.0 65
S12 VSCode Editor 1.89 Microsoft OpenSource Multi 0 4.8 92
S13 Tableau BI 2024 Salesforce Commercial Multi 3000 4.9 87
S14 PowerBI BI 2024 Microsoft Commercial Windows 1200 4.7 83
S15 VLC Media 3.0 VideoLAN OpenSource Multi 0 4.5 91
S16 AdobeReader Document 2024 Adobe Commercial Multi 100 4.2 76
S17 FoxitReader Document 2023 Foxit Commercial Multi 80 4.1 70
S18 Python Programming 3.12 PythonOrg OpenSource Multi 0 4.9 95
S19 MATLAB Programming R2023a MathWorks Commercial Multi 5000 4.6 72
S20 Git VersionControl 2.42 GitProject OpenSource Multi 0 4.7 90
;
run;
proc print;run;
Output:
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S01 | Excel | Spreadsheet | 365 | Microsoft | Commercial | Windows | 500 | 4.5 | 95 |
| 2 | S02 | LibreOffice | Spreadsheet | 7.6 | TheDocumentFoundation | OpenSource | Multi | 0 | 4.2 | 70 |
| 3 | S03 | Photoshop | Graphic | 2024 | Adobe | Commercial | Windows | 1200 | 4.8 | 88 |
| 4 | S04 | GIMP | Graphic | 2.10 | GNU | OpenSource | Multi | 0 | 4.1 | 60 |
| 5 | S05 | SAS | Analytics | 9.4 | SAS | Commercial | Windows | 5000 | 4.9 | 85 |
| 6 | S06 | RStudio | Analytics | 2023 | Posit | OpenSource | Multi | 0 | 4.6 | 75 |
| 7 | S07 | MySQL | Database | 8.0 | Oracle | OpenSource | Multi | 0 | 4.5 | 78 |
| 8 | S08 | OracleDB | Database | 19c | Oracle | Commercial | Multi | 2500 | 4.7 | 80 |
| 9 | S09 | Zoom | Communication | 5.17 | Zoom | Commercial | Multi | 250 | 4.3 | 90 |
| 10 | S10 | Teams | Communication | 2023 | Microsoft | Commercial | Multi | 300 | 4.4 | 85 |
| 11 | S11 | Notepad++ | Editor | 8.5 | DonHo | OpenSource | Windows | 0 | 4.0 | 65 |
| 12 | S12 | VSCode | Editor | 1.89 | Microsoft | OpenSource | Multi | 0 | 4.8 | 92 |
| 13 | S13 | Tableau | BI | 2024 | Salesforce | Commercial | Multi | 3000 | 4.9 | 87 |
| 14 | S14 | PowerBI | BI | 2024 | Microsoft | Commercial | Windows | 1200 | 4.7 | 83 |
| 15 | S15 | VLC | Media | 3.0 | VideoLAN | OpenSource | Multi | 0 | 4.5 | 91 |
| 16 | S16 | AdobeReader | Document | 2024 | Adobe | Commercial | Multi | 100 | 4.2 | 76 |
| 17 | S17 | FoxitReader | Document | 2023 | Foxit | Commercial | Multi | 80 | 4.1 | 70 |
| 18 | S18 | Python | Programming | 3.12 | PythonOrg | OpenSource | Multi | 0 | 4.9 | 95 |
| 19 | S19 | MATLAB | Programming | R2023a | MathWorks | Commercial | Multi | 5000 | 4.6 | 72 |
| 20 | S20 | Git | VersionControl | 2.42 | GitProject | OpenSource | Multi | 0 | 4.7 | 90 |
Step 2: PROC PRINT – View the Dataset
proc print data=software_inventory label;
title "Software Inventory Dataset - First Look";
run;
Output:
| Software Inventory Dataset - First Look |
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S01 | Excel | Spreadsheet | 365 | Microsoft | Commercial | Windows | 500 | 4.5 | 95 |
| 2 | S02 | LibreOffice | Spreadsheet | 7.6 | TheDocumentFoundation | OpenSource | Multi | 0 | 4.2 | 70 |
| 3 | S03 | Photoshop | Graphic | 2024 | Adobe | Commercial | Windows | 1200 | 4.8 | 88 |
| 4 | S04 | GIMP | Graphic | 2.10 | GNU | OpenSource | Multi | 0 | 4.1 | 60 |
| 5 | S05 | SAS | Analytics | 9.4 | SAS | Commercial | Windows | 5000 | 4.9 | 85 |
| 6 | S06 | RStudio | Analytics | 2023 | Posit | OpenSource | Multi | 0 | 4.6 | 75 |
| 7 | S07 | MySQL | Database | 8.0 | Oracle | OpenSource | Multi | 0 | 4.5 | 78 |
| 8 | S08 | OracleDB | Database | 19c | Oracle | Commercial | Multi | 2500 | 4.7 | 80 |
| 9 | S09 | Zoom | Communication | 5.17 | Zoom | Commercial | Multi | 250 | 4.3 | 90 |
| 10 | S10 | Teams | Communication | 2023 | Microsoft | Commercial | Multi | 300 | 4.4 | 85 |
| 11 | S11 | Notepad++ | Editor | 8.5 | DonHo | OpenSource | Windows | 0 | 4.0 | 65 |
| 12 | S12 | VSCode | Editor | 1.89 | Microsoft | OpenSource | Multi | 0 | 4.8 | 92 |
| 13 | S13 | Tableau | BI | 2024 | Salesforce | Commercial | Multi | 3000 | 4.9 | 87 |
| 14 | S14 | PowerBI | BI | 2024 | Microsoft | Commercial | Windows | 1200 | 4.7 | 83 |
| 15 | S15 | VLC | Media | 3.0 | VideoLAN | OpenSource | Multi | 0 | 4.5 | 91 |
| 16 | S16 | AdobeReader | Document | 2024 | Adobe | Commercial | Multi | 100 | 4.2 | 76 |
| 17 | S17 | FoxitReader | Document | 2023 | Foxit | Commercial | Multi | 80 | 4.1 | 70 |
| 18 | S18 | Python | Programming | 3.12 | PythonOrg | OpenSource | Multi | 0 | 4.9 | 95 |
| 19 | S19 | MATLAB | Programming | R2023a | MathWorks | Commercial | Multi | 5000 | 4.6 | 72 |
| 20 | S20 | Git | VersionControl | 2.42 | GitProject | OpenSource | Multi | 0 | 4.7 | 90 |
Step 3: PROC SORT – Sorting by Category and Cost
proc sort data=software_inventory out=sorted_software;
by Category descending Cost;
run;
proc print data=sorted_software;
title "Sorted Software Inventory by Category and Cost";
run;
Output:
| Sorted Software Inventory by Category and Cost |
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S05 | SAS | Analytics | 9.4 | SAS | Commercial | Windows | 5000 | 4.9 | 85 |
| 2 | S06 | RStudio | Analytics | 2023 | Posit | OpenSource | Multi | 0 | 4.6 | 75 |
| 3 | S13 | Tableau | BI | 2024 | Salesforce | Commercial | Multi | 3000 | 4.9 | 87 |
| 4 | S14 | PowerBI | BI | 2024 | Microsoft | Commercial | Windows | 1200 | 4.7 | 83 |
| 5 | S10 | Teams | Communication | 2023 | Microsoft | Commercial | Multi | 300 | 4.4 | 85 |
| 6 | S09 | Zoom | Communication | 5.17 | Zoom | Commercial | Multi | 250 | 4.3 | 90 |
| 7 | S08 | OracleDB | Database | 19c | Oracle | Commercial | Multi | 2500 | 4.7 | 80 |
| 8 | S07 | MySQL | Database | 8.0 | Oracle | OpenSource | Multi | 0 | 4.5 | 78 |
| 9 | S16 | AdobeReader | Document | 2024 | Adobe | Commercial | Multi | 100 | 4.2 | 76 |
| 10 | S17 | FoxitReader | Document | 2023 | Foxit | Commercial | Multi | 80 | 4.1 | 70 |
| 11 | S11 | Notepad++ | Editor | 8.5 | DonHo | OpenSource | Windows | 0 | 4.0 | 65 |
| 12 | S12 | VSCode | Editor | 1.89 | Microsoft | OpenSource | Multi | 0 | 4.8 | 92 |
| 13 | S03 | Photoshop | Graphic | 2024 | Adobe | Commercial | Windows | 1200 | 4.8 | 88 |
| 14 | S04 | GIMP | Graphic | 2.10 | GNU | OpenSource | Multi | 0 | 4.1 | 60 |
| 15 | S15 | VLC | Media | 3.0 | VideoLAN | OpenSource | Multi | 0 | 4.5 | 91 |
| 16 | S19 | MATLAB | Programming | R2023a | MathWorks | Commercial | Multi | 5000 | 4.6 | 72 |
| 17 | S18 | Python | Programming | 3.12 | PythonOrg | OpenSource | Multi | 0 | 4.9 | 95 |
| 18 | S01 | Excel | Spreadsheet | 365 | Microsoft | Commercial | Windows | 500 | 4.5 | 95 |
| 19 | S02 | LibreOffice | Spreadsheet | 7.6 | TheDocumentFoundation | OpenSource | Multi | 0 | 4.2 | 70 |
| 20 | S20 | Git | VersionControl | 2.42 | GitProject | OpenSource | Multi | 0 | 4.7 | 90 |
Step 4: PROC MEANS – Descriptive Stats of Cost, Rating, Popularity
proc means data=software_inventory mean min max std;
var Cost Rating Popularity;
title "Descriptive Statistics for Cost, Rating, and Popularity";
run;
Output:
| Descriptive Statistics for Cost, Rating, and Popularity |
| Variable | Mean | Minimum | Maximum | Std Dev | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
Step 5: PROC FREQ – Frequency Distribution of License_Type and Platform
proc freq data=software_inventory;
tables License_Type Platform;
title "Frequency Analysis of License Types and Platforms";
run;
Output:
| Frequency Analysis of License Types and Platforms |
| License_Type | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
|---|---|---|---|---|
| Commercial | 11 | 55.00 | 11 | 55.00 |
| OpenSource | 9 | 45.00 | 20 | 100.00 |
| Platform | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
|---|---|---|---|---|
| Multi | 15 | 75.00 | 15 | 75.00 |
| Windows | 5 | 25.00 | 20 | 100.00 |
Step 6: PROC SQL – Complex Querying
a) Average Rating by Software Category
proc sql;
select Category, avg(Rating) as Avg_Rating
from software_inventory
group by Category
order by Avg_Rating desc;
quit;
Output:
| Category | Avg_Rating |
|---|---|
| BI | 4.8 |
| Programming | 4.75 |
| Analytics | 4.75 |
| VersionControl | 4.7 |
| Database | 4.6 |
| Media | 4.5 |
| Graphic | 4.45 |
| Editor | 4.4 |
| Communication | 4.35 |
| Spreadsheet | 4.35 |
| Document | 4.15 |
b) Software Above Average Popularity & Free
proc sql;
select Software_Name, Popularity, License_Type
from software_inventory
where Popularity > (select avg(Popularity) from software_inventory)
and License_Type = 'OpenSource';
quit;
Output:
| Software_Name | Popularity | License_Type |
|---|---|---|
| VSCode | 92 | OpenSource |
| VLC | 91 | OpenSource |
| Python | 95 | OpenSource |
| Git | 90 | OpenSource |
Step 7: PROC FORMAT – Better Labeling for Clarity
proc format;
value $licensefmt
'OpenSource' = 'Free & Open Source'
'Commercial' = 'Paid/Proprietary';
value ratingfmt
low-4 = 'Below Average'
4<-4.5 = 'Average'
4.5<-5 = 'Above Average';
run;
proc print data=software_inventory;
format License_Type $licensefmt. Rating ratingfmt.;
title "Formatted Output of Software Inventory";
run;
Output:
| Formatted Output of Software Inventory |
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S01 | Excel | Spreadsheet | 365 | Microsoft | Paid/Proprietary | Windows | 500 | Average | 95 |
| 2 | S02 | LibreOffice | Spreadsheet | 7.6 | TheDocumentFoundation | Free & Open Source | Multi | 0 | Average | 70 |
| 3 | S03 | Photoshop | Graphic | 2024 | Adobe | Paid/Proprietary | Windows | 1200 | Above Average | 88 |
| 4 | S04 | GIMP | Graphic | 2.10 | GNU | Free & Open Source | Multi | 0 | Average | 60 |
| 5 | S05 | SAS | Analytics | 9.4 | SAS | Paid/Proprietary | Windows | 5000 | Above Average | 85 |
| 6 | S06 | RStudio | Analytics | 2023 | Posit | Free & Open Source | Multi | 0 | Above Average | 75 |
| 7 | S07 | MySQL | Database | 8.0 | Oracle | Free & Open Source | Multi | 0 | Average | 78 |
| 8 | S08 | OracleDB | Database | 19c | Oracle | Paid/Proprietary | Multi | 2500 | Above Average | 80 |
| 9 | S09 | Zoom | Communication | 5.17 | Zoom | Paid/Proprietary | Multi | 250 | Average | 90 |
| 10 | S10 | Teams | Communication | 2023 | Microsoft | Paid/Proprietary | Multi | 300 | Average | 85 |
| 11 | S11 | Notepad++ | Editor | 8.5 | DonHo | Free & Open Source | Windows | 0 | Below Average | 65 |
| 12 | S12 | VSCode | Editor | 1.89 | Microsoft | Free & Open Source | Multi | 0 | Above Average | 92 |
| 13 | S13 | Tableau | BI | 2024 | Salesforce | Paid/Proprietary | Multi | 3000 | Above Average | 87 |
| 14 | S14 | PowerBI | BI | 2024 | Microsoft | Paid/Proprietary | Windows | 1200 | Above Average | 83 |
| 15 | S15 | VLC | Media | 3.0 | VideoLAN | Free & Open Source | Multi | 0 | Average | 91 |
| 16 | S16 | AdobeReader | Document | 2024 | Adobe | Paid/Proprietary | Multi | 100 | Average | 76 |
| 17 | S17 | FoxitReader | Document | 2023 | Foxit | Paid/Proprietary | Multi | 80 | Average | 70 |
| 18 | S18 | Python | Programming | 3.12 | PythonOrg | Free & Open Source | Multi | 0 | Above Average | 95 |
| 19 | S19 | MATLAB | Programming | R2023a | MathWorks | Paid/Proprietary | Multi | 5000 | Above Average | 72 |
| 20 | S20 | Git | VersionControl | 2.42 | GitProject | Free & Open Source | Multi | 0 | Above Average | 90 |
Step 8: Creating MACROS to Automate Analysis
Macro 1: Filter Software by Category
%macro filter_by_category(cat);
proc print data=software_inventory;
where Category = "&cat";
title "Filtered Software List for Category: &cat";
run;
%mend;
%filter_by_category(Analytics)
Output:
| Filtered Software List for Category: Analytics |
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | S05 | SAS | Analytics | 9.4 | SAS | Commercial | Windows | 5000 | 4.9 | 85 |
| 6 | S06 | RStudio | Analytics | 2023 | Posit | OpenSource | Multi | 0 | 4.6 | 75 |
%filter_by_category(Programming)
Output:
| Filtered Software List for Category: Programming |
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity |
|---|---|---|---|---|---|---|---|---|---|---|
| 18 | S18 | Python | Programming | 3.12 | PythonOrg | OpenSource | Multi | 0 | 4.9 | 95 |
| 19 | S19 | MATLAB | Programming | R2023a | MathWorks | Commercial | Multi | 5000 | 4.6 | 72 |
Macro 2: Summarize by Platform
%macro summary_by_platform(platform);
proc sql;
select count(*) as Software_Count,
avg(Cost) as Avg_Cost,
avg(Popularity) as Avg_Popularity
from software_inventory
where Platform = "&platform";
quit;
%mend;
%summary_by_platform(Windows)
Output:
| Software_Count | Avg_Cost | Avg_Popularity |
|---|---|---|
| 5 | 1580 | 83.2 |
%summary_by_platform(Multi)
Output:
| Software_Count | Avg_Cost | Avg_Popularity |
|---|---|---|
| 15 | 748.6667 | 80.73333 |
Step 9: Additional Logical Variables – High Cost or Not
data software_inventory_updated;
set software_inventory;
High_Cost = (Cost > 1000);
run;
proc print;run;
Output:
| Obs | Software_ID | Software_Name | Category | Version | Vendor | License_Type | Platform | Cost | Rating | Popularity | High_Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S01 | Excel | Spreadsheet | 365 | Microsoft | Commercial | Windows | 500 | 4.5 | 95 | 0 |
| 2 | S02 | LibreOffice | Spreadsheet | 7.6 | TheDocumentFoundation | OpenSource | Multi | 0 | 4.2 | 70 | 0 |
| 3 | S03 | Photoshop | Graphic | 2024 | Adobe | Commercial | Windows | 1200 | 4.8 | 88 | 1 |
| 4 | S04 | GIMP | Graphic | 2.10 | GNU | OpenSource | Multi | 0 | 4.1 | 60 | 0 |
| 5 | S05 | SAS | Analytics | 9.4 | SAS | Commercial | Windows | 5000 | 4.9 | 85 | 1 |
| 6 | S06 | RStudio | Analytics | 2023 | Posit | OpenSource | Multi | 0 | 4.6 | 75 | 0 |
| 7 | S07 | MySQL | Database | 8.0 | Oracle | OpenSource | Multi | 0 | 4.5 | 78 | 0 |
| 8 | S08 | OracleDB | Database | 19c | Oracle | Commercial | Multi | 2500 | 4.7 | 80 | 1 |
| 9 | S09 | Zoom | Communication | 5.17 | Zoom | Commercial | Multi | 250 | 4.3 | 90 | 0 |
| 10 | S10 | Teams | Communication | 2023 | Microsoft | Commercial | Multi | 300 | 4.4 | 85 | 0 |
| 11 | S11 | Notepad++ | Editor | 8.5 | DonHo | OpenSource | Windows | 0 | 4.0 | 65 | 0 |
| 12 | S12 | VSCode | Editor | 1.89 | Microsoft | OpenSource | Multi | 0 | 4.8 | 92 | 0 |
| 13 | S13 | Tableau | BI | 2024 | Salesforce | Commercial | Multi | 3000 | 4.9 | 87 | 1 |
| 14 | S14 | PowerBI | BI | 2024 | Microsoft | Commercial | Windows | 1200 | 4.7 | 83 | 1 |
| 15 | S15 | VLC | Media | 3.0 | VideoLAN | OpenSource | Multi | 0 | 4.5 | 91 | 0 |
| 16 | S16 | AdobeReader | Document | 2024 | Adobe | Commercial | Multi | 100 | 4.2 | 76 | 0 |
| 17 | S17 | FoxitReader | Document | 2023 | Foxit | Commercial | Multi | 80 | 4.1 | 70 | 0 |
| 18 | S18 | Python | Programming | 3.12 | PythonOrg | OpenSource | Multi | 0 | 4.9 | 95 | 0 |
| 19 | S19 | MATLAB | Programming | R2023a | MathWorks | Commercial | Multi | 5000 | 4.6 | 72 | 1 |
| 20 | S20 | Git | VersionControl | 2.42 | GitProject | OpenSource | Multi | 0 | 4.7 | 90 | 0 |
proc freq data=software_inventory_updated;
tables High_Cost;
title "High Cost Software Count";
run;
Output:
| High Cost Software Count |
| High_Cost | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
|---|---|---|---|---|
| 0 | 14 | 70.00 | 14 | 70.00 |
| 1 | 6 | 30.00 | 20 | 100.00 |
Step 10: PROC REPORT – Stylish Summary
proc report data=software_inventory nowd;
column Category Software_Name Cost Rating Popularity;
define Category / group;
define Software_Name / display;
define Cost / analysis mean;
define Rating / analysis mean;
define Popularity / analysis mean;
title "Report by Software Category";
run;
Output:
| Report by Software Category |
| Category | Software_Name | Cost | Rating | Popularity |
|---|---|---|---|---|
| Analytics | SAS | 5000 | 4.9 | 85 |
| RStudio | 0 | 4.6 | 75 | |
| BI | Tableau | 3000 | 4.9 | 87 |
| PowerBI | 1200 | 4.7 | 83 | |
| Communication | Zoom | 250 | 4.3 | 90 |
| Teams | 300 | 4.4 | 85 | |
| Database | MySQL | 0 | 4.5 | 78 |
| OracleDB | 2500 | 4.7 | 80 | |
| Document | AdobeReader | 100 | 4.2 | 76 |
| FoxitReader | 80 | 4.1 | 70 | |
| Editor | Notepad++ | 0 | 4 | 65 |
| VSCode | 0 | 4.8 | 92 | |
| Graphic | Photoshop | 1200 | 4.8 | 88 |
| GIMP | 0 | 4.1 | 60 | |
| Media | VLC | 0 | 4.5 | 91 |
| Programming | Python | 0 | 4.9 | 95 |
| MATLAB | 5000 | 4.6 | 72 | |
| Spreadsheet | Excel | 500 | 4.5 | 95 |
| LibreOffice | 0 | 4.2 | 70 | |
| VersionControl | Git | 0 | 4.7 | 90 |
Step 11: Visualizations
proc sgplot data=software_inventory;
vbar Category / response=Cost stat=mean datalabel;
title "Average Cost by Software Category";
run;
Log:
Comments
Post a Comment