Saturday, 6 September 2025

281.SOFTWARE COMPANY ANALYSIS USING PROC CONTENTS | PROC PRINT | PROC MEANS | PROC FREQ | PROC SORT | PROC REPORT | PROC SQL | PROC REG

SOFTWARE COMPANY ANALYSIS USING PROC CONTENTS | PROC PRINT | PROC MEANS | PROC FREQ | PROC SORT | PROC REPORT | PROC SQL | PROC REG

/*Creating The Dataset Of Software Company */

1) Create the dataset

options nocenter;

data software_companies;

    infile datalines dsd truncover;

    length CompanyID 8 CompanyName $40 FoundedYear 8 Employees 8

           Revenue_MnUSD 8 PrimaryProduct $30 HQ_City $30 Valuation_MnUSD 8;

    input CompanyID CompanyName :$40. FoundedYear Employees Revenue_MnUSD

          PrimaryProduct :$30. HQ_City :$30. Valuation_MnUSD;

    RevenuePerEmployee = round((Revenue_MnUSD*1000000)/max(Employees,1),0.01);

    format RevenuePerEmployee comma12.2 Revenue_MnUSD Valuation_MnUSD comma12.2;

    datalines;

1,BlueOrbit Solutions,2008,120,18.5,Cloud ERP,Hyderabad,120.0

2,NeuraCode Labs,2015,45,6.2,AI Analytics,Bengaluru,32.0

3,StackWave Systems,2000,520,210.3,DevOps Platform,Pune,950.0

4,PixelForge Studio,2012,30,2.1,Mobile Games,Mumbai,11.5

5,QuantumBridge,2018,80,25.0,Blockchain Infra,Chennai,140.0

6,SafeGuard Tech,2005,220,55.4,Cybersecurity,New Delhi,410.0

7,DataSpring Inc,2010,150,38.0,Data Integration,Bengaluru,210.0

8,CloudVista,2014,300,90.7,Cloud Hosting,Hyderabad,480.0

9,EdgeLeap Systems,2020,25,1.0,Edge IoT,Coimbatore,6.5

10,OpenStream Soft,1998,900,512.3,Streaming Infra,Mumbai,3200.0

11,GreenByte Solutions,2003,60,7.4,Enterprise SaaS,Ahmedabad,45.0

12,MonoTech Labs,2016,12,0.4,Prototype Tools,Visakhapatnam,2.0

;

run;

proc print;run;

Output:

ObsCompanyIDCompanyNameFoundedYearEmployeesRevenue_MnUSDPrimaryProductHQ_CityValuation_MnUSDRevenuePerEmployee
11BlueOrbit Solutions200812018.50Cloud ERPHyderabad120.00154,166.67
22NeuraCode Labs2015456.20AI AnalyticsBengaluru32.00137,777.78
33StackWave Systems2000520210.30DevOps PlatformPune950.00404,423.08
44PixelForge Studio2012302.10Mobile GamesMumbai11.5070,000.00
55QuantumBridge20188025.00Blockchain InfraChennai140.00312,500.00
66SafeGuard Tech200522055.40CybersecurityNew Delhi410.00251,818.18
77DataSpring Inc201015038.00Data IntegrationBengaluru210.00253,333.33
88CloudVista201430090.70Cloud HostingHyderabad480.00302,333.33
99EdgeLeap Systems2020251.00Edge IoTCoimbatore6.5040,000.00
1010OpenStream Soft1998900512.30Streaming InfraMumbai3,200.00569,222.22
1111GreenByte Solutions2003607.40Enterprise SaaSAhmedabad45.00123,333.33
1212MonoTech Labs2016120.40Prototype ToolsVisakhapatnam2.0033,333.33


2) Basic exploration with PROCs 

PROC CONTENTS — Purpose: Show dataset structure and variable attributes in one view.

proc contents data=software_companies; 

run;

Output:

The CONTENTS Procedure

Data Set NameWORK.SOFTWARE_COMPANIESObservations12
Member TypeDATAVariables9
EngineV9Indexes0
Created09/06/2025 18:43:25Observation Length152
Last Modified09/06/2025 18:43:25Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
Engine/Host Dependent Information
Data Set Page Size131072
Number of Data Set Pages1
First Data Page1
Max Obs per Page861
Obs in First Data Page12
Number of Data Set Repairs0
Filename/saswork/SAS_work47260000B4E9_odaws01-apse1-2.oda.sas.com/SAS_work70210000B4E9_odaws01-apse1-2.oda.sas.com/software_companies.sas7bdat
Release Created9.0401M8
Host CreatedLinux
Inode Number134456831
Access Permissionrw-r--r--
Owner Nameu63247146
File Size256KB
File Size (bytes)262144
Alphabetic List of Variables and Attributes
#VariableTypeLenFormat
1CompanyIDNum8 
2CompanyNameChar40 
4EmployeesNum8 
3FoundedYearNum8 
7HQ_CityChar30 
6PrimaryProductChar30 
9RevenuePerEmployeeNum8COMMA12.2
5Revenue_MnUSDNum8COMMA12.2
8Valuation_MnUSDNum8COMMA12.2

PROC PRINT — Purpose: Print the raw observations for quick manual inspection.

proc print data=software_companies noobs;

run;

Output:

CompanyIDCompanyNameFoundedYearEmployeesRevenue_MnUSDPrimaryProductHQ_CityValuation_MnUSDRevenuePerEmployee
1BlueOrbit Solutions200812018.50Cloud ERPHyderabad120.00154,166.67
2NeuraCode Labs2015456.20AI AnalyticsBengaluru32.00137,777.78
3StackWave Systems2000520210.30DevOps PlatformPune950.00404,423.08
4PixelForge Studio2012302.10Mobile GamesMumbai11.5070,000.00
5QuantumBridge20188025.00Blockchain InfraChennai140.00312,500.00
6SafeGuard Tech200522055.40CybersecurityNew Delhi410.00251,818.18
7DataSpring Inc201015038.00Data IntegrationBengaluru210.00253,333.33
8CloudVista201430090.70Cloud HostingHyderabad480.00302,333.33
9EdgeLeap Systems2020251.00Edge IoTCoimbatore6.5040,000.00
10OpenStream Soft1998900512.30Streaming InfraMumbai3,200.00569,222.22
11GreenByte Solutions2003607.40Enterprise SaaSAhmedabad45.00123,333.33
12MonoTech Labs2016120.40Prototype ToolsVisakhapatnam2.0033,333.33


PROC MEANS — Purpose: Get numeric summary statistics (N, mean, std, min, max) for numeric variables.

proc means data=software_companies n mean std min max maxdec=2;

    var FoundedYear Employees Revenue_MnUSD Valuation_MnUSD RevenuePerEmployee;

run;

Output:

The MEANS Procedure

VariableNMeanStd DevMinimumMaximum
FoundedYear
Employees
Revenue_MnUSD
Valuation_MnUSD
RevenuePerEmployee
12
12
12
12
12
2009.92
205.17
80.61
467.25
221020.10
7.18
263.59
148.48
904.66
160566.47
1998.00
12.00
0.40
2.00
33333.33
2020.00
900.00
512.30
3200.00
569222.22

PROC FREQ — Purpose: Frequency counts for categorical variables to see distribution.

proc freq data=software_companies;

    tables PrimaryProduct HQ_City / nocum nopercent;

run;

Output:

The FREQ Procedure

PrimaryProductFrequency
AI Analytics1
Blockchain Infra1
Cloud ERP1
Cloud Hosting1
Cybersecurity1
Data Integration1
DevOps Platform1
Edge IoT1
Enterprise SaaS1
Mobile Games1
Prototype Tools1
Streaming Infra1
HQ_CityFrequency
Ahmedabad1
Bengaluru2
Chennai1
Coimbatore1
Hyderabad2
Mumbai2
New Delhi1
Pune1
Visakhapatnam1

PROC SORT — Purpose: Sort dataset by metric (e.g., Revenue) for reporting.

proc sort data=software_companies out=swc_sorted_by_revenue;

    by descending Revenue_MnUSD;

run;

proc print data=swc_sorted_by_revenue;run;

Output:

ObsCompanyIDCompanyNameFoundedYearEmployeesRevenue_MnUSDPrimaryProductHQ_CityValuation_MnUSDRevenuePerEmployee
110OpenStream Soft1998900512.30Streaming InfraMumbai3,200.00569,222.22
23StackWave Systems2000520210.30DevOps PlatformPune950.00404,423.08
38CloudVista201430090.70Cloud HostingHyderabad480.00302,333.33
46SafeGuard Tech200522055.40CybersecurityNew Delhi410.00251,818.18
57DataSpring Inc201015038.00Data IntegrationBengaluru210.00253,333.33
65QuantumBridge20188025.00Blockchain InfraChennai140.00312,500.00
71BlueOrbit Solutions200812018.50Cloud ERPHyderabad120.00154,166.67
811GreenByte Solutions2003607.40Enterprise SaaSAhmedabad45.00123,333.33
92NeuraCode Labs2015456.20AI AnalyticsBengaluru32.00137,777.78
104PixelForge Studio2012302.10Mobile GamesMumbai11.5070,000.00
119EdgeLeap Systems2020251.00Edge IoTCoimbatore6.5040,000.00
1212MonoTech Labs2016120.40Prototype ToolsVisakhapatnam2.0033,333.33


PROC REPORT — Purpose: Create a compact summary table combining computed columns for reporting.

proc report data=swc_sorted_by_revenue nowd;

    columns CompanyID CompanyName HQ_City Employees Revenue_MnUSD RevenuePerEmployee Valuation_MnUSD;

    define CompanyName / display width=30;

    define Revenue_MnUSD / analysis format=comma12.2;

    define RevenuePerEmployee / analysis format=comma12.2;

run;

Output:

CompanyIDCompanyNameHQ_CityEmployeesRevenue_MnUSDRevenuePerEmployeeValuation_MnUSD
10OpenStream SoftMumbai900512.30569,222.223,200.00
3StackWave SystemsPune520210.30404,423.08950.00
8CloudVistaHyderabad30090.70302,333.33480.00
6SafeGuard TechNew Delhi22055.40251,818.18410.00
7DataSpring IncBengaluru15038.00253,333.33210.00
5QuantumBridgeChennai8025.00312,500.00140.00
1BlueOrbit SolutionsHyderabad12018.50154,166.67120.00
11GreenByte SolutionsAhmedabad607.40123,333.3345.00
2NeuraCode LabsBengaluru456.20137,777.7832.00
4PixelForge StudioMumbai302.1070,000.0011.50
9EdgeLeap SystemsCoimbatore251.0040,000.006.50
12MonoTech LabsVisakhapatnam120.4033,333.332.00


3) PROC SQL examples 

PROC SQL — Purpose: Create aggregated summary 

                    (total revenue and avg employees by HQ\_City).

proc sql;

    create table city_summary as

    select HQ_City,

           count(*) as NumCompanies,

           sum(Revenue_MnUSD) as TotalRevenue_MnUSD format=comma12.2,

           mean(Employees) as AvgEmployees format=8.2

    from software_companies

    group by HQ_City

    order by calculated TotalRevenue_MnUSD desc;

quit;

proc print data=city_summary;run;

Output:

ObsHQ_CityNumCompaniesTotalRevenue_MnUSDAvgEmployees
1Mumbai2514.40465.00
2Pune1210.30520.00
3Hyderabad2109.20210.00
4New Delhi155.40220.00
5Bengaluru244.2097.50
6Chennai125.0080.00
7Ahmedabad17.4060.00
8Coimbatore11.0025.00
9Visakhapatnam10.4012.00


PROC SQL — Purpose: Select top 5 companies by valuation.

proc sql outobs=5;

    select CompanyID, CompanyName, Valuation_MnUSD

    from software_companies

    order by Valuation_MnUSD desc;

quit;

Output:

CompanyIDCompanyNameValuation_MnUSD
10OpenStream Soft3,200.00
3StackWave Systems950.00
8CloudVista480.00
6SafeGuard Tech410.00
7DataSpring Inc210.00


PROC SQL — Purpose: Inner join with a simulated investment table to show merge via SQL.

data investments;

    infile datalines dsd truncover;

    length CompanyID 8 Investor $40 Investment_MnUSD 8 Round $10;

    input CompanyID Investor :$40. Investment_MnUSD Round :$10.;

    datalines;

1,AlphaVentures,5.0,SeriesA

2,NeuroFund,2.0,Seed

3,InfraCap,50.0,SeriesC

6,ShieldPartners,20.0,SeriesB

8,CloudHoldings,100.0,SeriesD

10,StreamGlobal,200.0,SeriesF

7,DataSeed,10.0,SeriesB

;

run;

proc print;run;

Output:

ObsCompanyIDInvestorInvestment_MnUSDRound
11AlphaVentures5SeriesA
22NeuroFund2Seed
33InfraCap50SeriesC
46ShieldPartners20SeriesB
58CloudHoldings100SeriesD
610StreamGlobal200SeriesF
77DataSeed10SeriesB


proc sql;

    create table company_investments as

    select a.*, b.Investor, b.Investment_MnUSD, b.Round

    from software_companies as a

    left join investments as b

    on a.CompanyID = b.CompanyID;

quit;

proc print data=company_investments;run;

Output:

ObsCompanyIDCompanyNameFoundedYearEmployeesRevenue_MnUSDPrimaryProductHQ_CityValuation_MnUSDRevenuePerEmployeeInvestorInvestment_MnUSDRound
11BlueOrbit Solutions200812018.50Cloud ERPHyderabad120.00154,166.67AlphaVentures5SeriesA
22NeuraCode Labs2015456.20AI AnalyticsBengaluru32.00137,777.78NeuroFund2Seed
33StackWave Systems2000520210.30DevOps PlatformPune950.00404,423.08InfraCap50SeriesC
44PixelForge Studio2012302.10Mobile GamesMumbai11.5070,000.00 . 
55QuantumBridge20188025.00Blockchain InfraChennai140.00312,500.00 . 
66SafeGuard Tech200522055.40CybersecurityNew Delhi410.00251,818.18ShieldPartners20SeriesB
77DataSpring Inc201015038.00Data IntegrationBengaluru210.00253,333.33DataSeed10SeriesB
88CloudVista201430090.70Cloud HostingHyderabad480.00302,333.33CloudHoldings100SeriesD
99EdgeLeap Systems2020251.00Edge IoTCoimbatore6.5040,000.00 . 
1010OpenStream Soft1998900512.30Streaming InfraMumbai3,200.00569,222.22StreamGlobal200SeriesF
1111GreenByte Solutions2003607.40Enterprise SaaSAhmedabad45.00123,333.33 . 
1212MonoTech Labs2016120.40Prototype ToolsVisakhapatnam2.0033,333.33 . 


4) SAS MACROS

Macro: %top_companies_by_metric

— Purpose: Reusable macro to print top N companies by any numeric metric.

%macro top_companies_by_metric(metric=Revenue_MnUSD, n=5);

    %put NOTE: Running top_companies_by_metric for &metric, top &n.;

    proc sql outobs=&n;

        select CompanyID, CompanyName, &metric

        from software_companies

        order by &metric desc;

    quit;

%mend top_companies_by_metric;


%top_companies_by_metric(metric=Valuation_MnUSD,n=5);

Output:

CompanyIDCompanyNameValuation_MnUSD
10OpenStream Soft3,200.00
3StackWave Systems950.00
8CloudVista480.00
6SafeGuard Tech410.00
7DataSpring Inc210.00

%top_companies_by_metric(metric=Revenue_MnUSD,n=3);

Output:

CompanyIDCompanyNameRevenue_MnUSD
10OpenStream Soft512.30
3StackWave Systems210.30
8CloudVista90.70

Macro: %segment_filter 

— Purpose: Create a dataset for companies founded before/after a given year.

%macro segment_filter(year=2010, outds=before_after);

    %if &year = %then %let year=2010;

    data &outds;

        set software_companies;

        if FoundedYear < &year then Segment = 'Before_&year';

        else Segment = 'After_&year';

    run;

    proc freq data=&outds; tables Segment / nocum nopercent; run;

%mend segment_filter;


%segment_filter(year=2010,outds=seg_2010);

Output:

The FREQ Procedure

SegmentFrequency
After_&year7
Before_&year5

Macro: %calc_growth_estimate

— Purpose: Estimate simple hypothetical valuation growth over X years at assumed CAGR.

%macro calc_growth_estimate(cagr=0.20, years=3);

    data growth_estimates;

        set software_companies;

        FutureValuation = round(Valuation_MnUSD * (1 + &cagr)**&years,0.1);

        CAGR = &cagr;

        Years = &years;

    run;

    proc sort data=growth_estimates; by descending FutureValuation; run;

    proc print data=growth_estimates (obs=12) noobs;

        var CompanyName Valuation_MnUSD FutureValuation Years CAGR;

    run;

%mend calc_growth_estimate;


%calc_growth_estimate(cagr=0.25, years=5);

Output:

CompanyNameValuation_MnUSDFutureValuationYearsCAGR
OpenStream Soft3,200.009765.650.25
StackWave Systems950.002899.250.25
CloudVista480.001464.850.25
SafeGuard Tech410.001251.250.25
DataSpring Inc210.00640.950.25
QuantumBridge140.00427.250.25
BlueOrbit Solutions120.00366.250.25
GreenByte Solutions45.00137.350.25
NeuraCode Labs32.0097.750.25
PixelForge Studio11.5035.150.25
EdgeLeap Systems6.5019.850.25
MonoTech Labs2.006.150.25


5) Validation & QC checks 

Purpose: Ensure data sanity

proc sort data=software_companies out=swc_dup nodupkey dupout=dups;

 by CompanyID;

run;

proc print data=dups;

 title 'Duplicate CompanyIDs if any';

run;

Log:

NOTE: 0 observations with duplicate key values were deleted.

proc freq data=software_companies; 

 tables Employees / missing; 

run;

Output:

The FREQ Procedure

EmployeesFrequencyPercentCumulative
Frequency
Cumulative
Percent
1218.3318.33
2518.33216.67
3018.33325.00
4518.33433.33
6018.33541.67
8018.33650.00
12018.33758.33
15018.33866.67
22018.33975.00
30018.331083.33
52018.331191.67
90018.3312100.00
CountNegEmployees
0

proc sql;

    select count(*) as CountNegEmployees from software_companies

    where Employees < 0 or Revenue_MnUSD < 0 or Valuation_MnUSD < 0;

quit;

Output:

6) Example advanced analytics 

Purpose: simple linear regression of valuation on revenue & employees

proc reg data=software_companies;

    model Valuation_MnUSD = Revenue_MnUSD Employees;

    title 'Simple regression: Valuation predicted by Revenue and Employees';

run; quit;

Output:

Simple regression: Valuation predicted by Revenue and Employees

The REG Procedure

Model: MODEL1

Dependent Variable: Valuation_MnUSD

Number of Observations Read12
Number of Observations Used12
Analysis of Variance
SourceDFSum of
Squares
Mean
Square
F ValuePr > F
Model289238844461942511.09<.0001
Error9785738730.32769  
Corrected Total119002457   
Root MSE93.43622R-Square0.9913
Dependent Mean467.25000Adj R-Sq0.9893
Coeff Var19.99705  
Parameter Estimates
VariableDFParameter
Estimate
Standard
Error
t ValuePr > |t|
Intercept156.1044346.936711.200.2625
Revenue_MnUSD18.110220.964388.41<.0001
Employees1-1.182480.54324-2.180.0575

Simple regression: Valuation predicted by Revenue and Employees

The REG Procedure

Model: MODEL1

Dependent Variable: Valuation_MnUSD

Panel of fit diagnostics for Valuation_MnUSD.







To Visit My Previous Online Streaming Flatform Dataset:Click Here
To Visit My Previous Statistical Evaluation Of Clinical Trials:Click Here
To Visit My Previous Unlocking Retail Insights Dataset:Click Here
To Visit My Previous Sas Interview Questions-1:Click Here  



Follow Us On : 


 


--- FOLLOW OUR BLOG FOR MORE INFORMATION.

--->PLEASE DO COMMENTS AND SHARE OUR BLOG.

No comments:

Post a Comment