Friday, 23 May 2025

188.COMPREHENSIVE SAS ANALYSIS OF WEBSITE DATASET | DATA STEP CREATION | PROC CONTENTS FOR METADATA | PROC MEANS AND PROC FREQ FOR STATISTICS | PROC SQL FOR DATA MANIPULATION | PROC SORT FOR RANKING | PROC TRANSPOSE FOR RESHAPING | PROC EXPORT FOR OUTPUT | MACROS FOR AUTOMATION

COMPREHENSIVE SAS ANALYSIS OF WEBSITE DATASET | DATA STEP CREATION | PROC CONTENTS FOR METADATA | PROC MEANS AND PROC FREQ FOR STATISTICS | PROC SQL FOR DATA MANIPULATION | PROC SORT FOR RANKING | PROC TRANSPOSE FOR RESHAPING | PROC EXPORT FOR OUTPUT | MACROS FOR AUTOMATION

/*Creating a dataset of different types of websites*/

Step 1: Creating the Raw Dataset

data websites_info;

    length Name $20 Type $15 Region $15;

    input Website_ID Name $ Type $ Monthly_Visitors Launch_Year Region $;

    datalines;

1 Amazon         E-commerce    2500 1994 Global

2 Facebook       Social_Media  2900 2004 Global

3 CNN            News           120 1995 USA

4 Alibaba        E-commerce    1100 1999 Asia

5 Twitter        Social_Media   550 2006 Global

6 Flipkart       E-commerce     350 2007 India

7 BBC            News           160 1922 UK

8 LinkedIn       Social_Media   800 2003 Global

9 TimesofIndia   News           180 1838 India

10 Etsy          E-commerce     400 2005 Global

;

run;

proc print;run;

Output:

Obs Name Type Region Website_ID Monthly_Visitors Launch_Year
1 Amazon E-commerce Global 1 2500 1994
2 Facebook Social_Media Global 2 2900 2004
3 CNN News USA 3 120 1995
4 Alibaba E-commerce Asia 4 1100 1999
5 Twitter Social_Media Global 5 550 2006
6 Flipkart E-commerce India 6 350 2007
7 BBC News UK 7 160 1922
8 LinkedIn Social_Media Global 8 800 2003
9 TimesofIndia News India 9 180 1838
10 Etsy E-commerce Global 10 400 2005


Step 2: Exploring the Dataset

proc contents data=websites_info;

run;

Output:

The CONTENTS Procedure

Data Set Name WORK.WEBSITES_INFO Observations 10
Member Type DATA Variables 6
Engine V9 Indexes 0
Created 14/09/2015 01:39:11 Observation Length 80
Last Modified 14/09/2015 01:39:11 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 817
Obs in First Data Page 10
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD10316_DESKTOP-QFAA4KV_\websites_info.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len
6 Launch_Year Num 8
5 Monthly_Visitors Num 8
1 Name Char 20
3 Region Char 15
2 Type Char 15
4 Website_ID Num 8


Step 3: Summary Statistics

PROC MEANS

Summarizing continuous variables:

proc means data=websites_info mean median max min std;

    var Monthly_Visitors Launch_Year;

run;

Output:

The MEANS Procedure

Variable Mean Median Maximum Minimum Std Dev
Monthly_Visitors
Launch_Year
906.0000000
1977.30
475.0000000
2001.00
2900.00
2007.00
120.0000000
1838.00
997.7218495
55.1483857


PROC FREQ

To get the frequency of each website type:

proc freq data=websites_info;

    tables Type Region;

run;

Output:

The FREQ Procedure

Type Frequency Percent Cumulative
Frequency
Cumulative
Percent
E-commerce 4 40.00 4 40.00
News 3 30.00 7 70.00
Social_Media 3 30.00 10 100.00


Region Frequency Percent Cumulative
Frequency
Cumulative
Percent
Asia 1 10.00 1 10.00
Global 5 50.00 6 60.00
India 2 20.00 8 80.00
UK 1 10.00 9 90.00
USA 1 10.00 10 100.00


Step 4: Aggregation Using PROC SQL

Total Visitors by Website Type

proc sql;

    select Type, 

           count(*) as Total_Websites, 

           sum(Monthly_Visitors) as Total_Visitors

    from websites_info

    group by Type;

quit;

Output:

Type Total_Websites Total_Visitors
E-commerce 4 4350
News 3 460
Social_Media 3 4250

Step 5: Creating New Variables

data websites_enriched;

    set websites_info;

    Website_Age = year("23may2025"d) - Launch_Year;

run;

proc print;run;

Output:

Obs Name Type Region Website_ID Monthly_Visitors Launch_Year Website_Age
1 Amazon E-commerce Global 1 2500 1994 31
2 Facebook Social_Media Global 2 2900 2004 21
3 CNN News USA 3 120 1995 30
4 Alibaba E-commerce Asia 4 1100 1999 26
5 Twitter Social_Media Global 5 550 2006 19
6 Flipkart E-commerce India 6 350 2007 18
7 BBC News UK 7 160 1922 103
8 LinkedIn Social_Media Global 8 800 2003 22
9 TimesofIndia News India 9 180 1838 187
10 Etsy E-commerce Global 10 400 2005 20


PROC UNIVARIATE

Understanding the distribution of website age:

proc univariate data=websites_enriched;

    var Website_Age;

run;

Output:

The UNIVARIATE Procedure
Variable: Website_Age

Moments
N 10 Sum Weights 10
Mean 47.7 Sum Observations 477
Std Deviation 55.1483857 Variance 3041.34444
Skewness 2.27276154 Kurtosis 4.84585541
Uncorrected SS 50125 Corrected SS 27372.1
Coeff Variation 115.615064 Std Error Mean 17.4394508


Basic Statistical Measures
Location Variability
Mean 47.70000 Std Deviation 55.14839
Median 24.00000 Variance 3041
Mode . Range 169.00000
    Interquartile Range 11.00000


Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 2.735178 Pr > |t| 0.0230
Sign M 5 Pr >= |M| 0.0020
Signed Rank S 27.5 Pr >= |S| 0.0020


Quantiles (Definition 5)
Level Quantile
100% Max 187.0
99% 187.0
95% 187.0
90% 145.0
75% Q3 31.0
50% Median 24.0
25% Q1 20.0
10% 18.5
5% 18.0
1% 18.0
0% Min 18.0


Extreme Observations
Lowest Highest
Value Obs Value Obs
18 6 26 4
19 5 30 3
20 10 31 1
21 2 103 7
22 8 187 9


Step 6: Using MACROS

%macro website_report(webtype);

    title "Report for &webtype Websites";

    

    proc sql;

        select * from websites_info

        where Type="&webtype";

    quit;


    proc means data=websites_info;

        where Type="&webtype";

        var Monthly_Visitors;

    run;


    proc freq data=websites_info;

        where Type="&webtype";

        tables Region;

    run;

    

    title;

%mend website_report;


/* Generate reports */

%website_report(E-commerce);

Output:

Report for E-commerce Websites

Name Type Region Website_ID Monthly_Visitors Launch_Year
Amazon E-commerce Global 1 2500 1994
Alibaba E-commerce Asia 4 1100 1999
Flipkart E-commerce India 6 350 2007
Etsy E-commerce Global 10 400 2005


Report for E-commerce Websites

The MEANS Procedure

Analysis Variable : Monthly_Visitors
N Mean Std Dev Minimum Maximum
4 1087.50 1001.98 350.0000000 2500.00

Report for E-commerce Websites

The FREQ Procedure

Region Frequency Percent Cumulative
Frequency
Cumulative
Percent
Asia 1 25.00 1 25.00
Global 2 50.00 3 75.00
India 1 25.00 4 100.00

%website_report(Social_Media);

Output:

Report for Social_Media Websites

Name Type Region Website_ID Monthly_Visitors Launch_Year
Facebook Social_Media Global 2 2900 2004
Twitter Social_Media Global 5 550 2006
LinkedIn Social_Media Global 8 800 2003

Report for Social_Media Websites

The MEANS Procedure

Analysis Variable : Monthly_Visitors
N Mean Std Dev Minimum Maximum
3 1416.67 1290.67 550.0000000 2900.00

Report for Social_Media Websites

The FREQ Procedure

Region Frequency Percent Cumulative
Frequency
Cumulative
Percent
Global 3 100.00 3 100.00

Step 7: Sorting and Filtering

proc sort data=websites_info out=sorted_websites;

    by descending Monthly_Visitors;

run;


Filter: Websites Launched After 2000

data post2000;

    set websites_info;

    if Launch_Year > 2000;

run;


proc print data=post2000;

    title "Websites Launched After 2000";

run;

Output:

Websites Launched After 2000

Obs Name Type Region Website_ID Monthly_Visitors Launch_Year
1 Facebook Social_Media Global 2 2900 2004
2 Twitter Social_Media Global 5 550 2006
3 Flipkart E-commerce India 6 350 2007
4 LinkedIn Social_Media Global 8 800 2003
5 Etsy E-commerce Global 10 400 2005


Step 8: Classifying Based on Visitors

data websites_classified;

    set websites_info;

    length Visitor_Category $15.;

    if Monthly_Visitors >= 1000 then Visitor_Category = "Very High";

    else if Monthly_Visitors >= 500 then Visitor_Category = "High";

    else if Monthly_Visitors >= 200 then Visitor_Category = "Medium";

    else Visitor_Category = "Low";

run;

proc print;run;

Output:

Obs Name Type Region Website_ID Monthly_Visitors Launch_Year Visitor_Category
1 Amazon E-commerce Global 1 2500 1994 Very High
2 Facebook Social_Media Global 2 2900 2004 Very High
3 CNN News USA 3 120 1995 Low
4 Alibaba E-commerce Asia 4 1100 1999 Very High
5 Twitter Social_Media Global 5 550 2006 High
6 Flipkart E-commerce India 6 350 2007 Medium
7 BBC News UK 7 160 1922 Low
8 LinkedIn Social_Media Global 8 800 2003 High
9 TimesofIndia News India 9 180 1838 Low
10 Etsy E-commerce Global 10 400 2005 Medium


proc freq data=websites_classified;

    tables Visitor_Category;

run;

Output:

The FREQ Procedure

Visitor_Category Frequency Percent Cumulative
Frequency
Cumulative
Percent
High 2 20.00 2 20.00
Low 3 30.00 5 50.00
Medium 2 20.00 7 70.00
Very High 3 30.00 10 100.00


Step 9: Using PROC TRANSPOSE

proc transpose data=websites_info out=transposed_visitors;

    by Region notsorted;

    id Name;

    var Monthly_Visitors;

run;

proc print;run;

Output:

Obs Region _NAME_ Amazon Facebook CNN Alibaba Twitter Flipkart BBC LinkedIn TimesofIndia Etsy
1 Global Monthly_Visitors 2500 2900 . . . . . . . .
2 USA Monthly_Visitors . . 120 . . . . . . .
3 Asia Monthly_Visitors . . . 1100 . . . . . .
4 Global Monthly_Visitors . . . . 550 . . . . .
5 India Monthly_Visitors . . . . . 350 . . . .
6 UK Monthly_Visitors . . . . . . 160 . . .
7 Global Monthly_Visitors . . . . . . . 800 . .
8 India Monthly_Visitors . . . . . . . . 180 .
9 Global Monthly_Visitors . . . . . . . . . 400


Step 10: Export the Final Dataset

proc export data=websites_classified

    outfile="/folders/myfolders/websites.csv"

    dbms=csv

    replace;

run;

Log:

NOTE: "/folders/myfolders/websites.csv" file was successfully created.

NOTE: PROCEDURE EXPORT used (Total process time):

      real time           1.51 seconds

      cpu time            0.23 seconds


Step 11: Creating a Macro for Age Bracket Summary

%macro age_summary(minage, maxage);

    proc sql;

        select Name, Website_Age 

        from websites_enriched

        where Website_Age between &minage and &maxage;

    quit;

%mend age_summary;


%age_summary(10, 20);

Output:

Name Website_Age
Twitter 19
Flipkart 18
Etsy 20


%age_summary(20, 30);

Output:

Name Website_Age
Facebook 21
CNN 30
Alibaba 26
LinkedIn 22
Etsy 20

Step 12: Joining Two Tables Using SQL

data website_ratings;

    input Website_ID Rating Score;

    datalines;

1 5 4.8

2 4 4.3

3 3 3.9

4 4 4.2

5 3 4.0

6 5 4.7

7 4 4.1

8 4 4.4

9 3 3.8

10 4 4.6

;

run;

proc print;run;

Output:

Obs Website_ID Rating Score
1 1 5 4.8
2 2 4 4.3
3 3 3 3.9
4 4 4 4.2
5 5 3 4.0
6 6 5 4.7
7 7 4 4.1
8 8 4 4.4
9 9 3 3.8
10 10 4 4.6


proc sql;

    create table website_full as

    select a.*, b.Rating, b.Score

    from websites_info a

    inner join website_ratings b

    on a.Website_ID = b.Website_ID;

quit;

proc sort data=website_full out=ranked;

    by descending Score;

run;

proc print data=ranked;

    title "Websites Ranked by Score";

run;

Output:
Websites Ranked by Score

Obs Name Type Region Website_ID Monthly_Visitors Launch_Year Rating Score
1 Amazon E-commerce Global 1 2500 1994 5 4.8
2 Flipkart E-commerce India 6 350 2007 5 4.7
3 Etsy E-commerce Global 10 400 2005 4 4.6
4 LinkedIn Social_Media Global 8 800 2003 4 4.4
5 Facebook Social_Media Global 2 2900 2004 4 4.3
6 Alibaba E-commerce Asia 4 1100 1999 4 4.2
7 BBC News UK 7 160 1922 4 4.1
8 Twitter Social_Media Global 5 550 2006 3 4.0
9 CNN News USA 3 120 1995 3 3.9
10 TimesofIndia News India 9 180 1838 3 3.8

PRACTICE AND COMMENT YOUR CODE: 

--->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE

No comments:

Post a Comment