191.COMPLETE SAS FASHION DATA ANALYSIS USING PROC PRINT | PROC CONTENTS | PROC FREQ | PROC MEANS | PROC SQL | PROC FORMAT | PROC UNIVARIATE | PROC SORT | PROC SGPLOT | MACROS

COMPLETE SAS FASHION DATA ANALYSIS USING PROC PRINT | PROC CONTENTS | PROC FREQ | PROC MEANS | PROC SQL | PROC FORMAT | PROC UNIVARIATE | PROC SORT | PROC SGPLOT | MACROS

/*Analysis of Different Types of Dresses using SAS (PROC, SQL, Macros)*/

Step 1: Create the Dataset

Creating a dataset called dresses with 20 observations and the following variables:

DressID – Unique identifier

Type – Type of dress (e.g., Casual, Formal, Party, Ethnic)

Material – Fabric (e.g., Cotton, Silk, Polyester, Linen)

Size – Size category (S, M, L, XL)

Price – Cost in INR

Color – Color of the dress

Brand – Brand name

Availability – In stock (Yes/No)

Rating – Customer rating (1 to 5)


data dresses;

    input DressID Type $ Material:$15. Size $ Price Color $ Brand:$15.

          Availability $ Rating;

    datalines;

101 Casual Cotton M 1200 Blue Zara Yes 4

102 Formal Silk L 2500 Black H&M Yes 5

103 Party Polyester S 1800 Red Forever21 No 3

104 Ethnic Cotton XL 2000 Orange Biba Yes 4

105 Casual Linen M 1300 White AllenSolly No 2

106 Formal Polyester L 2600 Grey VanHeusen Yes 5

107 Party Silk S 3000 Pink Zivame Yes 4

108 Ethnic Silk M 2200 Yellow FabIndia Yes 5

109 Casual Cotton L 1400 Green Max Yes 3

110 Formal Linen XL 2700 Maroon Arrow No 4

111 Party Cotton M 1900 Blue ONLY Yes 5

112 Ethnic Polyester L 2100 Beige W Yes 3

113 Casual Silk S 1600 Red Aurelia Yes 4

114 Formal Cotton M 2400 Navy VanHeusen Yes 5

115 Party Linen XL 2800 Golden GlobalDesi No 3

116 Ethnic Cotton M 2000 Pink Biba Yes 4

117 Casual Polyester L 1500 Brown Max Yes 3

118 Formal Silk S 2550 Black AllenSolly Yes 5

119 Party Cotton M 1750 Green ONLY Yes 4

120 Ethnic Linen L 2300 Yellow FabIndia Yes 5

;

run;

proc print;run;

Output:

Obs DressID Type Material Size Price Color Brand Availability Rating
1 101 Casual Cotton M 1200 Blue Zara Yes 4
2 102 Formal Silk L 2500 Black H&M Yes 5
3 103 Party Polyester S 1800 Red Forever21 No 3
4 104 Ethnic Cotton XL 2000 Orange Biba Yes 4
5 105 Casual Linen M 1300 White AllenSolly No 2
6 106 Formal Polyester L 2600 Grey VanHeusen Yes 5
7 107 Party Silk S 3000 Pink Zivame Yes 4
8 108 Ethnic Silk M 2200 Yellow FabIndia Yes 5
9 109 Casual Cotton L 1400 Green Max Yes 3
10 110 Formal Linen XL 2700 Maroon Arrow No 4
11 111 Party Cotton M 1900 Blue ONLY Yes 5
12 112 Ethnic Polyester L 2100 Beige W Yes 3
13 113 Casual Silk S 1600 Red Aurelia Yes 4
14 114 Formal Cotton M 2400 Navy VanHeusen Yes 5
15 115 Party Linen XL 2800 Golden GlobalDesi No 3
16 116 Ethnic Cotton M 2000 Pink Biba Yes 4
17 117 Casual Polyester L 1500 Brown Max Yes 3
18 118 Formal Silk S 2550 Black AllenSolly Yes 5
19 119 Party Cotton M 1750 Green ONLY Yes 4
20 120 Ethnic Linen L 2300 Yellow FabIndia Yes 5


Step 2: Explore the Dataset using PROC PRINT and PROC CONTENTS

/*PROC PRINT is used to display the dataset.*/

/*PROC CONTENTS is used to show metadata (variable names, types, lengths).*/

proc print data=dresses;

    title "Complete Dress Dataset";

run;

Output:

Complete Dress Dataset

Obs DressID Type Material Size Price Color Brand Availability Rating
1 101 Casual Cotton M 1200 Blue Zara Yes 4
2 102 Formal Silk L 2500 Black H&M Yes 5
3 103 Party Polyester S 1800 Red Forever21 No 3
4 104 Ethnic Cotton XL 2000 Orange Biba Yes 4
5 105 Casual Linen M 1300 White AllenSolly No 2
6 106 Formal Polyester L 2600 Grey VanHeusen Yes 5
7 107 Party Silk S 3000 Pink Zivame Yes 4
8 108 Ethnic Silk M 2200 Yellow FabIndia Yes 5
9 109 Casual Cotton L 1400 Green Max Yes 3
10 110 Formal Linen XL 2700 Maroon Arrow No 4
11 111 Party Cotton M 1900 Blue ONLY Yes 5
12 112 Ethnic Polyester L 2100 Beige W Yes 3
13 113 Casual Silk S 1600 Red Aurelia Yes 4
14 114 Formal Cotton M 2400 Navy VanHeusen Yes 5
15 115 Party Linen XL 2800 Golden GlobalDesi No 3
16 116 Ethnic Cotton M 2000 Pink Biba Yes 4
17 117 Casual Polyester L 1500 Brown Max Yes 3
18 118 Formal Silk S 2550 Black AllenSolly Yes 5
19 119 Party Cotton M 1750 Green ONLY Yes 4
20 120 Ethnic Linen L 2300 Yellow FabIndia Yes 5


proc contents data=dresses;

    title "Dataset Metadata: DRESSES";

run;

Output:

Dataset Metadata: DRESSES

The CONTENTS Procedure

Data Set Name WORK.DRESSES Observations 20
Member Type DATA Variables 9
Engine V9 Indexes 0
Created 14/09/2015 00:16:55 Observation Length 88
Last Modified 14/09/2015 00:16:55 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 743
Obs in First Data Page 20
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD11200_DESKTOP-QFAA4KV_\dresses.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len
8 Availability Char 8
7 Brand Char 15
6 Color Char 8
1 DressID Num 8
3 Material Char 15
5 Price Num 8
9 Rating Num 8
4 Size Char 8
2 Type Char 8


Step 3: Frequency Analysis using PROC FREQ

proc freq data=dresses;

    tables Type Material Size Brand / nocum nopercent;

    title "Frequency Distribution of Type, Material, Size, and Brand";

run;

Output:

Frequency Distribution of Type, Material, Size, and Brand

The FREQ Procedure

Type Frequency
Casual 5
Ethnic 5
Formal 5
Party 5


Material Frequency
Cotton 7
Linen 4
Polyester 4
Silk 5


Size Frequency
L 6
M 7
S 4
XL 3


Brand Frequency
AllenSolly 2
Arrow 1
Aurelia 1
Biba 2
FabIndia 2
Forever21 1
GlobalDesi 1
H&M 1
Max 2
ONLY 2
VanHeusen 2
W 1
Zara 1
Zivame 1


Step 4: Summary Statistics using PROC MEANS

proc means data=dresses mean maxdec=2;

    class Type;

    var Price Rating;

    title "Mean Price and Rating by Dress Type";

run;

Output:

Mean Price and Rating by Dress Type

The MEANS Procedure

Type N Obs Variable Mean
Casual 5
Price
Rating
1400.00
3.20
Ethnic 5
Price
Rating
2120.00
4.20
Formal 5
Price
Rating
2550.00
4.80
Party 5
Price
Rating
2250.00
3.80


Step 5: Using PROC SQL for Grouping and Filtering

Let’s use PROC SQL to identify:

1.Top 3 highest-rated brand

2.Count of dresses by size

3.Average price of dresses that are "In stock"


proc sql outobs=3;

    title "Top Rated Brands";

    select Brand, avg(Rating) as AvgRating format=4.2

    from dresses

    group by Brand

    having AvgRating >= 4

    order by AvgRating desc;

quit;

Output:

Top Rated Brands

Brand AvgRating
H&M 5.00
VanHeusen 5.00
FabIndia 5.00

proc sql;

    title "Dress Count by Size";

    select Size, count(*) as Total

    from dresses

    group by Size;

quit;

Output:

Dress Count by Size

Size Total
L 6
M 7
S 4
XL 3

proc sql;

    title "Average Price of Available Dresses";

    select avg(Price) as AvgPrice_InStock format=8.2

    from dresses

    where Availability = 'Yes';

quit;

Output:

Average Price of Available Dresses

AvgPrice_InStock
2062.50


Step 6: Using PROC SORT and PROC UNIVARIATE

proc sort data=dresses out=sorted_dresses;

    by descending Price;

run;

proc print data=sorted_dresses(obs=10);

    title "Top 10 Most Expensive Dresses";

run;

Output:

Top 10 Most Expensive Dresses

Obs DressID Type Material Size Price Color Brand Availability Rating
1 107 Party Silk S 3000 Pink Zivame Yes 4
2 115 Party Linen XL 2800 Golden GlobalDesi No 3
3 110 Formal Linen XL 2700 Maroon Arrow No 4
4 106 Formal Polyester L 2600 Grey VanHeusen Yes 5
5 118 Formal Silk S 2550 Black AllenSolly Yes 5
6 102 Formal Silk L 2500 Black H&M Yes 5
7 114 Formal Cotton M 2400 Navy VanHeusen Yes 5
8 120 Ethnic Linen L 2300 Yellow FabIndia Yes 5
9 108 Ethnic Silk M 2200 Yellow FabIndia Yes 5
10 112 Ethnic Polyester L 2100 Beige W Yes 3


proc univariate data=dresses;

    var Price;

    histogram Price;

    title "Price Distribution of Dresses";

run;

Output:

Price Distribution of Dresses

The UNIVARIATE Procedure
Variable: Price

Moments
N 20 Sum Weights 20
Mean 2080 Sum Observations 41600
Std Deviation 524.504779 Variance 275105.263
Skewness -0.0352472 Kurtosis -0.9926
Uncorrected SS 91755000 Corrected SS 5227000
Coeff Variation 25.2165759 Std Error Mean 117.282834


Basic Statistical Measures
Location Variability
Mean 2080.000 Std Deviation 524.50478
Median 2050.000 Variance 275105
Mode 2000.000 Range 1800
    Interquartile Range 850.00000


Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 17.73491 Pr > |t| <.0001
Sign M 10 Pr >= |M| <.0001
Signed Rank S 105 Pr >= |S| <.0001


Quantiles (Definition 5)
Level Quantile
100% Max 3000
99% 3000
95% 2900
90% 2750
75% Q3 2525
50% Median 2050
25% Q1 1675
10% 1350
5% 1250
1% 1200
0% Min 1200


Extreme Observations
Lowest Highest
Value Obs Value Obs
1200 1 2550 18
1300 5 2600 6
1400 9 2700 10
1500 17 2800 15
1600 13 3000 7


Step 7: Conditional Formatting using PROC FORMAT

 proc format;

    value pricegrp

        low -< 1500 = 'Low'

        1500 -< 2500 = 'Medium'

        2500 - high = 'High';

run;


proc freq data=dresses;

    tables Price;

    format Price pricegrp.;

    title "Price Group Distribution";

run;

Output:

Price Group Distribution

The FREQ Procedure

Price Frequency Percent Cumulative
Frequency
Cumulative
Percent
Low 3 15.00 3 15.00
Medium 11 55.00 14 70.00
High 6 30.00 20 100.00


Step 8: Creating a Macro to Generate Summary Reports

%macro summary_report(groupvar);

    proc sql;

        title "Average Price and Rating by &groupvar";

        select &groupvar,

               count(*) as Total,

               avg(Price) as AvgPrice format=8.2,

               avg(Rating) as AvgRating format=4.2

        from dresses

        group by &groupvar;

    quit;

%mend;


%summary_report(Type);

Output:

Average Price and Rating by Type

Type Total AvgPrice AvgRating
Casual 5 1400.00 3.20
Ethnic 5 2120.00 4.20
Formal 5 2550.00 4.80
Party 5 2250.00 3.80

%summary_report(Brand);

Output:

Average Price and Rating by Brand

Brand Total AvgPrice AvgRating
AllenSolly 2 1925.00 3.50
Arrow 1 2700.00 4.00
Aurelia 1 1600.00 4.00
Biba 2 2000.00 4.00
FabIndia 2 2250.00 5.00
Forever21 1 1800.00 3.00
GlobalDesi 1 2800.00 3.00
H&M 1 2500.00 5.00
Max 2 1450.00 3.00
ONLY 2 1825.00 4.50
VanHeusen 2 2500.00 5.00
W 1 2100.00 3.00
Zara 1 1200.00 4.00
Zivame 1 3000.00 4.00


%summary_report(Size);

Output:

Average Price and Rating by Size

Size Total AvgPrice AvgRating
L 6 2066.67 4.00
M 7 1821.43 4.14
S 4 2237.50 4.00
XL 3 2500.00 3.67


Step 9: Filtering Dresses using Macros and SQL

%macro filter_dress(brand=, minprice=, maxprice=);

    proc sql;

        title "Filtered Dresses for &brand Between &minprice and &maxprice";

        select * from dresses

        where Brand = "&brand"

          and Price between &minprice and &maxprice;

    quit;

%mend;


%filter_dress(brand=Biba, minprice=1800, maxprice=2200);

Output:

Filtered Dresses for Biba Between 1800 and 2200

DressID Type Material Size Price Color Brand Availability Rating
104 Ethnic Cotton XL 2000 Orange Biba Yes 4
116 Ethnic Cotton M 2000 Pink Biba Yes 4

%filter_dress(brand=VanHeusen, minprice=2000, maxprice=2700);

Output:

Filtered Dresses for VanHeusen Between 2000 and 2700

DressID Type Material Size Price Color Brand Availability Rating
106 Formal Polyester L 2600 Grey VanHeusen Yes 5
114 Formal Cotton M 2400 Navy VanHeusen Yes 5


Step 10: Creating New Variables (Derived Columns)

Add two new columns:

1.DiscountedPrice – 10% discount if rating = 5

2.IsPopular – If Rating ≥ 4 then 'Yes' else 'No'


data dresses_updated;

    set dresses;

    if Rating = 5 then DiscountedPrice = Price * 0.99;

    else DiscountedPrice = Price;


    if Rating >= 4 then IsPopular = 'Yes';

    else IsPopular = 'No';

run;


proc print data=dresses_updated;

    title "Dresses with Derived Columns";

run;

Output:
Dresses with Derived Columns

Obs DressID Type Material Size Price Color Brand Availability Rating DiscountedPrice IsPopular
1 101 Casual Cotton M 1200 Blue Zara Yes 4 1200.0 Yes
2 102 Formal Silk L 2500 Black H&M Yes 5 2475.0 Yes
3 103 Party Polyester S 1800 Red Forever21 No 3 1800.0 No
4 104 Ethnic Cotton XL 2000 Orange Biba Yes 4 2000.0 Yes
5 105 Casual Linen M 1300 White AllenSolly No 2 1300.0 No
6 106 Formal Polyester L 2600 Grey VanHeusen Yes 5 2574.0 Yes
7 107 Party Silk S 3000 Pink Zivame Yes 4 3000.0 Yes
8 108 Ethnic Silk M 2200 Yellow FabIndia Yes 5 2178.0 Yes
9 109 Casual Cotton L 1400 Green Max Yes 3 1400.0 No
10 110 Formal Linen XL 2700 Maroon Arrow No 4 2700.0 Yes
11 111 Party Cotton M 1900 Blue ONLY Yes 5 1881.0 Yes
12 112 Ethnic Polyester L 2100 Beige W Yes 3 2100.0 No
13 113 Casual Silk S 1600 Red Aurelia Yes 4 1600.0 Yes
14 114 Formal Cotton M 2400 Navy VanHeusen Yes 5 2376.0 Yes
15 115 Party Linen XL 2800 Golden GlobalDesi No 3 2800.0 No
16 116 Ethnic Cotton M 2000 Pink Biba Yes 4 2000.0 Yes
17 117 Casual Polyester L 1500 Brown Max Yes 3 1500.0 No
18 118 Formal Silk S 2550 Black AllenSolly Yes 5 2524.5 Yes
19 119 Party Cotton M 1750 Green ONLY Yes 4 1750.0 Yes
20 120 Ethnic Linen L 2300 Yellow FabIndia Yes 5 2277.0 Yes


PRACTICE AND COMMENT YOUR OUTPUT: 

--->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

PLEASE FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

PLEASE FOLLOW OUR FACEBOOK PAGE  CLICK HERE

PLEASE FOLLOW OUR INSTAGRAM PAGE CLICK HERE

Comments