191.COMPLETE SAS FASHION DATA ANALYSIS USING PROC PRINT | PROC CONTENTS | PROC FREQ | PROC MEANS | PROC SQL | PROC FORMAT | PROC UNIVARIATE | PROC SORT | PROC SGPLOT | MACROS
- Get link
- X
- Other Apps
COMPLETE SAS FASHION DATA ANALYSIS USING PROC PRINT | PROC CONTENTS | PROC FREQ | PROC MEANS | PROC SQL | PROC FORMAT | PROC UNIVARIATE | PROC SORT | PROC SGPLOT | MACROS
/*Analysis of Different Types of Dresses using SAS (PROC, SQL, Macros)*/
Step 1: Create the Dataset
Creating a dataset called dresses with 20 observations and the following variables:
DressID – Unique identifier
Type – Type of dress (e.g., Casual, Formal, Party, Ethnic)
Material – Fabric (e.g., Cotton, Silk, Polyester, Linen)
Size – Size category (S, M, L, XL)
Price – Cost in INR
Color – Color of the dress
Brand – Brand name
Availability – In stock (Yes/No)
Rating – Customer rating (1 to 5)
data dresses;
input DressID Type $ Material:$15. Size $ Price Color $ Brand:$15.
Availability $ Rating;
datalines;
101 Casual Cotton M 1200 Blue Zara Yes 4
102 Formal Silk L 2500 Black H&M Yes 5
103 Party Polyester S 1800 Red Forever21 No 3
104 Ethnic Cotton XL 2000 Orange Biba Yes 4
105 Casual Linen M 1300 White AllenSolly No 2
106 Formal Polyester L 2600 Grey VanHeusen Yes 5
107 Party Silk S 3000 Pink Zivame Yes 4
108 Ethnic Silk M 2200 Yellow FabIndia Yes 5
109 Casual Cotton L 1400 Green Max Yes 3
110 Formal Linen XL 2700 Maroon Arrow No 4
111 Party Cotton M 1900 Blue ONLY Yes 5
112 Ethnic Polyester L 2100 Beige W Yes 3
113 Casual Silk S 1600 Red Aurelia Yes 4
114 Formal Cotton M 2400 Navy VanHeusen Yes 5
115 Party Linen XL 2800 Golden GlobalDesi No 3
116 Ethnic Cotton M 2000 Pink Biba Yes 4
117 Casual Polyester L 1500 Brown Max Yes 3
118 Formal Silk S 2550 Black AllenSolly Yes 5
119 Party Cotton M 1750 Green ONLY Yes 4
120 Ethnic Linen L 2300 Yellow FabIndia Yes 5
;
run;
proc print;run;
Output:
Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
---|---|---|---|---|---|---|---|---|---|
1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 |
2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 |
4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 |
6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 |
10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 |
12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 |
14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 |
18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 |
20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
Step 2: Explore the Dataset using PROC PRINT and PROC CONTENTS
/*PROC PRINT is used to display the dataset.*/
/*PROC CONTENTS is used to show metadata (variable names, types, lengths).*/
proc print data=dresses;
title "Complete Dress Dataset";
run;
Output:
Complete Dress Dataset |
Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
---|---|---|---|---|---|---|---|---|---|
1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 |
2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 |
4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 |
6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 |
10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 |
12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 |
14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 |
18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 |
20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
proc contents data=dresses;
title "Dataset Metadata: DRESSES";
run;
Output:
Dataset Metadata:
DRESSES |
Data Set Name | WORK.DRESSES | Observations | 20 |
---|---|---|---|
Member Type | DATA | Variables | 9 |
Engine | V9 | Indexes | 0 |
Created | 14/09/2015 00:16:55 | Observation Length | 88 |
Last Modified | 14/09/2015 00:16:55 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | wlatin1 Western (Windows) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 743 |
Obs in First Data Page | 20 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD11200_DESKTOP-QFAA4KV_\dresses.sas7bdat |
Release Created | 9.0401M2 |
Host Created | X64_8HOME |
Alphabetic List of Variables and Attributes | |||
---|---|---|---|
# | Variable | Type | Len |
8 | Availability | Char | 8 |
7 | Brand | Char | 15 |
6 | Color | Char | 8 |
1 | DressID | Num | 8 |
3 | Material | Char | 15 |
5 | Price | Num | 8 |
9 | Rating | Num | 8 |
4 | Size | Char | 8 |
2 | Type | Char | 8 |
Step 3: Frequency Analysis using PROC FREQ
proc freq data=dresses;
tables Type Material Size Brand / nocum nopercent;
title "Frequency Distribution of Type, Material, Size, and Brand";
run;
Output:
Frequency Distribution of Type, Material, Size, and Brand |
Type | Frequency |
---|---|
Casual | 5 |
Ethnic | 5 |
Formal | 5 |
Party | 5 |
Material | Frequency |
---|---|
Cotton | 7 |
Linen | 4 |
Polyester | 4 |
Silk | 5 |
Size | Frequency |
---|---|
L | 6 |
M | 7 |
S | 4 |
XL | 3 |
Brand | Frequency |
---|---|
AllenSolly | 2 |
Arrow | 1 |
Aurelia | 1 |
Biba | 2 |
FabIndia | 2 |
Forever21 | 1 |
GlobalDesi | 1 |
H&M | 1 |
Max | 2 |
ONLY | 2 |
VanHeusen | 2 |
W | 1 |
Zara | 1 |
Zivame | 1 |
Step 4: Summary Statistics using PROC MEANS
proc means data=dresses mean maxdec=2;
class Type;
var Price Rating;
title "Mean Price and Rating by Dress Type";
run;
Output:
Mean Price and Rating by Dress Type |
Type | N Obs | Variable | Mean | ||||
---|---|---|---|---|---|---|---|
Casual | 5 |
|
| ||||
Ethnic | 5 |
|
| ||||
Formal | 5 |
|
| ||||
Party | 5 |
|
|
Step 5: Using PROC SQL for Grouping and Filtering
Let’s use PROC SQL to identify:
1.Top 3 highest-rated brand
2.Count of dresses by size
3.Average price of dresses that are "In stock"
proc sql outobs=3;
title "Top Rated Brands";
select Brand, avg(Rating) as AvgRating format=4.2
from dresses
group by Brand
having AvgRating >= 4
order by AvgRating desc;
quit;
Output:
Top Rated Brands |
Brand | AvgRating |
---|---|
H&M | 5.00 |
VanHeusen | 5.00 |
FabIndia | 5.00 |
proc sql;
title "Dress Count by Size";
select Size, count(*) as Total
from dresses
group by Size;
quit;
Output:
Dress Count by Size |
Size | Total |
---|---|
L | 6 |
M | 7 |
S | 4 |
XL | 3 |
proc sql;
title "Average Price of Available Dresses";
select avg(Price) as AvgPrice_InStock format=8.2
from dresses
where Availability = 'Yes';
quit;
Output:
Average Price of Available
Dresses |
AvgPrice_InStock |
---|
2062.50 |
Step 6: Using PROC SORT and PROC UNIVARIATE
proc sort data=dresses out=sorted_dresses;
by descending Price;
run;
proc print data=sorted_dresses(obs=10);
title "Top 10 Most Expensive Dresses";
run;
Output:
Top 10 Most Expensive Dresses |
Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
---|---|---|---|---|---|---|---|---|---|
1 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
2 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
3 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
4 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
5 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
6 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
7 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
8 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
9 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
10 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
proc univariate data=dresses;
var Price;
histogram Price;
title "Price Distribution of Dresses";
run;
Output:
Price Distribution of Dresses |
Moments | |||
---|---|---|---|
N | 20 | Sum Weights | 20 |
Mean | 2080 | Sum Observations | 41600 |
Std Deviation | 524.504779 | Variance | 275105.263 |
Skewness | -0.0352472 | Kurtosis | -0.9926 |
Uncorrected SS | 91755000 | Corrected SS | 5227000 |
Coeff Variation | 25.2165759 | Std Error Mean | 117.282834 |
Basic Statistical Measures | |||
---|---|---|---|
Location | Variability | ||
Mean | 2080.000 | Std Deviation | 524.50478 |
Median | 2050.000 | Variance | 275105 |
Mode | 2000.000 | Range | 1800 |
Interquartile Range | 850.00000 |
Tests for Location: Mu0=0 | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Student's t | t | 17.73491 | Pr > |t| | <.0001 |
Sign | M | 10 | Pr >= |M| | <.0001 |
Signed Rank | S | 105 | Pr >= |S| | <.0001 |
Quantiles (Definition 5) | |
---|---|
Level | Quantile |
100% Max | 3000 |
99% | 3000 |
95% | 2900 |
90% | 2750 |
75% Q3 | 2525 |
50% Median | 2050 |
25% Q1 | 1675 |
10% | 1350 |
5% | 1250 |
1% | 1200 |
0% Min | 1200 |
Extreme Observations | |||
---|---|---|---|
Lowest | Highest | ||
Value | Obs | Value | Obs |
1200 | 1 | 2550 | 18 |
1300 | 5 | 2600 | 6 |
1400 | 9 | 2700 | 10 |
1500 | 17 | 2800 | 15 |
1600 | 13 | 3000 | 7 |
Step 7: Conditional Formatting using PROC FORMAT
proc format;
value pricegrp
low -< 1500 = 'Low'
1500 -< 2500 = 'Medium'
2500 - high = 'High';
run;
proc freq data=dresses;
tables Price;
format Price pricegrp.;
title "Price Group Distribution";
run;
Output:
Price Group Distribution |
Price | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Low | 3 | 15.00 | 3 | 15.00 |
Medium | 11 | 55.00 | 14 | 70.00 |
High | 6 | 30.00 | 20 | 100.00 |
Step 8: Creating a Macro to Generate Summary Reports
%macro summary_report(groupvar);
proc sql;
title "Average Price and Rating by &groupvar";
select &groupvar,
count(*) as Total,
avg(Price) as AvgPrice format=8.2,
avg(Rating) as AvgRating format=4.2
from dresses
group by &groupvar;
quit;
%mend;
%summary_report(Type);
Output:
Average Price and Rating by Type |
Type | Total | AvgPrice | AvgRating |
---|---|---|---|
Casual | 5 | 1400.00 | 3.20 |
Ethnic | 5 | 2120.00 | 4.20 |
Formal | 5 | 2550.00 | 4.80 |
Party | 5 | 2250.00 | 3.80 |
%summary_report(Brand);
Output:
Average Price and Rating by Brand |
Brand | Total | AvgPrice | AvgRating |
---|---|---|---|
AllenSolly | 2 | 1925.00 | 3.50 |
Arrow | 1 | 2700.00 | 4.00 |
Aurelia | 1 | 1600.00 | 4.00 |
Biba | 2 | 2000.00 | 4.00 |
FabIndia | 2 | 2250.00 | 5.00 |
Forever21 | 1 | 1800.00 | 3.00 |
GlobalDesi | 1 | 2800.00 | 3.00 |
H&M | 1 | 2500.00 | 5.00 |
Max | 2 | 1450.00 | 3.00 |
ONLY | 2 | 1825.00 | 4.50 |
VanHeusen | 2 | 2500.00 | 5.00 |
W | 1 | 2100.00 | 3.00 |
Zara | 1 | 1200.00 | 4.00 |
Zivame | 1 | 3000.00 | 4.00 |
%summary_report(Size);
Output:
Average Price and Rating by Size |
Size | Total | AvgPrice | AvgRating |
---|---|---|---|
L | 6 | 2066.67 | 4.00 |
M | 7 | 1821.43 | 4.14 |
S | 4 | 2237.50 | 4.00 |
XL | 3 | 2500.00 | 3.67 |
Step 9: Filtering Dresses using Macros and SQL
%macro filter_dress(brand=, minprice=, maxprice=);
proc sql;
title "Filtered Dresses for &brand Between &minprice and &maxprice";
select * from dresses
where Brand = "&brand"
and Price between &minprice and &maxprice;
quit;
%mend;
%filter_dress(brand=Biba, minprice=1800, maxprice=2200);
Output:
Filtered Dresses for Biba Between 1800 and 2200 |
DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
---|---|---|---|---|---|---|---|---|
104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
%filter_dress(brand=VanHeusen, minprice=2000, maxprice=2700);
Output:
Filtered Dresses for VanHeusen Between 2000 and 2700 |
DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
---|---|---|---|---|---|---|---|---|
106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
Step 10: Creating New Variables (Derived Columns)
Add two new columns:
1.DiscountedPrice – 10% discount if rating = 5
2.IsPopular – If Rating ≥ 4 then 'Yes' else 'No'
data dresses_updated;
set dresses;
if Rating = 5 then DiscountedPrice = Price * 0.99;
else DiscountedPrice = Price;
if Rating >= 4 then IsPopular = 'Yes';
else IsPopular = 'No';
run;
proc print data=dresses_updated;
title "Dresses with Derived Columns";
run;
Dresses with Derived Columns |
Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating | DiscountedPrice | IsPopular |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 | 1200.0 | Yes |
2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 | 2475.0 | Yes |
3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 | 1800.0 | No |
4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 | 2000.0 | Yes |
5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 | 1300.0 | No |
6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 | 2574.0 | Yes |
7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 | 3000.0 | Yes |
8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 | 2178.0 | Yes |
9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 | 1400.0 | No |
10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 | 2700.0 | Yes |
11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 | 1881.0 | Yes |
12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 | 2100.0 | No |
13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 | 1600.0 | Yes |
14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 | 2376.0 | Yes |
15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 | 2800.0 | No |
16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 | 2000.0 | Yes |
17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 | 1500.0 | No |
18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 | 2524.5 | Yes |
19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 | 1750.0 | Yes |
20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 | 2277.0 | Yes |
- Get link
- X
- Other Apps
Comments
Post a Comment