COMPLETE SAS FASHION DATA ANALYSIS USING PROC PRINT | PROC CONTENTS | PROC FREQ | PROC MEANS | PROC SQL | PROC FORMAT | PROC UNIVARIATE | PROC SORT | PROC SGPLOT | MACROS
/*Analysis of Different Types of Dresses using SAS (PROC, SQL, Macros)*/
Step 1: Create the Dataset
Creating a dataset called dresses with 20 observations and the following variables:
DressID – Unique identifier
Type – Type of dress (e.g., Casual, Formal, Party, Ethnic)
Material – Fabric (e.g., Cotton, Silk, Polyester, Linen)
Size – Size category (S, M, L, XL)
Price – Cost in INR
Color – Color of the dress
Brand – Brand name
Availability – In stock (Yes/No)
Rating – Customer rating (1 to 5)
data dresses;
input DressID Type $ Material:$15. Size $ Price Color $ Brand:$15.
Availability $ Rating;
datalines;
101 Casual Cotton M 1200 Blue Zara Yes 4
102 Formal Silk L 2500 Black H&M Yes 5
103 Party Polyester S 1800 Red Forever21 No 3
104 Ethnic Cotton XL 2000 Orange Biba Yes 4
105 Casual Linen M 1300 White AllenSolly No 2
106 Formal Polyester L 2600 Grey VanHeusen Yes 5
107 Party Silk S 3000 Pink Zivame Yes 4
108 Ethnic Silk M 2200 Yellow FabIndia Yes 5
109 Casual Cotton L 1400 Green Max Yes 3
110 Formal Linen XL 2700 Maroon Arrow No 4
111 Party Cotton M 1900 Blue ONLY Yes 5
112 Ethnic Polyester L 2100 Beige W Yes 3
113 Casual Silk S 1600 Red Aurelia Yes 4
114 Formal Cotton M 2400 Navy VanHeusen Yes 5
115 Party Linen XL 2800 Golden GlobalDesi No 3
116 Ethnic Cotton M 2000 Pink Biba Yes 4
117 Casual Polyester L 1500 Brown Max Yes 3
118 Formal Silk S 2550 Black AllenSolly Yes 5
119 Party Cotton M 1750 Green ONLY Yes 4
120 Ethnic Linen L 2300 Yellow FabIndia Yes 5
;
run;
proc print;run;
Output:
| Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 |
| 2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
| 3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 |
| 4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
| 5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 |
| 6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
| 7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
| 8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
| 9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 |
| 10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
| 11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 |
| 12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
| 13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 |
| 14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
| 15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
| 16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
| 17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 |
| 18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
| 19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 |
| 20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
Step 2: Explore the Dataset using PROC PRINT and PROC CONTENTS
/*PROC PRINT is used to display the dataset.*/
/*PROC CONTENTS is used to show metadata (variable names, types, lengths).*/
proc print data=dresses;
title "Complete Dress Dataset";
run;
Output:
| Complete Dress Dataset |
| Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 |
| 2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
| 3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 |
| 4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
| 5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 |
| 6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
| 7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
| 8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
| 9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 |
| 10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
| 11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 |
| 12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
| 13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 |
| 14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
| 15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
| 16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
| 17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 |
| 18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
| 19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 |
| 20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
proc contents data=dresses;
title "Dataset Metadata: DRESSES";
run;
Output:
| Dataset Metadata:
DRESSES |
| Data Set Name | WORK.DRESSES | Observations | 20 |
|---|---|---|---|
| Member Type | DATA | Variables | 9 |
| Engine | V9 | Indexes | 0 |
| Created | 14/09/2015 00:16:55 | Observation Length | 88 |
| Last Modified | 14/09/2015 00:16:55 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | WINDOWS_64 | ||
| Encoding | wlatin1 Western (Windows) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 743 |
| Obs in First Data Page | 20 |
| Number of Data Set Repairs | 0 |
| ExtendObsCounter | YES |
| Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD11200_DESKTOP-QFAA4KV_\dresses.sas7bdat |
| Release Created | 9.0401M2 |
| Host Created | X64_8HOME |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 8 | Availability | Char | 8 |
| 7 | Brand | Char | 15 |
| 6 | Color | Char | 8 |
| 1 | DressID | Num | 8 |
| 3 | Material | Char | 15 |
| 5 | Price | Num | 8 |
| 9 | Rating | Num | 8 |
| 4 | Size | Char | 8 |
| 2 | Type | Char | 8 |
Step 3: Frequency Analysis using PROC FREQ
proc freq data=dresses;
tables Type Material Size Brand / nocum nopercent;
title "Frequency Distribution of Type, Material, Size, and Brand";
run;
Output:
| Frequency Distribution of Type, Material, Size, and Brand |
| Type | Frequency |
|---|---|
| Casual | 5 |
| Ethnic | 5 |
| Formal | 5 |
| Party | 5 |
| Material | Frequency |
|---|---|
| Cotton | 7 |
| Linen | 4 |
| Polyester | 4 |
| Silk | 5 |
| Size | Frequency |
|---|---|
| L | 6 |
| M | 7 |
| S | 4 |
| XL | 3 |
| Brand | Frequency |
|---|---|
| AllenSolly | 2 |
| Arrow | 1 |
| Aurelia | 1 |
| Biba | 2 |
| FabIndia | 2 |
| Forever21 | 1 |
| GlobalDesi | 1 |
| H&M | 1 |
| Max | 2 |
| ONLY | 2 |
| VanHeusen | 2 |
| W | 1 |
| Zara | 1 |
| Zivame | 1 |
Step 4: Summary Statistics using PROC MEANS
proc means data=dresses mean maxdec=2;
class Type;
var Price Rating;
title "Mean Price and Rating by Dress Type";
run;
Output:
| Mean Price and Rating by Dress Type |
| Type | N Obs | Variable | Mean | ||||
|---|---|---|---|---|---|---|---|
| Casual | 5 |
|
| ||||
| Ethnic | 5 |
|
| ||||
| Formal | 5 |
|
| ||||
| Party | 5 |
|
|
Step 5: Using PROC SQL for Grouping and Filtering
Let’s use PROC SQL to identify:
1.Top 3 highest-rated brand
2.Count of dresses by size
3.Average price of dresses that are "In stock"
proc sql outobs=3;
title "Top Rated Brands";
select Brand, avg(Rating) as AvgRating format=4.2
from dresses
group by Brand
having AvgRating >= 4
order by AvgRating desc;
quit;
Output:
| Top Rated Brands |
| Brand | AvgRating |
|---|---|
| H&M | 5.00 |
| VanHeusen | 5.00 |
| FabIndia | 5.00 |
proc sql;
title "Dress Count by Size";
select Size, count(*) as Total
from dresses
group by Size;
quit;
Output:
| Dress Count by Size |
| Size | Total |
|---|---|
| L | 6 |
| M | 7 |
| S | 4 |
| XL | 3 |
proc sql;
title "Average Price of Available Dresses";
select avg(Price) as AvgPrice_InStock format=8.2
from dresses
where Availability = 'Yes';
quit;
Output:
| Average Price of Available
Dresses |
| AvgPrice_InStock |
|---|
| 2062.50 |
Step 6: Using PROC SORT and PROC UNIVARIATE
proc sort data=dresses out=sorted_dresses;
by descending Price;
run;
proc print data=sorted_dresses(obs=10);
title "Top 10 Most Expensive Dresses";
run;
Output:
| Top 10 Most Expensive Dresses |
| Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 |
| 2 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 |
| 3 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 |
| 4 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
| 5 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 |
| 6 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 |
| 7 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
| 8 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 |
| 9 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 |
| 10 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 |
proc univariate data=dresses;
var Price;
histogram Price;
title "Price Distribution of Dresses";
run;
Output:
| Price Distribution of Dresses |
| Moments | |||
|---|---|---|---|
| N | 20 | Sum Weights | 20 |
| Mean | 2080 | Sum Observations | 41600 |
| Std Deviation | 524.504779 | Variance | 275105.263 |
| Skewness | -0.0352472 | Kurtosis | -0.9926 |
| Uncorrected SS | 91755000 | Corrected SS | 5227000 |
| Coeff Variation | 25.2165759 | Std Error Mean | 117.282834 |
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 2080.000 | Std Deviation | 524.50478 |
| Median | 2050.000 | Variance | 275105 |
| Mode | 2000.000 | Range | 1800 |
| Interquartile Range | 850.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 17.73491 | Pr > |t| | <.0001 |
| Sign | M | 10 | Pr >= |M| | <.0001 |
| Signed Rank | S | 105 | Pr >= |S| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile |
| 100% Max | 3000 |
| 99% | 3000 |
| 95% | 2900 |
| 90% | 2750 |
| 75% Q3 | 2525 |
| 50% Median | 2050 |
| 25% Q1 | 1675 |
| 10% | 1350 |
| 5% | 1250 |
| 1% | 1200 |
| 0% Min | 1200 |
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs |
| 1200 | 1 | 2550 | 18 |
| 1300 | 5 | 2600 | 6 |
| 1400 | 9 | 2700 | 10 |
| 1500 | 17 | 2800 | 15 |
| 1600 | 13 | 3000 | 7 |
Step 7: Conditional Formatting using PROC FORMAT
proc format;
value pricegrp
low -< 1500 = 'Low'
1500 -< 2500 = 'Medium'
2500 - high = 'High';
run;
proc freq data=dresses;
tables Price;
format Price pricegrp.;
title "Price Group Distribution";
run;
Output:
| Price Group Distribution |
| Price | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
|---|---|---|---|---|
| Low | 3 | 15.00 | 3 | 15.00 |
| Medium | 11 | 55.00 | 14 | 70.00 |
| High | 6 | 30.00 | 20 | 100.00 |
Step 8: Creating a Macro to Generate Summary Reports
%macro summary_report(groupvar);
proc sql;
title "Average Price and Rating by &groupvar";
select &groupvar,
count(*) as Total,
avg(Price) as AvgPrice format=8.2,
avg(Rating) as AvgRating format=4.2
from dresses
group by &groupvar;
quit;
%mend;
%summary_report(Type);
Output:
| Average Price and Rating by Type |
| Type | Total | AvgPrice | AvgRating |
|---|---|---|---|
| Casual | 5 | 1400.00 | 3.20 |
| Ethnic | 5 | 2120.00 | 4.20 |
| Formal | 5 | 2550.00 | 4.80 |
| Party | 5 | 2250.00 | 3.80 |
%summary_report(Brand);
Output:
| Average Price and Rating by Brand |
| Brand | Total | AvgPrice | AvgRating |
|---|---|---|---|
| AllenSolly | 2 | 1925.00 | 3.50 |
| Arrow | 1 | 2700.00 | 4.00 |
| Aurelia | 1 | 1600.00 | 4.00 |
| Biba | 2 | 2000.00 | 4.00 |
| FabIndia | 2 | 2250.00 | 5.00 |
| Forever21 | 1 | 1800.00 | 3.00 |
| GlobalDesi | 1 | 2800.00 | 3.00 |
| H&M | 1 | 2500.00 | 5.00 |
| Max | 2 | 1450.00 | 3.00 |
| ONLY | 2 | 1825.00 | 4.50 |
| VanHeusen | 2 | 2500.00 | 5.00 |
| W | 1 | 2100.00 | 3.00 |
| Zara | 1 | 1200.00 | 4.00 |
| Zivame | 1 | 3000.00 | 4.00 |
%summary_report(Size);
Output:
| Average Price and Rating by Size |
| Size | Total | AvgPrice | AvgRating |
|---|---|---|---|
| L | 6 | 2066.67 | 4.00 |
| M | 7 | 1821.43 | 4.14 |
| S | 4 | 2237.50 | 4.00 |
| XL | 3 | 2500.00 | 3.67 |
Step 9: Filtering Dresses using Macros and SQL
%macro filter_dress(brand=, minprice=, maxprice=);
proc sql;
title "Filtered Dresses for &brand Between &minprice and &maxprice";
select * from dresses
where Brand = "&brand"
and Price between &minprice and &maxprice;
quit;
%mend;
%filter_dress(brand=Biba, minprice=1800, maxprice=2200);
Output:
| Filtered Dresses for Biba Between 1800 and 2200 |
| DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
|---|---|---|---|---|---|---|---|---|
| 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 |
| 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 |
%filter_dress(brand=VanHeusen, minprice=2000, maxprice=2700);
Output:
| Filtered Dresses for VanHeusen Between 2000 and 2700 |
| DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating |
|---|---|---|---|---|---|---|---|---|
| 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 |
| 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 |
Step 10: Creating New Variables (Derived Columns)
Add two new columns:
1.DiscountedPrice – 10% discount if rating = 5
2.IsPopular – If Rating ≥ 4 then 'Yes' else 'No'
data dresses_updated;
set dresses;
if Rating = 5 then DiscountedPrice = Price * 0.99;
else DiscountedPrice = Price;
if Rating >= 4 then IsPopular = 'Yes';
else IsPopular = 'No';
run;
proc print data=dresses_updated;
title "Dresses with Derived Columns";
run;
| Dresses with Derived Columns |
| Obs | DressID | Type | Material | Size | Price | Color | Brand | Availability | Rating | DiscountedPrice | IsPopular |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | Casual | Cotton | M | 1200 | Blue | Zara | Yes | 4 | 1200.0 | Yes |
| 2 | 102 | Formal | Silk | L | 2500 | Black | H&M | Yes | 5 | 2475.0 | Yes |
| 3 | 103 | Party | Polyester | S | 1800 | Red | Forever21 | No | 3 | 1800.0 | No |
| 4 | 104 | Ethnic | Cotton | XL | 2000 | Orange | Biba | Yes | 4 | 2000.0 | Yes |
| 5 | 105 | Casual | Linen | M | 1300 | White | AllenSolly | No | 2 | 1300.0 | No |
| 6 | 106 | Formal | Polyester | L | 2600 | Grey | VanHeusen | Yes | 5 | 2574.0 | Yes |
| 7 | 107 | Party | Silk | S | 3000 | Pink | Zivame | Yes | 4 | 3000.0 | Yes |
| 8 | 108 | Ethnic | Silk | M | 2200 | Yellow | FabIndia | Yes | 5 | 2178.0 | Yes |
| 9 | 109 | Casual | Cotton | L | 1400 | Green | Max | Yes | 3 | 1400.0 | No |
| 10 | 110 | Formal | Linen | XL | 2700 | Maroon | Arrow | No | 4 | 2700.0 | Yes |
| 11 | 111 | Party | Cotton | M | 1900 | Blue | ONLY | Yes | 5 | 1881.0 | Yes |
| 12 | 112 | Ethnic | Polyester | L | 2100 | Beige | W | Yes | 3 | 2100.0 | No |
| 13 | 113 | Casual | Silk | S | 1600 | Red | Aurelia | Yes | 4 | 1600.0 | Yes |
| 14 | 114 | Formal | Cotton | M | 2400 | Navy | VanHeusen | Yes | 5 | 2376.0 | Yes |
| 15 | 115 | Party | Linen | XL | 2800 | Golden | GlobalDesi | No | 3 | 2800.0 | No |
| 16 | 116 | Ethnic | Cotton | M | 2000 | Pink | Biba | Yes | 4 | 2000.0 | Yes |
| 17 | 117 | Casual | Polyester | L | 1500 | Brown | Max | Yes | 3 | 1500.0 | No |
| 18 | 118 | Formal | Silk | S | 2550 | Black | AllenSolly | Yes | 5 | 2524.5 | Yes |
| 19 | 119 | Party | Cotton | M | 1750 | Green | ONLY | Yes | 4 | 1750.0 | Yes |
| 20 | 120 | Ethnic | Linen | L | 2300 | Yellow | FabIndia | Yes | 5 | 2277.0 | Yes |
No comments:
Post a Comment