156.COMPREHENSIVE ANALYSIS OF BOOKSTORE SALES DATA UTILIZING SAS PROCEDURES: PROC PRINT | PROC CONTENTS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SQL | PROC CORR | PROC EXPORT
- Get link
- X
- Other Apps
COMPREHENSIVE ANALYSIS OF BOOKSTORE SALES DATA UTILIZING SAS PROCEDURES: PROC PRINT | PROC CONTENTS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SQL | PROC CORR | PROC EXPORT
/*Create a unique dataset centered around book sales in a fictional bookstore.*/
/*Dataset Creation: Book Sales Data*/
data sales;
infile datalines dsd;
length Book_ID $5 Title $50 Author $30 Genre $20 Pub_Date;
input Book_ID $ Title $ Author $ Genre $ Pub_Date :date9. Price Units_Sold;
format Pub_Date date9. Price dollar8.2;
datalines;
B001,"The Silent Patient","Alex Michaelides","Thriller",15FEB2019,26.99,1500
B002,"Where the Crawdads Sing","Delia Owens","Mystery",14AUG2018,24.00,2000
B003,"Becoming","Michelle Obama","Biography",13NOV2018,32.50,2500
B004,"Educated","Tara Westover","Memoir",20FEB2018,28.00,1800
B005,"The Testaments","Margaret Atwood","Dystopian",10SEP2019,30.00,1700
B006,"Normal People","Sally Rooney","Romance",16APR2019,22.00,1600
B007,"The Midnight Library","Matt Haig","Fantasy",13OCT2020,25.00,1900
B008,"The Vanishing Half","Brit Bennett","Historical",02JUN2020,27.00,2100
B009,"The Guest List","Lucy Foley","Thriller",02JUN2020,26.00,1400
B010,"Untamed","Glennon Doyle","Memoir",10MAR2020,28.00,2300
;
run;
proc contents data=sales;
run;
Output:
Data Set Name | WORK.SALES | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 7 |
Engine | V9 | Indexes | 0 |
Created | 14/09/2015 00:26:20 | Observation Length | 136 |
Last Modified | 14/09/2015 00:26:20 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | wlatin1 Western (Windows) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 481 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15972_DESKTOP-QFAA4KV_\sales.sas7bdat |
Release Created | 9.0401M2 |
Host Created | X64_8HOME |
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
3 | Author | Char | 30 | |
1 | Book_ID | Char | 5 | |
4 | Genre | Char | 20 | |
6 | Price | Num | 8 | DOLLAR8.2 |
5 | Pub_Date | Num | 8 | DATE9. |
2 | Title | Char | 50 | |
7 | Units_Sold | Num | 8 |
proc print data=sales;
run;
Output:
Obs | Book_ID | Title | Author | Genre | Pub_Date | Price | Units_Sold |
---|---|---|---|---|---|---|---|
1 | B001 | The Silent Patient | Alex Michaelides | Thriller | 15FEB2019 | $26.99 | 1500 |
2 | B002 | Where the Crawdads Sing | Delia Owens | Mystery | 14AUG2018 | $24.00 | 2000 |
3 | B003 | Becoming | Michelle Obama | Biography | 13NOV2018 | $32.50 | 2500 |
4 | B004 | Educated | Tara Westover | Memoir | 20FEB2018 | $28.00 | 1800 |
5 | B005 | The Testaments | Margaret Atwood | Dystopian | 10SEP2019 | $30.00 | 1700 |
6 | B006 | Normal People | Sally Rooney | Romance | 16APR2019 | $22.00 | 1600 |
7 | B007 | The Midnight Library | Matt Haig | Fantasy | 13OCT2020 | $25.00 | 1900 |
8 | B008 | The Vanishing Half | Brit Bennett | Historical | 02JUN2020 | $27.00 | 2100 |
9 | B009 | The Guest List | Lucy Foley | Thriller | 02JUN2020 | $26.00 | 1400 |
10 | B010 | Untamed | Glennon Doyle | Memoir | 10MAR2020 | $28.00 | 2300 |
/*Descriptive Statistics with PROC MEANS*/
proc means data=sales n mean min max std;
var Price Units_Sold;
run;
Output:
Variable | N | Mean | Minimum | Maximum | Std Dev | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
/*Frequency Analysis with PROC FREQ*/
proc freq data=sales;
tables Genre;
run;
Output:
Genre | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Biography | 1 | 10.00 | 1 | 10.00 |
Dystopian | 1 | 10.00 | 2 | 20.00 |
Fantasy | 1 | 10.00 | 3 | 30.00 |
Historical | 1 | 10.00 | 4 | 40.00 |
Memoir | 2 | 20.00 | 6 | 60.00 |
Mystery | 1 | 10.00 | 7 | 70.00 |
Romance | 1 | 10.00 | 8 | 80.00 |
Thriller | 2 | 20.00 | 10 | 100.00 |
/*Sales Over Time with PROC SGPLOT*/
proc sgplot data=sales;
series x=Pub_Date y=Units_Sold / markers;
title "Units Sold Over Publication Dates";
run;
Log:
NOTE: PROCEDURE SGPLOT used (Total process time):
real time 2.43 seconds
cpu time 0.51 seconds
NOTE: The column format DATE9 is replaced by an auto-generated format on the axis.
NOTE: The column format DATE9 is replaced by an auto-generated format on the axis.
NOTE: Listing image output written to SGPlot1.png.
NOTE: There were 10 observations read from the data set WORK.SALES.
/*Total Revenue Calculation with PROC SQL*/
proc sql;
create table revenue as
select Book_ID, Title, Price, Units_Sold, (Price * Units_Sold) as Total_Revenue format=dollar10.2
from sales
order by Total_Revenue desc;
quit;
proc print data=revenue;
run;
Output:
Obs | Book_ID | Title | Price | Units_Sold | Total_Revenue |
---|---|---|---|---|---|
1 | B003 | Becoming | $32.50 | 2500 | $81,250.00 |
2 | B010 | Untamed | $28.00 | 2300 | $64,400.00 |
3 | B008 | The Vanishing Half | $27.00 | 2100 | $56,700.00 |
4 | B005 | The Testaments | $30.00 | 1700 | $51,000.00 |
5 | B004 | Educated | $28.00 | 1800 | $50,400.00 |
6 | B002 | Where the Crawdads Sing | $24.00 | 2000 | $48,000.00 |
7 | B007 | The Midnight Library | $25.00 | 1900 | $47,500.00 |
8 | B001 | The Silent Patient | $26.99 | 1500 | $40,485.00 |
9 | B009 | The Guest List | $26.00 | 1400 | $36,400.00 |
10 | B006 | Normal People | $22.00 | 1600 | $35,200.00 |
/*Creating Dummy Variables for Genres*/
data salesdum;
set sales;
Thriller = (Genre = "Thriller");
Mystery = (Genre = "Mystery");
Biography = (Genre = "Biography");
Memoir = (Genre = "Memoir");
Dystopian = (Genre = "Dystopian");
Romance = (Genre = "Romance");
Fantasy = (Genre = "Fantasy");
Historical = (Genre = "Historical");
run;
proc print data=salesdum;
run;
Output:
Obs | Book_ID | Title | Author | Genre | Pub_Date | Price | Units_Sold | Thriller | Mystery | Biography | Memoir | Dystopian | Romance | Fantasy | Historical |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | B001 | The Silent Patient | Alex Michaelides | Thriller | 15FEB2019 | $26.99 | 1500 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | B002 | Where the Crawdads Sing | Delia Owens | Mystery | 14AUG2018 | $24.00 | 2000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | B003 | Becoming | Michelle Obama | Biography | 13NOV2018 | $32.50 | 2500 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
4 | B004 | Educated | Tara Westover | Memoir | 20FEB2018 | $28.00 | 1800 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
5 | B005 | The Testaments | Margaret Atwood | Dystopian | 10SEP2019 | $30.00 | 1700 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
6 | B006 | Normal People | Sally Rooney | Romance | 16APR2019 | $22.00 | 1600 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
7 | B007 | The Midnight Library | Matt Haig | Fantasy | 13OCT2020 | $25.00 | 1900 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
8 | B008 | The Vanishing Half | Brit Bennett | Historical | 02JUN2020 | $27.00 | 2100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
9 | B009 | The Guest List | Lucy Foley | Thriller | 02JUN2020 | $26.00 | 1400 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | B010 | Untamed | Glennon Doyle | Memoir | 10MAR2020 | $28.00 | 2300 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
/*Correlation Analysis with PROC CORR*/
proc corr data=sales;
var Price Units_Sold;
run;
Output:
2 Variables: | Price Units_Sold |
---|
Simple Statistics | ||||||
---|---|---|---|---|---|---|
Variable | N | Mean | Std Dev | Sum | Minimum | Maximum |
Price | 10 | 26.94900 | 2.98560 | 269.49000 | 22.00000 | 32.50000 |
Units_Sold | 10 | 1880 | 352.13634 | 18800 | 1400 | 2500 |
Pearson Correlation
Coefficients, N = 10 Prob > |r| under H0: Rho=0 | ||||||
---|---|---|---|---|---|---|
Price | Units_Sold | |||||
Price |
|
| ||||
Units_Sold |
|
|
/*Exporting the Dataset to CSV*/
proc export data=sales
outfile="C:\Users\YourName\Documents\book_sales.csv"
dbms=csv
replace;
run;
In this exercise, we've:
Created a custom dataset of book sales.
Explored the data structure and contents.
Analyzed descriptive statistics and genre distributions.
Visualized sales trends over time.
Calculated total revenues.
Prepared the data for modeling by creating dummy variables.
Examined correlations between key variables.
Exported the dataset for external use.
- Get link
- X
- Other Apps
Comments
Post a Comment