156.COMPREHENSIVE ANALYSIS OF BOOKSTORE SALES DATA UTILIZING SAS PROCEDURES: PROC PRINT | PROC CONTENTS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SQL | PROC CORR | PROC EXPORT

COMPREHENSIVE ANALYSIS OF BOOKSTORE SALES DATA UTILIZING SAS PROCEDURES: PROC PRINT | PROC CONTENTS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SQL | PROC CORR | PROC EXPORT


/*Create a unique dataset centered around book sales in a fictional bookstore.*/

/*Dataset Creation: Book Sales Data*/

data sales;

    infile datalines dsd;

    length Book_ID $5 Title $50 Author $30 Genre $20 Pub_Date;

    input Book_ID $ Title $ Author $ Genre $ Pub_Date :date9. Price Units_Sold;

    format Pub_Date date9. Price dollar8.2;

    datalines;

B001,"The Silent Patient","Alex Michaelides","Thriller",15FEB2019,26.99,1500

B002,"Where the Crawdads Sing","Delia Owens","Mystery",14AUG2018,24.00,2000

B003,"Becoming","Michelle Obama","Biography",13NOV2018,32.50,2500

B004,"Educated","Tara Westover","Memoir",20FEB2018,28.00,1800

B005,"The Testaments","Margaret Atwood","Dystopian",10SEP2019,30.00,1700

B006,"Normal People","Sally Rooney","Romance",16APR2019,22.00,1600

B007,"The Midnight Library","Matt Haig","Fantasy",13OCT2020,25.00,1900

B008,"The Vanishing Half","Brit Bennett","Historical",02JUN2020,27.00,2100

B009,"The Guest List","Lucy Foley","Thriller",02JUN2020,26.00,1400

B010,"Untamed","Glennon Doyle","Memoir",10MAR2020,28.00,2300

;

run;

proc contents data=sales;

run;

Output:

                                                           The CONTENTS Procedure

Data Set Name WORK.SALES Observations 10
Member Type DATA Variables 7
Engine V9 Indexes 0
Created 14/09/2015 00:26:20 Observation Length 136
Last Modified 14/09/2015 00:26:20 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 481
Obs in First Data Page 10
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD15972_DESKTOP-QFAA4KV_\sales.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME

Alphabetic List of Variables and Attributes
# Variable Type Len Format
3 Author Char 30  
1 Book_ID Char 5  
4 Genre Char 20  
6 Price Num 8 DOLLAR8.2
5 Pub_Date Num 8 DATE9.
2 Title Char 50  
7 Units_Sold Num 8  

                                                           

proc print data=sales;

run;

Output:

Obs Book_ID Title Author Genre Pub_Date Price Units_Sold
1 B001 The Silent Patient Alex Michaelides Thriller 15FEB2019 $26.99 1500
2 B002 Where the Crawdads Sing Delia Owens Mystery 14AUG2018 $24.00 2000
3 B003 Becoming Michelle Obama Biography 13NOV2018 $32.50 2500
4 B004 Educated Tara Westover Memoir 20FEB2018 $28.00 1800
5 B005 The Testaments Margaret Atwood Dystopian 10SEP2019 $30.00 1700
6 B006 Normal People Sally Rooney Romance 16APR2019 $22.00 1600
7 B007 The Midnight Library Matt Haig Fantasy 13OCT2020 $25.00 1900
8 B008 The Vanishing Half Brit Bennett Historical 02JUN2020 $27.00 2100
9 B009 The Guest List Lucy Foley Thriller 02JUN2020 $26.00 1400
10 B010 Untamed Glennon Doyle Memoir 10MAR2020 $28.00 2300


/*Descriptive Statistics with PROC MEANS*/

proc means data=sales n mean min max std;

    var Price Units_Sold;

run;

Output:

                                                              The MEANS Procedure

Variable N Mean Minimum Maximum Std Dev
Price
Units_Sold
10
10
26.9490000
1880.00
22.0000000
1400.00
32.5000000
2500.00
2.9855967
352.1363372

/*Frequency Analysis with PROC FREQ*/

proc freq data=sales;

    tables Genre;

run;

Output:

                                                               The FREQ Procedure

Genre Frequency Percent Cumulative
Frequency
Cumulative
Percent
Biography 1 10.00 1 10.00
Dystopian 1 10.00 2 20.00
Fantasy 1 10.00 3 30.00
Historical 1 10.00 4 40.00
Memoir 2 20.00 6 60.00
Mystery 1 10.00 7 70.00
Romance 1 10.00 8 80.00
Thriller 2 20.00 10 100.00


/*Sales Over Time with PROC SGPLOT*/

proc sgplot data=sales;

    series x=Pub_Date y=Units_Sold / markers;

    title "Units Sold Over Publication Dates";

run;

Log:

NOTE: PROCEDURE SGPLOT used (Total process time):

      real time           2.43 seconds

      cpu time            0.51 seconds


NOTE: The column format DATE9 is replaced by an auto-generated format on the axis.

NOTE: The column format DATE9 is replaced by an auto-generated format on the axis.

NOTE: Listing image output written to SGPlot1.png.

NOTE: There were 10 observations read from the data set WORK.SALES.


/*Total Revenue Calculation with PROC SQL*/

proc sql;

    create table revenue as

    select Book_ID, Title, Price, Units_Sold, (Price * Units_Sold) as Total_Revenue format=dollar10.2

    from sales

    order by Total_Revenue desc;

quit;

proc print data=revenue;

run;

Output:

Obs Book_ID Title Price Units_Sold Total_Revenue
1 B003 Becoming $32.50 2500 $81,250.00
2 B010 Untamed $28.00 2300 $64,400.00
3 B008 The Vanishing Half $27.00 2100 $56,700.00
4 B005 The Testaments $30.00 1700 $51,000.00
5 B004 Educated $28.00 1800 $50,400.00
6 B002 Where the Crawdads Sing $24.00 2000 $48,000.00
7 B007 The Midnight Library $25.00 1900 $47,500.00
8 B001 The Silent Patient $26.99 1500 $40,485.00
9 B009 The Guest List $26.00 1400 $36,400.00
10 B006 Normal People $22.00 1600 $35,200.00


/*Creating Dummy Variables for Genres*/

data salesdum;

    set sales;

    Thriller = (Genre = "Thriller");

    Mystery = (Genre = "Mystery");

    Biography = (Genre = "Biography");

    Memoir = (Genre = "Memoir");

    Dystopian = (Genre = "Dystopian");

    Romance = (Genre = "Romance");

    Fantasy = (Genre = "Fantasy");

    Historical = (Genre = "Historical");

run;

proc print data=salesdum;

run;

Output:

Obs Book_ID Title Author Genre Pub_Date Price Units_Sold Thriller Mystery Biography Memoir Dystopian Romance Fantasy Historical
1 B001 The Silent Patient Alex Michaelides Thriller 15FEB2019 $26.99 1500 1 0 0 0 0 0 0 0
2 B002 Where the Crawdads Sing Delia Owens Mystery 14AUG2018 $24.00 2000 0 1 0 0 0 0 0 0
3 B003 Becoming Michelle Obama Biography 13NOV2018 $32.50 2500 0 0 1 0 0 0 0 0
4 B004 Educated Tara Westover Memoir 20FEB2018 $28.00 1800 0 0 0 1 0 0 0 0
5 B005 The Testaments Margaret Atwood Dystopian 10SEP2019 $30.00 1700 0 0 0 0 1 0 0 0
6 B006 Normal People Sally Rooney Romance 16APR2019 $22.00 1600 0 0 0 0 0 1 0 0
7 B007 The Midnight Library Matt Haig Fantasy 13OCT2020 $25.00 1900 0 0 0 0 0 0 1 0
8 B008 The Vanishing Half Brit Bennett Historical 02JUN2020 $27.00 2100 0 0 0 0 0 0 0 1
9 B009 The Guest List Lucy Foley Thriller 02JUN2020 $26.00 1400 1 0 0 0 0 0 0 0
10 B010 Untamed Glennon Doyle Memoir 10MAR2020 $28.00 2300 0 0 0 1 0 0 0 0


/*Correlation Analysis with PROC CORR*/

proc corr data=sales;

    var Price Units_Sold;

run;

Output:

                                                               The CORR Procedure

2 Variables: Price Units_Sold


Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Price 10 26.94900 2.98560 269.49000 22.00000 32.50000
Units_Sold 10 1880 352.13634 18800 1400 2500


Pearson Correlation Coefficients, N = 10
Prob > |r| under H0: Rho=0
  Price Units_Sold
Price
1.00000
 
0.49607
0.1448
Units_Sold
0.49607
0.1448
1.00000
 


/*Exporting the Dataset to CSV*/

proc export data=sales

    outfile="C:\Users\YourName\Documents\book_sales.csv"

    dbms=csv

    replace;

run;


In this exercise, we've:

Created a custom dataset of book sales.

Explored the data structure and contents.

Analyzed descriptive statistics and genre distributions.

Visualized sales trends over time.

Calculated total revenues.

Prepared the data for modeling by creating dummy variables.

Examined correlations between key variables.

Exported the dataset for external use.


PRACTICE AND COMMENT YOUR CODE: 

-->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

TO FOLLOW OUR TELEGRAM CHANNEL CLICK HERE

Comments