148.ENHANCING BOOKSTORE SALES ANALYSIS WITH SAS: A COMPREHENSIVE GUIDE TO DATA PROCESSING AND INSIGHTFUL REPORTING
ENHANCING BOOKSTORE SALES ANALYSIS WITH SAS: A COMPREHENSIVE GUIDE TO DATA PROCESSING AND INSIGHTFUL REPORTING
/*Step 1: Creating the Initial Dataset*/
/*Creating a dataset named bookstore_sales that contains the following variables:*/
BookID: Unique identifier for each book.
Title: Title of the book.
Author: Name of the author.
Genre: Genre of the book.
Price: Price of the book.
CopiesSold: Number of copies sold.
Rating: Average customer rating (out of 5).
PublicationDate: Date the book was published.
data bookstore_sales;
infile datalines truncover;
input BookID Title & $50. Author & $30. Genre & $20. Price CopiesSold Rating PublicationDate :date9.;
format PublicationDate date9.;
datalines;
1 The Art of SAS John Doe Technology 49.99 150 4.5 15MAR2020
2 Cooking 101 Jane Smith Cooking 29.95 200 4.7 22JUN2018
3 Mystery Manor Alan Poe Mystery 19.99 120 4.3 05OCT2019
4 Historical Facts Mary Johnson History 34.50 80 4.6 12JAN2021
5 Sci-Fi Adventures Luke Skywalker Science Fiction 25.00 300 4.8 30JUL2017
;
run;
proc print data=bookstore_sales;
title "Bookstore Sales Data";
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 |
/*Step 2: Exploring the Dataset*/
/*To understand the structure and contents of our dataset, we can use the PROC CONTENTS procedure:*/
proc contents data=bookstore_sales;
title "Dataset Structure for Bookstore Sales";
run;
Output:
| Dataset Structure for Bookstore
Sales |
| Data Set Name | WORK.BOOKSTORE_SALES | Observations | 5 |
|---|---|---|---|
| Member Type | DATA | Variables | 8 |
| Engine | V9 | Indexes | 0 |
| Created | 14/09/2015 00:52:48 | Observation Length | 144 |
| Last Modified | 14/09/2015 00:52:48 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | WINDOWS_64 | ||
| Encoding | wlatin1 Western (Windows) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 454 |
| Obs in First Data Page | 5 |
| Number of Data Set Repairs | 0 |
| ExtendObsCounter | YES |
| Filename | C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD10152_DESKTOP-QFAA4KV_\bookstore_sales.sas7bdat |
| Release Created | 9.0401M2 |
| Host Created | X64_8HOME |
| Alphabetic List of Variables and Attributes | ||||
|---|---|---|---|---|
| # | Variable | Type | Len | Format |
| 3 | Author | Char | 30 | |
| 1 | BookID | Num | 8 | |
| 6 | CopiesSold | Num | 8 | |
| 4 | Genre | Char | 20 | |
| 5 | Price | Num | 8 | |
| 8 | PublicationDate | Num | 8 | DATE9. |
| 7 | Rating | Num | 8 | |
| 2 | Title | Char | 50 | |
/*Step 3: Data Manipulation Techniques*/
/*Adding New Variables*/
/*Suppose we want to calculate the total revenue for each book and classify the sales performance. We can add two new variables: TotalRevenue and SalesCategory.*/
data bookstore_sales;
set bookstore_sales;
TotalRevenue = Price * CopiesSold;
if CopiesSold > 250 then SalesCategory = 'Best Seller';
else if CopiesSold between 100 and 250 then SalesCategory = 'Average Seller';
else SalesCategory = 'Low Seller';
run;
proc print data=bookstore_sales;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller |
/*Sorting Data*/
/*To sort the books by TotalRevenue in descending order:*/
proc sort data=bookstore_sales;
by descending TotalRevenue;
run;
proc print data=bookstore_sales;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller |
| 2 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller |
| 3 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller |
| 5 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller |
/*Step 4: Aggregating Data*/
/*To analyze the average rating and total copies sold per genre, we can use PROC MEANS:*/
proc means data=bookstore_sales ;
class Genre;
var Rating CopiesSold;
output out=genre_summary mean(Rating)=AvgRating sum(CopiesSold)=TotalCopiesSold;
run;
Output:
| Genre | N Obs | Variable | N | Mean | Std Dev | Minimum | Maximum | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cooking | 1 |
|
|
|
|
|
| ||||||||||||
| History | 1 |
|
|
|
|
|
| ||||||||||||
| Mystery | 1 |
|
|
|
|
|
| ||||||||||||
| Science Fiction | 1 |
|
|
|
|
|
| ||||||||||||
| Technology | 1 |
|
|
|
|
|
|
/*Step 5: Data Visualization*/
/*Visualizing data can provide insights that are not immediately apparent from tables. SAS offers various procedures for creating plots and charts.*/
/*Bar Chart of Total Revenue by Genre*/
proc sgplot data=bookstore_sales;
vbar Genre / response=TotalRevenue stat=sum;
title "Total Revenue by Genre";
run;
/*Scatter Plot of Rating vs. Price*/
proc sgplot data=bookstore_sales;
scatter x=Price y=Rating / datalabel=Title;
title "Customer Rating vs. Book Price";
run;
/*Step 6: Advanced Data Techniques*/
/*Merging Datasets*/
/*If we have another dataset containing information about the publishers, we can merge it with our bookstore_sales dataset.*/
/*Assuming we have a dataset publishers:*/
data publishers;
input PublisherID PublisherName $;
datalines;
1 "Tech Books Publishing"
2 "Culinary Press"
3 "Mystery House"
4 "Historical Publications"
5 "Sci-Fi World"
;
run;
proc print;
run;
Output:
| Obs | PublisherID | PublisherName |
|---|---|---|
| 1 | 1 | "Tech |
| 2 | 2 | "Culinar |
| 3 | 3 | "Mystery |
| 4 | 4 | "Histori |
| 5 | 5 | "Sci-Fi |
data book_publishers;
input BookID PublisherID;
datalines;
1 1
2 2
3 3
4 4
5 5
;
run;
proc print;
run;
Output:
| Obs | BookID | PublisherID |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
/*We can merge these datasets to include publisher information in our sales data:*/
proc sort data=bookstore_sales;
by BookID;
run;
proc print;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller |
proc sort data=book_publishers;
by BookID;
run;
proc print;
run;
Output:
| Obs | BookID | PublisherID |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
data bookstore_sales_with_publishers;
merge bookstore_sales(in=a) book_publishers(in=b);
by BookID;
if a and b;
run;
proc print;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory | PublisherID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller | 1 |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller | 2 |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller | 3 |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller | 4 |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller | 5 |
proc sort data=bookstore_sales_with_publishers;
by PublisherID;
run;
proc print;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory | PublisherID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller | 1 |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller | 2 |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller | 3 |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller | 4 |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller | 5 |
data final_dataset;
merge bookstore_sales_with_publishers(in=a) publishers(in=b);
by PublisherID;
if a and b;
run;
proc print;
run;
Output:
| Obs | BookID | Title | Author | Genre | Price | CopiesSold | Rating | PublicationDate | TotalRevenue | SalesCategory | PublisherID | PublisherName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | The Art of SAS | John Doe | Technology | 49.99 | 150 | 4.5 | 15MAR2020 | 7498.5 | Average Seller | 1 | "Tech |
| 2 | 2 | Cooking 101 | Jane Smith | Cooking | 29.95 | 200 | 4.7 | 22JUN2018 | 5990.0 | Average Seller | 2 | "Culinar |
| 3 | 3 | Mystery Manor | Alan Poe | Mystery | 19.99 | 120 | 4.3 | 05OCT2019 | 2398.8 | Average Seller | 3 | "Mystery |
| 4 | 4 | Historical Facts | Mary Johnson | History | 34.50 | 80 | 4.6 | 12JAN2021 | 2760.0 | Low Seller | 4 | "Histori |
| 5 | 5 | Sci-Fi Adventures | Luke Skywalker | Science Fiction | 25.00 | 300 | 4.8 | 30JUL2017 | 7500.0 | Best Seller | 5 | "Sci-Fi |
Comments
Post a Comment