149.GENERATING THE DATASETS OF CUSTOMERS ,TRANSACTIONS ,PRODUCTS AND REVIEWS AND APPLYING PROCEDURES LIKE PROC MEANS | PROC FREQ ETC..

GENERATING THE DATASETS OF CUSTOMERS ,TRANSACTIONS ,PRODUCTS AND REVIEWS AND APPLYING PROCEDURES LIKE PROC MEANS | PROC FREQ ETC..


/*To explore various data manipulation and analysis techniques using SAS*/

1. Generating the Simulated Dataset

Our dataset will consist of four primary components:​

Customers: Details about individuals purchasing products.​

Products: Information on items available in the store.​

Transactions: Records of purchases made by customers.​

Reviews: Feedback provided by customers on products.


1.1 Creating the Customers Dataset

/*Generate a dataset containing 15 customers with attributes such as Customer ID, Name, Age, Gender, and Location*/

data customers;

    call streaminit(12345); /* Set seed for reproducibility */

    length CustomerID $8 Name $20 Gender $1 Location $15;

    array names[10] $20 _temporary_ ('Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ian', 'Julia');

    array locations[5] $15 _temporary_ ('New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix');

    do i = 1 to 15;

        CustomerID = cats('C', put(i, z7.));

        Name = names[ceil(rand('Uniform') * 10)];

        Age = 18 + floor((70 - 18 + 1) * rand('Uniform')); /* Generates a random integer between 18 and 70 */

        Gender = ifc(rand('Uniform') < 0.5, 'M', 'F');

        Location = locations[ceil(rand('Uniform') * 5)];

        output;

    end;

    drop i;

run;

proc print;run;

Output:

Obs CustomerID Name Gender Location Age
1 C0000001 Frank F Phoenix 70
2 C0000002 Ian F Los Angeles 32
3 C0000003 Eva F Phoenix 64
4 C0000004 Charlie M Los Angeles 67
5 C0000005 Bob M Phoenix 57
6 C0000006 Hannah M Houston 70
7 C0000007 Grace F Chicago 55
8 C0000008 Julia M Houston 18
9 C0000009 Hannah M Phoenix 48
10 C0000010 Julia F Los Angeles 62
11 C0000011 Ian M Los Angeles 45
12 C0000012 Hannah F Los Angeles 61
13 C0000013 Ian M Los Angeles 36
14 C0000014 Frank M Chicago 37
15 C0000015 Hannah M Houston 21

In this code:

call streaminit(12345); initializes the random number generator for reproducibility.​

Arrays names and locations store sample names and locations.​

A loop generates 15 customer records with random attributes.

(70 - 18 + 1) * rand('Uniform') scales this number to the range of 0 to 53 (since 70 - 18 + 1 = 53).


1.2 Creating the Products Dataset

/*Next, we'll create a dataset of 20 products with attributes like Product ID, Name, Category, and Price.*/

data products;

    call streaminit(54321);

    length ProductID $8 ProductName $20 Category $15;

    array productNames[10] $20 _temporary_ ('Laptop', 'Smartphone', 'Tablet', 'Headphones', 'Smartwatch', 'Camera', 'Printer', 'Monitor', 'Keyboard', 'Mouse');

    array categories[5] $15 _temporary_ ('Electronics', 'Accessories', 'Wearables', 'Office Supplies', 'Peripherals');

    do i = 1 to 20;

        ProductID = cats('P', put(i, z7.));

        ProductName = productNames[ceil(rand('Uniform') * 10)];

        Category = categories[ceil(rand('Uniform') * 5)];

        Price = round(rand('Uniform') * 900 + 100, 0.01); /* Price between $100 and $1000 */

        output;

    end;

    drop i;

run;

proc print;run;

Output:

Obs ProductID ProductName Category Price
1 P0000001 Smartwatch Wearables 800.74
2 P0000002 Smartphone Accessories 719.81
3 P0000003 Keyboard Electronics 934.84
4 P0000004 Laptop Electronics 323.77
5 P0000005 Smartphone Wearables 510.60
6 P0000006 Printer Peripherals 842.60
7 P0000007 Headphones Office Supplies 474.75
8 P0000008 Headphones Electronics 196.35
9 P0000009 Keyboard Peripherals 385.95
10 P0000010 Keyboard Wearables 828.15
11 P0000011 Smartphone Office Supplies 951.82
12 P0000012 Monitor Wearables 252.59
13 P0000013 Laptop Peripherals 193.77
14 P0000014 Monitor Electronics 149.87
15 P0000015 Keyboard Electronics 758.59
16 P0000016 Smartphone Accessories 783.57
17 P0000017 Mouse Wearables 289.14
18 P0000018 Headphones Peripherals 251.49
19 P0000019 Headphones Electronics 369.08
20 P0000020 Headphones Electronics 168.56


Here:

Arrays productNames and categories provide sample product names and categories.​

Prices are randomly assigned between $100 and $1000.


1.3 Creating the Transactions Dataset

/*We'll simulate 20 transaction records linking customers to products.*/

data transactions;

    call streaminit(67890);

    length TransactionID $10 CustomerID $8 ProductID $8 TransactionDate  Quantity 8 ;

    do i = 1 to 20;

        TransactionID = cats('T', put(i, z9.));

        CustomerID = cats('C', put(ceil(rand('Uniform') * 1000), z7.));

        ProductID = cats('P', put(ceil(rand('Uniform') * 20), z7.));

        TransactionDate = '01JAN2025'd + floor(rand('Uniform') * 365); /* Random date in 2025 */

        Format TransactionDate Date9.;

        Quantity = ceil(rand('Uniform') * 5); /* Quantity between 1 and 5 */

        output;

    end;

    drop i;

run;

proc print;run;

Output:

Obs TransactionID CustomerID ProductID TransactionDate Quantity
1 T000000001 C0000066 P0000010 31JUL2025 1
2 T000000002 C0000654 P0000007 17SEP2025 3
3 T000000003 C0000970 P0000010 22OCT2025 1
4 T000000004 C0000267 P0000005 28FEB2025 4
5 T000000005 C0000593 P0000015 18MAR2025 1
6 T000000006 C0000506 P0000003 08OCT2025 3
7 T000000007 C0000791 P0000011 18FEB2025 2
8 T000000008 C0000024 P0000001 15MAR2025 5
9 T000000009 C0000940 P0000006 28JUN2025 5
10 T000000010 C0000677 P0000005 23MAR2025 1
11 T000000011 C0000561 P0000015 23JUN2025 5
12 T000000012 C0000509 P0000011 17FEB2025 2
13 T000000013 C0000112 P0000003 22FEB2025 5
14 T000000014 C0000598 P0000019 31MAY2025 3
15 T000000015 C0000820 P0000004 03SEP2025 5
16 T000000016 C0000024 P0000020 02SEP2025 5
17 T000000017 C0000392 P0000010 09SEP2025 4
18 T000000018 C0000596 P0000012 09AUG2025 3
19 T000000019 C0000256 P0000020 04JUL2025 3
20 T000000020 C0000365 P0000018 19AUG2025 4


In this dataset:

TransactionDate is assigned a random date in the year 2025.​

Quantity represents the number of units purchased.


/*To calculate TotalAmount, we'll merge the transactions dataset with the products dataset to access the Price information.*/

proc sql;

    create table transactions as

    select t.*, p.Price, (t.Quantity * p.Price) as TotalAmount

    from transactions t

    left join products p

    on t.ProductID = p.ProductID;

quit;

proc print;run;

Output:

Obs TransactionID CustomerID ProductID TransactionDate Quantity Price TotalAmount
1 T000000008 C0000024 P0000001 15MAR2025 5 800.74 4003.70
2 T000000006 C0000506 P0000003 08OCT2025 3 934.84 2804.52
3 T000000013 C0000112 P0000003 22FEB2025 5 934.84 4674.20
4 T000000015 C0000820 P0000004 03SEP2025 5 323.77 1618.85
5 T000000004 C0000267 P0000005 28FEB2025 4 510.60 2042.40
6 T000000010 C0000677 P0000005 23MAR2025 1 510.60 510.60
7 T000000009 C0000940 P0000006 28JUN2025 5 842.60 4213.00
8 T000000002 C0000654 P0000007 17SEP2025 3 474.75 1424.25
9 T000000003 C0000970 P0000010 22OCT2025 1 828.15 828.15
10 T000000001 C0000066 P0000010 31JUL2025 1 828.15 828.15
11 T000000017 C0000392 P0000010 09SEP2025 4 828.15 3312.60
12 T000000007 C0000791 P0000011 18FEB2025 2 951.82 1903.64
13 T000000012 C0000509 P0000011 17FEB2025 2 951.82 1903.64
14 T000000018 C0000596 P0000012 09AUG2025 3 252.59 757.77
15 T000000011 C0000561 P0000015 23JUN2025 5 758.59 3792.95
16 T000000005 C0000593 P0000015 18MAR2025 1 758.59 758.59
17 T000000020 C0000365 P0000018 19AUG2025 4 251.49 1005.96
18 T000000014 C0000598 P0000019 31MAY2025 3 369.08 1107.24
19 T000000019 C0000256 P0000020 04JUL2025 3 168.56 505.68
20 T000000016 C0000024 P0000020 02SEP2025 5 168.56 842.80


1.4 Creating the Reviews Dataset

/*Finally, we'll simulate 20 product reviews from customers.*/

data reviews;

    call streaminit(98765);

    length ReviewID $10 CustomerID $8 ProductID $8 ReviewDate  Rating 8 Comment $50;

    array comments[5] $50 _temporary_ ('Excellent product!', 'Very satisfied.', 'Average quality.', 'Not as expected.', 'Would not recommend.');

    do i = 1 to 20;

        ReviewID = cats('R', put(i, z9.));

        CustomerID = cats('C', put(ceil(rand('Uniform') * 1000), z7.));

        ProductID = cats('P', put(ceil(rand('Uniform') * 20), z7.));

        ReviewDate = '01JAN2025'd + floor(rand('Uniform') * 365);

        Format ReviewDate date9.;

        Rating = ceil(rand('Uniform') * 5); /* Rating between 1 and 5 */

        Comment = comments[ceil(rand('Uniform') * 5)];

        output;

    end;

    drop i;

run;

proc print;run;

Output:

Obs ReviewID CustomerID ProductID ReviewDate Rating Comment
1 R000000001 C0000506 P0000477 23JUL2025 1 Average quality.
2 R000000002 C0000797 P0000110 19MAY2025 4 Would not recommend.
3 R000000003 C0000684 P0000273 27MAR2025 2 Excellent product!
4 R000000004 C0000299 P0000364 31JAN2025 3 Very satisfied.
5 R000000005 C0000382 P0000414 23JUL2025 5 Would not recommend.
6 R000000006 C0000305 P0000248 20APR2025 5 Would not recommend.
7 R000000007 C0000143 P0000412 06JUN2025 5 Average quality.
8 R000000008 C0000250 P0000022 03JUL2025 5 Not as expected.
9 R000000009 C0000674 P0000290 25SEP2025 1 Average quality.
10 R000000010 C0000195 P0000465 31DEC2025 3 Average quality.
11 R000000011 C0000145 P0000466 18MAR2025 2 Not as expected.
12 R000000012 C0000513 P0000276 06FEB2025 5 Average quality.
13 R000000013 C0000835 P0000428 11MAY2025 1 Would not recommend.
14 R000000014 C0000723 P0000026 17NOV2025 3 Not as expected.
15 R000000015 C0000539 P0000148 29MAR2025 2 Excellent product!
16 R000000016 C0000478 P0000254 15JUN2025 5 Would not recommend.
17 R000000017 C0000441 P0000199 26MAY2025 5 Not as expected.
18 R000000018 C0000845 P0000442 20JAN2025 1 Would not recommend.
19 R000000019 C0000886 P0000047 26JAN2025 3 Excellent product!
20 R000000020 C0000558 P0000010 19FEB2025 3 Would not recommend.


Here:

Rating is a score between 1 and 5.

Comment is selected from a predefined list of feedback statements.


2. Exploring the Dataset with SAS Procedures

/*We can now apply various SAS procedures to analyze and visualize the data.*/

2.1 Viewing Data Structure with PROC CONTENTS

/*To understand the structure of our datasets:*/*/

proc contents data=customers;

run;

Output:

                                                          The CONTENTS Procedure

Data Set Name WORK.CUSTOMERS Observations 15
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 14/09/2015 00:39:39 Observation Length 56
Last Modified 14/09/2015 00:39:39 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1167
Obs in First Data Page 15
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD8792_DESKTOP-QFAA4KV_\customers.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len
5 Age Num 8
1 CustomerID Char 8
3 Gender Char 1
4 Location Char 15
2 Name Char 20


proc contents data=products;

run;

Output:

                                                           The CONTENTS Procedure

Data Set Name WORK.PRODUCTS Observations 20
Member Type DATA Variables 4
Engine V9 Indexes 0
Created 14/09/2015 00:59:01 Observation Length 56
Last Modified 14/09/2015 00:59:01 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1167
Obs in First Data Page 20
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD8792_DESKTOP-QFAA4KV_\products.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len
3 Category Char 15
4 Price Num 8
1 ProductID Char 8
2 ProductName Char 20


proc contents data=transactions;

run;

Output:

                                                             The CONTENTS Procedure

Data Set Name WORK.TRANSACTIONS Observations 20
Member Type DATA Variables 7
Engine V9 Indexes 0
Created 14/09/2015 01:02:52 Observation Length 64
Last Modified 14/09/2015 01:02:52 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1021
Obs in First Data Page 20
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD8792_DESKTOP-QFAA4KV_\transactions.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len Format
2 CustomerID Char 8  
6 Price Num 8  
3 ProductID Char 8  
5 Quantity Num 8  
7 TotalAmount Num 8  
4 TransactionDate Num 8 DATE9.
1 TransactionID Char 10  


Sort Information
Sortedby ProductID
Validated YES
Character Set ANSI


proc contents data=reviews;

run;

Output:

                                                                  The CONTENTS Procedure

Data Set Name WORK.REVIEWS Observations 20
Member Type DATA Variables 6
Engine V9 Indexes 0
Created 14/09/2015 01:05:00 Observation Length 96
Last Modified 14/09/2015 01:05:00 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 681
Obs in First Data Page 20
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\Lenovo\AppData\Local\Temp\SAS Temporary Files\_TD8792_DESKTOP-QFAA4KV_\reviews.sas7bdat
Release Created 9.0401M2
Host Created X64_8HOME


Alphabetic List of Variables and Attributes
# Variable Type Len Format
6 Comment Char 50  
2 CustomerID Char 8  
3 ProductID Char 8  
5 Rating Num 8  
4 ReviewDate Num 8 DATE9.
1 ReviewID Char 10  


*/PROC CONTENTS provides metadata about each dataset, including variable names, types, and lengths.*/


2.2 Summarizing Data with PROC MEANS

/*To obtain descriptive statistics:*/

proc means data=customers;

    var Age;

run;

Output:

                                                               The MEANS Procedure

Analysis Variable : Age
N Mean Std Dev Minimum Maximum
15 49.5333333 17.3282044 18.0000000 70.0000000

proc means data=products;

    var Price;

run;

Output:

                                                                    The MEANS Procedure

Analysis Variable : Price
N Mean Std Dev Minimum Maximum
20 509.3020000 285.6575398 149.8700000 951.8200000

proc means data=transactions;

    var Quantity TotalAmount;

run;

Output:

                                                           The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum
Quantity
TotalAmount
20
20
3.2500000
1941.93
1.5174424
1366.60
1.0000000
505.6800000
5.0000000
4674.20


/*PROC MEANS calculates statistics like mean, standard deviation, and range for numeric variables.*/


2.3 Frequency Analysis with PROC FREQ

/*To analyze categorical data distributions:*/

proc freq data=customers;

    tables Gender Location;

run;

 Output:

                                                                 The FREQ Procedure

Gender Frequency Percent Cumulative
Frequency
Cumulative
Percent
F 6 40.00 6 40.00
M 9 60.00 15 100.00


Location Frequency Percent Cumulative
Frequency
Cumulative
Percent
Chicago 2 13.33 2 13.33
Houston 3 20.00 5 33.33
Los Angeles 6 40.00 11 73.33
Phoenix 4 26.67 15 100.00


proc freq data=products;

    tables category;

run;

Output:

                                                              The FREQ Procedure

Category Frequency Percent Cumulative
Frequency
Cumulative
Percent
Accessories 2 10.00 2 10.00
Electronics 7 35.00 9 45.00
Office Supplies 2 10.00 11 55.00
Peripherals 4 20.00 15 75.00
Wearables 5 25.00 20 100.00


proc freq data=transactions;

    tables quantity;

run;

Output:

                                                                      The FREQ Procedure

Quantity Frequency Percent Cumulative
Frequency
Cumulative
Percent
1 4 20.00 4 20.00
2 2 10.00 6 30.00
3 5 25.00 11 55.00
4 3 15.00 14 70.00
5 6 30.00 20 100.00


PRACTICE AND COMMENT YOUR CODE: 

-->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

TO FOLLOW OUR TELEGRAM CHANNEL CLICK HERE


Comments