433.Can Advanced R Programming Detect, Clean, and Optimize Public Library Data While Identifying Fraud Patterns?
Transforming Public Library Data into Reliable Insights with Advanced R-Based Fraud Analytics
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HERE IN THIS PROJECT WE USED THESE SAS & R TECHNIQUES —R DATA FRAME | DPLYR | RBIND / BIND_ROWS | MERGE | DATE FUNCTIONS | STRING FUNCTIONS | NUMERIC FUNCTIONS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Introduction
Public
libraries are evolving into hybrid knowledge hubs combining physical books,
digital resources, and community services. With increasing operational
complexity—membership growth, digital usage tracking, cost management—data
quality becomes critical. However, real-world datasets are rarely clean. They
often contain inconsistencies, missing values, formatting errors, and even fraudulent
manipulations.
In this
project-style blog, we simulate a Public Library dataset in R,
deliberately introduce errors, and then systematically detect, clean,
transform, and optimize the dataset using advanced R techniques. We also
implement fraud detection logic using macros (functions in R) and
demonstrate practical usage of date functions, string handling, numeric
transformations, and dataset reshaping.
This
guide is structured like a real-world analytics pipeline and is designed to
help you learn, apply, and explain concepts in interviews or projects.
Table of Contents
- Business Context
- Dataset Creation
- Introducing Errors
- Error Detection
- Data Cleaning
- Date Functions (mdy, intck,
intnx equivalents in R)
- Character Functions
- Numeric Functions
- Dataset Operations (merge,
append, transpose)
- Fraud Detection Logic
(Macros in R)
- Utilization Classification
- Advanced Transformations
- Final Clean Dataset
- Summary & Conclusion
Business Context
Public
libraries generate data from multiple systems: book inventory, membership
records, digital usage logs, and financial systems. Analysts must:
·
Monitor utilization trends
·
Detect fraudulent activities (fake memberships, inflated usage)
·
Optimize operating costs
· Improve service
scores
1. Dataset Creation
library(dplyr)
library_data <- data.frame(
Library_Name = c("central library","city hub","metro reads","knowledge zone",
"urban library","smart reads","book galaxy","readers point",
"study sphere","infinite pages","wisdom house","learners den",
"digital nest","library x","book corner","elite library"),
City = c("hyderabad","vijayawada","chennai","bangalore",
"delhi","mumbai","kolkata","pune",
"jaipur", "lucknow", "bhopal", "nagpur",
"kochi","indore","surat","patna"),
Books_Count = c(50000,30000,45000, 60000, 55000, NA, 70000, 40000,
38000, 42000, 46000, 48000, 52000, 51000, -1000, 53000),
Members = c(2000, 1500, 1800, 2200, 2100, 1900, 2500, 1700,
1600, 1750, 1850, 1950, 2050, 2150, 0, 2250),
Digital_Usage = c(70, 60, 75, 80, 78, 85, 90, 65,
68, 72, 74, 76, 77, 79, 200, 81),
Operating_Cost = c(200000, 150000, 180000, 220000, 210000, 190000, 250000,
170000, 160000, 175000, 185000, 195000, 205000, 215000,
10000, 225000),
Service_Score = c(4.5, 4.2, 4.6, 4.8, 4.7, 4.9, 5.0, 4.3,
4.1, 4.4, 4.6, 4.7, 4.8, 4.9, 1.0, 5.0),
Fees = c(100, 80, 90, 110, 105, 95, 120, 85,
88, 92, 96, 98, 102, 108, 0, 115),
Join_Date = c("01-01-2020","02-02-2021","03-03-2022","04-04-2023",
"05-05-2020","06-06-2021","07-07-2022","08-08-2023",
"09-09-2020","10-10-2021","11-11-2022","12-12-2023",
"01-01-2024","02-02-2024","03-03-2024","04-04-2024")
)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
|
1 |
central library |
hyderabad |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
|
2 |
city hub |
vijayawada |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
|
3 |
metro reads |
chennai |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
|
4 |
knowledge zone |
bangalore |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
|
5 |
urban library |
delhi |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
|
6 |
smart reads |
mumbai |
NA |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
|
7 |
book galaxy |
kolkata |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
|
8 |
readers point |
pune |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
|
9 |
study sphere |
jaipur |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
|
10 |
infinite pages |
lucknow |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
|
11 |
wisdom house |
bhopal |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
|
12 |
learners den |
nagpur |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
|
13 |
digital nest |
kochi |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
|
14 |
library x |
indore |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
|
15 |
book corner |
surat |
-1000 |
0 |
200 |
10000 |
1 |
0 |
03-03-2024 |
|
16 |
elite library |
patna |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Explanation
· Created
dataset with 16 observations
· Includes
numeric, character, and date fields
· dplyr → for
filtering, transforming, summarizing
Key Points
· Real-world
simulation
· Mixed data
types
· Intentional
anomalies included
2.Intentional Errors Introduced
Errors:
· Missing
value (NA)
· Negative
Books_Count
· Zero
Members
· Digital_Usage > 100
· Very low
Operating Cost
· Poor
Service Score anomaly
3.Error Detection
numeric_cols <- library_data[, sapply(library_data, is.numeric)]
OUTPUT:
|
|
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
|
1 |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
|
2 |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
|
3 |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
|
4 |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
|
5 |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
|
6 |
NA |
1900 |
85 |
190000 |
4.9 |
95 |
|
7 |
70000 |
2500 |
90 |
250000 |
5 |
120 |
|
8 |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
|
9 |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
|
10 |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
|
11 |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
|
12 |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
|
13 |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
|
14 |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
|
15 |
-1000 |
0 |
200 |
10000 |
1 |
0 |
|
16 |
53000 |
2250 |
81 |
225000 |
5 |
115 |
summary_output_df <- data.frame(
Variable = names(numeric_cols),
Min = sapply(numeric_cols, min, na.rm=TRUE),
Mean = sapply(numeric_cols, mean, na.rm=TRUE),
Max = sapply(numeric_cols, max, na.rm=TRUE)
)
View(summary_output_df)
OUTPUT:
|
|
Variable |
Min |
Mean |
Max |
|
Books_Count |
Books_Count |
-1000 |
45266.66667 |
70000 |
|
Members |
Members |
0 |
1831.25 |
2500 |
|
Digital_Usage |
Digital_Usage |
60 |
83.125 |
200 |
|
Operating_Cost |
Operating_Cost |
10000 |
183750 |
250000 |
|
Service_Score |
Service_Score |
1 |
4.40625 |
5 |
|
Fees |
Fees |
0 |
92.75 |
120 |
Explanation
· Provides statistical summary
· Helps identify abnormal values
Key Points
· Detects NA
· Highlights
outliers
· Quick diagnostic tool
Why This IS Error
Detection (Partially)
Your table clearly exposes issues:
1.
Books_Count = -1000
·
Invalid (books cannot be negative)
2. Members = 0
·
Suspicious (library must have
members)
3.
Digital_Usage = 200
·
Invalid
(>100%)
4.
Operating_Cost = 10000
·
Too
low compared to others (outlier)
5.
Service_Score = 1
·
Extremely
low (possible anomaly)
6.
Fees = 0
·
Suspicious
(may be free or error)
4.Data Cleaning
library_data_clean <- library_data %>%
mutate(
Books_Count = ifelse(Books_Count < 0| is.na(Books_Count),40000,Books_Count),
Members = ifelse(Members == 0,1500, Members),
Digital_Usage = ifelse(Digital_Usage > 100,85,Digital_Usage),
Operating_Cost = ifelse(Operating_Cost < 50000, 150000, Operating_Cost)
)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
|
1 |
central library |
hyderabad |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
|
2 |
city hub |
vijayawada |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
|
3 |
metro reads |
chennai |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
|
4 |
knowledge zone |
bangalore |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
|
5 |
urban library |
delhi |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
|
6 |
smart reads |
mumbai |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
|
7 |
book galaxy |
kolkata |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
|
8 |
readers point |
pune |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
|
9 |
study sphere |
jaipur |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
|
10 |
infinite pages |
lucknow |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
|
11 |
wisdom house |
bhopal |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
|
12 |
learners den |
nagpur |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
|
13 |
digital nest |
kochi |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
|
14 |
library x |
indore |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
|
15 |
book corner |
surat |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
|
16 |
elite library |
patna |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Explanation
· Fix invalid
values using conditions
Key Points
· ifelse() used for correction
· NA handled
· Logical
validation rules applied
5.Character Functions
library_data_clean$Library_Name <- tools::toTitleCase(library_data_clean$Library_Name)
library_data_clean$City <- toupper(library_data_clean$City)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Explanation
· Converts text
formatting
Key Points
· toupper() = UPCASE (ABC)
· tolower() = LOWCASE (abc)
· toTitleCase() = PROPER CASE (Abc)
6.Trim / Strip Functions
library_data_clean$Library_Name <- trimws(library_data_clean$Library_Name)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Explanation
· Removes unwanted spaces
7.Concatenation
library_data_clean$Library_ID <- paste0("LIB_",1:nrow(library_data_clean))
library_data_clean$Full_Info <- paste(library_data_clean$Library_Name,
library_data_clean$City,sep = "_")
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Library_ID |
Full_Info |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
LIB_1 |
Central Library_HYDERABAD |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
LIB_2 |
City Hub_VIJAYAWADA |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
LIB_3 |
Metro Reads_CHENNAI |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
LIB_4 |
Knowledge Zone_BANGALORE |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
LIB_5 |
Urban Library_DELHI |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
LIB_6 |
Smart Reads_MUMBAI |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
LIB_7 |
Book Galaxy_KOLKATA |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
LIB_8 |
Readers Point_PUNE |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
LIB_9 |
Study Sphere_JAIPUR |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
LIB_10 |
Infinite Pages_LUCKNOW |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
LIB_11 |
Wisdom House_BHOPAL |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
LIB_12 |
Learners Den_NAGPUR |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
LIB_13 |
Digital Nest_KOCHI |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
LIB_14 |
Library x_INDORE |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
LIB_15 |
Book Corner_SURAT |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
LIB_16 |
Elite Library_PATNA |
Explanation
· Equivalent to CAT, CATX
8.Date Conversion (MDY Equivalent)
library_data_clean$Join_Date <- as.Date(library_data_clean$Join_Date, format="%d-%m-%Y")
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Library_ID |
Full_Info |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
LIB_1 |
Central Library_HYDERABAD |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
LIB_2 |
City Hub_VIJAYAWADA |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
LIB_3 |
Metro Reads_CHENNAI |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
LIB_4 |
Knowledge Zone_BANGALORE |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
LIB_5 |
Urban Library_DELHI |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
LIB_6 |
Smart Reads_MUMBAI |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
LIB_7 |
Book Galaxy_KOLKATA |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
LIB_8 |
Readers Point_PUNE |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
LIB_9 |
Study Sphere_JAIPUR |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
LIB_10 |
Infinite Pages_LUCKNOW |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
LIB_11 |
Wisdom House_BHOPAL |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
LIB_12 |
Learners Den_NAGPUR |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
LIB_13 |
Digital Nest_KOCHI |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
LIB_14 |
Library x_INDORE |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
LIB_15 |
Book Corner_SURAT |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
LIB_16 |
Elite Library_PATNA |
install.packages("lubridate")
library(lubridate)
library_data$Join_Date <- dmy(library_data$Join_Date)
OUTPUT:
| Library_Name | City | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Fees | Join_Date | Library_ID | Full_Info |
1 | Central Library | HYDERABAD | 50000 | 2000 | 70 | 200000 | 4.5 | 100 | 01-01-2020 | LIB_1 | Central Library_HYDERABAD |
2 | City Hub | VIJAYAWADA | 30000 | 1500 | 60 | 150000 | 4.2 | 80 | 02-02-2021 | LIB_2 | City Hub_VIJAYAWADA |
3 | Metro Reads | CHENNAI | 45000 | 1800 | 75 | 180000 | 4.6 | 90 | 03-03-2022 | LIB_3 | Metro Reads_CHENNAI |
4 | Knowledge Zone | BANGALORE | 60000 | 2200 | 80 | 220000 | 4.8 | 110 | 04-04-2023 | LIB_4 | Knowledge Zone_BANGALORE |
5 | Urban Library | DELHI | 55000 | 2100 | 78 | 210000 | 4.7 | 105 | 05-05-2020 | LIB_5 | Urban Library_DELHI |
6 | Smart Reads | MUMBAI | 40000 | 1900 | 85 | 190000 | 4.9 | 95 | 06-06-2021 | LIB_6 | Smart Reads_MUMBAI |
7 | Book Galaxy | KOLKATA | 70000 | 2500 | 90 | 250000 | 5 | 120 | 07-07-2022 | LIB_7 | Book Galaxy_KOLKATA |
8 | Readers Point | PUNE | 40000 | 1700 | 65 | 170000 | 4.3 | 85 | 08-08-2023 | LIB_8 | Readers Point_PUNE |
9 | Study Sphere | JAIPUR | 38000 | 1600 | 68 | 160000 | 4.1 | 88 | 09-09-2020 | LIB_9 | Study Sphere_JAIPUR |
10 | Infinite Pages | LUCKNOW | 42000 | 1750 | 72 | 175000 | 4.4 | 92 | 10-10-2021 | LIB_10 | Infinite Pages_LUCKNOW |
11 | Wisdom House | BHOPAL | 46000 | 1850 | 74 | 185000 | 4.6 | 96 | 11-11-2022 | LIB_11 | Wisdom House_BHOPAL |
12 | Learners Den | NAGPUR | 48000 | 1950 | 76 | 195000 | 4.7 | 98 | 12-12-2023 | LIB_12 | Learners Den_NAGPUR |
13 | Digital Nest | KOCHI | 52000 | 2050 | 77 | 205000 | 4.8 | 102 | 01-01-2024 | LIB_13 | Digital Nest_KOCHI |
14 | Library x | INDORE | 51000 | 2150 | 79 | 215000 | 4.9 | 108 | 02-02-2024 | LIB_14 | Library x_INDORE |
15 | Book Corner | SURAT | 40000 | 1500 | 85 | 150000 | 1 | 0 | 03-03-2024 | LIB_15 | Book Corner_SURAT |
16 | Elite Library | PATNA | 53000 | 2250 | 81 | 225000 | 5 | 115 | 04-04-2024 | LIB_16 | Elite Library_PATNA |
Explanation
· Converts string to Date format.
Why Used
· Required for
time calculations.
· lubridate → for handling dates like SAS INTCK,
INTNX
9.INTCK Equivalent (Difference in Years)
#Method 1:
library_data_clean$Years_Active <- as.numeric(difftime(Sys.Date(),
library_data_clean$Join_Date,
units = "days"))/365
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Library_ID |
Full_Info |
Years_Active |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
LIB_1 |
Central Library_HYDERABAD |
6.235616 |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
LIB_2 |
City Hub_VIJAYAWADA |
5.145205 |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
LIB_3 |
Metro Reads_CHENNAI |
4.065753 |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
LIB_4 |
Knowledge Zone_BANGALORE |
2.978082 |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
LIB_5 |
Urban Library_DELHI |
5.893151 |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
LIB_6 |
Smart Reads_MUMBAI |
4.805479 |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
LIB_7 |
Book Galaxy_KOLKATA |
3.720548 |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
LIB_8 |
Readers Point_PUNE |
2.632877 |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
LIB_9 |
Study Sphere_JAIPUR |
5.545205 |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
LIB_10 |
Infinite Pages_LUCKNOW |
4.460274 |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
LIB_11 |
Wisdom House_BHOPAL |
3.372603 |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
LIB_12 |
Learners Den_NAGPUR |
2.287671 |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
LIB_13 |
Digital Nest_KOCHI |
2.232877 |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
LIB_14 |
Library x_INDORE |
2.145205 |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
LIB_15 |
Book Corner_SURAT |
2.063014 |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
LIB_16 |
Elite Library_PATNA |
1.975342 |
#Method 2:
library_data$Years_Active <- interval(library_data$Join_Date,Sys.Date()) %/% years(1)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Years_Active |
|
1 |
central library |
hyderabad |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
6 |
|
2 |
city hub |
vijayawada |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
5 |
|
3 |
metro reads |
chennai |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
4 |
|
4 |
knowledge zone |
bangalore |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
2 |
|
5 |
urban library |
delhi |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
5 |
|
6 |
smart reads |
mumbai |
NA |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
4 |
|
7 |
book galaxy |
kolkata |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
3 |
|
8 |
readers point |
pune |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
2 |
|
9 |
study sphere |
jaipur |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
5 |
|
10 |
infinite pages |
lucknow |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
4 |
|
11 |
wisdom house |
bhopal |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
3 |
|
12 |
learners den |
nagpur |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
2 |
|
13 |
digital nest |
kochi |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
2 |
|
14 |
library x |
indore |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
2 |
|
15 |
book corner |
surat |
-1000 |
0 |
200 |
10000 |
1 |
0 |
03-03-2024 |
2 |
|
16 |
elite library |
patna |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
1 |
Explanation
·
Calculates difference in years.
Why Used
· Used for
tenure analysis.
10.INTNX Equivalent (Add Months)
library_data_clean$Next_Review_Date <- library_data_clean$Join_Date %m+% months(6)
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Library_ID |
Full_Info |
Years_Active |
Next_Review_Date |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
LIB_1 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
LIB_2 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
LIB_3 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
LIB_4 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
LIB_5 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
LIB_6 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
LIB_7 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
LIB_8 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
LIB_9 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
LIB_10 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
LIB_11 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
LIB_12 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
LIB_13 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
LIB_14 |
Library x_INDORE |
2.145205 |
02-08-2024 |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
LIB_15 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
LIB_16 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
Explanation
·
Adds 6 months to date.
Why Used
· Used for
scheduling reviews.
11.Utilization Classification
library_data_clean <- library_data_clean %>%
mutate(Utilization = case_when(
Digital_Usage >= 80 ~ "High",
Digital_Usage >= 60 ~ "Medium",
TRUE ~ "Low"
))
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Library_ID |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
LIB_1 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
LIB_2 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
Medium |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
LIB_3 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
Medium |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
LIB_4 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
High |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
LIB_5 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
Medium |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
LIB_6 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
High |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
LIB_7 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
High |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
LIB_8 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
Medium |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
LIB_9 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
Medium |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
LIB_10 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
LIB_11 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
Medium |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
LIB_12 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
Medium |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
LIB_13 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
Medium |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
LIB_14 |
Library x_INDORE |
2.145205 |
02-08-2024 |
Medium |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
LIB_15 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
High |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
LIB_16 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
High |
Explanation
·
Categorizes usage levels.
Why Used
·
Business segmentation.
12.Merge (JOIN)
extra_data <- data.frame(Library_ID = paste0("LIB_", 1:16),
Rating = runif(16, 3, 5))
OUTPUT:
|
Library_ID |
Rating |
|
LIB_1 |
4.851899 |
|
LIB_2 |
4.228925 |
|
LIB_3 |
4.307081 |
|
LIB_4 |
3.179175 |
|
LIB_5 |
4.741454 |
|
LIB_6 |
3.896237 |
|
LIB_7 |
3.470136 |
|
LIB_8 |
4.801577 |
|
LIB_9 |
3.805064 |
|
LIB_10 |
4.358088 |
|
LIB_11 |
4.875048 |
|
LIB_12 |
4.822165 |
|
LIB_13 |
4.521825 |
|
LIB_14 |
4.911692 |
|
LIB_15 |
3.965751 |
|
LIB_16 |
4.496549 |
merge_data <- merge(library_data_clean,extra_data,by="Library_ID")
OUTPUT:
|
|
Library_ID |
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
Rating |
|
1 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
|
2 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
|
3 |
LIB_11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
Medium |
4.875048 |
|
4 |
LIB_12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
Medium |
4.822165 |
|
5 |
LIB_13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
Medium |
4.521825 |
|
6 |
LIB_14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
Library x_INDORE |
2.145205 |
02-08-2024 |
Medium |
4.911692 |
|
7 |
LIB_15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
High |
3.965751 |
|
8 |
LIB_16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
High |
4.496549 |
|
9 |
LIB_2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
Medium |
4.228925 |
|
10 |
LIB_3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
Medium |
4.307081 |
|
11 |
LIB_4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
High |
3.179175 |
|
12 |
LIB_5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
Medium |
4.741454 |
|
13 |
LIB_6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
High |
3.896237 |
|
14 |
LIB_7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
High |
3.470136 |
|
15 |
LIB_8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
Medium |
4.801577 |
|
16 |
LIB_9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
Medium |
3.805064 |
Explanation
·
Combines datasets.
Why Used
·
Adds external attributes.
13.Append (Row Bind)
new_data <- merge_data[1:2,]
OUTPUT:
|
|
Library_ID |
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
Rating |
|
1 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
|
2 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
appended_data <- rbind(merge_data, new_data)
OUTPUT:
|
|
Library_ID |
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
Rating |
|
1 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
|
2 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
|
3 |
LIB_11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
Medium |
4.875048 |
|
4 |
LIB_12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
Medium |
4.822165 |
|
5 |
LIB_13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
Medium |
4.521825 |
|
6 |
LIB_14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
Library x_INDORE |
2.145205 |
02-08-2024 |
Medium |
4.911692 |
|
7 |
LIB_15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
High |
3.965751 |
|
8 |
LIB_16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
High |
4.496549 |
|
9 |
LIB_2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
Medium |
4.228925 |
|
10 |
LIB_3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
Medium |
4.307081 |
|
11 |
LIB_4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
High |
3.179175 |
|
12 |
LIB_5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
Medium |
4.741454 |
|
13 |
LIB_6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
High |
3.896237 |
|
14 |
LIB_7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
High |
3.470136 |
|
15 |
LIB_8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
Medium |
4.801577 |
|
16 |
LIB_9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
Medium |
3.805064 |
|
17 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
|
18 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
Explanation
·
Adds duplicate rows.
Why Used
·
Testing scenarios.
14.Transpose
transposed <- t(appended_data[,4:8])
OUTPUT:
|
|
V1 |
V2 |
V3 |
V4 |
V5 |
V6 |
V7 |
V8 |
V9 |
V10 |
V11 |
V12 |
V13 |
V14 |
V15 |
V16 |
V17 |
V18 |
|
Books_Count |
5.00E+04 |
42000 |
46000 |
48000 |
52000 |
51000 |
40000 |
53000 |
3.00E+04 |
4.50E+04 |
6.00E+04 |
5.50E+04 |
4.00E+04 |
70000 |
4.00E+04 |
3.80E+04 |
5.00E+04 |
42000 |
|
Members |
2.00E+03 |
1750 |
1850 |
1950 |
2050 |
2150 |
1500 |
2250 |
1.50E+03 |
1.80E+03 |
2.20E+03 |
2.10E+03 |
1.90E+03 |
2500 |
1.70E+03 |
1.60E+03 |
2.00E+03 |
1750 |
|
Digital_Usage |
7.00E+01 |
72 |
74 |
76 |
77 |
79 |
85 |
81 |
6.00E+01 |
7.50E+01 |
8.00E+01 |
7.80E+01 |
8.50E+01 |
90 |
6.50E+01 |
6.80E+01 |
7.00E+01 |
72 |
|
Operating_Cost |
2.00E+05 |
175000 |
185000 |
195000 |
205000 |
215000 |
150000 |
225000 |
1.50E+05 |
1.80E+05 |
2.20E+05 |
2.10E+05 |
1.90E+05 |
250000 |
1.70E+05 |
1.60E+05 |
2.00E+05 |
175000 |
|
Service_Score |
4.50E+00 |
4.4 |
4.6 |
4.7 |
4.8 |
4.9 |
1 |
5 |
4.20E+00 |
4.60E+00 |
4.80E+00 |
4.70E+00 |
4.90E+00 |
5 |
4.30E+00 |
4.10E+00 |
4.50E+00 |
4.4 |
Explanation
·
Switches rows and columns.
Why Used
·
Used in reporting formats.
15.Fraud Detection Logic (Macro Equivalent)
fraud_check <- function(df){
df %>%
mutate(Fraud_Flag = ifelse(
Digital_Usage > 90 & Members < 1000 |
Operating_Cost > 240000 |
Service_Score < 2,
"Fraud" , "Clean"
))
}
final_data <- fraud_check(appended_data)
OUTPUT:
|
|
Library_ID |
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
Rating |
Fraud_Flag |
|
1 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
Clean |
|
2 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
Clean |
|
3 |
LIB_11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
Medium |
4.875048 |
Clean |
|
4 |
LIB_12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
Medium |
4.822165 |
Clean |
|
5 |
LIB_13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
Medium |
4.521825 |
Clean |
|
6 |
LIB_14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
Library x_INDORE |
2.145205 |
02-08-2024 |
Medium |
4.911692 |
Clean |
|
7 |
LIB_15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
High |
3.965751 |
Fraud |
|
8 |
LIB_16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
High |
4.496549 |
Clean |
|
9 |
LIB_2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
Medium |
4.228925 |
Clean |
|
10 |
LIB_3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
Medium |
4.307081 |
Clean |
|
11 |
LIB_4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
High |
3.179175 |
Clean |
|
12 |
LIB_5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
Medium |
4.741454 |
Clean |
|
13 |
LIB_6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
High |
3.896237 |
Clean |
|
14 |
LIB_7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
High |
3.470136 |
Fraud |
|
15 |
LIB_8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
Medium |
4.801577 |
Clean |
|
16 |
LIB_9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
Medium |
3.805064 |
Clean |
|
17 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
Clean |
|
18 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
Clean |
Explanation
· Function acts like SAS macro.
·
Flags suspicious patterns.
· Flags suspicious records.
Why Used
· Automates fraud detection logic.
16.Numeric Functions
final_data$Cost_Per_Member <- final_data$Operating_Cost / final_data$Members
final_data$Log_Books <- log(final_data$Books_Count)
OUTPUT:
|
|
Library_ID |
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Full_Info |
Years_Active |
Next_Review_Date |
Utilization |
Rating |
Fraud_Flag |
Cost_Per_Member |
Log_Books |
|
1 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
Clean |
100 |
10.81978 |
|
2 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
Clean |
100 |
10.64542 |
|
3 |
LIB_11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
Wisdom House_BHOPAL |
3.372603 |
11-05-2023 |
Medium |
4.875048 |
Clean |
100 |
10.7364 |
|
4 |
LIB_12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
Learners Den_NAGPUR |
2.287671 |
12-06-2024 |
Medium |
4.822165 |
Clean |
100 |
10.77896 |
|
5 |
LIB_13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
Digital Nest_KOCHI |
2.232877 |
01-07-2024 |
Medium |
4.521825 |
Clean |
100 |
10.859 |
|
6 |
LIB_14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
Library x_INDORE |
2.145205 |
02-08-2024 |
Medium |
4.911692 |
Clean |
100 |
10.83958 |
|
7 |
LIB_15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
Book Corner_SURAT |
2.063014 |
03-09-2024 |
High |
3.965751 |
Fraud |
100 |
10.59663 |
|
8 |
LIB_16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
Elite Library_PATNA |
1.975342 |
04-10-2024 |
High |
4.496549 |
Clean |
100 |
10.87805 |
|
9 |
LIB_2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
City Hub_VIJAYAWADA |
5.145205 |
02-08-2021 |
Medium |
4.228925 |
Clean |
100 |
10.30895 |
|
10 |
LIB_3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
Metro Reads_CHENNAI |
4.065753 |
03-09-2022 |
Medium |
4.307081 |
Clean |
100 |
10.71442 |
|
11 |
LIB_4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
Knowledge Zone_BANGALORE |
2.978082 |
04-10-2023 |
High |
3.179175 |
Clean |
100 |
11.0021 |
|
12 |
LIB_5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
Urban Library_DELHI |
5.893151 |
05-11-2020 |
Medium |
4.741454 |
Clean |
100 |
10.91509 |
|
13 |
LIB_6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
Smart Reads_MUMBAI |
4.805479 |
06-12-2021 |
High |
3.896237 |
Clean |
100 |
10.59663 |
|
14 |
LIB_7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
Book Galaxy_KOLKATA |
3.720548 |
07-01-2023 |
High |
3.470136 |
Fraud |
100 |
11.15625 |
|
15 |
LIB_8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
Readers Point_PUNE |
2.632877 |
08-02-2024 |
Medium |
4.801577 |
Clean |
100 |
10.59663 |
|
16 |
LIB_9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
Study Sphere_JAIPUR |
5.545205 |
09-03-2021 |
Medium |
3.805064 |
Clean |
100 |
10.54534 |
|
17 |
LIB_1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
Central Library_HYDERABAD |
6.235616 |
01-07-2020 |
Medium |
4.851899 |
Clean |
100 |
10.81978 |
|
18 |
LIB_10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
Infinite Pages_LUCKNOW |
4.460274 |
10-04-2022 |
Medium |
4.358088 |
Clean |
100 |
10.64542 |
Explanation
· Calculates cost efficiency.
Why Used
· Optimization metric.
17.Coalesce Equivalent
final_data$Books_Count <- ifelse(is.na(final_data$Books_Count),40000,
final_data$Books_Count)
OUTPUT:
| Library_ID | Library_Name | City | Books_Count | Members | Digital_Usage | Operating_Cost | Service_Score | Fees | Join_Date | Full_Info | Years_Active | Next_Review_Date | Utilization | Rating | Fraud_Flag | Cost_Per_Member | Log_Books |
1 | LIB_1 | Central Library | HYDERABAD | 50000 | 2000 | 70 | 200000 | 4.5 | 100 | 01-01-2020 | Central Library_HYDERABAD | 6.235616 | 01-07-2020 | Medium | 4.851899 | Clean | 100 | 10.81978 |
2 | LIB_10 | Infinite Pages | LUCKNOW | 42000 | 1750 | 72 | 175000 | 4.4 | 92 | 10-10-2021 | Infinite Pages_LUCKNOW | 4.460274 | 10-04-2022 | Medium | 4.358088 | Clean | 100 | 10.64542 |
3 | LIB_11 | Wisdom House | BHOPAL | 46000 | 1850 | 74 | 185000 | 4.6 | 96 | 11-11-2022 | Wisdom House_BHOPAL | 3.372603 | 11-05-2023 | Medium | 4.875048 | Clean | 100 | 10.7364 |
4 | LIB_12 | Learners Den | NAGPUR | 48000 | 1950 | 76 | 195000 | 4.7 | 98 | 12-12-2023 | Learners Den_NAGPUR | 2.287671 | 12-06-2024 | Medium | 4.822165 | Clean | 100 | 10.77896 |
5 | LIB_13 | Digital Nest | KOCHI | 52000 | 2050 | 77 | 205000 | 4.8 | 102 | 01-01-2024 | Digital Nest_KOCHI | 2.232877 | 01-07-2024 | Medium | 4.521825 | Clean | 100 | 10.859 |
6 | LIB_14 | Library x | INDORE | 51000 | 2150 | 79 | 215000 | 4.9 | 108 | 02-02-2024 | Library x_INDORE | 2.145205 | 02-08-2024 | Medium | 4.911692 | Clean | 100 | 10.83958 |
7 | LIB_15 | Book Corner | SURAT | 40000 | 1500 | 85 | 150000 | 1 | 0 | 03-03-2024 | Book Corner_SURAT | 2.063014 | 03-09-2024 | High | 3.965751 | Fraud | 100 | 10.59663 |
8 | LIB_16 | Elite Library | PATNA | 53000 | 2250 | 81 | 225000 | 5 | 115 | 04-04-2024 | Elite Library_PATNA | 1.975342 | 04-10-2024 | High | 4.496549 | Clean | 100 | 10.87805 |
9 | LIB_2 | City Hub | VIJAYAWADA | 30000 | 1500 | 60 | 150000 | 4.2 | 80 | 02-02-2021 | City Hub_VIJAYAWADA | 5.145205 | 02-08-2021 | Medium | 4.228925 | Clean | 100 | 10.30895 |
10 | LIB_3 | Metro Reads | CHENNAI | 45000 | 1800 | 75 | 180000 | 4.6 | 90 | 03-03-2022 | Metro Reads_CHENNAI | 4.065753 | 03-09-2022 | Medium | 4.307081 | Clean | 100 | 10.71442 |
11 | LIB_4 | Knowledge Zone | BANGALORE | 60000 | 2200 | 80 | 220000 | 4.8 | 110 | 04-04-2023 | Knowledge Zone_BANGALORE | 2.978082 | 04-10-2023 | High | 3.179175 | Clean | 100 | 11.0021 |
12 | LIB_5 | Urban Library | DELHI | 55000 | 2100 | 78 | 210000 | 4.7 | 105 | 05-05-2020 | Urban Library_DELHI | 5.893151 | 05-11-2020 | Medium | 4.741454 | Clean | 100 | 10.91509 |
13 | LIB_6 | Smart Reads | MUMBAI | 40000 | 1900 | 85 | 190000 | 4.9 | 95 | 06-06-2021 | Smart Reads_MUMBAI | 4.805479 | 06-12-2021 | High | 3.896237 | Clean | 100 | 10.59663 |
14 | LIB_7 | Book Galaxy | KOLKATA | 70000 | 2500 | 90 | 250000 | 5 | 120 | 07-07-2022 | Book Galaxy_KOLKATA | 3.720548 | 07-01-2023 | High | 3.470136 | Fraud | 100 | 11.15625 |
15 | LIB_8 | Readers Point | PUNE | 40000 | 1700 | 65 | 170000 | 4.3 | 85 | 08-08-2023 | Readers Point_PUNE | 2.632877 | 08-02-2024 | Medium | 4.801577 | Clean | 100 | 10.59663 |
16 | LIB_9 | Study Sphere | JAIPUR | 38000 | 1600 | 68 | 160000 | 4.1 | 88 | 09-09-2020 | Study Sphere_JAIPUR | 5.545205 | 09-03-2021 | Medium | 3.805064 | Clean | 100 | 10.54534 |
17 | LIB_1 | Central Library | HYDERABAD | 50000 | 2000 | 70 | 200000 | 4.5 | 100 | 01-01-2020 | Central Library_HYDERABAD | 6.235616 | 01-07-2020 | Medium | 4.851899 | Clean | 100 | 10.81978 |
18 | LIB_10 | Infinite Pages | LUCKNOW | 42000 | 1750 | 72 | 175000 | 4.4 | 92 | 10-10-2021 | Infinite Pages_LUCKNOW | 4.460274 | 10-04-2022 | Medium | 4.358088 | Clean | 100 | 10.64542 |
18.Delete Dataset
rm(extra_data,new_data)
CONSOLE:
> rm(extra_data,new_data)
19.Final Clean Dataset Code
final_pipeline <- function(){
df <- library_data
df <- df %>%
mutate(
Books_Count=ifelse(Books_Count < 0 | is.na(Books_Count),40000,Books_Count),
Members = ifelse(Members == 0,1500, Members),
Digital_Usage = ifelse(Digital_Usage > 100,85, Digital_Usage),
Operating_Cost = ifelse(Operating_Cost < 50000,150000,Operating_Cost)
)
df$Library_Name <- tools::toTitleCase(trimws(df$Library_Name))
df$City <- toupper(df$City)
df$Join_Date <- as.Date(df$Join_Date,"%d-%m-%Y")
df$Utilization <- ifelse(df$Digital_Usage > 80,"High","Medium")
df$Fraud_Flag <- ifelse(df$Operating_Cost > 240000,"Fraud","Clean")
return(df)
}
final_dataset <- final_pipeline()
OUTPUT:
|
|
Library_Name |
City |
Books_Count |
Members |
Digital_Usage |
Operating_Cost |
Service_Score |
Fees |
Join_Date |
Years_Active |
Utilization |
Fraud_Flag |
|
1 |
Central Library |
HYDERABAD |
50000 |
2000 |
70 |
200000 |
4.5 |
100 |
01-01-2020 |
6 |
Medium |
Clean |
|
2 |
City Hub |
VIJAYAWADA |
30000 |
1500 |
60 |
150000 |
4.2 |
80 |
02-02-2021 |
5 |
Medium |
Clean |
|
3 |
Metro Reads |
CHENNAI |
45000 |
1800 |
75 |
180000 |
4.6 |
90 |
03-03-2022 |
4 |
Medium |
Clean |
|
4 |
Knowledge Zone |
BANGALORE |
60000 |
2200 |
80 |
220000 |
4.8 |
110 |
04-04-2023 |
2 |
Medium |
Clean |
|
5 |
Urban Library |
DELHI |
55000 |
2100 |
78 |
210000 |
4.7 |
105 |
05-05-2020 |
5 |
Medium |
Clean |
|
6 |
Smart Reads |
MUMBAI |
40000 |
1900 |
85 |
190000 |
4.9 |
95 |
06-06-2021 |
4 |
High |
Clean |
|
7 |
Book Galaxy |
KOLKATA |
70000 |
2500 |
90 |
250000 |
5 |
120 |
07-07-2022 |
3 |
High |
Fraud |
|
8 |
Readers Point |
PUNE |
40000 |
1700 |
65 |
170000 |
4.3 |
85 |
08-08-2023 |
2 |
Medium |
Clean |
|
9 |
Study Sphere |
JAIPUR |
38000 |
1600 |
68 |
160000 |
4.1 |
88 |
09-09-2020 |
5 |
Medium |
Clean |
|
10 |
Infinite Pages |
LUCKNOW |
42000 |
1750 |
72 |
175000 |
4.4 |
92 |
10-10-2021 |
4 |
Medium |
Clean |
|
11 |
Wisdom House |
BHOPAL |
46000 |
1850 |
74 |
185000 |
4.6 |
96 |
11-11-2022 |
3 |
Medium |
Clean |
|
12 |
Learners Den |
NAGPUR |
48000 |
1950 |
76 |
195000 |
4.7 |
98 |
12-12-2023 |
2 |
Medium |
Clean |
|
13 |
Digital Nest |
KOCHI |
52000 |
2050 |
77 |
205000 |
4.8 |
102 |
01-01-2024 |
2 |
Medium |
Clean |
|
14 |
Library x |
INDORE |
51000 |
2150 |
79 |
215000 |
4.9 |
108 |
02-02-2024 |
2 |
Medium |
Clean |
|
15 |
Book Corner |
SURAT |
40000 |
1500 |
85 |
150000 |
1 |
0 |
03-03-2024 |
2 |
High |
Clean |
|
16 |
Elite Library |
PATNA |
53000 |
2250 |
81 |
225000 |
5 |
115 |
04-04-2024 |
1 |
High |
Clean |
Summary
In this project, we created a public library
dataset and intentionally added errors to simulate real-world messy data. We
used R programming to detect issues like missing values, negative counts, and
unrealistic usage. Then we cleaned the dataset using conditional logic and
transformations. We applied character functions to standardize text, date
functions to calculate durations, and numeric operations for deeper insights.
We also performed dataset operations like merging, appending, and transposing.
A custom fraud detection function helped identify suspicious records based on
business rules. Finally, we built a reusable pipeline function to automate the
entire process. This project shows how R can handle data cleaning,
transformation, and fraud detection efficiently. It is useful for beginners and
also valuable for interview preparation.
Conclusion
Advanced R programming provides a powerful and practical framework for
handling real-world public library data challenges. In this project, we
demonstrated how messy and inconsistent data such as missing values, invalid
counts, and unrealistic usage metrics can be systematically detected using
summary statistics and filtering techniques. With the help of vectorized
transformations and conditional logic, the dataset was efficiently cleaned and
standardized for reliable analysis. Date functions enabled time-based insights
like active years and review scheduling, while classification logic helped
segment libraries based on utilization patterns. Most importantly, a rule-based
fraud detection approach allowed us to identify suspicious records, such as
unusually high digital usage or abnormal operating costs. By integrating all
these steps into a reusable pipeline, R proves to be highly effective for data
validation, optimization, and fraud analytics. This approach is scalable,
interview-ready, and directly applicable to real-world data-driven
decision-making scenarios.
R INTERVIEW QUESTIONS
1. What is the difference between <- and = in R?
Answer:
Both are used for assignment, but <- is the standard and preferred
operator in R for assigning values. = is mainly used inside function
arguments.
2. What is a data frame in R?
Answer:
A data frame is a table-like structure where each column can have different
data types (numeric, character, etc.), similar to a dataset in SAS or a table
in SQL.
3. What is the use of %>% (pipe operator)?
Answer:
The pipe operator passes the output of one step into the next, making code more
readable and easier to write compared to nested functions.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics ,SAS programming and R Language platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources and R Language.
Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent LIBRARY data.
Our Mission:
This blog provides industry-focused SAS & R programming tutorials and analytics projects covering finance, healthcare, and technology.
This project is suitable for:
· Students learning SAS & R
· Data analysts building portfolios
· Professionals preparing for SAS & R interviews
· Bloggers writing about analytics
· Clinical SAS & R Programmer
· Research Data Analyst
· Regulatory Data Validator
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comments
Post a Comment