433.Can Advanced R Programming Detect, Clean, and Optimize Public Library Data While Identifying Fraud Patterns?

Transforming Public Library Data into Reliable Insights with Advanced R-Based Fraud Analytics

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS & R TECHNIQUES —R DATA FRAME | DPLYR | RBIND / BIND_ROWS | MERGE | DATE FUNCTIONS | STRING FUNCTIONS | NUMERIC FUNCTIONS

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Introduction

Public libraries are evolving into hybrid knowledge hubs combining physical books, digital resources, and community services. With increasing operational complexity—membership growth, digital usage tracking, cost management—data quality becomes critical. However, real-world datasets are rarely clean. They often contain inconsistencies, missing values, formatting errors, and even fraudulent manipulations.

In this project-style blog, we simulate a Public Library dataset in R, deliberately introduce errors, and then systematically detect, clean, transform, and optimize the dataset using advanced R techniques. We also implement fraud detection logic using macros (functions in R) and demonstrate practical usage of date functions, string handling, numeric transformations, and dataset reshaping.

This guide is structured like a real-world analytics pipeline and is designed to help you learn, apply, and explain concepts in interviews or projects.

Table of Contents

  1. Business Context
  2. Dataset Creation
  3. Introducing Errors
  4. Error Detection
  5. Data Cleaning
  6. Date Functions (mdy, intck, intnx equivalents in R)
  7. Character Functions
  8. Numeric Functions
  9. Dataset Operations (merge, append, transpose)
  10. Fraud Detection Logic (Macros in R)
  11. Utilization Classification
  12. Advanced Transformations
  13. Final Clean Dataset
  14. Summary & Conclusion

Business Context

Public libraries generate data from multiple systems: book inventory, membership records, digital usage logs, and financial systems. Analysts must:

·  Monitor utilization trends

·  Detect fraudulent activities (fake memberships, inflated usage)

·  Optimize operating costs

·  Improve service scores

1. Dataset Creation

library(dplyr)


library_data <- data.frame(

  Library_Name = c("central library","city hub","metro reads","knowledge zone",

                   "urban library","smart reads","book galaxy","readers point",

                   "study sphere","infinite pages","wisdom house","learners den",

                   "digital nest","library x","book corner","elite library"),

  

  City = c("hyderabad","vijayawada","chennai","bangalore",

           "delhi","mumbai","kolkata","pune",

           "jaipur", "lucknow", "bhopal", "nagpur",

           "kochi","indore","surat","patna"),

  

  Books_Count = c(50000,30000,45000, 60000, 55000, NA, 70000, 40000,

                  38000, 42000, 46000, 48000, 52000, 51000, -1000, 53000),

  

  Members = c(2000, 1500, 1800, 2200, 2100, 1900, 2500, 1700,

              1600, 1750, 1850, 1950, 2050, 2150, 0, 2250),

  

  Digital_Usage = c(70, 60, 75, 80, 78, 85, 90, 65,

                    68, 72, 74, 76, 77, 79, 200, 81),

  

  Operating_Cost = c(200000, 150000, 180000, 220000, 210000, 190000, 250000,

                     170000, 160000, 175000, 185000, 195000, 205000, 215000,

                     10000, 225000),

  

  Service_Score = c(4.5, 4.2, 4.6, 4.8, 4.7, 4.9, 5.0, 4.3,

                    4.1, 4.4, 4.6, 4.7, 4.8, 4.9, 1.0, 5.0),

  

  Fees = c(100, 80, 90, 110, 105, 95, 120, 85,

           88, 92, 96, 98, 102, 108, 0, 115),

  

  Join_Date = c("01-01-2020","02-02-2021","03-03-2022","04-04-2023",

                "05-05-2020","06-06-2021","07-07-2022","08-08-2023",

                "09-09-2020","10-10-2021","11-11-2022","12-12-2023",

                "01-01-2024","02-02-2024","03-03-2024","04-04-2024")

)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

1

central library

hyderabad

50000

2000

70

200000

4.5

100

01-01-2020

2

city hub

vijayawada

30000

1500

60

150000

4.2

80

02-02-2021

3

metro reads

chennai

45000

1800

75

180000

4.6

90

03-03-2022

4

knowledge zone

bangalore

60000

2200

80

220000

4.8

110

04-04-2023

5

urban library

delhi

55000

2100

78

210000

4.7

105

05-05-2020

6

smart reads

mumbai

NA

1900

85

190000

4.9

95

06-06-2021

7

book galaxy

kolkata

70000

2500

90

250000

5

120

07-07-2022

8

readers point

pune

40000

1700

65

170000

4.3

85

08-08-2023

9

study sphere

jaipur

38000

1600

68

160000

4.1

88

09-09-2020

10

infinite pages

lucknow

42000

1750

72

175000

4.4

92

10-10-2021

11

wisdom house

bhopal

46000

1850

74

185000

4.6

96

11-11-2022

12

learners den

nagpur

48000

1950

76

195000

4.7

98

12-12-2023

13

digital nest

kochi

52000

2050

77

205000

4.8

102

01-01-2024

14

library x

indore

51000

2150

79

215000

4.9

108

02-02-2024

15

book corner

surat

-1000

0

200

10000

1

0

03-03-2024

16

elite library

patna

53000

2250

81

225000

5

115

04-04-2024


Explanation

·  Created dataset with 16 observations

·  Includes numeric, character, and date fields

·  dplyr → for filtering, transforming, summarizing

Key Points

·  Real-world simulation

·  Mixed data types

·  Intentional anomalies included

2.Intentional Errors Introduced

Errors:

·  Missing value (NA)

·  Negative Books_Count

·  Zero Members

·  Digital_Usage > 100

·  Very low Operating Cost

·  Poor Service Score anomaly

3.Error Detection

numeric_cols <- library_data[, sapply(library_data, is.numeric)]

OUTPUT:

 

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

1

50000

2000

70

200000

4.5

100

2

30000

1500

60

150000

4.2

80

3

45000

1800

75

180000

4.6

90

4

60000

2200

80

220000

4.8

110

5

55000

2100

78

210000

4.7

105

6

NA

1900

85

190000

4.9

95

7

70000

2500

90

250000

5

120

8

40000

1700

65

170000

4.3

85

9

38000

1600

68

160000

4.1

88

10

42000

1750

72

175000

4.4

92

11

46000

1850

74

185000

4.6

96

12

48000

1950

76

195000

4.7

98

13

52000

2050

77

205000

4.8

102

14

51000

2150

79

215000

4.9

108

15

-1000

0

200

10000

1

0

16

53000

2250

81

225000

5

115

summary_output_df <- data.frame(

  Variable = names(numeric_cols),

  Min = sapply(numeric_cols, min, na.rm=TRUE),

  Mean = sapply(numeric_cols, mean, na.rm=TRUE),

  Max = sapply(numeric_cols, max, na.rm=TRUE)

)


View(summary_output_df)

OUTPUT:

 

Variable

Min

Mean

Max

Books_Count

Books_Count

-1000

45266.66667

70000

Members

Members

0

1831.25

2500

Digital_Usage

Digital_Usage

60

83.125

200

Operating_Cost

Operating_Cost

10000

183750

250000

Service_Score

Service_Score

1

4.40625

5

Fees

Fees

0

92.75

120


Explanation

·  Provides statistical summary

·  Helps identify abnormal values

Key Points

·  Detects NA

·  Highlights outliers

·  Quick diagnostic tool

Why This IS Error Detection (Partially)

Your table clearly exposes issues:

1. Books_Count = -1000

·  Invalid (books cannot be negative)

2. Members = 0

·  Suspicious (library must have members)

3. Digital_Usage = 200

·  Invalid (>100%)

4. Operating_Cost = 10000

·  Too low compared to others (outlier)

5. Service_Score = 1

·  Extremely low (possible anomaly)

6. Fees = 0

·  Suspicious (may be free or error)

4.Data Cleaning

library_data_clean <- library_data %>%

  mutate(

    Books_Count = ifelse(Books_Count < 0| is.na(Books_Count),40000,Books_Count),

    Members = ifelse(Members == 0,1500, Members),

    Digital_Usage = ifelse(Digital_Usage > 100,85,Digital_Usage),

    Operating_Cost = ifelse(Operating_Cost < 50000, 150000, Operating_Cost)

  )

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

1

central library

hyderabad

50000

2000

70

200000

4.5

100

01-01-2020

2

city hub

vijayawada

30000

1500

60

150000

4.2

80

02-02-2021

3

metro reads

chennai

45000

1800

75

180000

4.6

90

03-03-2022

4

knowledge zone

bangalore

60000

2200

80

220000

4.8

110

04-04-2023

5

urban library

delhi

55000

2100

78

210000

4.7

105

05-05-2020

6

smart reads

mumbai

40000

1900

85

190000

4.9

95

06-06-2021

7

book galaxy

kolkata

70000

2500

90

250000

5

120

07-07-2022

8

readers point

pune

40000

1700

65

170000

4.3

85

08-08-2023

9

study sphere

jaipur

38000

1600

68

160000

4.1

88

09-09-2020

10

infinite pages

lucknow

42000

1750

72

175000

4.4

92

10-10-2021

11

wisdom house

bhopal

46000

1850

74

185000

4.6

96

11-11-2022

12

learners den

nagpur

48000

1950

76

195000

4.7

98

12-12-2023

13

digital nest

kochi

52000

2050

77

205000

4.8

102

01-01-2024

14

library x

indore

51000

2150

79

215000

4.9

108

02-02-2024

15

book corner

surat

40000

1500

85

150000

1

0

03-03-2024

16

elite library

patna

53000

2250

81

225000

5

115

04-04-2024

Explanation

·  Fix invalid values using conditions

Key Points

·  ifelse() used for correction

·  NA handled

·  Logical validation rules applied 

5.Character Functions

library_data_clean$Library_Name <- tools::toTitleCase(library_data_clean$Library_Name)

library_data_clean$City <- toupper(library_data_clean$City)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Explanation

·  Converts text formatting

Key Points

·  toupper() = UPCASE (ABC)

·  tolower() = LOWCASE (abc)

·  toTitleCase() = PROPER CASE (Abc)

6.Trim / Strip Functions

library_data_clean$Library_Name <- trimws(library_data_clean$Library_Name)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Explanation

·  Removes unwanted spaces 

7.Concatenation

library_data_clean$Library_ID <- paste0("LIB_",1:nrow(library_data_clean))

library_data_clean$Full_Info <- paste(library_data_clean$Library_Name,

                                      library_data_clean$City,sep = "_")

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA

Explanation

·  Equivalent to CAT, CATX

8.Date Conversion (MDY Equivalent)

library_data_clean$Join_Date <- as.Date(library_data_clean$Join_Date, format="%d-%m-%Y")

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA

install.packages("lubridate")

library(lubridate)

library_data$Join_Date <- dmy(library_data$Join_Date)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA


Explanation

 ·  Converts string to Date format.

Why Used

·  Required for time calculations.

·  lubridate → for handling dates like SAS INTCK, INTNX 

9.INTCK Equivalent (Difference in Years)

#Method 1:

library_data_clean$Years_Active <- as.numeric(difftime(Sys.Date(),

                                                       library_data_clean$Join_Date,

                                                       units = "days"))/365

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

Years_Active

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

6.235616

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

5.145205

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4.065753

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

2.978082

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

5.893151

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

4.805479

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

3.720548

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

2.632877

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

5.545205

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

4.460274

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

3.372603

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

2.287671

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

2.232877

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

2.145205

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

2.063014

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA

1.975342

#Method 2:

library_data$Years_Active <- interval(library_data$Join_Date,Sys.Date()) %/% years(1)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Years_Active

1

central library

hyderabad

50000

2000

70

200000

4.5

100

01-01-2020

6

2

city hub

vijayawada

30000

1500

60

150000

4.2

80

02-02-2021

5

3

metro reads

chennai

45000

1800

75

180000

4.6

90

03-03-2022

4

4

knowledge zone

bangalore

60000

2200

80

220000

4.8

110

04-04-2023

2

5

urban library

delhi

55000

2100

78

210000

4.7

105

05-05-2020

5

6

smart reads

mumbai

NA

1900

85

190000

4.9

95

06-06-2021

4

7

book galaxy

kolkata

70000

2500

90

250000

5

120

07-07-2022

3

8

readers point

pune

40000

1700

65

170000

4.3

85

08-08-2023

2

9

study sphere

jaipur

38000

1600

68

160000

4.1

88

09-09-2020

5

10

infinite pages

lucknow

42000

1750

72

175000

4.4

92

10-10-2021

4

11

wisdom house

bhopal

46000

1850

74

185000

4.6

96

11-11-2022

3

12

learners den

nagpur

48000

1950

76

195000

4.7

98

12-12-2023

2

13

digital nest

kochi

52000

2050

77

205000

4.8

102

01-01-2024

2

14

library x

indore

51000

2150

79

215000

4.9

108

02-02-2024

2

15

book corner

surat

-1000

0

200

10000

1

0

03-03-2024

2

16

elite library

patna

53000

2250

81

225000

5

115

04-04-2024

1

Explanation

·  Calculates difference in years.

Why Used

·  Used for tenure analysis.

10.INTNX Equivalent (Add Months)

library_data_clean$Next_Review_Date <- library_data_clean$Join_Date %m+% months(6)

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

Years_Active

Next_Review_Date

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

6.235616

01-07-2020

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

5.145205

02-08-2021

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4.065753

03-09-2022

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

2.978082

04-10-2023

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

5.893151

05-11-2020

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

4.805479

06-12-2021

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

3.720548

07-01-2023

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

2.632877

08-02-2024

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

5.545205

09-03-2021

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

4.460274

10-04-2022

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

3.372603

11-05-2023

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

2.287671

12-06-2024

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

2.232877

01-07-2024

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

2.145205

02-08-2024

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

2.063014

03-09-2024

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA

1.975342

04-10-2024


Explanation

·  Adds 6 months to date.

Why Used

·  Used for scheduling reviews.

11.Utilization Classification

library_data_clean <- library_data_clean %>%

  mutate(Utilization = case_when(

    Digital_Usage >= 80 ~ "High",

    Digital_Usage >= 60 ~ "Medium",

    TRUE ~ "Low"

  ))

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Library_ID

Full_Info

Years_Active

Next_Review_Date

Utilization

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

LIB_1

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

LIB_2

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

LIB_3

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

LIB_4

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

LIB_5

Urban Library_DELHI

5.893151

05-11-2020

Medium

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

LIB_6

Smart Reads_MUMBAI

4.805479

06-12-2021

High

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

LIB_7

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

LIB_8

Readers Point_PUNE

2.632877

08-02-2024

Medium

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

LIB_9

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

LIB_10

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

LIB_11

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

LIB_12

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

LIB_13

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

LIB_14

Library x_INDORE

2.145205

02-08-2024

Medium

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

LIB_15

Book Corner_SURAT

2.063014

03-09-2024

High

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

LIB_16

Elite Library_PATNA

1.975342

04-10-2024

High

Explanation

·  Categorizes usage levels.

Why Used

·  Business segmentation.

12.Merge (JOIN)

extra_data <- data.frame(Library_ID = paste0("LIB_", 1:16),

                         Rating = runif(16, 3, 5))

OUTPUT:

Library_ID

Rating

LIB_1

4.851899

LIB_2

4.228925

LIB_3

4.307081

LIB_4

3.179175

LIB_5

4.741454

LIB_6

3.896237

LIB_7

3.470136

LIB_8

4.801577

LIB_9

3.805064

LIB_10

4.358088

LIB_11

4.875048

LIB_12

4.822165

LIB_13

4.521825

LIB_14

4.911692

LIB_15

3.965751

LIB_16

4.496549

merge_data <- merge(library_data_clean,extra_data,by="Library_ID")

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

3

LIB_11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

4.875048

4

LIB_12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

4.822165

5

LIB_13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

4.521825

6

LIB_14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

Library x_INDORE

2.145205

02-08-2024

Medium

4.911692

7

LIB_15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

Book Corner_SURAT

2.063014

03-09-2024

High

3.965751

8

LIB_16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Elite Library_PATNA

1.975342

04-10-2024

High

4.496549

9

LIB_2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

4.228925

10

LIB_3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4.307081

11

LIB_4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

3.179175

12

LIB_5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

Urban Library_DELHI

5.893151

05-11-2020

Medium

4.741454

13

LIB_6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

Smart Reads_MUMBAI

4.805479

06-12-2021

High

3.896237

14

LIB_7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

3.470136

15

LIB_8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

Readers Point_PUNE

2.632877

08-02-2024

Medium

4.801577

16

LIB_9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

3.805064

Explanation

·  Combines datasets.

Why Used

·  Adds external attributes.

13.Append (Row Bind)

new_data <- merge_data[1:2,]

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088


appended_data <- rbind(merge_data, new_data)

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

3

LIB_11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

4.875048

4

LIB_12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

4.822165

5

LIB_13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

4.521825

6

LIB_14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

Library x_INDORE

2.145205

02-08-2024

Medium

4.911692

7

LIB_15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

Book Corner_SURAT

2.063014

03-09-2024

High

3.965751

8

LIB_16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Elite Library_PATNA

1.975342

04-10-2024

High

4.496549

9

LIB_2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

4.228925

10

LIB_3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4.307081

11

LIB_4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

3.179175

12

LIB_5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

Urban Library_DELHI

5.893151

05-11-2020

Medium

4.741454

13

LIB_6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

Smart Reads_MUMBAI

4.805479

06-12-2021

High

3.896237

14

LIB_7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

3.470136

15

LIB_8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

Readers Point_PUNE

2.632877

08-02-2024

Medium

4.801577

16

LIB_9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

3.805064

17

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

18

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Explanation

·  Adds duplicate rows.

Why Used

·  Testing scenarios.

14.Transpose

transposed <- t(appended_data[,4:8])

OUTPUT:

 

V1

V2

V3

V4

V5

V6

V7

V8

V9

V10

V11

V12

V13

V14

V15

V16

V17

V18

Books_Count

5.00E+04

42000

46000

48000

52000

51000

40000

53000

3.00E+04

4.50E+04

6.00E+04

5.50E+04

4.00E+04

70000

4.00E+04

3.80E+04

5.00E+04

42000

Members

2.00E+03

1750

1850

1950

2050

2150

1500

2250

1.50E+03

1.80E+03

2.20E+03

2.10E+03

1.90E+03

2500

1.70E+03

1.60E+03

2.00E+03

1750

Digital_Usage

7.00E+01

72

74

76

77

79

85

81

6.00E+01

7.50E+01

8.00E+01

7.80E+01

8.50E+01

90

6.50E+01

6.80E+01

7.00E+01

72

Operating_Cost

2.00E+05

175000

185000

195000

205000

215000

150000

225000

1.50E+05

1.80E+05

2.20E+05

2.10E+05

1.90E+05

250000

1.70E+05

1.60E+05

2.00E+05

175000

Service_Score

4.50E+00

4.4

4.6

4.7

4.8

4.9

1

5

4.20E+00

4.60E+00

4.80E+00

4.70E+00

4.90E+00

5

4.30E+00

4.10E+00

4.50E+00

4.4

Explanation

·  Switches rows and columns.

Why Used

·  Used in reporting formats.

15.Fraud Detection Logic (Macro Equivalent)

fraud_check <- function(df){

  df %>% 

     mutate(Fraud_Flag = ifelse(

       Digital_Usage > 90 & Members < 1000 | 

         Operating_Cost > 240000 |

         Service_Score < 2,

       "Fraud" , "Clean"

     ))

}


final_data <- fraud_check(appended_data)

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

Fraud_Flag

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

3

LIB_11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

4.875048

Clean

4

LIB_12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

4.822165

Clean

5

LIB_13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

4.521825

Clean

6

LIB_14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

Library x_INDORE

2.145205

02-08-2024

Medium

4.911692

Clean

7

LIB_15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

Book Corner_SURAT

2.063014

03-09-2024

High

3.965751

Fraud

8

LIB_16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Elite Library_PATNA

1.975342

04-10-2024

High

4.496549

Clean

9

LIB_2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

4.228925

Clean

10

LIB_3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4.307081

Clean

11

LIB_4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

3.179175

Clean

12

LIB_5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

Urban Library_DELHI

5.893151

05-11-2020

Medium

4.741454

Clean

13

LIB_6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

Smart Reads_MUMBAI

4.805479

06-12-2021

High

3.896237

Clean

14

LIB_7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

3.470136

Fraud

15

LIB_8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

Readers Point_PUNE

2.632877

08-02-2024

Medium

4.801577

Clean

16

LIB_9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

3.805064

Clean

17

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

18

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

Explanation

·  Function acts like SAS macro.

·  Flags suspicious patterns.

·  Flags suspicious records.

Why Used

·  Automates fraud detection logic.

16.Numeric Functions

final_data$Cost_Per_Member <- final_data$Operating_Cost / final_data$Members

final_data$Log_Books <- log(final_data$Books_Count)

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

Fraud_Flag

Cost_Per_Member

Log_Books

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

100

10.81978

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

100

10.64542

3

LIB_11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

4.875048

Clean

100

10.7364

4

LIB_12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

4.822165

Clean

100

10.77896

5

LIB_13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

4.521825

Clean

100

10.859

6

LIB_14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

Library x_INDORE

2.145205

02-08-2024

Medium

4.911692

Clean

100

10.83958

7

LIB_15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

Book Corner_SURAT

2.063014

03-09-2024

High

3.965751

Fraud

100

10.59663

8

LIB_16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Elite Library_PATNA

1.975342

04-10-2024

High

4.496549

Clean

100

10.87805

9

LIB_2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

4.228925

Clean

100

10.30895

10

LIB_3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4.307081

Clean

100

10.71442

11

LIB_4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

3.179175

Clean

100

11.0021

12

LIB_5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

Urban Library_DELHI

5.893151

05-11-2020

Medium

4.741454

Clean

100

10.91509

13

LIB_6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

Smart Reads_MUMBAI

4.805479

06-12-2021

High

3.896237

Clean

100

10.59663

14

LIB_7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

3.470136

Fraud

100

11.15625

15

LIB_8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

Readers Point_PUNE

2.632877

08-02-2024

Medium

4.801577

Clean

100

10.59663

16

LIB_9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

3.805064

Clean

100

10.54534

17

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

100

10.81978

18

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

100

10.64542

Explanation

·  Calculates cost efficiency.

Why Used

·  Optimization metric.

17.Coalesce Equivalent

final_data$Books_Count <- ifelse(is.na(final_data$Books_Count),40000,

                                 final_data$Books_Count)

OUTPUT:

 

Library_ID

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Full_Info

Years_Active

Next_Review_Date

Utilization

Rating

Fraud_Flag

Cost_Per_Member

Log_Books

1

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

100

10.81978

2

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

100

10.64542

3

LIB_11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

Wisdom House_BHOPAL

3.372603

11-05-2023

Medium

4.875048

Clean

100

10.7364

4

LIB_12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

Learners Den_NAGPUR

2.287671

12-06-2024

Medium

4.822165

Clean

100

10.77896

5

LIB_13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

Digital Nest_KOCHI

2.232877

01-07-2024

Medium

4.521825

Clean

100

10.859

6

LIB_14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

Library x_INDORE

2.145205

02-08-2024

Medium

4.911692

Clean

100

10.83958

7

LIB_15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

Book Corner_SURAT

2.063014

03-09-2024

High

3.965751

Fraud

100

10.59663

8

LIB_16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

Elite Library_PATNA

1.975342

04-10-2024

High

4.496549

Clean

100

10.87805

9

LIB_2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

City Hub_VIJAYAWADA

5.145205

02-08-2021

Medium

4.228925

Clean

100

10.30895

10

LIB_3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

Metro Reads_CHENNAI

4.065753

03-09-2022

Medium

4.307081

Clean

100

10.71442

11

LIB_4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

Knowledge Zone_BANGALORE

2.978082

04-10-2023

High

3.179175

Clean

100

11.0021

12

LIB_5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

Urban Library_DELHI

5.893151

05-11-2020

Medium

4.741454

Clean

100

10.91509

13

LIB_6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

Smart Reads_MUMBAI

4.805479

06-12-2021

High

3.896237

Clean

100

10.59663

14

LIB_7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

Book Galaxy_KOLKATA

3.720548

07-01-2023

High

3.470136

Fraud

100

11.15625

15

LIB_8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

Readers Point_PUNE

2.632877

08-02-2024

Medium

4.801577

Clean

100

10.59663

16

LIB_9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

Study Sphere_JAIPUR

5.545205

09-03-2021

Medium

3.805064

Clean

100

10.54534

17

LIB_1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

Central Library_HYDERABAD

6.235616

01-07-2020

Medium

4.851899

Clean

100

10.81978

18

LIB_10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

Infinite Pages_LUCKNOW

4.460274

10-04-2022

Medium

4.358088

Clean

100

10.64542



18.Delete Dataset

rm(extra_data,new_data)

CONSOLE:

> rm(extra_data,new_data)

19.Final Clean Dataset Code

final_pipeline <- function(){

  df <- library_data

  

  df <- df %>%

    mutate(

      Books_Count=ifelse(Books_Count < 0 | is.na(Books_Count),40000,Books_Count),

      Members = ifelse(Members == 0,1500, Members),

      Digital_Usage = ifelse(Digital_Usage > 100,85, Digital_Usage),

      Operating_Cost = ifelse(Operating_Cost < 50000,150000,Operating_Cost)

    )

  

  df$Library_Name <- tools::toTitleCase(trimws(df$Library_Name))

  df$City <- toupper(df$City)

  df$Join_Date <- as.Date(df$Join_Date,"%d-%m-%Y")

  df$Utilization <- ifelse(df$Digital_Usage > 80,"High","Medium")

  df$Fraud_Flag <- ifelse(df$Operating_Cost > 240000,"Fraud","Clean")

  return(df)

}


final_dataset <- final_pipeline()

OUTPUT:

 

Library_Name

City

Books_Count

Members

Digital_Usage

Operating_Cost

Service_Score

Fees

Join_Date

Years_Active

Utilization

Fraud_Flag

1

Central Library

HYDERABAD

50000

2000

70

200000

4.5

100

01-01-2020

6

Medium

Clean

2

City Hub

VIJAYAWADA

30000

1500

60

150000

4.2

80

02-02-2021

5

Medium

Clean

3

Metro Reads

CHENNAI

45000

1800

75

180000

4.6

90

03-03-2022

4

Medium

Clean

4

Knowledge Zone

BANGALORE

60000

2200

80

220000

4.8

110

04-04-2023

2

Medium

Clean

5

Urban Library

DELHI

55000

2100

78

210000

4.7

105

05-05-2020

5

Medium

Clean

6

Smart Reads

MUMBAI

40000

1900

85

190000

4.9

95

06-06-2021

4

High

Clean

7

Book Galaxy

KOLKATA

70000

2500

90

250000

5

120

07-07-2022

3

High

Fraud

8

Readers Point

PUNE

40000

1700

65

170000

4.3

85

08-08-2023

2

Medium

Clean

9

Study Sphere

JAIPUR

38000

1600

68

160000

4.1

88

09-09-2020

5

Medium

Clean

10

Infinite Pages

LUCKNOW

42000

1750

72

175000

4.4

92

10-10-2021

4

Medium

Clean

11

Wisdom House

BHOPAL

46000

1850

74

185000

4.6

96

11-11-2022

3

Medium

Clean

12

Learners Den

NAGPUR

48000

1950

76

195000

4.7

98

12-12-2023

2

Medium

Clean

13

Digital Nest

KOCHI

52000

2050

77

205000

4.8

102

01-01-2024

2

Medium

Clean

14

Library x

INDORE

51000

2150

79

215000

4.9

108

02-02-2024

2

Medium

Clean

15

Book Corner

SURAT

40000

1500

85

150000

1

0

03-03-2024

2

High

Clean

16

Elite Library

PATNA

53000

2250

81

225000

5

115

04-04-2024

1

High

Clean

Summary

In this project, we created a public library dataset and intentionally added errors to simulate real-world messy data. We used R programming to detect issues like missing values, negative counts, and unrealistic usage. Then we cleaned the dataset using conditional logic and transformations. We applied character functions to standardize text, date functions to calculate durations, and numeric operations for deeper insights. We also performed dataset operations like merging, appending, and transposing. A custom fraud detection function helped identify suspicious records based on business rules. Finally, we built a reusable pipeline function to automate the entire process. This project shows how R can handle data cleaning, transformation, and fraud detection efficiently. It is useful for beginners and also valuable for interview preparation.

Conclusion

Advanced R programming provides a powerful and practical framework for handling real-world public library data challenges. In this project, we demonstrated how messy and inconsistent data such as missing values, invalid counts, and unrealistic usage metrics can be systematically detected using summary statistics and filtering techniques. With the help of vectorized transformations and conditional logic, the dataset was efficiently cleaned and standardized for reliable analysis. Date functions enabled time-based insights like active years and review scheduling, while classification logic helped segment libraries based on utilization patterns. Most importantly, a rule-based fraud detection approach allowed us to identify suspicious records, such as unusually high digital usage or abnormal operating costs. By integrating all these steps into a reusable pipeline, R proves to be highly effective for data validation, optimization, and fraud analytics. This approach is scalable, interview-ready, and directly applicable to real-world data-driven decision-making scenarios.


R INTERVIEW QUESTIONS

1. What is the difference between <- and = in R?

Answer:
Both are used for assignment, but <- is the standard and preferred operator in R for assigning values. = is mainly used inside function arguments.

2. What is a data frame in R?

Answer:
A data frame is a table-like structure where each column can have different data types (numeric, character, etc.), similar to a dataset in SAS or a table in SQL.

3. What is the use of %>% (pipe operator)?

Answer:
The pipe operator passes the output of one step into the next, making code more readable and easier to write compared to nested functions.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics ,SAS programming and R Language platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources and R Language.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent LIBRARY data.


Our Mission:

This blog provides industry-focused SAS & R programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS & R

·  Data analysts building portfolios

·  Professionals preparing for SAS & R interviews

·  Bloggers writing about analytics 

·  Clinical SAS & R Programmer

·  Research Data Analyst

·  Regulatory Data Validator

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS & R-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of R & SAS analytics, please refer to our other data science and industry-focused projects listed below:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy






Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

418.Can We Design, Debug, Detect Fraud, And Optimize A Smart Parking System Using Advanced SAS Programming Techniques?