454.The Hidden Power of SAS Dates: Building Smart Time Logic with INTNX and INTCK

Signal Chaos to Structured Insight: Mastering Free TV Dish Data with INTNX & INTCK in SAS

1. Introduction

Imagine you are working as a data analyst in a broadcasting company that tracks free TV channels available across different satellite dishes. Sounds simple, right? But when the dataset lands on your desk, reality hits hard.

You see:

  • Installation dates like 2023/15/01 (invalid)
  • Missing channel counts
  • Negative subscription durations (how is that even possible?)
  • Duplicate dish IDs
  • Country names like india, INDIA, Null, and blanks

Now imagine making business decisions on this data like expanding satellite coverage or pricing strategies. One wrong assumption, and millions are misallocated.

This is exactly why data cleaning is not optional it is foundational.

In industries like clinical trials (SDTM/ADaM) or telecom broadcasting, data must be:

  • Accurate
  • Standardized
  • Auditable

Tools like SAS and R are powerful not just for analysis but for engineering trustworthy data pipelines.

In this blog, we’ll take a messy “Free TV Channels in Dish Dataset”, intentionally break it, and then fix it step-by-step using:

  • SAS (with INTNX & INTCK functions)
  • R (tidyverse approach)

2. Raw Data Creation in SAS and R (With Intentional Errors)

SAS Raw Dataset

DATA tv_raw;

INPUT Dish_ID Channel_Name $ Country $ Subscribers 

Install_Date :$12. Duration_Months Rating Launch_Year;

DATALINES;

1 StarTV india 50000 2023-01-15 12 4.5 2023

2 ZeeTV INDIA . 2023/15/01 -5 4.2 2022

3 SunTV null 30000 2022-05-10 10 4.7 2022

4 NULL USA 45000 2021-12-01 8 . 2021

5 HBO usa 60000 2023-02-30 15 4.8 2023

5 HBO usa 60000 2023-02-30 15 4.8 2023

6 ESPN UK -20000 2022-07-20 6 4.3 2022

7 Discovery india 40000 . 9 4.6 2023

8 NatGeo INDIA 35000 2021-11-11 0 4.1 2021

9 CartoonNet USA 30000 2020-10-05 5 4.0 2020

10 SonyTV india 50000 2023-03-01 12 4.5 2023

11 ColorsTV india 48000 2023-03-15 12 4.6 2023

12 DDNational india 70000 2023-01-01 24 4.9 2023

13 AnimalPlanet UK 20000 2022-06-10 7 4.2 2022

14 StarSports india 65000 2023-04-01 12 4.8 2023

15 Pogo india 30000 2023-02-01 10 4.3 2023

;

RUN;

PROC PRINT DATA=tv_raw;

RUN;

OUTPUT:

ObsDish_IDChannel_NameCountrySubscribersInstall_DateDuration_MonthsRatingLaunch_Year
11StarTVindia500002023-01-15124.52023
22ZeeTVINDIA.2023/15/01-54.22022
33SunTVnull300002022-05-10104.72022
44NULLUSA450002021-12-018.2021
55HBOusa600002023-02-30154.82023
65HBOusa600002023-02-30154.82023
76ESPNUK-200002022-07-2064.32022
87Discoverindia40000 94.62023
98NatGeoINDIA350002021-11-1104.12021
109CartoonNUSA300002020-10-0554.02020
1110SonyTVindia500002023-03-01124.52023
1211ColorsTVindia480002023-03-15124.62023
1312DDNationindia700002023-01-01244.92023
1413AnimalPlUK200002022-06-1074.22022
1514StarSporindia650002023-04-01124.82023
1615Pogoindia300002023-02-01104.32023

Explanation

This dataset simulates real-world inconsistencies. We intentionally introduced errors such as invalid dates (2023-02-30), missing values (.), negative subscribers (-20000), duplicate records (Dish_ID=5), and inconsistent country formats (india, INDIA, null). These are typical in raw ingestion pipelines. SAS reads character and numeric values separately, and improper date formats are kept as strings initially. This setup allows us to demonstrate how cleaning transforms unreliable raw inputs into structured datasets suitable for analytics or regulatory reporting.

R Raw Dataset

tv_raw <- data.frame(

  Dish_ID = c(1,2,3,4,5,5,6,7,8,9,10,11,12,13,14,15),

  Channel_Name = c("StarTV","ZeeTV","SunTV","NULL","HBO",

                   "HBO","ESPN","Discovery","NatGeo","CartoonNet",

                   "SonyTV","ColorsTV","DDNational","AnimalPlanet",

                   "StarSports","Pogo"),

  Country = c("india","INDIA","null","USA","usa","usa","UK","india",

              "INDIA","USA","india","india","india","UK","india","india"),

  Subscribers = c(50000,NA,30000,45000,60000,60000,-20000,40000,35000,

                  30000,50000,48000,70000,20000,65000,30000),

  Install_Date = c("2023-01-15","2023/15/01","2022-05-10","2021-12-01",

                   "2023-02-30","2023-02-30","2022-07-20",NA,"2021-11-11",

                   "2020-10-05","2023-03-01","2023-03-15","2023-01-01",

                   "2022-06-10","2023-04-01","2023-02-01"),

  Duration_Months = c(12,-5,10,8,15,15,6,9,0,5,12,12,24,7,12,10),

  Rating = c(4.5,4.2,4.7,NA,4.8,4.8,4.3,4.6,4.1,4.0,4.5,4.6,4.9,4.2,

             4.8,4.3),

  Launch_Year = c(2023,2022,2022,2021,2023,2023,2022,2023,2021,2020,

                  2023,2023,2023,2022,2023,2023)

)

OUTPUT:

 

Dish_ID

Channel_Name

Country

Subscribers

Install_Date

Duration_Months

Rating

Launch_Year

1

1

StarTV

india

50000

15-01-2023

12

4.5

2023

2

2

ZeeTV

INDIA

NA

2023/15/01

-5

4.2

2022

3

3

SunTV

null

30000

10-05-2022

10

4.7

2022

4

4

NULL

USA

45000

01-12-2021

8

NA

2021

5

5

HBO

usa

60000

2023-02-30

15

4.8

2023

6

5

HBO

usa

60000

2023-02-30

15

4.8

2023

7

6

ESPN

UK

-20000

20-07-2022

6

4.3

2022

8

7

Discovery

india

40000

NA

9

4.6

2023

9

8

NatGeo

INDIA

35000

11-11-2021

0

4.1

2021

10

9

CartoonNet

USA

30000

05-10-2020

5

4

2020

11

10

SonyTV

india

50000

01-03-2023

12

4.5

2023

12

11

ColorsTV

india

48000

15-03-2023

12

4.6

2023

13

12

DDNational

india

70000

01-01-2023

24

4.9

2023

14

13

AnimalPlanet

UK

20000

10-06-2022

7

4.2

2022

15

14

StarSports

india

65000

01-04-2023

12

4.8

2023

16

15

Pogo

india

30000

01-02-2023

10

4.3

2023

Explanation

The R dataset mirrors the SAS structure but uses data.frame() for creation. Missing values are represented as NA, and character inconsistencies remain uncleaned. This dataset is ideal for applying dplyr transformations. Unlike SAS, R does not enforce strict typing during input, making it flexible but potentially risky if validation is skipped. The dataset demonstrates real-world ingestion issues such as inconsistent casing, duplicate rows, and invalid numeric values, which must be cleaned before any modeling or reporting.

3. Phase 1: Data Cleaning in SAS

PROC SORT DATA=tv_raw NODUPKEY OUT=tv_nodup;

BY Dish_ID;

RUN;

PROC PRINT DATA=tv_nodup;

RUN;

OUTPUT:

ObsDish_IDChannel_NameCountrySubscribersInstall_DateDuration_MonthsRatingLaunch_Year
11StarTVindia500002023-01-15124.52023
22ZeeTVINDIA.2023/15/01-54.22022
33SunTVnull300002022-05-10104.72022
44NULLUSA450002021-12-018.2021
55HBOusa600002023-02-30154.82023
66ESPNUK-200002022-07-2064.32022
77Discoverindia40000 94.62023
88NatGeoINDIA350002021-11-1104.12021
99CartoonNUSA300002020-10-0554.02020
1010SonyTVindia500002023-03-01124.52023
1111ColorsTVindia480002023-03-15124.62023
1212DDNationindia700002023-01-01244.92023
1313AnimalPlUK200002022-06-1074.22022
1414StarSporindia650002023-04-01124.82023
1515Pogoindia300002023-02-01104.32023

DATA tv_clean;

SET tv_nodup;

/* 1. Standardize Country */

Country = UPCASE(COALESCEC(Country, "UNKNOWN"));

IF Country = "NULL" THEN Country = "UNKNOWN";

/* 2. Fix Subscribers */

Subscribers = ABS(Subscribers);

IF Subscribers = . THEN Subscribers = 0;

/* 3. Fix Duration */

IF Duration_Months < 0 THEN Duration_Months = ABS(Duration_Months);

/* 4. Clean Install_Date STRING FIRST */

Install_Date_Clean = STRIP(Install_Date);

/* Replace / with - */

Install_Date_Clean = TRANWRD(Install_Date_Clean, "/", "-");

/* 5. Convert to SAS Date SAFELY */

Install_DT = INPUT(Install_Date_Clean, ANYDTDTE.);

/* 6. Handle INVALID DATES */

IF Install_DT = . THEN DO;

    Install_DT = '01JAN2023'd;

END;

FORMAT Install_DT DATE9.;

/* 7. INTNX (Now SAFE) */

Next_Renewal = INTNX('MONTH', Install_DT, Duration_Months, 'SAME');

/* 8. INTCK */

Months_Since = INTCK('MONTH', Install_DT, TODAY());

RUN;

PROC PRINT DATA=tv_clean;

RUN;

OUTPUT:

ObsDish_IDChannel_NameCountrySubscribersInstall_DateDuration_MonthsRatingLaunch_YearInstall_Date_CleanInstall_DTNext_RenewalMonths_Since
11StarTVINDIA500002023-01-15124.520232023-01-1501JAN20232337639
22ZeeTVINDIA02023/15/0154.220222023-15-0101JAN20232316239
33SunTVUNKNOWN300002022-05-10104.720222022-05-1001MAY20222307047
44NULLUSA450002021-12-018.20212021-12-0101JAN20232325439
55HBOUSA600002023-02-30154.820232023-02-3003FEB20232349938
66ESPNUK200002022-07-2064.320222022-07-2002JUL20222301245
77DiscoverINDIA40000 94.62023 01JAN20232328439
88NatGeoINDIA350002021-11-1104.120212021-11-1101NOV20212258553
99CartoonNUSA300002020-10-0554.020202020-10-0501JAN20232316239
1010SonyTVINDIA500002023-03-01124.520232023-03-0101JAN20232337639
1111ColorsTVINDIA480002023-03-15124.620232023-03-1501MAR20232343637
1212DDNationINDIA700002023-01-01244.920232023-01-0101JAN20232374239
1313AnimalPlUK200002022-06-1074.220222022-06-1001JUN20222301146
1414StarSporINDIA650002023-04-01124.820232023-04-0101JAN20232337639
1515PogoINDIA300002023-02-01104.320232023-02-0101JAN20232331539

Explanation

This step transforms raw data into analysis-ready format. PROC SORT NODUPKEY removes duplicate Dish_IDs. COALESCEC fills missing text values, while UPCASE standardizes casing. Numeric corrections use ABS() to handle negative values. Used TRANWRD() to standardize delimiters (/ → -). Then INPUT(..., ANYDTDTE.) attempts conversion. The critical part is date handling: INPUT(..., ANYDTDTE.) converts messy date strings into SAS date values. Then, INTNX calculates future dates (renewal cycle), while INTCK computes time intervals (months since installation). These functions are essential in domains like telecom billing and clinical follow-ups, where time-based derivations are critical.

Key Takeaways 

  • INTNX and INTCK NEVER work with missing dates
  • Always validate before using time functions
  • ANYDTDTE. is powerful but not foolproof
  • Always create fallback logic
  • In real projects → document imputation rules 

library(dplyr)

library(stringr)

library(lubridate)


tv_clean <- tv_raw %>%

  # 1. Remove duplicates

  distinct(Dish_ID, .keep_all = TRUE) %>% 

  # 2. Clean and standardize data

  mutate(    

    # Standardize Country

    Country = toupper(trimws(Country)),

    Country = ifelse(Country %in% c("NULL", "", NA), "UNKNOWN", Country),    

    # Fix Subscribers

    Subscribers = ifelse(is.na(Subscribers), 0, abs(Subscribers)),    

    # Fix Duration

    Duration_Months = abs(Duration_Months),

    # Clean Date String

    Install_Date = str_replace_all(Install_Date, "/", "-"), 

    # Convert to Date 

    Install_DT = suppressWarnings(ymd(Install_Date)),    

    # Handle invalid dates (NA)

    Install_DT = ifelse(is.na(Install_DT),

                        as.Date("2023-01-01"),

                        Install_DT)

  )

OUTPUT:

 

Dish_ID

Channel_Name

Country

Subscribers

Install_Date

Duration_Months

Rating

Launch_Year

Install_DT

1

1

StarTV

INDIA

50000

15-01-2023

12

4.5

2023

19372

2

2

ZeeTV

INDIA

0

2023-15-01

5

4.2

2022

19358

3

3

SunTV

UNKNOWN

30000

10-05-2022

10

4.7

2022

19122

4

4

NULL

USA

45000

01-12-2021

8

NA

2021

18962

5

5

HBO

USA

60000

2023-02-30

15

4.8

2023

19358

6

6

ESPN

UK

20000

20-07-2022

6

4.3

2022

19193

7

7

Discovery

INDIA

40000

NA

9

4.6

2023

19358

8

8

NatGeo

INDIA

35000

11-11-2021

0

4.1

2021

18942

9

9

CartoonNet

USA

30000

05-10-2020

5

4

2020

18540

10

10

SonyTV

INDIA

50000

01-03-2023

12

4.5

2023

19417

11

11

ColorsTV

INDIA

48000

15-03-2023

12

4.6

2023

19431

12

12

DDNational

INDIA

70000

01-01-2023

24

4.9

2023

19358

13

13

AnimalPlanet

UK

20000

10-06-2022

7

4.2

2022

19153

14

14

StarSports

INDIA

65000

01-04-2023

12

4.8

2023

19448

15

15

Pogo

INDIA

30000

01-02-2023

10

4.3

2023

19389

Explanation

In R, dplyr simplifies transformations. distinct() removes duplicates, mutate() handles transformations, and ifelse() manages conditional logic. toupper() and trimws() standardize text. Missing numeric values are replaced using is.na().Lubridate::ymd(), which is far more robust than base as.Date(). It can parse multiple formats and gracefully handle inconsistencies. We first standardize delimiters (/ → -) using str_replace_all(). Then ymd() converts the string into a proper Date object. Invalid dates (like February 30) become NA, which we explicitly replace with a default value (2023-01-01). This mirrors real-world imputation strategies used in regulated environments. suppressWarnings() avoids clutter from parsing issues, but in production, you would log those warnings for audit purposes. Compared to SAS, R offers more concise syntax but requires careful handling of type conversions and missing values to avoid silent errors.

5. Phase 3: Additional SAS Enhancements

DATA tv_enhanced;

SET tv_clean;

/* Calculate Year Difference */

Years_Since_Launch = INTCK('YEAR', MDY(1,1,Launch_Year), TODAY());

/* Beginning of Month */

Start_Month = INTNX('MONTH', Install_DT, 0, 'B');

/* End of Month */

End_Month = INTNX('MONTH', Install_DT, 0, 'E');

FORMAT Start_Month End_Month DATE9.;

RUN;

PROC PRINT DATA=tv_enhanced;

RUN;

OUTPUT:

ObsDish_IDChannel_NameCountrySubscribersInstall_DateDuration_MonthsRatingLaunch_YearInstall_Date_CleanInstall_DTNext_RenewalMonths_SinceYears_Since_LaunchStart_MonthEnd_Month
11StarTVINDIA500002023-01-15124.520232023-01-1501JAN20232337639301JAN202331JAN2023
22ZeeTVINDIA02023/15/0154.220222023-15-0101JAN20232316239401JAN202331JAN2023
33SunTVUNKNOWN300002022-05-10104.720222022-05-1001MAY20222307047401MAY202231MAY2022
44NULLUSA450002021-12-018.20212021-12-0101JAN20232325439501JAN202331JAN2023
55HBOUSA600002023-02-30154.820232023-02-3003FEB20232349938301FEB202328FEB2023
66ESPNUK200002022-07-2064.320222022-07-2002JUL20222301245401JUL202231JUL2022
77DiscoverINDIA40000 94.62023 01JAN20232328439301JAN202331JAN2023
88NatGeoINDIA350002021-11-1104.120212021-11-1101NOV20212258553501NOV202130NOV2021
99CartoonNUSA300002020-10-0554.020202020-10-0501JAN20232316239601JAN202331JAN2023
1010SonyTVINDIA500002023-03-01124.520232023-03-0101JAN20232337639301JAN202331JAN2023
1111ColorsTVINDIA480002023-03-15124.620232023-03-1501MAR20232343637301MAR202331MAR2023
1212DDNationINDIA700002023-01-01244.920232023-01-0101JAN20232374239301JAN202331JAN2023
1313AnimalPlUK200002022-06-1074.220222022-06-1001JUN20222301146401JUN202230JUN2022
1414StarSporINDIA650002023-04-01124.820232023-04-0101JAN20232337639301JAN202331JAN2023
1515PogoINDIA300002023-02-01104.320232023-02-0101JAN20232331539301JAN202331JAN2023

Explanation

Here we leverage INTCK and INTNX more deeply. INTCK('YEAR',...) calculates years since launch useful for lifecycle analysis. INTNX with 'B' and 'E' aligns dates to beginning and end of periods, which is crucial in reporting cycles (monthly billing, clinical visits). These functions are powerful because they abstract calendar logic, reducing manual errors. In industries like clinical trials, similar logic is used to derive visit windows and treatment durations.

6. 20 Data Cleaning Best Practices

  1. Always validate raw data before transformation
  2. Maintain audit trails for regulatory compliance
  3. Use controlled terminology (CDISC standards)
  4. Never overwrite raw datasets
  5. Validate date formats strictly
  6. Handle missing values explicitly
  7. Use macros for reusable cleaning logic
  8. Document every transformation step
  9. Apply consistency checks across datasets
  10. Use PROC COMPARE for validation
  11. Ensure traceability from SDTM to ADaM
  12. Validate key variables (USUBJID equivalent)
  13. Avoid hardcoding values
  14. Standardize text fields early
  15. Use validation reports
  16. Perform double programming (QC)
  17. Handle duplicates carefully
  18. Log all warnings and errors
  19. Ensure reproducibility
  20. Follow regulatory guidelines (FDA, EMA)

7. Business Logic Behind Cleaning

Data cleaning is not cosmetic it is decision-critical.

  • Missing subscribers → replaced with 0 to avoid inflated averages
  • Negative duration → corrected because time cannot be negative
  • Invalid dates → converted to ensure accurate time calculations

Example:
If a dish shows -20000 subscribers, revenue forecasting becomes meaningless.

In clinical trials:

  • Incorrect patient age → wrong safety conclusions
  • Wrong visit date → invalid efficacy analysis

8. 20 Key Sharp Insights

  • Dirty data leads to wrong conclusions
  • Cleaning is 70% of analytics work
  • Standardization ensures reproducibility
  • Dates drive timelines
  • Missing values distort models
  • Validation prevents disasters
  • Duplicate data inflates metrics
  • SAS ensures auditability
  • R ensures flexibility
  • INTNX simplifies time shifts
  • INTCK simplifies intervals
  • Data lineage is critical
  • QC is non-negotiable
  • Metadata drives clarity
  • Structured data = trusted insights
  • Automation reduces errors
  • Logs are your best friend
  • Always test edge cases
  • Clean data builds confidence
  • Accuracy beats speed

9. Summary (SAS vs R)

Feature

SAS

R

Auditability

Strong

Moderate

Flexibility

Moderate

High

Regulatory Use

Preferred

Limited

Syntax

Structured

Concise

Date Handling

Powerful (INTNX/INTCK)

Flexible but manual

10. Conclusion

Data cleaning is not just preprocessing it is data engineering discipline. Whether you work in telecom, finance, or clinical trials, structured cleaning frameworks ensure:

  • Accuracy
  • Reliability
  • Regulatory compliance

Mastering INTNX and INTCK in SAS gives you a serious edge in handling time-based data something that appears in almost every real-world dataset.

11. Interview Questions

Q1: Why use INTNX instead of adding days manually?

Answer: INTNX respects calendar boundaries (months, years), avoiding errors like month overflow.

Q2: How do you handle invalid dates in SAS?

Answer: Use INPUT(..., ANYDTDTE.) and validate using IF Install_DT = .

Q3: Difference between INTCK and INTNX?

Answer:

  • INTCK → counts intervals
  • INTNX → shifts dates

Q4: How do you validate duplicates?

Answer: Use PROC SORT NODUPKEY and compare counts before/after.

Q5: R vs SAS for cleaning?

Answer:

  • SAS → regulated environments
  • R → exploratory analysis

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About the Author:

SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.


Disclaimer:

The datasets and analysis in this article are created for educational and demonstration purposes only. Here we learn about TV DISH DATA.


Our Mission:

This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.


This project is suitable for:

·  Students learning SAS

·  Data analysts building portfolios

·  Professionals preparing for SAS interviews

·  Bloggers writing about analytics and smart cities

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On : 


 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact Privacy Policy


 


Comments

Popular posts from this blog

412.Can We Build And Clean A University Course Analytics & Fraud Detection System In Sas Using Only Macros While Intentionally Creating And Fixing Errors?

420.Can We Detect Errors, Prevent Fraud, And Optimize Biometric Access System Security Using Advanced SAS Programming?

453.Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS