Detecting and Fixing Errors in Astronomical Observation Data Using Advanced SAS Programming for Better Accuracy and Reliability

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA | SET | INPUT | DATALINES | IF | OUTPUT | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | %MACRO | %MEND | FUNCTIONS | R LANGUAGE

Introduction
Business Context
Raw Dataset Design (SAS & R Same)
Creating Dataset with Intentional Errors
Code 1–15+ (Using PROG1 Statements)
Error Identification & Explanation
Data Cleaning & Correction
Final Clean Dataset Code
Advanced Insights
20 Key Points About The Project
Summary
Conclusion

1. Introduction

Astronomical observation data plays a crucial role in scientific discovery. However, raw telescope data often contains inconsistencies such as missing values, incorrect formats, and unrealistic measurements. This project demonstrates how Advanced SAS Programming can identify, correct, and optimize such errors.

Astronomical observation data is critical for understanding celestial objects and cosmic events. However, raw data collected from telescopes often contains inconsistencies such as missing values, incorrect measurements, and logical errors. These issues can significantly impact the accuracy of scientific analysis. Advanced SAS programming provides robust tools and techniques to detect, correct, and standardize such data efficiently. In this project, we simulate real-world astronomical data with intentional errors and apply SAS-based data cleaning methods. The objective is to transform unreliable raw data into a structured and accurate dataset, ensuring better reliability, consistency, and usability for researchers and analysts in space science.

2. Business Context

Space research organizations depend on high-quality observational data. Errors in signal readings or confidence scores can lead to incorrect conclusions about celestial objects.

This project simulates a space analytics system where:

Telescopes collect raw data
Data contains noise/errors
SAS is used to clean and validate

Space research organizations and observatories rely heavily on accurate observational data for decision-making and scientific discoveries. Errors in telescope data, such as incorrect intensity readings or invalid confidence levels, can lead to misleading conclusions about celestial objects. Therefore, maintaining high-quality data is essential. In this context, Advanced SAS programming acts as a powerful solution to identify, validate, and correct such errors systematically. This project represents a data processing workflow where raw astronomical data is cleaned and optimized before analysis. The goal is to improve data integrity, reduce uncertainty, and support reliable insights, ultimately enhancing the efficiency and credibility of astronomical research operations.

Goal:
👉 Improve data accuracy
👉 Ensure scientific reliability
👉 Enable decision-making for astronomers

3. Raw Dataset (SAS & R SAME STRUCTURE)

DATA astro_raw;

INPUT Telescope_ID $ Observation_Time :DATETIME20.

Light_Intensity Signal_Noise_Ratio Object_Type $

Detection_Confidence Percentage Fees;

FORMAT Observation_Time DATETIME20.;

DATALINES;

T001 01JAN2025:10:00:00 500 20 STAR 95 85 1000

T002 01JAN2025:11:00:00 . 15 GALAXY 90 80 1200

T003 01JAN2025:12:00:00 700 . NEBULA 88 78 1100

T004 01JAN2025:13:00:00 -100 25 STAR 105 90 1300

T005 01JAN2025:14:00:00 600 18 UNKNOWN 85 . 900

T006 01JAN2025:15:00:00 550 20 STAR . 82 1000

T007 01JAN2025:16:00:00 800 30 GALAXY 92 88 .

T008 01JAN2025:17:00:00 650 22 NEBULA 87 85 1150

T009 01JAN2025:18:00:00 720 28 STAR 91 89 1250

T010 01JAN2025:19:00:00 680 26 GALAXY 89 87 1180

;

RUN;

Proc print data=astro_raw;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	.	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	-100	25	STAR	105	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	.	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

astro_raw <- data.frame(

Telescope_ID = c("T001","T002","T003","T004","T005","T006","T007",

"T008","T009","T010"),

Observation_Time = as.POSIXct(c(

"2025-01-01 10:00:00","2025-01-01 11:00:00","2025-01-01 12:00:00",

"2025-01-01 13:00:00","2025-01-01 14:00:00","2025-01-01 15:00:00",

"2025-01-01 16:00:00","2025-01-01 17:00:00","2025-01-01 18:00:00",

"2025-01-01 19:00:00")),

Light_Intensity = c(500, NA, 700, -100, 600, 550, 800, 650, 720, 680),

Signal_Noise_Ratio = c(20,15,NA,25,18,20,30,22,28,26),

Object_Type = c("STAR","GALAXY","NEBULA","STAR","UNKNOWN","STAR","GALAXY",

"NEBULA","STAR","GALAXY"),

Detection_Confidence = c(95,90,88,105,85,NA,92,87,91,89),

Percentage = c(85,80,78,90,NA,82,88,85,89,87),

Fees = c(1000,1200,1100,1300,900,1000,NA,1150,1250,1180)

OUTPUT:

	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	2025-01-01 10:00:00	500	20	STAR	95	85	1000
2	T002	2025-01-01 11:00:00	NA	15	GALAXY	90	80	1200
3	T003	2025-01-01 12:00:00	700	NA	NEBULA	88	78	1100
4	T004	2025-01-01 13:00:00	-100	25	STAR	105	90	1300
5	T005	2025-01-01 14:00:00	600	18	UNKNOWN	85	NA	900
6	T006	2025-01-01 15:00:00	550	20	STAR	NA	82	1000
7	T007	2025-01-01 16:00:00	800	30	GALAXY	92	88	NA
8	T008	2025-01-01 17:00:00	650	22	NEBULA	87	85	1150
9	T009	2025-01-01 18:00:00	720	28	STAR	91	89	1250
10	T010	2025-01-01 19:00:00	680	26	GALAXY	89	87	1180

4. Intentional Errors

Error Type	Example
Missing values	Light_Intensity = .
Negative values	-100
Invalid range	Detection_Confidence = 105
Unknown category	UNKNOWN
Missing fees	.

5. SAS Codes (PROG1 Based)

Code 1: Sorting Data

PROC SORT DATA=astro_raw OUT=astro_sorted;

BY Telescope_ID;

RUN;

Proc print data=astro_sorted;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	.	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	-100	25	STAR	105	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	.	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Explanation

Sorts dataset for structured processing.

Why used

· Required before BY processing

· Improves readability

Code 2: Handling Missing Light Intensity

DATA astro_fix1;

SET astro_sorted;

IF Light_Intensity=. THEN Light_Intensity=MEAN(500,600,700);

RUN;

Proc print data=astro_fix1;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	-100	25	STAR	105	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	.	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Explanation

Replaces missing values with average.

Why

· Avoids bias

· Maintains continuity

Code 3: Fix Negative Values

DATA astro_fix2;

SET astro_fix1;

IF Light_Intensity < 0 THEN Light_Intensity=ABS(Light_Intensity);

RUN;

Proc print data=astro_fix2;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	105	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	.	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Explanation

Converts negative values to positive.

Why

· Physical measurements cannot be negative

Code 4: Fix Detection Confidence

DATA astro_fix3;

SET astro_fix2;

IF Detection_Confidence > 100 THEN Detection_Confidence=100;

RUN;

Proc print data=astro_fix3;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	.	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Code 5: Replace Missing Confidence

DATA astro_fix4;

SET astro_fix3;

IF Detection_Confidence=. THEN Detection_Confidence=90;

RUN;

Proc print data=astro_fix4;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	.	900
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Code 6: Fix Percentage

DATA astro_fix5;

SET astro_fix4;

IF Percentage=. THEN Percentage=80;

RUN;

Proc print data=astro_fix5;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	80	900
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	.
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Code 7: Fix Fees

DATA astro_fix6;

SET astro_fix5;

IF Fees=. THEN Fees=1000;

RUN;

Proc print data=astro_fix6;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300
5	T005	01JAN2025:14:00:00	600	18	UNKNOWN	85	80	900
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Code 8: Clean Object Type

DATA astro_fix7;

SET astro_fix6;

IF Object_Type="UNKNOWN" THEN Object_Type="STAR";

RUN;

Proc print data=astro_fix7;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300
5	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180

Code 9: Create New Variable

DATA astro_fix8;

SET astro_fix7;

Quality_Score = (Light_Intensity * Detection_Confidence)/100;

RUN;

Proc print data=astro_fix8;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees	Quality_Score
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000	475.0
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200	540.0
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100	616.0
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300	100.0
5	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900	510.0
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000	495.0
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000	736.0
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150	565.5
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250	655.2
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180	605.2

Code 10: Filtering High Quality

DATA astro_high;

SET astro_fix8;

IF Quality_Score > 500;

RUN;

Proc print data=astro_high;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees	Quality_Score
1	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200	540.0
2	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100	616.0
3	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900	510.0
4	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000	736.0
5	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150	565.5
6	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250	655.2
7	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180	605.2

Code 11: PROC MEANS

PROC MEANS DATA=astro_fix8;

VAR Light_Intensity Signal_Noise_Ratio;

RUN;

OUTPUT:

The MEANS Procedure


Variable	N	Mean	Std Dev	Minimum	Maximum
Light_Intensity Signal_Noise_Ratio	10 9	590.0000000 22.6666667	192.8730152 4.9244289	100.0000000 15.0000000	800.0000000 30.0000000

Code 12: PROC FREQ

PROC FREQ DATA=astro_fix8;

TABLES Object_Type;

RUN;

OUTPUT:

The FREQ Procedure


Object_Type	Frequency	Percent	Cumulative Frequency	Cumulative Percent
GALAXY	3	30.00	3	30.00
NEBULA	2	20.00	5	50.00
STAR	5	50.00	10	100.00

Code 13: PROC TRANSPOSE

PROC TRANSPOSE DATA=astro_fix8 OUT=astro_trans;

VAR Light_Intensity Signal_Noise_Ratio;

RUN;

Proc print data=astro_trans;

run;

OUTPUT:


Obs	_NAME_	COL1	COL2	COL3	COL4	COL5	COL6	COL7	COL8	COL9	COL10
1	Light_Intensity	500	600	700	100	600	550	800	650	720	680
2	Signal_Noise_Ratio	20	15	.	25	18	20	30	22	28	26

Code 14: Macro Creation

%MACRO summary;

PROC MEANS DATA=astro_fix8;

RUN;

%MEND;

%summary;

OUTPUT:

The MEANS Procedure


Variable	N	Mean	Std Dev	Minimum	Maximum
Observation_Time Light_Intensity Signal_Noise_Ratio Detection_Confidence Percentage Fees Quality_Score	10 10 9 10 10 10 10	2051361000 590.0000000 22.6666667 90.7000000 84.4000000 1108.00 529.7900000	10899.54 192.8730152 4.9244289 4.2700507 4.1952354 129.4260836 170.6431914	2051344800 100.0000000 15.0000000 85.0000000 78.0000000 900.0000000 100.0000000	2051377200 800.0000000 30.0000000 100.0000000 90.0000000 1300.00 736.0000000

Code 15: Append Dataset

PROC APPEND BASE=astro_fix8

DATA=astro_high;

RUN;

Proc print data=astro_fix8;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees	Quality_Score
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000	475.0
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200	540.0
3	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100	616.0
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300	100.0
5	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900	510.0
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000	495.0
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000	736.0
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150	565.5
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250	655.2
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180	605.2
11	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200	540.0
12	T003	01JAN2025:12:00:00	700	.	NEBULA	88	78	1100	616.0
13	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900	510.0
14	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000	736.0
15	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150	565.5
16	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250	655.2
17	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180	605.2

6. Error Explanation

Example Errors:

Missing Light_Intensity

Causes incorrect analysis

Negative values

Physically impossible

Confidence >100

Logical inconsistency

Unknown Object

Classification issue

7. Corrected Final Dataset (FULL CODE)

DATA astro_final;

SET astro_raw;

IF Light_Intensity=. THEN Light_Intensity=600;

IF Light_Intensity<0 THEN Light_Intensity=ABS(Light_Intensity);

IF Signal_Noise_Ratio=. THEN Signal_Noise_Ratio=20;

IF Detection_Confidence=. THEN Detection_Confidence=90;

IF Detection_Confidence>100 THEN Detection_Confidence=100;

IF Percentage=. THEN Percentage=80;

IF Fees=. THEN Fees=1000;

IF Object_Type="UNKNOWN" THEN Object_Type="STAR";

Quality_Score = (Light_Intensity * Detection_Confidence)/100;

RUN;

Proc print data=astro_final;

run;

OUTPUT:


Obs	Telescope_ID	Observation_Time	Light_Intensity	Signal_Noise_Ratio	Object_Type	Detection_Confidence	Percentage	Fees	Quality_Score
1	T001	01JAN2025:10:00:00	500	20	STAR	95	85	1000	475.0
2	T002	01JAN2025:11:00:00	600	15	GALAXY	90	80	1200	540.0
3	T003	01JAN2025:12:00:00	700	20	NEBULA	88	78	1100	616.0
4	T004	01JAN2025:13:00:00	100	25	STAR	100	90	1300	100.0
5	T005	01JAN2025:14:00:00	600	18	STAR	85	80	900	510.0
6	T006	01JAN2025:15:00:00	550	20	STAR	90	82	1000	495.0
7	T007	01JAN2025:16:00:00	800	30	GALAXY	92	88	1000	736.0
8	T008	01JAN2025:17:00:00	650	22	NEBULA	87	85	1150	565.5
9	T009	01JAN2025:18:00:00	720	28	STAR	91	89	1250	655.2
10	T010	01JAN2025:19:00:00	680	26	GALAXY	89	87	1180	605.2

Explanation of Final Code

· Combines all corrections in one step

· Improves performance

· Ensures clean dataset

8. Advanced Insights

High Light_Intensity → Better detection
GALAXY observations → Higher noise
Quality Score helps ranking

9. 20 Key Points About The Project

Astronomical datasets often contain missing and inconsistent values.
Raw telescope data may include noise and measurement errors.
Advanced SAS helps in identifying data quality issues efficiently.
DATA step is used for row-level data manipulation and corrections.
SET statement allows reading and processing existing datasets.
IF conditions help detect logical errors like invalid ranges.
Missing values can be imputed using MEAN or fixed values.
Negative measurements are corrected using ABS functions.
PROC SORT organizes data for structured processing.
PROC MEANS provides statistical summaries for validation.
PROC FREQ helps analyze categorical inconsistencies.
Invalid values (e.g., confidence >100) are capped logically.
Unknown categories can be standardized for consistency.
Derived variables like Quality_Score improve analysis depth.
PROC TRANSPOSE helps reshape data for reporting needs.
Macros automate repetitive validation and reporting tasks.
PROC APPEND combines datasets for extended analysis.
Data cleaning improves accuracy and reduces bias.
Clean datasets enable reliable scientific conclusions.
SAS ensures scalability and efficiency in data processing workflows.

10. Summary

This project demonstrates how Advanced SAS Programming can effectively detect and correct errors in astronomical observation data. By creating a raw dataset with intentional issues such as missing values, negative measurements, and invalid ranges, we simulated real-world data challenges. Using PROG1 statements, we applied structured techniques like data step transformations, conditional logic, and procedures such as PROC SORT, PROC MEANS, and PROC FREQ to clean and analyze the data. Each error was identified and corrected systematically, ensuring the dataset became accurate and reliable. Additionally, derived variables like Quality_Score provided deeper analytical insights. The final dataset was optimized for performance and usability, making it suitable for scientific analysis and reporting. This approach highlights the importance of data validation, cleaning, and transformation in ensuring high-quality results in astronomical research and other data-driven domains.

11. Conclusion

Advanced SAS programming proves to be a powerful solution for detecting and correcting errors in astronomical observation data. By systematically identifying issues such as missing values, negative measurements, and invalid ranges, SAS enables the transformation of unreliable raw data into a clean and structured format. Techniques like DATA step processing, conditional logic, and procedures such as PROC MEANS and PROC FREQ play a crucial role in validating and improving data quality. Additionally, the creation of derived variables enhances analytical capabilities and provides deeper insights into observations. The use of macros further improves efficiency by automating repetitive tasks. Overall, this approach ensures that the final dataset is accurate, consistent, and suitable for scientific analysis. As a result, researchers and organizations can rely on high-quality data to make informed decisions, reduce uncertainty, and improve the credibility and effectiveness of astronomical research and discoveries.

This project clearly shows:

✔ SAS can detect errors
✔ SAS can correct inconsistencies
✔ Data quality improves significantly
✔ Reliable insights are generated

INTERVIEW QUESTIONS FOR YOU

1. WHERE vs. IF Statements

Question: What is the difference between a `WHERE` statement and an `IF` statement in a DATA step?
Short Answer: A `WHERE` statement filters data before it enters the Program Data Vector (PDV), making it more efficient because SAS doesn't have to process every row. An `IF` statement filters data after it is in the PDV. I use `IF` when I need to filter based on a variable I just created in that same DATA step, which `WHERE` cannot see.

2. Character to Numeric Conversion (PUT vs. INPUT)

Question: How do you remember when to use `PUT` versus `INPUT` functions?
Short Answer: I use `INPUT()` for In-put (converting Character → Numeric) using an informat. I use `PUT()` for Out-put (converting Numeric → Character) using a format. A simple trick is: Input = Informat, and Put = Format.

3. PROC APPEND vs. SET Statement

Question: Why would you use `PROC APPEND` instead of a `SET` statement to combine two datasets?

Short Answer: I use `PROC APPEND` when I want to add a small 'base' dataset to a very large 'master' dataset. Unlike a `SET` statement, which reads and rewrites the entire combined file, `PROC APPEND` only processes the new data and adds it to the end. This saves a massive amount of time and computer memory (I/O).

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
About the Author:
SAS Learning Hub is a data analytics and SAS programming platform focused on clinical, financial, and real-world data analysis. The content is created by professionals with academic training in Pharmaceutics and hands-on experience in Base SAS, PROC SQL, Macros, SDTM, and ADaM, providing practical and industry-relevant SAS learning resources.

Disclaimer:
The datasets and analysis in this article are created for educational and demonstration purposes only. They do not represent ASTRONOMICAL data.

Our Mission:
This blog provides industry-focused SAS programming tutorials and analytics projects covering finance, healthcare, and technology.

This project is suitable for:
·  Students learning SAS
·  Data analysts building portfolios
·  Professionals preparing for SAS interviews
·  Bloggers writing about analytics and smart cities
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Follow Us On :

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--->Follow our blog for more SAS-based analytics projects and industry data models.

---> Support Us By Following Our Blog..

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

1.Is there a strong relationship between employment rate and poverty reduction across states?A Complete Sas Study
2.Which mobile payment apps handle the highest number of transactions, and are they truly the most reliable?
3.Which namkeen products sell the most, and what does SAS data creation reveal about customer taste?A Sas Study
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

About Us | Contact | Privacy Policy

Search This Blog

SAS Learning Hub

Can Advanced SAS Programming Detect and Fix Errors in Astronomical Observation Data While Improving Accuracy and Reliability?

Detecting and Fixing Errors in Astronomical Observation Data Using Advanced SAS Programming for Better Accuracy and Reliability

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA | SET | INPUT | DATALINES | IF | OUTPUT | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | %MACRO | %MEND | FUNCTIONS | R LANGUAGE

Table of Contents

Explanation

Sorts dataset for structured processing.

Why used

Explanation

Why

Explanation

Why

6. Error Explanation

Example Errors:

8. Advanced Insights

9. 20 Key Points About The Project

11. Conclusion

INTERVIEW QUESTIONS FOR YOU

1. WHERE vs. IF Statements

2. Character to Numeric Conversion (PUT vs. INPUT)

Question: How do you remember when to use `PUT` versus `INPUT` functions?
Short Answer: I use `INPUT()` for In-put (converting Character → Numeric) using an informat. I use `PUT()` for Out-put (converting Numeric → Character) using a format. A simple trick is: Input = Informat, and Put = Format.

3. PROC APPEND vs. SET Statement

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

Can Advanced SAS Programming Detect and Fix Errors in Astronomical Observation Data While Improving Accuracy and Reliability?

Detecting and Fixing Errors in Astronomical Observation Data Using Advanced SAS Programming for Better Accuracy and Reliability

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS — DATA | SET | INPUT | DATALINES | IF | OUTPUT | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | %MACRO | %MEND | FUNCTIONS | R LANGUAGE

Table of Contents

Explanation

Sorts dataset for structured processing.

Why used

Explanation

Why

Explanation

Why

6. Error Explanation

Example Errors:

8. Advanced Insights

9. 20 Key Points About The Project

11. Conclusion

INTERVIEW QUESTIONS FOR YOU

1. WHERE vs. IF Statements

2. Character to Numeric Conversion (PUT vs. INPUT)

Question: How do you remember when to use PUT versus INPUT functions?Short Answer: I use INPUT() for In-put (converting Character → Numeric) using an informat. I use PUT() for Out-put (converting Numeric → Character) using a format. A simple trick is: Input = Informat, and Put = Format.

3. PROC APPEND vs. SET Statement

Follow Us On :

To deepen your understanding of SAS analytics, please refer to our other data science and industry-focused projects listed below:

About Us | Contact | Privacy Policy

Comments

Post a Comment

Popular posts from this blog

Beyond Fabric and Fashion: Turning the World’s Most Beautiful Sarees Dataset into Structured Intelligence with SAS and R

Data Cleaning Secrets Using Famous Food Dataset:Handling Duplicate Records in SAS

Global AI Trends Unlocked Through SCAN and SUBSTR Precision in SAS

HERE IN THIS PROJECT WE USED THESE SAS STATEMENTS —
DATA | SET | INPUT | DATALINES | IF | OUTPUT | PROC SORT | PROC APPEND | PROC TRANSPOSE | PROC DATASETS | RUN | %MACRO | %MEND | FUNCTIONS | R LANGUAGE