SAS Learning Hub: 41.KEEP AND DROP STATEMENTS

KEEP AND DROP STATEMENTS

KEEP Statement

Purpose: Specifies which variables to include in the output dataset.
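Syntax (data set option): DATA new_dataset (KEEP = variable1 variable2 ...); SET original_dataset; RUN;
Syntax (statement): DATA new_dataset; SET original_dataset; KEEP variable1 variable2 ...; RUN;
Note: Variable names are separated by spaces, not commas.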

Example:

data new_data (KEEP = Age Gender Income);
set original_data;
run;
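
The next example uses the statement form on SASUSER.CLASS2. Note that it is a DROP statement (covered in the next section), so every variable except DOB is kept:

DATA A20;
SET SASUSER.CLASS2;
DROP DOB;
RUN;
PROC PRINT;
RUN;

LOG:
NOTE: There were 19 observations read from the data set SASUSER.CLASS2.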
NOTE: The data set WORK.A20 has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.42 seconds
      cpu time            0.04 seconds

NOTE: There were 19 observations read from the data set WORK.A20.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           1.82 seconds
      cpu time            0.40 seconds

RESULT:

Obs	Name	Sex	Age	Height	Weight	CLASS
1	Alfred	M	14	69	112.5	9
2	Alice	F	13	56.5	84	8
3	Barbara	F	13	65.3	98	8
4	Carol	F	14	62.8	102.5	9
5	Henry	M	14	63.5	102.5	9
6	James	M	12	57.3	83	7
7	Jane	F	12	59.8	84.5	7
8	Janet	F	15	62.5	112.5	10
9	Jeffrey	M	13	62.5	84	8
10	John	M	12	59	99.5	7
11	Joyce	F	11	51.3	50.5	6
12	Judy	F	14	64.3	90	9
13	Louise	F	12	56.3	77	7
14	Mary	F	15	66.5	112	10
15	Philip	M	16	72	150	11
16	Robert	M	12	64.8	128	7
17	Ronald	M	15	67	133	10
18	Thomas	M	11	57.5	85	6
19	William	M	15	66.5	112	10
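
This section shows two ways of selecting variables: a data set option in parentheses on the DATA statement, and a statement inside the DATA step. The sketch below compares the two forms. It assumes the SASHELP.CLASS sample table that ships with SAS (not the SASUSER.CLASS2 table used above), and the output names keep_opt and keep_stmt are made up for illustration.

```sas
/* Form 1: KEEP= data set option on the output data set */
data work.keep_opt (keep=Name Age);
    set sashelp.class;
run;

/* Form 2: KEEP statement inside the DATA step */
data work.keep_stmt;
    set sashelp.class;
    keep Name Age;
run;

/* Both data sets should contain only Name and Age;
   PROC COMPARE reports any differences between them */
proc compare base=work.keep_opt compare=work.keep_stmt;
run;
```

Either form works for a single output dataset. The data set option is the one to reach for when a DATA step writes several datasets and each needs a different variable list.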

DROP Statement

Purpose: Specifies which variables to exclude from the output dataset.
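Syntax (data set option): DATA new_dataset (DROP = variable1 variable2 ...); SET original_dataset; RUN;
Syntax (statement): DATA new_dataset; SET original_dataset; DROP variable1 variable2 ...; RUN;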

Example:

data new_data (DROP = Address PhoneNumber Email);
set original_data;
run;

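The next example uses the statement form on SASUSER.CLASS2. Note that it is a KEEP statement, so only NAME and AGE are written to the output:

DATA A21;
SET SASUSER.CLASS2;
KEEP NAME AGE;
RUN;
PROC PRINT;
RUN;

LOG:
NOTE: There were 19 observations read from the data set SASUSER.CLASS2.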
NOTE: The data set WORK.A21 has 19 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

NOTE: There were 19 observations read from the data set WORK.A21.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.06 seconds
      cpu time            0.03 seconds

RESULT:



Obs	Name	Age
1	Alfred	14
2	Alice	13
3	Barbara	13
4	Carol	14
5	Henry	14
6	James	12
7	Jane	12
8	Janet	15
9	Jeffrey	13
10	John	12
11	Joyce	11
12	Judy	14
13	Louise	12
14	Mary	15
15	Philip	16
16	Robert	12
17	Ronald	15
18	Thomas	11
19	William	15

Key Points

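Placement: The KEEP and DROP statements are written inside the DATA step. The KEEP= and DROP= data set options are written in parentheses after a dataset name, either in the DATA statement (output) or in the SET statement (input).
Priority: If both KEEP and DROP name the same variable, DROP takes precedence and the variable is excluded. In practice, use one or the other in a given DATA step, not both.
Efficiency: KEEP= or DROP= on the input dataset can be more efficient for large datasets with many variables, because the unneeded variables are never read into memory. The KEEP and DROP statements, and options on the output dataset, only control which variables are written out.

The KEEP= and DROP= data set options can also be placed on the dataset named in the SET statement, which selects variables before they are read into memory. A variable excluded this way is not available to later statements in the same DATA step.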
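
To make the input versus output distinction concrete, here is a minimal sketch. It assumes the SASHELP.CLASS sample table that ships with SAS; the output names and the computed variable ratio are hypothetical and exist only for illustration.

```sas
/* Input side: KEEP= on the SET statement.
   Only Name, Height and Weight are read, so Age and Sex
   never enter the DATA step and cannot be used in it. */
data work.ratio_in;
    set sashelp.class (keep=Name Height Weight);
    ratio = Weight / Height;   /* illustrative calculation only */
run;

/* Output side: DROP= on the DATA statement.
   Every variable is read and available for the calculation,
   but Height and Weight are not written to the output. */
data work.ratio_out (drop=Height Weight);
    set sashelp.class;
    ratio = Weight / Height;
run;
```

The first step should produce Name, Height, Weight and ratio. The second should produce Name, Sex, Age and ratio. Filtering on input is the form that saves reading work. Filtering on output is the one to use when a variable is needed for a calculation but not in the result.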

The Colon Operator as a Wildcard:

The colon operator, when placed directly after a name prefix in a variable list, acts as a wildcard. It selects every variable whose name starts with that prefix, including a variable whose name is exactly the prefix.

Example:

Suppose you have a dataset with the variables Age, Gender, Income, Agegroup, Gendercode, and Incomelevel.

Keeping Variables Starting with "Age":

data new_data (KEEP=Age:);
set original_data;
run;

This will keep Age and Agegroup in the new dataset.

Dropping Variables Starting with "Income":

data new_data (DROP=Income:);
set original_data;
run;

This will drop Income and Incomelevel.

Key Points:

Efficiency: The colon operator saves typing and reduces the chance of missing a variable when you want to keep or drop a group of variables that share a common prefix.
Specificity: If you need more precise control over variable selection, you can combine the colon operator with explicit variable names, separated by spaces. For example:

data new_data (KEEP=Age: Gender IncomeLevel);
set original_data;
run;
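
The prefix examples above can be tried end to end with a small test table. This is a sketch under stated assumptions: the variable names come from the example in this post, while the two data rows and the WORK dataset names are made up for illustration.

```sas
/* Build a tiny test table with the six variables from the example.
   The data values are invented. */
data work.original_data;
    input Age Gender $ Income Agegroup $ Gendercode Incomelevel $;
    datalines;
34 F 52000 30-39 2 Mid
58 M 87000 50-59 1 High
;
run;

/* Age: matches Age and Agegroup; Gender and Incomelevel are named
   explicitly, so Gendercode and Income are left out. */
data work.new_data (keep=Age: Gender Incomelevel);
    set work.original_data;
run;

/* List the variables that were kept */
proc contents data=work.new_data varnum;
run;
```

PROC CONTENTS should list Age, Gender, Agegroup and Incomelevel. Checking the variable list this way is worthwhile with prefixes, because a prefix such as Age: also picks up any other variable that happens to start with the same letters.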

Saturday, 23 November 2024
