KEEP AND DROP STATEMENTS
KEEP Statement
- Purpose: Specifies which variables to include in the output dataset.
- Syntax (shown here as the KEEP= data set option; variable names are separated by spaces, not commas):
DATA new_dataset (KEEP = variable1 variable2 ...); SET original_dataset; RUN;
Example:
data new_data (KEEP = Age Gender Income);
set original_data;
run;
Example run (this step uses a DROP statement, so every variable except DOB is kept):
DATA A20;
SET SASUSER.CLASS2;
DROP DOB;
RUN;
PROC PRINT DATA=A20;
RUN;
LOG:
NOTE: There were 19 observations read from the data set SASUSER.CLASS2.
NOTE: The data set WORK.A20 has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.42 seconds
cpu time 0.04 seconds
NOTE: There were 19 observations read from the data set WORK.A20.
NOTE: PROCEDURE PRINT used (Total process time):
real time 1.82 seconds
cpu time 0.40 seconds
RESULT:
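Obs | Name | Sex | Age | Height | Weight | CLASS |
---|---|---|---|---|---|---|
1 | Alfred | M | 14 | 69 | 112.5 | 9 |
2 | Alice | F | 13 | 56.5 | 84 | 8 |
3 | Barbara | F | 13 | 65.3 | 98 | 8 |
4 | Carol | F | 14 | 62.8 | 102.5 | 9 |
5 | Henry | M | 14 | 63.5 | 102.5 | 9 |
6 | James | M | 12 | 57.3 | 83 | 7 |
7 | Jane | F | 12 | 59.8 | 84.5 | 7 |
8 | Janet | F | 15 | 62.5 | 112.5 | 10 |
9 | Jeffrey | M | 13 | 62.5 | 84 | 8 |
10 | John | M | 12 | 59 | 99.5 | 7 |
11 | Joyce | F | 11 | 51.3 | 50.5 | 6 |
12 | Judy | F | 14 | 64.3 | 90 | 9 |
13 | Louise | F | 12 | 56.3 | 77 | 7 |
14 | Mary | F | 15 | 66.5 | 112 | 10 |
15 | Philip | M | 16 | 72 | 150 | 11 |
16 | Robert | M | 12 | 64.8 | 128 | 7 |
17 | Ronald | M | 15 | 67 | 133 | 10 |
18 | Thomas | M | 11 | 57.5 | 85 | 6 |
19 | William | M | 15 | 66.5 | 112 | 10 |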
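For comparison, here is a minimal sketch of how the same six variables could be selected with a KEEP statement instead of DROP. The variable names come from the result above; the dataset name A20K is just an illustrative choice and is not part of the original example.

```sas
/* Sketch: list the variables to keep instead of the one to drop. */
/* A20K is a hypothetical output name used only for illustration. */
DATA A20K;
    SET SASUSER.CLASS2;
    KEEP NAME SEX AGE HEIGHT WEIGHT CLASS;  /* space-separated, no commas */
RUN;

PROC PRINT DATA=A20K;
RUN;
```

When only one variable has to go, DROP is the shorter way to write it; KEEP becomes the shorter form when only a few variables should survive.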
DROP Statement
- Purpose: Specifies which variables to exclude from the output dataset.
- Syntax (shown here as the DROP= data set option; variable names are separated by spaces, not commas):
DATA new_dataset (DROP = variable1 variable2 ...); SET original_dataset; RUN;
Example:
data new_data (DROP = Address PhoneNumber Email);
set original_data;
run;
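Example run (this step uses a KEEP statement, so only NAME and AGE are written to WORK.A21):
DATA A21;
SET SASUSER.CLASS2;
KEEP NAME AGE;
RUN;
PROC PRINT DATA=A21;
RUN;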
LOG:
NOTE: There were 19 observations read from the data set SASUSER.CLASS2.
NOTE: The data set WORK.A21 has 19 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: There were 19 observations read from the data set WORK.A21.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
RESULT:
Obs | Name | Age |
---|---|---|
1 | Alfred | 14 |
2 | Alice | 13 |
3 | Barbara | 13 |
4 | Carol | 14 |
5 | Henry | 14 |
6 | James | 12 |
7 | Jane | 12 |
8 | Janet | 15 |
9 | Jeffrey | 13 |
10 | John | 12 |
11 | Joyce | 11 |
12 | Judy | 14 |
13 | Louise | 12 |
14 | Mary | 15 |
15 | Philip | 16 |
16 | Robert | 12 |
17 | Ronald | 15 |
18 | Thomas | 11 |
19 | William | 15 |
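The same two-variable result could also be reached with a DROP statement. This is only a sketch: it assumes SASUSER.CLASS2 holds exactly the seven variables seen in this post (the six printed earlier plus DOB), and A21D is a made-up dataset name.

```sas
/* Sketch: remove everything except NAME and AGE by listing the rest. */
/* Assumes CLASS2 has only these seven variables; A21D is hypothetical. */
DATA A21D;
    SET SASUSER.CLASS2;
    DROP SEX HEIGHT WEIGHT CLASS DOB;
RUN;

PROC PRINT DATA=A21D;
RUN;
```

Here the KEEP version is clearly shorter, and it also keeps working if new variables are later added to the input dataset.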
Key Points
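- Placement: KEEP and DROP statements are used inside the DATA step. The KEEP= and DROP= data set options go in parentheses after a dataset name in the DATA or SET statement.
- Priority: If both KEEP and DROP are used for the same variable, DROP takes precedence. SAS documentation advises against using both statements in the same DATA step, so pick one.
- Efficiency: With large datasets that have many variables, put KEEP= or DROP= on the SET statement. The unwanted variables are then never read into memory. A KEEP or DROP statement, or an option on the DATA statement, only controls which variables are written to the output dataset.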
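A short sketch of the SET-statement form mentioned above, reusing SASUSER.CLASS2 from the earlier examples. The output name A22 is an assumption made for illustration.

```sas
/* Sketch: apply KEEP= to the input dataset so only NAME and AGE */
/* are read into the DATA step. A22 is a hypothetical name.      */
DATA A22;
    SET SASUSER.CLASS2 (KEEP = NAME AGE);
RUN;

PROC PRINT DATA=A22;
RUN;
```

Because the option sits on the input side, any other variable (for example HEIGHT) is not available to statements inside this DATA step.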
The Colon Operator as a Wildcard:
The colon operator, when used after a variable name, acts as a wildcard. It selects all variables starting with that specific prefix.
Example:
Suppose you have a dataset with the variables Age, Gender, Income, Agegroup, Gendercode and Incomelevel.
Keeping Variables Starting with "Age":
data new_data (KEEP=Age:);
set original_data;
run;
This will keep Age and Agegroup in the new dataset.
Dropping Variables Starting with "Income":
data new_data (DROP=Income:);
set original_data;
run;
This will drop Income and Incomelevel.
Key Points:
- Convenience: The colon operator saves typing when there are many variables and you want to keep or drop a whole group that shares a common prefix.
- Specificity: If you need more precise control over variable selection, you can combine the colon operator with explicit variable names. For example:
data new_data (KEEP=Age: Gender Incomelevel);
set original_data;
run;
This will keep Age, Agegroup, Gender and Incomelevel.
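To try the colon operator end to end, the sketch below builds a tiny stand-in for original_data with the six variable names used in this section. The data values and the output names age_vars, no_income and mixed are made up for illustration. PROC CONTENTS with the SHORT option lists which variables ended up in each dataset.

```sas
/* Hypothetical sample data with the six variables named in the text. */
data original_data;
    input Age Gender $ Income Agegroup $ Gendercode Incomelevel $;
    datalines;
34 F 52000 30-39 2 Mid
51 M 87000 50-59 1 High
;
run;

/* Keep every variable whose name starts with "Age". */
data age_vars (keep = Age:);
    set original_data;
run;

/* Drop every variable whose name starts with "Income". */
data no_income (drop = Income:);
    set original_data;
run;

/* Combine a prefix with explicit names (space-separated list). */
data mixed (keep = Age: Gender Incomelevel);
    set original_data;
run;

/* List the variable names in each output dataset. */
proc contents data = age_vars short;
run;
proc contents data = no_income short;
run;
proc contents data = mixed short;
run;
```

Note that a prefix such as Age: matches every name that begins with those letters, so check the variable list before relying on it in a dataset with many similar names.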