SAS Learning Hub: 136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

1. Creating a Unique Dataset

Let's begin by creating a dataset named employee_data that contains information about employees, including their ID, name, department, position, hire date, salary, and performance score.

data employee_data;

input Employee_ID $ First_Name $ Last_Name $ Department $ Position $ Hire_Date :date9. Salary Performance_Score;

format Hire_Date date9.;

datalines;

E001 John Doe Sales Manager 15JAN2015 75000 85

E002 Jane Smith HR Specialist 22MAR2016 62000 90

E003 Emily Johnson IT Analyst 10JUL2017 68000 88

E004 Michael Brown Finance Director 05SEP2014 89000 92

E005 Sarah Davis Marketing Coordinator 12DEC2018 54000 78

E006 David Wilson Sales Representative 03FEB2019 58000 83

E007 Laura Martinez HR Manager 25MAY2013 73000 87

E008 James Taylor IT Developer 17AUG2020 71000 91

E009 Linda Anderson Finance Analyst 30NOV2016 67000 86

E010 Robert Thomas Marketing Manager 08APR2012 80000 89

;

run;

In this dataset:

Employee_ID: Unique identifier for each employee.

First_Name and Last_Name: Employee's first and last names.

Department: Department where the employee works.

Position: Job title of the employee.

Hire_Date: Date when the employee was hired.

Salary: Annual salary of the employee.

Performance_Score: Numerical score representing the employee's performance.

2. Viewing the Dataset

To view the contents of the employee_data dataset, you can use the following PROC PRINT statement:

proc print data=employee_data;

title 'Employee Data';

run;

This will display all the records in the dataset.

Output:

Obs	Employee_ID	First_Name	Last_Name	Department	Position	Hire_Date	Salary	Performance_Score
1	E001	John	Doe	Sales	Manager	15JAN2015	75000	85
2	E002	Jane	Smith	HR	Speciali	22MAR2016	62000	90
3	E003	Emily	Johnson	IT	Analyst	10JUL2017	68000	88
4	E004	Michael	Brown	Finance	Director	05SEP2014	89000	92
5	E005	Sarah	Davis	Marketin	Coordina	12DEC2018	54000	78
6	E006	David	Wilson	Sales	Represen	03FEB2019	58000	83
7	E007	Laura	Martinez	HR	Manager	25MAY2013	73000	87
8	E008	James	Taylor	IT	Develope	17AUG2020	71000	91
9	E009	Linda	Anderson	Finance	Analyst	30NOV2016	67000	86
10	E010	Robert	Thomas	Marketin	Manager	08APR2012	80000	89

3. Using PROC SQL for Data Manipulation

PROC SQL in SAS provides powerful capabilities for data manipulation. Below are several examples demonstrating how to use PROC SQL with the employee_data dataset.

a. Selecting Specific Columns

To retrieve specific columns from the dataset, such as Employee_ID, First_Name, Last_Name, and Department, use the following query:

proc sql;

select Employee_ID, First_Name, Last_Name, Department

from employee_data;

quit;

This query selects and displays only the specified columns from the employee_data dataset.

Output:

Employee_ID	First_Name	Last_Name	Department
E001	John	Doe	Sales
E002	Jane	Smith	HR
E003	Emily	Johnson	IT
E004	Michael	Brown	Finance
E005	Sarah	Davis	Marketin
E006	David	Wilson	Sales
E007	Laura	Martinez	HR
E008	James	Taylor	IT
E009	Linda	Anderson	Finance
E010	Robert	Thomas	Marketing

b. Filtering Rows with a WHERE Clause

To filter employees who work in the 'Sales' department, you can use the WHERE clause:

proc sql;

select Employee_ID, First_Name, Last_Name, Department

from employee_data

where Department = 'Sales';

quit;

This query returns records of employees whose Department is 'Sales'.

Output:

Employee_ID	First_Name	Last_Name	Department
E001	John	Doe	Sales
E006	David	Wilson	Sales

c. Sorting Data with ORDER BY

To sort employees by Hire_Date in ascending order:

proc sql;

select Employee_ID, First_Name, Last_Name, Hire_Date

from employee_data

order by Hire_Date asc;

quit;

This query orders the results based on the Hire_Date, showing the earliest hires first.

Output:

Employee_ID	First_Name	Last_Name	Hire_Date
E010	Robert	Thomas	08APR2012
E007	Laura	Martinez	25MAY2013
E004	Michael	Brown	05SEP2014
E001	John	Doe	15JAN2015
E002	Jane	Smith	22MAR2016
E009	Linda	Anderson	30NOV2016
E003	Emily	Johnson	10JUL2017
E005	Sarah	Davis	12DEC2018
E006	David	Wilson	03FEB2019
E008	James	Taylor	17AUG2020

d. Calculating Aggregate Functions

To calculate the average salary across all employees:

proc sql;

select avg(Salary) as Average_Salary

from employee_data;

quit;

This query computes and displays the average salary of all employees in the dataset.

Output:

Average_Salary
69700

e. Grouping Data with GROUP BY

To find the average salary and average performance score per department:

proc sql;

select Department,

avg(Salary) as Avg_Salary,

avg(Performance_Score) as Avg_Performance

from employee_data

group by Department;

quit;

This query groups the data by Department and calculates the average Salary and Performance_Score for each department.

Output:

Department	Avg_Salary	Avg_Performance
Finance	78000	89
HR	67500	88.5
IT	69500	89.5
Marketin	67000	83.5
Sales	66500	84

f.Filtering Grouped Data with HAVING

To retrieve departments where the average salary exceeds 70,000:

proc sql;

select Department,

avg(Salary) as Avg_Salary

from employee_data

group by Department

having Avg_Salary > 70000;

quit;

The HAVING clause filters groups based on the condition specified, in this case, departments with an average salary greater than 70,000.

Output:

Department	Avg_Salary
Finance	78000

g. Creating New Columns with CASE Statements

To create a new column Salary_Category that classifies employees based on their salary:

proc sql;

select Employee_ID,

First_Name,

Last_Name,

Salary,

case

when Salary > 80000 then 'High'

when Salary between 60000 and 80000 then 'Medium'

else 'Low'

end as Salary_Category

from employee_data;

quit;

This query adds a Salary_Category column that categorizes salaries into 'High', 'Medium', or 'Low' based on specified ranges.

Output:

Employee_ID	First_Name	Last_Name	Salary	Salary_Category
E001	John	Doe	75000	Medium
E002	Jane	Smith	62000	Medium
E003	Emily	Johnson	68000	Medium
E004	Michael	Brown	89000	High
E005	Sarah	Davis	54000	Low
E006	David	Wilson	58000	Low
E007	Laura	Martinez	73000	Medium
E008	James	Taylor	71000	Medium
E009	Linda	Anderson	67000	Medium
E010	Robert	Thomas	80000	Medium

h. Joining Tables

Assuming there's another dataset department_info containing department details:

data department_info;

input Department $ Department_Head $ Location $;

datalines;

Sales Alice Green New York

HR Bob White Chicago

IT Carol Black San Francisco

Finance Dave Grey Boston

Marketing Eve Blue Los Angeles

;

run;

To join employee_data with department_info on the Department column:

proc sql;

select e.Employee_ID,

e.First_Name,

e.Last_Name,

e.Department,

d.Department_Head,

d.Location

from employee_data as e

left join department_info as d

on e.Department = d.Department;

quit;

This left join combines information from both datasets, including all employees and their corresponding department details.

Output:

Employee_ID	First_Name	Last_Name	Department	Department_Head	Location
E009	Linda	Anderson	Finance	Dave	Grey
E004	Michael	Brown	Finance	Dave	Grey
E002	Jane	Smith	HR	Bob	White
E007	Laura	Martinez	HR	Bob	White
E003	Emily	Johnson	IT	Carol	Black
E008	James	Taylor	IT	Carol	Black
E005	Sarah	Davis	Marketin	Eve	Blue
E010	Robert	Thomas	Marketin	Eve	Blue
E001	John	Doe	Sales	Alice	Green
E006	David	Wilson	Sales	Alice	Green

i. Creating a View

To create a view that shows employees hired after January 1, 2016:

proc sql;

create view recent_hires as

select Employee_ID,

First_Name,

Last_Name,

Hire_Date

from employee_data

where Hire_Date > '01JAN2016'd;

quit;

This view, named recent_hires, can be queried like a table and includes employees hired after the specified date.

Log:

NOTE: SQL view WORK.RECENT_HIRES has been defined.

j. Updating Data

To update the Performance_Score of employee 'E005' to 82:

proc sql;

update employee_data

set Performance_Score = 82

where Employee_ID = 'E005';

quit;

This statement modifies the Performance_Score for the employee with Employee_ID 'E005'.

Log:

NOTE: 1 row was updated in WORK.EMPLOYEE_DATA.

k. Deleting Data

To remove records of employees in the 'Marketing' department:

proc sql;

delete from employee_data

where Department = 'Marketing';

quit;

This deletes all records where the Department is 'Marketing'.

Log:

NOTE: No rows were deleted from WORK.EMPLOYEE_DATA.

l. Inserting Data

To insert a new employee record into the employee_data table using the INSERT INTO statement in PROC SQL, you can use the following syntax:

proc sql;

insert into employee_data

values ('E011', 'Alice', 'White', 'Legal', 'Counsel', '20FEB2021'd, 78000, 88);

SAS Learning Hub

Monday, 24 March 2025

136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

No comments:

Post a Comment

Popular Posts

Monday, 24 March 2025

​136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

​Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

No comments:

Post a Comment

136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement