136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement
- Get link
- X
- Other Apps
Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement
1. Creating a Unique Dataset
Let's begin by creating a dataset named employee_data that contains information about employees, including their ID, name, department, position, hire date, salary, and performance score.
data employee_data;
input Employee_ID $ First_Name $ Last_Name $ Department $ Position $ Hire_Date :date9. Salary Performance_Score;
format Hire_Date date9.;
datalines;
E001 John Doe Sales Manager 15JAN2015 75000 85
E002 Jane Smith HR Specialist 22MAR2016 62000 90
E003 Emily Johnson IT Analyst 10JUL2017 68000 88
E004 Michael Brown Finance Director 05SEP2014 89000 92
E005 Sarah Davis Marketing Coordinator 12DEC2018 54000 78
E006 David Wilson Sales Representative 03FEB2019 58000 83
E007 Laura Martinez HR Manager 25MAY2013 73000 87
E008 James Taylor IT Developer 17AUG2020 71000 91
E009 Linda Anderson Finance Analyst 30NOV2016 67000 86
E010 Robert Thomas Marketing Manager 08APR2012 80000 89
;
run;
In this dataset:
Employee_ID: Unique identifier for each employee.
First_Name and Last_Name: Employee's first and last names.
Department: Department where the employee works.
Position: Job title of the employee.
Hire_Date: Date when the employee was hired.
Salary: Annual salary of the employee.
Performance_Score: Numerical score representing the employee's performance.
2. Viewing the Dataset
To view the contents of the employee_data dataset, you can use the following PROC PRINT statement:
proc print data=employee_data;
title 'Employee Data';
run;
This will display all the records in the dataset.
Output:
Obs | Employee_ID | First_Name | Last_Name | Department | Position | Hire_Date | Salary | Performance_Score |
---|---|---|---|---|---|---|---|---|
1 | E001 | John | Doe | Sales | Manager | 15JAN2015 | 75000 | 85 |
2 | E002 | Jane | Smith | HR | Speciali | 22MAR2016 | 62000 | 90 |
3 | E003 | Emily | Johnson | IT | Analyst | 10JUL2017 | 68000 | 88 |
4 | E004 | Michael | Brown | Finance | Director | 05SEP2014 | 89000 | 92 |
5 | E005 | Sarah | Davis | Marketin | Coordina | 12DEC2018 | 54000 | 78 |
6 | E006 | David | Wilson | Sales | Represen | 03FEB2019 | 58000 | 83 |
7 | E007 | Laura | Martinez | HR | Manager | 25MAY2013 | 73000 | 87 |
8 | E008 | James | Taylor | IT | Develope | 17AUG2020 | 71000 | 91 |
9 | E009 | Linda | Anderson | Finance | Analyst | 30NOV2016 | 67000 | 86 |
10 | E010 | Robert | Thomas | Marketin | Manager | 08APR2012 | 80000 | 89 |
3. Using PROC SQL for Data Manipulation
PROC SQL in SAS provides powerful capabilities for data manipulation. Below are several examples demonstrating how to use PROC SQL with the employee_data dataset.
a. Selecting Specific Columns
To retrieve specific columns from the dataset, such as Employee_ID, First_Name, Last_Name, and Department, use the following query:
proc sql;
select Employee_ID, First_Name, Last_Name, Department
from employee_data;
quit;
This query selects and displays only the specified columns from the employee_data dataset.
Output:
Employee_ID | First_Name | Last_Name | Department |
---|---|---|---|
E001 | John | Doe | Sales |
E002 | Jane | Smith | HR |
E003 | Emily | Johnson | IT |
E004 | Michael | Brown | Finance |
E005 | Sarah | Davis | Marketin |
E006 | David | Wilson | Sales |
E007 | Laura | Martinez | HR |
E008 | James | Taylor | IT |
E009 | Linda | Anderson | Finance |
E010 | Robert | Thomas | Marketing |
b. Filtering Rows with a WHERE Clause
To filter employees who work in the 'Sales' department, you can use the WHERE clause:
proc sql;
select Employee_ID, First_Name, Last_Name, Department
from employee_data
where Department = 'Sales';
quit;
This query returns records of employees whose Department is 'Sales'.
Output:
Employee_ID | First_Name | Last_Name | Department |
---|---|---|---|
E001 | John | Doe | Sales |
E006 | David | Wilson | Sales |
c. Sorting Data with ORDER BY
To sort employees by Hire_Date in ascending order:
proc sql;
select Employee_ID, First_Name, Last_Name, Hire_Date
from employee_data
order by Hire_Date asc;
quit;
This query orders the results based on the Hire_Date, showing the earliest hires first.
Output:
Employee_ID | First_Name | Last_Name | Hire_Date |
---|---|---|---|
E010 | Robert | Thomas | 08APR2012 |
E007 | Laura | Martinez | 25MAY2013 |
E004 | Michael | Brown | 05SEP2014 |
E001 | John | Doe | 15JAN2015 |
E002 | Jane | Smith | 22MAR2016 |
E009 | Linda | Anderson | 30NOV2016 |
E003 | Emily | Johnson | 10JUL2017 |
E005 | Sarah | Davis | 12DEC2018 |
E006 | David | Wilson | 03FEB2019 |
E008 | James | Taylor | 17AUG2020 |
d. Calculating Aggregate Functions
To calculate the average salary across all employees:
proc sql;
select avg(Salary) as Average_Salary
from employee_data;
quit;
This query computes and displays the average salary of all employees in the dataset.
Output:
Average_Salary |
---|
69700 |
e. Grouping Data with GROUP BY
To find the average salary and average performance score per department:
proc sql;
select Department,
avg(Salary) as Avg_Salary,
avg(Performance_Score) as Avg_Performance
from employee_data
group by Department;
quit;
This query groups the data by Department and calculates the average Salary and Performance_Score for each department.
Output:
Department | Avg_Salary | Avg_Performance |
---|---|---|
Finance | 78000 | 89 |
HR | 67500 | 88.5 |
IT | 69500 | 89.5 |
Marketin | 67000 | 83.5 |
Sales | 66500 | 84 |
f.Filtering Grouped Data with HAVING
To retrieve departments where the average salary exceeds 70,000:
proc sql;
select Department,
avg(Salary) as Avg_Salary
from employee_data
group by Department
having Avg_Salary > 70000;
quit;
The HAVING clause filters groups based on the condition specified, in this case, departments with an average salary greater than 70,000.
Output:
Department | Avg_Salary |
---|---|
Finance | 78000 |
g. Creating New Columns with CASE Statements
To create a new column Salary_Category that classifies employees based on their salary:
proc sql;
select Employee_ID,
First_Name,
Last_Name,
Salary,
case
when Salary > 80000 then 'High'
when Salary between 60000 and 80000 then 'Medium'
else 'Low'
end as Salary_Category
from employee_data;
quit;
This query adds a Salary_Category column that categorizes salaries into 'High', 'Medium', or 'Low' based on specified ranges.
Output:
Employee_ID | First_Name | Last_Name | Salary | Salary_Category |
---|---|---|---|---|
E001 | John | Doe | 75000 | Medium |
E002 | Jane | Smith | 62000 | Medium |
E003 | Emily | Johnson | 68000 | Medium |
E004 | Michael | Brown | 89000 | High |
E005 | Sarah | Davis | 54000 | Low |
E006 | David | Wilson | 58000 | Low |
E007 | Laura | Martinez | 73000 | Medium |
E008 | James | Taylor | 71000 | Medium |
E009 | Linda | Anderson | 67000 | Medium |
E010 | Robert | Thomas | 80000 | Medium |
h. Joining Tables
Assuming there's another dataset department_info containing department details:
data department_info;
input Department $ Department_Head $ Location $;
datalines;
Sales Alice Green New York
HR Bob White Chicago
IT Carol Black San Francisco
Finance Dave Grey Boston
Marketing Eve Blue Los Angeles
;
run;
To join employee_data with department_info on the Department column:
proc sql;
select e.Employee_ID,
e.First_Name,
e.Last_Name,
e.Department,
d.Department_Head,
d.Location
from employee_data as e
left join department_info as d
on e.Department = d.Department;
quit;
This left join combines information from both datasets, including all employees and their corresponding department details.
Output:
Employee_ID | First_Name | Last_Name | Department | Department_Head | Location |
---|---|---|---|---|---|
E009 | Linda | Anderson | Finance | Dave | Grey |
E004 | Michael | Brown | Finance | Dave | Grey |
E002 | Jane | Smith | HR | Bob | White |
E007 | Laura | Martinez | HR | Bob | White |
E003 | Emily | Johnson | IT | Carol | Black |
E008 | James | Taylor | IT | Carol | Black |
E005 | Sarah | Davis | Marketin | Eve | Blue |
E010 | Robert | Thomas | Marketin | Eve | Blue |
E001 | John | Doe | Sales | Alice | Green |
E006 | David | Wilson | Sales | Alice | Green |
i. Creating a View
To create a view that shows employees hired after January 1, 2016:
proc sql;
create view recent_hires as
select Employee_ID,
First_Name,
Last_Name,
Hire_Date
from employee_data
where Hire_Date > '01JAN2016'd;
quit;
This view, named recent_hires, can be queried like a table and includes employees hired after the specified date.
Log:
NOTE: SQL view WORK.RECENT_HIRES has been defined.
j. Updating Data
To update the Performance_Score of employee 'E005' to 82:
proc sql;
update employee_data
set Performance_Score = 82
where Employee_ID = 'E005';
quit;
This statement modifies the Performance_Score for the employee with Employee_ID 'E005'.
Log:
NOTE: 1 row was updated in WORK.EMPLOYEE_DATA.
k. Deleting Data
To remove records of employees in the 'Marketing' department:
proc sql;
delete from employee_data
where Department = 'Marketing';
quit;
This deletes all records where the Department is 'Marketing'.
Log:
NOTE: No rows were deleted from WORK.EMPLOYEE_DATA.
l. Inserting Data
To insert a new employee record into the employee_data table using the INSERT INTO statement in PROC SQL, you can use the following syntax:
proc sql;
insert into employee_data
values ('E011', 'Alice', 'White', 'Legal', 'Counsel', '20FEB2021'd, 78000, 88);
quit;
Log:
NOTE: 1 row was inserted into WORK.EMPLOYEE_DATA.
- Get link
- X
- Other Apps
Comments
Post a Comment