​136.Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement

Mastering Data Insertion in SAS: A Comprehensive Guide to Using PROC SQL's INSERT INTO Statement


1. Creating a Unique Dataset

Let's begin by creating a dataset named employee_data that contains information about employees, including their ID, name, department, position, hire date, salary, and performance score.


data employee_data;

    input Employee_ID $ First_Name $ Last_Name $ Department $ Position $ Hire_Date :date9. Salary Performance_Score;

    format Hire_Date date9.;

    datalines;

E001 John Doe Sales Manager 15JAN2015 75000 85

E002 Jane Smith HR Specialist 22MAR2016 62000 90

E003 Emily Johnson IT Analyst 10JUL2017 68000 88

E004 Michael Brown Finance Director 05SEP2014 89000 92

E005 Sarah Davis Marketing Coordinator 12DEC2018 54000 78

E006 David Wilson Sales Representative 03FEB2019 58000 83

E007 Laura Martinez HR Manager 25MAY2013 73000 87

E008 James Taylor IT Developer 17AUG2020 71000 91

E009 Linda Anderson Finance Analyst 30NOV2016 67000 86

E010 Robert Thomas Marketing Manager 08APR2012 80000 89

;

run;


In this dataset:

Employee_ID: Unique identifier for each employee.

First_Name and Last_Name: Employee's first and last names.

Department: Department where the employee works.

Position: Job title of the employee.

Hire_Date: Date when the employee was hired.

Salary: Annual salary of the employee.

Performance_Score: Numerical score representing the employee's performance.


2. Viewing the Dataset

To view the contents of the employee_data dataset, you can use the following PROC PRINT statement:


proc print data=employee_data;

    title 'Employee Data';

run;

This will display all the records in the dataset.


Output:

Obs Employee_ID First_Name Last_Name Department Position Hire_Date Salary Performance_Score
1 E001 John Doe Sales Manager 15JAN2015 75000 85
2 E002 Jane Smith HR Speciali 22MAR2016 62000 90
3 E003 Emily Johnson IT Analyst 10JUL2017 68000 88
4 E004 Michael Brown Finance Director 05SEP2014 89000 92
5 E005 Sarah Davis Marketin Coordina 12DEC2018 54000 78
6 E006 David Wilson Sales Represen 03FEB2019 58000 83
7 E007 Laura Martinez HR Manager 25MAY2013 73000 87
8 E008 James Taylor IT Develope 17AUG2020 71000 91
9 E009 Linda Anderson Finance Analyst 30NOV2016 67000 86
10 E010 Robert Thomas Marketin Manager 08APR2012 80000 89


3. Using PROC SQL for Data Manipulation

PROC SQL in SAS provides powerful capabilities for data manipulation. Below are several examples demonstrating how to use PROC SQL with the employee_data dataset.

a. Selecting Specific Columns

To retrieve specific columns from the dataset, such as Employee_ID, First_Name, Last_Name, and Department, use the following query:


proc sql;

    select Employee_ID, First_Name, Last_Name, Department

    from employee_data;

quit;

This query selects and displays only the specified columns from the employee_data dataset.


Output:

Employee_ID First_Name Last_Name Department
E001 John Doe Sales
E002 Jane Smith HR
E003 Emily Johnson IT
E004 Michael Brown Finance
E005 Sarah Davis Marketin
E006 David Wilson Sales
E007 Laura Martinez HR
E008 James Taylor IT
E009 Linda Anderson Finance
E010 Robert Thomas Marketing


b. Filtering Rows with a WHERE Clause

To filter employees who work in the 'Sales' department, you can use the WHERE clause:


proc sql;

    select Employee_ID, First_Name, Last_Name, Department

    from employee_data

    where Department = 'Sales';

quit;

This query returns records of employees whose Department is 'Sales'.


Output:

Employee_ID First_Name Last_Name Department
E001 John Doe Sales
E006 David Wilson Sales


c. Sorting Data with ORDER BY

To sort employees by Hire_Date in ascending order:


proc sql;

    select Employee_ID, First_Name, Last_Name, Hire_Date

    from employee_data

    order by Hire_Date asc;

quit;

This query orders the results based on the Hire_Date, showing the earliest hires first.


Output:

Employee_ID First_Name Last_Name Hire_Date
E010 Robert Thomas 08APR2012
E007 Laura Martinez 25MAY2013
E004 Michael Brown 05SEP2014
E001 John Doe 15JAN2015
E002 Jane Smith 22MAR2016
E009 Linda Anderson 30NOV2016
E003 Emily Johnson 10JUL2017
E005 Sarah Davis 12DEC2018
E006 David Wilson 03FEB2019
E008 James Taylor 17AUG2020


d. Calculating Aggregate Functions

To calculate the average salary across all employees:


proc sql;

    select avg(Salary) as Average_Salary

    from employee_data;

quit;

This query computes and displays the average salary of all employees in the dataset.


Output:

Average_Salary
69700


e. Grouping Data with GROUP BY

To find the average salary and average performance score per department:


proc sql;

    select Department,

           avg(Salary) as Avg_Salary,

           avg(Performance_Score) as Avg_Performance

    from employee_data

    group by Department;

quit;

This query groups the data by Department and calculates the average Salary and Performance_Score for each department.


Output:

Department Avg_Salary Avg_Performance
Finance 78000 89
HR 67500 88.5
IT 69500 89.5
Marketin 67000 83.5
Sales 66500 84


f.Filtering Grouped Data with HAVING

To retrieve departments where the average salary exceeds 70,000:


proc sql;

    select Department,

           avg(Salary) as Avg_Salary

    from employee_data

    group by Department

    having Avg_Salary > 70000;

quit;

The HAVING clause filters groups based on the condition specified, in this case, departments with an average salary greater than 70,000.


Output:

Department Avg_Salary
Finance 78000

g. Creating New Columns with CASE Statements

To create a new column Salary_Category that classifies employees based on their salary:


proc sql;

    select Employee_ID,

           First_Name,

           Last_Name,

           Salary,

           case

               when Salary > 80000 then 'High'

               when Salary between 60000 and 80000 then 'Medium'

               else 'Low'

           end as Salary_Category

    from employee_data;

quit;

This query adds a Salary_Category column that categorizes salaries into 'High', 'Medium', or 'Low' based on specified ranges.


Output:

Employee_ID First_Name Last_Name Salary Salary_Category
E001 John Doe 75000 Medium
E002 Jane Smith 62000 Medium
E003 Emily Johnson 68000 Medium
E004 Michael Brown 89000 High
E005 Sarah Davis 54000 Low
E006 David Wilson 58000 Low
E007 Laura Martinez 73000 Medium
E008 James Taylor 71000 Medium
E009 Linda Anderson 67000 Medium
E010 Robert Thomas 80000 Medium


h. Joining Tables

Assuming there's another dataset department_info containing department details:


data department_info;

    input Department $ Department_Head $ Location $;

    datalines;

Sales Alice Green New York

HR Bob White Chicago

IT Carol Black San Francisco

Finance Dave Grey Boston

Marketing Eve Blue Los Angeles

;

run;

To join employee_data with department_info on the Department column:


proc sql;

    select e.Employee_ID,

           e.First_Name,

           e.Last_Name,

           e.Department,

           d.Department_Head,

           d.Location

    from employee_data as e

    left join department_info as d

    on e.Department = d.Department;

quit;

This left join combines information from both datasets, including all employees and their corresponding department details.


Output:

Employee_ID First_Name Last_Name Department Department_Head Location
E009 Linda Anderson Finance Dave Grey
E004 Michael Brown Finance Dave Grey
E002 Jane Smith HR Bob White
E007 Laura Martinez HR Bob White
E003 Emily Johnson IT Carol Black
E008 James Taylor IT Carol Black
E005 Sarah Davis Marketin Eve Blue
E010 Robert Thomas Marketin Eve Blue
E001 John Doe Sales Alice Green
E006 David Wilson Sales Alice Green


i. Creating a View

To create a view that shows employees hired after January 1, 2016:


proc sql;

    create view recent_hires as

    select Employee_ID,

           First_Name,

           Last_Name,

           Hire_Date

    from employee_data

    where Hire_Date > '01JAN2016'd;

quit;

This view, named recent_hires, can be queried like a table and includes employees hired after the specified date.


Log:

NOTE: SQL view WORK.RECENT_HIRES has been defined.


j. Updating Data

To update the Performance_Score of employee 'E005' to 82:


proc sql;

    update employee_data

    set Performance_Score = 82

    where Employee_ID = 'E005';

quit;

This statement modifies the Performance_Score for the employee with Employee_ID 'E005'.


Log:

NOTE: 1 row was updated in WORK.EMPLOYEE_DATA.


k. Deleting Data

To remove records of employees in the 'Marketing' department:


proc sql;

    delete from employee_data

    where Department = 'Marketing';

quit;

This deletes all records where the Department is 'Marketing'.


Log:

NOTE: No rows were deleted from WORK.EMPLOYEE_DATA.


l. Inserting Data

​To insert a new employee record into the employee_data table using the INSERT INTO statement in PROC SQL, you can use the following syntax:​


proc sql;

    insert into employee_data

    values ('E011', 'Alice', 'White', 'Legal', 'Counsel', '20FEB2021'd, 78000, 88);

quit;


Log:

NOTE: 1 row was inserted into WORK.EMPLOYEE_DATA.



PRACTICE AND COMMENT YOUR CODE: 

-->PLEASE FOLLOW OUR BLOG FOR MORE UPDATES.

TO FOLLOW OUR TELEGRAM CHANNEL CLICK HERE



Comments