150. EXPLORING ONLINE COURSE ENGAGEMENT THROUGH SAS: A COMPREHENSIVE ANALYSIS UTILIZING DATA VISUALIZATION, STATISTICAL PROCEDURES, AND REPORTING TECHNIQUES



/*Create a simulated dataset on online course engagement and demonstrate several SAS procedures to analyze and visualize it.*/


Dataset Overview: Online Course Engagement

We'll simulate a dataset named course_engagement that captures student interactions with an online course platform. The dataset includes:

student_id: Unique identifier for each student

course_id: Identifier for the course

enrollment_date: Date the student enrolled in the course

completion_date: Date the student completed the course (missing if not completed)

completed: Completion flag (1 = completed, 0 = not completed)

time_spent: Total time spent on the course (in hours)

assignments_submitted: Number of assignments submitted

quizzes_attempted: Number of quizzes attempted

final_score: Final score achieved in the course (0 to 100)

course_rating: Rating given by the student (1 to 5)

country: Country of the student


Step 1: Data Creation

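/*First, we'll create the course_engagement dataset using SAS:*/

data course_engagement;
    /* FORMAT before the loop places the two date variables first in the column order */
    format enrollment_date completion_date date9.;
    do student_id = 1 to 20;
        course_id = ceil(ranuni(0)*10);                        /* course 1 to 10 */
        enrollment_date = '01JAN2025'd + ceil(ranuni(0)*90);   /* 1 to 90 days after 01JAN2025 */
        /* About 80% of students complete the course */
        if ranuni(0) < 0.8 then do;
            completion_date = enrollment_date + ceil(ranuni(0)*60);  /* 1 to 60 days later */
            completed = 1;
        end;
        else do;
            completion_date = .;   /* reset explicitly: values carry over between loop passes */
            completed = 0;
        end;
        time_spent = round(ranuni(0)*50, 0.1);         /* 0 to 50 hours, one decimal */
        assignments_submitted = ceil(ranuni(0)*10);    /* 1 to 10 */
        quizzes_attempted = ceil(ranuni(0)*5);         /* 1 to 5 */
        final_score = round(ranuni(0)*100, 0.1);       /* 0 to 100, one decimal */
        course_rating = ceil(ranuni(0)*5);             /* 1 to 5 */
        length country $12;                            /* longest value, SouthAfrica, has 11 characters */
        country = scan("USA Canada UK India Australia Germany France Brazil Japan SouthAfrica", ceil(ranuni(0)*10));
        output;                                        /* write one row per student */
    end;
run;

/* Seed 0 makes RANUNI use the system clock, so each run gives values different from the listing below */
proc print data=course_engagement;
run;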

Output:

Obs enrollment_date completion_date student_id course_id completed time_spent assignments_submitted quizzes_attempted final_score course_rating country
1 28JAN2025 14FEB2025 1 7 1 34.7 1 5 72.7 4 Canada
2 11JAN2025 21FEB2025 2 2 1 17.0 4 4 38.9 1 France
3 19MAR2025 15APR2025 3 5 1 27.6 1 5 58.4 2 Brazil
4 27MAR2025 22MAY2025 4 5 1 18.6 2 3 71.6 1 Brazil
5 13JAN2025 10FEB2025 5 7 1 13.6 2 4 54.2 4 SouthAfrica
6 10JAN2025 02FEB2025 6 6 1 28.8 1 2 62.0 2 India
7 15JAN2025 24JAN2025 7 6 1 36.1 6 4 4.9 4 Australia
8 17FEB2025 16APR2025 8 7 1 9.8 2 3 38.2 2 Brazil
9 02JAN2025 25FEB2025 9 10 1 32.3 2 1 30.5 4 Japan
10 01FEB2025 . 10 2 0 1.4 9 5 32.3 5 France
11 15MAR2025 19MAR2025 11 10 1 7.1 9 1 76.7 3 Australia
12 13FEB2025 18MAR2025 12 9 1 47.7 4 3 46.3 5 Germany
13 13FEB2025 21MAR2025 13 7 1 1.2 2 2 67.7 5 Brazil
14 01FEB2025 16MAR2025 14 5 1 17.0 4 2 38.6 2 Germany
15 20JAN2025 30JAN2025 15 7 1 32.5 8 1 20.1 2 Brazil
16 30JAN2025 01MAR2025 16 7 1 36.1 5 2 83.2 5 Australia
17 17FEB2025 14MAR2025 17 3 1 8.0 4 3 76.8 2 France
18 25JAN2025 13MAR2025 18 10 1 43.3 1 1 76.4 1 Germany
19 16JAN2025 13MAR2025 19 10 1 4.4 4 3 28.6 1 Brazil
20 06FEB2025 . 20 7 0 28.6 3 2 11.7 3 USA
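Because RANUNI(0) seeds from the system clock, rerunning Step 1 produces different numbers from the listing above. A minimal sketch of a reproducible variant, assuming the newer RAND function is acceptable: CALL STREAMINIT fixes the seed (2025 is an arbitrary choice) and course_engagement_seeded is a hypothetical dataset name. It will not reproduce the values shown above; it only guarantees the same values on every run.

```sas
/* Reproducible variant of Step 1 (hypothetical dataset name; seed 2025 is arbitrary) */
data course_engagement_seeded;
    call streaminit(2025);                             /* fixed seed: identical data on every run */
    format enrollment_date completion_date date9.;
    do student_id = 1 to 20;
        course_id       = ceil(rand('uniform')*10);
        enrollment_date = '01JAN2025'd + ceil(rand('uniform')*90);
        completed       = (rand('uniform') < 0.8);     /* 1 with probability 0.8, else 0 */
        if completed then completion_date = enrollment_date + ceil(rand('uniform')*60);
        else completion_date = .;                      /* reset: values persist across loop passes */
        time_spent            = round(rand('uniform')*50, 0.1);
        assignments_submitted = ceil(rand('uniform')*10);
        quizzes_attempted     = ceil(rand('uniform')*5);
        final_score           = round(rand('uniform')*100, 0.1);
        course_rating         = ceil(rand('uniform')*5);
        length country $12;
        country = scan("USA Canada UK India Australia Germany France Brazil Japan SouthAfrica",
                       ceil(rand('uniform')*10));
        output;
    end;
run;

proc print data=course_engagement_seeded;
run;
```

The comparison `rand('uniform') < 0.8` evaluates to 1 or 0, so it sets the completion flag directly, replacing the IF/ELSE block for that variable.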


Step 2: Descriptive Statistics with PROC MEANS

/*To understand the central tendencies and dispersion of our numeric variables:*/

proc means data=course_engagement n mean std min max;

    var time_spent assignments_submitted quizzes_attempted final_score;

run;

Output:

                                                                   The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum
time_spent 20 22.2900000 14.2803472 1.2000000 47.7000000
assignments_submitted 20 3.7000000 2.5772282 1.0000000 9.0000000
quizzes_attempted 20 2.8000000 1.3611141 1.0000000 5.0000000
final_score 20 49.4900000 23.7364676 4.9000000 83.2000000

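/*The N, mean, standard deviation, minimum, and maximum summarize the center and spread of each engagement metric; for example, students spent 22.29 hours on average (SD 14.28) and scored 49.49 on average (SD 23.74).*/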
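To compare these statistics between completers and non-completers, PROC MEANS can group rows with a CLASS statement and save the results. This is a sketch that assumes the course_engagement dataset from Step 1. NWAY keeps only the per-group rows. The output dataset and variable names (means_by_completion, mean_time_spent, mean_final_score) are illustrative.

```sas
proc means data=course_engagement n mean std maxdec=2 nway;
    class completed;                          /* one block of statistics per completion status */
    var time_spent final_score;
    output out=means_by_completion            /* hypothetical output dataset */
           mean=mean_time_spent mean_final_score;   /* names assigned in VAR order */
run;

proc print data=means_by_completion noobs;
run;
```

The saved dataset also carries _FREQ_, the number of students in each group, which is useful here because the non-completer group is small.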


Step 3: Frequency Analysis with PROC FREQ

/*Analyzing categorical variables:*/

proc freq data=course_engagement;

    tables course_rating country completed;

run;

Output:

                                                             The FREQ Procedure

course_rating Frequency Percent Cumulative Frequency Cumulative Percent
1 4 20.00 4 20.00
2 6 30.00 10 50.00
3 2 10.00 12 60.00
4 4 20.00 16 80.00
5 4 20.00 20 100.00


country Frequency Percent Cumulative Frequency Cumulative Percent
Australia 3 15.00 3 15.00
Brazil 6 30.00 9 45.00
Canada 1 5.00 10 50.00
France 3 15.00 13 65.00
Germany 3 15.00 16 80.00
India 1 5.00 17 85.00
Japan 1 5.00 18 90.00
SouthAfrica 1 5.00 19 95.00
USA 1 5.00 20 100.00


completed Frequency Percent Cumulative Frequency Cumulative Percent
0 2 10.00 2 10.00
1 18 90.00 20 100.00


/*This reveals the distribution of course ratings, student countries, and completion status.*/
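The 90% completion rate above is based on only 20 students, so an interval estimate tells you more than the point estimate alone. This is a sketch that assumes the Step 1 data. The BINOMIAL option with LEVEL='1' estimates the proportion of students with completed = 1 and reports confidence limits for it. OUT= saves the frequency counts to a dataset, here given the hypothetical name completion_freq.

```sas
proc freq data=course_engagement;
    /* Proportion of students with completed = 1, with confidence limits */
    tables completed / binomial(level='1') out=completion_freq;
run;
```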


Step 4: Correlation Analysis with PROC CORR

/*Understanding relationships between numeric variables:*/

proc corr data=course_engagement;

    var time_spent assignments_submitted quizzes_attempted final_score;

run;

Output:

                                                         The CORR Procedure

4 Variables: time_spent assignments_submitted quizzes_attempted final_score


Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
time_spent 20 22.29000 14.28035 445.80000 1.20000 47.70000
assignments_submitted 20 3.70000 2.57723 74.00000 1.00000 9.00000
quizzes_attempted 20 2.80000 1.36111 56.00000 1.00000 5.00000
final_score 20 49.49000 23.73647 989.80000 4.90000 83.20000


Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0 (p-values in parentheses)
Variable time_spent assignments_submitted quizzes_attempted final_score
time_spent 1.00000 -0.24062 (0.3068) -0.15933 (0.5023) -0.04665 (0.8452)
assignments_submitted -0.24062 (0.3068) 1.00000 -0.09302 (0.6965) -0.28328 (0.2262)
quizzes_attempted -0.15933 (0.5023) -0.09302 (0.6965) 1.00000 -0.07793 (0.7440)
final_score -0.04665 (0.8452) -0.28328 (0.2262) -0.07793 (0.7440) 1.00000


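/*This matrix gives the Pearson correlation (r) and its p-value for every pair of engagement metrics. In this simulated sample all coefficients are weak (|r| < 0.29) and none is statistically significant (all p > 0.2), which is expected because each variable was generated independently.*/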
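Pearson's r measures linear association, and with only 20 students a few extreme values can sway it. This sketch is a rank-based cross-check and assumes the Step 1 data. The SPEARMAN option requests Spearman rank correlations. NOSIMPLE suppresses the repeated summary table. OUTS= saves the matrix to a dataset, here given the hypothetical name spearman_corr.

```sas
proc corr data=course_engagement spearman nosimple outs=spearman_corr;
    var time_spent assignments_submitted quizzes_attempted final_score;
run;
```

If the Spearman and Pearson coefficients tell the same story, the weak associations are unlikely to be an artifact of outliers.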


Step 5: Regression Analysis with PROC REG

/*Exploring how engagement metrics predict final scores:*/

proc reg data=course_engagement;
    model final_score = time_spent assignments_submitted quizzes_attempted;
run;
quit;   /* PROC REG is interactive; QUIT ends it */

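/*This multiple linear regression models final_score as a function of time_spent, assignments_submitted, and quizzes_attempted; the parameter estimates, their p-values, and R-square show how much each engagement metric contributes to final performance.*/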
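Standardized coefficients and saved residuals make the model easier to inspect. This is a sketch that assumes the Step 1 data. STB adds standardized estimates, and VIF requests variance inflation factors to flag collinearity among the three predictors. The OUTPUT statement writes predicted values and residuals to a hypothetical dataset named reg_diagnostics, which is then plotted.

```sas
proc reg data=course_engagement;
    model final_score = time_spent assignments_submitted quizzes_attempted / stb vif;
    output out=reg_diagnostics p=predicted_score r=residual;  /* hypothetical names */
run;
quit;

/* Residuals versus predicted values: look for curvature or fanning patterns */
proc sgplot data=reg_diagnostics;
    scatter x=predicted_score y=residual;
    refline 0 / axis=y;
run;
```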


Step 6: Data Visualization with PROC SGPLOT

/*Visualizing the relationship between time spent and final score:*/

proc sgplot data=course_engagement;
    scatter x=time_spent y=final_score / group=completed;
    /* NOMARKERS stops REG from drawing the points a second time over the scatter */
    reg x=time_spent y=final_score / group=completed nomarkers;
    xaxis label="Time Spent (hours)";
    yaxis label="Final Score";
run;


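/*The scatter plot overlays a separate fitted regression line for completed (1) and non-completed (0) students. With only two non-completers in this sample, the group 0 line simply connects two points, so differences between the lines should not be over-interpreted.*/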
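The non-completer group has only two students, so a single pooled fit with a confidence band may give a more reliable view. This is a sketch that assumes the Step 1 data. CLM draws a 95% confidence band for the mean of the fitted line.

```sas
proc sgplot data=course_engagement;
    reg x=time_spent y=final_score / clm;   /* pooled fit with 95% confidence band */
    xaxis label="Time Spent (hours)";
    yaxis label="Final Score";
run;
```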


Step 7: Geographic Distribution with PROC GCHART

/*Visualizing student distribution by country:*/

proc gchart data=course_engagement;
    vbar country / discrete;
run;
quit;   /* GCHART uses run-group processing and stays active until QUIT */


/*This bar chart shows the number of students from each country.*/
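PROC GCHART is part of SAS/GRAPH, which not every installation licenses. This sketch draws the same chart with PROC SGPLOT, which is part of Base SAS from version 9.3, and assumes the Step 1 data. CATEGORYORDER=RESPDESC sorts the bars from most to fewest students, and DATALABEL prints each count on its bar.

```sas
proc sgplot data=course_engagement;
    vbar country / categoryorder=respdesc datalabel;   /* bars sorted by count */
    xaxis label="Country";
    yaxis label="Number of Students";
run;
```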


Step 8: Box Plot with PROC SGPLOT

/*Analyzing score distribution by course rating:*/

proc sgplot data=course_engagement;

    vbox final_score / category=course_rating;

    xaxis label="Course Rating";

    yaxis label="Final Score";

run;


/*This box plot shows the median, quartiles, and range of final scores for each course rating; with only 2 to 6 students per rating, each box rests on very few points.*/
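A box plot is drawn from quartiles. Printing those quartiles alongside the number of students per rating shows how little data sits behind each box. This is a sketch that assumes the Step 1 data, with hypothetical output names. The median, Q1, and Q3 it reports should correspond to the center lines and box edges in the plot.

```sas
proc means data=course_engagement n median q1 q3 maxdec=1 nway;
    class course_rating;                       /* one row per rating level */
    var final_score;
    output out=score_by_rating
           n=n_students median=median_score q1=q1_score q3=q3_score;
run;

proc print data=score_by_rating noobs;
run;
```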


Step 9: Panel Plot with PROC SGPANEL

/*Comparing time spent across countries:*/

proc sgpanel data=course_engagement;

    panelby country / columns=3;

    histogram time_spent;

    colaxis label="Time Spent (hours)";

run;


/*This panel plot provides a country-wise distribution of time spent.*/
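Several countries contribute only one student, so their histogram panels hold a single bar. This sketch shows an alternative that displays every observation and assumes the Step 1 data. It is a strip plot with one row per country. JITTER (SAS 9.4) offsets overlapping markers, such as the two Australian students who each logged 36.1 hours.

```sas
proc sgplot data=course_engagement;
    scatter x=time_spent y=country / jitter;   /* one marker per student */
    xaxis label="Time Spent (hours)";
    yaxis label="Country";
run;
```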


Step 10: Creating a Summary Report with PROC REPORT

/*Generating a summary table:*/

proc report data=course_engagement nowd;
    column country completed n final_score;
    define country / group;
    define completed / group;
    define n / "Number of Students";
    /* final_score is the dataset variable; MEAN replaces the default SUM statistic */
    define final_score / analysis mean "Average Final Score";
run;

Output:

country completed Number of Students Average Final Score
Australia 1 3 54.933333
Brazil 1 6 47.433333
Canada 1 1 72.7
France 0 1 32.3
  1 2 57.85
Germany 1 3 53.766667
India 1 1 62
Japan 1 1 30.5
SouthAfrica 1 1 54.2
USA 0 1 11.7


/*This report summarizes the number of students and average scores by country and completion status.*/
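The averages above print with up to six decimals, and the completion flag appears as a bare 0 or 1. This sketch produces a tidier layout and assumes the Step 1 data. A hypothetical format, donefmt, labels the flag. FORMAT=6.1 rounds the mean. RBREAK AFTER / SUMMARIZE appends an overall row covering all students and the grand mean.

```sas
proc format;
    value donefmt 0 = 'Not completed'
                  1 = 'Completed';
run;

proc report data=course_engagement nowd;
    column country completed n final_score;
    define country     / group "Country";
    define completed   / group format=donefmt. "Status";
    define n           / "Number of Students";
    define final_score / analysis mean format=6.1 "Average Final Score";
    rbreak after / summarize;                  /* overall total row at the bottom */
run;
```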


Step 11: Exporting Results with PROC EXPORT

/*Exporting the dataset to a CSV file:*/

proc export data=course_engagement

    outfile="course_engagement.csv"

    dbms=csv

    replace;

run;

/*This allows for sharing or further analysis in other tools.*/
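One quick way to confirm the export worked is to read the file back. This is a sketch that assumes the CSV from above sits in the SAS working directory, and engagement_check is a hypothetical dataset name. GETNAMES=YES takes the column names from the header row.

```sas
proc import datafile="course_engagement.csv"
    out=engagement_check
    dbms=csv
    replace;
    getnames=yes;
run;

proc contents data=engagement_check;   /* check variable names, types, and row count */
run;
```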
