308.COMPREHENSIVE ANALYSIS OF INDIAN AGRICULTURAL CROPS USING DATA STEP | PROC SQL | MACRO DERIVATIONS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SORT
COMPREHENSIVE ANALYSIS OF INDIAN AGRICULTURAL CROPS USING DATA STEP | PROC SQL | MACRO DERIVATIONS | PROC MEANS | PROC FREQ | PROC SGPLOT | PROC SORT
1. Create the Raw Dataset Using DATA Step
options nocenter validvarname=any;
data work.india_crops;
infile datalines dlm=',' dsd truncover;
length Crop_Name $25 Region $20 Season $12 Water_Requirement $15;
input Crop_Name :$25. Region :$20. Yield_Tons Price_per_Ton Season :$12.
Water_Requirement :$15.;
datalines;
Basmati Rice,North India,4200,55000,Kharif,High
Wheat,Northwest India,3800,22000,Rabi,Medium
Sugarcane,South India,9000,32000,Annual,High
Cotton,West India,2800,45000,Kharif,Low
Tea,North East India,2500,160000,Annual,High
Groundnut,South India,3300,48000,Kharif,Low
Maize,Central India,3600,20000,Kharif,Medium
Jowar,West India,3000,18000,Rabi,Low
Soybean,Central India,3400,42000,Kharif,Medium
Turmeric,South India,2900,85000,Annual,Medium
;
run;
proc print data=work.india_crops;
run;
OUTPUT:
| Obs | Crop_Name | Region | Season | Water_Requirement | Yield_Tons | Price_per_Ton |
|---|---|---|---|---|---|---|
| 1 | Basmati Rice | North India | Kharif | High | 4200 | 55000 |
| 2 | Wheat | Northwest India | Rabi | Medium | 3800 | 22000 |
| 3 | Sugarcane | South India | Annual | High | 9000 | 32000 |
| 4 | Cotton | West India | Kharif | Low | 2800 | 45000 |
| 5 | Tea | North East India | Annual | High | 2500 | 160000 |
| 6 | Groundnut | South India | Kharif | Low | 3300 | 48000 |
| 7 | Maize | Central India | Kharif | Medium | 3600 | 20000 |
| 8 | Jowar | West India | Rabi | Low | 3000 | 18000 |
| 9 | Soybean | Central India | Kharif | Medium | 3400 | 42000 |
| 10 | Turmeric | South India | Annual | Medium | 2900 | 85000 |
2. PROC SQL – Create Derived Variables
proc sql;
create table work.india_crops_sql as
select *,
(Yield_Tons * Price_per_Ton) as Total_Revenue format=comma12.
from work.india_crops;
quit;
proc print data=work.india_crops_sql;
run;
OUTPUT:
| Obs | Crop_Name | Region | Season | Water_Requirement | Yield_Tons | Price_per_Ton | Total_Revenue |
|---|---|---|---|---|---|---|---|
| 1 | Basmati Rice | North India | Kharif | High | 4200 | 55000 | 231,000,000 |
| 2 | Wheat | Northwest India | Rabi | Medium | 3800 | 22000 | 83,600,000 |
| 3 | Sugarcane | South India | Annual | High | 9000 | 32000 | 288,000,000 |
| 4 | Cotton | West India | Kharif | Low | 2800 | 45000 | 126,000,000 |
| 5 | Tea | North East India | Annual | High | 2500 | 160000 | 400,000,000 |
| 6 | Groundnut | South India | Kharif | Low | 3300 | 48000 | 158,400,000 |
| 7 | Maize | Central India | Kharif | Medium | 3600 | 20000 | 72,000,000 |
| 8 | Jowar | West India | Rabi | Low | 3000 | 18000 | 54,000,000 |
| 9 | Soybean | Central India | Kharif | Medium | 3400 | 42000 | 142,800,000 |
| 10 | Turmeric | South India | Annual | Medium | 2900 | 85000 | 246,500,000 |
3. Macro for Repetitive Derivation
%macro yieldcat(input=, output=);
data &output.;
set &input.;
length Yield_Category $15;
if Yield_Tons >= 4000 then Yield_Category = "HIGH_YIELD";
else if Yield_Tons >= 3000 then Yield_Category = "MEDIUM_YIELD";
else Yield_Category = "LOW_YIELD";
run;
proc print data=&output;
run;
%mend;
%yieldcat(input=work.india_crops_sql, output=work.crops_final);
OUTPUT:
| Obs | Crop_Name | Region | Season | Water_Requirement | Yield_Tons | Price_per_Ton | Total_Revenue | Yield_Category |
|---|---|---|---|---|---|---|---|---|
| 1 | Basmati Rice | North India | Kharif | High | 4200 | 55000 | 231,000,000 | HIGH_YIELD |
| 2 | Wheat | Northwest India | Rabi | Medium | 3800 | 22000 | 83,600,000 | MEDIUM_YIELD |
| 3 | Sugarcane | South India | Annual | High | 9000 | 32000 | 288,000,000 | HIGH_YIELD |
| 4 | Cotton | West India | Kharif | Low | 2800 | 45000 | 126,000,000 | LOW_YIELD |
| 5 | Tea | North East India | Annual | High | 2500 | 160000 | 400,000,000 | LOW_YIELD |
| 6 | Groundnut | South India | Kharif | Low | 3300 | 48000 | 158,400,000 | MEDIUM_YIELD |
| 7 | Maize | Central India | Kharif | Medium | 3600 | 20000 | 72,000,000 | MEDIUM_YIELD |
| 8 | Jowar | West India | Rabi | Low | 3000 | 18000 | 54,000,000 | MEDIUM_YIELD |
| 9 | Soybean | Central India | Kharif | Medium | 3400 | 42000 | 142,800,000 | MEDIUM_YIELD |
| 10 | Turmeric | South India | Annual | Medium | 2900 | 85000 | 246,500,000 | LOW_YIELD |
4. Summary Statistics Using PROC MEANS
proc means data=work.crops_final mean min max maxdec=2;
var Yield_Tons Price_per_Ton Total_Revenue;
run;
OUTPUT:
The MEANS Procedure
| Variable | Mean | Minimum | Maximum |
|---|---|---|---|
Yield_Tons Price_per_Ton Total_Revenue | 3850.00 52700.00 180230000.00 | 2500.00 18000.00 54000000.00 | 9000.00 160000.00 400000000.00 |
5. Frequency Tables Using PROC FREQ
proc freq data=work.crops_final;
tables Region Season Water_Requirement Yield_Category;
run;
OUTPUT:
The FREQ Procedure
| Region | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Central India | 2 | 20.00 | 2 | 20.00 |
| North East India | 1 | 10.00 | 3 | 30.00 |
| North India | 1 | 10.00 | 4 | 40.00 |
| Northwest India | 1 | 10.00 | 5 | 50.00 |
| South India | 3 | 30.00 | 8 | 80.00 |
| West India | 2 | 20.00 | 10 | 100.00 |
| Season | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Annual | 3 | 30.00 | 3 | 30.00 |
| Kharif | 5 | 50.00 | 8 | 80.00 |
| Rabi | 2 | 20.00 | 10 | 100.00 |
| Water_Requirement | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| High | 3 | 30.00 | 3 | 30.00 |
| Low | 3 | 30.00 | 6 | 60.00 |
| Medium | 4 | 40.00 | 10 | 100.00 |
| Yield_Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| HIGH_YIELD | 2 | 20.00 | 2 | 20.00 |
| LOW_YIELD | 3 | 30.00 | 5 | 50.00 |
| MEDIUM_YIELD | 5 | 50.00 | 10 | 100.00 |
6. Visualization Using PROC SGPLOT
Plot 1: Yield by Crop
proc sgplot data=work.crops_final;
vbar Crop_Name / response=Yield_Tons datalabel;
xaxis display=(nolabel);
yaxis label="Yield (Tons)";
title "Yield Comparison of Agricultural Crops in India";
run;
OUTPUT:
Plot 2: Price per Ton by Region
proc sgplot data=work.crops_final;
scatter x=Region y=Price_per_Ton / group=Season datalabel=Crop_Name;
title "Price per Ton by Region and Season";
run;
OUTPUT:
7. Sort Dataset by Total Revenue
proc sort data=work.crops_final out=work.crops_sorted;
by descending Total_Revenue;
run;
proc print data=work.crops_sorted;
title "Crops Sorted by Total Revenue (Highest to Lowest)";
run;
OUTPUT:
| Obs | Crop_Name | Region | Season | Water_Requirement | Yield_Tons | Price_per_Ton | Total_Revenue | Yield_Category |
|---|---|---|---|---|---|---|---|---|
| 1 | Tea | North East India | Annual | High | 2500 | 160000 | 400,000,000 | LOW_YIELD |
| 2 | Sugarcane | South India | Annual | High | 9000 | 32000 | 288,000,000 | HIGH_YIELD |
| 3 | Turmeric | South India | Annual | Medium | 2900 | 85000 | 246,500,000 | LOW_YIELD |
| 4 | Basmati Rice | North India | Kharif | High | 4200 | 55000 | 231,000,000 | HIGH_YIELD |
| 5 | Groundnut | South India | Kharif | Low | 3300 | 48000 | 158,400,000 | MEDIUM_YIELD |
| 6 | Soybean | Central India | Kharif | Medium | 3400 | 42000 | 142,800,000 | MEDIUM_YIELD |
| 7 | Cotton | West India | Kharif | Low | 2800 | 45000 | 126,000,000 | LOW_YIELD |
| 8 | Wheat | Northwest India | Rabi | Medium | 3800 | 22000 | 83,600,000 | MEDIUM_YIELD |
| 9 | Maize | Central India | Kharif | Medium | 3600 | 20000 | 72,000,000 | MEDIUM_YIELD |
| 10 | Jowar | West India | Rabi | Low | 3000 | 18000 | 54,000,000 | MEDIUM_YIELD |
8. Generate a Boxplot of Yield by Season
proc sgplot data=work.crops_final;
vbox Yield_Tons / category=Season;
title "Distribution of Crop Yield Across Seasons";
run;
Comments
Post a Comment