282.100+ HISTORIC BUSINESSES WORLDWIDE DATASET | PROC CONTENTS | PROC PRINT | PROC FREQ | PROC MEANS | PROC SORT | PROC REPORT| PROC SQL |MACRO SHOW_TOP_INDUSTRIES | MACRO COUNTRY_REPORT
100+ HISTORIC BUSINESSES WORLDWIDE DATASET | PROC CONTENTS | PROC PRINT | PROC FREQ | PROC MEANS | PROC SORT | PROC REPORT| PROC SQL |MACRO SHOW_TOP_INDUSTRIES | MACRO COUNTRY_REPORT
/*CREATING A DATASET OF 100+ HISTORIC BUSINESSES */
1.Macro lists for generating plausible names, countries, industries and founders
%let prefixes = Abernathy Burling Cortes Hanover Silkwood Tudor Imperial Royal Eastern
Atlantic Pioneer Union Continental Heritage OldTown Windsor Blackwell
Greenfield Lancaster Brighton Franklin Sterling Mayfair Cambridge;
%let suffixes = %str(Trading Co) %str(Mercantile) %str(& Co.) %str(Foundry Works)
%str(Emporium) %str(& Sons) %str(& Partners) %str(Consortium)
%str(Guild) %str(Company) %str(& Company) %str(Manufactories)
%str(Shipping House) %str(Warehouses) %str(Baker & Co.) %str(Brewer & Co.);
%let countries = United_Kingdom United_States France Germany Italy Spain India China Japan
Netherlands Portugal Sweden Norway Denmark Belgium Switzerland Austria
Turkey Russia Brazil Mexico Australia Canada South_Africa Egypt Morocco
Saudi_Arabia UAE Singapore Hong_Kong Indonesia Malaysia Vietnam Thailand;
%let industries = Textiles Shipping Brewing Banking Metalworking Ceramics Agriculture
SpiceTrading Printing Publishing Glassmaking Carpentry Shipbuilding
SilkTrade TeaTrade CoffeeTrade Insurance Retailing Mining LeatherTanning
Jewelry Watchmaking Porcelain;
%let founders = Thomas_Abbott William_Burling Jose_Cortes Hans_Hanover Li_Silkwood Raj_Tudor
Alessandro_Imperial Maria_Royal Omar_Eastern John_Atlantic Pierre_Pioneer
Ana_Union Karl_Continental Isabel_Heritage Suresh_OldTown Henry_Windsor
Paul_Blackwell Grace_Greenfield Marcus_Lancaster Sophia_Brighton
Noah_Franklin Emily_Sterling;
2.Count words in macro lists
%let npref = %sysfunc(countw(&prefixes));
%let nsuf = %sysfunc(countw(&suffixes));
%let nctry = %sysfunc(countw(&countries));
%let nind = %sysfunc(countw(&industries));
%let nfound= %sysfunc(countw(&founders));
options nocenter;
data old_businesses;
length Business_ID 8 Name $60 Country $30 Founded_Year 8
Industry $30 Founder $30 Status $8;
call streaminit(12345);
do Business_ID = 1 to 110;
/* Pick prefix, suffix, country, industry, founder deterministically for coverage */
pref = scan("&prefixes", mod(Business_ID-1,&npref)+1, ' ');
suf = scan("&suffixes", mod(Business_ID-1,&nsuf)+1, ' ');
Name = catx(' ', pref, suf);
Country = tranwrd(scan("&countries", mod(Business_ID-1,&nctry)+1, ' '), '_', ' ');
Industry = scan("&industries", mod(Business_ID-1,&nind)+1, ' ');
Founder = tranwrd(scan("&founders", mod(Business_ID-1,&nfound)+1, ' '), '_', ' ');
/* Older businesses: random year between 1600 and 1950, with 10% modern */
if rand('uniform') < 0.10 then
Founded_Year = 1951 + floor(rand('uniform')*70);
else
Founded_Year = 1600 + floor(rand('uniform')*351);
/* Status: many old firms defunct, some still active */
if Founded_Year < 1850 then do;
Status = ifc(rand('uniform')<0.5,'Defunct','Active');
end;
else do;
Status = ifc(rand('uniform')<0.7,'Active','Defunct');
end;
output;
end;
drop pref suf;
run;
proc print data=old_businesses(obs=20);
title "Sample of Historic Businesses";
run;
Output:
| Obs | Business_ID | Name | Country | Founded_Year | Industry | Founder | Status |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Abernathy Trading | United Kingdom | 1948 | Textiles | Thomas Abbott | Active |
| 2 | 2 | Burling Co | United States | 1889 | Shipping | William Burling | Active |
| 3 | 3 | Cortes Mercantile | France | 1734 | Brewing | Jose Cortes | Defunct |
| 4 | 4 | Hanover & | Germany | 1781 | Banking | Hans Hanover | Active |
| 5 | 5 | Silkwood Co. | Italy | 1927 | Metalworking | Li Silkwood | Active |
| 6 | 6 | Tudor Foundry | Spain | 1635 | Ceramics | Raj Tudor | Active |
| 7 | 7 | Imperial Works | India | 1895 | Agriculture | Alessandro Imperial | Defunct |
| 8 | 8 | Royal Emporium | China | 1632 | SpiceTrading | Maria Royal | Active |
| 9 | 9 | Eastern & | Japan | 1845 | Printing | Omar Eastern | Active |
| 10 | 10 | Atlantic Sons | Netherlands | 1930 | Publishing | John Atlantic | Active |
| 11 | 11 | Pioneer & | Portugal | 1863 | Glassmaking | Pierre Pioneer | Defunct |
| 12 | 12 | Union Partners | Sweden | 1767 | Carpentry | Ana Union | Active |
| 13 | 13 | Continental Consortium | Norway | 1895 | Shipbuilding | Karl Continental | Defunct |
| 14 | 14 | Heritage Guild | Denmark | 1885 | SilkTrade | Isabel Heritage | Active |
| 15 | 15 | OldTown Company | Belgium | 1738 | TeaTrade | Suresh OldTown | Active |
| 16 | 16 | Windsor & | Switzerland | 1850 | CoffeeTrade | Henry Windsor | Active |
| 17 | 17 | Blackwell Company | Austria | 1724 | Insurance | Paul Blackwell | Defunct |
| 18 | 18 | Greenfield Manufactories | Turkey | 1798 | Retailing | Grace Greenfield | Defunct |
| 19 | 19 | Lancaster Shipping | Russia | 1989 | Mining | Marcus Lancaster | Defunct |
| 20 | 20 | Brighton House | Brazil | 1952 | LeatherTanning | Sophia Brighton | Defunct |
3.PROC CONTENTS -
purpose: show dataset structure and variable attributes in one listing
proc contents data=old_businesses position;
run;
Output:
The CONTENTS Procedure
| Data Set Name | WORK.OLD_BUSINESSES | Observations | 110 |
|---|---|---|---|
| Member Type | DATA | Variables | 7 |
| Engine | V9 | Indexes | 0 |
| Created | 09/07/2025 18:22:48 | Observation Length | 176 |
| Last Modified | 09/07/2025 18:22:48 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
| Encoding | utf-8 Unicode (UTF-8) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 743 |
| Obs in First Data Page | 110 |
| Number of Data Set Repairs | 0 |
| Filename | /saswork/SAS_work662C0001A968_odaws02-apse1-2.oda.sas.com/SAS_work88120001A968_odaws02-apse1-2.oda.sas.com/old_businesses.sas7bdat |
| Release Created | 9.0401M8 |
| Host Created | Linux |
| Inode Number | 1186018 |
| Access Permission | rw-r--r-- |
| Owner Name | u63247146 |
| File Size | 256KB |
| File Size (bytes) | 262144 |
| Alphabetic List of Variables and Attributes | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 1 | Business_ID | Num | 8 |
| 3 | Country | Char | 30 |
| 4 | Founded_Year | Num | 8 |
| 6 | Founder | Char | 30 |
| 5 | Industry | Char | 30 |
| 2 | Name | Char | 60 |
| 7 | Status | Char | 8 |
| Variables in Creation Order | |||
|---|---|---|---|
| # | Variable | Type | Len |
| 1 | Business_ID | Num | 8 |
| 2 | Name | Char | 60 |
| 3 | Country | Char | 30 |
| 4 | Founded_Year | Num | 8 |
| 5 | Industry | Char | 30 |
| 6 | Founder | Char | 30 |
| 7 | Status | Char | 8 |
4.PROC PRINT -
purpose: print first 15 observations for a quick visual check
proc print data=old_businesses(obs=15) label;
title 'First 15 Historic Businesses - Quick Look';
run;
Output:
| Obs | Business_ID | Name | Country | Founded_Year | Industry | Founder | Status |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Abernathy Trading | United Kingdom | 1948 | Textiles | Thomas Abbott | Active |
| 2 | 2 | Burling Co | United States | 1889 | Shipping | William Burling | Active |
| 3 | 3 | Cortes Mercantile | France | 1734 | Brewing | Jose Cortes | Defunct |
| 4 | 4 | Hanover & | Germany | 1781 | Banking | Hans Hanover | Active |
| 5 | 5 | Silkwood Co. | Italy | 1927 | Metalworking | Li Silkwood | Active |
| 6 | 6 | Tudor Foundry | Spain | 1635 | Ceramics | Raj Tudor | Active |
| 7 | 7 | Imperial Works | India | 1895 | Agriculture | Alessandro Imperial | Defunct |
| 8 | 8 | Royal Emporium | China | 1632 | SpiceTrading | Maria Royal | Active |
| 9 | 9 | Eastern & | Japan | 1845 | Printing | Omar Eastern | Active |
| 10 | 10 | Atlantic Sons | Netherlands | 1930 | Publishing | John Atlantic | Active |
| 11 | 11 | Pioneer & | Portugal | 1863 | Glassmaking | Pierre Pioneer | Defunct |
| 12 | 12 | Union Partners | Sweden | 1767 | Carpentry | Ana Union | Active |
| 13 | 13 | Continental Consortium | Norway | 1895 | Shipbuilding | Karl Continental | Defunct |
| 14 | 14 | Heritage Guild | Denmark | 1885 | SilkTrade | Isabel Heritage | Active |
| 15 | 15 | OldTown Company | Belgium | 1738 | TeaTrade | Suresh OldTown | Active |
5.PROC FREQ -
purpose: frequency distribution for categorical variables
proc freq data=old_businesses;
tables Country Industry Status / nocum nopercent;
run;
Output:
The FREQ Procedure
| Country | Frequency |
|---|---|
| Australia | 3 |
| Austria | 3 |
| Belgium | 3 |
| Brazil | 3 |
| Canada | 3 |
| China | 4 |
| Denmark | 3 |
| Egypt | 3 |
| France | 4 |
| Germany | 4 |
| Hong Kong | 3 |
| India | 4 |
| Indonesia | 3 |
| Italy | 4 |
| Japan | 3 |
| Malaysia | 3 |
| Mexico | 3 |
| Morocco | 3 |
| Netherlands | 3 |
| Norway | 3 |
| Portugal | 3 |
| Russia | 3 |
| Saudi Arabia | 3 |
| Singapore | 3 |
| South Africa | 3 |
| Spain | 4 |
| Sweden | 3 |
| Switzerland | 3 |
| Thailand | 3 |
| Turkey | 3 |
| UAE | 3 |
| United Kingdom | 4 |
| United States | 4 |
| Vietnam | 3 |
| Industry | Frequency |
|---|---|
| Agriculture | 5 |
| Banking | 5 |
| Brewing | 5 |
| Carpentry | 5 |
| Ceramics | 5 |
| CoffeeTrade | 5 |
| Glassmaking | 5 |
| Insurance | 5 |
| Jewelry | 4 |
| LeatherTanning | 4 |
| Metalworking | 5 |
| Mining | 4 |
| Porcelain | 4 |
| Printing | 5 |
| Publishing | 5 |
| Retailing | 5 |
| Shipbuilding | 5 |
| Shipping | 5 |
| SilkTrade | 5 |
| SpiceTrading | 5 |
| TeaTrade | 5 |
| Textiles | 5 |
| Watchmaking | 4 |
| Status | Frequency |
|---|---|
| Active | 68 |
| Defunct | 42 |
6.PROC MEANS -
purpose: basic numeric summary of Founded_Year
proc means data=old_businesses min mean median max std n;
var Founded_Year;
run;
Output:
The MEANS Procedure
| Analysis Variable : Founded_Year | |||||
|---|---|---|---|---|---|
| Minimum | Mean | Median | Maximum | Std Dev | N |
| 1601.00 | 1809.61 | 1824.00 | 2012.00 | 105.9860728 | 110 |
7.PROC SORT + PROC REPORT -
purpose: sorted grouped report showing counts by country and earliest founded year
proc sort data=old_businesses out=obs_sorted; by Country Founded_Year;
run;
proc print data=obs_sorted(obs=20);
run;
Output:
| Obs | Business_ID | Name | Country | Founded_Year | Industry | Founder | Status |
|---|---|---|---|---|---|---|---|
| 1 | 90 | Greenfield Foundry | Australia | 1668 | Jewelry | William Burling | Active |
| 2 | 22 | Sterling Trading | Australia | 1840 | Watchmaking | Emily Sterling | Defunct |
| 3 | 56 | Royal Guild | Australia | 1887 | Publishing | Ana Union | Defunct |
| 4 | 17 | Blackwell Company | Austria | 1724 | Insurance | Paul Blackwell | Defunct |
| 5 | 85 | Continental Trading | Austria | 1772 | CoffeeTrade | Marcus Lancaster | Active |
| 6 | 51 | Cortes & | Austria | 1887 | Metalworking | Alessandro Imperial | Active |
| 7 | 15 | OldTown Company | Belgium | 1738 | TeaTrade | Suresh OldTown | Active |
| 8 | 49 | Abernathy Works | Belgium | 1757 | Brewing | Li Silkwood | Defunct |
| 9 | 83 | Pioneer House | Belgium | 1803 | SilkTrade | Paul Blackwell | Defunct |
| 10 | 88 | Windsor & | Brazil | 1635 | Mining | Emily Sterling | Active |
| 11 | 54 | Tudor Partners | Brazil | 1862 | SpiceTrading | John Atlantic | Defunct |
| 12 | 20 | Brighton House | Brazil | 1952 | LeatherTanning | Sophia Brighton | Defunct |
| 13 | 91 | Lancaster Works | Canada | 1674 | Watchmaking | Jose Cortes | Active |
| 14 | 23 | Mayfair Co | Canada | 1680 | Porcelain | Thomas Abbott | Defunct |
| 15 | 57 | Eastern Company | Canada | 1864 | Glassmaking | Karl Continental | Active |
| 16 | 8 | Royal Emporium | China | 1632 | SpiceTrading | Maria Royal | Active |
| 17 | 110 | Heritage Co. | China | 1739 | Retailing | Emily Sterling | Active |
| 18 | 42 | Greenfield Warehouses | China | 1881 | Mining | Sophia Brighton | Active |
| 19 | 76 | Hanover Consortium | China | 1936 | Agriculture | John Atlantic | Active |
| 20 | 14 | Heritage Guild | Denmark | 1885 | SilkTrade | Isabel Heritage | Active |
proc report data=obs_sorted nowd;
column Country n Founded_Year=MinYear Founded_Year=MaxYear;
define Country / group;
define n / 'Count';
define MinYear / min 'Earliest Founded Year';
define MaxYear / max 'Latest Founded Year';
run;
Output:
| Country | Count | Earliest Founded Year | Latest Founded Year |
|---|---|---|---|
| Australia | 3 | 1668 | 1887 |
| Austria | 3 | 1724 | 1887 |
| Belgium | 3 | 1738 | 1803 |
| Brazil | 3 | 1635 | 1952 |
| Canada | 3 | 1674 | 1864 |
| China | 4 | 1632 | 1936 |
| Denmark | 3 | 1885 | 1953 |
| Egypt | 3 | 1656 | 1983 |
| France | 4 | 1734 | 1951 |
| Germany | 4 | 1749 | 1962 |
| Hong Kong | 3 | 1601 | 1886 |
| India | 4 | 1787 | 1895 |
| Indonesia | 3 | 1659 | 1937 |
| Italy | 4 | 1768 | 1927 |
| Japan | 3 | 1642 | 1920 |
| Malaysia | 3 | 1703 | 1898 |
| Mexico | 3 | 1672 | 1844 |
| Morocco | 3 | 1701 | 1837 |
| Netherlands | 3 | 1700 | 1930 |
| Norway | 3 | 1726 | 1895 |
| Portugal | 3 | 1722 | 1863 |
| Russia | 3 | 1663 | 1989 |
| Saudi Arabia | 3 | 1639 | 1699 |
| Singapore | 3 | 1805 | 2012 |
| South Africa | 3 | 1841 | 1889 |
| Spain | 4 | 1621 | 1707 |
| Sweden | 3 | 1767 | 1849 |
| Switzerland | 3 | 1811 | 1850 |
| Thailand | 3 | 1900 | 1901 |
| Turkey | 3 | 1694 | 1798 |
| UAE | 3 | 1714 | 1940 |
| United Kingdom | 4 | 1745 | 1979 |
| United States | 4 | 1660 | 1956 |
| Vietnam | 3 | 1692 | 1983 |
8.PROC SQL -
purpose: create summary tables and an example join / aggregation
proc sql;
create table industry_summary as
select Industry,
count(*) as N_Businesses,
min(Founded_Year) as Earliest format=4.,
max(Founded_Year) as Latest format=4.
from old_businesses
group by Industry
order by N_Businesses desc;
quit;
proc print data=industry_summary(obs=20);run;
Output:
| Obs | Industry | N_Businesses | Earliest | Latest |
|---|---|---|---|---|
| 1 | CoffeeTrade | 5 | 1621 | 1876 |
| 2 | SpiceTrading | 5 | 1632 | 1920 |
| 3 | TeaTrade | 5 | 1699 | 1863 |
| 4 | Printing | 5 | 1692 | 1898 |
| 5 | Shipping | 5 | 1770 | 1953 |
| 6 | Ceramics | 5 | 1635 | 1938 |
| 7 | Textiles | 5 | 1740 | 1983 |
| 8 | Metalworking | 5 | 1699 | 2012 |
| 9 | Brewing | 5 | 1639 | 1950 |
| 10 | Glassmaking | 5 | 1745 | 1900 |
| 11 | Publishing | 5 | 1722 | 1983 |
| 12 | Shipbuilding | 5 | 1656 | 1951 |
| 13 | Carpentry | 5 | 1726 | 1896 |
| 14 | SilkTrade | 5 | 1701 | 1962 |
| 15 | Agriculture | 5 | 1676 | 1936 |
| 16 | Banking | 5 | 1650 | 1940 |
| 17 | Insurance | 5 | 1694 | 1805 |
| 18 | Retailing | 5 | 1601 | 1850 |
| 19 | Porcelain | 4 | 1680 | 1979 |
| 20 | Mining | 4 | 1635 | 1989 |
9.Macro to print top N industries with a custom header
%macro show_top_industries(n=5);
%put NOTE: Showing top &n industries by business count;
proc print data=industry_summary(obs=&n) noobs; title "Top &n Industries"; run;
%mend show_top_industries;
%show_top_industries(n=8)
Output:
| Industry | N_Businesses | Earliest | Latest |
|---|---|---|---|
| CoffeeTrade | 5 | 1621 | 1876 |
| SpiceTrading | 5 | 1632 | 1920 |
| TeaTrade | 5 | 1699 | 1863 |
| Printing | 5 | 1692 | 1898 |
| Shipping | 5 | 1770 | 1953 |
| Ceramics | 5 | 1635 | 1938 |
| Textiles | 5 | 1740 | 1983 |
| Metalworking | 5 | 1699 | 2012 |
10.Macro to create a country-specific dataset and basic stats
%macro country_report(country=United States);
%let cname = %sysfunc(tranwrd(&country, %str( ), _));
proc sql;
create table &cname._biz as
select * from old_businesses
where Country = "%sysfunc(tranwrd(&country,_ , ))";
quit;
proc means data=&cname._biz min mean median max n;
var Founded_Year;
title "Summary for &country";
run;
%mend country_report;
%country_report(country=United States)
Output:
The MEANS Procedure
| Analysis Variable : Founded_Year | ||||
|---|---|---|---|---|
| Minimum | Mean | Median | Maximum | N |
| 1660.00 | 1810.00 | 1812.00 | 1956.00 | 4 |
11.Validation / QC: check for duplicates on Name + Founded_Year
proc sort data=old_businesses nodupkey dupout=dups;
by Name Founded_Year;
run;
proc print data=dups;
title 'Duplicate Name-Founded_Year combinations (if any)';
run;
Log:
NOTE: 0 observations with duplicate key values were deleted.
NOTE: No observations in data set WORK.DUPS.
Comments
Post a Comment