For one reason or another, last night I was partially replicating a paper by Beaver, McNichols, and Nelson (2007). I pulled together some code in R that draws on my PostgreSQL database, produces some summary statistics like those in the paper, produces a histogram and does a little statistical inference. Total runtime: around 9 seconds. Pretty fast.
Anyway, I discovered this morning that there is already SAS code out there to do much of this using WRDS. Just for comparison, I pulled out just part of the data step and got it running on the WRDS server. Run time: 23 minutes. Big difference.
Then I noticed more straightforward code to replicate Richardson, Sloan, Soliman, and Tuna (2005). I ran this code on WRDS:
proc sql; create table comp1 as select gvkey, datadate, ACT, CHE, LCT, DLC, AT, IVAO, LT, DLTT, IVST, DLC, PSTK, OIADP from comp.funda where INDFMT='INDL'and DATAFMT='STD'and POPSRC='D' and CONSOL='C' and 1962 <= year(datadate) <= 2001 and missing(AT)=0 and missing(CHE)=0 and missing(ACT)=0 and missing(LCT)=0 and missing(SALE)=0 and missing(LT)=0 and missing(OIADP)=0 and prcc_f > 1; quit;
NOTE: PROCEDURE SQL used (Total process time): real time 33.87 seconds cpu time 10.05 seconds
Now, a little tweaking of the SQL gives me code I can run in PostgreSQL:
CREATE TABLE comp1 AS SELECT gvkey, datadate, act, che, lct, at, ivao, lt, dltt, ivst, dlc, pstk, oiadp FROM comp.funda WHERE indfmt='INDL' AND datafmt='STD' AND popsrc='D' and consol='C' AND EXTRACT(YEAR FROM datadate) BETWEEN 1962 AND 2001 AND at IS NOT NULL and che IS NOT NULL AND act IS NOT NULL AND LCT IS NOT NULL AND sale IS NOT NULL and lt IS NOT NULL AND oiadp IS NOT NULL AND prcc_f > 1;
Runtime data: 2.27 seconds. I also did a version that used the RPostgreSQL package to pull the data into R:
> system.time(source("~/Dropbox/rsst.R")) user system elapsed 0.940 0.120 3.242
Not as big of a difference (a mere ten-or-so times), but still pretty impressive.