Remote WRDS SAS from R

Here’s an illustration of how once can get something like PC Connect SAS from R. The following code takes a ticker and a date and then pulls TAQ consolidated trade and quote data from WRDS. Note that this is probably not the most compelling application of this approach (it’s probably more efficient to do more processing in SAS before shunting data back to R), but this is easily adapted for other purposes.

date <- as.Date("2012-01-06")
ticker <- "GM"

generate_sas_code <- function(ticker, date, tq="t") {
    # This function generates SAS code that will be piped into SAS
    paste("
    PROC SQL;
        CREATE TABLE temp AS
        SELECT * FROM taq.c", tq, "_", format(date, "%Y%m%d"),
        " WHERE symbol =\"", ticker, "\";
    quit;

    proc export data=temp
        outfile=stdout
        dbms=csv;
    run;", sep="")
}

get_data <- function(ticker, date, tq="t") {
    sas_code <- generate_sas_code(ticker, date, tq)
    temp_file <- tempfile()
    # This command calls SAS on the remote server.
    # -C means "compress output" ... this seems to have an impact even though we're
    # using gzip for compression of the CSV file spat out by SAS after it's
    # been transferred to the local computer (trial and error suggested this was
    # the most efficient approach).
    # -stdio means that SAS will take input from STDIN and output to STDOUT
    sas_command <- paste("ssh -C iangow@wrds.wharton.upenn.edu ",
                         "'sas -stdio -noterminal' | gzip > ",
                         temp_file)
    
    # The following pipes the SAS code to the SAS command. The "intern=TRUE"
    # means that we can capture the output in an R variable.
    system(paste("echo '", sas_code, "' |", sas_command), intern=TRUE)
    read.csv(temp_file, as.is=TRUE)
}

# Now get the data from WRDS
system.time(trades <- get_data(ticker, date, tq="t"))
system.time(quotes <- get_data(ticker, date, tq="q"))

Note that to get this working, I needed to get public-key authentication going on WRDS/my computer. Following hints taken from here, I set up a public key. I then copied that key to the WRDS server from the terminal on my computer. (Note that this code assumes you have a directory “.ssh” in your home directory. If not, log into WRDS via SSH, then type “mkdir ~/.ssh” to create this.)

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh iangow@wrds.wharton.upenn.edu "cat >> ~/.ssh/authorized_keys"
Advertisements
This entry was posted in R, SAS and tagged , . Bookmark the permalink.

19 Responses to Remote WRDS SAS from R

  1. Hess says:

    Does this approach require a SAS license?

    • iangow says:

      It requires access to WRDS, which gives access to SAS. I think there’s no easy way to access native SAS data without SAS. StatTransfer is the only way that I know of.

  2. Wayne Smith says:

    Or export it from SAS (actually, the web interface in WRDS does this automagically) and use the “foreign” package in R to read the sas7bdat file. In general, I try to avoid the use of SAS to avoid the dependency, but sometimes (argh!) it can’t be worked-around.

    • iangow says:

      I guess WRDS is using a SAS program behind the scenes. I find going via the web interface far too manual. I have scripts with names like “update_comp.sh” that are far simpler for handling updates of my database.

  3. Wayne Smith says:

    Actually, I forgot a step–the foreign package in R only reads the XPORT format, not the SAS7BDAT format directly. On Windows, one can use the (free as in beer) “dsread” program from the folks at:

    http://www.oview.co.uk/dsread/

    to convert the sas7bdat file to a CSV file–and then it can be read into R with read.csv.

    • iangow says:

      I don’t use Windows and don’t have a copy of SAS. I use StatTransfer to convert some files to CSV that don’t work using PROC EXPORT due to special missing values (PROC EXPORT converts these to text, confounding my import-to-PostgreSQL script).

  4. Wayne Smith says:

    I’ll bet “dsread” runs on Wine (perhaps I’ll test that on Ubuntu tonight). It’s not optimal, but perhaps it’s serviceable. Related topic; I’m surprised that WRDS doesn’t have R available in the WRDS cloud yet. i nice python/pandas and Java (machine learning) stack would be useful too (much more than SAS, IMHO).

    • iangow says:

      Python is available in the WRDS cloud, though I’ve only used it a little for scripting SAS (so no idea about pandas … though I suspect not).

      The approach here works for most WRDS SAS data sets (e.g., I have scripts that allow me to update my version of CRSP or Compustat with a single command).

      From PostgreSQL data are easily accessed from R, Perl, and Python (see here for a simple example).

      Another approach that might avoid the need for other software might be to export SAS to Stata directly. Though be careful with dates with this approach (they may be converted to integers and I can’t recall whether they are in SAS (origin=1960-01-01) or Stata (origin=1970-01-01) format. Actually, this seems to work; see here.

  5. Pingback: Remote WRDS SAS from R via Stata .dta format | iangow

  6. jiasunli says:

    Great post! One more question (maybe a stupid one): how do you convert an absolute path to an SQL path? To be specific, in “SELECT * FROM taq.c”, tq, “_”, format(date, “%Y%m%d”),..”, how do you translate the absolute path on WRDS “/wrds/taq/sasdata/ct_20120101.sas7bdat@” into “taq.ct_20120101” ? If instead now I want to get “/wrds/nyse/sasdata/wrds_taqs_nbbo/nbbo_20120101.sas7bdat ” from the wrds-cloud server, how shall I translate the absolute path into SQL? I tried “SELECT * FROM nyse.sasdata.wrds_taqs_nbbo.nbbo_20120101;” but it gives an empty file. Thanks!

  7. iangow says:

    When I run SAS on WRDS, it creates library names for me and I use those. For example, from the SAS log:

    NOTE: Libref TAQ was successfully assigned as follows:
    Engine: V9
    Physical Name: /wrds/taq/sasdata

    If SAS doesn’t create a LIBNAME for the NBBO data, then you may have to tweak the code to do so. Something like

    libname nbbo '/wrds/nyse/sasdata/wrds_taqs_nbbo/';

    You could perhaps add this to autoexec.sas in your home directory on WRDS.

    • Jiasun Li says:

      thank you very much. this helps a lot.
      a separate point just FYI: although this calling-sas-from-r script is a perfect tool for remote accessing wrds server. wrds-cloud server does not accept it. we have to write everything entirely in sas if we would like to use the wrds-cloud server. also seems for the r-script more than half of the execution time was for the autoexec.sas (of course we can comment out most irrelevant parts in autoexec.sas to make the whole execution faster).

      • iangow says:

        You can easily modify the script to run on the wrds-cloud server. For example, this works fine:
        echo 'proc contents data=crsp.msf' | ssh -C iangow@wrds-cloud.wharton.upenn.edu 'qsas -stdio -noterminal'.

      • Jiasun Li says:

        Thank you so much! Just tried on WRDS-cloud. It works very well!

  8. Jiasun Li says:

    btw just FYI Joel Hasbrouck’s note “The Best Bid and Offer: A Short Note on Programs and Practices” recommends the usage of “set by” against “PROC SQL” to preserve intra-second data order. Depending on the research question this may or may not matter.

  9. Damien Benacon says:

    iangow- somehow the rsa part does not work for me. I did “ssh-keygen -t rsa” in the wrds terminal and “cat ~/.ssh/id_rsa.pub | ssh username@wrds.wharton.upenn.edu “cat >> ~/.ssh/authorized_keys”” afterwards. Nonetheless I have to enter my password when running the R code locally.

  10. shash says:

    Hi! Thanks for posting this. I used to run access words using R without any issues before. However, on my new macbook, I keep getting the message
    ssh_askpass: exec(/usr/libexec/ssh-askpass): No such file or directory
    Permission denied (gssapi-keyex,gssapi-with-mic,publickey,keyboard-interactive).

    Please let me know if you have any suggestions. Thanks a lot!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s