RPostgreSQL (R) versus psycopg (Python)

Here’s a rather simplistic comparison of the performance of R and Python in getting data from PostgreSQL.

R:

system.time({
  require(RPostgreSQL)
  drv <- dbDriver("PostgreSQL")
  pg <- dbConnect(drv, dbname="crsp") 

  # Get 1,000,000 records from the CRSP daily stock file
  df <- dbGetQuery(pg, 'SELECT * FROM crsp.dsf LIMIT 1000000')

  # Print the PERMNOs for the first ten records
  print(df[1:10, "permno"])
})

Time taken: 26-27 seconds.

Python:

import psycopg2 as pg
import pandas as pd
from pandas.io.sql import frame_query

conn = pg.connect(dbname='crsp')
df = frame_query('SELECT * FROM crsp.dsf LIMIT 1000000', con=conn)

print(df['permno'][:10])

Time taken: Around 6 seconds. I’m impressed. I’ve never really been bothered by performance in getting data into R (perhaps because, oftentimes, I will process data in PostgreSQL and just bring a small dataset into R). But it seems that Python (and the packages used above) do a nice job here.

Advertisements
This entry was posted in PostgreSQL, R and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s