RPostgreSQL (R) versus psycopg (Python)

Here’s a rather simplistic comparison of the performance of R and Python in getting data from PostgreSQL.


  drv <- dbDriver("PostgreSQL")
  pg <- dbConnect(drv, dbname="crsp") 

  # Get 1,000,000 records from the CRSP daily stock file
  df <- dbGetQuery(pg, 'SELECT * FROM crsp.dsf LIMIT 1000000')

  # Print the PERMNOs for the first ten records
  print(df[1:10, "permno"])

Time taken: 26-27 seconds.


import psycopg2 as pg
import pandas as pd
from pandas.io.sql import frame_query

conn = pg.connect(dbname='crsp')
df = frame_query('SELECT * FROM crsp.dsf LIMIT 1000000', con=conn)


Time taken: Around 6 seconds. I’m impressed. I’ve never really been bothered by performance in getting data into R (perhaps because, oftentimes, I will process data in PostgreSQL and just bring a small dataset into R). But it seems that Python (and the packages used above) do a nice job here.

This entry was posted in PostgreSQL, R and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s