Alternative dbWriteTable function for RPostgreSQL

The following function wraps the dbWriteTable function from RPostgreSQL to handle data types a little better (e.g., it keeps Dates as Dates) and to fix some gremlins with backslashes and double-quotes. It also gives the option to convert column names to lower case (avoiding the need to quote them in SQL queries when the names are mixed case) and to convert what would otherwise be TEXT fields to VARCHAR.

UPDATE (2011-05-12): Note that this code is very slow. More recent versions of RPostgreSQL under development handle several data types better, though the backslash issue persists. For the cases where I have backslashes in my data, I use a direct call to psql from R. It is a bit of a kludge and has a number of limitations, but it gets the data into PostgreSQL.
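
For what it is worth, that kludge amounts to something like the sketch below. The table name my_table, the database mydb, and the file path are placeholders, and it assumes psql is on the PATH and that the target table already exists with matching columns:

# Hypothetical sketch of the psql kludge: dump the data frame to CSV and load
# it with \copy. In CSV mode COPY does not treat backslashes as escapes, so
# they pass through untouched. NAs are written as empty fields, which CSV
# COPY reads as NULL.
write.csv(mydata, file = "/tmp/mydata.csv", row.names = FALSE, na = "")
system(paste("psql -d mydb -c",
             shQuote("\\copy my_table FROM '/tmp/mydata.csv' WITH CSV HEADER")))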

dbWriteTable.plus <- function(conn, name, value, row.names = TRUE, 
                              text.to.char=TRUE, fix.names=TRUE,
                              ..., overwrite = FALSE, append = FALSE, 
                              temp.table=FALSE) {  

  # Fix backslashes and double-quotes
  temp <- as.data.frame(sapply(value,gsub,
    pattern="\\\\",replacement="\\\\\\\\",perl=TRUE),stringsAsFactors=FALSE)
  temp <- as.data.frame(sapply(temp,gsub,
    pattern="\\\"",replacement="\\\\\"",perl=TRUE),stringsAsFactors=FALSE)
  
  names(value) <- names(temp) <- make.db.names(conn,names(value))

  # If asked to do so in the call, make names lower case to avoid
  # need to quote variable names in SQL queries if mixed case is present.
  if (fix.names) {
    names(value) <- names(temp) <- gsub("(.*)","\\L\\1",names(temp), perl=TRUE)
  }
  
  # Call the underlying RPostgreSQL function
  rs <- dbWriteTable(conn, name, value=temp, row.names = row.names,
              overwrite=overwrite, append=append, temp.table=temp.table)
  
  # Convert fields to appropriate types. This is necessary because the
  # gsub() calls above converted everything to characters. But the following
  # also retains Dates, which the basic function does not do, and allows
  # character vectors to be converted to VARCHAR() in PostgreSQL.
  # An alternative would seem to be to cast the elements of the data.frame
  # back to the appropriate types before sending to PostgreSQL.
  for (i in names(value) ) {
    # Text to VARCHAR (unless asked not to do so)
    if (is.character(value[,i]) && text.to.char) {
      sql <- paste("ALTER TABLE ",name," ALTER COLUMN \"",i,
                   "\" TYPE varchar(", 
                   max(nchar(encodeString(value[,i]))),")",sep="")
      dbGetQuery(conn,sql)

    } else if (is.integer(value[,i])) {
      # Integers
      sql <- paste("ALTER TABLE ",name," ALTER COLUMN \"",i,"\" TYPE bigint ",
        "USING CAST(\"",i,"\" AS bigint)",sep="")
      dbGetQuery(conn,sql)

    } else if (class(value[,i])=="Date") {
      # Dates. Needs to be done before conversion of doubles (below)
      sql <- paste("ALTER TABLE ",name," ALTER COLUMN \"",i,"\" TYPE date ",
        "USING to_date(CAST(\"",i,"\" AS character(10)), 'YYYY-MM-DD')",sep="")
      dbGetQuery(conn,sql)

    } else if(is.double(value[,i])) {
      # Doubles
      sql <- paste("ALTER TABLE ",name," ALTER COLUMN \"",i,
        "\" TYPE double precision",
        " USING CAST(\"",i,"\" AS  double precision)",sep="")
      dbGetQuery(conn,sql)
    }  
  }  
  
  # Vacuum the table (the need seems to arise from the column type changes above)
  dbGetQuery(conn,paste("VACUUM",name))

  # Return the value returned by dbWriteTable
  return(rs)
}
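
As a quick usage sketch (the connection details and the data below are made up for illustration; adjust them for your own setup):

# Hypothetical example: a data frame with an integer, a Date, and a character
# column containing a backslash and a double-quote.
library(RPostgreSQL)
drv  <- dbDriver("PostgreSQL")
conn <- dbConnect(drv, dbname = "mydb", user = "me")

df <- data.frame(ID   = 1:3,
                 When = as.Date(c("2011-01-01", "2011-02-01", "2011-03-01")),
                 Note = c("plain", "back\\slash", "double\"quote"),
                 stringsAsFactors = FALSE)

# Column names come out lower case, When stays a date, and Note becomes varchar.
dbWriteTable.plus(conn, "my_table", df, row.names = FALSE, overwrite = TRUE)
dbDisconnect(conn)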