PL/R is module that allows PostgreSQL to access R functionality.
Here is an illustration of a feature recently added to PL/R. This is meant to represent a “panel” of ten-year time series for 100 firms numbered 1 through 100.
DROP TABLE IF EXISTS test_data; CREATE TABLE test_data ( fyear integer, firm float8, eps float8 ); INSERT INTO test_data SELECT (b.f + 1) % 10 + 2000 AS fyear, floor((b.f+1)/10) + 50 AS firm, f::float8/100 + random()/10 AS eps FROM generate_series(-500,499,1) b(f);
I then run the following queries. The SQL version takes 115ms on my
hardware, the PL/R version 1528ms (this is pretty unscientific, as I don’t
require 9 years of data for the SQL version); so there’s a potential tradeoff between performance and the flexibility of R.
The good news is that most of the 1528ms is R cranking away on the regression; if I replace the $BODY$ of
the function with “return(1)”, the time comes down to 97ms for the PL/R
version. So standard approaches to speeding up on the R side would seem
to apply. (I used “float8” for the types, but didn’t notice a performance
CREATE OR REPLACE FUNCTION r_regr_slope(float8, float8) RETURNS float8 AS $BODY$ slope <- NA y <- farg1 x <- farg2 if (fnumrows==9) try (slope <- lm(y ~ x)$coefficients) return(slope) $BODY$ LANGUAGE plr WINDOW; SELECT *, regr_slope(eps, lag_eps) OVER w AS slope FROM (SELECT firm, fyear, eps, lag(eps) OVER (ORDER BY firm, fyear) AS lag_eps FROM test_data) AS a WHERE eps IS NOT NULL WINDOW w AS (ORDER BY firm, fyear ROWS 8 PRECEDING); SELECT *, r_regr_slope(eps, lag_eps) OVER w AS slope_R FROM (SELECT firm, fyear, eps, lag(eps) OVER (ORDER BY firm, fyear) AS lag_eps FROM test_data) AS a WHERE eps IS NOT NULL WINDOW w AS (ORDER BY firm, fyear ROWS 8 PRECEDING);