IV regression and two-way cluster-robust standard errors

As a follow-up to an earlier post, I was pleasantly surprised to discover that the code to handle two-way cluster-robust standard errors in R that I blogged about earlier worked out of the box with the IV regression routine available in the AER package (ivreg).

# Tests of R function using Mitchell Petersen's test-data

# Read the data             
test <- read.table(
      url(paste("http://www.kellogg.northwestern.edu/",
            "faculty/petersen/htm/papers/se/",
            "test_data.txt",sep="")),
    col.names=c("firmid", "year", "x", "y"))

# Create a silly instrument (x plus noise)
test <- within(test, z <- x + rnorm(length(x), mean=0, sd=sd(x)/3))

# The fitted model
library(AER)
fm <- ivreg(y ~ x | z, data=test)

# Tests
library(sandwich); library(lmtest)
coeftest(fm)                                    # OLS
coeftest(fm, vcov=vcovHC(fm, type="HC0"))       # White
source("http://iangow.me/~igow/code/cluster2.R")
coeftest.cluster(test,fm, cluster1="firmid")    # Clustered by firm
coeftest.cluster(test,fm, cluster1="year")      # Clustered by year
coeftest.cluster(test,fm, cluster1="firmid",
                          cluster2="year")      # Clustered by firm and year

# Save the data to a Stata file for comparison
library(foreign)
write.dta(dataframe=test, file="~/Dropbox/WRDS/petersen.dta")

Here’s the output

> test <- within(test, z <- x + rnorm(length(x), mean=0, sd=sd(x)/3))
Error in within(test, z <- x + rnorm(length(x), mean = 0, sd = sd(x)/3)) : 
  object 'test' not found
> test <- read.table(
+       url(paste("http://www.kellogg.northwestern.edu/",
+             "faculty/petersen/htm/papers/se/",
+             "test_data.txt",sep="")),
+     col.names=c("firmid", "year", "x", "y"))
> 
> test <- within(test, z <- x + rnorm(length(x), mean=0, sd=sd(x)/3))
> fm <- ivreg(y ~ x | z, data=test)
Error: could not find function "ivreg"
> library(AER)
Loading required package: car
Loading required package: MASS
Loading required package: nnet
Loading required package: survival
Loading required package: splines
Loading required package: Formula
Loading required package: lmtest
Loading required package: zoo

Attaching package: ‘zoo’

The following object(s) are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: strucchange
> fm <- ivreg(y ~ x | z, data=test)
> library(sandwich); library(lmtest)
> coeftest(fm)                                    # OLS

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029685   0.028359  1.0468   0.2953    
x           1.033802   0.030142 34.2982   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fm, vcov=vcovHC(fm, type="HC0"))       # White

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029685   0.028354  1.0469   0.2952    
x           1.033802   0.029851 34.6326   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest.cluster(test,fm, cluster1="firmid")    # Clustered by firm
Error: could not find function "coeftest.cluster"
> coeftest.cluster(test,fm, cluster1="year")      # Clustered by year
Error: could not find function "coeftest.cluster"
> source("~/Dropbox/AGL/Code/R/cluster2.R")
> coeftest.cluster(test,fm, cluster1="firmid")    # Clustered by firm

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029685   0.067011   0.443   0.6578    
x           1.033802   0.051423  20.104   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest.cluster(test,fm, cluster1="year")      # Clustered by year

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029685   0.023395  1.2688   0.2046    
x           1.033802   0.036246 28.5219   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest.cluster(test,fm, cluster1="firmid",
+                           cluster2="year")      # Clustered by firm and year

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029685   0.065065  0.4562   0.6482    
x           1.033802   0.055377 18.6684   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> library(foreign)
> write.dta(dataframe=test, file="~/Dropbox/WRDS/petersen.dta")

Now, compare with Stata. First, I use the in-built ivreg routine, then I consider the ivreg2 routine available here.
It seems that the R routine produces identical estimates of standard errors to Stata’s ivreg routine, which only handles one-way clustering, but both produce different estimates from the ivreg2 routine. My guess is that it’s some differences in the degrees-of-freedom correction used; the numbers are fairly close.

. use "~/Dropbox/WRDS/petersen.dta"
(Written by R.              )

. ivreg y (x=z)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    5000
-------------+------------------------------           F(  1,  4998) = 1176.37
       Model |  5270.65875     1  5270.65875           Prob > F      =  0.0000
    Residual |  20097.6441  4998  4.02113727           R-squared     =  0.2078
-------------+------------------------------           Adj R-squared =  0.2076
       Total |  25368.3028  4999   5.0746755           Root MSE      =  2.0053

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0301416    34.30   0.000     .9747116    1.092893
       _cons |   .0296853   .0283594     1.05   0.295    -.0259115    .0852821
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------

. ivreg y (x=z), cluster(firmid)

Instrumental variables (2SLS) regression               Number of obs =    5000
                                                       F(  1,   499) =  404.17
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2078
                                                       Root MSE      =  2.0053

                               (Std. Err. adjusted for 500 clusters in firmid)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0514227    20.10   0.000     .9327707    1.134834
       _cons |   .0296853   .0670107     0.44   0.658    -.1019726    .1613431
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------

. ivreg y (x=z), cluster(year)

Instrumental variables (2SLS) regression               Number of obs =    5000
                                                       F(  1,     9) =  813.50
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2078
                                                       Root MSE      =  2.0053

                                  (Std. Err. adjusted for 10 clusters in year)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0362459    28.52   0.000     .9518085    1.115796
       _cons |   .0296853   .0233954     1.27   0.236    -.0232389    .0826094
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------

. ivreg2 y (x=z), cluster(year)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on year

Number of clusters (year) =         10                Number of obs =     5000
                                                      F(  1,     9) =   813.50
                                                      Prob > F      =   0.0000
Total (centered) SS     =  25368.30284                Centered R2   =   0.2078
Total (uncentered) SS   =  25374.51146                Uncentered R2 =   0.2080
Residual SS             =  20097.64409                Root MSE      =    2.005

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0343824    30.07   0.000     .9664141    1.101191
       _cons |   .0296853   .0221926     1.34   0.181    -.0138115     .073182
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              9.987
                                                   Chi-sq(1) P-val =    0.0016
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              4.5e+04
                         (Kleibergen-Paap rk Wald F statistic):        6.1e+04
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         x
Excluded instruments: z
------------------------------------------------------------------------------

. ivreg2 y (x=z), cluster(firmid)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on firmid

Number of clusters (firmid) =      500                Number of obs =     5000
                                                      F(  1,   499) =   404.17
                                                      Prob > F      =   0.0000
Total (centered) SS     =  25368.30284                Centered R2   =   0.2078
Total (uncentered) SS   =  25374.51146                Uncentered R2 =   0.2080
Residual SS             =  20097.64409                Root MSE      =    2.005

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0513661    20.13   0.000     .9331267    1.134478
       _cons |   .0296853   .0669369     0.44   0.657    -.1015087    .1608792
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            308.025
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              4.5e+04
                         (Kleibergen-Paap rk Wald F statistic):        3.2e+04
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         x
Excluded instruments: z
------------------------------------------------------------------------------

. ivreg2 y (x=z), cluster(firmid fyear)
variable fyear not found
in option cluster()
r(111);

. ivreg2 y (x=z), cluster(firmid year)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on firmid and year

Number of clusters (firmid) =      500                Number of obs =     5000
Number of clusters (year) =         10                F(  1,     9) =   328.27
                                                      Prob > F      =   0.0000
Total (centered) SS     =  25368.30284                Centered R2   =   0.2078
Total (uncentered) SS   =  25374.51146                Uncentered R2 =   0.2080
Residual SS             =  20097.64409                Root MSE      =    2.005

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.033802   .0541255    19.10   0.000     .9277184    1.139886
       _cons |   .0296853   .0645686     0.46   0.646     -.096867    .1562375
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              9.731
                                                   Chi-sq(1) P-val =    0.0018
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              4.5e+04
                         (Kleibergen-Paap rk Wald F statistic):        4.0e+04
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         x
Excluded instruments: z
------------------------------------------------------------------------------


Advertisements
This entry was posted in Uncategorized and tagged , , , . Bookmark the permalink.

8 Responses to IV regression and two-way cluster-robust standard errors

  1. rasmus says:

    I don’t fully comprehend the R script, I must admit, but would you not need to correct the SE when using IV? When using OLS on real data, the script above give me almost identical SE as Stata, but when using AER ivreg there are rather big differences between Stata and R.

    Here is a page on clustered errors in IV is SAS:
    http://works.bepress.com/tbrachet/2/

    • iangow says:

      Absolutely one may need to use cluster-robust standard errors with IV (OLS is simply a special case of IV with X=Z). I’ve found AER ivreg and Stata’s ivreg to give the same results. I believe this is the case for the example on the blog posting.

      • rasmus says:

        Perhaps, my question was unclear (english is my second language). In
        general, should the coeftest.cluster() function give the correct
        ‘theoretical’ sample clustered SEs when using IV (say using AER’s
        ivreg)?

        For example, doing manual TSLS I’d need to correct my SEs ex post, and
        I wondered whether the coeftest.cluster()/ivreg() are `smart’ enough
        to account for this (i.e. does ivreg provides measures in such a way
        so that you’d expect clustered SEs to correct sample estimates?).

        As said, when using real data I do at times get somewhat different
        t-stats, which could be just different coding, but could also be due
        to coeftest.cluster() `assuming’ OLS when it is not appropriate.

        Thanks again.

      • iangow says:

        coeftest.cluster() takes a fitted model as an input and, if the fitted model comes from AER’s ivreg(), it produces standard errors equal to those given by Stata’s ivreg for the same data and clustering. So it won’t “assume OLS when it is not appropriate.”

  2. rasmus says:

    Thanks for clearing that up Ian. I appreciate it. When I’ve got better time I will have to dig further in to core R functions 🙂

    Thanks again

  3. I got error ‘undefined column selected’ when my model includes factor variables. Can you make the function works with factor regressors? Thanks

    • iangow says:

      Muhammad:

      It must be something more than the inclusion of factor variables. If I modify the code to include

      test$year <- as.factor(test$year)

      it still works fine.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s