9 Fixed Effects
9.1 Time-constant Variables
Panel data allows us to control for variables that are constant over time, even if these variables are not directly observable.
Consider a basic panel regression model: Y_{it} = \beta_1 + \beta_2 X_{it} + \beta_3 Z_i + u_{it}. \tag{9.1} Here, Z_i represents a variable that does not change over time and is specific to an individual (e.g., gender, ethnicity, parental education).
For simplicity, assume here that observations are only available for two time periods (t=1 and t=2). We can focus on the changes between these periods.
Subtracting the right-hand side of Equation 9.1 at t=1 from the right-hand side at t=2 gives \begin{align*} &\beta_1 + \beta_2 X_{i2} + \beta_3 Z_i + u_{i2} - (\beta_1 + \beta_2 X_{i1} + \beta_3 Z_i + u_{i1}) \\ &= \beta_2 \Delta X_{i2} + \Delta u_{i2}. \end{align*} The symbol \Delta represents first-differencing, i.e. \Delta X_{i2} = X_{i2} - X_{i1} and \Delta u_{i2} = u_{i2} - u_{i1}.
By first-differencing both sides of Equation 9.1, our model becomes \Delta Y_{i2} = \beta_2 \Delta X_{i2} + \Delta u_{i2}. \tag{9.2} \beta_1 and \beta_3 Z_i do not appear in the transformed model Equation 9.2 because they are time-constant and cancel out.
In this differenced model, \beta_2 can be estimated by regressing \Delta Y_{i2} on \Delta X_{i2} without an intercept. Because Z_i has been differenced out, \beta_2 is the marginal effect of X_{it} on Y_{it} holding fixed all individual-specific time-constant characteristics such as Z_i, whether they are observed or not.
We can control for any time-constant variable without actually observing it. This is a remarkable advantage over conventional cross-sectional regression or pooled panel regression.
We may combine the terms \beta_1 and \beta_3 Z_i and define the individual-specific effect \alpha_i = \beta_1 + \beta_3 Z_i. The term \alpha_i is also called the individual fixed effect. The fixed effect cancels out after taking first differences.
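The cancellation of the time-constant term can be checked in a small simulation. The following sketch uses made-up data: the sample size, the coefficient values (\beta_1 = 1, \beta_2 = 2, \beta_3 = 3), and all variable names are illustrative assumptions and are not taken from the text. Z_i is generated but never used in the regressions, mimicking an unobserved characteristic.

```r
set.seed(1)
n  <- 5000                 # number of individuals (illustrative)
Z  <- rnorm(n)             # time-constant characteristic, treated as unobserved
X1 <- Z + rnorm(n)         # regressor at t = 1, correlated with Z
X2 <- Z + rnorm(n)         # regressor at t = 2, correlated with Z
Y1 <- 1 + 2 * X1 + 3 * Z + rnorm(n)
Y2 <- 1 + 2 * X2 + 3 * Z + rnorm(n)

# Pooled regression that ignores Z: suffers from omitted variable bias
b_pooled <- unname(coef(lm(c(Y1, Y2) ~ c(X1, X2)))[2])

# First-difference regression without an intercept: Z cancels out
dY <- Y2 - Y1
dX <- X2 - X1
b_fd <- unname(coef(lm(dY ~ dX - 1))[1])

c(pooled = b_pooled, first_diff = b_fd)
```

In this design the pooled slope should be far from the true value 2, while the first-difference slope should be close to it.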
9.2 Fixed Effects Regression
Consider a panel dataset with dependent variable Y_{it}, a vector of k independent variables \boldsymbol X_{it}, and an individual fixed effect \alpha_i for i=1, \ldots, n and t=1, \ldots, T.
Because \alpha_i already represents any time-constant variable of individual i, we assume that all variables in \boldsymbol X_{it} are time-varying. That is, \boldsymbol X_{it} contains neither an intercept nor any time-constant variables like gender, birthplace, etc.
Fixed-effects Regression
The fixed-effects regression model equation for individual i=1, \ldots, n and time t=1, \ldots, T is Y_{it} = \alpha_i + \boldsymbol X_{it}'\boldsymbol \beta + u_{it}, \tag{9.3} where \boldsymbol \beta = (\beta_1, \ldots, \beta_k)' is the k \times 1 vector of regression coefficients and u_{it} is the error term for individual i at time t.
The fixed effects regression assumptions are:
(A1-fe) conditional mean independence: E[u_{it} | \boldsymbol X_{i1}, \ldots, \boldsymbol X_{iT}, \alpha_i] = 0.
(A2-fe) random sampling: (\alpha_i, Y_{i1}, \ldots, Y_{iT}, \boldsymbol X_{i1}', \ldots, \boldsymbol X_{iT}') are i.i.d. draws from their joint population distribution for i=1, \ldots, n.
(A3-fe) large outliers unlikely: 0 < E[Y_{it}^4] < \infty, 0 < E[u_{it}^4] < \infty.
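(A4-fe) no perfect multicollinearity: the matrix \dot{\boldsymbol X} of within-transformed regressors \dot{\boldsymbol X}_{it} = \boldsymbol X_{it} - \overline{\boldsymbol X}_{i\cdot} (deviations from the individual-specific means \overline{\boldsymbol X}_{i\cdot} = \frac{1}{T} \sum_{t=1}^T \boldsymbol X_{it}) has full column rank.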
9.3 Differenced Estimator
The first-differencing transformation can be used to estimate Equation 9.3: \Delta Y_{it} = Y_{i,t} - Y_{i,t-1}, \quad \Delta \boldsymbol X_{it} = \boldsymbol X_{i,t} - \boldsymbol X_{i,t-1}. Taking first differences on both sides of Equation 9.3 implies \Delta Y_{it} = (\Delta \boldsymbol X_{it})' \boldsymbol \beta + \Delta u_{it}, \tag{9.4} where \Delta u_{it} = u_{i,t} - u_{i,t-1}. Notice that the fixed effect \alpha_i cancels out.
Hence, we can apply the OLS principle to Equation 9.4 to estimate \boldsymbol \beta. We regress the differenced dependent variable \Delta Y_{it} on the differenced regressors \Delta \boldsymbol X_{it} for i=1, \ldots, n and t=2, \ldots, T.
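For T > 2, the differenced regression can be sketched in base R as follows. The data are simulated, and the dimensions, the true coefficient 1.5, and the variable names are assumptions made for illustration. The code assumes the data are sorted by individual and then by time, so that diff() within each individual gives consecutive differences.

```r
set.seed(2)
n  <- 500                           # individuals (illustrative)
TT <- 5                             # time periods (TT avoids masking R's T)
id    <- rep(1:n, each = TT)        # individual index
alpha <- rnorm(n)[id]               # fixed effect, repeated over time
X <- alpha + rnorm(n * TT)          # regressor correlated with the fixed effect
Y <- alpha + 1.5 * X + rnorm(n * TT)

# First differences within each individual; t = 1 has no predecessor -> NA
dY <- ave(Y, id, FUN = function(v) c(NA, diff(v)))
dX <- ave(X, id, FUN = function(v) c(NA, diff(v)))

# OLS without intercept on the differenced data; lm() drops the NA rows
fit_fd <- lm(dY ~ dX - 1)
coef(fit_fd)
nobs(fit_fd)                        # n * (TT - 1) usable observations
```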
A drawback of the differenced estimator is that differencing induces serial correlation in the transformed error term \Delta u_{it}, even when the original errors u_{it} are serially uncorrelated: \Delta u_{i,t+1} = u_{i,t+1} - u_{i,t} is correlated with \Delta u_{i,t} = u_{i,t} - u_{i,t-1} through the shared term u_{i,t}. In that case, OLS applied to Equation 9.4 is not the most efficient estimator.
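To see how strong the correlation between neighboring differenced errors is, suppose for illustration that the u_{it} are serially uncorrelated with common variance \sigma^2. This is an additional assumption made only for this calculation; it is not part of (A1-fe)–(A4-fe).

```latex
\begin{align*}
Var(\Delta u_{i,t}) &= Var(u_{i,t}) + Var(u_{i,t-1}) = 2\sigma^2, \\
Cov(\Delta u_{i,t+1}, \Delta u_{i,t})
  &= Cov(u_{i,t+1} - u_{i,t},\; u_{i,t} - u_{i,t-1})
   = -Var(u_{i,t}) = -\sigma^2, \\
Corr(\Delta u_{i,t+1}, \Delta u_{i,t}) &= \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}.
\end{align*}
```

Under this assumption, adjacent differenced errors have correlation -1/2, while differenced errors that are two or more periods apart share no common term and are uncorrelated.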
9.4 Within Estimator
An efficient estimator can be obtained by a different transformation. The idea is to consider the individual specific means \overline Y_{i\cdot} = \frac{1}{T} \sum_{t=1}^T Y_{it}, \quad \overline{\boldsymbol X}_{i\cdot} = \frac{1}{T} \sum_{t=1}^T \boldsymbol X_{it}, \quad \overline{u}_{i\cdot} = \frac{1}{T} \sum_{t=1}^T u_{it}. Taking the means of both sides of Equation 9.3 implies \overline{Y}_{i\cdot} = \alpha_i + \overline{\boldsymbol X}_{i\cdot}'\boldsymbol \beta + \overline{u}_{i\cdot}. \tag{9.5}
Then, subtracting Equation 9.5 from Equation 9.3 removes the fixed effect \alpha_i from the equation: Y_{it} - \overline Y_{i\cdot} = (\boldsymbol X_{it} - \overline{\boldsymbol X}_{i\cdot})'\boldsymbol \beta + (u_{it} - \overline{u}_{i\cdot}).
The deviations from the individual specific means are called within transformations: \dot Y_{it} = Y_{it} - \overline Y_{i\cdot}, \quad \dot{\boldsymbol X}_{it} = \boldsymbol X_{it} - \overline{\boldsymbol X}_{i\cdot}, \quad \dot u_{it} = u_{it} - \overline{u}_{i\cdot}. The within-transformed model equation is \dot Y_{it} = \dot{\boldsymbol X}_{it}'\boldsymbol \beta + \dot u_{it}. \tag{9.6}
Hence, to estimate \boldsymbol \beta, we regress the within-transformed dependent variable \dot Y_{it} on the within-transformed regressors \dot{\boldsymbol X}_{it} for i=1, \ldots, n and t=1, \ldots, T.
The within estimator is also called the fixed effects estimator: \widehat{\boldsymbol \beta}_{\text{fe}} = \bigg( \sum_{i=1}^n \sum_{t=1}^T \dot{\boldsymbol X}_{it} \dot{\boldsymbol X}_{it}' \bigg)^{-1} \bigg( \sum_{i=1}^n \sum_{t=1}^T \dot{\boldsymbol X}_{it} \dot Y_{it} \bigg).
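library(plm)                        # provides plm() and the Grunfeld data
data("Grunfeld", package = "plm")
## individual (firm) fixed effects, estimated by the within transformation
fit.fe = plm(inv ~ capital,
  index = c("firm", "year"),
  effect = "individual",
  model = "within",
  data=Grunfeld)
fit.fe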
Model Formula: inv ~ capital
Coefficients:
capital
0.37075
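The formula for \widehat{\boldsymbol \beta}_{\text{fe}} can also be evaluated by hand. The sketch below does this in base R on simulated data with two regressors; the dimensions, the true coefficients (1, -0.5), and the helper name demean are illustrative assumptions. It also compares the result with an OLS regression that includes one dummy variable per individual, which should give the same slope estimates.

```r
set.seed(3)
n  <- 200                                   # individuals (illustrative)
TT <- 6                                     # time periods
id    <- rep(1:n, each = TT)
alpha <- rnorm(n)[id]                       # fixed effect
X <- cbind(x1 = alpha + rnorm(n * TT),      # correlated with the fixed effect
           x2 = rnorm(n * TT))
Y <- drop(alpha + X %*% c(1, -0.5) + rnorm(n * TT))

# Within transformation: subtract the individual-specific mean
demean <- function(v) v - ave(v, id)        # ave() returns group means by id
Y_dot <- demean(Y)
X_dot <- apply(X, 2, demean)

# beta_fe = (sum of X_dot X_dot')^{-1} (sum of X_dot Y_dot)
beta_fe <- drop(solve(crossprod(X_dot), crossprod(X_dot, Y_dot)))

# Dummy variable regression for comparison
beta_lsdv <- coef(lm(Y ~ X + factor(id)))[c("Xx1", "Xx2")]

cbind(within = beta_fe, lsdv = beta_lsdv)
```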
Under (A2-fe), the collection of the within-transformed variables of individual i, (\dot Y_{i1}, \ldots, \dot Y_{iT}, \dot{\boldsymbol X}_{i1}, \ldots, \dot{\boldsymbol X}_{iT}, \dot u_{i1}, \ldots, \dot u_{iT}), forms an i.i.d. sequence for i=1, \ldots, n. The within-transformed variables satisfy (A1-pool)–(A4-pool).
Hence, we can apply the cluster-robust covariance matrix estimator of the pooled regression to the within-transformed variables: \widehat{\boldsymbol V}_{\text{fe}} = (\dot{\boldsymbol X}' \dot{\boldsymbol X})^{-1} \sum_{i=1}^n \bigg( \sum_{t=1}^T \dot{\boldsymbol X}_{it} \widehat{u}_{it} \bigg) \bigg( \sum_{t=1}^T \dot{\boldsymbol X}_{it} \widehat{u}_{it} \bigg)' (\dot{\boldsymbol X}' \dot{\boldsymbol X})^{-1}, where \widehat{u}_{it} now represents the residuals of \widehat{\boldsymbol \beta}_{\text{fe}}, and \dot{\boldsymbol X}' \dot{\boldsymbol X} = \sum_{i=1}^n \sum_{t=1}^T \dot{\boldsymbol X}_{it} \dot{\boldsymbol X}_{it}'.
## cluster-robust covariance matrix
Vfe = vcovHC(fit.fe)
Vfe
capital
capital 0.003796144
attr(,"cluster")
[1] "group"
## cluster-robust standard error
sqrt(Vfe)
capital
capital 0.06161285
attr(,"cluster")
[1] "group"
## t-test (coeftest() is provided by the lmtest package)
library(lmtest)
coeftest(fit.fe, vcov. = Vfe)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
capital 0.370750 0.061613 6.0174 9.018e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
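The sandwich formula for \widehat{\boldsymbol V}_{\text{fe}} can be traced step by step in base R. The following is a minimal sketch on simulated data with a single regressor; all names and values are illustrative assumptions. We believe this is the form that vcovHC applies by default to plm objects (clustered by individual, without a small-sample correction), but this should be checked against the package documentation rather than taken from this sketch.

```r
set.seed(4)
n  <- 100                                  # individuals (clusters)
TT <- 8                                    # time periods
id    <- rep(1:n, each = TT)
alpha <- rnorm(n)[id]
x <- alpha + rnorm(n * TT)
y <- alpha + 2 * x + rnorm(n * TT)

# Within-transformed data and fixed effects estimate
X_dot <- cbind(x = x - ave(x, id))
y_dot <- y - ave(y, id)
bread   <- solve(crossprod(X_dot))         # (X_dot' X_dot)^{-1}
beta_fe <- bread %*% crossprod(X_dot, y_dot)
u_hat   <- drop(y_dot - X_dot %*% beta_fe) # fixed effects residuals

# Inner sums over t for each individual: one row per individual
S <- rowsum(X_dot * u_hat, group = id)

# Sandwich: bread * (sum over i of s_i s_i') * bread
V_fe  <- bread %*% crossprod(S) %*% bread
se_fe <- sqrt(diag(V_fe))
se_fe
```

The key step is rowsum(), which adds up \dot{\boldsymbol X}_{it} \widehat u_{it} over time within each individual before the outer products are formed. Summing first and multiplying afterwards is what allows for arbitrary correlation of the errors within an individual.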
9.5 Time Fixed Effects
While individual-specific fixed effects allow us to control for variables that are constant over time but vary across individuals, we can also control for variables that are constant across individuals but vary over time. An example is a new government regulation that is introduced at a certain point in time and affects all individuals.
We denote time fixed effects by \lambda_t. The time effects only regression equation is Y_{it} = \lambda_t + \boldsymbol X_{it}' \boldsymbol \beta + u_{it}. \tag{9.7}
Here, \boldsymbol X_{it} does not contain any variable that is the same for all individuals, because these variables are captured by the time fixed effect.
To remove \lambda_t from the equation, we can subtract time specific means on both sides: Y_{it} - \overline Y_{\cdot t} = (\boldsymbol X_{it} - \overline{\boldsymbol X}_{\cdot t})' \boldsymbol \beta + (u_{it} - \overline{u}_{\cdot t}). The time specific means are \overline Y_{\cdot t} = \frac{1}{n} \sum_{i=1}^n Y_{it}, \quad \overline{\boldsymbol X}_{\cdot t} = \frac{1}{n} \sum_{i=1}^n \boldsymbol X_{it}, \quad \overline{u}_{\cdot t} = \frac{1}{n} \sum_{i=1}^n u_{it}.
Hence, we regress Y_{it} - \overline Y_{\cdot t} on \boldsymbol X_{it} - \overline{\boldsymbol X}_{\cdot t} to estimate \boldsymbol \beta in Equation 9.7.
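The time-demeaning regression can be sketched in the same way. In the simulated data below, a common shock \lambda_t hits all individuals in period t and is correlated with the regressor; the dimensions, the true coefficient 0.8, and the variable names are illustrative assumptions. The sketch also runs the regression with one dummy variable per period for comparison.

```r
set.seed(5)
n  <- 300                                  # individuals
TT <- 10                                   # time periods
time   <- rep(1:TT, times = n)             # time index
lambda <- rnorm(TT)[time]                  # common shock in each period
x <- lambda + rnorm(n * TT)                # regressor correlated with the shock
y <- lambda + 0.8 * x + rnorm(n * TT)

# Subtract time-specific means (averages over individuals in each period)
y_t <- y - ave(y, time)
x_t <- x - ave(x, time)
b_time <- unname(coef(lm(y_t ~ x_t - 1))[1])

# Regression with a full set of time dummies
b_dummy <- unname(coef(lm(y ~ x + factor(time)))["x"])

c(time_demeaned = b_time, time_dummies = b_dummy)
```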
9.6 Two-way Fixed Effects
We may include both individual fixed effects and time fixed effects. The two-way fixed effects regression equation is Y_{it} = \alpha_i + \lambda_t + \boldsymbol X_{it}' \boldsymbol \beta + u_{it}. \tag{9.8}
Note that \lambda_t and \alpha_i capture any variable that is the same for all individuals or is time constant. Therefore, the variables in \boldsymbol X_{it} must vary both across individuals and over time.
We can use a combination of the different transformations to remove the fixed effects.
- Individual specific mean: \overline Y_{i \cdot} = \alpha_i + \overline \lambda + \overline{\boldsymbol X}_{i\cdot}'\boldsymbol \beta + \overline u_{i\cdot}, where \overline \lambda = \frac{1}{T} \sum_{t=1}^T \lambda_t.
- Time specific mean: \overline Y_{\cdot t} = \overline \alpha + \lambda_t + \overline{\boldsymbol X}_{\cdot t}'\boldsymbol \beta + \overline u_{\cdot t}, where \overline \alpha = \frac{1}{n} \sum_{i=1}^n \alpha_i.
- Total mean: \overline Y = \frac{1}{nT} \sum_{i=1}^n \sum_{t=1}^T Y_{it} = \overline \alpha + \overline \lambda + \overline{\boldsymbol X}'\boldsymbol \beta + \overline u, where \overline{\boldsymbol X} = \frac{1}{nT} \sum_{i=1}^n \sum_{t=1}^T \boldsymbol X_{it} and \overline u = \frac{1}{nT} \sum_{i=1}^n \sum_{t=1}^T u_{it}.
To eliminate the individual and time fixed effects in Equation 9.8, we use the two-way transformation: \begin{align*} \ddot Y_{it} &= Y_{it} - \overline Y_{i \cdot} - \overline Y_{\cdot t} + \overline Y \\ \ddot{\boldsymbol X}_{it} &= {\boldsymbol X}_{it} - \overline{\boldsymbol X}_{i \cdot} - \overline{\boldsymbol X}_{\cdot t} + \overline{\boldsymbol X} \\ \ddot u_{it} &= u_{it} - \overline u_{i \cdot} - \overline u_{\cdot t} + \overline u. \end{align*} Applying the two-way transformation on both sides of Equation 9.8 gives \ddot Y_{it} = \ddot{\boldsymbol X}_{it}'\boldsymbol \beta + \ddot u_{it}. \tag{9.9}
Hence, we estimate \boldsymbol \beta by regressing \ddot Y_{it} on \ddot{\boldsymbol X}_{it}.
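A minimal base R sketch of the two-way transformation on a simulated balanced panel is given below. The helper name two_way, the dimensions, and the true coefficient 0.5 are illustrative assumptions. The comparison with the regression on individual and time dummies relies on the panel being balanced, i.e. every individual is observed in every period.

```r
set.seed(6)
n  <- 100                                  # individuals
TT <- 10                                   # time periods
id   <- rep(1:n, each = TT)
time <- rep(1:TT, times = n)
alpha  <- rnorm(n)[id]                     # individual fixed effect
lambda <- rnorm(TT)[time]                  # time fixed effect
x <- alpha + lambda + rnorm(n * TT)        # correlated with both effects
y <- alpha + lambda + 0.5 * x + rnorm(n * TT)

# Two-way transformation: v_it - mean_i - mean_t + overall mean
two_way <- function(v) v - ave(v, id) - ave(v, time) + mean(v)
y_dd <- two_way(y)
x_dd <- two_way(x)
b_2way <- unname(coef(lm(y_dd ~ x_dd - 1))[1])

# Regression with individual and time dummies
b_dummy <- unname(coef(lm(y ~ x + factor(id) + factor(time)))["x"])

c(two_way = b_2way, dummies = b_dummy)
```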
fit.2wayfe = plm(inv ~ capital,
index = c("firm", "year"),
effect = "twoways",
model = "within",
data=Grunfeld)
fit.2wayfe
Model Formula: inv ~ capital
Coefficients:
capital
0.4138
As with the pooled and the individual fixed effects estimators, we can use the cluster-robust covariance matrix estimator and cluster-robust standard errors.
## cluster-robust covariance matrix
V2way = vcovHC(fit.2wayfe)
V2way
capital
capital 0.003241852
attr(,"cluster")
[1] "group"
## cluster-robust standard error of the two-way model
## (the square root of V2way, about 0.0569, is the Std. Error in the t-test below)
sqrt(V2way)
## t-test
coeftest(fit.2wayfe, vcov. = V2way)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
capital 0.413802 0.056937 7.2677 1.268e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
9.7 Comparison of panel models
The fixed effects estimators are asymptotically normal under assumptions (A1-fe)–(A4-fe), and the clustered standard errors are consistent.
library(stargazer)  # regression tables; plm is assumed to be loaded already
fit.pool1 = lm(inv~capital, data=Grunfeld)
fit.pool2 = plm(inv~capital,
  index = c("firm", "year"),
  model = "pooling",
  data=Grunfeld)
## time fixed effects fit for column (4)
fit.timefe = plm(inv~capital,
  index = c("firm", "year"),
  effect = "time",
  model = "within",
  data=Grunfeld)
## robust standard errors for each model (clustered by firm for the plm fits)
cluster_se = list(
  sqrt(diag(vcovHC(fit.pool1))),
  sqrt(diag(vcovHC(fit.pool2))),
  sqrt(diag(vcovHC(fit.fe))),
  sqrt(diag(vcovHC(fit.timefe))),
  sqrt(diag(vcovHC(fit.2wayfe)))
)
stargazer(fit.pool1, fit.pool2, fit.fe, fit.timefe, fit.2wayfe,
  se = cluster_se,
  add.lines=list(
    c("Firm FE", "No", "No","Yes","No","Yes"),
    c("Year FE", "No", "No","No","Yes","Yes"),
    c("Clustered SE", "No", "Yes", "Yes", "Yes", "Yes")
  ),
  type="html",
  omit.stat = "f", df=FALSE,
  dep.var.labels="Gross Investment",
  covariate.labels = "Capital Stock")
Dependent variable: Gross Investment
|                     | OLS      | panel linear | panel linear | panel linear | panel linear |
|                     | (1)      | (2)          | (3)          | (4)          | (5)          |
| Capital Stock       | 0.477*** | 0.477***     | 0.371***     | 0.538***     | 0.414***     |
|                     | (0.078)  | (0.126)      | (0.062)      | (0.153)      | (0.057)      |
| Constant            | 14.236   | 14.236       |              |              |              |
|                     | (19.393) | (28.046)     |              |              |              |
| Firm FE             | No       | No           | Yes          | No           | Yes          |
| Year FE             | No       | No           | No           | Yes          | Yes          |
| Clustered SE        | No       | Yes          | Yes          | Yes          | Yes          |
| Observations        | 200      | 200          | 200          | 200          | 200          |
| R2                  | 0.439    | 0.439        | 0.660        | 0.429        | 0.599        |
| Adjusted R2         | 0.436    | 0.436        | 0.642        | 0.365        | 0.530        |
| Residual Std. Error | 162.850  |              |              |              |              |
Note: *p<0.1; **p<0.05; ***p<0.01
9.8 Dummy variable regression
An alternative way to estimate the fixed effects model is by an OLS regression of Y_{it} on \boldsymbol X_{it} and a full set of dummy variables, one for each individual in the sample.
For the time fixed effects model, we include a full set of dummy variables for each time point in the sample, and for the two-way fixed effects model, we include individual and time dummies.
This approach is algebraically equivalent to the within and two-way transformations. The coefficients for the auxiliary dummy variables are usually not reported. The coefficients for capital are the same as in the table above:
Call:
lm(formula = inv ~ capital + factor(firm), data = Grunfeld)
Coefficients:
(Intercept) capital factor(firm)2 factor(firm)3 factor(firm)4
367.6130 0.3707 -66.4553 -413.6821 -326.4410
factor(firm)5 factor(firm)6 factor(firm)7 factor(firm)8 factor(firm)9
-486.2784 -350.8656 -436.7832 -356.4725 -436.1703
factor(firm)10
-366.7313
Call:
lm(formula = inv ~ capital + factor(year), data = Grunfeld)
Coefficients:
(Intercept) capital factor(year)1936 factor(year)1937
39.2068 0.5383 22.4605 27.8993
factor(year)1938 factor(year)1939 factor(year)1940 factor(year)1941
-36.6889 -42.4012 -11.4293 5.3301
factor(year)1942 factor(year)1943 factor(year)1944 factor(year)1945
-26.2522 -36.3995 -32.3887 -33.0571
factor(year)1946 factor(year)1947 factor(year)1948 factor(year)1949
-3.6307 -57.8083 -73.1115 -106.8436
factor(year)1950 factor(year)1951 factor(year)1952 factor(year)1953
-105.8753 -69.2505 -76.6097 -67.6766
factor(year)1954
-112.6339
Call:
lm(formula = inv ~ capital + factor(firm) + factor(year), data = Grunfeld)
Coefficients:
(Intercept) capital factor(firm)2 factor(firm)3
354.9166 0.4138 -51.2329 -402.9933
factor(firm)4 factor(firm)5 factor(firm)6 factor(firm)7
-303.7443 -479.3182 -327.4387 -422.4257
factor(firm)8 factor(firm)9 factor(firm)10 factor(year)1936
-332.2429 -421.0790 -339.0705 23.9405
factor(year)1937 factor(year)1938 factor(year)1939 factor(year)1940
32.9483 -27.0935 -30.7979 0.5826
factor(year)1941 factor(year)1942 factor(year)1943 factor(year)1944
19.5836 -8.6393 -17.5675 -13.7593
factor(year)1945 factor(year)1946 factor(year)1947 factor(year)1948
-13.5253 17.6985 -27.2407 -37.4300
factor(year)1949 factor(year)1950 factor(year)1951 factor(year)1952
-66.7623 -63.2855 -23.9098 -23.9138
factor(year)1953 factor(year)1954
-5.1266 -40.1051
9.9 Panel R-squared
We can decompose the total variation into within group variation and between group variation: Y_{it}- \overline Y = \underbrace{Y_{it} - \overline{Y}_{i \cdot}}_{\text{within group}} + \underbrace{\overline{Y}_{i \cdot} - \overline Y}_{\text{between group}}
We distinguish two versions of R-squared:
Overall R-squared: R^2_{ov} = 1 - \frac{\sum_{i=1}^n \sum_{t=1}^T \widehat u_{it}^2}{\sum_{i=1}^n \sum_{t=1}^T (Y_{it} - \overline Y)^2}. Interpretation: Proportion of total sample variation in Y_{it} explained by the model (the usual R-squared).
Within R-squared: R^2_{wit} = 1 - \frac{\sum_{i=1}^n \sum_{t=1}^T \widehat u_{it}^2}{\sum_{i=1}^n \sum_{t=1}^T (Y_{it} - \overline{Y}_{i \cdot})^2}. Interpretation: Proportion of the sample variation in Y_{it} within the individual units that is explained by the model.
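Both definitions can be computed directly from the residuals of a fixed effects fit. The base R sketch below uses simulated data with a deliberately large fixed effect; the dimensions, the scaling factor 3, and the variable names are illustrative assumptions. Because the within sum of squares in the denominator cannot exceed the total sum of squares, the within R-squared is never larger than the overall R-squared for the same residuals.

```r
set.seed(7)
n  <- 50                                   # individuals
TT <- 10                                   # time periods
id    <- rep(1:n, each = TT)
alpha <- 3 * rnorm(n)[id]                  # large between-individual variation
x <- rnorm(n * TT)
y <- alpha + x + rnorm(n * TT)

# Fixed effects fit via individual dummies; residuals equal the within residuals
fit_lsdv <- lm(y ~ x + factor(id))
u_hat <- resid(fit_lsdv)

# Overall R-squared: deviations of y from the total mean
r2_overall <- 1 - sum(u_hat^2) / sum((y - mean(y))^2)

# Within R-squared: deviations of y from the individual-specific means
r2_within <- 1 - sum(u_hat^2) / sum((y - ave(y, id))^2)

c(overall = r2_overall, within = r2_within)
```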
For an individual-specific fixed effects regression, consider the two equivalent fixed effects estimators from above, the within fit fit.fe and the dummy variable (LSDV) fit:
fit.fe.lsdv = lm(inv ~ capital + factor(firm), data = Grunfeld)
Applied to the plm object, summary(object)$r.squared returns the within R-squared; applied to the lm object, it returns the overall R-squared:
## within R-squared
summary(fit.fe)$r.squared
rsq adjrsq
0.6597327 0.6417291
## overall R-squared
summary(fit.fe.lsdv)$r.squared
[1] 0.9184098
It is not a big surprise that the fixed effects model explains a lot of the total variation in Y_{it}. The equivalent LSDV model assigns each individual its own dummy variable and therefore, by construction, explains a lot of variation between individuals.
The within R-squared is often more insightful because it reflects the model’s ability to explain the variation within individuals over time.