How to Read Structural Modeling Papers in Economics

For a research project, I have been reading a lot of the literature on locational equilibrium sorting in public economics. While the topic is fascinating, it is easy to get lost in piles of papers without seeing how they fit together under the same overarching theme. Since reading structural papers is likely to be a challenge for many economics PhD students, I thought it might be useful to share my thoughts on how to do it effectively.

The purpose of reading others’ papers is not to produce a thorough summary of their work but to critically assess the status quo of the literature and what you can contribute. I have found the following steps useful for achieving this goal.

Step 1: Bird’s Eye View

Start with the review papers. If the topic is well researched, there should be a survey paper (look in the Journal of Economic Perspectives, the Journal of Economic Literature, or handbook chapters). Focus on the fundamental assumptions made in each class of models, and build a list of papers for further reading from the review paper’s bibliography.

Step 2: Divide and Conquer

For each paper on the “to read” list, strip it down to its bare bones and understand the key message. Clearly outline the model assumptions (and whether they are made for the sake of abstraction or because of data limitations), the data available, and the empirical approach.

Step 3: Say it in Your Words

After you feel you have a good understanding of the class of structural models, try to synthesize the papers by describing them in writing. Focus on how they are linked to each other, and critically assess the pros and cons of each approach.

Step 4: Make the Link to Your Research

By the end of step 3, you should have a fairly clear understanding of which approach (if any) is best suited to your own research, and how your research contributes to the existing literature.


Technological Adoption Analysis with Panel Data

I just submitted my final paper for the panel data class. Sharing it here. Comments and feedback are welcome!

I. Introduction

This paper provides a critical review of econometric approaches to modeling technology adoption decisions in developing countries. These decisions include whether or not to adopt a particular technology (e.g., high-yielding variety seeds) and how much of each input to use given the technology chosen.

The developing-country setting presents two additional challenges to identifying the determinants of technology adoption. First, imperfect access to credit and insurance introduces correlation between lagged productivity shocks and current input choices, violating the strict exogeneity assumption commonly maintained in panel data models. Second, the prevalence of informal networks highlights the importance of incorporating learning and externalities into the analysis.

Following Foster and Rosenzweig (1995), suppose we are interested in what determines farmers’ adoption of high-yielding variety (HYV) seeds in developing countries. Two broad sources of uncertainty drive differences in adoption behavior. First, farmers may know the returns to HYV seeds but not the optimal levels of inputs, so a farmer who decides to use HYV seeds needs to experiment with different input levels. Second, the profitability of the new technology itself may be uncertain, which is especially relevant when the technology is new (Conley and Udry 2010). Although the two sources of uncertainty may coexist, given the complexity of the problem we focus on one at a time.

II. Input Choice as Technology Adoption

In this section, we assume that returns to technology adoption depend on how close actual input levels are to optimal input levels, i.e., we use a target input model. Foster and Rosenzweig (1995) use this framework to examine how farmers’ HYV adoption decisions depend on their own and their neighbors’ experience. In their framework, the expected profit of farmer j at time t is
where $\eta_h$ is the yield from HYV seeds, $\eta_{ha}$ is the loss from planting HYVs on less suitable land as HYV acreage expands, $A_j$ is the total amount of land, $H_{jt}$ is the amount of land planted with HYVs, $\sigma_{\theta jt}^{2}$ is the farmer’s updated variance of the mean optimal input level, and $\sigma_{u}^2$ is the variance of the error term in target input use (relative to the mean optimal input). The variance term is updated through learning from own and neighbors’ experience, which is the focus of Section IV.
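A minimal sketch of how this updating could work, in Python (standard library only). It assumes, for illustration rather than as Foster and Rosenzweig’s exact specification, that the prior and each parcel’s signal are normal, so precisions simply add: after `s_own` own parcels and `s_nbr` neighbor parcels, the belief variance is $1/(\rho_0 + \rho_{own} s_{own} + \rho_{nbr} s_{nbr})$, where all three precisions are hypothetical parameters. If the farmer sets her input at the posterior mean, the expected squared distance from the realized target is $\sigma_{\theta jt}^2 + \sigma_u^2$. Under a quadratic-loss target input model, experience therefore raises expected HYV returns only by shrinking the first term, while $\sigma_u^2$ cannot be learned away.

```python
import random


def posterior_variance(s_own, s_nbr, rho_0, rho_own, rho_nbr):
    """Belief variance about the mean optimal input after s_own own parcels and
    s_nbr (average) neighbor parcels. Assumes normal priors and normal signals,
    so precisions add up; all rho's are hypothetical parameters."""
    return 1.0 / (rho_0 + rho_own * s_own + rho_nbr * s_nbr)


def expected_target_loss(sigma2_theta, sigma2_u):
    """Expected squared gap between the input used (set at the posterior mean)
    and the realized parcel-level target: belief uncertainty plus noise."""
    return sigma2_theta + sigma2_u


def simulated_target_loss(sigma2_theta, sigma2_u, draws=100_000, seed=0):
    """Monte Carlo check of expected_target_loss."""
    rng = random.Random(seed)
    chosen = 0.0  # farmer's posterior mean, used as the chosen input level
    total = 0.0
    for _ in range(draws):
        theta = rng.gauss(chosen, sigma2_theta ** 0.5)  # true mean optimal input
        target = theta + rng.gauss(0.0, sigma2_u ** 0.5)  # parcel-level target
        total += (chosen - target) ** 2
    return total / draws
```

For example, with $\rho_0 = 1$, $\rho_{own} = 0.5$ and $\rho_{nbr} = 0.1$, five own parcels and twenty neighbor parcels give a belief variance of $1/5.5 \approx 0.18$, down from 1 with no experience.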

In the empirical analysis, the authors estimate the profit function with farmers’ education as an additional covariate:
where $S_{jt}$ is the cumulative number of parcels planted by farmer j up to time t, $\bar{S}_{-jt}$ is the average cumulative experience of neighboring farmers, and the $\rho$’s are the precisions of own and neighbors’ experience as signals of the optimal input level. Two approaches are used for estimation.

The first approach uses IV and fixed effects to estimate a first-order reduced-form approximation of equation (2). Instrumental variables address correlation between 1) contemporaneous profit shocks and production decisions, and 2) lagged profit shocks and current adoption (potentially because of credit constraints). Fixed effects eliminate individual-level heterogeneity $\mu_i$. If we maintain the assumption that input decisions are predetermined, the IV approach addresses the concern that strict exogeneity is violated. Note that if, in addition, the profit shocks are serially uncorrelated in levels, their first differences exhibit first-order autocorrelation but are uncorrelated at all longer lags. This seems a reasonable assumption if we believe the profit shocks are unanticipated and not persistent over time. Because Foster and Rosenzweig (1995) do not describe the nature of the profit shocks, it is difficult to evaluate the validity of the predeterminedness assumption.
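To make the predeterminedness argument concrete, here is a minimal simulation sketch in Python (standard library only). The data-generating process is hypothetical, not Foster and Rosenzweig’s: profit is $\beta x_{it} + \mu_i + e_{it}$, and the input $x_{it}$ reacts to last period’s shock $e_{i,t-1}$ (a stand-in for credit constraints) but not to current or future shocks. First differencing removes $\mu_i$, but $\Delta x_{it}$ is then correlated with $\Delta e_{it}$ through $e_{i,t-1}$, so first-difference OLS is biased. Under predeterminedness, the lagged level $x_{i,t-1}$ is uncorrelated with both $e_{it}$ and $e_{i,t-1}$ and can instrument $\Delta x_{it}$ (an Anderson-Hsiao-style estimator). The last function checks the autocorrelation claim: with serially uncorrelated shocks, differenced residuals should be correlated at lag one (about -0.5) and roughly uncorrelated at lag two.

```python
import random


def simulate_panel(n=5_000, t=8, beta=1.0, rho=0.5, feedback=0.8, burn=20, seed=1):
    """Hypothetical DGP: profit_it = beta*x_it + mu_i + e_it. The input x_it is
    predetermined: it depends on last period's shock e_{i,t-1} (e.g. through a
    credit constraint) but not on current or future shocks."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)  # farmer fixed effect
        x_prev = e_prev = 0.0
        xs, ys = [], []
        for period in range(burn + t):
            x = rho * x_prev + mu + feedback * e_prev + rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, 1.0)  # drawn after x, so x cannot respond to it
            if period >= burn:  # discard burn-in periods
                xs.append(x)
                ys.append(beta * x + mu + e)
            x_prev, e_prev = x, e
        panel.append((xs, ys))
    return panel


def fd_estimates(panel):
    """First-difference OLS, and first-difference IV using the lagged level
    x_{t-1} as the instrument for dx_t. Returns (ols, iv)."""
    sxx = sxy = szx = szy = 0.0
    for xs, ys in panel:
        for s in range(1, len(xs)):
            dx, dy, z = xs[s] - xs[s - 1], ys[s] - ys[s - 1], xs[s - 1]
            sxx += dx * dx
            sxy += dx * dy
            szx += z * dx
            szy += z * dy
    return sxy / sxx, szy / szx


def residual_autocorr(panel, beta, lag):
    """Correlation between first-differenced residuals `lag` periods apart."""
    pairs = []
    for xs, ys in panel:
        r = [(ys[s] - ys[s - 1]) - beta * (xs[s] - xs[s - 1]) for s in range(1, len(xs))]
        pairs += [(r[k], r[k - lag]) for k in range(lag, len(r))]
    ma = sum(a for a, _ in pairs) / len(pairs)
    mb = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / (va * vb) ** 0.5
```

With the default settings, `fd_estimates(simulate_panel())` should return an FD-OLS slope well below the true value of 1 and an FD-IV slope close to it. If inputs also responded to the contemporaneous shock, $x_{i,t-1}$ would no longer be valid and one would move to $x_{i,t-2}$ or earlier lags.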

The second approach uses nonlinear IV with fixed effects to obtain structural estimates of the profit function: equation (2) is differenced over time and estimated with a standard nonlinear IV procedure. This approach is subject to the same concern as the first.

III. Discrete Technology Adoption Decisions

In this section, the outcome variable equals one if the individual adopts the technology in period t. Because adoption contributes to accumulated experience, adopting in the current period can change the returns to the technology in subsequent periods in complicated ways.

Foster and Rosenzweig (1995) examine HYV adoption using reduced-form predictions from the structural model, but without solving for the decision rules they cannot estimate the structural parameters. To address this limitation, we might use nonlinear panel data models with stronger distributional assumptions on the error terms (e.g., a logistic distribution) and conditional maximum likelihood estimators. This, however, rules out serial correlation in the error terms, which might be unrealistic. An alternative is Manski’s conditional maximum score estimator, which achieves identification from “switchers”. Observing enough individuals switching between adopting and not adopting a technology may be difficult, however, because new technologies often involve fixed costs, which make adoption decisions persistent.
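The conditional maximum likelihood idea, and why it leans on switchers, can be seen in the two-period fixed-effects (conditional) logit. The Python sketch below (standard library only) simulates a hypothetical panel in which adoption depends on an unobserved farmer effect $\alpha_i$ that is correlated with the covariate, with iid logistic errors and no serial correlation. Conditioning on $y_{i1} + y_{i2} = 1$ gives $P(y_{i2} = 1 \mid \text{switch}) = \Lambda(\beta(x_{i2} - x_{i1}))$, which no longer depends on $\alpha_i$; farmers who adopt in both periods or in neither contribute nothing to the likelihood.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic CDF, Lambda(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n=20_000, beta=1.0, seed=2):
    """Hypothetical two-period panel: y_it = 1{alpha_i + beta*x_it + eps_it > 0},
    with eps_it iid standard logistic and alpha_i correlated with x_it."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        periods = []
        for _t in range(2):
            x = 0.5 * alpha + rng.gauss(0.0, 1.0)
            u = rng.uniform(1e-12, 1.0 - 1e-12)
            eps = math.log(u / (1.0 - u))  # inverse-CDF draw from the logistic
            periods.append((x, 1 if alpha + beta * x + eps > 0 else 0))
        data.append(periods)
    return data


def conditional_logit(data, iters=25):
    """Fixed-effects logit for T = 2 by Newton-Raphson on the conditional
    likelihood. Returns the estimate and the number of switchers used."""
    # Keep only switchers, stored as (x_i2 - x_i1, y_i2) with y_i1 + y_i2 == 1.
    switchers = [(x2 - x1, y2) for (x1, y1), (x2, y2) in data if y1 + y2 == 1]
    b = 0.0
    for _ in range(iters):
        score = hess = 0.0
        for dx, d in switchers:
            p = logistic(b * dx)
            score += (d - p) * dx
            hess -= p * (1.0 - p) * dx * dx
        b -= score / hess  # Newton step on the concave log-likelihood
    return b, len(switchers)
```

`conditional_logit(simulate())` should return an estimate near the true $\beta = 1$, computed from the switchers alone. With strongly persistent adoption, the switcher count shrinks and the estimate becomes imprecise, which is the same practical concern raised above for maximum score.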

Suri (2011) provides an alternative framework for examining why farmers make different adoption decisions. She uses information on the correlation between productivity differences and the productivity of a technology among farmers who use both technologies to project productivity levels for farmers who use only one technology. More specifically, she assumes profits for farmer i with productivity


She estimates the following equation for yields:
Based on the primitives of the model,


The identifying assumption is that the composite error $(\tau_i+\epsilon_{it})$ is mean independent of the comparative advantage component $\theta_i$ and of the histories of the regressors. Translated into assumptions about what drives hybrid switching behavior, this requires that the unobserved time-varying variables driving the switching be uncorrelated with yields. Chamberlain’s (1982) correlated random effects (CRE) approach is used for estimation: the dependence of the unobserved $\theta_i$ on the endogenous input $h_{it}$ is accounted for using the linear projection of $\theta_i$ on the full history of inputs and their interactions, and the structural parameters are recovered from the reduced-form estimates.
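A stripped-down linear version of Chamberlain’s (1982) projection shows how structural parameters can be recovered from reduced-form estimates. The Python sketch below is hypothetical and omits the comparative-advantage heterogeneity in returns that Suri (2011) models: yields are $y_{it} = \beta h_{it} + \theta_i + \epsilon_{it}$, adoption $h_{it}$ is more likely for farmers with high $\theta_i$, and $\theta_i$ is replaced by its linear projection $\lambda_0 + \lambda_1 h_{i1} + \lambda_2 h_{i2} + v_i$. Each period’s yield is regressed on the full adoption history, and $\beta$ is backed out from the restrictions the projection places on the reduced-form slopes.

```python
import random


def simulate(n=20_000, beta=0.5, seed=3):
    """Hypothetical two-period panel: y_it = beta*h_it + theta_i + eps_it, where
    adoption h_it is more likely for farmers with a high unobserved theta_i."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        h = [1.0 if theta + rng.gauss(0.0, 1.0) > 0 else 0.0 for _ in range(2)]
        y = [beta * h[t] + theta + rng.gauss(0.0, 1.0) for t in range(2)]
        rows.append((h, y))
    return rows


def slopes_on_history(rows, t):
    """OLS slopes of y_t on (1, h_1, h_2). The intercept is absorbed by demeaning,
    and the 2x2 normal equations are solved by Cramer's rule."""
    n = len(rows)
    m1 = sum(h[0] for h, _ in rows) / n
    m2 = sum(h[1] for h, _ in rows) / n
    my = sum(y[t] for _, y in rows) / n
    s11 = sum((h[0] - m1) ** 2 for h, _ in rows)
    s22 = sum((h[1] - m2) ** 2 for h, _ in rows)
    s12 = sum((h[0] - m1) * (h[1] - m2) for h, _ in rows)
    s1y = sum((h[0] - m1) * (y[t] - my) for h, y in rows)
    s2y = sum((h[1] - m2) * (y[t] - my) for h, y in rows)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def cre_beta(rows):
    """Recover beta from the reduced form. With theta_i = lam0 + lam1*h_i1 +
    lam2*h_i2 + v_i, the slopes are (beta + lam1, lam2) for y_1 and
    (lam1, beta + lam2) for y_2."""
    a11, a12 = slopes_on_history(rows, 0)
    a21, a22 = slopes_on_history(rows, 1)
    # Two implied estimates of beta; a simple average, not optimal MD weights.
    return 0.5 * ((a11 - a21) + (a22 - a12))
```

With two periods there are four reduced-form slopes and three structural parameters $(\beta, \lambda_1, \lambda_2)$, so $\beta$ is over-identified. The sketch simply averages the two implied estimates, whereas Chamberlain’s minimum distance estimator would weight the restrictions optimally.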

The correlated random effects approach lowers the bar for identification, and in Suri’s setting it seems reasonable to assume that individual-level heterogeneity is uncorrelated with productivity shocks once the history of input decisions is controlled for. However, the focus of Suri (2011) is to identify cross-sectional heterogeneity in productivity and its consequences for hybrid seed adoption, and it is unclear whether this focus warrants the use of CRE models.

IV. Learning in Technology Adoption

Recent literature on technology adoption highlights the importance of learning from own experience and the experiences of informal network members.

Conley and Udry (2010) collect data on social interactions to address the unobserved variable problem in studying learning effects in the diffusion of a new technology: pineapple cultivation in Ghana. In their model, risk-neutral farmers each have a single plot and maximize current expected profits by choosing a discrete-valued input $x_{it}$ at time t. Pineapple output, realized five periods after the input decision, is
where the $\epsilon$’s are unobserved productivity shocks, iid with mean 0 and variance 1, and $\omega_{it}$ captures spatially and serially correlated shocks to the marginal product that are observed by the farmer but not by the econometrician. Farmers do not know the function $f$ but learn about it over time according to a learning rule.

Identification exploits the specific timing of plantings to identify opportunities for information transmission: variation in planting decisions generates a sequence of dates on which new information may be revealed to the farmer. Conditional on measures of growing conditions, Conley and Udry isolate events in which new productivity information is revealed to the farmer, and then investigate whether this information is associated with changes in the farmer’s input use that are consistent with social learning. A logistic regression is used to estimate how farmers’ input decisions respond to the actions and outcomes of other farmers in their information networks (data collected by the authors).

The baseline regression model is
where $M_{it}$ is an index of good news about input levels, constructed from inputs and profits five periods ago and now. The identifying assumption is that, conditional on measures of changes in growing conditions $\Gamma_{it}$ and other farm-level characteristics, the information measure $M_{it}$ is uncorrelated with unobserved determinants of growing conditions and therefore of input use. A significant, positive $\beta_1$ is evidence of social learning.
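A hypothetical Python sketch of the estimation step: a logit of a binary input-change indicator on a good-news index, fitted by Newton-Raphson, with a z-statistic for $\beta_1$. It omits the growing-condition controls $\Gamma_{it}$ and farm characteristics that Conley and Udry condition on, the simulated data are invented for illustration, and the binary outcome (whether the farmer moves her input toward the level associated with the news) is an assumption rather than the authors’ exact dependent variable.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic CDF, Lambda(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n=5_000, b0=-0.5, b1=0.8, seed=4):
    """Hypothetical data: m is a standardized good-news index and y = 1 if the
    farmer moves her input toward the level associated with the news."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < logistic(b0 + b1 * m) else 0
        data.append((m, y))
    return data


def score_and_information(data, b0, b1):
    """Gradient of the logit log-likelihood and the 2x2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for m, y in data:
        p = logistic(b0 + b1 * m)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * m
        i00 += w
        i01 += w * m
        i11 += w * m * m
    return (g0, g1), (i00, i01, i11)


def fit_logit(data, iters=25):
    """Newton-Raphson for Pr(y = 1 | m) = Lambda(b0 + b1*m). Returns the
    estimates and standard errors from the inverse information matrix."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = score_and_information(data, b0, b1)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det  # Newton step: I^{-1} times score
        b1 += (i00 * g1 - i01 * g0) / det
    _, (i00, i01, i11) = score_and_information(data, b0, b1)
    det = i00 * i11 - i01 * i01
    se = (math.sqrt(i11 / det), math.sqrt(i00 / det))
    return (b0, b1), se
```

In this framing, evidence of social learning is a positive estimated $\beta_1$ with a large z-statistic `b1 / se1`; the credibility of that reading rests entirely on the conditional uncorrelatedness assumption stated above.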

An important limitation of this approach is that it ignores the endogenous formation of informal networks and their potential dynamic changes. To study learning effects in technology adoption, we need a better understanding of how informal networks form and of the nature of learning, so that we can evaluate whether the identifying assumptions are realistic.


Chamberlain, G. (1982). “Multivariate Regression Models for Panel Data,” Journal of Econometrics 18: 5-46.

Conley, T. and Udry, C. (2010). “Learning about a New Technology: Pineapple in Ghana,” American Economic Review 100(1): 35-69.

Foster, A. and Rosenzweig, M. (1995). “Learning by Doing and Learning from Others: Human Capital and Technological Change in Agriculture,” Journal of Political Economy 103: 1176-1209.

Suri, T. (2011). “Selection and Comparative Advantage in Technology Adoption,” Econometrica 79(1): 159-209.