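2015 Year-End Review

This year was the first time I truly celebrated Thanksgiving and Christmas in the US. We carefully arranged our presents under the Christmas tree and carefully unwrapped the gifts we had given each other. Seeing the delight on each other’s faces warmed my heart.

Before I knew it, another year had gone by.

For me, this year was full of drama and full of growth. Academically, I experienced how challenging research can be, and I realized that excessive perfectionism only creates anxiety rather than solving real problems. Personally, the end of one relationship and the beginning of another taught me that not every happy beginning has a happy ending, and that a truly lasting relationship must be built on personal maturity.

Everything has two sides. This year’s drama taught me how to keep my bearings in the fog. At the beginning of the year I installed the Insight Timer app on my phone and began meditating for five minutes every morning, listening to my inner voice. It has helped me understand myself better and see more clearly what brings joy to others. My anxiety about my studies also led me to talk more with senior students in my program, and I came to understand that a PhD is a process of continually surpassing yourself: enjoying the journey matters as much as pursuing results.

This year I picked up singing again and started studying opera singing with a teacher at Duke. By the end of the semester I could sing up to B minor steadily and perform in front of dozens of people without my knees shaking. Looking back, that is no small achievement.

This year I also started doing Pilates, practicing once or twice a week. This graceful exercise, which combines strength and softness, has made me lighter, more energetic, and more confident. I encourage everyone to give it a try!

My friends each lead their own lives: some are flying around the world busy with their careers, some have found their other half and happily settled down, and some are still drifting along without knowing what they are busy with each day. My biggest takeaway this year is that simply comparing yourself with others is meaningless. Knowing what you want and bravely pursuing it is the way to true happiness. This holds for both work and love.

Finally, I wish everyone a happy new year, and may all your wishes come true in 2016!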

 

My Two Cents on Randomized Controlled Trials

Randomized controlled trials (RCTs) have been at the forefront of development economics research in recent years. How well do they inform us about policy alternatives for reducing poverty?

On the bright side, RCTs allow us to identify the causal impact of policy interventions, and many studies provide evidence that simple nudges can make a big difference in behavior (see Esther Duflo’s work on encouraging Kenyan farmers to use fertilizer). However, there are also a few caveats in interpreting RCT results:

Publication bias: studies with statistically significant results, whether positive or negative, are far more likely to get published. Are we learning about the truth, or the truth we WANT to know? For instance, microfinance has been applauded as an innovative and effective way to increase savings and investment, encourage entrepreneurship, and reduce poverty. But a recent working paper found zero effects of access to microfinance on long-term development outcomes.

Pre-analysis plans vs. choosing specifications after the study has begun: here is a philosophical discussion by Ben Olken in the Journal of Economic Perspectives.

Heterogeneous treatment effects: the magnitude of a policy’s effects can vary a great deal across settings, so external validity is often a concern. Here is a thought-provoking paper by Eva Vivalt, the founder of AidGrade, a database of impact evaluations.

Experimental arms race: are we simply adding more technical details to the same experiments without shedding light on the fundamental channels through which they change behavior? Here is an article by David McKenzie on the topic. More specifically, Rachel Glennerster writes about what this implies for RCTs involving governments.
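To make the publication-bias caveat concrete, here is a small Python simulation of my own (not taken from any of the papers above). It assumes a program whose true effect is exactly zero, many independent two-arm RCTs with 100 people per arm, and a journal that “publishes” only estimates with |t| > 1.96. All numbers are hypothetical.

```python
import random

def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def simulate_publication_filter(true_effect=0.0, n_studies=2000, n_per_arm=100,
                                sd=1.0, seed=0):
    """Run many hypothetical two-arm RCTs and 'publish' only those with |t| > 1.96.

    Returns (mean estimate over all studies,
             mean absolute estimate over published studies,
             share of studies published).
    """
    rng = random.Random(seed)
    all_estimates, published = [], []
    for _ in range(n_studies):
        treat = [rng.gauss(true_effect, sd) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        m_t, v_t = mean_and_var(treat)
        m_c, v_c = mean_and_var(control)
        diff = m_t - m_c                               # estimated treatment effect
        se = (v_t / n_per_arm + v_c / n_per_arm) ** 0.5
        all_estimates.append(diff)
        if abs(diff / se) > 1.96:                      # the significance filter
            published.append(diff)
    mean_all = sum(all_estimates) / n_studies
    mean_abs_pub = (sum(abs(d) for d in published) / len(published)
                    if published else float("nan"))
    return mean_all, mean_abs_pub, len(published) / n_studies

mean_all, mean_abs_pub, share_pub = simulate_publication_filter()
print(f"mean estimate, all studies:         {mean_all:+.3f}")
print(f"mean |estimate|, published studies: {mean_abs_pub:.3f}")
print(f"share of studies published:         {share_pub:.1%}")
```

Across all studies the average estimate sits near zero, but only around 5% of studies clear the significance bar, and every one of them reports a sizable effect in one direction or the other. Reading only the published studies would suggest the program matters a lot.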

Your thoughts and comments are welcome.

Technological Adoption Analysis with Panel Data

I just submitted my final paper for the panel data class. Sharing it here. Comments and feedback are welcome!

I. Introduction

This paper provides a critical review of econometric approaches to modeling technology adoption decisions in developing countries. These decisions include whether or not to adopt a particular technology (e.g., high-yielding variety seeds) and how much of each input to use given the technology chosen.

The developing-country setting presents two additional challenges to identifying the determinants of technology adoption. First, imperfect access to credit and insurance introduces correlation between lagged productivity shocks and current input choices, violating the strict exogeneity assumption commonly maintained in panel data models. Second, the prevalence of informal networks highlights the importance of incorporating learning and externalities into the analysis.

Following Foster and Rosenzweig (1995), suppose we are interested in what determines farmers’ adoption of high-yielding variety (HYV) seeds in developing countries. There are two broad sources of uncertainty that drive differences in adoption behavior. First, farmers may know the returns to HYV seeds but not the optimal input levels, so a farmer needs to experiment with different input levels once she decides to use HYV seeds. Second, there may be uncertainty about the profitability of the new technology itself. This source of uncertainty can be especially relevant when the technology is new (Conley and Udry 2010). Although the two sources of uncertainty may coexist, we focus on one at a time given the complexity of the problem.

II. Input Choice as Technology Adoption

In this section, we assume that returns to technology adoption depend on how close actual input levels are to optimal input levels, i.e., we use a target input model. Foster and Rosenzweig (1995) use this framework to examine how farmers’ HYV adoption decisions depend on their own and their neighbors’ experience. In their framework, the expected profit of farmer j at time t is
[Equation (1): expected profit of farmer j at time t]
where $\eta_h$ is the yield from HYV seeds, $\eta_{ha}$ is the loss associated with using less suitable land as more land is planted with HYVs, $A_j$ is the total amount of land, $H_{jt}$ is the amount of land planted with HYVs, $\sigma_{\theta jt}^{2}$ is the farmer’s updated variance of the mean optimal input level, and $\sigma_{u}^2$ is the variance of the error term in target input use (relative to the mean optimal input). How this variance is updated depends on learning from own and neighbors’ experience, which is the focus of Section IV.
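The role of $\sigma_{\theta jt}^{2}$ can be sketched in a few lines of Python. Assuming the farmer has a normal prior over the mean optimal input and that each unit of own or neighbors’ experience is an independent normal signal about it, precisions add, so the posterior variance falls as experience accumulates. In a target input model, the expected squared gap between the input actually used and the optimal input is that posterior variance plus $\sigma_u^2$. The function names, the linear precision rule, and all parameter values are my own illustrative assumptions, not Foster and Rosenzweig’s exact specification.

```python
def posterior_variance(prior_var, rho_own, rho_nbr, own_exp, nbr_exp):
    """Posterior variance of the mean optimal input under normal-normal updating.

    Precisions add: the prior precision plus rho_own per unit of own experience
    and rho_nbr per unit of neighbors' average experience (hypothetical rule).
    """
    precision = 1.0 / prior_var + rho_own * own_exp + rho_nbr * nbr_exp
    return 1.0 / precision

def expected_input_loss(prior_var, rho_own, rho_nbr, own_exp, nbr_exp, sigma_u2):
    """Expected squared deviation of the chosen input from the optimal input:
    uncertainty about the mean target plus the idiosyncratic target noise."""
    return posterior_variance(prior_var, rho_own, rho_nbr, own_exp, nbr_exp) + sigma_u2

# Hypothetical farmer: prior variance 1, own signals twice as precise as neighbors'.
for own, nbr in [(0, 0), (2, 0), (2, 4)]:
    loss = expected_input_loss(1.0, 0.5, 0.25, own, nbr, sigma_u2=0.2)
    print(f"own={own}, neighbors={nbr}: expected loss {loss:.3f}")
```

In this example the expected loss falls from 1.200 with no experience to 0.700 with two units of own experience and to 0.533 once neighbors’ experience is added; it never drops below $\sigma_u^2 = 0.2$, the part of the loss that learning cannot remove.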

In the empirical analysis, the authors estimate the profit function with farmers’ education as an additional covariate:
[Equation (2): estimated profit function, including education and own and neighbors’ experience]
where $S_{jt}$ is the cumulative number of parcels planted by farmer j up to time t, $\bar{S}_{-jt}$ is the average cumulative experience of neighboring farmers, and the $\rho$’s are the precisions of own and neighbors’ experience as signals of optimal input levels. Two approaches are used for estimation.

The first approach uses IV and fixed effects to estimate a first-order reduced-form approximation of equation (2). Instrumental variables address the correlation between 1) contemporaneous profit shocks and production decisions, and 2) lagged profit shocks and contemporaneous adoption (potentially because of credit constraints). Fixed effects eliminate individual-level heterogeneity $\mu_i$. If we maintain the assumption that input decisions are predetermined, the IV approach addresses the concern that strict exogeneity is violated. Note that if, in addition, the profit shocks in levels are serially uncorrelated, the first-differenced shocks exhibit first-order autocorrelation but are uncorrelated at all longer lags, so inputs dated two or more periods back remain valid instruments. These seem reasonable assumptions if we believe the profit shocks are unanticipated and not persistent over time. Because Foster and Rosenzweig (1995) do not describe the nature of the profit shocks, it is difficult to evaluate whether these assumptions hold.
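A small simulation (my own, not Foster and Rosenzweig’s estimator) illustrates why predetermined inputs call for IV rather than fixed effects alone. In the hypothetical data below, last period’s profit shock feeds into this period’s input, as it might under credit constraints, while the shocks themselves are serially uncorrelated. The within (fixed-effects) estimator is then biased. First-differencing removes the farmer effect, and instrumenting the differenced input with the input level two periods back, in the spirit of Anderson and Hsiao, recovers the true coefficient. The model, parameter values, and function names are assumptions for illustration; the sketch uses NumPy.

```python
import numpy as np

def simulate_panel(n=20000, T=6, beta=1.0, rho=0.5, feedback=1.0, seed=0):
    """Simulate y_it = beta * x_it + mu_i + e_it with predetermined inputs.

    x_it responds to last period's shock e_{i,t-1} (feedback), so strict
    exogeneity fails, but x_it is uncorrelated with e_it and all later shocks.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n)                      # farmer fixed effect
    e = rng.normal(size=(n, T))                  # serially uncorrelated profit shocks
    x = np.zeros((n, T))
    x_prev = mu + rng.normal(size=n)             # initial input level
    for t in range(T):
        lagged_shock = e[:, t - 1] if t > 0 else np.zeros(n)
        x[:, t] = rho * x_prev + 0.5 * mu + feedback * lagged_shock + rng.normal(size=n)
        x_prev = x[:, t]
    y = beta * x + mu[:, None] + e
    return y, x

def within_estimator(y, x):
    """Fixed-effects (within) slope: demean each farmer's series, then OLS."""
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    return np.sum(xd * yd) / np.sum(xd * xd)

def fd_iv_estimator(y, x):
    """First-difference to remove mu_i, then instrument dx_it with x_{i,t-2}."""
    dy = y[:, 2:] - y[:, 1:-1]
    dx = x[:, 2:] - x[:, 1:-1]
    z = x[:, :-2]                                # input level two periods back
    return np.sum(z * dy) / np.sum(z * dx)

y, x = simulate_panel()
print(f"within: {within_estimator(y, x):.3f}, FD-IV: {fd_iv_estimator(y, x):.3f}")
```

With these settings the within estimate should come out noticeably below the true value of 1, while the FD-IV estimate should be close to it. The instrument has power here only because inputs are persistent (rho > 0); with serially independent inputs, the lagged level would be uncorrelated with the differenced input.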

The second approach uses nonlinear IV with fixed effects to obtain structural estimates of the profit function: equation (2) is differenced over time and estimated using a standard nonlinear IV procedure. This approach is subject to the same concern as the first.

III. Discrete Technology Adoption Decisions

In this section, the outcome variable equals one if the individual adopts the technology in period t and zero otherwise. Because technology adoption contributes to accumulated experience, adoption in the current period may change the returns to the technology in subsequent periods in complicated ways.

Foster and Rosenzweig (1995) examine HYV adoption using reduced-form predictions from the structural model. But without solving for the decision rules, they are unable to estimate the structural parameters. To address this limitation, we might use nonlinear panel data models with stronger distributional assumptions on the error terms (e.g., a logistic distribution) and conditional maximum likelihood estimators. This, however, rules out serial correlation in the error terms, which might be unrealistic. An alternative is Manski’s conditional maximum score estimator. This approach achieves identification from “switchers”, but observing enough individuals switching between adopting and not adopting a specific technology might be challenging: new technologies often involve fixed costs, so adoption decisions tend to be persistent.
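For the fixed-effects logit, the conditional likelihood makes the point about switchers explicit. With two periods and logistic errors, conditioning on a farmer adopting in exactly one period removes her fixed effect, and the probability that the adoption happened in period 2 is $\Lambda(\beta(x_{i2}-x_{i1}))$; farmers who adopt in both periods or in neither drop out. The NumPy sketch below simulates hypothetical data in which the farmer effect is correlated with the covariate and estimates $\beta$ from switchers only by Newton-Raphson. The data-generating process and parameter values are assumptions for illustration.

```python
import numpy as np

def simulate_adoption(n=20000, beta=1.0, seed=0):
    """Two-period adoption data with a farmer effect correlated with the covariate."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                         # unobserved farmer effect
    x = alpha[:, None] + rng.normal(size=(n, 2))       # covariate correlated with alpha
    latent = alpha[:, None] + beta * x + rng.logistic(size=(n, 2))
    return (latent > 0).astype(float), x

def conditional_logit_t2(y, x, iters=50):
    """Fixed-effects (conditional) logit for T = 2, estimated from switchers only."""
    switch = y.sum(axis=1) == 1                        # adopted in exactly one period
    d = y[switch, 1]                                   # 1 if that period was period 2
    dx = x[switch, 1] - x[switch, 0]
    b = 0.0
    for _ in range(iters):                             # Newton-Raphson on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-b * dx))
        grad = np.sum((d - p) * dx)
        hess = -np.sum(p * (1.0 - p) * dx ** 2)
        b -= grad / hess
    return b, switch.mean()

y, x = simulate_adoption()
b_hat, share_switchers = conditional_logit_t2(y, x)
print(f"beta_hat = {b_hat:.3f}, share of switchers = {share_switchers:.2f}")
```

The returned share of switchers shows how much of the sample actually identifies $\beta$; the more persistent adoption is, the smaller that share and the noisier the estimate.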

Suri (2011) provides an alternative framework for examining why farmers make different adoption decisions. Among farmers who use both technologies, she uses the correlation between productivity differences and the productivity of a technology to project productivity levels for farmers who use only one technology. More specifically, she assumes profits for farmer i with productivity

[Equation (3): Suri’s profit specification]

She estimates the following equation for yields:
[Equation (4): yield equation]
Based on the primitives of the model,

[Equation (5): mapping from the model primitives to the yield equation]

The identifying assumption is mean independence of the composite error $(\tau_i+\epsilon_{it})$ from the comparative advantage component $\theta_i$ and from the history of the regressors. Translated into assumptions about what drives switching into and out of hybrid seeds, this means that the unobserved time-varying variables that drive switching should not be correlated with yields. Chamberlain’s (1982) correlated random effects (CRE) approach is used for estimation. Dependence of the unobserved $\theta_i$’s on the endogenous input $h_{it}$ is accounted for using the linear projection of $\theta_i$ on the full history of inputs and their interactions. Structural parameters are then recovered from the reduced-form estimates.
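The core of the Chamberlain device can be shown in a stripped-down linear model that leaves out Suri’s comparative advantage interaction. If the farmer effect is correlated with inputs, pooled OLS is biased, but adding each farmer’s full input history as regressors (the linear projection of the effect on that history) recovers the coefficient on the current input. The data-generating process, the projection coefficients, and the function names below are hypothetical; this is a NumPy sketch of the idea, not Suri’s estimator, which also projects on input interactions and recovers structural parameters from the reduced form.

```python
import numpy as np

def simulate(n=5000, beta=0.5, lam=(0.6, 0.3, 0.9), seed=0):
    """Panel y_it = beta*x_it + c_i + e_it, with c_i correlated with the inputs.

    c_i is a linear function of the input history plus noise, which is exactly
    the correlated random effects (Chamberlain) assumption.
    """
    rng = np.random.default_rng(seed)
    T = len(lam)
    x = rng.normal(size=(n, T))
    c = x @ np.asarray(lam) + rng.normal(size=n)   # farmer heterogeneity
    y = beta * x + c[:, None] + rng.normal(size=(n, T))
    return y, x

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def naive_pooled(y, x):
    """Pooled OLS of y_it on x_it, ignoring c_i."""
    X = np.column_stack([np.ones(x.size), x.reshape(-1)])
    return ols(y.reshape(-1), X)[1]

def cre_pooled(y, x):
    """Pooled OLS of y_it on x_it plus the farmer's whole input history."""
    n, T = x.shape
    history = np.repeat(x, T, axis=0)              # row (i, t) gets x_i1..x_iT
    X = np.column_stack([np.ones(n * T), x.reshape(-1), history])
    return ols(y.reshape(-1), X)[1]

y, x = simulate()
print(f"naive pooled OLS: {naive_pooled(y, x):.3f}, CRE: {cre_pooled(y, x):.3f}")
```

Here the naive estimate absorbs the correlation between inputs and heterogeneity (roughly beta plus the average projection coefficient), while the CRE regression should land close to the true 0.5.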

The correlated random effects approach lowers the bar for identification, and in Suri’s setting it seems reasonable to assume that individual-level heterogeneity is uncorrelated with productivity shocks once the history of input decisions is controlled for. However, the focus of Suri (2011) is to identify cross-sectional heterogeneity in productivity and its consequences for hybrid seed adoption. It is unclear whether this focus warrants the use of CRE models.

IV. Learning in Technology Adoption

Recent literature on technology adoption highlights the importance of learning from own experience and the experiences of informal network members.

Conley and Udry (2010) collect data on social interactions and address the unobserved variable problem in studying learning effects in the diffusion of a new technology: pineapple cultivation in Ghana. In their model, risk-neutral farmers each have a single plot and maximize current expected profits by choosing a discrete-valued input $x_{it}$ at time t. Pineapple output, realized five periods after the input decision, is
[Equation (6): pineapple output]
where the $\epsilon$’s are unobserved productivity shocks, distributed i.i.d. with mean 0 and variance 1, and $\omega_{it}$ captures spatially and serially correlated shocks to the marginal product that are observed only by the farmer (not by the econometrician). Farmers do not know the function $f$ but learn about it over time according to a learning rule.

Identification uses the specific timing of plantings to identify opportunities for information transmission. Variation in planting decisions generates a sequence of dates on which new information may be revealed to the farmer. Conditional on measures of growing conditions, Conley and Udry isolate events in which new productivity information is revealed to the farmer. They then investigate whether new information is associated with changes in the farmer’s input use that are consistent with social learning. A logistic regression is used to estimate how farmers’ input decisions respond to the actions and outcomes of other farmers in their information networks (measured with data the authors collected).

The baseline regression model is
[Equation (7): baseline logit specification]
where $M_{it}$ is an index of good news about input levels, constructed from inputs and profits five periods ago and now. The identifying assumption is that, conditional on measures of changes in growing conditions $\Gamma_{it}$ and other farm-level characteristics, the information measure $M_{it}$ is uncorrelated with unobserved determinants of growing conditions and therefore of input use. A significant, positive $\beta_1$ is evidence of social learning.
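The baseline specification is a logit, which can be estimated by maximum likelihood. The NumPy sketch below does so by Newton-Raphson on simulated data. It is only a stand-in: the binary outcome (whether the farmer changes her input level), the good-news index, the single growing-conditions control, and the true coefficients are all hypothetical, and Conley and Udry’s actual construction of $M_{it}$ and their controls are richer than this.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Logit maximum likelihood by Newton-Raphson; X should include a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (d - p)
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X
        b = b - np.linalg.solve(hess, grad)
    return b

# Hypothetical data: d = 1 if the farmer changes her input level.
rng = np.random.default_rng(1)
n = 4000
good_news = rng.normal(size=n)                     # stand-in for M_it
growing = rng.normal(size=n)                       # stand-in for a growing-conditions control
X = np.column_stack([np.ones(n), good_news, growing])
true_b = np.array([-0.5, 0.8, 0.3])                # constant, beta_1, control coefficient
prob = 1.0 / (1.0 + np.exp(-X @ true_b))
d = (rng.random(n) < prob).astype(float)

b_hat = fit_logit(X, d)
print("estimated coefficients:", np.round(b_hat, 3))
```

In real data the estimate of the coefficient on the good-news index is only informative about social learning under the identifying assumption above; the code simply shows the estimation step.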

An important limitation of this approach is that it ignores the endogenous formation of informal networks and the potential dynamic changes in these networks. To study learning effects in technology adoption, we need a better understanding of how informal networks form and of the nature of learning, in order to evaluate whether the identifying assumptions are realistic.

References:

Chamberlain, G. (1982). “Multivariate Regression Models for Panel Data,” Journal of Econometrics 18: 5-46.

Conley, T. and Udry, C. (2010). “Learning about a New Technology: Pineapple in Ghana,” American Economic Review 100(1): 35-69.

Foster, A. and Rosenzweig, M. (1995). “Learning by Doing and Learning from Others: Human Capital and Technological Change in Agriculture,” Journal of Political Economy 103: 1176-1209.

Suri, T. (2011). “Selection and Comparative Advantage in Technology Adoption,” Econometrica 79(1): 159-209.

What I have learned from my first academic presentation

Yesterday I presented my work on parental migration and the health outcomes of children in Indonesia at the development lunch at Duke. It was the first time I had presented my own research in front of a (relatively) large academic audience. The presentation did not go as planned (like most research projects), but I learned a great deal from it. Here are a few lessons.

  1. Talk about key facts instead of broad histories when you introduce the context of your study. Describing broad histories is easy for you as a presenter but usually leaves the audience more confused about your main argument.
  2. Related to the first point, structure your presentation around the key questions you want to answer, the strategies you use to address them, and where you have run into difficulties and need advice.
  3. In a short presentation, avoid doing a detailed literature review. You are almost guaranteed to miss some papers in the literature, and it is easy to spend a long time answering tangential questions.
  4. Know your question really, really well. Present it to different people and see if anything confuses them. If they are confused, try to diagnose the problem and clarify your question. If there are broad terms in your main question, try to narrow them down to clear-cut, specific definitions that people can directly relate to.
  5. Know when to answer questions, when to defer them, and when to politely decline them. Always answer clarification questions, but defer questions that you will address later in your presentation.
  6. Practice. Practice. Practice. You cannot anticipate everything, but if you do not practice, there will be too many awkward moments.

I encourage other students to present their work early on in the PhD program to practice thinking deeply about a question and explaining it to other people. It will be painful at first, but you will get better at it over time.

A few notes on academic presentations

For our writing and presentation class, we were asked to explain a standard concept in intermediate microeconomics in a ten-minute presentation. Here are a few good practices I drew from my own and others’ presentations.

1. Practice your script and appear confident on stage.

2. Make sure your graphs are legible. Fonts should be large enough. Use contrasting colors that will show up clearly against your background.

3. If your graphs are not legible, explain the key messages in the graph verbally or draw the graphs on the board (if they are simple).

4. Keep your notation consistent.

5. Cite the sources of your materials, even if they come from widely used textbooks or online resources.

6. Don’t put information on your slides that you are not going to talk about.

7. It helps if you stick with the same examples and go through facts->explanation->solution for each of them in the same order throughout your presentation.

8. When you are explaining a model, start from the infrastructure (agents/players, relationships, basic assumptions, etc.) and then move on to the superstructure.

9. Don’t include too much information on your slides! This will overwhelm the audience and eventually bore them.

10. Don’t read off the slides. Treasure the dynamic nature of presentations and interact with your audience.

Notes from A Guide For the Young Economist (2)

This is a continuation of my previous post on this book.

First, a few tips on cleaning up your text.

1. Delete redundant information and excessive clauses to get your point across in as few words as possible. Instead of writing “the technology of the firms in the economy is convex”, trim it to “technology is convex”, because it is obvious that the firms are the adopters of technology and that they exist in the economy (p. 80 in the book).

2. Do not overuse the word “assume” (or “is/are assumed”). It is better to state your “assumptions” once and list all of them. But when you want to emphasize aspects of your model that differ from the existing literature, you may start a separate sentence or paragraph highlighting them.

3. Stick with plain words if they can deliver your message. Instead of writing “the set of Nash Equilibrium is nonempty”, write “Nash Equilibrium exists”. The latter expression gets rid of the nerdy feel in the text and makes your writing more accessible.

Then, Thomson talks about how to present a model effectively. I have found the following most useful.

1. When you are introducing your model, go from the infrastructure to the superstructure. For example, when describing a multi-stage game, introduce and describe each of the players separately before bringing them together. Follow the logical steps of defining actors -> relationship -> concepts based on actors and relationship.

2. A good yet underappreciated (in my opinion) way to prevent ambiguity is to avoid stacking multiple clauses. This is especially true for non-native speakers: adding clauses dramatically increases your chance of making grammatical errors. Moreover, badly placed and unbalanced clauses will disorient the reader.

3. When stating a difficult definition, help the reader by giving an informal, intuitive explanation before the formal one.

4. Use one enumeration for each object category. Combining different categories into a single list saves your time at the cost of your reader.

5. When specifying your assumptions, make sure there is at least one example satisfying them. If you cannot think of an example, then your assumptions are likely to be practically useless even though they are mathematically meaningful.

Calculating anthropometric z-scores in Stata

Anthropometric measures such as height-for-age and weight-for-height are widely used in economics and sociology to assess the health of children relative to their peers. The procedure in Stata works as follows:

1. Download the package dm0004_1, which contains the commands “zanthro” and “zbmicat” for generating standardized anthropometric z-scores.

2. Download the relevant data files (the health measures for the reference populations) into your working directory.

3. Follow the instructions about the commands to produce z-scores. Note the following:

1) You can choose the reference population/standardizing scheme: US/UK/WHO. Each has different ranges of applicability and calculation algorithms. Make sure you fully understand them before choosing a particular one.

2) Add the option “nocutoff” if you do not want your z-scores restricted to [-5, 5]. By default, z-scores more than five standard deviations from the reference population are set to missing.

3) Weight-for-length and weight-for-height usually restrict the range of weight and height over which these statistics can be calculated. The z-scores are set to missing if either of the two original measures falls outside that range.
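Under the hood, growth references such as the CDC 2000, UK 1990, and WHO charts are usually published as LMS parameters for each age (or length/height) and sex: L (a Box-Cox power for skewness), M (the median), and S (the coefficient of variation). The z-score of a measurement y is ((y/M)^L - 1)/(L S), or ln(y/M)/S when L = 0. The Python sketch below applies that formula and mimics the idea of a ±5 cutoff. The L, M, S values in the example are made up rather than taken from any reference table, and the real packages may add further adjustments, so use zanthro itself for actual work.

```python
import math

def lms_zscore(y, L, M, S, cutoff=5.0):
    """LMS z-score of measurement y given reference L (skewness), M (median),
    and S (coefficient of variation).

    Returns None (missing) when |z| exceeds cutoff; pass cutoff=None to keep
    every value, similar in spirit to zanthro's nocutoff option.
    """
    if L == 0:
        z = math.log(y / M) / S
    else:
        z = ((y / M) ** L - 1.0) / (L * S)
    if cutoff is not None and abs(z) > cutoff:
        return None
    return z

# Made-up reference values for illustration only.
print(lms_zscore(110.0, L=1.0, M=100.0, S=0.05))
print(lms_zscore(200.0, L=1.0, M=100.0, S=0.05))               # beyond the cutoff
print(lms_zscore(200.0, L=1.0, M=100.0, S=0.05, cutoff=None))  # kept
```

The first call gives a z-score of about 2, the second returns None because the value lies far outside ±5, and the third keeps that extreme value once the cutoff is switched off.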

Play with the package and have fun!