Nancy Qian on China’s Great Famine

Image from NY Times.

I was surprised and pleased to find Nancy Qian on my department’s list of speakers for the development and labor seminar series. I remember her paper on missing women and the price of tea for how well-articulated and convincing its argument was. This afternoon I finally had the pleasure of hearing her talk about the institutional causes of China’s Great Famine from 1959 to 1961. The working paper is here.

China’s Great Famine is not a new topic. Because of its unprecedented scale and the distinctive policies China was pursuing at the time, it has attracted a great deal of scholarly attention. Previous research has looked for answers in the rigidity of institutional arrangements and in local officials’ zeal to impress the central government. This paper instead argues that the central government’s procurement policy was the main cause of the famine.

Here are a few facts about China’s Great Famine that you should know before reading the paper. First, deaths were mostly rural. Interestingly, the central government seems to have controlled the spread of news so well that rural residents believed their urban relatives suffered just as much as they did (according to interviews with survivors). Second, estimated average per capita food availability (2,000+ calories) is too high to be compatible with a famine. Lastly, there was considerable variation in food availability, a fact largely ignored by previous research but central to the work of Qian and her coauthors.

I was most impressed by how Qian and her coauthors navigate different data sources. They constructed two benchmarks for the caloric level of “food needed to survive”. The higher one supports heavy adult labor and healthy child development. The lower one is sufficient only to stay alive and is based on the Minnesota starvation experiment. They used USDA guidelines and adjusted total population by its demographic breakdown. For grain production, they used the corrected post-Mao data. Historical per capita consumption data came from the National Bureau of Statistics, and aggregate procurement data came from the Ministry of Agriculture.
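The benchmark logic can be made concrete with a demographically weighted caloric requirement compared against per capita availability. Below is a minimal Python sketch. The group labels, population shares, and kcal figures are hypothetical placeholders. They are not the USDA guidelines or the values used by Qian and her coauthors, and the function names are likewise made up for illustration.

```python
from typing import Dict


def per_capita_need(pop_shares: Dict[str, float], kcal_need: Dict[str, float]) -> float:
    """Population-weighted average daily caloric requirement (kcal per person)."""
    if abs(sum(pop_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(share * kcal_need[group] for group, share in pop_shares.items())


def shortfall(kcal_available_per_capita: float,
              pop_shares: Dict[str, float],
              kcal_need: Dict[str, float]) -> float:
    """Benchmark requirement minus availability; positive means a shortfall."""
    return per_capita_need(pop_shares, kcal_need) - kcal_available_per_capita


# Hypothetical inputs: two demographic groups and two benchmarks.
shares = {"child": 0.4, "adult": 0.6}
high_benchmark = {"child": 1500.0, "adult": 2400.0}  # "work and grow" level (made up)
low_benchmark = {"child": 1000.0, "adult": 1600.0}   # "stay alive" level (made up)
for label, bench in [("high", high_benchmark), ("low", low_benchmark)]:
    print(label, shortfall(1800.0, shares, bench))
```

You could run the same comparison region by region instead of on the national average. That shows how an adequate average can coexist with local shortfalls.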

Measurement error is obviously a major concern here. Chinese statistics are known to be unreliable, and these data were collected during a period of upheaval, which makes the problem worse. To check the robustness of their findings, they used county-level birth cohort sizes from the 1990 census as another proxy for the severity of the famine. The idea is that during the famine, couples were less likely to have children. In addition, people born immediately before or during the famine were less likely to survive than those born after it. Both effects shrink the affected cohorts. They addressed measurement error in grain production by constructing predicted production from data on temperature, rainfall, and suitability for growing grain.
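One simple way to build such a predictor is an OLS regression of reported production on weather and suitability, keeping the fitted values. The sketch below assumes that form. The paper’s actual specification may differ, and all variable and function names are hypothetical.

```python
import numpy as np


def predicted_production(weather, suitability, production):
    """Regress reported grain production on an intercept, weather variables,
    and crop suitability by OLS; return (coefficients, fitted production).

    weather: array of shape (n,) or (n, k), e.g. temperature and rainfall columns.
    suitability: array of shape (n,).
    production: array of shape (n,), the (possibly mismeasured) reported output.
    """
    X = np.column_stack([np.ones(len(production)), weather, suitability])
    coef, *_ = np.linalg.lstsq(X, production, rcond=None)
    return coef, X @ coef
```

The fitted values depend only on weather and suitability, which are less open to political misreporting than output figures. Reporting errors in output that are unrelated to these inputs therefore do not systematically carry over into the prediction.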

Nancy Qian is an impressive presenter. She gave enough background on the subject before moving on to the key model. She showed only graphs and figures that carried important facts or key insights from the model. The importance of helping your audience visualize your results cannot be overstated.

China’s One Child Policy and Parental Investment in Children (1)

This is the first post in a series of reflective essays on models of fertility choice and investment in children in economics. My goal this semester is to present a model of parental investment in children under birth planning policies, ideally with empirically testable hypotheses.

I look at how parents invest in children under multiple constraints. In addition to the usual budget constraint, parents are limited to two children and have to pay a fine if they decide to have a second child. The amount of the fine represents the difficulty of having a second child, i.e. the level of enforcement of the One Child Policy. A zero fine means that having a second child is fully permitted, while a fine approaching infinity means that a second child is strictly prohibited. China’s OCP provides an ideal setting for studying how parents make fertility decisions and allocate resources among their children under birth restrictions.
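To illustrate how the fine works as an enforcement parameter, consider a deliberately stripped-down toy that ignores investment in child quality:

- Parents choose one or two children.
- Each child costs a fixed amount, and a second child also incurs the fine.
- Utility is log consumption plus alpha times log(number of children).

These functional forms and all names are hypothetical choices for illustration. They are not the model developed in this series.

```python
import math


def utility(n_children, income, child_cost, fine, alpha):
    """Hypothetical utility: log(consumption) + alpha * log(number of children)."""
    consumption = income - child_cost * n_children - (fine if n_children == 2 else 0.0)
    if consumption <= 0:
        return -math.inf  # infeasible choice (includes an infinite fine)
    return math.log(consumption) + alpha * math.log(n_children)


def chooses_second_child(income, child_cost, fine, alpha):
    """True if two children give strictly higher utility than one."""
    return utility(2, income, child_cost, fine, alpha) > utility(1, income, child_cost, fine, alpha)


def threshold_fine(income, child_cost, alpha):
    """Fine at which parents are indifferent between one and two children:
    log(Y - 2p - F) + alpha*log(2) = log(Y - p)  =>  F = Y - 2p - (Y - p) / 2**alpha."""
    return income - 2 * child_cost - (income - child_cost) / 2 ** alpha
```

Take income 100, a per-child cost of 10, and alpha = 1. The indifference fine is then 35. A fine of 30 still leads to a second child, a fine of 40 does not, and an infinite fine rules it out entirely.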

The classic Becker model (Becker 1994, Becker and Tomes 1986) makes two assumptions about parents’ fertility choices that I think are unrealistic. First, it assumes that parents put equal weight on each child’s consumption, which is the only way children enter the unitary utility function. This is hardly realistic in developing countries with a strong preference for sons over daughters. Second, it assumes that “child quality” can be purchased at a fixed price, an assumption that is key to the famous quality-quantity tradeoff. What we observe more often is that parents must devote time to their children and invest in their education and health to enhance their abilities. Some also question the unitary model assumption, but bargaining models are hard to estimate empirically. For the bargaining approach, see papers by Chiappori and Browning and the “separate spheres” paper by Shelly Lundberg and her coauthors.

A paragraph from Alderman and King (1998) illustrates the importance of preferences:

There are issues not only of the efficiency of the investment, but also of the intra household allocation of the expected benefits. Preferences, then, matter for two distinct reasons. First, learning may contribute directly towards the welfare of the child and of parents, over and above its productive return as an investment. That is, learning may be a consumption good. Second, the decision-makers’ preference for equity amongst children influences how investments in education are allocated to children with different expected rates of return.

There are several ways to incorporate gender bias into the model. One can assume different marginal utilities from sons’ and daughters’ consumption (or outcomes in general). One can also impose substitutability or complementarity between children’s outcomes by restricting the signs of the cross-partial derivatives. A model with remittances can let children of different genders contribute to their parents at different rates. Some papers also assume that the labor market return to parents’ investment differs by the child’s gender.
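The first option, different marginal utilities, can be sketched with a hypothetical additively separable log utility, w_son*log(q_son) + w_daughter*log(q_daughter), subject to q_son + q_daughter = budget. The weights and names are assumptions for illustration. This form is additively separable, so its cross-partial derivative is zero. Modeling substitutes or complements would require a non-separable form.

```python
import math


def optimal_allocation(budget, w_son, w_daughter):
    """Closed-form maximizer of w_son*log(q_son) + w_daughter*log(q_daughter)
    subject to q_son + q_daughter = budget (both goods priced at 1).
    The first-order condition w_son/q_son = w_daughter/q_daughter gives the split."""
    son_share = w_son / (w_son + w_daughter)
    return son_share * budget, (1.0 - son_share) * budget


def grid_allocation(budget, w_son, w_daughter, steps=10_000):
    """Brute-force check of the closed form over a grid of interior splits."""
    def utility(k):
        q_son = budget * k / steps
        return w_son * math.log(q_son) + w_daughter * math.log(budget - q_son)

    best_k = max(range(1, steps), key=utility)
    q_son = budget * best_k / steps
    return q_son, budget - q_son
```

With weights 0.6 and 0.4, the son receives 60% of the budget under both methods. Equal weights recover Becker’s equal treatment.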

One piece of advice for graduate students trying to obtain identifiable data that require IRB approval: don’t expect the data to arrive any time soon. Work on the theory first, so that you will have something to present if your data request gets stuck in administrative files.

The fun of delving into household surveys

Earlier in the semester I was chatting with another grad student in our department, and he was curious about my research project.

“So what are you gonna do for your independent study?”
“Well, I plan to use some Chinese household survey data to analyze the consequences of China’s internal migration on the left behind children.”
“Sounds interesting! But where do you get the data?”
“I got the data from UNC. It’s a longitudinal data set and contains a lot of information that will be useful for my analysis. But I still need some time to clean it and make sure it’s up to the task.”
“Yeah… You can never predict what you can get from these data. That’s why I chose to write a theoretical paper.”

I have had similar conversations with other friends. I agree that dealing with large-scale household surveys can be frustrating, but I don’t think we should shy away from them for that reason. Household surveys provide rich information on the structure and workings of families. Real household-level data give you a better sense of how households make decisions, both as a unit and as separate individuals bargaining with each other. You also become more aware of why some variables can never be measured and why some values are always missing. This is especially evident in time-use data. Response rates for yes-or-no questions are much higher than for questions that ask for the number of hours spent on a particular activity. This is hardly surprising given how hard it is to keep track of time use. Even when we have statistics on how much time mothers spend caring for their children, these are subject to severe measurement error.
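A quick way to see this pattern in your own data is to compute item response rates (the share of non-missing answers per question) and average them by question type. The sketch uses pandas, and the column names and question-type labels are hypothetical. It treats a missing value (NaN) as non-response, so whether it applies directly depends on how a given survey codes missing answers.

```python
import pandas as pd


def response_rates(df: pd.DataFrame, question_type: dict):
    """Return per-question response rates and their average by question type.

    question_type maps each column name to a label such as "yes/no" or "hours".
    """
    item_rates = df.notna().mean()  # share of non-missing answers per column
    by_type = item_rates.groupby(question_type).mean()  # dict maps index labels to groups
    return item_rates, by_type
```

Comparing the "yes/no" and "hours" averages gives a one-line summary of the gap described above.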

I am using the China Health and Nutrition Survey, which contains a wide range of questions about household structure, employment decisions, time use, and health and nutrition. It has been widely used in public health research, but economists can answer interesting questions with it as well (here and here).

Family Life Surveys in Developing Countries

I am currently designing my own survey for my Uganda project this summer, which has led me to several sources of large-scale household surveys in developing countries. Professor Duncan Thomas has useful links to BREAD (Bureau for Research and Economic Analysis of Development) and to RAND family surveys on his personal website. It’s a great place to start. I have found the following sources particularly helpful:

Another source is the International Household Survey Network (IHSN) website. It offers a catalog of household surveys, many of which have a particular thematic focus. A particularly nice feature of this website is that it walks you through the sample design, questionnaires, data collection and processing, and technical details of each survey.

1. Indonesian Family Life Survey (IFLS). Available through RAND. It is a large-scale longitudinal survey consisting of four waves from 1993 to 2008. Duncan Thomas in our department was one of the principal investigators. It is comprehensive and has proved the most useful for my purpose. Books 3A and 3B are especially helpful for designing individual-specific questions. The newer waves also include questions about trust and informal lending.

2. Mexican Family Life Survey (MxFLS). Data are freely downloadable. Many of the survey questions are adopted from IFLS.

3. Household Survey to Conduct Micro-Credit Impact Studies-Bangladesh: a research effort by the World Bank. It was designed to evaluate the impact of the Grameen Bank, so it contains a rich set of questions about borrowing and lending.

4. Ethiopian Rural Household Survey (ERHS), by the International Food Policy Research Institute (IFPRI). You need to register to download the data, and after registration you can download up to ten files for free. This survey has comprehensive data on food prices. I recommend reading this paper (forthcoming in the American Journal of Agricultural Economics) by my professor Marc Bellemare and his coauthors, which uses these data to study food price volatility and farmer welfare.

5. Chinese Household Income Project (CHIP). This is a joint research effort by multiple Chinese researchers and international organizations. A brief overview is here. The first three waves (1988, 1995, 2002) can be downloaded from this website. The 2007 wave is not available online but can be requested here. Note that the data are not longitudinal.

Also, be careful with the identifiers of the different sub-datasets when you merge them. For example, the children’s dataset numbers children within each household instead of using their member codes for the whole family. To create a unique identifier for merging, I concatenated the household id and person code into a string variable for each individual. About 10 duplicates remained after this step, and I simply dropped them.
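The identifier fix can be sketched in pandas as follows. The column names hhid, pcode, and pid are hypothetical, not CHIP’s actual variable names. The key only works if both files use the same person code, so the children’s within-household numbering would first need to be mapped to member codes. The separator between the two parts is my addition. Plain concatenation can make distinct people collide: household 1 with person 23 and household 12 with person 3 both become "123".

```python
import pandas as pd


def add_person_key(df, hh_col="hhid", person_col="pcode", key="pid"):
    """Build a string key from household id and person code, joined by a separator."""
    out = df.copy()
    out[key] = out[hh_col].astype(str) + "_" + out[person_col].astype(str)
    return out


def drop_duplicate_keys(df, key="pid"):
    """Drop every row whose key occurs more than once (all copies, not just the extras)."""
    return df[~df.duplicated(subset=key, keep=False)]


def merge_on_person(left, right, key="pid"):
    """Left-merge two sub-datasets; validate= raises MergeError if the key
    is not unique on either side."""
    return left.merge(right, on=key, how="left", validate="one_to_one")
```

The validate="one_to_one" check is a cheap guard: if a duplicate slips through, the merge fails loudly instead of silently multiplying rows.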

6. China Health and Nutrition Survey. Downloadable at UNC website. This is probably one of the most organized and widely cited surveys carried out in China. It is an ocean of data. Make sure you know what you want before you dive into this ocean.

7. China Health and Retirement Longitudinal Study (CHARLS). Recent data collected in Gansu and Zhejiang to study the health of the elderly in China. Only the pilot survey is available for now, but they plan to launch a nationwide survey this year. You have to register on their website to download any data, and they handle registrations fairly efficiently.

8. World Bank Living Standards Measurement Study (LSMS). Thanks to Xudong for pointing out this important source. It has a data finder that directs you to the most relevant data for your purpose. In the Uganda case, this turns out to be the Uganda Household Survey 09/10, which is also available from the Uganda Bureau of Statistics.

9. Household datasets for development economics research, by RAND economist Sebastian Bauhoff. A comprehensive list of sources in multiple countries and regions, with short descriptions.

Socioeconomic Status and Health Disparity in China

This is my project for Bayesian Statistics class.

1. Background

China has experienced remarkable growth and a widening regional gap in recent decades. Inequality in the distribution of wealth can translate into inequality in overall well-being. In this paper, I use a Bayesian probit model to test whether the elderly in underdeveloped and rural areas are systematically less healthy than their peers. I incorporate a latent variable to capture underlying differences in elderly health status by province and by hukou status. Hukou is the Chinese household registration system, which classifies a person as rural or urban. My results suggest that elderly residents of Gansu province are significantly less healthy than those in Zhejiang. The difference between rural and urban elderly, however, is not significant.

2. Data and Method

2.1. Data

I use data from the China Health and Retirement Longitudinal Study (CHARLS) pilot sample, downloaded from the China Center for Economic Research at Peking University. The study interviews people aged 45 and older living in two provinces, Gansu and Zhejiang. Gansu is a landlocked province in northwest China with GDP per capita below 2,000 US dollars in 2011. The coastal province of Zhejiang had GDP per capita approaching 6,500 US dollars in 2011.

There are a total of 1,620 observations. Of the respondents, 45.3% are from Gansu province and 54.7% from Zhejiang. The majority of interviewees, 82.3%, hold rural hukou. Respondents’ ages range from 45 to 87, with a median of 57. For each individual, the survey collects demographic data including age, gender, smoking habit, hukou status (rural or urban), and province. Individuals are also asked about their health in childhood and can answer “poor”, “fair”, “good”, “very good”, or “excellent”. Information about disability and diseases is recorded, but the missing rates are too high. Instead, I use 12 indicators of health conditions, each measured by difficulty with one of 12 daily activities. A complete list is attached in the appendix. Each of these variables is coded 1 if the individual reports difficulty with the corresponding daily activity, and 0 otherwise.

2.2. Method

Initially I attempted to follow Chib and Greenberg (1998) and fit the data with a multivariate probit model to make full use of the multiple responses. But choosing an appropriate correlation matrix and sampling from a multidimensional truncated normal distribution were non-trivial given the limited time for this project. I therefore constructed a single indicator y_{i} for each individual, which indicates whether the individual reports difficulty with any of the 12 activities.

y_{i} = \begin{cases} 1, & \text{if difficulty with any of the 12 activities} \\ 0, & \text{otherwise} \end{cases}

I use a probit model:

P(y_{i}=1 \mid x_{i}) = \Phi(x_{i}^{T}\beta)

where \Phi is the standard normal CDF and x_{i} is a vector of individual characteristics:

x_{i}^{T} = (1, age, male, HealthYoungPoor, HealthYoungFair, HealthYoungGood, HealthYoungVeryGood, smoke, RuralHukou, Gansu, RuralGansu).

The variable Gansu is coded 1 if the respondent lives in Gansu and 0 if the person lives in Zhejiang. Similarly, RuralHukou indicates whether the respondent holds a rural hukou. RuralGansu is an interaction term coded 1 if a person lives in Gansu and holds a rural hukou, and 0 otherwise. I break down the levels of childhood health into separate dummies. Each dummy equals 1 if the corresponding statement is true (e.g. HealthYoungPoor = 1 if health in childhood was poor) and 0 otherwise. I drop the dummy for “excellent health in childhood” to avoid perfect multicollinearity.

For computational convenience, I use a data augmentation scheme. Let y_{i} = I(z_{i} > 0), where z_{i} \sim Normal(x_{i}^{T}\beta, 1) is a latent variable. Then P(y_{i}=1) = P(z_{i}>0) = \Phi(x_{i}^{T}\beta), so the augmented model implies the probit model. z_{i} can be interpreted as individual i’s disability level. A higher value indicates poorer health and thus a higher probability of reporting difficulty with those daily activities.
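A quick simulation check shows why the augmentation reproduces the probit. If z_i ~ Normal(x_i^T β, 1), the share of draws above zero should match Φ(x_i^T β). The sketch uses only the Python standard library. The mean of 0.8 in the usage line is an arbitrary value, not an estimate from the data, and the function names are hypothetical.

```python
import math
import random


def phi(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulated_prob(mean, draws=200_000, seed=0):
    """Share of latent draws z ~ Normal(mean, 1) that are positive, i.e. P(y = 1)."""
    rng = random.Random(seed)
    return sum(rng.gauss(mean, 1.0) > 0 for _ in range(draws)) / draws


print(simulated_prob(0.8), phi(0.8))
```

The two printed numbers should agree to about two decimal places, both roughly 0.79.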

I follow Hoff (2009) and choose a multivariate normal g-prior for \beta. I set the prior mean to 0 because I assume the covariates are equally likely to have a positive or negative impact on health. I set g = n, so that the prior carries roughly as much information as a single observation. This represents vague prior information about \beta.

Therefore, the prior distribution of \beta is: \beta \sim MultivariateNormal(0, n(X^{T}X)^{-1})

The full conditionals for \beta and z both have closed forms.

\beta \mid - \sim Normal(\beta^{*}, \Sigma^{*})

where \Sigma^{*} = \frac{n}{n+1}(X^{T}X)^{-1} and \beta^{*} = \Sigma^{*}X^{T}z

z_{i} \mid y_{i}=1, - \sim Normal_{(0,+\infty)}(x_{i}^{T}\beta, 1)

z_{i} \mid y_{i}=0, - \sim Normal_{(-\infty,0)}(x_{i}^{T}\beta, 1)
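The form of \Sigma^{*} follows from combining the g-prior \beta \sim Normal(0, g(X^{T}X)^{-1}) with the latent-variable likelihood z \mid \beta \sim Normal(X\beta, I). Here is a short derivation, writing g for the prior scale:

```latex
\begin{aligned}
p(\beta \mid z) &\propto \exp\!\Big(-\tfrac{1}{2}(z - X\beta)^{T}(z - X\beta)\Big)\,
                  \exp\!\Big(-\tfrac{1}{2g}\,\beta^{T}X^{T}X\beta\Big) \\
&\propto \exp\!\Big(-\tfrac{1}{2}\Big[\beta^{T}\big(1 + \tfrac{1}{g}\big)X^{T}X\,\beta - 2\beta^{T}X^{T}z\Big]\Big) \\
\Rightarrow\quad \Sigma^{*} &= \Big(\tfrac{g+1}{g}\,X^{T}X\Big)^{-1} = \tfrac{g}{g+1}(X^{T}X)^{-1},
\qquad \beta^{*} = \Sigma^{*}X^{T}z .
\end{aligned}
```

Setting g = n gives \Sigma^{*} = \frac{n}{n+1}(X^{T}X)^{-1}.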

I use Gibbs sampling, running 11,000 iterations in total and discarding the first 1,000 as burn-in.
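Putting the two full conditionals together, a minimal Python sketch of the sampler might look like the following. It uses NumPy and SciPy’s truncnorm. It assumes X already contains the intercept column and the dummies, and that y is a 0/1 array. The function and argument names are hypothetical, and this is not the code behind the reported results. It also returns each individual’s posterior mean of z_i.

```python
import numpy as np
from scipy.stats import truncnorm


def probit_gibbs(X, y, n_iter=11_000, burn_in=1_000, seed=0):
    """Gibbs sampler for a probit model with latent z_i and a g-prior (g = n) on beta.

    Returns (post-burn-in draws of beta, posterior mean of each z_i).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = n
    # The full-conditional covariance of beta does not change across iterations.
    sigma_star = g / (g + 1) * np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(sigma_star)
    # Truncation region for z_i: (0, inf) if y_i = 1, (-inf, 0) if y_i = 0.
    lower = np.where(y == 1, 0.0, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0)

    beta = np.zeros(p)
    beta_draws = np.empty((n_iter - burn_in, p))
    z_sum = np.zeros(n)
    for t in range(n_iter):
        mean = X @ beta
        # SciPy's truncnorm takes bounds standardized as (bound - loc) / scale.
        z = truncnorm.rvs(lower - mean, upper - mean, loc=mean, scale=1.0,
                          random_state=rng)
        beta_star = sigma_star @ (X.T @ z)
        beta = beta_star + chol @ rng.standard_normal(p)
        if t >= burn_in:
            beta_draws[t - burn_in] = beta
            z_sum += z
    return beta_draws, z_sum / (n_iter - burn_in)
```

With its defaults, probit_gibbs(X, y) mirrors the 11,000 iterations and 1,000 burn-in described above. Drawing \beta through a fixed Cholesky factor works because \Sigma^{*} stays the same from iteration to iteration; only \beta^{*} changes.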

2.3. Inference Strategy

I obtain the posterior distribution of each \beta from the 10,000 post-burn-in Gibbs samples. The estimate of each coefficient, denoted \hat{\beta}, is the mean of its post-burn-in samples. I also compute 95% credible intervals from the 0.025 and 0.975 quantiles of each coefficient’s posterior samples. If the 95% credible interval for \beta_{j} lies entirely above zero, \hat{\beta}_{j} is significantly positive. If it lies entirely below zero, it is significantly negative. Otherwise it is not significantly different from zero.
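The decision rule above can be written as a small helper that summarizes a matrix of posterior draws, with rows as iterations and columns as coefficients. The names are hypothetical, and the draws could come from any sampler.

```python
import numpy as np


def summarize_draws(draws, names):
    """Posterior mean, 95% equal-tailed credible interval, and sign verdict per coefficient."""
    summary = {}
    for j, name in enumerate(names):
        d = draws[:, j]
        lo, hi = np.quantile(d, [0.025, 0.975])
        if lo > 0:
            verdict = "positive"
        elif hi < 0:
            verdict = "negative"
        else:
            verdict = "not significant"  # interval contains zero
        summary[name] = (float(d.mean()), float(lo), float(hi), verdict)
    return summary
```

Keeping the "not significant" case explicit avoids treating every interval that touches zero as evidence of a negative effect.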

I then use the Bayesian estimates for in-sample fit and out-of-sample prediction. I compare the performance of the Bayesian probit with frequentist logistic and probit regressions using the Mean Squared Error (MSE) and the Mean Squared Prediction Error (MSPE).
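A sketch of the two error measures follows. It assumes, which the text above does not state, that both compare the 0/1 outcome with predicted probabilities Φ(x_i^T \hat{\beta}). MSE uses the estimation sample, and MSPE applies the same formula to a held-out sample. The sketch uses only the standard library, and the names are hypothetical.

```python
import math


def probit_prob(x, beta):
    """Predicted P(y = 1) = Phi(x^T beta) for one individual."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def mean_squared_error(y, probs):
    """Average squared gap between 0/1 outcomes and predicted probabilities (lists)."""
    if len(y) != len(probs) or not y:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((yi - pi) ** 2 for yi, pi in zip(y, probs)) / len(y)
```

Pass the estimation sample to get the in-sample MSE. Pass a held-out sample, predicted with coefficients estimated without it, to get the MSPE.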

3. Results

The traceplot for \beta_{RuralGansu} is shown below. Although there is some autocorrelation, mixing is good in general.
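Autocorrelation in a traceplot can be quantified with the chain’s sample autocorrelation function. Below is a NumPy sketch with hypothetical names. Values that drop toward zero within a few lags are consistent with good mixing.

```python
import numpy as np


def autocorrelation(chain, max_lag=20):
    """Sample autocorrelation of an MCMC chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])
```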


Among all the covariates, only the coefficient on Gansu is significantly positive, with a posterior mean of around 0.8. This suggests that in my sample, residents of Gansu are significantly less healthy than those of Zhejiang. Neither \hat{\beta}_{RuralHukou} nor \hat{\beta}_{RuralGansu} is significantly different from 0, so there is no evidence of differences in health status between urban and rural residents. Childhood health is not significantly correlated with general health status as measured by the 12 indicators. This may be because the childhood health variables are self-reported and likely to be inaccurate.

I also plotted the posterior distribution of the z_{i}’s (below) to examine the landscape of the “disability index” across regions.

[Figures: distribution of posterior \hat{z}_{i} by hukou status and by province]

The z_{i}’s are calculated using the updated \beta’s from each iteration. Each individual’s posterior \hat{z}_{i} is the mean of all his or her post-burn-in z_{i}’s. The distributions for rural and urban residents are not very different, which echoes the lack of significance of \beta_{RuralHukou}. Because the urban population is underrepresented in my sample, however, this result may not reflect the underlying pattern in reality. Residents of Zhejiang have much lower posterior disability scores, which shows a pronounced inter-provincial difference in health.

When I compare the Bayesian approach with frequentist logistic and probit regressions, the Bayesian method yields a higher Mean Squared Error (MSE) and a lower Mean Squared Prediction Error (MSPE).

4. Conclusion

Using a Bayesian approach to analyze data from CHARLS, I find that people living in Gansu province are in worse health than residents of Zhejiang. The health difference between rural and urban residents is not significant. Policymakers should be aware of the unequal consequences of development in these two regions in particular and across the nation in general. Future research could incorporate community or region fixed effects and use more direct measures of health outcomes to obtain more robust results.


Chib, S. and Greenberg, E. 1998. “Analysis of multivariate probit models,” Biometrika 85(2): 347-361.

Hoff, P. 2009. A First Course in Bayesian Statistical Methods. Springer.

Was the Industrial Revolution an accident? Reflections on The Great Divergence by Kenneth Pomeranz

The book is the main reading for my course Economic History of China. Pomeranz’s main argument is simple. Europe and China followed divergent paths of economic development, one capital-intensive and the other labor-intensive, largely because Europe had the New World to supply it with adequate energy, resources, and labor. I found most of his reasoning convincing. In what follows, I add some thoughts of my own on this issue.

1. The Crops-Only Agriculture

England adopted a crops-cum-animal agricultural system and China a crops-only one. Crops-only agriculture was an essential response to China’s huge population and limited land.

Daniel Little identifies three issues worth investigating in China’s agriculture. The first is the nature and rate of agricultural development (output, productivity, and the application of new technologies). The second is the direction and nature of change in rural welfare during the period. The third is the character and pace of social change during the period (rural-to-urban migration, changes in land tenure, and the concentration of landholdings).

Chinese agriculture was neither booming nor declining in the eighteenth century. Using convincing data collection methods, Robert Allen finds that labor productivity in the Yangzi Delta was about 79% of that in England in 1800. England and Jiangnan set off on divergent paths in the mid-eighteenth century. England saw sustained productivity growth in manufacturing and agriculture, while productivity in Jiangnan was static or worsening.

Although labor productivity in the Yangzi Delta was comparable to that in England, China’s crops-only agriculture resisted labor-saving technological innovation. The crops had already been chosen for high yields and were likely to cover the peasants’ subsistence needs. In Britain, by contrast, the enclosure movement allowed landowners to combine grain farming with animal husbandry. Livestock production (excluding farm horses) increased by 73 percent between 1700 and 1800 (Huang, 2002).

2. China’s size effect

We should note that Jiangnan and Britain were different kinds of political units. Jiangnan was one region within China and was bound to be affected by market dynamics, labor mobility, demographic change, and other factors in the rest of China. Pomeranz argues that socioeconomic crises outside the delta nonetheless affected the delta powerfully (Pomeranz, 2002).

According to the world-systems theory put forward by Immanuel Wallerstein (1974), the world consists of core regions and peripheral regions. Economic development can be sustained as long as the core supplies manufactures to, and receives raw materials from, the periphery. Because of China’s sheer size and uneven regional development, the argument also applies to China’s inter-regional trade.

The Jiangnan and Lingnan regions were the most advanced areas by the end of the eighteenth century and can be viewed as the “core”. Southwest China (e.g. Sichuan and Yunnan), northeast China, and even Southeast Asia can be viewed as the “periphery”, although China’s international trade was insignificant compared with its internal trade. China’s internal trade pattern, however, was undermined by import substitution in the middle and upper Yangzi. Between 1750 and 1850, population growth there exceeded the Chinese average, and local residents began to develop proto-industries of their own instead of importing goods from the Yangzi Delta (Pomeranz, 2000). This was partly because the newly developed areas were far from convenient transport hubs, so continuing to import manufactured goods from Jiangnan would have been costly. Moreover, the Qing government promoted the gender norm of “men plow, women weave” and encouraged import substitution in the periphery. As a result, Jiangnan’s ability to exchange manufactures for primary products from the periphery diminished. The reduced demand lowered the profits of rural industries in Jiangnan and discouraged merchants and peasants from innovating in production.

As a big country, China did not have a large enough cushion to absorb resource-shortage shocks during the population surge of the Ming and Qing. In addition, China’s sheer size meant an extended bureaucracy with more intermediate layers and greater difficulty of effective coordination. The advantages of size (resources and imperial wealth) were unfortunately located far from the most developed regions of Jiangnan and Lingnan. Moreover, according to Elvin’s high-level equilibrium trap argument, a much larger stimulus would have been needed to provoke a technological breakthrough in China (Elvin, 1973). In Europe, however, there were “slack” resources that could be tapped when institutional and price changes made it profitable to do so. These slack resources could be gradually mobilized to meet new population and resource pressures in the nineteenth century (Pomeranz, 2000).


Elvin, M. 1973. The Pattern of the Chinese Past. Stanford University Press.

Huang, P. 2002. “Development or Involution in Eighteenth Century Britain and China: A Review of Kenneth Pomeranz’s The Great Divergence,” The Journal of Asian Studies 61 (2): 501-538.

Little, D. “The Involution Debate: New Perspectives on China’s Rural Economic History.”

Pomeranz, K. 2000. The Great Divergence. Princeton University Press.

Pomeranz, K. 2002. “Beyond the East-West Binary: Resituating Development Paths in the Eighteenth-Century World,” The Journal of Asian Studies 61 (2): 539-590.

Wallerstein, I. 1974. The Modern World System: Capitalist Agriculture and the Origins of the European World-Economy in the 16th Century. New York: Academic Press.

* To those who want to read The Great Divergence: it’s very technical and requires a lot of thought. If you want something light-hearted, this is not for you.

Notes on China’s Ming Dynasty

I have been reading Ray Huang’s book China: A Macro History. Huang analyzes Chinese history from a macroeconomic and social perspective, pointing out the long-term effects of many seemingly trivial events. It is definitely worth reading. Foreign readers will get an overview of Chinese history, and Chinese readers can refresh their memory and think more deeply about the past.

One of the biggest problems with monarchies is that the enforcement of rules and regulations can easily be arbitrary, since the power is in the hands of the emperor. But the emperor had to strike a balance between his personal will and the will of his bureaucracy which he depended on to run the state. There were constant conflicts, sometimes subtle and sometimes turbulent, between the two powers.

During the Ming dynasty, power was more concentrated in the emperor than at any other time in Chinese history. The first Ming emperor set a precedent of placing strict limits on military power, which could be explained by his low social status before he became emperor. He also abolished the post of Zai Xiang (head of the bureaucracy). This threw the Chinese bureaucracy into disorder and increased the burden on the emperor.

The Ming rulers after him were less diligent, though. They designated their teachers, supposedly the most knowledgeable and responsible men, to help them read the messages from local officials. Over time, the “teacher” essentially took over the role of the Zai Xiang. The most obvious example was Zhang Juzheng in the Wanli years (for details, see Ray Huang’s 1587: A Year of No Significance). But the political system of the time lacked an efficient way to allocate human resources and, most importantly, a practical and effective hierarchy. It is no wonder, then, that Ming politics ended up being heavily manipulated by the eunuchs, who were closest to the emperor.

In economic terms, the inflow of silver through trade enabled people to use it as currency. The government neglected to mint more copper coins and gradually lost control of the prevailing currency. This also contributed to the Ming’s collapse.