Author Archives: mdwardlab

The current data situation on the Web

The current data situation on the Web. Not pictured: landing net R. Image from commons.wikimedia.org

This is a guest post by Simon Munzert, PhD student at the University of Konstanz, who is currently on a visit at the Lab.

It’s not that the people here at Duke’s Department of Political Science, and the WardLab members in particular, risk running out of hot data in the near future. As somebody who is primarily concerned with research on public opinion and election forecasting, I was stunned by the sheer mass of high-quality event data and its potential for so many applications. Still, during my short stay at the Lab as a visiting scholar, I had the opportunity to give a short introduction to various web scraping techniques using R.

Why web scraping? The rapid growth of the World Wide Web over the past two decades has tremendously changed the way we share, collect, and publish data. Firms, public institutions, and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. Because much of the data on the Web is the product of social interaction, it is of immediate interest to us as social scientists. In recent years, research on computer-based methods for classifying and analyzing large amounts of existing data has been booming across all disciplines, and political scientists contribute heavily to this process.

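Lab Article: Toward a New Era of Conflict Prediction

Journal Article, 2013

Abstract

In the field of conflict research, the importance of predictive analysis is self-evident, yet it has never received sufficient attention. We argue that prediction not only can inform substantive public policy, but can also be used to test existing theoretical models, avoid statistical overfitting, and reduce confirmation bias, thereby producing more reliable conflict forecasts. In this article, we review the progress scholars have made in conflict prediction research and find that, thanks to advances in data collection and computing power over the past fifty years, researchers can now undertake predictive work that was previously out of reach. In particular, with the help of automated coding procedures, it has become possible to collect digitized news reports quickly. Conflict research can therefore apply disaggregated event data, measured in daily, weekly, or monthly units and covering the individual activities of governments and opposition groups below the national level, to produce timely conflict forecasts.

To show the major progress that conflict research has made in the past few years, this article re-examines Fearon and Laitin (2003), a foundational work in conflict research, in order to compare and highlight recent advances in predictive analysis. We find that although many of the explanatory variables in Fearon and Laitin’s study are statistically significant, the model’s accuracy in predicting out-of-sample events is low. This is because a model built from observational data around statistically significant variables cannot answer the predictive questions that policymakers care about, such as when and where a civil war will occur.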

Mining Texts to Generate Fuzzy Measures of Political Regime Type at Low Cost.  Reposted from Dart Throwing Chimp, by Jay Ulfelder.

Political scientists use the term “regime type” to refer to the formal and informal structure of a country’s government. Of course, “government” entails a lot of things, so discussions of regime type focus more specifically on how rulers are selected and how their authority is organized and exercised. The chief distinction in contemporary work on regime type is between democracies and non-democracies, but there’s some really good work on variations of non-democracy as well (see here and here, for example).

Unfortunately, measuring regime type is hard, and conventional measures of regime type suffer from one or two crucial drawbacks.

It is the end of the year, and we’re supposed to be reflective. But not too much. After all, this is a blog. The colleagues in this lab are terrific, and it is worth pausing for a moment to reflect on one tiny aspect of their accomplishments this last year: their publications. I do think publishing is broken, but not everyone is ready or able to abandon ship just yet. You will read no whine about publishing here. Well, at least not today. In any case, we have been remarkably successful, as you can see below. Why?

One reason is that research in 2013 is a collaborative process. It took sixteen of us to produce the dozen or so articles listed below. This means that we can do a lot collectively, but each of us has to do a lot individually to make that happen. Indeed, we can do more collectively than each of us can do individually. Partially, this is supported by good will and common purpose, but more than a sliver of Dropbox, GitHub, and Skype are involved as well. And some tolerance for the 24/7 lifestyle that everyone leads. We live in a fantastic world where anyone with a laptop and internet access can really collaborate with colleagues who might be (as “we” have been at various times) in London, India, Seattle, Pennsylvania, Korea, Mexico, Austin, Croatia, Madison, New York, Santiago, Berlin, or Boulder, Colorado.

It is also important to recognize that we have made a decision to join together and work together on projects. Most of these projects have a common theme, sure. But that theme is fairly permeable and open. And the amount of what we really do not know about political life remains enormous. As a result, opportunities abound. But “suddenly” we have a lot of new ways of thinking about and investigating the perplexing world we live in. We are not really always stuck in the corner solving things the so-called Gell-Mann way (sitting in our office and thinking real hard). That may be helpful, but so is doing proofs, writing simulation code, querying databases, and writing computer programs. These things are especially helpful after a bit of reflection, but it turns out that they work better if the ideas being investigated have been annealed by discussion and dialogue among interested colleagues, who often see weakness and nuance where, if left to our own devices, we might not perceive even the most glaring imperfection, let alone the smallest.

Collaboration with bright colleagues is terrifically fun, and I am truly grateful to have the opportunity to participate with them in this lab. Here is a list of projects that we published in the year 2013, minus a few things still snagged by reviewer number three. Stay tuned for more good things in 2014 and for a forthcoming post on current lab projects.

  1. Michael D. Ward, Nils W. Metternich, Cassy L. Dorff, Max Gallop, Florian M. Hollenbach, Anna Schultz, and Simon Weschle. “Learning from the Past and Stepping into the Future: Toward a New Generation of Conflict Prediction,” International Studies Review (2013) 15, 473–490.
  2. Michael D. Ward, Cassy L. Dorff. “Les réseaux, les dyades et le modèle des relations sociales.” Liber amicorum: Hommage en l’honneur du Professeur Jacques Fontanel. Ed. Liliane Perrin-Bensahel and Jean-Francois Guilhaudis L’Harmattan, March, 2013: 271-288.
  3. Kristin M. Bakke, John V. O’Loughlin, Gerard O’Tuathail, and Michael D. Ward. “Convincing State-Builders? Disaggregating Internal Legitimacy in Abkhazia.” International Studies Quarterly 58.3 (2013).
  4. Cassy L. Dorff and Michael D. Ward. “Networks, Dyads, and the Social Relations Model.” Political Science Research Methods 1.2 (December, 2013): 159-178.
  5. Nils W. Metternich, Cassy L. Dorff, Max Gallop, Simon Weschle, and Michael D. Ward. “Anti-Government Networks in Civil Conflicts: How Network Structures Affect Conflictual Behavior.” American Journal of Political Science 57.4 (October, 2013): 777-1028.
  6. Michael D. Ward, John S. Ahlquist, and Arturas Rozenas. “Gravity’s Rainbow: A Dynamic Latent Space Model for the World Trade Network.” Network Science 1.1 (March, 2013): 95-118.
  7. Xun Cao and Michael D. Ward. “Do Democracies Attract Portfolio Investment? Transnational Portfolio Investments Modeled as Dynamic Network.” International Interactions 39.1 (2013 in press): in press.
  8. Jacob M. Montgomery, Florian M. Hollenbach, and Michael D. Ward. “Aggregation and Ensembles: Principled Combinations of Data.” PS: Political Science & Politics 46.1 (January, 2013): 43-44.
  9. Kristian Skrede Gleditsch and Michael D. Ward. “Forecasting is Difficult, Especially about the Future: Using Contentious Issues to Forecast Interstate Disputes.” Journal of Peace Research 50.1 (2013): 17-31.
  10. Jan Pierskalla and Florian M. Hollenbach. “Technology and Collective Action: The Effect of Cell Phone Coverage on Political Violence in Africa.” American Political Science Review 107.2 (2013): 207-224.
  11. Matthew Dickenson. “Leadership Transition and Violence in Mexican Drug Trafficking Organizations 2006-2010.” Journal of Quantitative Criminology in press (2013): tba.
  12. Simon Weschle. “Two Types of Economic Voting: How Economic Conditions Jointly Affect Vote Choice and Turnout.” Electoral Studies in press (2013).
  13. December 30 update: Jacob M. Montgomery and  Josh Cutler. “Computerized Adaptive Testing for Public Opinion Surveys.” Political Analysis 21.2 (2013): 172-192. 

Gilbert F. White was a giant in the field of natural hazards, and a former colleague in Boulder at the University of Colorado, where he was an early director (beginning in 1970) of the Institute of Behavioral Science. Decades before that he had written his dissertation about how humans dealt with floods, and his work led to the establishment in the early 1950s of a Federal framework that graded the probability of floods. Now it is easy to ascertain the 100-year flood plain for any locale in the United States, since by law this is required of city and state planners. The city to which he moved, and in which we were colleagues, has its own connection to the subject of his research, as Boulder experienced a massive flood that devastated the city about a century ago.

A Century Flood?

The 1894 Boulder Flood

The Boulder flood plain for 100- and 500-year floods was developed in part as a result of White’s activism in planning for floods. Gilbert White’s office was just outside of the flood plain, up on a hill overlooking it, near where I am temporarily sitting at this moment. But his last house in Boulder was not. Anyone who followed the news this fall of the floods in Boulder, which were considered by many to be of the 100-year variety, may not know that Gilbert White’s advice probably saved many lives, as he argued for structures to be built that could interact with floods in a way that diminishes risk (i.e., breakaway bridges, et cetera). Gil was famous for many things, including the quote “Floods are ‘acts of God’ but flood losses are largely acts of man,” which was taken from his dissertation. In the 1980s he convinced the Boulder City Council that Boulder had previously experienced a flood even larger than the huge flood of 1894. As a result, building in the flood plain was restricted (a bit) and knockout bridges were built. I remember reading an article when I arrived in Boulder in the early 1980s about Gilbert’s warnings about a 100-year flood, which pictured Gilbert, then in his 70s, standing in the rushing Boulder Creek. You can listen to Gilbert discussing this issue as well as see a version of the Boulder floodplain.
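For readers who wonder what a “100-year” or “500-year” label implies, here is a minimal sketch of the arithmetic. It assumes the usual reading of a 100-year flood as one with a 1 percent chance of being equaled or exceeded in any given year, and it assumes that years are independent of one another. The function name and the 30- and 100-year horizons are illustrative choices, not anything taken from White’s work or from Boulder’s planning documents.

```python
def prob_at_least_one_flood(return_period_years: float, horizon_years: int) -> float:
    """Chance of seeing at least one flood of the given return period
    within the horizon, assuming independent years (an assumption)."""
    annual_probability = 1.0 / return_period_years
    # Probability of no such flood in any single year, compounded over the horizon.
    prob_no_flood = (1.0 - annual_probability) ** horizon_years
    return 1.0 - prob_no_flood


if __name__ == "__main__":
    # Illustrative horizons: a typical mortgage (30 years) and a century.
    for period in (100, 500):
        for horizon in (30, 100):
            p = prob_at_least_one_flood(period, horizon)
            print(f"{period}-year flood within {horizon} years: {p:.1%}")
```

Under these assumptions a “100-year” flood is far from a once-a-century certainty: it is more likely than not to show up within any given century, but not guaranteed, and it has a substantial chance of arriving within a few decades.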

Reviewer 1
This turkey is a bit overdone. I think the problem is that the authors need a better theory of turkey before they try to stick one in an oven for four hours and then serve it. A recent example was published in the Journal of Poultry, and many earlier contributions in Giblets and Drumsticks have been overlooked. Many earlier scholars have actually caught their own turkeys and fed them assumptions and corn to produce a really substantial turkey that not only reflects the theory of turkey, but also glistens with the implications of a well-thought-out turkey. Until a better theory of turkey is employed to motivate this particular baked turkey, it is hard to reach a satisfactory conclusion with this effort. While I appreciate the efforts, I don’t support revising this particular turkey for resubmission, though I am tempted to suggest that a soup be created with the remains.
Reviewer 2
Have the authors never tasted chicken? Nor duck? Medieval scholars knew that a combination of these fowl with turkey was necessary to provide a substantial empirical test of the “Thanksgiving Hypothesis.” Curiously, the authors have ignored this long-standing research tradition, even though there is a Stata recipe that will undertake this effortlessly for them. Surely this could easily be done in revisions.
Reviewer 3
I appreciate the authors’ efforts to examine the “Thanksgiving Hypothesis,” but it would appear there is a serious flaw in their analysis. The turkey has been cooked, and we see the standard inclusions: sweet potatoes, mashed potatoes, gravy, freshly cooked rolls, and even cranberry sauce. I even appreciate the introduction of oysters as an instrument into the stuffing to rule out the endogeneity that the turkey was actually fed ground fishmeal. But there is no adequate control, such as a tofurkey, introduced to examine the possibility that a general tryptophan coma is responsible for outcomes in the “Thanksgiving Hypothesis.” That and the absence of soup lead me to conclude that this project is not ready. But I am encouraged enough to recommend revisions.

Editor: The reviewers see much merit in your work, but point to serious missteps as well. I have personally tasted a Turkey dinner, and would like to suggest that after considering the comments above, you revise your procedures and resubmit the results. If you choose to do so, I will send the effort to a new round of reviewers, including one of the original critics. If you decide to accept this invitation, I will need to have your submission by November 27th, 2014.

ICEWS is an early warning system designed to help US policy analysts predict a variety of international crises. This project was created at the Defense Advanced Research Projects Agency in 2007, but has since been funded (through 2013) by the Office of Naval Research. ICEWS has not been widely written about, in part because of its operational nature, and in part because articles about prediction in politics face special hurdles in the publication process. An academic article (gated) described the early phase of the project in 2010, including assessments of its accuracy, and a WIRED article in 2011 criticized ICEWS for missing the Arab Spring–at a time when the project was only focused on Asia.

As one of the original teams on the ICEWS project, we highlight the basic framework used in the more recent, worldwide version of ICEWS in an article (here for now) forthcoming in the International Studies Review. Specifically, we discuss our model that is focused on forecasting, which is our main contribution to the larger, overall project. We call this CRISP. We argue that forecasting not only increases the dialogue between academia and the policy community, but that it also provides a gold standard for evaluating the empirical content of models. Thus, this gold standard improves not only the dialogue, but actually augments the science itself. In an earlier article in Foreign Policy, with Nils Metternich, we compared Billy Beane and Lewis Fry Richardson (sort of).

Political conflicts are rarely between two parties. In Iraq, for example, there were as many as 19 different groups engaged, including the Islamic Army in Iraq, Al-Qaeda in Iraq, the Jihadist Leagues, and the Just Punishment Brigades. In Syria, we see a similar picture, including the Free Syrian Army, the Syrian Liberation Front, the Syrian Islamic Front, and Jabhat al-Nusra. Many attempts to understand these kinds of situations group all the rebel forces together against a government. But the rebels are neither unified nor monolithic. Nor, necessarily, is the government. We explore a theory of the interactions among these various kinds of factions in order to better understand what kinds of actions are most likely to be undertaken. To do so, we combine elements of strategic calculation and the analysis of networks. The basic insight is the old saw, often attributed to the 6th century (BCE) Chinese general, Sun Tzu: hold your friends close, and your enemies closer.

Thailand data

The top rug shows the different parties that are in power in Thailand during the observation period, with markers for changes in power. The bottom plot shows conflictual events in Thailand from 1998 on.

GDELT (gdelt.utdallas.edu) is a global database of events that have been coded from vast quantities of publicly available text produced by the world’s news media. It has created a great deal of excitement in the social science community, especially within the field of international relations. But it has had wider visibility as well: in August 2013, there were 150,000 views of a map of protest activity around the world, based on the GDELT database. Event data have been around for several decades, but the GDELT project has generated new interest.

ICEWS is an early warning system designed to help US policy analysts predict a variety of international crises to which the US might have to respond. These include international and domestic crises, ethnic and religious violence, as well as rebellion and insurgency. This project was created at the Defense Advanced Research Projects Agency, but has since been funded (through 2013) by the Office of Naval Research. ICEWS also produces a rich corpus of text, which is analyzed with powerful techniques of automated event-data production. Since GDELT and ICEWS are based on similar, though not identical, methods and sources, it is interesting to compare them.

ICEWS data

ICEWS event data, gray line for stories and black line for events, 2001-2013

One area in which they differ most conceptually is that ICEWS follows a more traditional approach to event data, seeking to encode a chronology of events that reflects in some sense the putative ground truth of what occurred. The figure on the right shows the corpus of stories in ICEWS (gray) and the resulting events (black): total events are fairly stable over time even though the number of media stories increases. GDELT is more concerned with getting a comprehensive catalogue of all media stories (and other text) on reported events, and the corpus of those media stories is increasing exponentially, as the figure below shows. As a result, the number of events in GDELT is also increasing over time, much more so than in ICEWS.
