期刊论文详细信息
Research & Politics
Mining texts to efficiently generate global data on political regime types:
Shahryar Minhas1 
关键词: Democracy;    autocracy;    machine learning;    methodology;   
DOI  :  10.1177/2053168015589217
学科分类:社会科学、人文和艺术(综合)
来源: Sage Journals
PDF
【 摘 要 】

We describe the design and results of an experiment in using text-mining and machine-learning techniques to generate annual measures of national political regime types. Valid and reliable measures of countries’ forms of national government are essential to cross-national and dynamic analysis of many phenomena of great interest to political scientists, including civil war, interstate war, democratization, and coups d’état. Unfortunately, traditional measures of regime type are very expensive to produce, and observations for ambiguous cases are often sharply contested. In this project, we train a series of support vector machine (SVM) classifiers to infer regime type from textual data sources. To train the classifiers, we used vectorized textual reports from Freedom House and the State Department as features for a training set of prelabeled regime type data. To validate our SVM classifiers, we compare their predictions in an out-of-sample context, and the performance results across a variety of metrics (accuracy, precision, recall) are very high. The results of this project highlight the ability of these techniques to contribute to producing real-time data sources for use in political science that can also be routinely updated at much lower cost than human-coded data. To this end, we set up a text-processing pipeline that pulls updated textual data from selected sources, conducts feature extraction, and applies supervised machine learning methods to produce measures of regime type. This pipeline, written in Python, can be pulled from the Github repository associated with this project and easily extended as more data becomes available.

【 授权许可】

CC BY-NC-ND   

【 预 览 】
附件列表
Files Size Format View
RO201902021131909ZK.pdf 6575KB PDF download
  文献评价指标  
  下载次数:41次 浏览次数:37次