Thesis Details
Neural Language Models for Data-Driven Programming Support
Rong, Xin; Radev, Dragomir Radkov
University of Michigan
Keywords: neural language models; Information and Library Science; Social Sciences; Information
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/138509/ronxin_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Programming can be hard to learn and master. Search engines and social Q&A websites offer tremendous help to programmers, but great expertise (e.g., “Google-fu”) is required to use these resources efficiently and solve complex problems successfully. An integrated system that can recognize a programmer’s tasks and provide contextualized solutions is thus desirable, and ideally programmers can interact with the system through natural input channels, much as they would communicate with a human expert. Neural language models are a promising way to enable such an integrated system. These models encode programming language in the same high-dimensional space as data of other modalities, and can be trained in an end-to-end fashion. By leveraging the massive data about programming knowledge that are available online, including social Q&A websites, tutorials, blogs, and open-source code repositories, we can train neural language models to support a variety of user intentions, including the long-tail ones. We propose three studies on using neural language models to solve programming problems in practice. First, we introduce CodeMend, an intelligent programming assistant that supports interactive programming. The system employs a bimodal embedding model to encode programming language and natural language in the same vector space. We demonstrate that this model can effectively understand the code context and associate it with user input to suggest relevant code modifications. We also develop a novel user interface that renders search results in a way that makes the problem-solving process more efficient. Second, we propose a deep learning pipeline that converts data visualization images to source code. The pipeline is built using computer vision techniques and recurrent neural networks, and it lets the user obtain source code generated from visual examples. We develop novel techniques that augment a limited set of existing training samples via code parameterization and random variation. We also propose strategies for adapting a general-purpose neural language model to the task of predicting source code. Third, we introduce LAMVI, a set of visualization tools for diagnosing issues with neural language models. It tracks the ranks of individual candidate outputs for user-selected queries, and supports the exploration of the corresponding hidden-layer activations. It also tracks influential training instances, and provides guidance on actions to take when tuning the model. The system is evaluated on simulated datasets and helps the user efficiently adapt mature neural language models to new datasets or new tasks. Collectively, these three components form an integral solution to computer-assisted problem solving for programmers driven by big data, and may have an impact on various domains, including natural language processing, machine learning, software engineering, and interactive data visualization.

【 Preview 】
Attachment List
Files Size Format View
Neural Language Models for Data-Driven Programming Support 4399KB PDF download
  Document Metrics
  Downloads: 7  Views: 8