学位论文

【摘要】

As multi-core processors and networked systems become the norm, concurrent programming has been widely adopted in industry. Practitioners have used concurrent programming models to build many complex and large-scale systems and infrastructures. With the growing popularity of concurrent programming, it is important to assure the reliability of concurrent systems. However, concurrent systems are notoriously difficult to understand, test, and debug, because the interleaving between concurrent processes leads to non-deterministic behaviors in the systems. The number of non-deterministic behaviors grows exponentially, making it challenging to test and analyze concurrent programs at the system level.In this dissertation, we target the problem of system testing and analysis of concurrent programs. We first present a characteristic study on real-world bugs in distributed data-processing production systems to identify common challenges and opportunities in reliability assurance of generic concurrent systems, and motivate the need of testing and analyzing the systems as a whole. Then we describe two solutions to the problem in the context of the Actor model, a popular concurrent programming model based on asynchronous message passing. Actors facilitate building scalable systems by making unintended race conditions and deadlocks less likely. Many large systems such as Twitter, LinkedIn, and Facebook Chat, as well as frameworks such as Microsoft Orleans have used the Actor model. In particular, we propose a target test generation method for effective testing of actor systems, and a behavioral specification inference method for understanding and analyzing actor systems.We conduct a comprehensive characteristic study on 200 production failures and their fixes in a distributed data processing system from Microsoft Bing. We investigate not only major failure types, failure sources, and fixes, but also the debugging practice. Our main findings include (1) one major type of failures is caused by defects in data processing due to frequent data schema changes and exceptional data; (2) another major type of failures is due to the non-determinism in the interactions between a group of concurrent processes; (3) detecting and diagnosing these bugs often requires system level testing and analysis, because in many cases, the root cause of the failure lies in a different process from the failure-manifesting process. Although we study bugs in only distributed data-processing systems, we believe that the second and third findings can be extended to other concurrent systems based on different concurrency models.Motivated by this study, we develop automated solutions to help practitioners improve the reliability of actor systems. To facilitate system testing, we propose a method to support generation of system-level tests to cover a given code location in an actor program. The test generation method consists of two phases. First, static analysis is used to construct an abstraction of an entire actor system in terms of a message flow graph (MFG). An MFG captures potential actor interactions that are defined in a program. Second, a backwards symbolic execution (BSE) from a target location to an “entry point” of the actor system is performed. BSE uses the MFG constructed in the first phase of our targeted test generation method to guide the execution across actors. Because concurrency leads to a huge search space which can potentially be explored through BSE, we prune the search space by using two heuristics combined with a feedback-directed technique. We implement our method in Tap, a tool for Java Akka programs, and evaluate Tap on the Savina benchmarks as well as four open source projects. Our evaluation shows that the Tap achieves a relatively high target coverage (78% on 1,000 targets) and detects six previously unreported bugs in the subjects.To help understand and reason about actor systems, we propose a method for inferring the specification diagram of an actor system from its implementation. The actor specification diagram is a formal model that rigorously describes the global behaviors of a group of actors, in terms of the type and the number of messages exchanged between actors as well as the temporal order between message sending and receiving events. Our inference method first uses static analysis to infer an abstract specification diagram, which soundly captures all potential message flows and faithfully reflects the temporal orders between events enforced by control flows and message flows. Then our method uses dynamic analysis to detect likely invariants from execution traces of the system to further refine the abstract specification diagram. The refinements include instantiating the number of loop iterations, removing false positives, and discovering additional temporal orders between events enforced through coordination constraints in the system. We implement the inference method in a tool ASpec, and evaluate ASpec on the Savina benchmarks as well as two real-world protocols TCP and SIP. The evaluation results show that ASpec is effective in inferring the actor specification diagrams with high accuracy (78 %) on the subjects.

【预览】

附件列表
Files	Size	Format	View
Whole-system testing and analysis of actor programs	942KB	PDF	download


Whole-system testing and analysis of actor programs
Actor;Concurrent System;Concurrent Bug;Test Generation;Message Flow Graph;Backward Symbolic Execution;Behavioral Model;Specification Diagram;Static Analysis;Dynamic Invariant
Li, Sihan
关键词: Actor; Concurrent System; Concurrent Bug; Test Generation; Message Flow Graph; Backward Symbolic Execution; Behavioral Model; Specification Diagram; Static Analysis; Dynamic Invariant;
Others : https://www.ideals.illinois.edu/bitstream/handle/2142/106237/LI-DISSERTATION-2019.pdf?sequence=1&isAllowed=y
美国\|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF


	文献评价指标
	下载次数：4次	浏览次数：20次

【 摘 要 】

【 预 览 】

【摘要】

【预览】