Software systems have many bugs. Software is getting larger and more complex, and many systems, such as Windows, Oracle, and Linux, consist of many millions of lines of code. The rule of thumb in the software industry is that production code contains 5-10 bugs per KLOC (thousand lines of code). This implies that these large systems harbor many thousands of latent bugs, many of which programmers find too difficult or expensive to eradicate.

Large systems are often highly concurrent, and writing correct concurrent software is one of the most challenging endeavors in software development. With the advent of cheap parallel hardware, such as multi-core CPUs, new software is written with more parallelism, and existing systems achieve higher degrees of run-time concurrency, thus exercising more untested paths. Some of the most insidious bugs are already blamed on concurrency, and we expect things to get worse.

Significant advances have been made in software development tools, but the rate at which they reduce bugs/KLOC (i.e., bug density) is outpaced by the rate at which software size grows in KLOC (i.e., code volume). For example, Linux code size more than doubled in the last 5 years, and Windows quadrupled in less than 10 years. The net effect of these disparate rates of progress is more bugs overall. In this project, we enable software systems to learn from past failures and become "stronger" over time. We propose a set of runtime techniques that, with every encounter of a new failure, progressively improve the system's ability to avoid that failure in the future. We call this "developing immunity against failures." Specifically, we study mechanisms that let programs automatically develop immunity against failures that can be avoided by taking alternate execution paths. We built a system, called Dimmunix, that enables general-purpose applications to defend themselves against deadlock bugs, i.e., to avoid deadlocks they have previously encountered.
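The bug-density argument earlier in this section can be made concrete with back-of-the-envelope arithmetic. Write $\rho$ for bug density (bugs/KLOC), $V$ for code volume (KLOC), and $B$ for the total number of latent bugs. The 1,000 KLOC system size and the growth factors $g$ (volume) and $r$ (density reduction) below are illustrative assumptions, not measured values:

$$
B = \rho \cdot V, \qquad 1000\ \text{KLOC} \times (5\text{ to }10)\ \tfrac{\text{bugs}}{\text{KLOC}} = 5{,}000\text{ to }10{,}000\ \text{bugs}.
$$

$$
B' = \frac{\rho}{r} \cdot gV = \frac{g}{r}\,B, \qquad B' > B \iff g > r.
$$

For instance, if code volume doubles ($g = 2$) while tools cut bug density by a factor of $r = 1.5$, the total grows to $B' = \tfrac{4}{3}B$. Total bugs rise whenever volume grows faster than density falls.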
Dimmunix is implemented for Java, POSIX Threads, and Android OS. The POSIX Threads and Android versions of Dimmunix currently provide immunity against deadlocks involving mutex locks. Android Dimmunix is implemented within the Dalvik VM, which runs all Android applications. It therefore provides platform-wide deadlock immunity to every application running on an Android phone. We also optimized Java Dimmunix for synchronization-intensive applications and extended it with immunity against non-mutex deadlocks, i.e., deadlocks involving read-write locks, semaphores, condition variables, or external synchronization. We ran Dimmunix with real applications, such as JBoss, Limewire, Vuze, Eclipse, Apache ActiveMQ, MySQL server, and SQLite. We also implemented a collaborative version of Dimmunix, called Communix, which enables machines connected to the Internet to immunize each other against deadlocks: once one node encounters a deadlock, the other nodes become protected against it without having to encounter it themselves. Dimmunix is available in open-source form for both Java and C/C++ from http://dslab.epfl.ch/proj/dimmunix.
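The general idea of deadlock immunity is to remember a deadlock once and then steer execution away from it. It can be illustrated with the minimal Java sketch below. This is not Dimmunix's implementation or API. The class `ImmunitySketch` and its methods are hypothetical, and several simplifications are assumptions: a "site" is a plain string label standing in for the call stack at which a lock is acquired, a signature is the set of sites at which the threads of a past deadlock cycle acquired the locks they held, and deadlock detection itself is left out (signatures are handed to `recordDeadlock` directly).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified sketch of signature-based deadlock avoidance;
 * not Dimmunix's actual code. Sites are string labels (an assumption) and a
 * signature is the set of sites involved in a previously seen deadlock.
 */
public class ImmunitySketch {
    private final List<Set<String>> signatures = new ArrayList<>();
    // For each thread, the sites at which it acquired (or is acquiring) locks it still holds.
    private final Map<Thread, List<String>> held = new HashMap<>();

    /** Remembers a deadlock so that it can be avoided from now on. */
    public synchronized void recordDeadlock(Set<String> sites) {
        signatures.add(new HashSet<>(sites));
    }

    /** True if thread t acquiring a lock at 'site' would complete a recorded
     *  signature, i.e. every other site in it is held by some other thread. */
    synchronized boolean wouldInstantiate(Thread t, String site) {
        for (Set<String> sig : signatures) {
            if (sig.size() < 2 || !sig.contains(site)) continue;
            boolean allOthersHeld = true;
            for (String other : sig) {
                if (other.equals(site)) continue;
                if (!heldByAnotherThread(t, other)) {
                    allOthersHeld = false;
                    break;
                }
            }
            if (allOthersHeld) return true;
        }
        return false;
    }

    private boolean heldByAnotherThread(Thread t, String site) {
        for (Map.Entry<Thread, List<String>> e : held.entrySet()) {
            if (e.getKey() != t && e.getValue().contains(site)) return true;
        }
        return false;
    }

    synchronized void markHeld(Thread t, String site) {
        held.computeIfAbsent(t, k -> new ArrayList<>()).add(site);
    }

    synchronized void unmarkHeld(Thread t, String site) {
        List<String> sites = held.get(t);
        if (sites == null) return;
        sites.remove(site); // List.remove(Object): drops one occurrence
        if (sites.isEmpty()) held.remove(t);
    }

    /** Call just before acquiring a lock at 'site'; waits while that would be unsafe. */
    public synchronized void beforeAcquire(String site) throws InterruptedException {
        Thread me = Thread.currentThread();
        while (wouldInstantiate(me, site)) {
            wait(); // woken by afterRelease() in some other thread
        }
        markHeld(me, site);
    }

    /** Call right after releasing the lock that was acquired at 'site'. */
    public synchronized void afterRelease(String site) {
        unmarkHeld(Thread.currentThread(), site);
        notifyAll();
    }
}
```

Consider the classic lock-order inversion, where thread 1 locks A at site `s1` and then B, while thread 2 locks B at site `s3` and then A. Recording the signature `{s1, s3}` makes thread 1 wait at `s1` while thread 2 holds B from `s3` (and vice versa), so the two threads can no longer each grab their first lock and block on the other. In a Communix-like setting, a signature received from another node would simply be passed to `recordDeadlock`, which is how this sketch would model immunity without a local encounter.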