
Lightweight Multi-language Bindings for Apache Spark

Type of publication Peer-reviewed
Publication form Proceedings (peer-reviewed)
Author Salucci, Luca; Bonetta, Daniele; Binder, Walter
Project Fundamentals of Parallel Programming for Platform-as-a-Service Clouds

Proceedings (peer-reviewed)

Title of proceedings 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24-26,
DOI 10.1007/978-3-319-43659-3_21


Apache Spark has emerged as one of the most prominent frameworks for distributed high-performance data analysis. Among Spark's most appealing features are its bindings for dynamic languages such as Python and R. Despite the great flexibility of such languages, they often cannot match the performance of statically typed languages such as Java or Scala. However, this limitation is not due solely to the intrinsic nature of dynamically typed languages; to a large extent, the performance gap is caused by the way the language runtimes interact with Spark. In this paper we describe a new approach to integrating Python and R into data-intensive Spark applications. Our approach significantly reduces the performance gap between these languages and their statically typed counterparts, making dynamic languages an attractive alternative for implementing big-data applications.