Serverless Data Analytics Platform

Contact: Rut Palmero
Coordinator: Universitat Rovira i Virgili
Funding Program: H2020
Project Date
to

This project is inspired by a question from a professor of computer graphics at UC Berkeley: “Why is there no cloud button?” He described how his students simply wish they could “push a button and have their code – existing, optimized, single-machine code – running on the cloud.”

The main goal is to create CloudButton: a Serverless Data Analytics Platform. CloudButton will “democratize big data” by greatly simplifying the overall life cycle and programming model through serverless technologies. To demonstrate the impact of the project, we target two settings with large data volumes.

To achieve this ambitious aim, CloudButton sets out three goals:

  1. High-Performance Serverless Run-time: We will create the first FaaS compute run-time for Big Data analytics. It will be built on a mature open-source codebase, Apache OpenWhisk, augmented with Apache OpenWhisk Composer.
  2. Mutable Shared Data Middleware for Serverless Computing: We will create Distributed Mutable Data Structures on top of the Red Hat Infinispan In-Memory Data Grid. This middleware will provide serverless functions with language-level constructs for data persistence, dependability, and concurrency control.
  3. CloudButton Toolkit (Serverless Data Analytics Platform): Built on top of goals 1 and 2, the toolkit will implement Serverless Cloud Programming Abstractions that can express a wide range of existing data-intensive applications with minimal changes. We will also develop new tools and methodologies to port existing data-intensive applications from the HPC, data analytics, and machine learning domains to the CloudButton toolkit.

The impact of the project will be validated with large data volumes of bioinformatics and geospatial data:

  • Genomics: Serverless technologies can overcome the scaling limitations of research centers' computational resources, improving scalability and productivity when processing large datasets.
  • Metabolomics: Expand the analysis of raw metabolomics data and improve external access to, and efficient reuse of, open data.
  • Geospatial: Conduct geospatial analyses to increase the productivity, scalability, and performance of relevant environmental applications using open-access LiDAR and satellite data.


Our role

We are in charge of the technical development of SLA management to ensure the highest levels of QoS for data-intensive applications. We also provide and manage the testbed used to deploy, execute, and test the use cases, and we contribute to the dissemination and exploitation of results.