Really Big Data: The Challenges of Managing Zettabyte-Scale Data in Real Time

Oana Balmau - McGill University

Feb. 19, 2021, 2:30 p.m. - Feb. 19, 2021, 3:30 p.m.

Zoom (see link below)

Hosted by: Paul Kry


The data we produce is growing at an unprecedented pace. Interconnected Internet of Things (IoT) devices, including sensors, smartphones, and cameras, are expected to generate 79.4 zettabytes of data in 2025. In addition, applications such as smart factories, health monitoring, augmented/virtual reality, and transportation will create significant demand for real-time data processing. Real-time data (e.g., video, logs, location tracking) is predicted to make up 30% of the data created in 2025.

Given the vast amounts of IoT data and the increasing real-time requirements, data management will be one of the most important challenges in real-time edge computing. Edge systems will be challenged by new workloads and new performance requirements. First, workloads have evolved from a read-heavy pattern (e.g., serving a static web page) to a write-heavy profile where the read:write ratio is closer to 1:1. Second, performance requirements in edge computing systems now emphasize low tail latency in addition to high throughput.

In this talk, I will present the pressing challenges that data management will pose in edge computing. I will then discuss opportunities to address these challenges based on recent breakthroughs in storage hardware, in particular non-volatile memory and fast block-addressable Optane SSDs. To illustrate the potential of storage systems that use novel storage hardware, I will present KVell, a new design for key-value stores for Optane SSDs. KVell departs from the conventional wisdom of optimizing disk usage, an assumption that has underpinned all past storage system design, and instead optimizes CPU usage. Thanks to its novel design, KVell achieves up to 5x better throughput and up to two orders of magnitude lower tail latency.

Oana Balmau is an Assistant Professor at McGill University. She completed her PhD in Computer Science at the University of Sydney, advised by Prof. Willy Zwaenepoel. She earned her Bachelor's and Master's degrees in Computer Science from EPFL, Switzerland. Her research interests are computer systems and storage technologies. Currently, she is focusing on redesigning edge storage systems, on persistent memory technologies, and on their role in the way we manage large-scale data for Internet of Things workloads and Data Science. Oana received the CORE John Makepeace Bennett Award 2021 for the best computer science dissertation in Australia and New Zealand, as well as a Best Paper Award at the USENIX Annual Technical Conference (USENIX ATC) 2019.

Zoom link: https://mcgill.zoom.us/j/89510442560 (Zoom login required)

Reception after the talk in Gather Town: https://gather.town/app/3qgGGqVmX8sDW2Zb/Reception