VeloxCon 2024: Data Management Innovation

admin, April 29, 2024

VeloxCon 2024, the premier developer conference specializing in the Velox open source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management. Hosted by IBM® in partnership with Meta, VeloxCon showcased the latest innovations in Velox, including project roadmaps, Prestissimo (Presto-on-Velox), Gluten (Spark-on-Velox), hardware acceleration, and more.

Velox Overview

Velox is a unified execution engine built and open sourced by Meta that aims to accelerate and simplify development of data management systems. One of the biggest advantages of Velox is that it integrates with your data management system so you don’t have to keep rewriting the engine. Velox is currently in various stages of integration with several data systems, including Presto (Prestissimo), Spark (Gluten), PyTorch (TorchArrow), and Apache Arrow. You can learn more about why Meta built Velox on Meta’s engineering blog.

Velox from IBM

Presto is the engine of watsonx.data, IBM’s open data lakehouse platform. Over the past year, we’ve been working hard to advance Velox for Presto (Prestissimo) at IBM. Presto Java workers are being replaced by C++ processes based on Velox. We now have several committers on the Prestissimo project and continue to work closely with Meta on building Presto 2.0.

The main benefits of Prestissimo are:

Huge performance improvements: Queries can now be processed on much smaller clusters.
No performance cliffs: The C++ workers run without Java processes, JVMs, or garbage collection, and memory tuning improves efficiency.
Easier to deploy and operate at scale: Velox provides reusable and scalable building blocks shared across data engines such as Presto and Spark.

This year we plan to do more with Prestissimo, including:

Iceberg reader
Production readiness (collecting metrics with Prometheus)
Implementation of the new Velox system
Running the TPC-DS benchmark

VeloxCon 2024

We worked closely with Meta to organize VeloxCon 2024, which was a fantastic community event. Over two dynamic days, we heard speakers from Meta, IBM, Pinterest, Intel, Microsoft, and more share their work in progress and their vision for Velox.

Day 1 Highlights

The conference opened with sessions from Meta, in which Amit Purohit reaffirmed Meta’s commitment to open source and community collaboration. Pedro Pedreira, together with Manos Karpathiotakis and Deblina Gupta, explored the concept of composability in data management, demonstrating the versatility of Velox and its connection to Arrow.

Meta’s Amit Dutta explored the deployment efficiency of Prestissimo at Meta, highlighting the advancements made in optimizing data processing workflows. Remus Lazar, Vice President of Data and AI Software at IBM, presented Velox’s journey within IBM and the vision for its future. IBM’s Aditi Pandit provided insight into Prestissimo’s integration at IBM, highlighting enhancements and future plans.

Equally insightful was the afternoon session, where Meta’s Jimmy Lu revealed Velox’s latest optimizations and features. Intel’s Binwei Yang discussed the integration of the Velox and Apache Gluten projects, emphasizing their global impact. Engineers from Pinterest and Microsoft shared their experience using Velox and Gluten to maximize data query performance and demonstrated tangible performance gains.

The day concluded with a Meta session on Velox memory management by Xiaoxuan Meng and a peek into the new simple aggregation function interface presented by Wei He.

Day 2 Highlights

The second day began with a keynote speech by Orri Erling, co-founder of Velox. He shared insights on Velox Wave and accelerators, demonstrating the potential for hardware acceleration. NeuroBlade’s Krishna Maheshwari highlighted collaboration with the Velox community by introducing NeuroBlade’s SQL Processing Unit (SPU) and its impact on Velox’s computational speed and efficiency.

Sergei Lewis of Rivos explored the possibility of offloading work to accelerators to improve pipeline performance in Velox. William Malpica and Amin Aramoon from Voltron Data introduced Theseus, a composable and scalable distributed data analytics engine that uses Velox as its CPU backend.

Meta’s Yoav Helfman unveiled Nimble, a cutting-edge columnar file format designed to improve data storage and retrieval. Meta’s Pedro Pedreira and Sridhar Anumandla detailed Velox’s new technical governance model, highlighting its importance in guiding the project’s sustainable development.

The day also included a session on Velox I/O optimization by IBM’s Deepak Majeti, Out-Of-Memory (OOM) prevention strategies by ComputeAI’s Vikram Joshi, and hands-on demos of Velox application debugging by Deepak Majeti.

What’s next for Velox?

VeloxCon 2024 was a testament to the vibrant ecosystem surrounding the Velox project, showcasing groundbreaking innovations and fostering collaboration between industry leaders and developers alike. The conference provided attendees with valuable insights, practical knowledge, and networking opportunities, solidifying Velox’s position as a leading open source project in the data management ecosystem.

If you’d like to learn more and get involved in the Velox community, check out the following resources:

Stay tuned for more updates and developments from the Velox community as we continue to push the boundaries of data management and accelerate innovation together.

Try Presto with a free trial of watsonx.data.


IBM Presto Community Team and Community Chair
