"Apache Arrow, Hostage Negotiator: Revisiting the case for Client Protocol Redesign" ( 2026 )

Saturday at 17:00, 20 minutes, UB2.252A (Lameere), Databases, Matthew Topol, video

In 2017, Mark Raasveldt and Hannes Mühleisen (who went on to create DuckDB) presented a VLDB paper entitled “Don’t Hold My Data Hostage – A Case For Client Protocol Redesign.” Their paper proposed the use of columnar serialization to achieve order-of-magnitude improvements in query result transfer performance. Eight years later, this talk revisits Raasveldt and Mühleisen’s argument and describes the central role that the Apache Arrow project has played in realizing this vision, through the dissemination of Arrow IPC, Arrow Flight, Arrow Flight SQL, Arrow over HTTP, and ADBC across numerous open source and commercial query systems. The talk concludes with a call to action to introduce Arrow-based transport to the systems that continue to “hold data hostage.”

 "Future of the Arrow ecosystem BOF" ( 2025 )

Saturday at 10:00, 60 minutes, H.3242, BOF - Track B, Antoine Pitrou

Apache Arrow has become a critical foundation for the data science and data analytics FOSS communities. It is also a large project, with a number of official specifications, even more implementations, and common grounds with other projects such as Parquet.

This session will gather Arrow maintainers, contributors and interested parties (such as maintainers of other FOSS data analytics projects) to discuss the Arrow project, its continued sustainability and possible directions for the future.

 "ODBC Takes an Arrow to the Knee" ( 2025 )

Saturday at 13:50, 30 minutes, UB5.132, Data Analytics, Matthew Topol, slides, video

For decades, ODBC/JDBC have been the standard for row-oriented database access. However, modern OLAP systems tend instead to be column-oriented for performance - leading to significant conversion costs when requesting data from database systems. This is where Arrow Database Connectivity comes in!

ADBC is similar to ODBC/JDBC in that it defines a single API which is implemented by drivers to provide access to different databases. The difference is that ADBC's API is defined in terms of the Apache Arrow in-memory columnar format. Applications can code to this standard API much like they would for ODBC or JDBC, but fetch result sets in the Arrow format, avoiding transposition and conversion costs where possible.

This talk will cover goals, use cases, and examples of using ADBC to communicate with different data APIs (such as Snowflake, Flight SQL, or PostgreSQL) with Arrow-native in-memory data.

 "Apache Arrow: The Great Library Unifier" ( 2025 )

Sunday at 11:20, 30 minutes, UB2.252A (Lameere), Low-level AI Engineering and Hacking, Matthew Topol, slides, video

There are multiple low-level libraries used for AI development with GPUs, such as PyTorch, libcudf, and TensorFlow. Each has pros and cons, with different available algorithms and functions, so how do you pick which one to use? Instead of paying the cost of copying data back and forth between GPU and CPU, data can be passed between these libraries while leaving it on the GPU, sharing pointers to device data!

This talk will cover how to leverage the Apache Arrow data format and its C Device Interface, in conjunction with DLPack to connect these various libraries together for building low-level AI pipelines. We'll go over examples of handing off data between libraries without forcing extraneous copies from GPU to CPU and back, utilizing HuggingFace's Arrow formatted caches for training, and efficient conversion between Arrow and DLPack interfaces to unify multiple libraries for customized processing.

 "What can PyArrow do for you - Array interchange, storage, compute and transport" ( 2025 )

Sunday at 11:00, 30 minutes, UD2.218A, Python, Rok Mihevc, Alenka Frim, slides, video

PyArrow is a powerful tool for Python developers seeking high-performance data processing and interchange. This talk will provide a pragmatic overview of some of PyArrow's capabilities, demonstrating data interchange, storage, manipulation and transport using a single Python library.

We'll explore four key capabilities:

- Array Interchange: Seamless data exchange between NumPy, pandas, and other libraries using zero-copy
- Storage: Efficient serialization and file format support (Parquet, ORC, Feather) with advanced compression
- Compute: High-performance in-memory computation and data transformation capabilities
- Transport: Leveraging Arrow Flight RPC for distributed data movement and processing

 "Apache Arrow tensor arrays: an approach for storing tensor data" ( 2025 )

Saturday at 14:30, 5 minutes, UB5.132, Data Analytics, Rok Mihevc, Alenka Frim, slides, video

This talk introduces Apache Arrow's tensor arrays as a tool for representing an array of tensors in memory, along with their storage and transport. We'll introduce the tensor array memory layout specification and its implementation in Arrow C++ and Python, showcasing how it can help interoperate with the PyData and database ecosystems.

We'll present the fixed and variable shape tensor array specifications, their implementations, and how they can be used to interoperate with Arrow-aware ecosystems such as DLPack, NumPy, and others. Further, we'll discuss the design decisions we made to make the two tensor arrays as generic and universal as possible.

 "Federating Databases with Apache DataFusion: Open Query Planning and Arrow-Native Interoperability" ( 2026 )

Saturday at 17:50, 20 minutes, UB2.252A (Lameere), Databases, Michiel De Backker, Ghasan Mohammad (hozan23), slides, video

Apache DataFusion is emerging as a powerful open-source foundation for building interoperable data systems, thanks to its strongly modular design, Arrow-native execution model, and growing ecosystem of extension libraries. In this talk, we'll explore our contributions to the DataFusion ecosystem—most notably DataFusion Federation for cross-database query execution and DataFusion Table Providers that connect DataFusion to a wide range of backends.

We'll show how we use these components to federate queries to databases such as TiDB and InfluxDB 2, and how this fits into a broader data fabric/API generation work we're doing at Twintag. We'll also discuss our work on Arrow-native interfaces, including an Arrow Flight SQL Server implementation for DataFusion and a prototype Flight SQL endpoint for TiDB, which together enable a fully Arrow-based pipeline spanning query submission, execution, and federated dispatch.

The session highlights practical patterns for building distributed data infrastructure using open libraries rather than monolithic systems, and offers a look at where Arrow and DataFusion are headed as shared interoperability layers for modern databases.

 "[Servers] Apache James: Modular email server" ( 2024 )

Sunday at 11:55, 20 minutes, H.2213, Modern Email devroom, TELLIER Benoit, video

Apache James was born in 2003 as an Apache top-level project with the ambition to bring the "mailet", a servlet-for-mail.

20 years later, mailets are still alive and well, and still provide a highly flexible way to express and extend your email processing. But Apache James now allows even more extensions: overriding the SMTP stack, listening to mailbox events, adding WebAdmin HTTP endpoints, customizing IMAP commands, and much, much more.

The project also provides a unique toolkit for building your own email server.

 "Don't stand there and gawk, extend it!" ( 2025 )

Sunday at 15:50, 20 minutes, H.1308 (Rolin), Declarative and Minimalistic Computing, Efraim Flashner, video

Awk was first included in Version 7 Unix in 1977 and then got a major upgrade in 1985 for Unix System V Release 3.1. After this, GNU awk (gawk) was first released in 1988, and other versions of awk have also been written. It is part of the Single Unix Specification and part of the Linux Standard Base specification, so it is pretty much everywhere. Awk uses a pattern-action structure to manipulate numbers and strings, and has a lot more power than many people really use it for. With a simple, C-inspired syntax it is easy to prototype larger projects or to create self-contained scripts. Starting around gawk version 4.1.0 in 2013, the AWKPATH and AWKLIBPATH environment variables were introduced to allow external scripts and compiled libraries to extend gawk without needing to vendor functions. Come see how easy it really is to extend gawk and create your own scripts and plugins.

 "AT: The Billion-Edge Open Social Graph" ( 2026 )

Sunday at 16:15, 30 minutes, AW1.126, Decentralised Communication, Alexander Garnett, video

Social graphs are a well-understood technology. Using infrastructure and standardized protocols that are usually de facto controlled by large, commercial platforms, they provide a way of structuring and querying data about individual nodes (often users) in a network and the relationships (edges) between these nodes. They are theoretically extensible, and social graph data can typically also be represented using open standards like RDF which can be published and consumed by other authorities participating in a network. However, trying to enable participation or federation this way is frequently wishful thinking, and does not really facilitate scaling that social graph beyond a particular API representation of rows in one organization’s database.

The Atmosphere — built on AT — presents a different approach. When you write data using Atmosphere APIs, such as by posting to Bluesky, that data is associated with your personal data repository. These personal data repositories can be hosted or migrated anywhere across the Atmosphere. Each Atmosphere app declares its own schema (Lexicon), and reads and writes its own set of fields. These fields can be read by any other app built on the Atmosphere, allowing users to both a) own and b) span their graphs across the network.

This enables several in-demand use cases. Building “big world” social apps with AT is only a matter of creating new lexicons to support additional data models, designing app views which serve this data (along with any other data that may already be available to a user’s graph from other AT apps), and self-hosting the necessary infrastructure.

We provide implementation patterns, along with primitives and tools that are of interest to almost all implementers — like OAuth Scopes and moderation tools. We also provide a social networking app (Bluesky Social) that serves as both a reference implementation for the protocol, and a critical-mass opportunity to populate users’ social graphs so that other application developers can benefit from shared data. Regardless of which application is using this data, all of it is open, public, and associated with individual users’ data repositories, which can be migrated across the network at will.

This talk will provide a demonstration of some fundamental AT technologies, including:

- "Sipping the Firehose": working with the stream, with a demo of creating records and having them pop right out
- "Getting backlinks with Constellation": querying social interactions in real time, and building that data into different interfaces
- "Lexicon Authoring": a discussion of best practices for creating additional schemas, with examples from other apps in the Atmosphere

 "Open Source based Software Composition Analysis at scale" ( 2024 )

Sunday at 14:30, 30 minutes, K.4.401, Software Bill of Materials devroom, Marcel Kurzmann, slides, video

Creating and processing SBOMs at scale based on Open Source solutions: an intro to Apoapsis, a new Eclipse Foundation project (see also https://projects.eclipse.org/projects/technology.apoapsis ) providing a server concept to run continuous Software Composition Analysis for a large number of heterogeneous repositories. The talk will show the general setup for how you can continuously generate your SBOMs and reports, and give the status of the published reference implementation, the "ORT-Server", which interacts with the OSS Review Toolkit.

Diversity and agility are high values in the software community. Diversity and agility in software development processes and tools, though, are a challenge for automation.

Accurate Software Composition Analysis is an important capability to keep transparency throughout the Software Lifecycle and is the base for the fulfillment of important non-functional requirements in the business context (e.g. SBOM-creation, Vulnerability Tracking, License compliance etc.)

To handle automation with both aspects - accurate Software Composition Analysis and heterogeneous and agile environments -  the Abstraction Layer for Software Composition Analysis (ALSCA) of the new Eclipse Foundation Apoapsis Project plays an important role.

The Eclipse Apoapsis-project consolidates the requirements from the tooling side on the one hand and the requirements from the institutionalized operation side in medium to large organizations on the other hand. Concerning specifications and wording it will be based on the capability map created by the Open Chain Tooling Group in the context of Open Source Management (https://github.com/Open-Source-Compliance/Sharing-creates-value/tree/master/Tooling-Landscape/CapabilityMap).

The Eclipse Apoapsis project provides blueprints to run central Software Composition Analysis pipelines at scale while covering a large range of project setups (e.g. from mobile apps using CocoaPods to cloud services using Java/Maven) and a configurable extent of analysis (e.g. from mere SBOM creation to full-blast dependency analysis including vulnerabilities and copyright/license reports). To achieve this, the ORT-Server is based on the OSS Review Toolkit and makes use of its integration APIs for dependency analysis, license scanning, vulnerability databases, rule engine, and report generation. The Eclipse Apoapsis project itself will concentrate on the server functionality, including user and role management and the necessary APIs.

 "Using elliptic curve cryptography for the purposes of identity" ( 2024 )

Saturday at 16:20, 15 minutes, H.2215 (Ferrer), Lightning talks, Yarmo Mackenbach, slides, video

ASPs (or Ariadne Signature Profiles) are "online passports" secured by modern cryptographic standards that let you publicly prove your Fediverse accounts, your Matrix account, your git forge accounts, and many more. People can use websites like keyoxide.org to verify the validity of such online passports.

This talk walks you through the steps of getting started with elliptic curve cryptography and using it to create an online passport compatible with Keyoxide, providing snippets of Rust code for each step. This talk also hopes to convey that "cryptography-based identity" is a lot less daunting than it sounds!
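The talk's snippets are in Rust, but the core signing step translates directly. A minimal sketch using Ed25519 via the Python `cryptography` package; the claim text is a hypothetical stand-in, and the real ASP format involves more structure than a raw signed string:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)

# The passport holder generates an elliptic-curve keypair once.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# A claim about an online account, signed with the private key.
claim = b"I control https://fosstodon.org/@example"
signature = private_key.sign(claim)

# Anyone holding the public key can check the claim;
# verify() raises InvalidSignature if the claim was tampered with.
public_key.verify(signature, claim)
```

Verification services like keyoxide.org perform essentially this check, then confirm the claimed account links back to the profile.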

 "Energy Access Explorer : The Digital Public Good to deliver Climate-compatible Energy Transitions for Everyone" ( 2025 )

Sunday at 13:30, 15 minutes, H.2214, Energy: Accelerating the Transition through Open Source, Akansha Saklani, slides, video

Energy services are highly interconnected with socio-economic development and human well-being. Yet, life without reliable energy is a reality for more than 675 million people globally, while more than 2 billion people use polluting fuels to cook their meals. Addressing the critical challenge of extending energy access to the unserved and underserved communities, more than six decades after full electrification in Europe and the United States, is imperative. Unfortunately, the current trajectory falls short. The latest SDG 7 Tracking Report highlights that the world is off course in achieving Sustainable Development Goal (SDG) 7 by 2030, with 85% of those without electricity residing in Sub-Saharan Africa.

The repercussions of this shortfall are significant, with adverse impacts on societies and economies: inadequate healthcare persists, educational institutions struggle to provide quality education, and agricultural and industrial sectors face competitiveness challenges in regions where reliable energy remains elusive. Despite efforts such as grid extension, community-run mini-grids, and individual household solutions, progress has been insufficient because these solutions haven't been delivered at scale, in a coordinated way.

We urgently require access to data, analytical tools and innovative strategies to deliver affordable, reliable, and clean energy to those who lack access to it. While existing solutions have primarily focused on supply-side and technology-centric approaches, there's a crucial need to prioritize demand-side perspectives to truly meet the needs of users, whether households or institutions.

What is the Energy Access Explorer (EAE)?

To help address this challenge, WRI, in collaboration with partners, has developed the Energy Access Explorer (https://www.energyaccessexplorer.org/; https://github.com/energyaccessexplorer) (EAE), a data-driven, integrated and inclusive approach to achieving universal access to energy for equitable, socio-economic development. EAE is the first open-source, online and interactive geospatial platform that enables energy planners, clean energy entrepreneurs, donors, and development institutions to identify high-priority areas for energy access interventions. EAE also functions as a dynamic information system, reducing software engineering and data transaction costs for both data providers and users and facilitating data management and governance.

Use cases

With over 25,000 users, 48% of whom are women, EAE users can customize the analysis and identify areas of interest based on their perspective. More specifically, the use of this platform enables the following:

- Energy Planning Agencies improve the ways integrated and inclusive planning is carried out using a data-informed approach. They will explore the potential for grid extension, off-grid systems, clean cooking technologies and renewables for expanding energy access where needed the most.
- Clean Energy Enterprises, especially the ones with limited or no market intelligence / GIS capacity in house, identify new market opportunities. They will access demographic and socio-economic data, including consumer ability to pay for energy services, combined with information on energy resource availability and power infrastructure to locate priority areas for expanding their businesses.
- Service Delivery Institutions in the health, education, productive use of energy, and agriculture sectors get a better understanding of energy needs associated with development services.
- Clean Cooking agencies identify areas where the uptake of clean cooking technologies should be prioritized based on location-specific data on demand, supply and environment.
- Donors and Development Finance Institutions identify areas where their grants and investment will have the most impact.

Outcomes to date

To date, EAE has contributed to:

- the Powering Healthcare Roadmap of the Government of Zambia and the Powering Healthcare initiative of the Health Ministry in Uganda,
- the development of local, integrated and inclusive County Energy Plans in Kenya,
- support for the 0.5 billion USD Africa Mini Grid and the Energizing Agriculture Program in Nigeria,
- informing the results-based financing scheme for off-grid electrification in Ethiopia, and
- establishing cross-sectoral EAE working groups enabling an integrated and inclusive approach to planning.

 "WildDuck: Rethinking Email Server Architecture for the Cloud Era" ( 2026 )

Saturday at 16:30, 30 minutes, K.4.201, Modern Email, Andris Reinman, video

Traditional email servers were designed for a different era. They work great for small deployments but struggle at scale: Maildir breaks at 100k+ users, configuration changes require service reloads, and a single blacklisted IP blocks everyone on the server.

WildDuck takes a different approach. Built on MongoDB and Node.js, it treats email as a modern distributed systems problem. This talk explores the architectural decisions behind WildDuck and the lessons learned running it in production with 100,000+ accounts.

 "A decade of lessons from Apache Incubator release votes" ( 2026 )

Sunday at 10:55, 25 minutes, UB5.230, Community, Justin Mclean, video

Ten years, 1,600 release votes, and a clear lesson: open collaboration works. Discover how Apache Incubator projects turned release reviews from rule-checking into mentoring, and what this decade of data reveals about building healthier open source communities.

What can we learn from a decade of release votes in open source communities? From 2015 to 2025, over 1,600 Apache Incubator release vote threads showed how project collaboration and growth have changed. In this talk, I’ll share practical lessons from analysing votes across more than 160 projects. You’ll see how better documentation, mentoring, and automation changed a stressful compliance process into a positive learning experience. You’ll learn about the changes: fewer rejections, quicker reviews, and a shift from a strict to a more collaborative tone. I’ll also discuss how release cadence reflects community health and what early warning signs to watch for before a project slows down. Whether you’re a maintainer, mentor, or contributor, you’ll come away with ideas to improve release workflows and help build stronger, more confident communities.

 "Vehicle Abstraction in Automotive Grade Linux with Eclipse Kuksa" ( 2024 )

Saturday at 15:30, 25 minutes, UD2.120 (Chavanne), Embedded, Mobile and Automotive devroom, Sven Erik Jeroschewski, Scott Murray, slides, video

When building an automotive software stack, a central component is to establish an abstraction layer for uniformly exchanging data between deeply embedded vehicle-specific systems and high-level applications. This abstraction benefits many use cases by making it easier to build and port applications and transfer data away from the vehicle. The Automotive Grade Linux Project (AGL) adopted the data broker from Eclipse Kuksa.val as an abstraction layer to leverage the advantages of a standardized description of vehicle signals, like the COVESA Vehicle Signal Specification (VSS). In this talk, we explain the integration of Eclipse Kuksa in AGL by presenting several use cases. We further showcase ongoing developments in both projects, like a new Eclipse Kuksa Android SDK.

 "Elk: A Nimble Client for Mastodon" ( 2025 )

Saturday at 16:10, 10 minutes, UD2.208 (Decroly), Social Web, Ayo Ayco, slides, video

Elk is a nimble Mastodon client with great attention to user experience, boasting features such as being an installable Progressive Web App (PWA), support for code blocks with syntax highlighting, chronological threading, and markdown formatting.

Started in 2022 by core maintainers behind popular developer tooling in the Vue/Vite/Nuxt ecosystem, it attracted hundreds of contributors and resulted in the creation of new libraries now widely used in other projects.

In this talk, I will give a brief history of Elk development from the perspective of a contributor who has never written a Vue component (before Elk), a walkthrough of key strengths of the technology under the hood, and a look forward to the future of the project.

 "[Servers] Aerogramme, a multi-region IMAP server" ( 2024 )

Sunday at 11:35, 20 minutes, H.2213, Modern Email devroom, Quentin Dufour, slides, video

In order to achieve a competitive quality of service, it is often recommended to deploy not only in multiple availability zones, but also in multiple regions, especially if you can't trust the datacenter or if you are not hosted in a datacenter at all. However, due to the geographical distance, latency-sensitive protocols like Raft or Paxos can't be used. In this talk, I will present the design choices that make Aerogramme natively multi-region ready.

 "Hactorscript in ART: Bug-free Software on Unhackable Hardware" ( 2024 )

Saturday at 15:00, 50 minutes, K.1.105 (La Fontaine), Main Track - K Building, Blaine Garst, slides, video

As the Internet-of-Things moves into space, the need for absolute security becomes paramount. Using inexpensive encrypted secure-boot RISC-V devices and software minimalism, we build first for the home, for fun, and then commercialize for space- and ground-based applications. Bug-free modules of actor components compete for efficiency in a distributed matrix of algorithms.

We present an overview of the multi-core memory safe language called Hactorscript and its widely ported Actor RunTime (ART) above minimal POSIX. With no threads, stacks, locks, or loops, the cores directly compete for work on lockless (“MPMC”) queues. These queues can be fed at “interrupt” levels.

Finite-state-machines are nearly direct Actor specifications (as is TLA) and form the first set of composable bug-free modules.

 "Reverse engineering CAN communication and building ECUs using Elixir and the BEAM" ( 2025 )

Saturday at 15:30, 25 minutes, H.1302 (Depage), Embedded, Mobile and Automotive, Thibault Poncelet, slides, video

When tinkering with cars or other vehicles, being confronted with CAN communication or a similar bus is unavoidable. Throughout the past year, Thibault has been using CAN communication to build an Open Vehicle Control System and using it on a real car. In this talk, Thibault will explain how to get started with CAN reverse engineering, how he made different car parts from different brands talk together, and why Elixir and the Erlang Virtual Machine (the BEAM) is a good candidate for them to quickly prototype ECUs with cheap parts.