Feldera: Revolutionizing Incremental Computation With SQL

Feldera is a fast query engine designed for incremental computation. It can evaluate arbitrary SQL programs incrementally, which its developers say makes it more powerful, expressive, and performant than existing alternatives such as batch engines, warehouses, stream processors, and streaming databases.
A Feldera pipeline consists of a set of SQL tables and views, and views can be defined on top of other views to any depth. Users can start, stop, or pause a pipeline to manage and advance its computation. A running pipeline continuously processes changes, where a change is any number of inserts, updates, or deletes to the tables. When changes arrive, Feldera updates every view by looking only at those changes, avoiding recomputation over older data. This is what makes Feldera fast: it is reported to handle millions of events per second on a laptop. The same approach also enables unified offline and online compute over both live and historical data.
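
To make "updating a view by looking only at the changes" concrete, here is a minimal Python sketch of one grouped SUM view. It is an illustration under stated assumptions, not Feldera's implementation or API: the class `IncrementalSum`, the `(row, weight)` change format, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def batch_sum(rows):
    """Recompute SELECT customer, SUM(amount) ... GROUP BY customer from scratch."""
    totals = defaultdict(int)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


class IncrementalSum:
    """Maintain the same view by touching only the rows in each change batch.

    Hypothetical helper for illustration; not part of Feldera.
    """

    def __init__(self):
        self.totals = defaultdict(int)  # current SUM per group
        self.counts = defaultdict(int)  # live rows per group, to drop empty groups

    def apply(self, changes):
        # Each change is (row, weight): +1 inserts the row, -1 deletes it.
        # An update is a delete of the old row plus an insert of the new one.
        for (customer, amount), weight in changes:
            self.totals[customer] += weight * amount
            self.counts[customer] += weight
            if self.counts[customer] == 0:
                del self.totals[customer]
                del self.counts[customer]
        return dict(self.totals)


view = IncrementalSum()
view.apply([(("alice", 10), +1), (("bob", 5), +1)])

# Update alice's row from 10 to 12 and delete bob's row.
result = view.apply([(("alice", 10), -1), (("alice", 12), +1), (("bob", 5), -1)])

# The incrementally maintained view matches a from-scratch recomputation.
expected = batch_sum([("alice", 12)])
```

The work done by `apply` is proportional to the size of the change batch, not to the size of the table, which is the property the paragraph above describes.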

🎯 Defining Features

Full SQL Support and More: Feldera is the only engine that can evaluate full SQL syntax and semantics completely incrementally. This includes joins and aggregates, group by, correlated subqueries, window functions, complex data types, time series operators, UDFs, and recursive queries. Pipelines can process deeply nested hierarchies of views.
Fast Out-of-the-Box Performance: Users have reported implementing complex use cases in 30 minutes or less and reaching millions of events per second on a laptop without any tuning.
Datasets Larger than RAM: Feldera is designed to handle datasets that exceed the available RAM by efficiently spilling to disk, leveraging recent advances in NVMe storage.
Strong Guarantees on Consistency and Freshness: Feldera is strongly consistent and guarantees that the state of the views always corresponds to what you'd get if you ran the queries in a batch system for the same input.
Connectors for Your Favorite Data Sources and Destinations: Feldera connects to a variety of batch and streaming data sources and destinations, including Kafka, HTTP, CDC streams, S3, data lakes, and warehouses. If a needed connector is not yet supported, users can request it.
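
Two of the features above, incremental joins and the guarantee that views match what a batch system would return, can be sketched together in Python. This is a simplified illustration, not Feldera's code: relations are modeled as `Counter` objects mapping rows to weights (+1 insert, -1 delete), the helper names `join`, `add`, and `join_delta` are hypothetical, and the nested-loop join stands in for the indexed joins a real engine would use.

```python
from collections import Counter


def join(left, right):
    """Equi-join two weighted relations of (key, value) rows on key."""
    out = Counter()
    for (lk, lv), lw in left.items():
        for (rk, rv), rw in right.items():
            if lk == rk:
                out[(lk, lv, rv)] += lw * rw
    return out


def add(a, b):
    """Add two weighted relations, dropping rows whose weights cancel to zero."""
    out = Counter(a)
    for row, w in b.items():
        out[row] += w
    return Counter({row: w for row, w in out.items() if w != 0})


def join_delta(left, d_left, right, d_right):
    """Change to join(left, right) caused by the changes d_left and d_right."""
    return add(
        add(join(d_left, right), join(left, d_right)),
        join(d_left, d_right),
    )


orders = Counter({(1, "book"): 1, (2, "pen"): 1})
customers = Counter({(1, "alice"): 1})
view = join(orders, customers)

# One batch of changes: delete an order, add an order, add a customer.
d_orders = Counter({(2, "pen"): -1, (1, "lamp"): 1})
d_customers = Counter({(2, "bob"): 1})

# Incremental path: patch the existing view with the computed delta.
view = add(view, join_delta(orders, d_orders, customers, d_customers))

# Batch path: apply the changes to the inputs and rejoin from scratch.
batch = join(add(orders, d_orders), add(customers, d_customers))
```

Note how the deleted order `(2, "pen")` and the new customer `(2, "bob")` cancel out in the delta, so the patched view and the batch result agree.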

💻 Architecture

A quick start with Docker is available for demos, development, and testing. Users can bring up a Feldera Platform deployment by downloading a Docker Compose file and running a single command. The Feldera web console becomes available shortly afterwards at `http://localhost:8080`.

⚙️ Running Feldera from Sources

To run Feldera from sources, users first install several dependencies: the Rust toolchain (at least 1.75), Java (at least JDK 19), Maven, and TypeScript. With these in place, users can build the SQL compiler and run the pipeline-manager. The Feldera web console can then be accessed at `http://localhost:8080`.

📖 Documentation

For more detailed information, users are encouraged to go through the Feldera documentation.

🤖 Benchmarks

According to the project, Feldera is generally faster and uses less memory than comparable systems such as stream processors. Benchmarks run in CI on every commit that goes into the main branch. Detailed results are published at benchmarks.feldera.io.

🎓 Theory

Feldera Platform is built on a solid mathematical foundation. The formal model underpinning the system, called DBSP, is described in a paper presented at the International Conference on Very Large Data Bases (VLDB) in August 2023. The model provides both semantics and an algorithm for generating incremental dataflow programs that are efficient and correct.
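
The core idea of the model can be sketched in a few lines of Python. In DBSP, the incremental version of a query Q is defined by integrating the stream of changes into snapshots, running Q on each snapshot, and differentiating the results back into changes. The sketch below is a toy rendering of that definition under stated assumptions: changes are `Counter` objects mapping rows to weights, and the names `integrate`, `differentiate`, `lift`, `incrementalize`, and `big_orders` are chosen here for illustration and are not Feldera's API.

```python
from collections import Counter
from itertools import accumulate


def zadd(a, b):
    """Add two weighted sets, dropping rows whose weights cancel to zero."""
    out = Counter(a)
    for row, w in b.items():
        out[row] += w
    return Counter({row: w for row, w in out.items() if w != 0})


def zneg(a):
    """Negate every weight in a weighted set."""
    return Counter({row: -w for row, w in a.items()})


def integrate(stream):
    """I: turn a stream of changes into a stream of snapshots (running sum)."""
    return list(accumulate(stream, zadd))


def differentiate(stream):
    """D: turn a stream of snapshots back into a stream of changes."""
    prev, out = Counter(), []
    for snapshot in stream:
        out.append(zadd(snapshot, zneg(prev)))
        prev = snapshot
    return out


def lift(query):
    """Apply a per-snapshot query to every element of a stream."""
    return lambda stream: [query(z) for z in stream]


def incrementalize(query):
    """Incremental version of a query: D composed with lift(query) composed with I."""
    return lambda changes: differentiate(lift(query)(integrate(changes)))


def big_orders(z):
    """SELECT * FROM orders WHERE amount > 100, as a filter over (id, amount) rows."""
    return Counter({row: w for row, w in z.items() if row[1] > 100})


changes = [
    Counter({("a", 150): 1, ("b", 50): 1}),
    Counter({("a", 150): -1, ("c", 300): 1}),
]

# The definition rebuilds full snapshots; for a linear operator such as a
# filter, applying the query directly to each change gives the same answer.
via_definition = incrementalize(big_orders)(changes)
via_shortcut = lift(big_orders)(changes)
```

The definition alone is not efficient, since it still materializes every snapshot. The efficiency comes from rewriting rules such as the one checked here, where a filter can be applied straight to the changes without ever building a snapshot.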

Figure: Diagram illustrating the DBSP model.

👍 Contributing

The software in this repository is governed by an open-source license, and contributions are welcome. Guidelines for contributing are available in the repository.

Remember these 3 key ideas for your startup:

Leverage Full SQL Support: Feldera's ability to evaluate full SQL syntax incrementally can significantly enhance your data processing capabilities, making complex queries more efficient and faster.
Optimize Performance: With Feldera, you can achieve high performance out-of-the-box, handling millions of events per second without extensive tuning. This can be a game-changer for startups needing rapid data processing.
Handle Large Datasets Efficiently: Because Feldera is designed to spill to disk when datasets exceed the available RAM, you can work with data that does not fit in memory.
