Incremental Computing

It's Time to Bring Unified Stream-Batch Processing Engines to Mass Adoption

Note: This article was translated from Chinese. Some technical terms and concepts may differ from the original English terminology. ℹ️ The original article was posted on Zhihu on 2023-02-20.

This article draws on a decade of personal learning and growth to trace the development and iteration of unified stream-batch processing engines. The author started out as a naive undergraduate observing the development of big data systems, gradually began contributing to them, and eventually became a committer in the Apache Flink community. The path was an upward spiral of understanding: starting with MapReduce batch processing; then building machine learning libraries on Spark's convenient and powerful batch processing capabilities; promoting Spark's micro-batch-based real-time computing at Microsoft; joining the development and promotion of Flink's real-time computing at Alibaba, moving from offline batch processing to real-time online processing; and, after leaving Alibaba, once again promoting unified stream-batch processing engines inside the company. As the elders say: personal struggle is certainly important, but it is also necessary to align with the course of history.

September 14, 2025

It's Time to Conclude the Discussion on Stream-Batch Unification in the Data Warehouse Field - Incremental Data Warehouse Series Part II

ℹ️ This article was originally published on Zhihu on 2024-03-27. 📝 Note: This article was translated from Chinese. Some technical terms and concepts may differ from the original English terminology.

This article picks up where the previous one left off: "Cost Issues of Near Real-Time Offline Data Warehouses - Incremental Data Warehouse Series Part I". Apologies for the delay between articles, which was due to work commitments.

September 14, 2025