In this discussion, I aim to explore the evolving trends in data orchestration and data modelling, highlighting the advancements in tools and their core advantages for data engineers. While Airflow has been the dominant player since 2014, the data engineering landscape has changed significantly and now addresses more sophisticated use cases and requirements, including support for multiple programming languages, integrations, and enhanced scalability. I’ll examine contemporary and perhaps unconventional tools that streamline my data engineering processes and help me create, manage, and orchestrate robust, durable, and scalable data pipelines.
Over the last decade we have witnessed a “Cambrian explosion” of ETL frameworks for data extraction, transformation and orchestration. It’s no surprise that many of them are open-source and Python-based.
The most popular ones:
- Airflow, 2014
- Luigi, 2014
- Prefect, 2018
- Temporal, 2019
- Flyte, 2020
- Dagster, 2020
- Mage, 2021
- Orchestra, 2023