Managing data models at scale is a common challenge for data teams using dbt (data build tool). Teams typically start with simple models that are easy to manage and deploy. However, as data volumes grow and business requirements evolve, these models become increasingly complex.
This growth often results in a monolithic repository where dependencies are tightly intertwined, making it difficult for multiple teams to collaborate effectively. To address this, data teams can split their data models across multiple dbt projects. This approach not only promotes better organisation and modularity but also improves the scalability and maintainability of the overall data infrastructure.
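As an illustration, one common way to wire split projects together is to install an upstream dbt project as a package in a downstream one. The sketch below assumes a hypothetical `core_models` project; the repository URL and revision tag are placeholders, not references to a real repository:

```yaml
# packages.yml in the downstream dbt project
# (repository URL and revision are hypothetical placeholders)
packages:
  - git: "https://github.com/your-org/core_models.git"
    revision: "v1.2.0"  # pin a tag so downstream builds stay reproducible
```

After running `dbt deps`, models in the downstream project can reference upstream models with the two-argument form of `ref`, e.g. `{{ ref('core_models', 'dim_customers') }}`.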
One significant challenge introduced by managing multiple dbt projects is how they are executed and deployed. Managing library dependencies becomes a critical concern, especially when different projects require different versions of dbt. While dbt Cloud offers a robust solution for scheduling and executing multi-repo dbt projects, it comes at a significant cost that not every organisation can afford or find…
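To make the version problem concrete, one possible way to isolate dbt versions per project without dbt Cloud is to run each project in its own container. The following is a minimal docker-compose sketch under stated assumptions: the image tags, mount paths, profile locations, and the `dbt-snowflake` adapter are all illustrative choices, not the article's prescribed setup:

```yaml
# docker-compose.yml: each project runs against its own pinned dbt image
# (image tags, paths, and adapter choice are hypothetical examples)
services:
  analytics:
    image: ghcr.io/dbt-labs/dbt-snowflake:1.7.0   # project A pins dbt 1.7
    volumes:
      - ./analytics:/usr/app                      # this project's files
      - ./profiles/analytics:/root/.dbt           # profiles.yml for this project
    command: ["run", "--project-dir", "/usr/app"] # image entrypoint is `dbt`
  marketing:
    image: ghcr.io/dbt-labs/dbt-snowflake:1.5.0   # project B stays on dbt 1.5
    volumes:
      - ./marketing:/usr/app
      - ./profiles/marketing:/root/.dbt
    command: ["run", "--project-dir", "/usr/app"]
```

Because each service pins its own image, upgrading one project's dbt version cannot break another project's builds.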