Introduction
Data teams often juggle multiple workstreams and face changing requirements, causing challenges in managing dbt projects. However, solid project management practices can breed a culture of quality and collaboration between data engineers, analysts, and business stakeholders. GitHub Projects, GitHub’s native project management tool, can be leveraged to successfully manage dbt projects.
GitHub Project Board Structure
GitHub Projects gives us a way to visualize our project workflow. Several layouts are available for a Project, including Board, Table, and Roadmap. We advocate for the “Board” view, in which the Project is a Kanban board. In the Board view, tasks or logical work items appear as cards, and as work progresses each card moves from left to right between statuses.
Boards typically include at least three statuses: “To Do,” “In Progress,” and “Done.” You can adjust these to better fit your specific Project or workflow. For example, you could add an “Under Review” status for items awaiting review or a “Backlog” for low-priority, future-state enhancements.
In GitHub Projects, tasks are known as “Issues.” You can place higher-priority Issues closer to the top of each status so that the highest-priority tasks are tackled first. When a developer finishes one task, they can start work on the next item in the sequence.
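To make the ordering idea concrete, here is a minimal Python sketch of a board as a data structure. It does not call the GitHub API; the `Issue` class, the numeric `priority` field, and the issue titles are hypothetical and exist only for illustration. The status names follow the examples mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Status columns, left to right; names follow the examples in the text.
STATUSES = ["Backlog", "To Do", "In Progress", "Under Review", "Done"]


@dataclass
class Issue:
    title: str
    status: str
    priority: int  # hypothetical field: 1 = highest priority


def build_board(issues):
    """Group issues by status and sort each column so the highest priority is on top."""
    columns = defaultdict(list)
    for issue in issues:
        if issue.status not in STATUSES:
            raise ValueError(f"unknown status: {issue.status}")
        columns[issue.status].append(issue)
    return {s: sorted(columns[s], key=lambda i: i.priority) for s in STATUSES}


def next_up(board):
    """Return the top card in "To Do", or None when the column is empty."""
    todo = board["To Do"]
    return todo[0] if todo else None


# Hypothetical issues loosely based on the Jaffle Shop example.
issues = [
    Issue("Build stg_orders model", "To Do", 2),
    Issue("Build stg_customers model", "To Do", 1),
    Issue("Set up dbt project", "Done", 1),
]
board = build_board(issues)
print(next_up(board).title)
```

In a real Project the ordering is set by dragging cards within a column rather than by a numeric field; the sketch only shows why keeping the top of each column meaningful tells a developer what to pick up next.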
One massive benefit of the GitHub Project Boards layout is visibility. Displaying all work items within a spatial context provides insight into what you are working on and the progress on each workstream, all at a quick glance. This type of status tracking is real-time since all team members can update items concurrently.
Below is an example project board based on the “Jaffle Shop” project from the dbt Fundamentals on-demand training course.
GitHub Issue Anatomy
GitHub Issues capture discrete tasks or logical pieces of work. Continuing with the Jaffle Shop example, we’ve created an Issue for building out the stg_customers Model. The Description details what the task entails and why it is needed. It’s possible to include subtasks as a checklist to delineate requirements and format code snippets to help the assignee check their work. Being Markdown-compatible, the Description can contain code blocks, attachments, links, and even @mentions to tag team members. Comments can also be written in Markdown, allowing for collaboration across the board.
Notes and comments are captured as you complete the work, so documentation automatically gets baked into each task without additional manual effort.
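As a rough illustration of what such a Description might look like, the Python sketch below assembles a Markdown body with a subtask checklist and a code snippet. The section headings, subtask wording, and sample query are assumptions made for this example, not the contents of the actual Jaffle Shop Issue. The `- [ ]` syntax is GitHub's Markdown task-list notation.

```python
FENCE = "`" * 3  # Markdown code-fence delimiter, built here to keep this example readable


def issue_description(summary, why, subtasks, snippet, snippet_lang="sql"):
    """Return a Markdown Issue body with a task-list checklist and a fenced snippet."""
    checklist = "\n".join(f"- [ ] {task}" for task in subtasks)
    return (
        f"## What\n{summary}\n\n"
        f"## Why\n{why}\n\n"
        f"## Subtasks\n{checklist}\n\n"
        f"## Check your work\n{FENCE}{snippet_lang}\n{snippet}\n{FENCE}\n"
    )


# Hypothetical content for the stg_customers Issue.
body = issue_description(
    summary="Build the stg_customers model.",
    why="Downstream models need a cleaned, renamed view of raw customers.",
    subtasks=[
        "Rename id to customer_id",
        "Add unique and not_null tests on customer_id",
        "Document the model",
    ],
    snippet="select count(*) from {{ ref('stg_customers') }}",
)
print(body)
```

Pasting the printed text into an Issue renders each subtask as a checkbox that the assignee can tick off as the work progresses.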
Scoping with Milestones
You can use Milestones to capture and document time-sensitive workstreams at a higher level. A Milestone contains a set of Issues that must be completed by a particular date. Issues need not be fully fleshed out before you assign them to a Milestone. Placeholder Issues can document tasks that need to be completed but have not been fully scoped yet. We recommend naming them with the prefix “Placeholder” to indicate that you need to revisit them before work on them begins.
Since Issues can be pre-assigned to team members, Milestones allow you to estimate how much effort specific workstreams will take and how many, or which, team members may be required to complete them. Even when requirements change, you can track these changes. This also benefits developers, who can see how Issues tie into the overall goals of the Project. For example, the screenshot below shows the open Milestones on the dbt Core repository, where Milestones represent major version releases of dbt Core.
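The bookkeeping a Milestone gives you can be sketched in a few lines of Python. This is an illustrative model only, not the GitHub API: the `Issue` and `Milestone` classes, the due date, and the issue titles are hypothetical, and the “Placeholder” prefix check mirrors the naming recommendation above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Issue:
    title: str
    closed: bool = False


@dataclass
class Milestone:
    title: str
    due_on: date
    issues: list = field(default_factory=list)


def progress(milestone):
    """Fraction of the Milestone's Issues that are closed (0.0 when it has none)."""
    if not milestone.issues:
        return 0.0
    return sum(issue.closed for issue in milestone.issues) / len(milestone.issues)


def unscoped(milestone):
    """Titles of open Issues that still carry the "Placeholder" prefix."""
    return [
        issue.title
        for issue in milestone.issues
        if not issue.closed and issue.title.startswith("Placeholder")
    ]


# Hypothetical Milestone for the Jaffle Shop staging layer.
staging = Milestone(
    title="Staging layer complete",
    due_on=date(2024, 3, 29),
    issues=[
        Issue("Build stg_customers model", closed=True),
        Issue("Build stg_orders model"),
        Issue("Placeholder: stg_payments model"),
        Issue("Placeholder: source freshness checks"),
    ],
)
print(progress(staging))
print(unscoped(staging))
```

The list of unscoped titles is a quick way to see which Issues still need requirements before anyone starts work on them.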
Becoming Agile
While using a Kanban board is an excellent step towards becoming an agile team, the benefits of using GitHub Projects to manage your dbt projects go deeper. Issues are typically small, discrete pieces of work that can be released quickly into production. You can create a development branch directly from an Issue, and its name is generated automatically from the Issue’s title.
This means that the scope of the development branch, and of the resulting code merge, is inherently limited to one logical piece of work. Because you can turn around features quickly and incrementally, your team adapts more easily to changing requirements and delivers value to your stakeholders more rapidly.
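To illustrate how a branch name can be derived from an Issue, here is a small Python sketch. GitHub generates the name for you, and its exact rules are not reproduced here; the number-plus-slug pattern, the `branch_name` helper, and the issue number 12 are assumptions made for illustration.

```python
import re


def branch_name(issue_number, issue_title):
    """Build a branch name of the form "<number>-<slugified-title>".

    Assumed convention: lowercase the title and collapse every run of
    characters other than letters, digits, and underscores into one hyphen.
    """
    slug = re.sub(r"[^a-z0-9_]+", "-", issue_title.lower()).strip("-")
    return f"{issue_number}-{slug}"


# Hypothetical Issue number for the stg_customers task.
print(branch_name(12, "Build stg_customers Model"))  # prints 12-build-stg_customers-model
```

Because the branch name carries the Issue number and title, anyone reading the branch list or a pull request can trace the code back to the task that motivated it.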
Want to learn more?
Although we have just scratched the surface of the possibilities with dbt project management using GitHub, we hope we have provided some insight into how this can strengthen the quality of your projects, better align your team and goals, and tie code directly to stakeholder values and goals. Here at Aimpoint Digital, we pride ourselves on enabling companies across all industries to cultivate a data culture rooted in analytics engineering best practices.
Interested in learning more? Our experts are here to help you take your ELT solutions to the next level. Reach out to us through the form below!