Together with
dbt Cloud – A hosted service with added features including an IDE, job scheduling, and more
In this post, we demonstrate some features in dbt that help you manage data transformations in Amazon Redshift. We also provide the
Manage common logic
dbt enables you to write SQL in a modular fashion. This improves maintainability and productivity because common logic can be consolidated (maintain a single instance of logic) and referenced (build on existing logic instead of starting from scratch).
The following figure is an example showing how dbt consolidates common logic. In this example, two
Figure 1: Manage common subquery in dbt
The concept of referencing isn’t limited to logic related to subqueries. You can also use referencing for logic related to fields.
The following is an example showing how dbt consolidates common logic related to fields. In this example, a model applies the same case statement on two fields. Instead of replicating the case statement for each field, dbt allows you to create a
Figure 2: Manage common case statement in dbt
How is a model in dbt subsequently created in Amazon Redshift? dbt provides you with the command
Manage common data mappings
Although you can use macros to manage data mappings (for example, mapping “1” to “One” and “2” to “Two”), an alternative is to maintain data mappings in files and manage the files in dbt.
The following is an example of how dbt manages common data mappings. In this example, a model applies one-to-one data mappings on a field. Instead of creating a macro for the one-to-one data mappings, dbt allows you to create a
Figure 3: Manage common data mapping in dbt
You can create or update a seed with a two-step process. After you create or update a CSV seed file, run the command
Manage data lineage documentation
After you have created models and seeds in dbt, and used dbt’s referencing capability, dbt provides you with a method to generate documentation on your data transformations.
You can run the command
Figure 4: Documentation generated by dbt
You can also visualize dependencies for improved navigation of documentations during impact analysis. In the following example graph, we can see that model rpt_tech_all_users
is built referencing the model base_public_users
, which in turn references the table users
in the public
schema.
Figure 5: Data lineage visualization generated by dbt
Conclusion
This post covered how you can use dbt to manage data transformations in Amazon Redshift. As you explore dbt, you will come across other features like
For a hands-on experience with dbt CLI and Amazon Redshift, we have a workshop with step-by-step instructions to help you create your first dbt project and explore the features mentioned in this post—models, macros, seeds, and hooks. Visit
If you have any questions or suggestions, leave your feedback in the comments section. If you need any further assistance to optimize your Amazon Redshift implementation, contact your AWS account team or a trusted AWS partner.
About the authors
Randy Chng is an Analytics Acceleration Lab Solutions Architect at Amazon Web Services. He works with customers to accelerate their Amazon Redshift journey by delivering proof of concepts on key business problems.
Sean Beath is an Analytics Acceleration Lab Solutions Architect at Amazon Web Services. He delivers proof of concepts with customers on Amazon Redshift, helping customers drive analytics value on AWS.