I have tables in my DWH that have different levels of aggregation. How can I use Aggregate Awareness to query dynamically?

Let’s say you have a large table of orders for your ecommerce store.

Sometimes you need to access the granular level of detail of these orders, eg. to see details about orders that are currently out for Delivery

Other times, you just want a high level overview of your sales behaviour, eg. Revenue per Day. In order for these queries to run faster and cheaper, you want to direct them to an aggregated table of daily sales in your DWH. This should happen automatically behind the scenes - your end users should be able to interact with Omni through a single pane of glass while Omni decides which table to get the data from.

Omni’s “Bring Your Own Aggregate Awareness” does just this. Let’s break down that name:

  • Aggregate Awareness = knowing what level of aggregation a query is at, and what tables are stored at different levels of aggregation in the DWH, and dynamically querying based on that
  • Bring Your Own = you can create these tables yourself using your transformation tool of choice, rather than relying on your BI tool to have to create these tables for you

Omni will have already scanned your DWH and created a view file for the granular table and the aggregate table.

Firstly, you can create your granular measures as normal - either using the UI or in code. The code might look something like this:

dimensions:
  order_items.created_at: {}
  sale_price: {}
  user_id: {}
measure:
  count:
    aggregate_type: count
  sale_price_sum:
    aggregate_type: sum
    sql: ${sale_price}
  user_id_count_distinct:
    aggregate_type: count_distinct
    sql: ${user_id}

Now navigate to the aggregate table’s view file. Omni will have auto-generated dimensions for the aggregated fields

dimensions:
  date: {}
  total_sale_price: {}
  order_items_count: {}
  user_distinct_count: {}

At the bottom of the file, you define the materialized_query. This tells Omni which fields from the granular table are represented in the aggregated table. This association happens by ordering, here we are telling Omni that the 3rd value in the aggregated table (order_items_count) is an aggregate of the 3rd value in fields (order_items.count).

materialized_query:
  fields:
    [
      order_items.created_at,
      order_items.sale_price_sum,
      order_items.count,
      order_items.user_id_count_distinct
    ]
  base_view: order_items

And that’s it! Now Omni knows that for any queries on these fields, it should try to get the answer from the aggregate table before it goes to the granular table. For your end users, the experience is the exact same. They don’t have to worry about the complexities of the underlying data structure. They’ll just notice that some of their queries are running faster and cheaper than before!

If you look under the hood of some of these queries in the SQL inspector, you will see Omni telling you what’s happening

-- Query rewritten to use materialized view "daily_sales".
-- The original unoptimized SQL is commented-out below
-- SELECT DATE_TRUNC('DAY', "created_at") AS "order_items.created_at[date]__raw",
--     COALESCE(SUM("sale_price"), 0) AS "order_items.sale_price_sum",
--     TO_CHAR(DATE_TRUNC('DAY', "created_at"), 'YYYY-MM-DD') AS "order_items.created_at[date]"
-- FROM "order_items"
-- GROUP BY 1
-- ORDER BY 1 NULLS FIRST
-- LIMIT 999

SELECT DATE_TRUNC('DAY', "date") AS "order_items.created_at[date]__raw",
    COALESCE(SUM(CAST("total_sale_price" AS DOUBLE PRECISION)), 0) AS "order_items.sale_price_sum",
    TO_CHAR(DATE_TRUNC('DAY', "date"), 'YYYY-MM-DD') AS "order_items.created_at[date]"
FROM (  SELECT 
      date,
      total_sale_price,
      order_items_count,
      users_distinct_count
  FROM "daily_orders"
    ) AS "t"
GROUP BY 1
ORDER BY 1 NULLS FIRST
LIMIT 999

You can see in this case that Omni has gotten this results set from the aggregated daily_orders table instead of the granular order_items table.

If you’re expecting to see this SQL being optimized, but you aren’t, there’s a few things that might be happening:

  • You might be querying a field that isn’t in your aggregate table
  • You might be running a query that’s not compatible with aggregate awareness (eg. count distinct over a level of aggregation that isn’t the same as the aggregate table)
  • Your query might be hitting cache instead of the DWH, in which case we don’t need to rewrite the dialect SQL because it isn’t being run anyway

If you have any questions on the above then please don’t hesitate to reach out!