Omni is built with performance in mind at every step of the way, including our intelligent cache. At its core, though, Omni is writing SQL down to your database and returning the results. If you’re seeing slowness, the underlying SQL of your Omni query may need to be tuned for performance.
Here are some steps to take to diagnose slow query performance:
- Optimizing your SQL
  - Pinpointing the bottleneck
  - Fixing the bottleneck:
    - Large volumes of data
    - Missing join conditions
    - Heavy transformations
- Advanced: Storage optimization in your database
- Considering the resources of your database
Below, I’ll share more details on each of these steps.
Quick note: If you’d like to double-check the speed of your query when running it in your database vs. in Omni, you can follow this guide to grab the raw SQL that Omni is running in your database. If your query is running quickly in your database but slowly in Omni, please reach out to the Omni team.
Optimizing your SQL
Pinpointing the bottleneck
Often, we find that queries are held up by one or two elements: a field with a lot of data, a join that fans out the data, a complex transformation, etc.
As a starting point, we recommend removing fields & joins from your query one-by-one, re-running the query, and identifying which element is causing the slowdown. Then, once you have a better sense of what’s slowing down your query, proceed to the next section to learn how to address it.
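If you’re comfortable reading query plans, your database can also help pinpoint the slow step directly. As a rough sketch (Postgres syntax shown; Snowflake, BigQuery, and Redshift have their own EXPLAIN commands and query profiles, and the table and columns here are just illustrative):

```sql
-- Ask the database to run the query and report where time was spent.
-- Note: EXPLAIN ANALYZE actually executes the query; plain EXPLAIN does not.
EXPLAIN ANALYZE
SELECT user_id, COUNT(*) AS event_count
FROM events
WHERE event_timestamp >= DATE '2024-01-01'
GROUP BY user_id;
```

In the resulting plan, look for the steps with the largest row counts or runtimes, such as full table scans or joins that multiply rows; those usually point to the field, join, or transformation worth reworking.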
Fixing the bottleneck
Limit large volumes of data
Even if your database is set up to run large/complex SQL queries, the volume of data that your query is scanning may still slow it down. Some types of tables are particularly prone to huge volumes of data, such as event data and product usage data.
There are a few paths you can take to limit the amount of data your SQL query needs to scan.
- Implement filters on your query: If you don’t already have filters on your query, consider adding them to limit how much data Omni has to scan (and therefore decrease how long the query takes to run). Check out our documentation on filters for more. For example, you could limit your query to only “Active” users or orders in the past year instead of all-time. If you do already have date & time filters on your query, it might be worth making them more restrictive to further limit the volume of data being scanned. (See the SQL sketch after this list.)
- Using templated filters: If you’re querying from a saved SQL view, you can also use Templated Filters to dynamically limit how much data that query has to scan upon being run. This example in our docs might be a helpful starting point.
- Pre-aggregate your data: When looking at large volumes of data, you’re likely looking to aggregate your data in some way. It can be helpful to create “rollup” or “aggregate” tables that pre-aggregate your data and then query from these tables instead of the raw data.
  - For example, if you’re trying to understand customer usage by looking at `events` data, it can be faster to query a pre-aggregated `daily_customer_usage` table that rolls up your `events` data to a per-day, per-customer basis.
  - Typically, this pre-aggregation is done in a data transformation tool like dbt, and it can also be done directly in your data warehouse (such as Snowflake’s Materialized Views or BigQuery’s Materialized Views). That way, these rollup tables will be persisted in your database, and your Omni query can pull directly from those tables, which unlocks faster performance.
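To make the filtering and pre-aggregation suggestions above concrete, here’s a minimal SQL sketch. The `events` table and its columns are hypothetical, and the exact syntax (date arithmetic, CREATE TABLE AS) varies a bit by database:

```sql
-- Filtering: scan only the past year instead of all-time history.
SELECT customer_id, COUNT(*) AS event_count
FROM events
WHERE event_timestamp >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY customer_id;

-- Pre-aggregating: persist a per-day, per-customer rollup so queries
-- read the small rollup table instead of the raw events.
CREATE TABLE daily_customer_usage AS
SELECT
  customer_id,
  CAST(event_timestamp AS DATE) AS usage_date,
  COUNT(*) AS event_count
FROM events
GROUP BY customer_id, CAST(event_timestamp AS DATE);
```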
Resolve missing join conditions
Amidst a busy data modeling session, you might forget to add an additional join condition and thus inadvertently increase the number of rows in your query. For example, if you’re joining two “daily user” tables (one row per day per user), you need to join on both `day` and `user_id`. Leaving out either of these conditions may drastically increase the amount of data you’re scanning, slow down your query, and potentially lead to incorrect results.
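As a sketch, with hypothetical `daily_sessions` and `daily_messages` tables (one row per user per day in each):

```sql
SELECT
  s.user_id,
  s.activity_date,
  s.session_count,
  m.messages_sent
FROM daily_sessions AS s
JOIN daily_messages AS m
  ON s.user_id = m.user_id
  -- Without the next condition, each of a user's days joins to every
  -- other day for that same user, multiplying the row count:
  AND s.activity_date = m.activity_date;
```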
You can check your joins in your model’s Relationships file (docs). Or, if you’re looking at two tables in particular, you can do so directly through the UI.
Write complex transformations down to your database
Some saved SQL queries are slow to run because they’re conducting significant transformations on large volumes of your data. Whenever you run a saved SQL view, Omni is also running that transformation, which can cause the query to be slow.
Using these saved views can be great for ad-hoc analyses. But if you know you’ll be running this query often, it might be worth writing this transformation down to your database. In other words, consider storing this data in its transformed state in your database, rather than doing that transformation in Omni.
You can do that using a transformation tool like dbt (read more about Omni’s integration with dbt here) or directly in your database.
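For illustration, a transformation that a saved SQL view would otherwise run on every query can instead be persisted once in the database (the `order_facts` table and its source tables here are hypothetical):

```sql
-- Persist the transformed result so Omni queries read a precomputed table
-- instead of re-running the aggregation each time.
CREATE TABLE order_facts AS
SELECT
  o.order_id,
  o.customer_id,
  o.created_at,
  SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders AS o
JOIN order_items AS oi
  ON oi.order_id = o.order_id
GROUP BY o.order_id, o.customer_id, o.created_at;
```

In dbt, the same SELECT would live in a model configured to materialize as a table, with dbt handling the scheduling and rebuilds.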
Advanced: Optimize storage in your database
If you’ve considered the steps above and still need to improve your query performance, one option is to utilize advanced configurations in your database. These vary by database, but generally, they implement partitioning, clustering, and indexing techniques that speed up query times. To get you started, here’s how some popular databases recommend optimizing your data storage for performance:
- Snowflake: Optimizing storage for performance | Snowflake Documentation
- BigQuery: Introduction to optimizing query performance | BigQuery | Google Cloud
- Redshift: Tuning Query Performance | Amazon Redshift
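As one illustration of what these configurations look like, here’s BigQuery-style DDL that partitions and clusters a hypothetical events table (Snowflake uses clustering keys and Redshift uses sort and distribution keys, each with their own syntax):

```sql
-- Partition by day and cluster by customer so date- and customer-filtered
-- queries scan only the relevant slices of the table.
CREATE TABLE analytics.events_optimized
PARTITION BY DATE(event_timestamp)
CLUSTER BY customer_id
AS SELECT * FROM analytics.events;
```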
Considering the resources of your database
Running high-volume, high-complexity queries can strain the performance of any database, but some more than others:
- If you’re using a transactional database like Postgres or MySQL: These databases are optimized for fast application performance but often struggle when running large analytical queries. If you anticipate your team needing to query large volumes of data regularly, it may be worthwhile to discuss using an analytical database like Snowflake, BigQuery, or Redshift instead.
- If you’re using an analytical database like Snowflake, BigQuery, or Redshift: Although these databases are specially equipped to run large queries, they still require ample compute power to run your queries quickly. If the number of queries your team is running is growing, or the queries themselves are increasing in complexity, you may need to increase the size of your warehouse or compute quota allotted by your database in order to achieve the performance you want.
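For example, in Snowflake, scaling up compute is a one-line change (the warehouse name here is hypothetical, and note that each size roughly doubles both compute and credit consumption):

```sql
-- Move the warehouse up a size to give queries more compute.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
```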
Takeaways
Evidently, there’s a lot to consider when trying to optimize your query. One final note: sometimes, less is more. A slow, complex query might lead to a nuanced, thoughtful insight, but a simpler, faster query might be able to drive similar insights and actions for your team.
The Omni team is here as you consider these optimization techniques and tradeoffs. Reach out if you’re getting stuck on a stubborn slow query!