Are you curious about how to implement realtime analytics charts like the ones used in the Grafbase Dashboard?
In this blog post, we'll give you an insider's perspective on how we offer our users realtime insights into their API usage. We'll explore how you can build similar functionality in your SaaS application using Tinybird, a ClickHouse-based service for building realtime data APIs. By the end of this article, you'll have a clear understanding of how to create a personalized realtime analytics dashboard for your users.
The Grafbase dashboard showing realtime API performance metrics
At Grafbase, our primary focus has been on enhancing the developer experience by seamlessly integrating multiple APIs into a single GraphQL endpoint. However, once your endpoint is deployed to production, it becomes crucial to gather analytics on usage and API performance.
To address this need, we have implemented various charts on the project dashboard with more detailed analytics available in the "analytics" tab. These metrics include incoming requests, 99th percentile CPU usage, storage utilization, and database read/write units.
Before we started building the analytics dashboards, we explored different services and identified several requirements:
- Data will come from different sources and in different forms: primarily from our serverless, globally distributed backend, but also from infrastructure providers like AWS and Cloudflare. Some of the data is raw logs, and some is events emitted by our backend.
- Time series data and flexible computations: We need to compute metrics like means and percentiles over sliding windows and perform aggregations at different time resolutions, such as 5-minute, hourly, daily, and weekly.
- Realtime and low-latency: Our solution should offer realtime data with minimal latency. When customers make changes, the results should be immediately visible.
During our exploration, we looked at ClickHouse, an analytical (column-oriented) data warehouse known for its ability to deliver sub-second query responses on massive datasets while supporting standard SQL functionality like window functions and joins. One of the use cases ClickHouse promotes is externally facing dashboards, making it a promising fit for our needs.
However, we didn't want the burden of hosting and managing ClickHouse ourselves. Luckily, we found Tinybird, a managed serverless ClickHouse offering that greatly improves the developer experience. One of the standout features of Tinybird is its ability to turn any SQL query into a REST endpoint using its templating language, simplifying the creation of a data serving backend with minimal code.
In the rest of this article, we'll show you how to use Tinybird to create an API endpoint that returns realtime, per-user data that you can use to serve realtime metrics to your users.
At Grafbase, we have two types of data flowing into Tinybird:
- Events from the backend and clients: We use RudderStack, an open-source Customer Data Platform (CDP) similar to Segment, to process events and send them to storage. Tinybird natively supports RudderStack, so integrating it is just a matter of selecting Tinybird as a destination in RudderStack.
- Logs from infrastructure providers: To import logs from providers like Cloudflare, we set up AWS Lambda functions triggered by S3 notifications when logs are written to S3. These Lambdas then send the logs to Tinybird via the Tinybird Events API, as sketched below.
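To make that forwarding step concrete, here is a minimal sketch of sending rows to the Tinybird Events API with `fetch`. It is an illustration rather than our production code: the `log` data source name and the `TINYBIRD_TOKEN` environment variable are assumptions for the example, and in practice you would add batching and retries.

```typescript
// Minimal sketch: forward parsed log rows to the Tinybird Events API.
// The "log" data source name and TINYBIRD_TOKEN env variable are assumptions.
type LogRow = {
  timestamp: string; // ISO 8601, e.g. "2023-05-01T12:00:00Z"
  account_id: string;
  project_id: string;
  cpu: number; // CPU time spent serving the request
};

export async function sendToTinybird(rows: LogRow[]): Promise<void> {
  // The Events API expects NDJSON: one JSON object per line.
  const body = rows.map((row) => JSON.stringify(row)).join("\n");

  const response = await fetch("https://api.tinybird.co/v0/events?name=log", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.TINYBIRD_TOKEN}` },
    body,
  });

  if (!response.ok) {
    throw new Error(`Tinybird ingestion failed: ${response.status}`);
  }
}
```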
For your own use case, you will have to look at which ingestion method suits your infrastructure best; Tinybird supports a wide range of options. Whichever method you choose, you should end up with a Tinybird data source containing your raw data that updates either on new events or at a fixed time interval.
Tinybird Data flow view: a data source with raw logs data
Once the data is flowing into Tinybird, each dataset becomes visible as a data source. While querying these data sources directly might be sufficient for rapid development, we aimed to prepare for scalability from the start. Therefore, we decided to materialize each data source into aggregated datasets at the daily and hourly level.
Materializations in Tinybird are powerful. You create a materialized dataset (materialized pipe in Tinybird terms) by executing an SQL query against a dataset. Whenever a new row is added to the original dataset, the query reruns on the new row, updating the materialized dataset. This process is incredibly fast, enabling realtime transformations using a single SQL query that works on both historical and future data.
Tinybird Data flow view: the logs are materialized into an hourly and daily aggregation. When a new row is added to the log dataset, both aggregations are updated
Below is the `log_hourly_materialization` query (pipe) from the data flow above, which aggregates log data into an hourly intermediate aggregation. In this case we have SaaS software that is metered by "requests", where the average, 99th percentile, and maximum CPU time are meaningful metrics that users want to track. Users are identified by an `account_id`, and every account can have multiple projects designated by a `project_id`.
SELECT
toStartOfHour(timestamp) bucket,
account_id account_id,
project_id project_id,
countState() request_count,
avgState(cpu) avg_cpu,
maxSimpleState(cpu) max_cpu,
quantilesState(0.99)(cpu) quantiles_cpu
FROM
log
GROUP BY
bucket,
account_id,
project_id
If you are new to ClickHouse, the first thing you will notice is that instead of the `count()` or `sum()` functions we normally use in SQL, we use functions suffixed with `-State` or `-SimpleState`, like `avgState()`, `maxSimpleState()` and `quantilesState()`. These produce ClickHouse intermediate aggregation states, which let us pre-aggregate data while deciding the final grouping level in a later query.
To see where this is useful, imagine that we did not do the intermediate aggregation and instead used `avg()`, `max()` and `quantiles()` to persist an aggregation grouped by `project_id`, `account_id` and `bucket` in a new materialized table. We then want to query this table, but this time we are interested in the aggregation by `bucket` (hour) and `account_id` only. A pretty common situation here would be that we want to show account-wide statistics for a user, including data from all of their projects.
Equivalent SQL would be:
WITH aggregated_logs AS (
SELECT
toStartOfHour(timestamp) bucket,
account_id account_id,
project_id project_id,
count() request_count,
max(cpu) max_cpu,
avg(cpu) avg_cpu,
quantiles(0.99)(cpu) quantiles_cpu
FROM
log
GROUP BY
bucket,
account_id,
project_id
)
SELECT
bucket,
account_id,
max(max_cpu) AS max_cpu,
avg(avg_cpu) AS avg_cpu,
quantiles(0.99)(quantiles_cpu) AS p99_cpu
FROM
aggregated_logs
GROUP BY
bucket,
account_id
For `max()` there are no issues. The max for each combination of `bucket` and `account_id` will be the same as taking `max()` over the values already aggregated per `bucket`, `account_id`, and `project_id`.
For `avg()` it is problematic, since the average of the averages of all subgroups is not equal to the average over all requests. We can, however, work around this by instead aggregating the sum of all CPU measurements as well as the count of all requests, like this:
WITH aggregated_logs AS (
SELECT
toStartOfHour(timestamp) bucket,
account_id account_id,
project_id project_id,
count() request_count,
sum(cpu) sum_cpu,
max(cpu) max_cpu,
avg(cpu) avg_cpu,
quantiles(0.99)(cpu) quantiles_cpu
FROM
log
GROUP BY
bucket,
account_id,
project_id
)
SELECT
bucket,
account_id,
max(max_cpu) AS max_cpu,
sum(sum_cpu) / sum(request_count) AS avg_cpu,
quantiles(0.99)(quantiles_cpu) AS p99_cpu
FROM
aggregated_logs
GROUP BY
bucket,
account_id
This still leaves us with another problem, however: there is no way to calculate a correct p99 CPU metric. Using `quantiles(0.99)(quantiles_cpu)` would not work, as we would be calculating a p99 quantile of each project's p99 quantile. It is the same issue as with `avg()` earlier, but this time there is no workaround: it is simply impossible to calculate exact quantiles without access to all the underlying observations.
This is where ClickHouse's intermediate aggregation functionality comes in handy. It lets us aggregate to any combination of subgroups later on, while letting ClickHouse handle everything behind the scenes.
When should we use `-State()` and when should we prefer `-SimpleState()`? Most functions allow either suffix, but `-SimpleState()` should be preferred as it is faster. In many cases, however, `-SimpleState()` doesn't work and you have to use the `-State()` version. Tinybird will tell you when that is the case, so when in doubt, use `-SimpleState()` until you get an error.
Now that we have the data materialized into intermediate aggregations at different time resolutions, the last step is to create the query (pipe) that we will convert into an API endpoint. The endpoint is then queried from our frontend or backend and returns the data we want to show in the chart.
Our API should take the following input:
- The id of a project
- A start datetime and an end datetime: the time span we want data for
- A time aggregation level (hourly, daily, weekly, monthly). If not provided, we default to daily
It should return a time series with datetime and metrics in JSON format to chart on the admin dashboard. Below is the full code for the endpoint:
%
{% if not defined(project_id) %}
{{ error('project_id (String) query param is required') }}
{% end %}
{% if not defined(start_date) %}
{{ error('start_date (DateTime) query param is required') }}
{% end %}
{% if not defined(end_date) %}
{{ error('end_date (DateTime) query param is required') }}
{% end %}
SELECT
{% if period == 'daily' or period == 'hourly' or not defined(period) %}
bucket
{% elif period == 'weekly' %}
toStartOfWeek(bucket)
{% elif period == 'monthly' %}
toStartOfMonth(bucket)
{% end %} bucket,
project_id,
countMerge(request_count) count_requests,
avgMerge(avg_cpu) cpu_avg,
max(max_cpu) cpu_max,
quantilesMerge(0.99)(quantiles_cpu) cpu_p99
{% if period == 'hourly' %}
FROM hourly_log_materialization
{% else %}
FROM daily_log_materialization
{% end %}
WHERE bucket >= {{DateTime(start_date)}}
AND bucket < {{DateTime(end_date)}}
AND project_id = {{String(project_id)}}
GROUP BY project_id, bucket
Now let's walk through each part of the code and see what it does.
%
Every Tinybird pipe that uses the templating language must start with a `%`. It tells Tinybird to interpret everything between `{%` and `%}` as blocks that contain instructions to the templating engine.
{% if not defined(project_id) %}
{{ error('project_id (String) query param is required') }}
{% end %}
{% if not defined(start_date) %}
{{ error('start_date (DateTime) query param is required') }}
{% end %}
{% if not defined(end_date) %}
{{ error('end_date (DateTime) query param is required') }}
{% end %}
We require `project_id`, `start_date` and `end_date` to be provided in the API request. If any of them is missing, we throw an error explaining what is required.
SELECT
{% if period == 'daily' or period == 'hourly' or not defined(period) %}
bucket
{% elif period == 'weekly' %}
toStartOfWeek(bucket)
{% elif period == 'monthly' %}
toStartOfMonth(bucket)
{% end %} bucket,
This handles the special case where we want to show the data aggregated into weekly or monthly bars.
project_id,
countMerge(request_count) count_requests,
avgMerge(avg_cpu) cpu_avg,
max(max_cpu) cpu_max,
quantilesMerge(0.99)(quantiles_cpu) cpu_p99
This is where we turn the intermediate aggregation states into actual results. Remember that we created them earlier using two kinds of functions, `-State()` and `-SimpleState()`? Let's look back at the code that created the intermediate aggregations. They were created like this:
countState() request_count,
avgState(cpu) avg_cpu,
maxSimpleState(cpu) max_cpu,
quantilesState(0.99)(cpu) quantiles_cpu
Do you see the pattern? `-State()` intermediate aggregations are finalized with the corresponding `-Merge()` function, while for `-SimpleState()` we use the plain function name without any `-Merge` suffix. This can be a bit confusing if you are not familiar with the difference between them.
So, in the example above, we created the intermediate aggregation with `countState()` and compute the final aggregation with `countMerge()`, while for `maxSimpleState()` we compute the final aggregation with plain `max()`.
{% if period == 'hourly' %}
FROM hourly_log_materialization
{% else %}
FROM daily_log_materialization
{% end %}
Here we choose between the two materialized views we created earlier, depending on whether we want to show hourly or daily resolution in the chart.
WHERE bucket >= {{DateTime(start_date)}}
AND bucket < {{DateTime(end_date)}}
AND project_id = {{String(project_id)}}
Finally, we filter the data to return only the time period we want to show in the chart and only the `project_id` we are interested in.
Turning the above query into an endpoint is as easy as creating it as a Tinybird pipe and then clicking the “Create API Endpoint” button in the top right corner.
Create API endpoint button
This will create the endpoint and give us the URL where we can access it. You can then use the Sample usage snippet at the bottom of the page to call the endpoint and verify that it returns your data.
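For reference, here is a rough sketch of how such an endpoint could be queried from your backend to feed the chart. The pipe name `project_metrics` and the `TINYBIRD_READ_TOKEN` environment variable are assumptions for this example; Tinybird also accepts the token in an Authorization header instead of a query parameter.

```typescript
// Sketch of calling the published Tinybird endpoint for chart data.
// "project_metrics" is an assumed pipe name; replace it with your own.
type MetricsRow = {
  bucket: string;
  project_id: string;
  count_requests: number;
  cpu_avg: number;
  cpu_max: number;
  cpu_p99: number[]; // quantilesMerge returns an array, here with one value
};

export async function fetchProjectMetrics(
  projectId: string,
  startDate: string, // "YYYY-MM-DD HH:MM:SS"
  endDate: string,
  period: "hourly" | "daily" | "weekly" | "monthly" = "daily",
): Promise<MetricsRow[]> {
  const url = new URL("https://api.tinybird.co/v0/pipes/project_metrics.json");
  url.searchParams.set("token", process.env.TINYBIRD_READ_TOKEN ?? "");
  url.searchParams.set("project_id", projectId);
  url.searchParams.set("start_date", startDate);
  url.searchParams.set("end_date", endDate);
  url.searchParams.set("period", period);

  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Tinybird endpoint returned ${response.status}`);
  }

  // Tinybird wraps the result rows in a "data" field alongside query metadata.
  const { data } = (await response.json()) as { data: MetricsRow[] };
  return data;
}
```

The returned rows map directly onto chart points: `bucket` becomes the x-axis and the metric columns become the series.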
Summary
In conclusion, we have:
- Set up ingestion of data into Tinybird.
- Leveraged materializations to efficiently process our data into hourly and daily intermediate aggregations.
- Used the templating system to create a REST API endpoint that can be used to feed data into a SaaS app dashboard. The endpoint is called with a time period, a project_id and a time resolution and will return the corresponding time series metrics as JSON.
P.S. If you are building on Grafbase, you can add a Tinybird endpoint to your schema and get the data directly through your GraphQL API. Even if you are not using Grafbase today, this can be interesting if you prefer consuming the data as a GraphQL API instead of Tinybird's standard REST API.