Logging and OpenTelemetry in the Grafbase Gateway

The Grafbase Gateway provides logs, traces, and metrics for monitoring gateway operations and errors. By default, it outputs logs to standard output. Additionally, the gateway can send monitoring data to an endpoint that implements the OpenTelemetry protocols.

You can define the level of information by setting the log level command line argument:

    --log <LOG_LEVEL>
        Set the logging level, this applies to all spans, logs and trace events.

        Beware that *only* 'off', 'error', 'warn' and 'info' can be used safely in production. More verbose levels, such as 'debug', will include sensitive information like request variables, responses, etc.

        Possible values are: 'off', 'error', 'warn', 'info', 'debug', 'trace' or a custom string. In the last case, the string is passed on to [`tracing_subscriber::EnvFilter`] as is and is only meant for debugging purposes. No stability guarantee is made on the format.

        [env: GRAFBASE_LOG=]
        [default: info]

This setting affects both traces and logs. The default level is info. The debug and trace levels include sensitive details and should not be used in production.

If you want to silence console logs but still export them, along with traces and metrics, to an OpenTelemetry endpoint, redirect standard output and standard error to /dev/null.

By default, the gateway writes logs to standard output. Logs can appear in three different styles:

    --log-style <LOG_STYLE>
        Set the style of log output

        [env: GRAFBASE_LOG_STYLE=]
        [default: pretty]

        Possible values:
        - pretty: Pretty printed logs, used as the default in the terminal
        - text:   Standard text, used as the default when piping stdout to a file
        - json:   JSON objects

Inside a terminal, the default style is pretty, which produces ANSI-colored, human-friendly output. When standard output is piped to a file, text is used instead. The json style emits each log as a JSON object, which is useful if your logging platform supports structured data.

Logs can also be sent to an OpenTelemetry endpoint by enabling the OpenTelemetry exporter in the configuration:

    [telemetry.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1234"

You can send logs to a different endpoint than the global OpenTelemetry settings:

    [telemetry.logs.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1235"

Read more about OpenTelemetry options in the configuration section.

Traces describe the request lifecycle. The gateway sends trace data to the OpenTelemetry endpoint starting at the info log level. You can define some important settings for traces:

    [telemetry.tracing]
    sampling = 1
    parent_based_sampler = false

The sampling setting defines the fraction of requests traced, as a floating-point number from 0 to 1. For testing purposes or with low traffic, set this to 1. If you expect high traffic, sampling every request can become expensive in terms of network, CPU, and storage. The default value is 0.15, which samples 15% of all requests.

The parent_based_sampler option enables the parent-based sampler mechanism. When it is enabled, the gateway looks at the request headers to make trace sampling decisions, and falls back to its default sampling strategy when the request does not specify one. It is disabled by default and should only be enabled if you control all the clients, as malicious actors could otherwise create more load by manipulating sampling.
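The two settings above can be combined. The following is a minimal sketch for a higher-traffic deployment in which every client is trusted; the 0.05 rate is an illustrative assumption, not a recommended value.

```toml
[telemetry.tracing]
# Trace roughly 5% of requests (illustrative value; the default is 0.15).
sampling = 0.05
# Follow the sampling decision carried in incoming request headers and fall
# back to the rate above when a request carries none.
# Leave this at false unless you control all the clients.
parent_based_sampler = true
```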

The collect options define limits per span:

    [telemetry.tracing.collect]
    max_events_per_span = 128
    max_attributes_per_span = 128
    max_links_per_span = 128
    max_attributes_per_event = 128
    max_attributes_per_link = 128
  • max_events_per_span: Maximum number of events recorded per span (default: 128)
  • max_attributes_per_span: Maximum number of attributes recorded per span (default: 128)
  • max_links_per_span: Maximum number of links recorded per span (default: 128)
  • max_attributes_per_event: Maximum number of attributes one event can have (default: 128)
  • max_attributes_per_link: Maximum number of attributes one link can have (default: 128)

The propagation options define how the tracing context (trace id, parent span id, and extra context) is propagated, both when the gateway receives it in requests and when it passes it down to subgraphs. Multiple common standards are supported. If you need support for additional formats, please contact us.

    [telemetry.tracing.propagation]
    trace_context = true
    baggage = true
    aws_xray = false
  • trace_context: Enable TraceContext propagation through the traceparent header. This is the standard trace parent propagation mechanism in OpenTelemetry. Default: false.
  • baggage: Enable Baggage context propagation through the baggage header. This is the standard context propagation mechanism in OpenTelemetry. Default: false.
  • aws_xray: Enable AWS X-Ray propagation through the x-amzn-trace-id header. This is the builtin trace propagation mechanism in AWS X-Ray.
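For orientation, the sketch below repeats the propagation settings and notes in comments what each header typically looks like on the wire. The header values are the illustrative examples commonly shown in the W3C Trace Context, W3C Baggage, and AWS X-Ray documentation, not output captured from the gateway.

```toml
[telemetry.tracing.propagation]
# traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#              (version, trace id, parent span id, trace flags)
trace_context = true
# baggage: userId=alice,isProduction=false
baggage = true
# x-amzn-trace-id: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
aws_xray = false
```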

Traces can be sent to an OpenTelemetry endpoint by enabling the OpenTelemetry exporter in the configuration:

    [telemetry.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1234"

You can also send traces to a different endpoint:

    [telemetry.tracing.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1235"

Read more about OpenTelemetry options in the configuration section.

Enable spans to write directly to standard output by turning on the global stdout exporter, which helps during evaluation and debugging:

    [telemetry.exporters.stdout]
    enabled = true

You can enable it only for tracing as well:

    [telemetry.tracing.exporters.stdout]
    enabled = true

The Grafbase Gateway delivers metrics for requests and operations to an OpenTelemetry endpoint. Metrics include counters, histograms, and gauges at various points in the system.

Enable the OpenTelemetry exporter in the configuration to send metrics to an OpenTelemetry endpoint:

    [telemetry.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1234"

You can send metrics to a separate endpoint as well:

    [telemetry.metrics.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1235"

Read more about OpenTelemetry options in the configuration section.

You can also write metrics directly to standard output by enabling the global stdout exporter for evaluation and debugging:

    [telemetry.exporters.stdout]
    enabled = true

Enable it only for metrics if needed:

    [telemetry.metrics.exporters.stdout]
    enabled = true

The exponential histograms include a Count field, so any histogram doubles as a counter metric. If you can't find a specific counter, check if any of the histograms can serve that purpose.

Metric Name: http.server.request.duration

This exponential histogram measures the time in milliseconds for each HTTP request and helps you track the final response time for those requests. It includes the following attributes:

  • http.response.status_code: The HTTP status code.
  • http.request.method: The HTTP request method.
  • http.route: The request path.
  • network.protocol.version: The HTTP version of the request.
  • server.address: The server's listen address.
  • server.port: The server's listen port.
  • url.scheme: Either http or https, depending on whether TLS is enabled in the gateway.
  • http.headers.x-grafbase-client-name: The name of the client that triggered this request, if available.
  • http.headers.x-grafbase-client-version: The version of the client that triggered this request, if available.
  • graphql.response.status: Indicates whether the underlying GraphQL operation succeeded, if available.

Metric Name: http.server.connected.clients

This up/down counter tracks currently connected clients, incrementing on an incoming request and decrementing upon any response.

Metric Name: http.server.request.body.size

This exponential histogram measures request body sizes.

Metric Name: http.server.response.body.size

This exponential histogram measures response body sizes.

Metric Name: graphql.operation.duration

This exponential histogram measures the time in milliseconds for every valid operation in the GraphQL engine. The metric includes the following attributes:

  • graphql.document: The normalized query of this operation, stripped of all variables. This value cannot contain any private data.
  • graphql.operation.type: The type of the operation (either query, mutation, or subscription).
  • graphql.operation.name: The name of the operation, if provided.
  • graphql.response.status: Indicates if the response succeeded.
  • http.headers.x-grafbase-client-name: The name of the client that triggered this request, if available.
  • http.headers.x-grafbase-client-version: The version of the client that triggered this request, if available.

Metric Name: graphql.operation.errors

This counter tracks distinct GraphQL errors per request. The metric contains the following attributes:

  • graphql.response.error.code: The error code returned to the user.
  • graphql.operation.name: The name of the operation, if present.
  • http.headers.x-grafbase-client-name: The name of the client, if present.
  • http.headers.x-grafbase-client-version: The version of the client, if present.

Metric Name: graphql.operation.batch.size

This exponential histogram measures the number of requests in each batch sent to the engine. Its count gives the total number of batches, while the recorded values give the size of each batch.

Metric Name: graphql.subgraph.request.duration

This exponential histogram measures the time in milliseconds for every subgraph request. It helps track execution time and includes the following attributes:

  • graphql.subgraph.name: The requested subgraph's name.
  • graphql.subgraph.response.status: Indicates if the response succeeded.
  • http.response.status_code: The HTTP status code.

Metric Name: graphql.subgraph.request.retries

This counter tracks retried subgraph requests. To enable this counter, you must enable retries. The counter increments when a subgraph request fails and the engine retries it. The metric includes the following attributes:

  • graphql.subgraph.name: The requested subgraph's name.
  • graphql.subgraph.aborted: Indicates if the retries stopped and if the request became an error.

Metric Name: graphql.subgraph.request.body.size

This exponential histogram measures subgraph request body sizes in bytes. The metric includes the following attribute:

  • graphql.subgraph.name: The requested subgraph's name.

Metric Name: graphql.subgraph.response.body.size

This exponential histogram measures successful subgraph response body sizes in bytes. The metric includes the following attribute:

  • graphql.subgraph.name: The requested subgraph's name.

Metric Name: graphql.subgraph.request.inflight

This up/down counter tracks in-flight subgraph requests. It increments when requesting a subgraph and decrements upon any response. The metric includes the following attribute:

  • graphql.subgraph.name: The requested subgraph's name.

Metric Name: graphql.subgraph.request.cache.hit

This counter tracks hits of subgraph entity caches. Enable this counter by activating entity caching. The metric includes the following attribute:

  • graphql.subgraph.name: The requested subgraph's name.

Metric Name: graphql.subgraph.request.cache.miss

This counter tracks misses of subgraph entity caches. Enable this counter by activating entity caching. The metric includes the following attribute:

  • graphql.subgraph.name: The requested subgraph's name.

Metric Name: graphql.operation.cache.hit

This counter tracks hits for operation plan caches.

Metric Name: graphql.operation.cache.miss

This counter tracks misses for operation plan caches.

Metric Name: graphql.operation.prepare.duration

This exponential histogram measures the time in milliseconds taken to prepare an operation. This includes:

  • Fetching a trusted document, if enabled and available.
  • Fetching a query plan from the in-memory cache.
  • If the plan is not cached, parsing the query into an AST and then determining the plan.

The metric includes the following attributes:

  • graphql.operation.name: The name of the operation, if present.
  • graphql.document: The normalized operation if parsing succeeds.
  • graphql.operation.success: Indicates if the preparation finished successfully.

Metric Name: grafbase.hook.duration

This exponential histogram measures the time in milliseconds taken to execute a hook. The metric includes the following attributes:

  • grafbase.name.hook: The name of the hook function.
  • grafbase.hook.status: Indicates if the hook call succeeded (SUCCESS), or if it failed due to errors from Grafbase code (HOST_ERROR), or from user code (GUEST_ERROR).

Metric Name: grafbase.gateway.access_log.pending

This counter measures the number of access log events not yet written to the access log file. Read more on access logs.

Metric Name: grafbase.gateway.rate_limit.duration

This exponential histogram measures the time in milliseconds taken to query the current request rate from Redis. This metric requires Redis-based rate limiting to be enabled.

Metric Name: gdn.request.duration

This exponential histogram measures the time in milliseconds to fetch a graph from the Graph Delivery Network. This metric only activates in hybrid mode. The metric includes the following attributes:

  • server.address: The Graph Delivery Network endpoint URL.
  • gdn.response.kind: The response status kind, either new, unchanged, http_error, or gdn_error.
  • http.response.status_code: The status code of the request.

Define OpenTelemetry settings in the telemetry block of your Gateway configuration:

    [telemetry]
    service_name = "grafbase-gateway"

The service_name appears in all traces, metrics, and logs and should be unique in your system.

Grafbase sends a standard set of resource attributes for every user. You can also define your own attributes, available in all logs, traces, and metrics:

    [telemetry.resource_attributes]
    custom_key = "custom_value"
    other_key = "other_value"

You can define exporter settings globally for traces, logs, and metrics. If you need different settings for logs, tracing, or metrics, prefix the exporter settings with the appropriate word. For instance, custom settings for tracing use the key telemetry.tracing.exporters.otlp.
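As a sketch of the prefixing rule described above, the configuration below defines a global OTLP exporter and a tracing-specific one. It assumes that logs and metrics keep using the global exporter while traces use the prefixed one; the endpoints are the placeholder values used elsewhere on this page.

```toml
# Global exporter: used for logs, traces, and metrics unless overridden.
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"

# Tracing-specific exporter: custom settings that apply to traces only.
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://localhost:1235"
```

The same pattern applies to telemetry.logs.exporters.otlp and telemetry.metrics.exporters.otlp.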

Traces and metrics can also be sent to standard output (logs are always written there):

    [telemetry.exporters.stdout]
    enabled = true
    timeout = 60
  • enabled: Enables the stdout exporter (default: false).
  • timeout: Time in seconds data remains in memory if the collector does not collect it promptly (default: 60).

Send traces, metrics, and logs to an external OpenTelemetry collector:

    [telemetry.exporters.otlp]
    enabled = true
    endpoint = "http://localhost:1234"
    protocol = "grpc"
    timeout = 60
  • enabled: Enables the OpenTelemetry exporter (default: false).
  • endpoint: Defines the URL for the OpenTelemetry collector.
  • protocol: Either grpc or http (default: grpc).
  • timeout: Time in seconds data remains in memory if the collector does not collect it promptly (default: 60).

Avoid triggering a request for every single span, trace, and metric event. Instead, batch requests and send data at regular intervals. Configure the OpenTelemetry batch settings:

    [telemetry.exporters.otlp.batch_export]
    scheduled_delay = 5
    max_queue_size = 2048
    max_export_batch_size = 512
    max_concurrent_exports = 1
  • scheduled_delay: Time in seconds between consecutive requests (default: 5).
  • max_queue_size: Maximum queued items for delayed processing. If the queue fills, the system drops events (default: 2048).
  • max_export_batch_size: Maximum number of events in a single batch. If more events are collected before the scheduled delay, it queues them (default: 512).
  • max_concurrent_exports: Number of concurrent senders processing batches (default: 1).
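To illustrate how these settings relate to each other, here is a sketch of a configuration for a busier gateway. All values are illustrative assumptions rather than recommendations. The comment on batch size reflects the OpenTelemetry SDK specification for batch processors; whether the gateway enforces the same constraint is not stated here.

```toml
[telemetry.exporters.otlp.batch_export]
# Export less often, trading freshness for fewer requests to the collector.
scheduled_delay = 10
# A larger queue drops fewer events during bursts, at the cost of memory.
max_queue_size = 8192
# Keep this at or below max_queue_size, as the OpenTelemetry SDK
# specification requires for batch processors.
max_export_batch_size = 1024
# Allow two batches to be in flight at the same time.
max_concurrent_exports = 2
```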

If using grpc as the protocol, the Gateway will use the following settings.

For collectors using TLS with a custom certificate, specify the TLS settings:

    [telemetry.exporters.otlp.grpc.tls]
    domain_name = "custom_name"
    key = "/path/to/key.pem"
    cert = "/path/to/cert.pem"
    ca = "/path/to/ca.crt"
  • domain_name: The domain name against which to verify the server's TLS certificate.
  • key: Path to the secret key.
  • cert: Path to the X509 certificate file in PEM format.
  • ca: Path to the X509 CA certificate file in PEM format.

If needed, define custom headers for gRPC collectors:

    [[telemetry.exporters.otlp.grpc.headers]]
    authorization = "Bearer {{ env.GRPC_TOKEN }}"

    [[telemetry.exporters.otlp.grpc.headers]]
    custom = "static value"
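Put together, a gRPC exporter with TLS and an authorization header might look like the sketch below. The collector hostname is hypothetical, and the https scheme and port 4317 (the conventional OTLP gRPC port) are assumptions; the file paths are the placeholders used above.

```toml
[telemetry.exporters.otlp]
enabled = true
# Hypothetical collector address.
endpoint = "https://otel-collector.example.com:4317"
protocol = "grpc"

[telemetry.exporters.otlp.grpc.tls]
# Name to verify the collector's certificate against.
domain_name = "otel-collector.example.com"
key = "/path/to/key.pem"
cert = "/path/to/cert.pem"
ca = "/path/to/ca.crt"

# The token is read from the GRPC_TOKEN environment variable.
[[telemetry.exporters.otlp.grpc.headers]]
authorization = "Bearer {{ env.GRPC_TOKEN }}"
```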

If you set the protocol to http, the Gateway will use the following settings. Define custom headers to send with every request:

    [[telemetry.exporters.otlp.http.headers]]
    authorization = "Bearer {{ env.GRPC_TOKEN }}"

    [[telemetry.exporters.otlp.http.headers]]
    custom = "static value"

Currently, the http exporter does not support TLS. If you need TLS, use the grpc exporter.
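For comparison, a complete HTTP exporter configuration could be sketched as follows. It combines only settings described on this page and uses a plain http endpoint, given the TLS limitation above; the endpoint is the placeholder value used earlier.

```toml
[telemetry.exporters.otlp]
enabled = true
endpoint = "http://localhost:1234"
# Switch from the default grpc protocol to http.
protocol = "http"
timeout = 60

# Sent with every export request.
[[telemetry.exporters.otlp.http.headers]]
authorization = "Bearer {{ env.GRPC_TOKEN }}"
```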