Manage metrics

Manage Aura metrics

Learn what Aura metrics are, how they are generated and stored in Prometheus, and how to analyze them through Grafana.

Introduction

Metrics measure data that represents a specific aspect of the monitored system at a point in time and offer an aggregated view of the system. They are useful for visualizing long-term trends and for alerting on log data.

Each Aura component is in charge of publishing its own metrics, which are typically generated at fixed-time intervals from aggregated logs.

Once generated, Aura metrics are polled by Prometheus, which is in charge of gathering and exposing them.

Grafana is the most suitable tool to represent metrics through different dashboards. Each component has its own Grafana dashboard that shows its current behavior, and a single dashboard provides an Aura overview.

If you think a new metric could be useful, please contact the Aura Platform Team, so it can be officially included as part of the platform.

The aim of this section is to explain both how Aura metrics work and all the metrics stored by each component.

⚠️ Saved dashboards, visualizations and queries are not guaranteed to be kept between upgrades, because the whole stack, including ElasticSearch and Grafana, can be upgraded to newer versions.

Prometheus

The Aura metrics system is based on Prometheus, a Cloud Native Computing Foundation project for monitoring systems and services. Prometheus collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The prom-client library is used to implement Prometheus functionality in Node.js.

The Prometheus service polls every component to get the metrics generated during the last time period. Every component has a private endpoint (not accessible from the Internet) called /metrics, from which Prometheus requests the metrics.
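
To illustrate what a /metrics endpoint returns, the following is a minimal sketch in Python that uses only the standard library. It is not how the Aura components are implemented (they rely on prom-client in Node.js). The sample values, the port and the helper names are assumptions made for this example, and label values are not escaped.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory samples: (metric name, labels, value).
SAMPLES = [
    ("http_request_duration_seconds_count", {"method": "GET", "path": "/health"}, 12),
    ("http_request_duration_seconds_sum", {"method": "GET", "path": "/health"}, 0.84),
]


def render_metrics(samples):
    """Render samples as lines of the Prometheus text exposition format."""
    lines = []
    for name, labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the /metrics path is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Serve /metrics on the loopback interface until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and requesting http://127.0.0.1:9100/metrics would return one line per sample, which is the kind of payload Prometheus reads on every poll.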

Currently, the metric types used in this component are:

  • Summary: similar to histogram metrics, it samples observations (such as request durations and response sizes). Besides providing a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

  • Counter: cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

  • Gauge: similar to Counter, but it represents a single numerical value that can arbitrarily go up and down.
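
The differences between the three metric types can be sketched with a few small Python classes. This is a simplified model written for illustration only, not the prom-client API: in particular, the Summary below computes quantiles over every stored observation instead of a sliding time window.

```python
import math


class Counter:
    """Monotonically increasing value; it only resets to zero on restart."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single numerical value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Summary:
    """Tracks the count, the sum and quantiles of the observed values."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    @property
    def count(self):
        return len(self.observations)

    @property
    def total(self):
        return sum(self.observations)

    def quantile(self, q):
        # Nearest-rank quantile over all stored observations (simplification).
        if not self.observations:
            raise ValueError("no observations yet")
        ordered = sorted(self.observations)
        index = max(0, math.ceil(q * len(ordered)) - 1)
        return ordered[index]
```

For example, a request duration would be recorded with Summary.observe(), a served request with Counter.inc(), and the number of running instances with Gauge.set().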

Prometheus-es-exporter

Working with Prometheus, we can use prometheus-es-exporter to create metrics (as well as alarms, dashboards, etc.) from queries to ElasticSearch indexes.

This component is not deployed by default, but it can be enabled by setting the variable prometheus_es_exporter_enabled to true in your config.yml file (in Brazil, it is set to true by default). See the guidelines to enable the prometheus-es-exporter component.

To configure your own metrics from queries, add a new section to your config.yml, as in the following example.

prometheus_es_exporter:
  query_blocks:
    ob:
      - name: "query_ob_br"
        QueryIntervalSecs: "60"
        QueryJson: '{"size":0,"query":{"bool":{"must":[],"filter":[{"bool":{"filter":[{"bool":{"should":[{"match_phrase":{"msg":"[AzureEventHub] emit"}}],"minimum_should_match":1}},{"bool":{"should":[{"match_phrase":{"kubernetes.labels.app":"aura-bot"}}],"minimum_should_match":1}}]}},{"range":{"@timestamp":{"gte":"now-1m","lte":"now"}}}]}}}'
        QueryIndices: "aurak8s-service-*"

Where:

  • name: Mandatory. Name of the query. It must start with query_*
  • QueryIntervalSecs: Optional. It indicates how often to run queries in seconds. By default, 60.
  • QueryJson: Mandatory. The search query to run.
  • QueryIndices: Optional. Indices to run the query on. Any way of specifying indices supported by your ElasticSearch version can be used. By default, _all. Although this field is optional, it is highly recommended to delimit the search query.
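
Since QueryJson is a JSON document embedded in a single YAML string, it is easy to break by hand. The following Python sketch shows one possible way to generate a query block from a dictionary, applying the defaults described above. The helper build_query_block and the simplified query are assumptions made for this example; they are not part of prometheus-es-exporter.

```python
import json

DEFAULT_INTERVAL_SECS = "60"  # documented default for QueryIntervalSecs
DEFAULT_INDICES = "_all"      # documented default for QueryIndices


def build_query_block(name, query, interval_secs=None, indices=None):
    """Return one query block, applying the documented defaults."""
    if not name.startswith("query_"):
        raise ValueError("name must start with 'query_'")
    return {
        "name": name,
        "QueryIntervalSecs": interval_secs or DEFAULT_INTERVAL_SECS,
        # Compact separators keep the JSON on a single line.
        "QueryJson": json.dumps(query, separators=(",", ":")),
        "QueryIndices": indices or DEFAULT_INDICES,
    }


# Simplified, hypothetical query: log lines with a given message
# written during the last minute.
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"match_phrase": {"msg": "[AzureEventHub] emit"}},
                {"range": {"@timestamp": {"gte": "now-1m", "lte": "now"}}},
            ]
        }
    },
}

block = build_query_block("query_ob_br", query, indices="aurak8s-service-*")
```

The values of the resulting dictionary can then be copied into the corresponding keys of the query block in config.yml.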

Aura components metrics

The main Aura components can generate their own metrics.

Select the component you are interested in from the left menu to access its details.

1 - Aura Bot metrics

Aura Bot metrics

List of metrics available in Aura Bot

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-bot.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

It measures the duration since the request lands in aura-bot until its HTTP response is returned, indicating to the client that Aura is processing the request to obtain a proper answer for the user.

The metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

This metric has been stored since the Iron Maiden (7.2.0) release.
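
A Prometheus Summary exposes two cumulative series per label set, named after the metric with the suffixes _sum and _count. The number of requests and their average duration over a period can be derived from the difference between two polls. The Python sketch below illustrates that arithmetic; the function name and the input shape are assumptions made for this example.

```python
def average_duration(prev, curr):
    """Average request duration between two polls of one Summary series.

    prev and curr are dicts holding the cumulative "sum" and "count" values,
    that is, the <metric>_sum and <metric>_count samples of one label set.
    Returns None when the average cannot be computed.
    """
    delta_count = curr["count"] - prev["count"]
    delta_sum = curr["sum"] - prev["sum"]
    if delta_count <= 0 or delta_sum < 0:
        # No new requests, or the process restarted and the values were reset.
        return None
    return delta_sum / delta_count


# Hypothetical values: 5 new requests taking 1.5 seconds in total.
previous_poll = {"sum": 1.0, "count": 10}
current_poll = {"sum": 2.5, "count": 15}
average = average_duration(previous_poll, current_poll)
```

Here, average is the mean duration in seconds of the requests served between both polls, and the difference between the two count values is the number of requests in that period.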

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by aura-bot.

It is stored as a Summary in Prometheus so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

This metric has been stored since the Camela (5.0.0) release.

outgoing_message_duration_seconds

This metric is intended to store the number and duration of the Direct Line requests arriving at aura-bot.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-bot is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the user is sent back to the client callback. This metric measures the duration since the request lands in aura-bot until the last message of its answer is sent to the client callback.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent by Direct Line in the body of the response in case of an error.
  • origin: specific host of the request.
  • channel: channel of the request.

This metric has been stored since the Iron Maiden (7.2.0) release.
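
The measurement described above, which starts when the request arrives and ends when the last message of the answer is sent, can be sketched in Python as follows. The MessageTimer class and its method names are hypothetical and only illustrate the idea; the durations list stands in for the observe call of a Summary.

```python
import time


class MessageTimer:
    """Measures the time from request arrival until the last answer message."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._started = {}
        self.durations = []  # stand-in for observing a Summary metric

    def request_received(self, request_id):
        # Called as soon as the request lands, before the HTTP response.
        self._started[request_id] = self._clock()

    def message_sent(self, request_id, is_last):
        # Intermediate messages do not close the measurement.
        if not is_last:
            return None
        started = self._started.pop(request_id)
        duration = self._clock() - started
        self.durations.append(duration)
        return duration
```

The clock is injectable so that the behavior can be checked with fixed timestamps instead of real time.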

aura_component_version

This metric is intended to store the number of aura-bot instances (pods) running each version of the code. It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

This metric has been stored since the Camela (5.0.0) release under the name bot_version, and was renamed to aura_component_version in Iron Maiden (7.2.0).
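
As an illustration of how this gauge can be read, the Python sketch below adds up the samples reported by several pods to obtain the number of instances per version. It assumes that every pod reports a value of 1 for its own version; that assumption and the sample data are hypothetical.

```python
from collections import defaultdict


def instances_per_version(samples):
    """Sum aura_component_version gauge values by (component, version)."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[(labels["component"], labels["version"])] += value
    return dict(totals)


# Hypothetical samples, one per pod (assumed to report the value 1).
samples = [
    ({"component": "aura-bot", "version": "7.2.0"}, 1),
    ({"component": "aura-bot", "version": "7.2.0"}, 1),
    ({"component": "aura-bot", "version": "7.1.0"}, 1),
]
totals = instances_per_version(samples)
```

A result with more than one version for the same component would indicate that a rollout is still in progress or that some pods were not upgraded.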

bot_request_version

This metric is intended to store the number of incoming requests to aura-bot depending on their channelData.version. It is stored as a Counter in Prometheus.

Labels:

  • version: channelData.version in the incoming request. If the incoming request has no version field, 1 will be set.

This metric has been stored since the Iron Maiden (7.2.0) release.
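
The defaulting rule for the version label can be sketched in Python as follows. The function names and the shape of the request payload (a dictionary with an optional channelData entry) are assumptions made for this example.

```python
from collections import Counter

DEFAULT_VERSION = "1"  # used when channelData.version is absent


def version_label(activity):
    """Return the version label for an incoming request payload."""
    channel_data = activity.get("channelData") or {}
    version = channel_data.get("version")
    return str(version) if version is not None else DEFAULT_VERSION


def count_by_version(activities):
    """Count incoming requests per version label."""
    return Counter(version_label(activity) for activity in activities)


# Hypothetical requests: one with an explicit version, two without it.
requests_seen = [{"channelData": {"version": "2"}}, {"channelData": {}}, {}]
counts = count_by_version(requests_seen)
```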

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-bot.

It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

This metric has been stored since the Iron Maiden (7.2.0) release.

aura_token_generate

This metric is intended to store the information related to Kernel accessToken refreshes in aura-bridge. It makes it possible to set an alarm in case of any error during the refresh of the 2-legged accessToken needed to access Kernel WhatsApp APIs.

It is stored as a Summary.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status returned by Kernel in the response.
  • originStatus: status sent by Kernel in the body of the response in case of an error.
  • origin: channelId of the channel that needs the accessToken in Aura.
  • channel: channel of the request.

This metric has been stored since the Iron Maiden (7.2.0) release.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR
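
One possible use of this counter is to compute the share of failed checks per module, for example as the basis of an alarm. The Python sketch below illustrates the calculation; the module names and values are hypothetical.

```python
def error_ratio_by_module(samples):
    """Return the share of ERROR checks per moduleId.

    samples: iterable of (labels, value) pairs from the services_status
    counter, where labels holds "moduleId" and "status" ("OK" or "ERROR").
    """
    counts = {}
    for labels, value in samples:
        per_module = counts.setdefault(labels["moduleId"], {"OK": 0.0, "ERROR": 0.0})
        per_module[labels["status"]] += value

    ratios = {}
    for module_id, per_module in counts.items():
        total = per_module["OK"] + per_module["ERROR"]
        ratios[module_id] = per_module["ERROR"] / total if total else 0.0
    return ratios


# Hypothetical module names and counter values.
samples = [
    ({"moduleId": "mongodb", "status": "OK"}, 98),
    ({"moduleId": "mongodb", "status": "ERROR"}, 2),
    ({"moduleId": "redis", "status": "OK"}, 50),
]
ratios = error_ratio_by_module(samples)
```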

2 - Aura Groot metrics

Aura Groot metrics

List of metrics available in Aura Groot

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-groot.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

It measures the duration since the request lands in aura-groot until its HTTP response is returned, indicating to the client that Aura is processing the request to obtain a proper answer for the Direct Line or aura-bridge.

The metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

outgoing_request_duration_seconds

This metric is intended to store the processing time related to all the outgoing HTTP requests made by aura-groot.

It is stored as a Summary in Prometheus so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

outgoing_message_duration_seconds

This metric is intended to store the processing time of Direct Line or aura-bridge requests arriving to aura-groot.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-groot is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the user is sent back to the client callback. This metric measures the duration from when the request lands in aura-groot until the last message of its answer is sent to the client callback.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent by Direct Line in the body of the response in case of an error.
  • origin: specific host of the request (Direct Line or aura-bridge).
  • channel: channel of the request.

incoming_message_duration_seconds

This metric is intended to store the processing time of Direct Line, aura-bridge or skills requests arriving to aura-groot.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-groot is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the channel or skill is sent back to the client callback. This metric measures the duration from when the request arrives at aura-groot until it is processed and sent to the channel/bridge or skill.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent by Direct Line in the body of the response in case of an error.
  • origin: specific host of the request (Direct Line, aura-bridge or skill name). If origin is missing, the content of path label will be added.
  • channel: channel of the request.
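
The fallback rule for the origin label can be sketched in Python as follows. The function name and the example path are hypothetical, and leaving out originStatus and channel when they are unknown is an assumption of this sketch rather than documented behavior.

```python
def message_labels(path, http_status, origin=None, origin_status=None, channel=None):
    """Build the label set of one incoming_message_duration_seconds sample."""
    labels = {
        "path": path,
        "httpStatus": str(http_status),
        # When the origin is missing, the content of the path label is used.
        "origin": origin or path,
    }
    # Assumption: optional labels are left out when their value is unknown.
    if origin_status is not None:
        labels["originStatus"] = str(origin_status)
    if channel is not None:
        labels["channel"] = channel
    return labels
```

For instance, message_labels("/api/messages", 200) yields an origin label equal to "/api/messages", while passing origin="aura-bridge" keeps the given origin.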

aura_component_version

This metric is intended to store the number of aura-groot instances (pods) running each version of the code. It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-groot.

It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

skill_access_error

This metric is intended to store the number of times a skill has been misconfigured in aura-groot.

It is stored as a Counter in Prometheus.

Labels:

  • skill: skill name.
  • code: noRespond or noFound
  • channel: channel of the request.

skill_request_status

This metric is intended to store the number of times we have obtained a response status per skill in aura-groot.

It is stored as a Counter in Prometheus.

Labels:

  • skill: skill name.
  • code: status code of the request.
  • channel: channel of the request.

skill_response_error

This metric is intended to store the number of times a skill has been blocked in aura-groot.

It is stored as a Counter in Prometheus.

Labels:

  • skill: skill name
  • code: blocked
  • channel: channel of the request.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

3 - Atria Model Gateway metrics

Atria Model Gateway metrics

List of metrics available in atria-model-gateway

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by atria-model-gateway.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests from any given endpoint. Specifically, the duration since the request lands in atria-model-gateway until its HTTP response is returned:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response
  • application: application name that is using the model

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by atria-model-gateway. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

generative_tokens

This metric is intended to store the information related to tokens used by OpenAI in atria-rag-server. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its token usage.

The metric allows measuring the behavior of the tokens using any given OpenAI model:

  • The number of tokens during a given time period
  • The average/min/max tokens of these requests

Labels:

  • application: application name that is using the model
  • deployment_model_name: name of the deployment model
  • model_type: identifier of the model
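
As an illustration of how this Summary can be analyzed, the Python sketch below computes the average number of tokens per request for each deployment model, adding up the series of all applications. It relies on the _sum and _count series that Prometheus exposes for every Summary; the label values and numbers are hypothetical.

```python
def average_tokens_by_model(series):
    """Average tokens per request for each deployment_model_name.

    series: iterable of (labels, tokens_sum, request_count) tuples, i.e. the
    generative_tokens_sum and generative_tokens_count values of a label set.
    """
    sums, counts = {}, {}
    for labels, tokens_sum, request_count in series:
        model = labels["deployment_model_name"]
        sums[model] = sums.get(model, 0.0) + tokens_sum
        counts[model] = counts.get(model, 0) + request_count
    # Models without any request are skipped to avoid dividing by zero.
    return {model: sums[model] / counts[model] for model in sums if counts[model] > 0}


# Hypothetical label values and figures.
series = [
    ({"application": "app-a", "deployment_model_name": "model-x", "model_type": "chat"}, 1200, 4),
    ({"application": "app-b", "deployment_model_name": "model-x", "model_type": "chat"}, 800, 4),
    ({"application": "app-a", "deployment_model_name": "model-y", "model_type": "chat"}, 0, 0),
]
averages = average_tokens_by_model(series)
```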

4 - Atria RAG server metrics

Atria RAG server metrics

List of metrics available in atria-rag-server

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by atria-rag-server.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests from any given endpoint. Specifically, the duration since the request lands in atria-rag-server until its HTTP response is returned:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response
  • application: application name that is using the model

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by atria-rag-server. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

5 - Aura Authentication API metrics

Authentication API metrics

List of metrics available in Aura Authentication API

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-authentication-api. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests from any given endpoint. Specifically, the duration since the request lands in aura-authentication-api until its HTTP response is returned:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.).
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

This metric has been stored since the Greenday (6.0.0) release.

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by aura-authentication-api. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

This metric has been stored since the Camela (5.0.0) release.

aura_token_generate

This metric is intended to store the information related to Kernel accessToken generation, used during the integrated authorization process of the Aura users in aura-authentication-api.

It makes it possible to set an alarm in case of any error during token validation. It is stored as a Summary in Prometheus.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status returned by Kernel in the response.
  • originStatus: status sent by Kernel in the body of the response in case of an error.
  • origin: channelId of the channel that needs the accessToken in Aura.

This metric has been stored since the Iron Maiden (7.2.0) release.

aura_component_version

This metric is intended to store the number of aura-authentication-api instances (pods) running each version of the code.

It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

This metric has been stored since the Barricada (5.3.0) release under the name authentication_api_version, and was renamed to aura_component_version in Iron Maiden (7.2.0).

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-authentication-api. It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

This metric has been stored since the Iron Maiden (7.2.0) release.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

6 - Aura Configuration API metrics

Aura Configuration metrics

List of metrics available in Aura Configuration API

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-configuration-api.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests from any given endpoint. Specifically, the duration since the request lands in aura-configuration-api until its HTTP response is returned:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

This metric has been stored since the Greenday (6.0.0) release.

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by aura-configuration-api. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

aura_component_version

This metric is intended to store the number of aura-configuration-api instances (pods) running each version of the code.

It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-configuration-api. It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

7 - Aura Gateway API metrics

Gateway API metrics

List of metrics available in Aura Gateway API

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-gateway-api.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests from any given endpoint. Specifically, the duration since the request lands in aura-gateway-api until its HTTP response is returned:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response
  • application: Application name of the request.
  • channel: Channel name of the request. Only for NLPaaS endpoint.
  • preset: Preset name of the request. Only for Generative endpoint.

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by aura-gateway-api. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

aura_component_version

This metric is intended to store the number of aura-gateway-api instances (pods) running each version of the code.

It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

This metric has been stored since the Beatles (8.9.0) release.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-gateway-api. It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

8 - Aura Bridge metrics

Aura Bridge metrics

List of metrics available in Aura bridge

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-bridge. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

It measures the duration since the request lands in aura-bridge until its HTTP response is returned, indicating to the client that Aura is processing the request to obtain a proper answer for the user.

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

This metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

This metric has been stored since the Greenday (6.0.0) release.

outgoing_message_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-bridge.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-bridge is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the user is sent back to the client callback.

This metric measures the duration since the request lands in aura-bridge until the last message of its answer is sent to the client callback.

Labels:

  • host: host and domain where the request is being sent.
  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • path: specific endpoint of the request.
  • originStatus: third-party status sent in the body of the response. Usually, this status is sent by WhatsApp.
  • status: HTTP status code returned in the response.
  • origin: specific source of the request. The value could be: ‘4p’, ‘whatsapp’, ‘aura-bot’ or ‘genesys’.
  • channel: channel of the request.

This metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

This metric has been stored since the Greenday (6.0.0) release.

incoming_message_duration_seconds

This metric is intended to store the number of requests arriving at aura-bridge from a channel or Direct Line.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-bridge is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the channel or Direct Line is sent back to the client callback. This metric measures the duration from when the request arrives at aura-bridge until it is processed to send to the channel or Direct Line.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent by Direct Line or the channel in the body of the response in case of an error.
  • origin: specific host of the request. If origin is missing, the content of path label will be added.
  • channel: channel of the request. In Auraline requests used to get conversationId with path: /aura-services/v1/auraline/conversations, channel will be missing.

aura_response_ack_duration_seconds

This metric is intended to store the information related to all the ACK requests sent by the clients to aura-bridge. The ACK requests are used by the clients (WhatsApp) to notify whether Aura’s answer was eventually delivered to the user.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration. The duration is measured from when the ACK request lands in aura-bridge until its asynchronous answer is sent to the user.

Labels:

  • host: host and domain where the request is being sent.
  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • path: specific endpoint of the request.
  • originStatus: third-party status sent in the body of the response. Usually, this status is sent by WhatsApp.
  • status: HTTP status code returned in the response.
  • origin: specific source of the request. The value could be: ‘4p’, ‘whatsapp’, ‘aura-bot’ or ‘genesys’.
  • channel: channel of the request.

This metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a given time period
  • The average/min/max duration of these requests

This metric has been stored since the Heroes (7.0.0) release.

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by aura-bridge. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, …)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

This metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a period of time
  • The average/min/max duration of these requests

This metric has been stored since the Greenday (6.0.0) release.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-bridge. It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

This metric has been stored since the Iron Maiden (7.2.0) release.

aura_token_generate

This metric is intended to store the information related to Kernel accessToken refreshes in aura-bridge. It makes it possible to set an alarm in the event of any error while refreshing the 2-legged accessToken needed to access Kernel WhatsApp APIs.

It is stored as a Summary in Prometheus.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status returned by Kernel in the response.
  • originStatus: status sent by Kernel in the body of the response in the event of an error.
  • origin: channelId of the channel that needs the accessToken in Aura.

This metric has been stored since the Iron Maiden (7.2.0) release.

aura_component_version

This metric is intended to store the number of aura-bridge instances (pods) running each version of the code.

It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.
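As a rough illustration of how this Gauge can be used, the samples reported by the pods can be grouped by the version label to see how many instances run each version, for example during a rollout. The sketch below assumes that each pod reports a value of 1 for its own version. The `pod` field, the version strings and the `instancesPerVersion` helper are hypothetical.

```typescript
interface VersionSample {
  pod: string;       // hypothetical identifier of the reporting pod
  version: string;   // value of the "version" label
  component: string; // value of the "component" label
  value: number;     // gauge value, assumed to be 1 per running pod
}

// Sums the gauge values per version for a single component.
function instancesPerVersion(
  samples: VersionSample[],
  component: string
): Record<string, number> {
  const result: Record<string, number> = {};
  for (const sample of samples) {
    if (sample.component !== component) continue;
    result[sample.version] = (result[sample.version] || 0) + sample.value;
  }
  return result;
}

// Made-up samples: two pods already upgraded, one still on the previous version.
const perVersion = instancesPerVersion(
  [
    { pod: "bridge-a", version: "7.2.0", component: "aura-bridge", value: 1 },
    { pod: "bridge-b", version: "7.2.0", component: "aura-bridge", value: 1 },
    { pod: "bridge-c", version: "7.1.0", component: "aura-bridge", value: 1 },
    { pod: "tac-a", version: "7.2.0", component: "tac-api", value: 1 },
  ],
  "aura-bridge"
);
console.log(perVersion);
```

More than one key in the result means that several versions of the component are running at the same time.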

This metric has been stored since the Greenday (6.0.0) release under the name aura_bridge_version and was renamed to aura_component_version in Iron Maiden (7.2.0).

aura_bridge_wa_incoming_message

This metric is intended to store the number of unhandled errors happening in aura-bridge. It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

This metric has been stored since the Iron Maiden (7.2.0) release.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

9 - Aura KPIs uploader metrics

Aura KPIs Uploader

List of metrics available in Aura KPIs uploader

aura_kpis_uploader_metrics_duration

This KPI measures the time required by aura-kpis-uploader to process each type of KPI. KPI management has several steps (load, process, upload), and this KPI represents the time it takes to perform all those steps for each of the KPIs defined in AURA_SOURCE_PATH_AVRO_ADAPTERS.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

Labels:

  • format: File format in which the KPI will be stored.
    • csv: File format will be CSV (deprecated).
    • avro: File format will be AVRO.
  • kpiType: Type of KPI:
    • entity: KPI is of type Entity.
    • dimensional: KPI is of type Dimensional.
  • kpiName: Name of the KPI.
  • duration: Time in seconds with the time used to process the KPI.
  • numberFilesProcessed: Number of KPIs processed. If the format is AVRO, it represents the number of records processed. If the format is CSV, it only represents the number of processed files.

aura_kpis_uploader_metrics

This metric is intended to store the information related to all processes executed by aura-kpis-uploader. It is stored as a Counter in Prometheus.

This metric counts the KPI records processed. If the format is AVRO, it represents the number of records processed. If the format is CSV, it only represents the number of processed files.

Labels:

  • format: File format in which the KPI will be stored.
    • csv: File format will be CSV (deprecated).
    • avro: File format will be AVRO.
  • kpiType: Type of KPI:
    • entity: KPI is of type Entity.
    • dimensional: KPI is of type Dimensional.
  • kpiName: Name of the KPI.
  • duration: Time in seconds with the time used to process the KPI.
  • numberFilesProcessed: Number of KPIs processed. If the format is AVRO, it represents the number of records processed. If the format is CSV, it only represents the number of processed files.

aura_kpis_uploader_errors

This metric is intended to store the information related to all errors generated during the execution of aura-kpis-uploader. It is stored as a Counter in Prometheus.

This metric counts the errors produced when generating KPIs.

Labels:

  • type: Name of the method or function where the error occurred.
  • format: File format in which the KPI will be stored.
    • csv: File format will be CSV (deprecated).
    • avro: File format will be AVRO.
  • kpiType: Type of KPI:
    • entity: KPI is of type Entity.
    • dimensional: KPI is of type Dimensional.
  • kpiName: Name of the KPI.
  • url: If the error contains a file with more information stored in Azure Storage, this field contains the URL to download the file.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-kpis-uploader. It is stored as a Counter in Prometheus.

Labels:

  • error: Exception message that forced the unhandled error.

aura_server_unhandled_error is stored from Loquillo (7.5.0) release onwards.

10 - Aura NLP metrics

Aura NLP metrics

List of metrics available in Aura NLP

These metrics have been stored since the Heroes (7.0.0) release.

http_request_duration_seconds

This Prometheus metric is modelled as a Summary whose value is the time spent until an HTTP request is answered.

Note that the value is a float rounded to three decimal places.

This metric is intended to store the duration of incoming requests in seconds.

Labels:

All label values are strings.

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.).
  • path: HTTP path of the incoming request.
  • status_code: the responded HTTP status code (as a string).

Value:

  • Request duration in seconds.

outgoing_request_duration_seconds

This Prometheus metric is modelled as a Summary whose value is the time spent until the remote host responds to an HTTP request.

Note that the value is a float rounded to three decimal places.

This metric is intended to store the duration of outgoing requests in seconds.

Labels:

All label values are strings.

  • method: HTTP method (GET, POST, etc.), a string in uppercase.
  • host: remote host that will receive the outgoing request.
  • path: HTTP path of the outgoing request.
  • status: the responded HTTP status code (as a string).

11 - T&C API metrics

Terms & Conditions API metrics

List of metrics available in Terms and Conditions API

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests handled by tac-api. It is stored as a Histogram in Prometheus, so every sample, besides the defined labels, also includes its duration.

It measures the duration from the moment the request lands in tac-api until its HTTP response is returned.

This metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a period of time
  • The average/min/max duration of these requests
  • Quantiles of the duration and the number of requests in a period

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

This metric has been stored since the Barricada (5.0.0) release.

http_requests_total

This metric is intended to store information about all the requests handled by tac-api. It is stored as a Counter in Prometheus.

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • status_code: HTTP status code returned in the response.

As a Counter, this metric allows measuring the volume of requests handled by tac-api:

  • The number of requests during a period of time
  • The request rate per method or status code

This metric has been stored since the Barricada (4.0.0) release.

http_in_flight_requests_total

This metric is intended to store the information related to all the concurrent HTTP requests being handled by tac-api in a period.

It is stored as a Gauge in Prometheus because it is a value that can go up and down at every moment.

This metric allows measuring the concurrency of the requests handled by tac-api:

  • The number of concurrent requests at any given moment
  • The average/min/max number of concurrent requests during a period of time
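A Gauge of in-flight requests is typically incremented when a request starts and decremented when it finishes. The following is a minimal in-memory sketch of that pattern. The `InFlightGauge` class and the `startRequest` helper are hypothetical and do not reproduce the tac-api implementation.

```typescript
// Minimal in-memory stand-in for a Gauge: a value that can go up and down.
class InFlightGauge {
  private value = 0;

  inc(): void {
    this.value += 1;
  }

  dec(): void {
    this.value -= 1;
  }

  get(): number {
    return this.value;
  }
}

const inFlight = new InFlightGauge();

// Marks a request as started and returns a callback to invoke when it ends.
// The callback is idempotent, so a duplicated "finished" signal is harmless.
function startRequest(): () => void {
  inFlight.inc();
  let finished = false;
  return () => {
    if (!finished) {
      finished = true;
      inFlight.dec();
    }
  };
}

const endFirst = startRequest();
const endSecond = startRequest();
console.log(inFlight.get()); // two requests are currently being handled
endFirst();
endSecond();
console.log(inFlight.get()); // all requests have finished
```

In a real server the returned callback would be invoked when the response has been sent, including on errors, so that the gauge never drifts upwards.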

This metric has been stored since the Barricada (4.0.0) release.

tac_internal_errors

This metric is intended to store the number of internal errors happening in tac-api. It is stored as a Counter in Prometheus because its value can only go up.

Labels:

  • name: it will contain the exception message that forced the unhandled error.

This metric has been stored since the Barricada (4.0.0) release.

tac_service_acceptances_total

This metric is intended to store the number of acceptances of Terms and Conditions per service handled by tac-api. It is stored as a Counter in Prometheus because its value can only go up.

Labels:

  • name: it will contain the name of the accepted service. Currently, it could contain one of: aura, whatsapp-anonymous, whatsapp-authenticated
  • version: T&C version accepted by the user

This metric has been stored since the Barricada (4.0.0) release.

tac_service_updates_total

This metric is intended to store the number of updates of terms and conditions per service handled by tac-api. It is stored as a Counter in Prometheus because its value can only go up.

Labels:

  • name: name of the updated service. Currently (Iron Maiden) it could contain one of: aura, whatsapp-anonymous, whatsapp-authenticated
  • version: T&C version updated by the user

This metric has been stored since the Barricada (4.0.0) release.

tac_user_deletions_total

This metric is intended to store the number of deletions of terms and conditions per service handled by tac-api. It is stored as a Counter in Prometheus because its value can only go up.

This metric has been stored since the Barricada (4.0.0) release.

aura_component_version

This metric is intended to store the number of tac-api instances (pods) running each version of the code. It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

This metric has been stored since the Iron Maiden (7.2.0) release.

12 - NLP provisioning metrics

NLP Provisioning metrics

List of metrics available in Aura NLP provisioning

These metrics have been stored since the Heroes (7.0.0) release.

Introduction

In the Aura NLP provisioning component, it is important to know at any moment the number of restarted processes relative to the total number of processes currently working on the different containers. This makes it possible to raise alerts on abnormal behavior and take corrective measures.

http_request_duration_seconds

This Prometheus metric is modelled as a Summary whose value is the time spent until an HTTP request is answered.

Note that the value is a float rounded to three decimal places.

This metric is intended to store the duration of incoming requests in seconds.

Labels:

All label values are strings.

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.).
  • path: HTTP path of the incoming request.
  • status_code: the responded HTTP status code (as a string).

Value:

  • Request duration in seconds.

nlp_provisioning_killed_processes

This metric is intended to store the number of processes killed in each iteration of the Aura NLP provisioning execution. It is stored as a Gauge in Prometheus.

Value:

  • Number of worker processes killed in each iteration.

nlp_provisioning_alive_processes

This metric is intended to store the number of worker processes alive in each iteration of NLP Provisioning. It is stored as a Gauge in Prometheus.

Value:

  • Total alive processes.

nlp_provisioning_expected_alive_processes

This metric is intended to store the number of processes expected to be alive in NLP Provisioning. It is stored as a Gauge in Prometheus.

Value:

  • Set gauge with total alive processes.
  • Decrease gauge with finished processes.
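The process Gauges described in this section can be combined to detect abnormal behavior, for example by relating the killed processes to the processes currently alive. The sketch below shows one possible check. The `ProcessSnapshot` type, the `killedRatio` and `isAbnormal` helpers and the 0.25 threshold are assumptions made for illustration, not values defined by Aura.

```typescript
interface ProcessSnapshot {
  killed: number; // value of nlp_provisioning_killed_processes
  alive: number;  // value of nlp_provisioning_alive_processes
}

// Ratio of killed processes to the processes currently alive.
function killedRatio(snapshot: ProcessSnapshot): number {
  if (snapshot.alive === 0) {
    // No worker is alive: any killed process is treated as the worst case.
    return snapshot.killed > 0 ? Infinity : 0;
  }
  return snapshot.killed / snapshot.alive;
}

// Hypothetical alert rule: too many processes were killed in this iteration.
function isAbnormal(snapshot: ProcessSnapshot, threshold: number = 0.25): boolean {
  return killedRatio(snapshot) > threshold;
}

console.log(isAbnormal({ killed: 1, alive: 8 })); // ratio 0.125, below the threshold
console.log(isAbnormal({ killed: 4, alive: 8 })); // ratio 0.5, above the threshold
```

The threshold should be tuned per deployment, since the expected number of restarts depends on the workload.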

nlp_provisioning_container_killed_count

This metric is intended to count all the processes killed in Aura NLP provisioning. It is stored as a Counter in Prometheus.

Labels:

  • container: container URL.

Value:

  • Dead process ids (pids).

13 - Aura Complex Logic metrics

Aura Complex Logic metrics

List of metrics available in Aura Complex Logic Framework

These metrics have been stored since the Heroes (7.0.0) release.

http_request_duration_seconds

This Prometheus metric is modelled as a Summary whose value is the time spent until an HTTP request is answered.

Note that the value is a float rounded to three decimal places.

This metric is intended to store the duration of incoming requests in seconds.

Labels:

All label values are strings.

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.).
  • path: HTTP path of the incoming request.
  • status_code: the responded HTTP status code (as a string).

Value:

  • Request duration in seconds

supervised_complex_logic_app_restarted_counter

This metric is intended to store a count of the restarted plugins.

It is stored as a Counter in Prometheus.

Labels:

All label values are strings.

  • app: clf
  • supervised_plugin: Supervised plugin class path.
  • plugin_status: Plugin response code status.
  • plugin_handler_name: Handler name.

complex_logic_app_http_requests

This metric is intended to store the HTTP requests of Aura Complex Logic plugins.

It is stored as a Counter in Prometheus.

Labels:

All label values are strings.

  • app: clf
  • plugin: plugin class path.
  • status_code: plugin response code status.
  • handler_name: handler name.

14 - Aura Context metrics

Aura Context metrics

List of metrics available in Aura Context

These metrics have been stored since the Heroes (7.0.0) release.

http_request_duration_seconds

This Prometheus metric is modelled as a Summary whose value is the time spent until an HTTP request is answered.

Note that the value is a float rounded to three decimal places.

This metric is intended to store the duration of incoming requests in seconds.

Labels:

All label values are strings.

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.).
  • path: HTTP path of the incoming request.
  • status_code: the responded HTTP status code (as a string).

Value:

  • Request duration in seconds.

database_request_duration_seconds

This metric is intended to store the duration of database requests in seconds.

It is stored as a Summary in Prometheus.

Labels:

All label values are strings.

  • database: database name (Redis or Mongo).
  • operation: database operation (e.g., update, create, get_by_date, get_last_n, get_by_corr).

Value:

  • Request duration in seconds.

15 - Aura File Manager metrics

Aura File Manager metrics

List of metrics available in Aura File Manager

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by aura-file-manager.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

It measures the duration from the moment the request lands in aura-file-manager until its HTTP response is returned, indicating to the client that Aura is processing the request to obtain a proper answer.

The metric allows measuring the behavior of the requests from any given endpoint:

  • The number of requests during a period of time
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status_code: HTTP status code returned in the response

outgoing_request_duration_seconds

This metric is intended to store the processing time related to all the outgoing HTTP requests made by aura-file-manager.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests to any given endpoint:

  • The number of requests during a period of time
  • The average/min/max duration of these requests

Labels:

  • method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
  • host: host and domain where the request is being sent
  • path: specific endpoint of the request
  • status: HTTP status code returned in the response

outgoing_message_duration_seconds

This metric is intended to store the processing time of aura-bot requests arriving at aura-file-manager.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-file-manager is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the user is sent back to the client callback. This metric measures the duration from the moment the request lands in aura-file-manager until the last message of its answer is sent to the client callback.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • origin: aura-bot

incoming_message_duration_seconds

This metric is intended to store the processing time of aura-bot requests arriving at aura-file-manager.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

As aura-file-manager is an asynchronous server, the processing of a request does not end when the HTTP response is returned, but when the proper answer for the channel or skill is sent back to the client callback. This metric measures the duration from the moment the request arrives at aura-file-manager until it is processed to send the response.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent in the body of the response in the event of an error.
  • origin: aura-bot

aura_component_version

This metric is intended to store the number of aura-file-manager instances (pods) running each version of the code. It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-file-manager.

It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

aura_token_generate

This metric is intended to store the time aura-file-manager takes to get or refresh the Kernel token.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

Labels:

  • path: specific endpoint of the request.
  • httpStatus: HTTP status code returned in the response.
  • originStatus: status sent by Direct Line in the body of the response in the event of an error.
  • origin: kernel client identifier

file_validation_duration_seconds

This metric is intended to store the validation time of a file.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

Labels:

  • path: specific endpoint of the request.
  • code: OK when file is valid.
  • origin: specific endpoint of the request.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR

16 - Aura Redis MongoDB sync metrics

Aura Redis MongoDB Synchronizer metrics

List of metrics available in aura-redis-mongo-sync (ARMS)

aura_component_version

This metric is intended to store the number of aura-redis-mongo-sync instances (pods) running each version of the code. It is stored as a Gauge in Prometheus.

Labels:

  • version: version field in the package.json file included in the running docker container.
  • component: name of the component that is writing the metric.

aura_server_unhandled_error

This metric is intended to store the number of unhandled errors happening in aura-redis-mongo-sync.

It is stored as a Counter in Prometheus.

Labels:

  • error: exception message that forced the unhandled error.

redis_mongo_sync_duration_milliseconds

This metric measures the data upload time from the service to the Mongo database.

It is stored as a Histogram in Prometheus, so every sample, besides the defined labels, also includes its duration.

The aura-redis-mongo-sync service contains a data collector that helps the event service move stale data from Redis to MongoDB. This collector sends the data in packets to optimize performance. This metric measures the time MongoDB takes to process the packet.

Labels:

  • status: HTTP status returned in the response. Values: success.
    • success: if the status is success, the time is stored.
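To illustrate how a Prometheus Histogram records durations, the sketch below keeps cumulative bucket counters: each observation increments every bucket whose upper bound is greater than or equal to the observed value, and the `+Inf` bucket always equals the total count. The `DurationHistogram` class and the bucket bounds (50, 100 and 500 ms) are hypothetical and are not the ones configured in aura-redis-mongo-sync.

```typescript
interface HistogramSnapshot {
  buckets: { le: string; count: number }[]; // cumulative counts per upper bound
  sum: number;   // sum of all observed durations
  count: number; // number of observations
}

class DurationHistogram {
  private upperBounds: number[];
  private bucketCounts: number[];
  private sum = 0;
  private count = 0;

  constructor(upperBounds: number[]) {
    this.upperBounds = upperBounds.slice().sort((a, b) => a - b);
    this.bucketCounts = this.upperBounds.map(() => 0);
  }

  // Records one duration, in milliseconds.
  observe(milliseconds: number): void {
    this.sum += milliseconds;
    this.count += 1;
    for (let i = 0; i < this.upperBounds.length; i++) {
      // Buckets are cumulative: every bound that covers the value is incremented.
      if (milliseconds <= this.upperBounds[i]) {
        this.bucketCounts[i] += 1;
      }
    }
  }

  snapshot(): HistogramSnapshot {
    const buckets = this.upperBounds.map((bound, i) => ({
      le: String(bound),
      count: this.bucketCounts[i],
    }));
    buckets.push({ le: "+Inf", count: this.count });
    return { buckets, sum: this.sum, count: this.count };
  }
}

// Made-up durations of four bulk uploads.
const syncDuration = new DurationHistogram([50, 100, 500]);
for (const ms of [20, 80, 300, 900]) {
  syncDuration.observe(ms);
}
console.log(syncDuration.snapshot());
```

Unlike a Summary, the bucket counters from several pods can be added together before estimating quantiles, which suits a service that runs as multiple instances.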

redis_mongo_synced_items_total

This metric is intended to store the number of records synchronized between Redis and MongoDB.

It is stored as a Counter in Prometheus.

Labels:

  • type: record type. Values: event, active_context
    • event: Items synchronized by event.
    • active_context: Items synchronized by active context process.

redis_mongo_synced_errors

This metric is intended to store the errors that have occurred in the synchronization.

It is stored as a Counter in Prometheus.

Labels:

  • error: Values: create, syncData, executeBulk.
    • create: If the error occurred when creating the service.
    • syncData: If the error occurred when synchronizing the data.
    • executeBulk: If the error occurred when uploading the data to MongoDB in bulk mode.

redis_mongo_sync_configuration_settings

This metric contains the service configuration data.

It is stored as a Gauge in Prometheus.

Labels:

  • setting_name: Values: shard_count, pod_count, active_context_ttl_seconds, redis_cache_ttl_seconds.
    • shard_count: Current number of shards used to distribute the data to synchronize between pods.
    • pod_count: Current number of services of aura-redis-mongo-sync.
    • active_context_ttl_seconds: Time interval to run the data collector.
    • redis_cache_ttl_seconds: Time in seconds that will be set to the context elements in the Redis cache.

services_status

This metric is intended to store the number of successful or failed checks of the server modules. It is stored as a Counter in Prometheus.

Labels:

  • moduleId: Id of the module.
  • status: OK or ERROR