Aura – atria-model-gateway

Docs:

Mon, 01 Jan 0001 00:00:00 +0000

ATRIA Model Gateway architecture and components

Development architecture and technical components of the atria-model-gateway

Technical foundations

atria-model-gateway is responsible for managing the communication with different AI models. This component receives a request from aura-gateway-api, together with other input data, and calls the corresponding AI model.

If the selected AI model is RAG, then atria-model-gateway calls the atria-rag-server, which is in charge of executing the RAG chain and making the corresponding calls to the LLM models and databases.

Functional components

The functional components of atria-model-gateway are described in the document LLM/LMM Experiences Builder.

Architecture overview

The following diagram schematically shows the main technical components integrated into atria-model-gateway.

A brief description of these components is included below:

Access module

Module for the management of different profiles to access atria-model-gateway.

Context module

Module in charge of storing the conversation history in a cache (currently, Redis) for a limited period of time, grouped by session ID. This history is taken into account when calling the generative LLM models.
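
The behavior described above can be sketched with an in-memory stand-in for the cache. This is only an illustration: the class name, the method names, the message shape (role and content) and the per-message expiry policy are assumptions, and the real module relies on Redis rather than on a Python dictionary.

```python
import time
from collections import defaultdict


class ConversationCache:
    """In-memory stand-in for the Redis-backed conversation history (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so that expiry can be exercised in tests
        self._sessions = defaultdict(list)  # session_id -> [(expires_at, message)]

    def append(self, session_id, role, content):
        """Store one message under its session ID with an expiry time."""
        expires_at = self._clock() + self.ttl_seconds
        message = {"role": role, "content": content}
        self._sessions[session_id].append((expires_at, message))

    def history(self, session_id):
        """Return the messages of a session that have not expired yet."""
        now = self._clock()
        live = [(exp, msg) for exp, msg in self._sessions.get(session_id, []) if exp > now]
        if live:
            self._sessions[session_id] = live
        else:
            self._sessions.pop(session_id, None)  # drop fully expired sessions
        return [msg for _, msg in live]
```

The history returned for a session is what would be sent along with the new prompt when calling a generative model.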

Model manager

Module that includes the available models and presets. It is in charge of receiving the info from aura-gateway-api and calling the corresponding model.

Models

Available AI models integrated into the atria-model-gateway.

Presets

Presets are configurable entities that define the specific model to work with and certain parameters associated with it: model ID, name, description, model parameters, etc.

Constructors can use the default presets or build new ones, as described in the document ATRIA configuration.

When configuring an application, all the presets that can be used for this application must be previously defined.
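
As a rough illustration of the relationship between presets and applications, the sketch below models a preset as a small record and checks that an application only uses presets enabled for it. The field names, the function name and the exceptions are assumptions made for this example; the actual preset schema is defined in the document ATRIA configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Preset:
    """Hypothetical preset record; field names are illustrative."""
    preset_id: str
    name: str
    description: str
    model_id: str
    model_parameters: dict = field(default_factory=dict)


def resolve_preset(preset_id, presets, allowed_preset_ids):
    """Return the preset if it is defined and enabled for the application."""
    if preset_id not in allowed_preset_ids:
        raise PermissionError(f"preset {preset_id!r} is not enabled for this application")
    try:
        return presets[preset_id]
    except KeyError:
        raise LookupError(f"preset {preset_id!r} is not defined") from None
```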


ATRIA Model Gateway operational overview

Overview of the atria-model-gateway operation

Operational workflow

The operational flow between an application (for the communication with aura-gateway-api), atria-model-gateway, atria-rag-server and atria-rag-generate-db is schematically shown in the following figure:

Application
- Constructors must configure an application for a channel, skill or service to communicate with aura-gateway-api.
- In the application, the constructor must set the access grants for this application and all the presets that this application can use, from the ones configured in atria-model-gateway.
atria-model-gateway
It contains:
- The different accesses defined for each preset.
- The available presets. Each of them is associated with an AI model with specific parameters.
- The available AI models.
atria-rag-server
- When using RAG (Retrieval Augmented Generation), atria-rag-server is in charge of managing the requests made to the RAG model.
- The available projects that contain information required for the execution of the RAG pipeline are included here.
atria-rag-generate-db
- atria-rag-generate-db is in charge of feeding the databases the RAG works with.
- The available projects that contain the data required for reading the information sources and feeding the databases are included here.

Configuration

atria-model-gateway includes a default configuration. Constructors can use it as is or adapt it to their requirements or business models, as described in the document ATRIA configuration.


ATRIA RAG Generate DB operational overview

Overview of the atria-rag-generate-db operation

Operational flow

The operational flow between an application (for the communication with aura-gateway-api), atria-model-gateway, atria-rag-server and atria-rag-generate-db is schematically shown in the document atria-model-gateway: operational flow.

Configuration

Data persistence feature

ATRIA supports data persistence in knowledge bases across releases: after a new release is installed, all existing data in the knowledge base (currently, Qdrant) remains fully available and accessible for every ATRIA experience. The stored information is therefore independent of the deployed version.

This feature provides key advantages:

Guaranteed continuity of ATRIA experiences.
No need for data re-ingestion after each release.
No need to recalculate embeddings.
Data ingested after the installation of a release (through hot swapping) is now automatically consolidated and carried forward to subsequent releases.

Tracking and clean-up processes

atria-rag-generate-db keeps a record of the current state of documents and related configuration for data sources, so it only feeds documents that have been modified or added since the last update.

atria-rag-generate-db also cleans up any resources that are left behind and no longer used after new ones are introduced.
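
The tracking and clean-up logic described above can be sketched as a comparison between the state recorded at the last run and the documents available now. This is a simplified illustration: the function name, the use of SHA-256 and the shape of the state (a mapping from document name to content hash) are assumptions, not the component's actual implementation.

```python
import hashlib


def plan_update(previous_state, current_documents):
    """Compare the recorded hashes with the current document contents.

    previous_state: {document_name: content_hash} recorded at the last run.
    current_documents: {document_name: bytes} available now.
    Returns (to_feed, to_clean_up, new_state).
    """
    new_state = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in current_documents.items()
    }
    # Only documents that are new or whose content changed need to be fed.
    to_feed = sorted(
        name for name, digest in new_state.items() if previous_state.get(name) != digest
    )
    # Documents that disappeared leave resources behind that can be cleaned up.
    to_clean_up = sorted(name for name in previous_state if name not in new_state)
    return to_feed, to_clean_up, new_state
```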

Preset management

Preset report

After atria-rag-generate-db is executed, a report is logged with the following information for each preset:

The preset name.
The status of the execution (success, skipped or error).
A descriptive message with the reason for the status.
Date and time of the execution start.
Date and time of the execution end.
The configured documents for the preset.
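
A report entry with the fields listed above could be modeled as follows. The class names, field names and log line layout are assumptions for illustration only; the actual log format produced by atria-rag-generate-db may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    SKIPPED = "skipped"
    ERROR = "error"


@dataclass
class PresetReport:
    """Hypothetical per-preset report entry."""
    preset_name: str
    status: Status
    message: str
    started_at: datetime
    finished_at: datetime
    documents: list

    def to_log_line(self):
        """Render the entry as a single log line (illustrative layout)."""
        return (
            f"preset={self.preset_name} status={self.status.value} "
            f"start={self.started_at.isoformat()} end={self.finished_at.isoformat()} "
            f"documents={len(self.documents)} message={self.message!r}"
        )
```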

Preset availability

When a new preset is created, it is necessary to launch the database generation process by executing the atria-rag-generate-db component. This process may take several minutes to complete. Once the generation is finished, the atria-rag-generate-db component is automatically restarted.

While these processes are running, a message is shown to the user indicating that the preset is not yet available.

When both processes are finished, the preset becomes available for use.

Data migration between ATRIA releases

The data persistence feature is implemented by a migration tool between environments or releases, integrated in the atria-rag-generate-db component. This tool moves the trained data from one release to another, so that preset data already generated in one release does not have to be generated again.

The migration process must be triggered manually by launching a command (similar to the atria-rag-generate-db job) in which both the source and target environments are indicated.

After executing this command, data will be migrated from one environment to the other automatically.

The migration flow is executed as follows:

Process the hashes file and, for each preset to be migrated:
- Check that the preset from the source environment is in the configuration of the target environment.
- Move the trained_data files from the source environment to the respective training folder of the target environment.
- Duplicate the collections from the source environment to the target environment.
- Move the TFIDF files from the source environment to the target environment.
Move the hashes file from the source environment to the target environment.
Add the new presets training files to the respective training folder in the target environment.
Launch atria-rag-generate-db. Only new presets will be reloaded.
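
The per-preset part of the flow above can be sketched as a planning function that turns the hashes file into an ordered list of actions. The function name and the action labels are hypothetical, the entry keys follow the hashes file described later in this document, and skipping presets that are missing from the target configuration is an assumption about how the check is handled. The last two steps (adding new training files and launching atria-rag-generate-db) are left out of the sketch.

```python
def plan_migration(source_hashes, target_config_presets):
    """Return the ordered migration actions as (action, preset_id, detail) tuples.

    source_hashes: parsed hashes file of the source environment.
    target_config_presets: preset IDs present in the target configuration.
    """
    actions = []
    for preset_id, entry in source_hashes.items():
        # Assumption: presets missing from the target configuration are skipped.
        if preset_id not in target_config_presets:
            continue
        retrievers = entry["retrievers"]
        actions.append(("move_trained_data", preset_id, entry["source_files_hash"]))
        actions.append(("duplicate_collection", preset_id, retrievers["qdrant_collection_name"]))
        actions.append(("move_tfidf_files", preset_id, retrievers["tfidf_path_file"]))
    # The hashes file itself is moved once all presets have been processed.
    actions.append(("move_hashes_file", None, None))
    return actions
```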

Data migration flow

In the migration process described above, the following folders are generated and stored in an Azure blob storage after atria-rag-generate-db is finished:

Shared data

This folder contains the trained data shared between atria-rag-server and atria-rag-generate-db. It stores the files that atria-rag-generate-db generates and that atria-rag-server later needs to process requests.

At the moment, only the files generated by the TFIDF (Term Frequency–Inverse Document Frequency) exist in this folder.

This folder is used for migration: the TFIDF files of a trained preset can be copied to the blob of a release where that preset has not been trained, so the training does not have to be repeated there.

Trained data

This folder contains the files that have been used in the atria-rag-generate-db for each preset.

The folder structure is defined with a hash of the contents of all the files for each preset, to facilitate migration.

Atria RAG project hashes

This is a file containing all the information for each preset, to facilitate migration.

It contains the following information for each preset:

config_hash: Hash of the preset configuration at the time the atria-rag-generate-db was launched.
source_files_hash: Hash of the source files used to generate the preset. A folder for this hash should exist inside the trained data folder.
metadata: Metadata of the preset, including the date on which atria-rag-generate-db was launched.
retrievers: Information used by the retrievers to generate the preset. It contains the name of the Qdrant collection and the path of the TFIDF files, which corresponds to the shared data.

{
  "5905dece-433d-47f4-a78c-72366bcd1473": {
    "config_hash": "28f837d56079f30c59a419292d129bc3",
    "source_files_hash": "cda3afcd8e74ede0d23065e897d55fae",
    "metadata": {
      "date": "2025-04-01 11:25:59"
    },
    "retrievers": {
      "qdrant_collection_name": "rag-ap-eight-9100-dev-project-copilot",
      "tfidf_path_file": "project-copilot/tfidf"
    }
  }
}

In addition to supporting migration, this data also speeds up the launch of atria-rag-generate-db.

The config_hash and source_files_hash values are used to verify whether, at the moment of launching atria-rag-generate-db, anything has changed in the configuration or in the training data. If changes are detected, all the data for that preset is regenerated. Otherwise, the generation for that preset is skipped.

Launch migration process

The process to persist data between releases has to be launched manually by running the migration script. The script only needs the output files with the environment configuration info that the installer generates in the output_install directory of the source and destination environments. With these files, run the script as shown below, using the file names that correspond to the desired environments:

  ./migrate-data --source-file ${SOURCE_ENVIRONMENT_INFO_FILE} --dest-file ${DEST_ENVIRONMENT_INFO_FILE}

Where:

source-file: Source environment info file where the data is stored.
dest-file: Target environment info file where the data is going to be migrated.


Atria Model Gateway API definition

Description of Atria Model Gateway configuration API swagger

This is an internal ATRIA API

Download swagger file


ATRIA Model Gateway

Descriptive documentation regarding the ATRIA component atria-model-gateway

Introduction

atria-model-gateway is an ATRIA component in charge of managing the communication with different AI models.

Currently, this component receives a request from aura-gateway-api, together with other input data, and calls the LLM/LMM Experiences Builder to use its capabilities.

atria-model-gateway is also in charge of security and privacy control and allows users to provide feedback on their experience.

The functional components of atria-model-gateway are described in the document LLM/LMM Experiences Builder.

Associated documentation

Descriptive technical documentation regarding atria-model-gateway includes:


Atria Model Gateway metrics

List of metrics available in atria-model-gateway

http_request_duration_seconds

This metric is intended to store the information related to all the incoming HTTP requests received by atria-model-gateway.

It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

This metric allows measuring the behavior of the requests received at any given endpoint, specifically the duration from the moment the request reaches atria-model-gateway until its HTTP response is returned:

The number of requests over a period of time
The average/min/max duration of these requests

Labels:

method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
path: specific endpoint of the request
status_code: HTTP status code returned in the response
application: application name that is using the model
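
To illustrate what a Summary keeps per label combination, the sketch below tracks the count and the sum of observed durations, from which an average follows. It is a plain-Python stand-in written for this example (the class and method names are assumptions) and not the Prometheus client used by atria-model-gateway; a real Prometheus Summary exposes the equivalent `_count` and `_sum` series.

```python
from collections import defaultdict


class DurationSummary:
    """Tracks count and sum of observed durations per label combination."""

    LABELS = ("method", "path", "status_code", "application")

    def __init__(self):
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def _key(self, labels):
        # Every sample must carry all the labels defined for the metric.
        return tuple(labels[name] for name in self.LABELS)

    def observe(self, duration_seconds, **labels):
        key = self._key(labels)
        self._count[key] += 1
        self._sum[key] += duration_seconds

    def count(self, **labels):
        return self._count.get(self._key(labels), 0)

    def average(self, **labels):
        key = self._key(labels)
        count = self._count.get(key, 0)
        return self._sum[key] / count if count else None
```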

outgoing_request_duration_seconds

This metric is intended to store the information related to all the outgoing HTTP requests made by atria-model-gateway. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its duration.

The metric allows measuring the behavior of the requests to any given endpoint:

The number of requests over a period of time
The average/min/max duration of these requests

Labels:

method: HTTP method used by the request being stored (GET, POST, PUT, DELETE, etc.)
host: host and domain where the request is being sent
path: specific endpoint of the request
status: HTTP status code returned in the response

generative_tokens

This metric is intended to store the information related to the tokens used by OpenAI in atria-rag-server. It is stored as a Summary in Prometheus, so every sample, besides the defined labels, also includes its token usage.

The metric allows measuring the token consumption of any given OpenAI model:

The number of tokens over a period of time
The average/min/max tokens of these requests

Labels:

application: application name that is using the model
deployment_model_name: name of the deployment model
model_type: identifier of the model


ATRIA Model Gateway environment variables

List of environment variables handled by the atria-model-gateway

Introduction

atria-model-gateway requires the following environment variables to be set. None of them can be modified by the OBs.

Property	Type	Description
AURA_REDIS_DATABASE	number	Redis database number to be used by the server. This number is used to connect to the Redis database.
AURA_REDIS_HOSTS	string	Redis hosts to be used by the server. This is a comma-separated list of Redis host names or IP addresses.
AURA_REDIS_MODE	string	Mode of the Redis connection.
AURA_REDIS_PASSWORD	number	Password for the Redis connection. This password is used to authenticate the connection to the Redis database.
AURA_REDIS_POOL_SIZE	number	Size of the Redis connection pool. This number is used to limit the number of connections to the Redis database.
AURA_REDIS_PREFIX_SUBSCRIBERS	string	Prefix for the Redis subscribers. This prefix is used to identify the subscribers in the Redis database.
AURA_REDIS_CHANNELS_SUBSCRIBERS	string	Channels (separated by ‘,’) for the Redis subscribers. Redis prefix is added to these channels at the beginning.
AURA_REDIS_P_CHANNELS_SUBSCRIBERS	string	Pattern channels (separated by ‘,’) for the Redis subscribers. Redis prefix is added to these pattern channels at the beginning.
AURA_DAPR_PUBSUB_NAME	string	DAPR pubsub component name. It identifies the DAPR component to be used.
AURA_DAPR_PREFIX_SUBSCRIBERS	string	Prefix for the DAPR pubsub subscribers. This prefix is used to identify the subscribers in the configured database.
AURA_DAPR_TOPICS_SUBSCRIBERS	string	Topics (separated by ‘,’) for the DAPR pubsub subscribers. DAPR prefix is added to these topics at the beginning.

The environment variables related to Redis are used to connect to the Redis database. This database is used to refresh the agent’s configuration: every time the configuration of an agent changes, the change is published in the corresponding channel, so that it can be detected and the configuration refreshed.
The environment variables related to DAPR are used to connect to the DAPR components. These components are used to refresh the agent’s configuration in the same way: every time the configuration of an agent changes, the change is published in the corresponding topic, so that it can be detected and the configuration refreshed.
To use DAPR, the DAPR pubsub component must be configured in the DAPR configuration file, in addition to the DAPR environment variables.


atria-model-gateway error management

This document includes the different errors returned by atria-model-gateway

Error descriptions

InvalidModelParam

description: One or more parameters provided for the model are invalid. Please verify the parameter names and values.
message: Specific and descriptive text of the error.
http status: 400

ModelFilterContent

description: The response was filtered due to the prompt triggering Azure OpenAI’s content management policy.
message: Specific and descriptive text of the error.
http status: 400

InvalidPrompt

description: The prompt format is incorrect or contains unsupported characters. Ensure it is a valid string and adheres to the formatting guidelines.
message: Specific and descriptive text of the error.
http status: 400

ModelNotFound

description: The specified model does not exist or is not available for you.
message: Specific and descriptive text of the error.
http status: 400

ContextLengthExceeded

description: The message sent (including prompt + previous messages) exceeds the token limit of the model. Reduce the size of the prompt or the conversation history.
message: Specific and descriptive text of the error.
http status: 400

InjectionAttempt

description: Injection attempt detected. The request appears to contain input designed to manipulate the system’s behavior. This request has been blocked for security reasons.
message: Specific and descriptive text of the error.
http status: 400

InternalError

description: Incoming HTTP request produces an internal error.
message: Specific and descriptive text of the error.
http status: 500

Unauthorized

description: Incoming HTTP request authorization is not valid.
message: Specific and descriptive text of the error.
http status: 500

RequestTimeout

description: The server has decided to close the connection rather than continue waiting. The response headers include the retry-after field, which indicates how long to wait before retrying.
message: Specific and descriptive text of the error.
http status: 500

QuotaError

description: The incoming HTTP request needs more quota. The response headers include the retry-after field, which indicates how long to wait before retrying.
message: Specific and descriptive text of the error.
http status: 500

AuraError

description: Generic error.
message: Specific and descriptive text of the error.
http status: 500
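
A client could use the error list above along these lines. The status codes are transcribed from this document as listed; the function name is hypothetical, and treating retry-after as a whole number of seconds is an assumption, since the header format is not specified here.

```python
# HTTP status per error name, as listed in this document.
ERROR_STATUS = {
    "InvalidModelParam": 400,
    "ModelFilterContent": 400,
    "InvalidPrompt": 400,
    "ModelNotFound": 400,
    "ContextLengthExceeded": 400,
    "InjectionAttempt": 400,
    "InternalError": 500,
    "Unauthorized": 500,
    "RequestTimeout": 500,
    "QuotaError": 500,
    "AuraError": 500,
}

# Errors documented as carrying a retry-after header.
RETRYABLE_ERRORS = ("RequestTimeout", "QuotaError")


def retry_delay_seconds(error_name, headers):
    """Return the seconds to wait before retrying, or None.

    None is returned when the error is not retryable or when the
    retry-after header is missing or not an integer.
    """
    if error_name not in RETRYABLE_ERRORS:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    value = {key.lower(): val for key, val in headers.items()}.get("retry-after")
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return None
```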