ATRIA components default configuration

Description of the default configuration (internal configuration) for ATRIA components

Introduction

The default configuration of ATRIA corresponds to the server configuration, that is, the internal configuration for ATRIA components.

Within a specific configuration type, parameters are organized by component:

  • Fields for atria-model-gateway configuration
  • Fields for atria-rag-server
  • Common fields for both components

1. Server configuration

Fields related to the internal configuration of ATRIA components

Target users: ATRIA development and installation teams

The default server configuration fields cannot be modified by ATRIA constructors (except for prompts).

1.1. Logging configuration

Configuration field shared between atria-model-gateway and atria-rag-server that allows logs to be customized independently for each component.

Logging is configured through a JSON configuration file. Its default content is shown below.

{
  "version": 1,
  "disable_existing_loggers": false,
  "logging": {
    "handlers": {
      "hdl2": {
        "class": "logging.StreamHandler",
        "formatter": "json",
        "level": <AUTOCOMPLETED>
      }
    },
    "loggers": {
      "atria_model_gw": {
         "level": <AUTOCOMPLETED>,
         "handlers":[
            "hdl2"
         ],
         "filters":[],
         "propagate": false
      }
    },
    "root": {
      "level": <AUTOCOMPLETED>,
      "handlers": []
    }
  }
}

Fields

The main fields are explained below. For more details, refer to the general Python logging documentation.

Parameter | Subparameters | Definition | Type/Default values
version | | Version of the logging configuration | number
disable_existing_loggers | | Whether the loggers that already exist when this call is made are disabled | boolean
handlers | | Dictionary of logging handlers. Each key is the name of a handler |
 | class | Python logging handler class (see the Python documentation) |
 | formatter | Format of the logs | json, string, console, simple
 | level | Level of the logging event | INFO, ERROR, WARN, DEBUG
loggers | | Dictionary in which each key is a logger name and each value is a dictionary describing how to configure the corresponding logger instance |
 | level | (Optional) Level of the logger |
 | handlers | (Optional) List with the IDs of the handlers for this logger |
 | filters | (Optional) List with the IDs of the filters for this logger |
root | | Configuration for the root logger |
 | level | (Optional) Level of the logger |
 | handlers | (Optional) List with the IDs of the handlers for this logger |
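
The field descriptions above follow the Python logging dictionary schema, so a configuration of this shape can be tried out with the standard library. The sketch below is only an illustration: how ATRIA itself loads the file is not described in this document, the `to_dict_config` helper is hypothetical, the `<AUTOCOMPLETED>` levels are filled in as `INFO`, and the `json` formatter is a stand-in because its real definition is not shown here.

```python
import logging
import logging.config

# ATRIA-style layout from this section, with <AUTOCOMPLETED> levels set to "INFO" for illustration.
ATRIA_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "logging": {
        "handlers": {
            "hdl2": {"class": "logging.StreamHandler", "formatter": "json", "level": "INFO"}
        },
        "loggers": {
            "atria_model_gw": {
                "level": "INFO",
                "handlers": ["hdl2"],
                "filters": [],
                "propagate": False,
            }
        },
        "root": {"level": "INFO", "handlers": []},
    },
}


def to_dict_config(atria_cfg: dict) -> dict:
    """Flatten the nested "logging" block into the schema that dictConfig expects."""
    inner = atria_cfg["logging"]
    return {
        "version": atria_cfg["version"],
        "disable_existing_loggers": atria_cfg["disable_existing_loggers"],
        # Assumption: stand-in for the "json" formatter, which this document does not define.
        "formatters": {
            "json": {
                "format": '{"level": "%(levelname)s", "logger": "%(name)s", "message": "%(message)s"}'
            }
        },
        "handlers": inner["handlers"],
        "loggers": inner["loggers"],
        "root": inner["root"],
    }


logging.config.dictConfig(to_dict_config(ATRIA_LOGGING))
logger = logging.getLogger("atria_model_gw")
logger.info("logging configured")
```

Because `propagate` is false and the root logger has no handlers, only the `hdl2` handler emits records for `atria_model_gw` in this sketch.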

1.2. atria-model-gateway default configuration

This section includes the parameters configured by default in atria-model-gateway:

Defaults

General-purpose field with parameters to define the behavior of atria-model-gateway

Defaults fields
Parameter | Subparameters | Definition | Type/Default values
session_params | | (Optional) Default values for a session | object
 | window | (Optional) Session window | number
 | timeout | (Optional) Session expiration time | number
service_params | | (Optional) Default values for the server | object
 | preflight_max_age | (Optional) Preflight max age | number
messages | | (Optional) Message options | object
 | types | (Optional) Types of messages | list[string]
openai_proxy | | Activate OpenAI proxy | boolean
trimmer | | (Optional) Expression to trim the response | string

If the timeout is 0, the last conversation in the session will not be saved, but the session history will be used.

Defaults by default

The default configuration is described as follows:

defaults:
  # Default values for a session
  session_params:
    window: 2
    timeout: 3600

  # Default values for the server
  service_params:
    preflight_max_age: 86400

  # Message options
  messages:
    types:
      - feedback

  # Activate openai proxy
  openai_proxy: false

Redis

This section includes the Redis connection configuration for atria-model-gateway.

Redis fields
Parameter | Definition | Type/Default values
connection_mode | (Mandatory) Connection mode | single, sentinel, cluster
pool_size | (Mandatory) Pool size | number
database | (Mandatory) Database | number
password | (Mandatory) Password | string
uri | (Mandatory) URI name | string
prefix | (Mandatory) Prefix | string
sleep_time | (Optional) Sleep time | number
max_retries | (Optional) Maximum number of retries | number

Redis by default

The default configuration for Redis is described as follows:

redis:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
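
As an illustration of the mandatory and optional fields listed in the Redis table, the following sketch checks a configuration block before it is used. It is not part of ATRIA: the `validate_redis_config` function and the sample values are hypothetical, and the check relies only on what the table states.

```python
from typing import Any

# Field names and allowed modes are taken from the Redis fields table.
MANDATORY_FIELDS = ("connection_mode", "pool_size", "database", "password", "uri", "prefix")
NUMERIC_FIELDS = ("pool_size", "database", "sleep_time", "max_retries")
CONNECTION_MODES = {"single", "sentinel", "cluster"}


def validate_redis_config(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a redis block (empty when it looks valid)."""
    problems = [f"missing mandatory field: {name}" for name in MANDATORY_FIELDS if name not in cfg]
    mode = cfg.get("connection_mode")
    if mode is not None and mode not in CONNECTION_MODES:
        problems.append(f"unsupported connection_mode: {mode}")
    for name in NUMERIC_FIELDS:
        value = cfg.get(name)
        if value is not None and not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
    return problems


# Hypothetical values standing in for the <AUTOCOMPLETED> placeholders.
example = {
    "connection_mode": "single",
    "pool_size": 100,
    "database": 0,
    "password": "secret",
    "uri": "redis://localhost:6379",
    "prefix": "atria",
}
print(validate_redis_config(example))
```

The same check could be reused for a `redis_subscriber` block, which adds only the `channels` list to these fields.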

Redis Subscriber

This section includes the Redis event subscriber connection configuration for atria-model-gateway.

Redis subscriber fields
Parameter | Definition | Type/Default values
connection_mode | (Mandatory) Connection mode | single, sentinel, cluster
pool_size | (Mandatory) Pool size | number
database | (Mandatory) Database | number
password | (Mandatory) Password | string
uri | (Mandatory) URI name | string
prefix | (Mandatory) Prefix | string
sleep_time | (Optional) Sleep time | number
max_retries | (Optional) Maximum number of retries | number
channels | List of channels to subscribe to | list[string]

Redis subscriber by default

The default configuration for the Redis subscriber is described as follows:

redis_subscriber:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
  channels:
    - "ApplicationConfiguration"
    - "PresetConfiguration"

Config API

Field with parameters for the API configuration for atria-model-gateway

Config API fields
Parameter | Definition | Type/Default values
base_url | (Mandatory) API config URL | string
api_key | (Mandatory) APIKey | string

Config API by default

The default configuration is described as follows:

aura_config_api:
  base_url: <AUTOCOMPLETED>
  api_key: <AUTOCOMPLETED>

Allow logging prompts with INFO level

Field that allows prompts to be logged at INFO level in atria-model-gateway. It should only be used to debug errors in environments where debug logs are not available. Because prompts are large, this variable should be set back to false once it is no longer needed.

Allow logging prompts
Parameter | Definition | Type/Default values
allow_log_prompts | Allow logging prompts | boolean

Allow logging prompts by default

The default configuration is described as follows:

allow_log_prompts: false

Models

Predefined AI models included in atria-model-gateway by default.

The model(s) to be used must be selected when configuring an application.

Model fields
Parameter | Subparameters | Definition | Type/Default values
type | | (Mandatory) Identifier type of model | rag, openai, mock, perplexity
name | | (Optional) Model name. If this value does not exist, id is used | string
class_params | | (Mandatory) Preset description | object
 | endpoint | (Mandatory) Endpoint of the model | string
 | type | (Mandatory for RAG) Type of the model | langchain
 | path | (Mandatory for RAG) Path of endpoint model | string
 | azure_name | (Mandatory for OpenAI) Azure name of the model | string
 | model_name | (Mandatory for OpenAI) Model name | string
 | api_key | (Mandatory for OpenAI) APIkey to be used in the model call | string
 | api_version | (Mandatory for OpenAI) API version to be used in the model call | string
 | output | (Mandatory for mocks) Response to be used in the model call | string
description_params | | (Optional) Description of the model params | object
 | context_window | (Optional) Context window of model | number
 | tokenizer | (Optional) Tokenizer of model | string
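
The type-dependent requirements in the table can be summarized with a small check. This is a sketch under stated assumptions rather than ATRIA code: `missing_class_params` is a hypothetical helper, `endpoint` is treated as mandatory for every type because the table does not qualify it, and the endpoint and path values are placeholders.

```python
# Mandatory class_params per model type, transcribed from the model fields table.
COMMON_REQUIRED = {"endpoint"}
TYPE_REQUIRED = {
    "rag": {"type", "path"},
    "openai": {"azure_name", "model_name", "api_key", "api_version"},
    "mock": {"output"},
    # The table lists no perplexity-specific mandatory fields.
    "perplexity": set(),
}


def missing_class_params(model: dict) -> set:
    """Return the mandatory class_params that a model definition lacks."""
    model_type = model.get("type")
    if model_type not in TYPE_REQUIRED:
        raise ValueError(f"unknown model type: {model_type!r}")
    required = COMMON_REQUIRED | TYPE_REQUIRED[model_type]
    return required - set(model.get("class_params", {}))


# Hypothetical values standing in for the <AUTOCOMPLETED> placeholders.
rag_model = {
    "type": "rag",
    "name": "Rag server model",
    "class_params": {"type": "langchain", "endpoint": "http://example.invalid", "path": "/rag"},
}
print(missing_class_params(rag_model))  # an empty set means nothing is missing
```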

Models by default

atria-rag model

Model for using the atria-rag-server.

The default configuration is described as follows:

  atria-rag:
    type: rag
    name: Rag server model
    class_params:
      type: langchain
      endpoint: <AUTOCOMPLETED>
      path: <AUTOCOMPLETED>

gpt-4

Model for using the Azure OpenAI GPT-4 model.

The default configuration is described as follows:

      gpt-4:
        type: openai
        local: false
        class_params:
          azure_name: deployment_gpt-4
          model_name: gpt-4
          api_key: <AUTOCOMPLETED>
          endpoint: <AUTOCOMPLETED>
          api_version: <AUTOCOMPLETED>
          timeout:
             timeout: 60
             read: 60
        description_params:
          context_window: 300

gpt-4o

Model for using the Azure OpenAI GPT-4o model.

The default configuration is described as follows:

      gpt-4o:
        type: openai
        local: false
        class_params:
          azure_name: deployment_gpt-4o
          model_name: gpt-4o
          api_key: <AUTOCOMPLETED>
          endpoint: <AUTOCOMPLETED>
          api_version: <AUTOCOMPLETED>
          timeout:
            timeout: 60
            read: 60
        description_params:
          context_window: 128000

gpt-4o-mini

Model for using the Azure OpenAI GPT-4o-mini model.

The default configuration is described as follows:

      gpt-4o-mini:
        type: openai
        local: false
        class_params:
          azure_name: deployment_gpt-4o-mini
          model_name: gpt-4o-mini
          api_key: <AUTOCOMPLETED>
          endpoint: <AUTOCOMPLETED>
          api_version: <AUTOCOMPLETED>
          timeout:
            timeout: 60
            read: 60
        description_params:
          context_window: 128000

o3-mini

Model for using the Azure OpenAI o3-mini model.

The default configuration is described as follows:

      o3-mini:
        type: openai
        local: false
        class_params:
          azure_name: deployment_o3-mini
          model_name: o3-mini
          api_key: <AUTOCOMPLETED>
          endpoint: <AUTOCOMPLETED>
          api_version: <AUTOCOMPLETED>
          timeout:
            timeout: 60
            read: 60
        description_params:
          context_window: 128000

gpt-4.1-nano

Model for using the Azure OpenAI gpt-4.1-nano model.

The default configuration is described as follows:

gpt-4.1-nano:
  type: openai
  local: false
  class_params:
    azure_name: deployment_gpt-4.1-nano
    model_name: gpt-4.1-nano
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 128000

perplexity-sonar

Model for using the Perplexity sonar model. This model will be available in ATRIA in upcoming releases.

The default configuration is described as follows:

perplexity-sonar:
 type: perplexity
 local: false
 class_params:
   model_name: sonar
   api_key: <AUTOCOMPLETED>
   endpoint: <AUTOCOMPLETED>
   timeout:
     timeout: 20
     read: 45
   http_raise_when_retry_limit_exceeded_recognizer: false
 description_params:
   context_window: 300

Important: This model does not support the same parameters as the previous ones. See the Microsoft documentation on API & feature support.
The following parameters are not supported by the model: temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens.
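
One way a caller could respect this restriction is to drop the unsupported parameters before building the request. The sketch below is a hypothetical helper, not ATRIA behavior: this document does not say whether the gateway filters these parameters itself or rejects the request.

```python
# Parameter names copied from the list of unsupported parameters above.
UNSUPPORTED_PARAMS = frozenset(
    {
        "temperature",
        "top_p",
        "presence_penalty",
        "frequency_penalty",
        "logprobs",
        "top_logprobs",
        "logit_bias",
        "max_tokens",
    }
)


def drop_unsupported(params: dict) -> dict:
    """Return a copy of the request parameters without the unsupported keys."""
    return {key: value for key, value in params.items() if key not in UNSUPPORTED_PARAMS}


request = {"messages": [], "temperature": 0.2, "max_tokens": 100}
print(drop_unsupported(request))
```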

1.3. atria-rag-server default configuration

This section includes the parameters configured by default in atria-rag-server:

LLMs

Predefined parameter that defines the Large Language Models (LLMs) used for calls between atria-model-gateway and atria-rag-server.

Currently, only one LLM is defined, with the configuration needed to connect atria-model-gateway to atria-rag-server. It cannot be modified.

LLMs fields
Parameter | Subparameters | Definition | Type/Default values
name | | (Optional) LLM name. If this value does not exist, id is used | string
model_type | | (Mandatory) Model type | string
endpoint | | (Mandatory) Endpoint of the model | string

LLMs by default

atria-model-gateway:

  atria_model_gateway:
    name: Local Model Gateway
    model_type: llm_manager
    endpoint: http://atria-model-gw:6391/aura-services/v1/atria-model-gw

Embeddings

Parameters to define the embeddings, vector representations to find text blocks that contain the information to resolve the input request.
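
To illustrate how vector representations can be used to find relevant text blocks, the sketch below ranks toy vectors by cosine similarity. It is an assumption that a similarity of this kind is used; the document does not describe the retrieval metric of atria-rag-server, and the three-dimensional vectors are stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec: list[float], blocks: dict[str, list[float]]) -> str:
    """Return the name of the text block whose vector is closest to the query."""
    return max(blocks, key=lambda name: cosine_similarity(query_vec, blocks[name]))


# Toy vectors; real embeddings have hundreds of dimensions.
blocks = {"billing": [0.9, 0.1, 0.0], "roaming": [0.0, 1.0, 0.2]}
print(most_similar([1.0, 0.0, 0.0], blocks))
```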

Two types of Embeddings are available for use:

  • Local Embeddings: Generated by the atria-rag-server in local mode.
  • Embeddings OpenAI: Generated by OpenAI.

Embeddings fields
Parameter | Subparameters | Definition | Type/Default values
name | | (Mandatory) Embedding name | string
type | | (Mandatory) Type of the model | sentence_transformer, azure_openai
model | | (Mandatory) Model used | string
openai_api_version | | (Mandatory to call Azure OpenAI) OpenAI API version | string
openai_api_type | | (Mandatory to call Azure OpenAI) OpenAI API type | string
openai_api_key | | (Mandatory to call Azure OpenAI) OpenAI APIKey | string
azure_endpoint | | (Mandatory to call Azure OpenAI) Azure endpoint | string

Embeddings by default

The predefined embeddings in atria-rag-server are shown below:

Local Sentence Transformer from HuggingFace:

This is an open-source model available in the sentence-transformers library.

It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as:

  • Clustering
  • Multilingual similarity searches
  • Retrieval-based tasks
  • Classification

A brief characterization of this embedding regarding different parameters is included below:

  • Cost: Free to use once downloaded (local execution). No API call costs.
  • Latency: Low, since it runs locally without external API calls.
  • Performance: Satisfactory for general-purpose sentence embeddings, supporting multiple languages.
  • Vector Length: 384 dimensions (smaller than OpenAI’s ADA model).
  • Hardware Requirements: Needs a GPU for faster inference; otherwise, it can be slow on a CPU.
  • Model Size: Requires local storage (~120MB).
  • Quality: Slightly lower accuracy than larger models, especially for complex NLP tasks.

This embedding can be configured with a yaml file:

local_st:
    name: Local Sentence Transformer from HuggingFace
    type: sentence_transformer
    model: paraphrase-multilingual-MiniLM-L12-v2

Distilbert-based Local Sentence Transformer from HuggingFace

This is an open-source model available in the sentence-transformers library.

It has been trained on 215M (question, answer) pairs from diverse sources.

It maps sentences and paragraphs to a 768-dimensional dense vector space and was designed for tasks such as:

  • Semantic search
  • Question answering
  • Passage retrieval

A brief characterization of this embedding regarding different parameters is included below:

  • Cost: Free (local execution). No API call costs.
  • Latency: Fast, optimized for question-answer retrieval tasks.
  • Performance: Outperforms MiniLM in retrieval-based tasks due to DistilBERT’s training on QA data.
  • Vector Length: 768 dimensions (higher than MiniLM, better at capturing semantics).
  • Hardware Requirements: Similar to MiniLM, requires a GPU for optimal performance.
  • Model Size: Larger than MiniLM (~250MB).
  • Quality: Primarily trained for English, not as strong for multilingual applications.

This embedding can be configured with a yaml file:

test_distilbert:
    name: Distilbert-based Local Sentence Transformer from HF
    type: sentence_transformer
    model: multi-qa-distilbert-cos-v1

OpenAI Embeddings ADA

This is one of OpenAI’s models for generating embeddings and is a common choice for tasks such as:

  • Recommendation systems
  • Chatbots
  • Semantic search
  • Large-scale applications

A brief characterization of this embedding regarding different parameters is included below:

  • Cost: Paid API model (priced by token usage, $0.0001 per 1K tokens). It can be expensive for high-volume applications.
  • Latency: API calls introduce some delay, especially in large-scale real-time applications.
  • Performance: State-of-the-art embeddings with high accuracy for a wide range of NLP tasks.
  • Hardware Requirements: No local hardware requirements, it works via API.
  • Vector Length: 1536 dimensions (rich semantic representation).
  • Quality: Strong performance across multiple languages.

This embedding can be configured with a yaml file:

text-embedding-ada-002:
  name: text-embedding-ada-002 model from Azure OpenAI API
  type: azure_openai
  model: deployment_text-embedding-ada-002
  openai_api_version: <AUTOCOMPLETED>
  openai_api_type: azure
  openai_api_key: <AUTOCOMPLETED>
  azure_endpoint: <AUTOCOMPLETED>

Redis Subscriber

This section includes the Redis event subscriber connection configuration for the atria-rag-server.

Redis subscriber fields
Parameter | Definition | Type/Default values
connection_mode | (Mandatory) Connection mode | single, sentinel, cluster
pool_size | (Mandatory) Pool size | number
database | (Mandatory) Database | number
password | (Mandatory) Password | string
uri | (Mandatory) URI name | string
prefix | (Mandatory) Prefix | string
sleep_time | (Optional) Sleep time | number
max_retries | (Optional) Maximum number of retries | number
channels | List of channels to subscribe to | list[string]

Redis subscriber by default

The default configuration for the Redis subscriber is described as follows:

redis_subscriber:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
  channels:
    - "PresetConfiguration"

Prompts

A prompt is an input instruction given to an AI model to generate a response. It guides the AI toward the required kind of output.

ATRIA defines a default prompt for the different RAG stages. It is used when no specific prompt is defined in the preset.

Prompts structure for RAG

The hierarchy of default prompts in RAG stages is shown below:

prompts  
 |___ <stage>
        |___ default
        |       |___ text
        |       |___ args
        |___ <language>
                |___ text
                |___ args

  • The first level of the prompts configuration is the stage of the RAG process. Each stage has its own configuration and purpose.

  • Prompts are configured per language, so different languages can have different prompts, identified by the language code:

    • <language>: Prompt configuration for a given language (ISO 639-1 code)
    • default: Default prompt configuration (in a specific language)
  • For each language, the prompt structure includes the field text and, optionally, the field args:

    • text: This field contains the text of the prompt that will be sent to the language model. It includes placeholders (e.g., {query}, {target_language}) that are mandatory for the prompt to work. These placeholders are replaced dynamically with specific values when the prompt is executed.
    • args: Optional field that contains a dictionary of arguments that will be used to replace the placeholders in the text field.
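
The language lookup and placeholder replacement described above can be sketched as follows. This is an illustration, not ATRIA's implementation: the function names are hypothetical, the prompt texts are shortened stand-ins, the fallback to `default` is an assumption, and `str.format` is assumed as the substitution mechanism (the doubled braces in the default prompts are consistent with it). The `args` field is left out because its resolution rules are not described here.

```python
# Shortened stand-ins for one stage of the prompts configuration.
PROMPTS = {
    "cleanStg": {
        "default": {"text": "Clean this query:\n{query}"},
        "es": {"text": "Limpie esta consulta:\n{query}"},
    }
}


def resolve_prompt(stage_cfg: dict, language: str) -> dict:
    """Pick the entry for the language code, falling back to "default"."""
    return stage_cfg.get(language, stage_cfg["default"])


def render(prompt: dict, **values: str) -> str:
    """Replace {placeholders} in the text; doubled braces {{ }} stay literal."""
    return prompt["text"].format(**values)


stage = PROMPTS["cleanStg"]
print(render(resolve_prompt(stage, "es"), query="hola"))
print(render(resolve_prompt(stage, "fr"), query="bonjour"))  # no "fr" entry, so "default" is used
```

A missing mandatory placeholder value raises `KeyError` in this sketch, which matches the statement that placeholders are required for the prompt to work.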

Default prompts in RAG stages

The following stages are currently defined in RAG:

cleanStg

This stage is responsible for cleaning the user query. It ensures that the query is in a proper format before further processing.

See the RAG default prompt section below for how this stage is defined in the default prompt code.

translationStg

This stage handles the translation of the user query into the target language, if necessary.

See the RAG default prompt section below for how this stage is defined in the default prompt code.

contextStg

This stage determines the context of the user query, ensuring it is aligned with the previous conversation or context.

Default prompts in this stage:

  • sameContext: Configuration to check if the query is in the same context.
  • recreatedQuestion: Configuration to rewrite the original question. It is composed of the following prompts:
    • default: Configuration for rewriting the original question.
    • system: System prompt configuration.
    • human: Human prompt configuration.
    • order: Array of strings with the prompt names in order.
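
The order field suggests that the named prompts are assembled in sequence. The sketch below shows one possible reading, in which each name in order is rendered and emitted as a (role, text) pair. The `build_messages` helper, the fallback to `default`, and the shortened prompt texts are assumptions for illustration only.

```python
def build_messages(group: dict, language: str, **values: str) -> list[tuple[str, str]]:
    """Render the prompts named in "order", one (role, text) pair per name."""
    messages = []
    for role in group["order"]:
        by_language = group[role]
        prompt = by_language.get(language, by_language["default"])
        messages.append((role, prompt["text"].format(**values)))
    return messages


# Shortened stand-in for a prompt group that defines system, human and order.
recreated_question = {
    "system": {"default": {"text": "Rewrite the query if the conversation is relevant."}},
    "human": {"default": {"text": "Previous conversation:\n{memory}\n\nCurrent query:\n{query}"}},
    "order": ["system", "human"],
}

for role, text in build_messages(recreated_question, "en", memory="Hi", query="And roaming?"):
    print(role, "->", text)
```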

See the RAG default prompt section below for how this stage is defined in the default prompt code.

postFilteringStg

This stage filters the retrieved documents or data to ensure relevance to the user query.

Default prompts in this stage:

  • relevantDocument: Configuration to check if the document is relevant.
  • relevantSql: Configuration to check if the SQL data is relevant.

See the RAG default prompt section below for how this stage is defined in the default prompt code.

generativeStg

This stage generates the final response using the retrieved and filtered data.

Default prompts in this stage:

  • stuff: Configuration for the “stuff” strategy. It is composed of the following sub-stages:
    • default: Configuration for the “stuff” strategy.
    • system: System prompt configuration.
    • human: Human prompt configuration.
  • notAnswerResponse: Configuration for responses when the question cannot be answered.
  • informationExtraction: Configuration for extracting information. It is composed of the following prompts:
    • human1: Human prompt configuration.
    • ia: AI prompt configuration.
    • human: Human prompt configuration.
  • responseConsolidation: Configuration for consolidating the response.
  • sqlPrompt: Configuration for generating SQL query statements.

See the RAG default prompt section below for how this stage is defined in the default prompt code.

RAG default prompt

This section includes the default prompt defined for the ATRIA RAG capability.

You can also access the YAML file in the GitHub repository.

If the content of this document and the GitHub version differ, the GitHub version is always considered the most up to date.

prompts:
  cleanStg:
    es:
      text: |
        A continuación hay una consulta del usuario.
        Por favor, limpie la consulta y responda solo con la pregunta del usuario o alguna charla informal.
        -------
        {query}        
    default:
      text:
        A user query follows.
        Please clean the query and respond with just the user question or small talk. The query must be written in English.
        -------
        {query}
  translationStg:
    default:
      text: |
        Translate the following question to {target_language}: {question}

        Instructions:
        1. Maintain the formal tone of the original text.
        2. Do not translate proper names and specific terms (e.g., company names, product names, countries).
        3. Provide the translation in the same format and structure as the original text.

        Translated Text:
        Finally, return the result as a unique JSON object, with the following structure:

        ```
        {{
            "source_languge": The original question language,
            "target_language": The target language,
            "translation": The translation of the question to the target_language. ,
            "possible": true|false,
            "reason": The reason why it is possible or not possible to translate the question.
        }}
        ```        
  contextStg:
    sameContext:
      default:
        text: |
          Below is a conversation followed by a question. You must determine if the question corresponds to the same context as the conversation or if it is from a different context.
          Respond only with: [SAME CONTEXT] o [DIFFERENT CONTEXT]

          Conversation:
          {memory}

          Question:
          {query}          
      es:
        text: |
          A continuación hay una conversación y seguidamente una pregunta. Debes responder si la pregunta corresponde al mismo contexto de la conversación o es una pregunta de un contexto diferente.
          Responde únicamente con: [MISMO CONTEXTO] o [DIFERENTE CONTEXTO]

          Conversación:
          {memory}

          Pregunta:
          {query}          
    recreatedQuestion:
      default:
        default:
          text: |
            Answer with just a new question or the original question.
            Rewrite the original question only if it follows the conversation. Always rewritten question in the same language as the user's question.

            Conversation:
            {memory}

            Original question:
            {query}

            Rewritten question:            
        es:
          text: |
            Responde sólamente con una nueva pregunta.
            Reescribe la pregunta original si es una continuación de la conversación. Utiliza el idioma de la peticion del usuario para rescribir la pregunta.

            Conversación:
            {memory}

            Pregunta original:
            {query}

            Pregunta reescrita:            
      system:
        default:
          text: |
            The user text contains a query, plus the previous conversation turn.
            - If the previous conversation is relevant for the current query, incorporate it into the query and produce a rewritten query
            - else just repeat the current query.

            Always rewrite the question in the same language as the user's question.            
        es:
          text: |
            El texto del usuario contiene una consulta, además del turno anterior de la conversación.

            - Si la conversación anterior es relevante para la consulta actual, incorpórala en la consulta y produce una consulta reescrita.
            - Si no es relevante, simplemente repite la consulta actual.

            Reescribe siempre la consulta en el mismo idioma en que está formulada la consulta del  usuario.            
      human:
        default:
          text: |
            Previous conversation:
            {memory}

            Current query:
            {query}

            Rewritten query:            
        es:
          text: |
            Conversación anterior:
            {memory}

            Consulta actual:
            {query}

            Consulta reescrita:            
      order: ["system", "human"]
  postFilteringStg:
    relevantDocument:
      default:
        text: |
          Below is an excerpt of text followed by a question. You must determine if the excerpt is relevant or irrelevant for answering the question.
          Respond only with: [RELEVANT] o [IGNORABLE]

          Excerpt:
          {extract}

          Question:
          {query}          
      es:
        text: |
          A continuación hay un extracto de texto y seguidamente una pregunta. Debes responder si el extracto es relevante o ignorable para responder la pregunta.
          Responde únicamente con: [RELEVANTE] o [IGNORABLE]

          Extracto:
          {extract}

          Pregunta:
          {query}          
    relevantSql:
      default:
        text: |
          Given the following question:
          `{question}`

          Is it possible to answer, using the data contain in the following table?:
          ```sql
          {sql_table_definition}
          ```


          **Explain briefly, all your decisions**.
          First, identify which tables are necessary to answer the question. Justify why you selected each of these tables.
          Use the following format:
          ```
          I need the following tables to answer the question:
            - <table_name>: <reasoning>
            - <table_name>: <reasoning>
            ...
          ```

          Then, identify which columns are necessary to answer the question. Justify why you selected each of these columns.
          Write the list of columns you identified, and the reasoning after each column, using the following format:
          ```
          I need the following columns to answer the question:
            - <table name>:
              - <column_name>: <reasoning>
              - <column_name>: <reasoning>
              ...
            - <table_name>:
              - <column_name>: <reasoning>
              - <column_name>: <reasoning>
              ...
            ...
          ```

          Then, tell if the tables and columns you identified are enough to answer the question.
          Write the answer using the following format:
          ```
          Possible to answer the question using the former columns:
            - <reasoning>
            - Result: <Yes|No>
          ```

          Then, explain, step by step, how you would write the SQL query to answer the question, using the columns you identified.
           **Use the full qualified names of the columns**. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.

          Finally, tell if the question can be answered using this format:

          ```
          {{
              "possible": true|false,
              "reason": The reason why it is possible or not possible to answer the question.
          }}
          ```          
  generativeStg:
    stuff:
      default:
        default:
          text: |
            Use the following context extractions to answer the question at the end.

            Contexto:
            {context}

            If the extracted context do not contain the answer avoid coming up with an answer, and response you do not have information for answering and kindly invite the user to make a new question.

            Question:
            {question}

            Never include information by your own using your own knowledge.
            {extra_prompt}            
        es:
          text: |
            Utilice el siguiente contexto que ha sido extraido  para responder la pregunta del final.

            Contexto:
            {context}

            Usando esta información, responde a la pregunta del usuario.
            Si la información no contiene la respuesta evita firmemente responder, di que desconoces la respuesta e invita educadamente al usuario a que formule una nueva pregunta.

            Pregunta:
            {question}

            Nunca incluyas información utilizando tus propios conocimientos.
            {extra_prompt}            
      system:
        default:
          text: |
            Respond in language {user_query_language}.

            Question:
            {question}            
          args:
            user_query_language: "#.auto.language.user_query"
        es:
          text: |
            Responde en el idioma {user_query_language}.

            Pregunta:
            {question}            
          args:
            user_query_language: "#.auto.language.user_query"
      human:
        default:
          text: |
            You are going to generate an answer for a user question or query.
            To generate the answer, take always into account all the information available in the context provided.

            Context:
            {context}

            Question:
            {question}

            Never include information by your own using your own knowledge.
            {extra_prompt}            
        es:
          text: |
            Vas a generar una respuesta para una pregunta o consulta del usuario.
            Para generar la respuesta, ten siempre en cuenta toda la información disponible en el contexto proporcionado.

            Pregunta:
            {question}

            Contexto:
            {context}

            Nunca incluyas información utilizando tus propios conocimientos.
            {extra_prompt}            
      order: ["system", "human"]
    notAnswerResponse:
      default:
        text: |
          You are a question answering agent. You have tried to answer this question: {query}
          However you do not have information to answer this.
          Please, tell the user that you are not able to answer, apologize and invite the user to make other question.
          Avoid any harmful answer, such as sexual, rude, sexist or racist.
          Respond in language {user_query_language}.

          User question:
          {query}          
        args:
          user_query_language: "#.auto.language.user_query"
      es:
        text: |
          Eres un agente de respuesta a preguntas. Has intentado responder a esta pregunta: {query}
          Sin embargo, no tienes información para responder a esto.
          Por favor, dile al usuario que no puedes responder, discúlpate e invita al usuario a hacer otra pregunta.
          Evita cualquier respuesta dañina, como sexual, grosera, sexista o racista.
          Responde en el idioma {user_query_language}.

          Pregunta del usuario:
          {query}          
        args:
          user_query_language: "#.auto.language.user_query"
    informationExtraction:
      default:
        default:
          text: |
            The original question is this: {question}
            We have provided a previous answer: {existing_answer}
            Only if necessary, refine the answer exclusively with the context below.
            ------------
            {context_str}
            ------------
            Given the new context, refine the original answer to improve the quality of the response.
            If the context is useless, respond with the exact words of the original answer.
            {extra_prompt}            
        es:
          text: |
            La pregunta original es esta: {question}
            Hemos proporcionado una respuesta previa: {existing_answer}
            Sólo si es necesario refina la respuesta exclusivamente con el contexto a continuación.
            ------------
            {context_str}
            ------------
            Dado el nuevo contexto, refina la respuesta original para mejorar la calidad de la respuesta.
            Si el contexto es inútil responde con las mismas palabras de la respuesta original.
            {extra_prompt}            
      human1:
        default:
          text: "{question}"
        es:
          text: "{question}"
      ia:
        default:
          text: "{existing_answer}"
        es:
          text: "{existing_answer}"
      human:
        default:
          text: |
            Refine the existing answer only if necessary, exclusively with the context below.
            ------------
            {context_str}
            ------------
            Given the new context, refine the original answer to improve the quality of the response.
            If the context is useless, respond with the exact words of the original answer.
            {extra_prompt}            
        es:
          text: |
            Refina la respuesta existente, sólo si es necesario, exclusivamente con el contexto a continuación.
            ------------
            {context_str}
            ------------
            Dado el nuevo contexto, refina la respuesta original para mejorar la calidad de la respuesta.
            Si el contexto es inútil responde con las mismas palabras de la respuesta original.
            {extra_prompt}            
      order: ["human1", "ia", "human"]
    responseConsolidation:
      default:
        default:
          text: |
            Below I provide you a context.
            ---------------------
            {context_str}
            ---------------------

            Given exclusively the context, and without using any prior knowledge, respond with a single sentence to the question:
            {question}

            {extra_prompt}            
        es:
          text: |
            A continuación te doy un contexto.
            ---------------------
            {context_str}
            ---------------------

            Dado exclusivamente el contexto, y sin usar ningún conocimiento previo responde con una única frase a la pregunta:
            {question}

            {extra_prompt}            
      system:
        default:
          text: |
            Below I provide you a context.
            ---------------------
            {context_str}
            ---------------------

            Given exclusively the context, and without using any prior knowledge, respond with a single sentence to the question:
            {question}

            {extra_prompt}            
        es:
          text: |
            A continuación te doy un contexto.
            ---------------------
            { context_str }
            ---------------------

            Dado exclusivamente el contexto y sin usar ningún conocimiento previo responde con una única frase a cualquier pregunta.

            { extra_prompt }            
      human:
        default:
          text: "{question}"
        es:
          text: "{question}"
      order: ["system", "human"]
    sqlPrompt:
      default:
        text: |
          Generate a SQL query statement to answer the following question:
          `{question}`

          Use the data contained in the following table, as defined in SQL:
          ```sql
          {sql_table_definition}
          ```

          The following tables, containing auxiliary information, are also available:
          ```sql
          CREATE TABLE D_CBD_Static_Geo_Area_v6 (GEO_AREA_ID VARCHAR, CBD_GEO_AREA_LEVEL1_ID VARCHAR, CBD_GEO_AREA_LEVEL2_ID VARCHAR, CBD_GEO_AREA_LEVEL3_ID VARCHAR, CBD_GEO_AREA_LEVEL4_ID VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_CBD_Static_Geo_Area IS 'Geographical areas. This table contains foreign keys to the different levels of geographical areas. In particular, it contains the foreign keys to these tables: CBD_Static_Geo_Area_Level1, CBD_Static_Geo_Area_Level2, CBD_Static_Geo_Area_Level3, CBD_Static_Geo_Area_Level4. Therefore, this tables is used, via JOIN, to query the geographical information contained in the different levels of geographical areas. For instance, if you have a table T with a field GEO_AREA_ID and you need to check whether this location corresponds to the region of Asturias you will need to look for GEO_AREA_ID in this table, then extract the CBD_GEO_AREA_LEVEL4_ID and query the table CBD_Static_Geo_Area_Level4 to get the name of the region.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code. This field does not contain location names.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL1_ID IS 'Identifier of the geographical area Level 1 (max level of detail: CP or similar). FORMAT: string containing a numerical code. This field does not contain location names.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL2_ID IS 'Identifier of the geographical area Level 2 (City/Town). FORMAT: string containing a numerical code. This field does not contain location names.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province). FORMAT: string containing a numerical code. This field does not contain location names.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code. This field does not contain location names.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area.EXTRACTION_TM IS 'Date-time of the record';

          CREATE TABLE D_CBD_Static_Geo_Area_Level2_v6 (CBD_GEO_AREA_LEVEL2_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, CBD_GEO_AREA_LEVEL3_ID VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_CBD_Static_Geo_Area_Level2 IS 'Geographical area level 2 (State)';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.CBD_GEO_AREA_LEVEL2_ID IS 'Identifier of the geographical area Level 2 (City/Town). FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 2. FORMAT: alphanumeric string containing the name of the city/town.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province)';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 2';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 2';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_STD_AREA_CD IS 'Standard code of the geo area';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.EXTRACTION_TM IS 'Date-time of the record';

          CREATE TABLE D_CBD_Static_Geo_Area_Level3_v6 (CBD_GEO_AREA_LEVEL3_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, CBD_GEO_AREA_LEVEL4_ID VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, ISO_3166_2_CD VARCHAR, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_CBD_Static_Geo_Area_Level3 IS 'Geographical area level 3 (Region)';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province). FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 3. FORMAT: alphanumeric string containing the name of the province.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 3';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 3';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.ISO_3166_2_CD IS 'ISO 3166-2 associated';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_STD_AREA_CD IS 'Standard code of the geo area';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.EXTRACTION_TM IS 'Date-time of the record';

          CREATE TABLE D_CBD_Static_Geo_Area_Level4_v6 (CBD_GEO_AREA_LEVEL4_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, HASC_1_CD VARCHAR, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_CBD_Static_Geo_Area_Level4 IS 'Geographical area level 4 (min. Detail)';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 4. FORMAT: alphanumerical string containing the name of the state/region. EXAMPLE VALUES: ''Asturias'', ''Andaluc\u00eda'', etc.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 4';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 4';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.HASC_1_CD IS 'Hierarchical administrative subdivision codes ';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_STD_AREA_CD IS 'Standard code of the geo area';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
              COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.EXTRACTION_TM IS 'Date-time of the record';

          CREATE TABLE D_CBD_Static_Station_Type_v6 (STATION_TYPE_CD VARCHAR, TECH_LEVEL_WEIGHT_QT FLOAT, STATION_TYPE_L2_DES VARCHAR, STATION_TYPE_L1_DES VARCHAR, STATION_TYPE_L2_ORDER_NUM INT, STATION_TYPE_L1_ORDER_NUM INT, STATION_TYPE_ORDER_NUM INT, CONSCIOUS_IND BOOLEAN, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_CBD_Static_Station_Type IS 'Station types';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_CD IS 'Device type';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.TECH_LEVEL_WEIGHT_QT IS 'Associated weight for the technologic level of the home';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L2_DES IS 'Station type level 2';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L1_DES IS 'Station type level 1';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L2_ORDER_NUM IS 'Station type order level 2';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L1_ORDER_NUM IS 'Station type order level 1';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_ORDER_NUM IS 'Station type order';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.CONSCIOUS_IND IS 'Indicates if the related device type has energy efficiency';
              COMMENT ON COLUMN D_CBD_Static_Station_Type.EXTRACTION_TM IS 'Date-time of the record';

          CREATE TABLE D_Segment_v8 (OPERATOR_ID VARCHAR, SEGMENT_ID VARCHAR, SEGMENT_DES VARCHAR, GBL_SEGMENT_ID VARCHAR, SEGMENT_GROUP_ID VARCHAR, SEGMENT_GROUP_DES VARCHAR, EXTRACTION_TM VARCHAR);
              COMMENT ON TABLE D_Segment IS 'Classifications of the customers, attending to different segmentation criteria, for marketing and management issues, according to OB criteria and its correspondence with the global segment classification';
              COMMENT ON COLUMN D_Segment.OPERATOR_ID IS 'Global Operator Identifier (Operator acting as owner of the information present in the current entity)';
              COMMENT ON COLUMN D_Segment.SEGMENT_ID IS 'Organisational segment of the client, in the OB. FORMAT: Numerical code.';
              COMMENT ON COLUMN D_Segment.SEGMENT_DES IS 'Segment description. This is the actual name of the segment. POSSIBLE VALUES: ''NTT'', ''Residencial'', ''Pymes'', ''Residencial/SC'', ''Autonomos'', ''Operadores'', ''Grandes Clientes'', ''Residencial Prepago'', ''Telefonica'', ''Sin Clasificar'', ''Empresas''';
              COMMENT ON COLUMN D_Segment.GBL_SEGMENT_ID IS 'ID of the global segment classification';
              COMMENT ON COLUMN D_Segment.SEGMENT_GROUP_ID IS 'ID code of the segmentation group';
              COMMENT ON COLUMN D_Segment.SEGMENT_GROUP_DES IS 'Description of the segmentation group. POSSIBLE VALUES: ''0.- OPERADORES'', ''1.- U.N. Empresas'', ''2.-U.N. Gran Público'', ''3.- TELEFONICA'', ''4.- SIN CLASIFICAR''';
              COMMENT ON COLUMN D_Segment.EXTRACTION_TM IS 'Date-time of the record';
          ```

          Some of the former tables contains columns in full-qualified format. For instance, these are some examples of full-qualified columns:
          ```
          record_name.field_name
          TEC_PLAT_REC.DEVICE_ID
          record_name.subrecord_name.field_name
          TEC_PLAT_REC.TEC_PLAT_SUBCOMP_REC.DEVICE_ID
          ...
          ```
          Always use the full-qualified format when referring to columns in the tables. For instance, if you need to use the column 'TEC_PLAT_REC.DEVICE_ID', you should not refer to it as 'DEVICE_ID', but as 'TEC_PLAT_REC.DEVICE_ID'.
          **Explain in detail, step by step, all your decisions**.
          If you need to filter by a higher level geographical such as a region (Comunidad Autónoma) you will need to:
          - join the `GEO_AREA_ID` field of the data table (such as `CBD_HGU_Detail_Daily`) with the `GEO_AREA_ID` field in `D_CBD_Static_Geo_Area` table
          - then join the `CBD_GEO_AREA_LEVEL4_ID` field in the `D_CBD_Static_Geo_Area` with the `CBD_GEO_AREA_LEVEL4_ID` field in the `D_CBD_Static_Geo_Area_Level4` table
          - then compare the `GEO_AREA_LEVEL_DES` field in the `D_CBD_Static_Geo_Area_Level4` table with the name of the region (e.g., 'Cantabria'), since the DESCRIPTION field does contain the actual name of the geographical area.
          **Only perform these joins if explicit filtering or grouping by geographical location is necessary**.

          First, identify which tables are necessary to answer the question. Justify why you selected each of these tables.
          Use the following format:
          ```
          I need the following tables to answer the question:
            - <table_name>: <reasoning>
            - <table_name>: <reasoning>
            ...
          ```
          Then, identify which columns are necessary to answer the question. Justify why you selected each of these columns.
          Write the list of columns you identified, and the reasoning after each column, using the following format:
          ```
          I need the following columns to answer the question:
            - <table name>:
              - <column_name>: <reasoning>
              - <column_name>: <reasoning>
              ...
            - <table_name>:
              - <column_name>: <reasoning>
              - <column_name>: <reasoning>
              ...
            ...
          ```
          Then, tell if the tables and columns you identified are enough to answer the question.
          Write the answer using the following format:
          ```
          Possible to answer the question using the former columns:
            - <reasoning>
            - Result: <Yes|No>
          ```
          Then, explain, step by step, how you would write the SQL query to answer the question, using the columns you identified. **Use the full qualified names of the columns**. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.
          Finally, write the SQL query to answer the question, using the columns you identified. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.
          Return the result as a unique JSON object, with the following structure:
          {{
              "result": <Write the SQL query here. **MAKE SURE THAT THE STATEMENT `SELECT JSON_OBJECT` is not used in the query and Use the full qualified names of the columns. Generate a valid SQL sentence in a single line without new line characters.**>,
              "status": "OK",
              "reason": <a reasoning explaining the query>
          }}
          If the former table does not contain the necessary data to answer the question, return the following JSON object:
          {{
              "result": null,
              "status": "ERROR",
              "reason": <a reasoning explaining the query>
          }}
          Make sure that the JSON object is correctly formatted, and can be parsed by a JSON parser.          
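
The prompt entries above share a common shape: per-role templates keyed by language (with a `default` variant), `{placeholder}` fields, and an `order` list. The following is only a minimal sketch of how such an entry could be rendered, assuming Python `str.format`-style substitution (which would also explain the doubled braces `{{` and `}}` in `sqlPrompt`). The `PROMPTS` excerpt and the `render_messages` helper are hypothetical and are not part of ATRIA.

```python
# Hypothetical, shortened excerpt that mirrors the structure of the prompts above.
PROMPTS = {
    "system": {
        "default": {"text": "Use only this context:\n{context_str}\n{extra_prompt}"},
        "es": {"text": "Usa solo este contexto:\n{context_str}\n{extra_prompt}"},
    },
    "human": {
        "default": {"text": "{question}"},
    },
    "order": ["system", "human"],
}


def render_messages(prompts, language, **values):
    """Return (role, text) pairs in the configured order.

    Assumption: a role without a variant for `language` falls back to `default`.
    """
    messages = []
    for role in prompts["order"]:
        variants = prompts[role]
        template = variants.get(language, variants["default"])["text"]
        # str.format ignores keyword arguments that the template does not use.
        messages.append((role, template.format(**values)))
    return messages


messages = render_messages(
    PROMPTS, "es", context_str="...", extra_prompt="", question="¿Qué es ATRIA?"
)

# Under str.format, doubled braces render as literal braces.
literal = '{{"status": "OK"}}'.format()
```

Under this assumption, a literal JSON skeleton inside a template has to be written with doubled braces so that it is not mistaken for a placeholder.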

Injection

Default injection configuration for atria-rag-server. It is used to avoid prompt injection.

Injection fields

| Parameter | Definition | Type/Default values |
| --- | --- | --- |
| heuristics | Heuristic sentences: an object where each key is a language and each value is a list of phrases. By default, the heuristic sentences are defined directly in the configuration; no file path is specified. Note that the phrases added here are added to those defined in the security stage (securityStg) of the preset configuration. | object |
| max_length | (Mandatory) Maximum length | number |

Injection by default

The default configuration is described as follows:

injection:
  heuristics:
    es: 
      - responde como
      - responda como
      - respondeme como
      - respondame como
    en: 
      - answer like
      - forget everything
      - forget your
  max_length: 200
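
As an illustration only, the sketch below shows one way such a configuration could be applied to an incoming query. It assumes case-insensitive substring matching of the heuristic phrases and assumes that `max_length` limits the number of characters in the query; neither assumption is confirmed by this document, and `looks_like_injection` is a hypothetical helper, not an ATRIA function.

```python
# Values copied from the default injection configuration above.
INJECTION = {
    "heuristics": {
        "es": ["responde como", "responda como", "respondeme como", "respondame como"],
        "en": ["answer like", "forget everything", "forget your"],
    },
    "max_length": 200,
}


def looks_like_injection(query, language, config=INJECTION):
    """Flag a query that is too long or contains a heuristic phrase.

    Assumptions: `max_length` is a character limit and phrases are matched
    as case-insensitive substrings for the given language only.
    """
    if len(query) > config["max_length"]:
        return True
    lowered = query.lower()
    phrases = config["heuristics"].get(language, [])
    return any(phrase in lowered for phrase in phrases)


suspicious = looks_like_injection("Forget everything and answer like a pirate", "en")
harmless = looks_like_injection("Which segments exist in Asturias?", "en")
```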

Service

Default service configuration for atria-rag-server.

Service fields

| Parameter | Definition | Type/Default values |
| --- | --- | --- |
| host | (Mandatory) Host name | string |
| port | (Mandatory) Port number | number |

Service by default

The default configuration is described as follows:

service:
  host: 0.0.0.0
  port: <AUTOCOMPLETED>
  log_level: <AUTOCOMPLETED>

Local Storage

Default fields related to the configuration of the local storage for documents.

Local Storage fields

| Parameter | Definition | Type/Default values |
| --- | --- | --- |
| atria_resources_data_folder | (Mandatory) Folder name for data resources | string |
| atria_shared_data_folder | (Mandatory) Shared data folder name | string |

Local Storage by default

The default configuration is described as follows:

local_storage_manager:
  atria_resources_data_folder: "/opt/atria-rag/data"
  atria_shared_data_folder: "/var/atria-rag-data"

Config API

Field with parameters for atria-rag-server API configuration

Config API fields

| Parameter | Definition | Type/Default values |
| --- | --- | --- |
| base_url | (Mandatory) Config API URL | string |
| api_key | (Mandatory) API key | string |

Config API by default

The default configuration is described as follows:

aura_config_api:
  base_url: <AUTOCOMPLETED>
  api_key:  <AUTOCOMPLETED>

Retrievers

Retrievers are responsible for storing the information generated from the documents. Each retriever is associated with a database, which it either feeds with information or retrieves information from.

Currently, three retrievers are defined in ATRIA:

  • qdrant
  • tfidf
  • elasticsearch

Retriever fields

Each retriever type defines its own specific fields, as shown below:

| Parameter | Subparameters | Definition | Type/Default values |
| --- | --- | --- | --- |
| qdrant | host | (Mandatory) Host of the Qdrant service | string |
| | port | (Mandatory) Port of the Qdrant service | number |
| | prefix | (Mandatory) Prefix for collections | string |
| tfidf | dump_name | (Mandatory) Dump name of the Tfidf service | string |
| elasticsearch | host | (Mandatory) Host of the Elasticsearch service | string |
| | ca_crt | (Mandatory) Path to the Elasticsearch certificate | string |
| | username | (Mandatory) Username for the Elasticsearch service | string |
| | password | (Mandatory) Password for the Elasticsearch service | string |
| | index_name | (Mandatory) Index of the Elasticsearch service | string |

Retrievers by default

The default configuration is described as follows:

retrievers:
  qdrant:
    host: <AUTOCOMPLETED>
    port: 6333
    prefix: <AUTOCOMPLETED>
  tfidf:
    dump_name: /var/atria-rag-data/tfidf/dump/
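
Since every retriever field listed above is mandatory, a configuration can be checked before use. The sketch below is one possible check, not ATRIA's own validation: `missing_retriever_fields` is a hypothetical helper, and the host name in the example is invented for illustration.

```python
# Mandatory fields per retriever type, as listed in the retriever fields table.
MANDATORY_FIELDS = {
    "qdrant": ("host", "port", "prefix"),
    "tfidf": ("dump_name",),
    "elasticsearch": ("host", "ca_crt", "username", "password", "index_name"),
}


def missing_retriever_fields(retrievers):
    """Map each configured retriever to the mandatory fields it lacks."""
    problems = {}
    for name, settings in retrievers.items():
        if name not in MANDATORY_FIELDS:
            problems[name] = ["<unknown retriever type>"]
            continue
        missing = [
            field
            for field in MANDATORY_FIELDS[name]
            if settings.get(field) in (None, "")
        ]
        if missing:
            problems[name] = missing
    return problems


example = {
    # Hypothetical host; `prefix` is deliberately left out.
    "qdrant": {"host": "qdrant.example.internal", "port": 6333},
    "tfidf": {"dump_name": "/var/atria-rag-data/tfidf/dump/"},
}
report = missing_retriever_fields(example)  # {'qdrant': ['prefix']}
```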

Metadata

Parameter related to the configuration of metadata in atria-rag-server

It is used to set up how metadata is handled when providing responses. The retrieval operation produces a list of candidates, each of which may provide a dictionary of metadata. The metadata is used to filter the candidates and to provide additional information in the response.

Metadata fields

| Parameter | Subparameters | Definition | Type/Default values |
| --- | --- | --- | --- |
| map | filetype | (Optional) Type of file, typically used to specify the format | string |
| | page_number | (Optional) Page number. It can be used to identify particular pages | string |
| group-by | | (Optional) Field names to group by | string |
| aggregate | | (Optional) Determines how the values of duplicated fields are consolidated during grouping | string |
| output_filter | | (Optional) List of fields to be displayed in the metadata | list of string |
| root | | (Optional) Primary fields that structure the final output of the metadata processing | list of string |

Metadata by default

The default configuration for metadata is described as follows:

metadata:
  map:
    filetype: content-type
    page_number: page-number
  group-by: url
  aggregate: page-number
  output_filter:
    - title
    - url
    - content-type
    - page-number
    - _zxcv
  root:
    - title
    - url
    - content-type
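
The sketch below shows one plausible reading of these settings and is not a description of ATRIA's actual processing. It assumes that `map` renames source fields, that `output_filter` drops every other field, that candidates sharing the same `group-by` value are merged, and that the `aggregate` field is collected into a list per group. The `consolidate` helper and the sample candidates are hypothetical.

```python
# Subset of the default metadata configuration above.
METADATA = {
    "map": {"filetype": "content-type", "page_number": "page-number"},
    "group-by": "url",
    "aggregate": "page-number",
    "output_filter": ["title", "url", "content-type", "page-number"],
    "root": ["title", "url", "content-type"],
}


def consolidate(candidates, config=METADATA):
    """Rename, filter, group and aggregate candidate metadata (assumed semantics)."""
    groups = {}
    for meta in candidates:
        # Rename source fields according to `map` (source name -> output name).
        renamed = {config["map"].get(key, key): value for key, value in meta.items()}
        # Keep only the fields listed in `output_filter`.
        kept = {k: v for k, v in renamed.items() if k in config["output_filter"]}
        # One output entry per distinct `group-by` value, seeded with the root fields.
        group_key = kept.get(config["group-by"])
        group = groups.setdefault(
            group_key, {field: kept.get(field) for field in config["root"]}
        )
        # Collect the distinct values of the aggregated field for the group.
        aggregate = config["aggregate"]
        if aggregate in kept:
            values = group.setdefault(aggregate, [])
            if kept[aggregate] not in values:
                values.append(kept[aggregate])
    return list(groups.values())


candidates = [
    {"title": "Guide", "url": "u1", "filetype": "pdf", "page_number": 1, "score": 0.9},
    {"title": "Guide", "url": "u1", "filetype": "pdf", "page_number": 3},
    {"title": "FAQ", "url": "u2", "filetype": "html"},
]
result = consolidate(candidates)
```

With these sample candidates, the two entries for `u1` collapse into a single entry whose `page-number` lists both pages, while `u2` keeps only its root fields.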

Language identification

Parameter related to the configuration of Language Identification in atria-rag-server

It is used to identify the language of the user’s question. The result is a dictionary containing the detected language in ISO 639-3 format and its corresponding conversion.
In addition to language identification, the user’s question is preprocessed at this stage: special characters that may cause recognition errors, such as line breaks, are removed. If an error occurs, the default language is returned.

Language identification is performed with the fastText library.

Language identification fields

| Parameter | Definition | Type/Default values |
| --- | --- | --- |
| language_default | (Optional) Default language as a two-letter language code (ISO 639-1). For example: es | string |
| score_threshold | (Optional) Score threshold used to decide whether to respond in the identified language or in the default language. For example: 0.85 | float |
| model_path | (Mandatory) Model path. For example: /opt/atria-fasttext/fasttext_model.bin | string |
| chars_to_clean | (Optional) Characters to be cleaned. The default is ['/n'] | list of string |

Language Identification by default

The default configuration for language identification is described as follows:

language_identification:
  score_threshold: <AUTOCOMPLETED>
  language_default: <AUTOCOMPLETED>
  model_path: "/opt/atria-fasttext/fasttext_model.bin"
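
The interplay between `chars_to_clean`, `score_threshold` and `language_default` can be sketched as follows. This is a simplified illustration under stated assumptions, not ATRIA's implementation: `predict` is a hypothetical callable that stands in for the fastText model and returns a `(language, score)` pair, and the values `es` and `0.85` are the example values from the fields table, not the real defaults.

```python
def identify_language(
    question,
    predict,
    language_default="es",
    score_threshold=0.85,
    chars_to_clean=("\n",),
):
    """Return the detected language, or the default one when unsure or on error.

    `predict` is a stand-in for the language model: it takes the cleaned text
    and returns a (language, score) pair.
    """
    cleaned = question
    for char in chars_to_clean:
        # Replace characters that may disturb recognition, such as line breaks.
        cleaned = cleaned.replace(char, " ")
    try:
        language, score = predict(cleaned)
    except Exception:
        # In case of error, the default language is returned.
        return language_default
    return language if score >= score_threshold else language_default


def confident_english(text):
    return ("en", 0.97)


def unsure_french(text):
    return ("fr", 0.40)


def failing(text):
    raise RuntimeError("model not available")


detected = identify_language("What is\nATRIA?", confident_english)
fallback = identify_language("Bonjour", unsure_french)
on_error = identify_language("Hola", failing)
```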