ATRIA components default configuration
Description of the default configuration (internal configuration) for ATRIA components
Introduction
The default configuration of ATRIA corresponds to the server configuration, that is, the internal configuration for ATRIA components.
Within a specific configuration type, parameters are organized by component:
- Fields for atria-model-gateway configuration
- Fields for atria-rag-server configuration
- Common fields for both components
1. Server configuration
Fields related to the internal configuration of ATRIA components
Target users: ATRIA development and installation teams
The default server configuration fields cannot be modified by ATRIA constructors, except for prompts.
1.1. Logging configuration
Configuration field shared by atria-model-gateway and atria-rag-server that allows logs to be configured in a customizable and independent way.
Logging is configured through a JSON file with the following default content:
```json
{
  "version": 1,
  "disable_existing_loggers": false,
  "logging": {
    "handlers": {
      "hdl2": {
        "class": "logging.StreamHandler",
        "formatter": "json",
        "level": <AUTOCOMPLETED>
      }
    },
    "loggers": {
      "atria_model_gw": {
        "level": <AUTOCOMPLETED>,
        "handlers": [
          "hdl2"
        ],
        "filters": [],
        "propagate": false
      }
    },
    "root": {
      "level": <AUTOCOMPLETED>,
      "handlers": []
    }
  }
}
```
Fields
The main fields are described below. For further details, see the general Python logging documentation (logging.config dictionary schema).
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| version | | Version of the logging configuration schema. Python's dictConfig currently accepts only 1 | number |
| disable_existing_loggers | | Whether loggers that already exist when the configuration is applied are disabled | boolean |
| handlers | | Dictionary of logging handlers. Each key is the name of a handler | object |
| | class | Python logging handler class (see the Python documentation) | string |
| | formatter | Format of the logs | json, string, console, simple |
| | level | Level of the logging event. It must be one of the labels listed | INFO, ERROR, WARN or DEBUG |
| loggers | | Dictionary in which each key is a logger name and each value describes how to configure the corresponding logger instance | object |
| | level | (Optional) Level of the logger | string |
| | handlers | (Optional) List of the IDs of the handlers for this logger | list[string] |
| | filters | (Optional) List of the IDs of the filters for this logger | list[string] |
| | propagate | (Optional) Whether events are also passed to the handlers of ancestor loggers | boolean |
| root | | Configuration of the root logger | object |
| | level | (Optional) Level of the root logger | string |
| | handlers | (Optional) List of the IDs of the handlers for the root logger | list[string] |
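The following is a minimal sketch of how a file with this layout could be applied using Python's standard logging.config.dictConfig. It rests on two assumptions that this document does not confirm. First, the keys under "logging" are lifted to the top level, because dictConfig expects handlers, loggers and root there. Second, a formatter named json exists. The JSON-like formatter and the to_dict_config helper below are hypothetical placeholders, not ATRIA code.

```python
import logging
import logging.config

# Default layout documented above; the <AUTOCOMPLETED> levels are filled
# with "INFO" purely for illustration.
RAW_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "logging": {
        "handlers": {
            "hdl2": {"class": "logging.StreamHandler", "formatter": "json", "level": "INFO"}
        },
        "loggers": {
            "atria_model_gw": {
                "level": "INFO",
                "handlers": ["hdl2"],
                "filters": [],
                "propagate": False,
            }
        },
        "root": {"level": "INFO", "handlers": []},
    },
}

# Hypothetical stand-in for the "json" formatter: a plain logging.Formatter
# that emits a JSON-like line (it does not escape quotes inside messages).
JSON_LIKE_FORMATTER = {
    "format": '{"time": "%(asctime)s", "level": "%(levelname)s", '
              '"logger": "%(name)s", "message": "%(message)s"}'
}


def to_dict_config(raw: dict) -> dict:
    """Lift the nested "logging" section to the top level expected by dictConfig."""
    cfg = {key: value for key, value in raw.items() if key != "logging"}
    cfg.update(raw.get("logging", {}))
    formatters = dict(cfg.get("formatters", {}))
    formatters.setdefault("json", JSON_LIKE_FORMATTER)
    cfg["formatters"] = formatters
    return cfg


config = to_dict_config(RAW_CONFIG)
logging.config.dictConfig(config)
logging.getLogger("atria_model_gw").info("model gateway started")
```

With propagate set to false and no root handlers, records from atria_model_gw are written only by hdl2, to standard error.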
1.2. atria-model-gateway default configuration
This section includes the parameters configured by default in atria-model-gateway:
Defaults
General-purpose field with parameters to define the behavior of atria-model-gateway
Defaults fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| session_params | | (Optional) Default values for a session | object |
| | window | (Optional) Session window | number |
| | timeout | (Optional) Session expiration time | number |
| service_params | | (Optional) Default values for the server | object |
| | preflight_max_age | (Optional) Maximum time, in seconds, that browsers can cache the response to a CORS preflight request | number |
| messages | | (Optional) Message options | object |
| | types | (Optional) Types of messages | list[string] |
| openai_proxy | | Activates the OpenAI proxy | boolean |
| trimmer | | (Optional) Expression used to trim the response | string |
If timeout is set to 0, the last conversation turn is not saved in the session, but the existing session history is still used.
Defaults by default
The default configuration is described as follows:
```yaml
defaults:
  # Default values for a session
  session_params:
    window: 2
    timeout: 3600
  # Default values for the server
  service_params:
    preflight_max_age: 86400
  # Message options
  messages:
    types:
      - feedback
  # Activate openai proxy
  openai_proxy: false
```
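The following is a minimal sketch of how the defaults section could be represented in Python once the YAML has been parsed into a dictionary, for example with a YAML library. The class names and the flattened message_types field are hypothetical and do not come from the atria-model-gateway code base. Missing keys fall back to the documented defaults.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionParams:
    window: int = 2       # session window
    timeout: int = 3600   # session expiration time; 0 = last turn not saved


@dataclass
class ServiceParams:
    preflight_max_age: int = 86400


@dataclass
class Defaults:
    session_params: SessionParams = field(default_factory=SessionParams)
    service_params: ServiceParams = field(default_factory=ServiceParams)
    message_types: List[str] = field(default_factory=lambda: ["feedback"])
    openai_proxy: bool = False
    trimmer: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "Defaults":
        """Build Defaults from a parsed `defaults:` mapping."""
        base = cls()
        session = raw.get("session_params", {})
        service = raw.get("service_params", {})
        return cls(
            session_params=SessionParams(
                window=int(session.get("window", base.session_params.window)),
                timeout=int(session.get("timeout", base.session_params.timeout)),
            ),
            service_params=ServiceParams(
                preflight_max_age=int(
                    service.get("preflight_max_age", base.service_params.preflight_max_age)
                ),
            ),
            message_types=list(raw.get("messages", {}).get("types", base.message_types)),
            openai_proxy=bool(raw.get("openai_proxy", base.openai_proxy)),
            trimmer=raw.get("trimmer", base.trimmer),
        )

    def saves_last_turn(self) -> bool:
        """Per the note above, a timeout of 0 means the last turn is not saved."""
        return self.session_params.timeout != 0
```

For example, Defaults.from_dict({"session_params": {"timeout": 0}}) keeps window at 2 and reports that the last turn is not saved.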
Redis
This section includes the Redis connection configuration for atria-model-gateway.
Redis fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| connection_mode | (Mandatory) Connection mode | single, sentinel, cluster |
| pool_size | (Mandatory) Size of the connection pool | number |
| database | (Mandatory) Database number | number |
| password | (Mandatory) Password | string |
| uri | (Mandatory) Redis URI | string |
| prefix | (Mandatory) Prefix | string |
| sleep_time | (Optional) Sleep time | number |
| max_retries | (Optional) Maximum number of retries | number |
Redis by default
The default configuration for Redis is described as follows:
```yaml
redis:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
```
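The following is a stdlib-only sketch of how a redis section could be checked against the table above before a client is created. It also shows one possible reading of sleep_time and max_retries. The retry policy (sleep sleep_time seconds between attempts, at most max_retries retries) and the default values in call_with_retries are assumptions, because the document does not define them. Both helpers are hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

CONNECTION_MODES = {"single", "sentinel", "cluster"}
MANDATORY_FIELDS = ("connection_mode", "pool_size", "database", "password", "uri", "prefix")


def validate_redis_config(cfg: dict) -> List[str]:
    """Return the problems found in a `redis` section (empty list = valid)."""
    problems = [f"missing mandatory field: {name}" for name in MANDATORY_FIELDS if name not in cfg]
    mode = cfg.get("connection_mode")
    if mode is not None and mode not in CONNECTION_MODES:
        problems.append(f"unsupported connection_mode: {mode!r}")
    pool_size = cfg.get("pool_size")
    if pool_size is not None and (not isinstance(pool_size, int) or pool_size <= 0):
        problems.append("pool_size must be a positive integer")
    return problems


def call_with_retries(operation: Callable[[], T], max_retries: int = 3, sleep_time: float = 1.0) -> T:
    """Hypothetical retry policy: retry on ConnectionError, sleeping between attempts.

    The built-in ConnectionError is used as a stand-in; a real Redis client
    raises its own exception types.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(sleep_time)
    raise RuntimeError("unreachable")
```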
Redis Subscriber
This section includes the Redis event subscriber connection configuration for atria-model-gateway.
Redis subscriber fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| connection_mode | (Mandatory) Connection mode | single, sentinel, cluster |
| pool_size | (Mandatory) Size of the connection pool | number |
| database | (Mandatory) Database number | number |
| password | (Mandatory) Password | string |
| uri | (Mandatory) Redis URI | string |
| prefix | (Mandatory) Prefix | string |
| sleep_time | (Optional) Sleep time | number |
| max_retries | (Optional) Maximum number of retries | number |
| channels | List of channels to subscribe to | list[string] |
Redis subscriber by default
The default configuration for the Redis subscriber is described as follows:
```yaml
redis_subscriber:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
  channels:
    - "ApplicationConfiguration"
    - "PresetConfiguration"
```
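The channels list determines which configuration events the subscriber reacts to. This stdlib-only sketch routes messages to per-channel handlers and ignores channels that are not configured. The messages are dictionaries shaped like those returned by redis-py's PubSub.get_message(), with type, channel and data keys. ChannelDispatcher is hypothetical and is not the gateway's implementation.

```python
from typing import Callable, Dict, Iterable, List, Union

Handler = Callable[[str], None]


def _as_text(value: Union[bytes, str]) -> str:
    # redis-py returns bytes unless the client uses decode_responses=True
    return value.decode("utf-8") if isinstance(value, bytes) else value


class ChannelDispatcher:
    """Route pub/sub messages to handlers registered for configured channels."""

    def __init__(self, channels: Iterable[str]):
        self.handlers: Dict[str, List[Handler]] = {name: [] for name in channels}

    def on(self, channel: str, handler: Handler) -> None:
        if channel not in self.handlers:
            raise KeyError(f"channel {channel!r} is not in the subscriber configuration")
        self.handlers[channel].append(handler)

    def dispatch(self, message: dict) -> int:
        """Handle one message dict and return how many handlers were called."""
        if message.get("type") != "message":
            return 0  # e.g. subscribe confirmations carry no payload to route
        channel = _as_text(message["channel"])
        handlers = self.handlers.get(channel, [])
        data = _as_text(message["data"])
        for handler in handlers:
            handler(data)
        return len(handlers)
```

In a real subscriber, each handler would reload the affected application or preset configuration.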
Config API
Field with the parameters used by atria-model-gateway to connect to the configuration API.
Config API fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| base_url | (Mandatory) Base URL of the configuration API | string |
| api_key | (Mandatory) API key | string |
Config API by default
The default configuration is described as follows:
```yaml
aura_config_api:
  base_url: <AUTOCOMPLETED>
  api_key: <AUTOCOMPLETED>
```
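The following sketch builds a request against the configuration API from base_url and api_key, without sending it. The resource path and the "x-api-key" header name are hypothetical, because this document does not describe the API's endpoints or authentication header.

```python
import urllib.request


def build_config_request(base_url: str, api_key: str, resource: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the configuration API.

    The resource path and the "x-api-key" header name are assumptions.
    """
    # Join without urljoin, which would drop the last path segment of base_url
    url = base_url.rstrip("/") + "/" + resource.lstrip("/")
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")
```

To send the request, pass it to urllib.request.urlopen with an explicit timeout.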
Allow logging prompts with INFO level
Field that allows atria-model-gateway to log prompts at INFO level.
Use it only to debug errors in environments where DEBUG logs are not available. Because prompts can be large, set this field back to false once it is no longer needed.
Allow logging prompts
| Parameter | Definition | Type/Default values |
|---|---|---|
| allow_log_prompts | Allow logging prompts at INFO level | boolean |
Allow logging prompts by default
The default configuration is described as follows:
```yaml
allow_log_prompts: false
```
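The flag changes the level at which prompts are emitted. The following minimal sketch assumes prompts are otherwise logged at DEBUG. log_prompt is a hypothetical helper, not the gateway's function.

```python
import logging


def log_prompt(logger: logging.Logger, prompt: str, allow_log_prompts: bool = False) -> None:
    """Log a prompt at INFO when allow_log_prompts is true, otherwise at DEBUG."""
    level = logging.INFO if allow_log_prompts else logging.DEBUG
    logger.log(level, "prompt sent to model: %s", prompt)
```

With the logger level set to INFO, prompts are dropped while the flag is false and appear once it is set to true.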
Models
Predefined AI models included in atria-model-gateway by default.
The model(s) to be used must be selected when configuring an application.
Model fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| type | | (Mandatory) Model type identifier | rag, openai, mock, perplexity |
| name | | (Optional) Model name. If not set, the model ID is used | string |
| class_params | | (Mandatory) Parameters of the model class | object |
| | endpoint | (Mandatory) Endpoint of the model | string |
| | type | (Mandatory for RAG) Type of the model | langchain |
| | path | (Mandatory for RAG) Path of the model endpoint | string |
| | azure_name | (Mandatory for OpenAI) Azure deployment name of the model | string |
| | model_name | (Mandatory for OpenAI) Model name | string |
| | api_key | (Mandatory for OpenAI) API key used in the model call | string |
| | api_version | (Mandatory for OpenAI) API version used in the model call | string |
| | output | (Mandatory for mocks) Response returned by the mock model | string |
| description_params | | (Optional) Description of the model parameters | object |
| | context_window | (Optional) Context window of the model | number |
| | tokenizer | (Optional) Tokenizer of the model | string |
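The table above makes different class_params mandatory depending on the model type. The following sketch encodes those rules as a validation helper and applies the fallback that uses the model ID when name is missing. The perplexity rule is an assumption, because the table lists no type-specific fields for it. Both functions are hypothetical.

```python
from typing import Dict, List, Set

# Mandatory class_params per model type, taken from the table above.
# perplexity: only the generic "endpoint" requirement (assumption).
REQUIRED_CLASS_PARAMS: Dict[str, Set[str]] = {
    "rag": {"endpoint", "type", "path"},
    "openai": {"endpoint", "azure_name", "model_name", "api_key", "api_version"},
    "mock": {"endpoint", "output"},
    "perplexity": {"endpoint"},
}


def model_display_name(model_id: str, entry: dict) -> str:
    """name is optional; fall back to the model ID."""
    return entry.get("name") or model_id


def validate_model(model_id: str, entry: dict) -> List[str]:
    """Return the problems found in one model entry (empty list = valid)."""
    model_type = entry.get("type")
    if model_type not in REQUIRED_CLASS_PARAMS:
        return [f"{model_id}: unknown model type {model_type!r}"]
    class_params = entry.get("class_params")
    if not isinstance(class_params, dict):
        return [f"{model_id}: class_params is mandatory"]
    missing = sorted(REQUIRED_CLASS_PARAMS[model_type] - set(class_params))
    return [f"{model_id}: missing class_params.{name}" for name in missing]
```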
Models by default
atria-rag model
Model for using the atria-rag-server.
The default configuration is described as follows:
```yaml
atria-rag:
  type: rag
  name: Rag server model
  class_params:
    type: langchain
    endpoint: <AUTOCOMPLETED>
    path: <AUTOCOMPLETED>
```
gpt-4
Model for using the Azure OpenAI GPT-4 model.
The default configuration is described as follows:
```yaml
gpt-4:
  type: openai
  local: false
  class_params:
    azure_name: deployment_gpt-4
    model_name: gpt-4
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 300
```
gpt-4o
Model for using the Azure OpenAI GPT-4o model.
The default configuration is described as follows:
```yaml
gpt-4o:
  type: openai
  local: false
  class_params:
    azure_name: deployment_gpt-4o
    model_name: gpt-4o
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 128000
```
gpt-4o-mini
Model for using the Azure OpenAI GPT-4o-mini model.
The default configuration is described as follows:
```yaml
gpt-4o-mini:
  type: openai
  local: false
  class_params:
    azure_name: deployment_gpt-4o-mini
    model_name: gpt-4o-mini
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 128000
```
o3-mini
Model for using the Azure OpenAI o3-mini model.
The default configuration is described as follows:
```yaml
o3-mini:
  type: openai
  local: false
  class_params:
    azure_name: deployment_o3-mini
    model_name: o3-mini
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 128000
```
gpt-4.1-nano
Model for using the Azure OpenAI gpt-4.1-nano model.
The default configuration is described as follows:
```yaml
gpt-4.1-nano:
  type: openai
  local: false
  class_params:
    azure_name: deployment_gpt-4.1-nano
    model_name: gpt-4.1-nano
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    api_version: <AUTOCOMPLETED>
    timeout:
      timeout: 60
      read: 60
  description_params:
    context_window: 128000
```
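The class_params of the Azure OpenAI models correspond to parts of an Azure OpenAI chat completions call: the resource endpoint, the deployment, the api-version query parameter and the api-key header. The following stdlib-only sketch builds such a request without sending it. It assumes azure_name is the Azure deployment name, as the deployment_ prefix in the defaults suggests. The client library the gateway actually uses is not described here, and build_chat_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(class_params: dict, messages: list, **options) -> urllib.request.Request:
    """Build (but do not send) an Azure OpenAI chat completions request."""
    endpoint = class_params["endpoint"].rstrip("/")
    deployment = urllib.parse.quote(class_params["azure_name"], safe="")
    query = urllib.parse.urlencode({"api-version": class_params["api_version"]})
    url = f"{endpoint}/openai/deployments/{deployment}/chat/completions?{query}"
    body = json.dumps({"messages": messages, **options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"api-key": class_params["api_key"], "Content-Type": "application/json"},
        method="POST",
    )
```

The timeout values in the defaults would typically be passed to the HTTP client that sends this request. urllib accepts a single timeout in urlopen.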
perplexity-sonar
Model for using the Perplexity Sonar model. This model will be available in ATRIA in upcoming releases.
The default configuration is described as follows:
```yaml
perplexity-sonar:
  type: perplexity
  local: false
  class_params:
    model_name: sonar
    api_key: <AUTOCOMPLETED>
    endpoint: <AUTOCOMPLETED>
    timeout:
      timeout: 20
      read: 45
    http_raise_when_retry_limit_exceeded_recognizer: false
  description_params:
    context_window: 300
```
Important: o3-mini is a reasoning model and does not support the same parameters as the other Azure OpenAI models above. Check the Microsoft documentation on API and feature support for Azure OpenAI reasoning models.
The following parameters are not supported by o3-mini: temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens.
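Azure OpenAI reasoning models such as o3-mini do not accept the parameters listed in this note, so a request that includes them needs filtering before it is sent. The following sketch separates the unsupported options and returns them so they can be logged. filter_options is hypothetical. Azure's reasoning-model documentation points to max_completion_tokens as the replacement for max_tokens.

```python
from typing import Dict, List, Tuple

# Parameters listed above as unsupported by o3-mini.
UNSUPPORTED_BY_O3_MINI = frozenset({
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias", "max_tokens",
})


def filter_options(model_name: str, options: Dict) -> Tuple[Dict, List[str]]:
    """Split request options into (kept, dropped) for the given model."""
    if model_name != "o3-mini":
        return dict(options), []
    kept = {k: v for k, v in options.items() if k not in UNSUPPORTED_BY_O3_MINI}
    dropped = sorted(k for k in options if k in UNSUPPORTED_BY_O3_MINI)
    return kept, dropped
```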
1.3. atria-rag-server default configuration
This section includes the parameters configured by default in atria-rag-server:
LLMs
Predefined parameter that defines the Large Language Models (LLMs) that atria-rag-server calls through atria-model-gateway.
Currently, only one LLM is defined, with the configuration needed to connect atria-rag-server to atria-model-gateway. It cannot be modified.
LLMs fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| name | | (Optional) LLM name. If not set, the ID is used | string |
| model_type | | (Mandatory) Model type | string |
| endpoint | | (Mandatory) Endpoint of the model | string |
LLMs by default
atria-model-gateway:
```yaml
atria_model_gateway:
  name: Local Model Gateway
  model_type: llm_manager
  endpoint: http://atria-model-gw:6391/aura-services/v1/atria-model-gw
```
Embeddings
Parameters that define the embeddings, that is, the vector representations used to find the text blocks that contain the information needed to answer the input request.
Two types of embeddings are available:
- Local embeddings: generated locally by atria-rag-server.
- OpenAI embeddings: generated by OpenAI models through the Azure OpenAI API.
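Whichever model produces them, embeddings are compared by vector similarity to find the text blocks closest to the request. The following stdlib-only sketch ranks precomputed embeddings by cosine similarity. The vector store and similarity metric that atria-rag-server actually uses are not described in this document, so cosine similarity is an assumption here. Vectors from different models (384, 768 or 1536 dimensions) cannot be compared, and the length check guards against mixing them.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], blocks: List[Tuple[str, Sequence[float]]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank (text, embedding) pairs by cosine similarity to the query embedding."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in blocks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```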
Embeddings fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| name | | (Mandatory) Embedding name | string |
| type | | (Mandatory) Type of the embedding model | sentence_transformer, azure_openai |
| model | | (Mandatory) Model used | string |
| openai_api_version | | (Mandatory to call Azure OpenAI) OpenAI API version | string |
| openai_api_type | | (Mandatory to call Azure OpenAI) OpenAI API type | string |
| openai_api_key | | (Mandatory to call Azure OpenAI) OpenAI API key | string |
| azure_endpoint | | (Mandatory to call Azure OpenAI) Azure endpoint | string |
Embeddings by default
The predefined embeddings in atria-rag-server are shown below:
Local Sentence Transformer from HuggingFace:
This is an open-source model available in the sentence-transformers library.
It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as:
- Clustering
- Multilingual similarity searches
- Retrieval-based tasks
- Classification
The main characteristics of this embedding are summarized below:
- Cost: Free to use once downloaded (local execution). No API call costs.
- Latency: Low, since it runs locally without external API calls.
- Performance: Satisfactory for general-purpose sentence embeddings, supporting multiple languages.
- Vector Length: 384 dimensions (smaller than OpenAI’s ADA model).
- Hardware Requirements: Needs a GPU for faster inference; otherwise, it can be slow on a CPU.
- Model Size: Requires local storage (~470MB, about 118M parameters).
- Quality: Slightly lower accuracy than larger models, especially for complex NLP tasks.
This embedding can be configured in YAML as follows:
```yaml
local_st:
  name: Local Sentence Transformer from HuggingFace
  type: sentence_transformer
  model: paraphrase-multilingual-MiniLM-L12-v2
```
Distilbert-based Local Sentence Transformer from HuggingFace
This is an open-source model available in the sentence-transformers library.
It has been trained on 215M (question, answer) pairs from diverse sources.
It maps sentences and paragraphs to a 768-dimensional dense vector space and was designed for tasks such as:
- Semantic search
- Question answering
- Passage retrieval
The main characteristics of this embedding are summarized below:
- Cost: Free (local execution). No API call costs.
- Latency: Fast, optimized for question-answer retrieval tasks.
- Performance: Outperforms MiniLM in retrieval-based tasks because it was trained on QA data.
- Vector Length: 768 dimensions (higher than MiniLM, better at capturing semantics).
- Hardware Requirements: Similar to MiniLM, requires a GPU for optimal performance.
- Model Size: Larger than MiniLM (~250MB).
- Quality: Trained primarily on English data, so it is weaker in multilingual applications.
This embedding can be configured in YAML as follows:
```yaml
test_distilbert:
  name: Distilbert-based Local Sentence Transformer from HF
  type: sentence_transformer
  model: multi-qa-distilbert-cos-v1
```
OpenAI Embeddings ADA
This is an OpenAI model for generating embeddings, widely used for tasks such as:
- Recommendation systems
- Chatbots
- Semantic search
- Large-scale applications
The main characteristics of this embedding are summarized below:
- Cost: Paid API model, billed by token usage ($0.0001 per 1K tokens). It can be expensive for high-volume applications.
- Latency: API calls add network delay, especially in large-scale, real-time applications.
- Performance: High-quality embeddings with good accuracy across a wide range of NLP tasks.
- Hardware Requirements: No local hardware requirements, it works via API.
- Vector Length: 1536 dimensions (rich semantic representation).
- Quality: Strong performance across multiple languages.
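Using the price quoted above, the cost of embedding a corpus is the number of tokens divided by 1,000, multiplied by the price per 1K tokens. The following small worked example uses a hypothetical corpus size. Check the price against current Azure pricing before relying on it.

```python
def embedding_cost_usd(tokens: int, usd_per_1k_tokens: float = 0.0001) -> float:
    """Estimated embedding cost: tokens / 1000 * price per 1K tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1000 * usd_per_1k_tokens


# Hypothetical corpus: 200,000 chunks of about 500 tokens each.
corpus_tokens = 200_000 * 500
print(f"{corpus_tokens:,} tokens -> ${embedding_cost_usd(corpus_tokens):.2f}")
```

For this corpus of 100 million tokens, the estimate is about 10 USD. Queries are embedded too, so they add to the cost at run time.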
This embedding can be configured in YAML as follows:
```yaml
text-embedding-ada-002:
  name: text-embedding-ada-002 model from Azure OpenAI API
  type: azure_openai
  model: deployment_text-embedding-ada-002
  openai_api_version: <AUTOCOMPLETED>
  openai_api_type: azure
  openai_api_key: <AUTOCOMPLETED>
  azure_endpoint: <AUTOCOMPLETED>
```
Redis Subscriber
This section includes the Redis event subscriber connection configuration for the atria-rag-server.
Redis subscriber fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| connection_mode | (Mandatory) Connection mode | single, sentinel, cluster |
| pool_size | (Mandatory) Size of the connection pool | number |
| database | (Mandatory) Database number | number |
| password | (Mandatory) Password | string |
| uri | (Mandatory) Redis URI | string |
| prefix | (Mandatory) Prefix | string |
| sleep_time | (Optional) Sleep time | number |
| max_retries | (Optional) Maximum number of retries | number |
| channels | List of channels to subscribe to | list[string] |
Redis subscriber by default
The default configuration for the Redis subscriber is described as follows:
```yaml
redis_subscriber:
  connection_mode: <AUTOCOMPLETED>
  pool_size: 100
  database: <AUTOCOMPLETED>
  password: <AUTOCOMPLETED>
  uri: <AUTOCOMPLETED>
  prefix: <AUTOCOMPLETED>
  channels:
    - "PresetConfiguration"
```
Prompts
A prompt is an input instruction given to an AI model to generate a response. It guides the model toward the required kind of output.
ATRIA defines a default prompt for each RAG stage. The default prompt is used when the preset does not define a specific prompt for that stage.
Prompts structure for RAG
The hierarchy of default prompts in RAG stages is shown below:
```
prompts
|___ <stage>
    |___ default
    |   |___ text
    |   |___ args
    |___ <language>
        |___ text
        |___ args
```
- The first level of the prompts configuration contains the stages of the RAG process. Each stage has its own configuration and purpose.
- Prompts are configured per language, so different languages can have different prompts. Each entry is identified by:
  - <language>: prompt configuration for a specific language, identified by its ISO 639-1 code.
  - default: default prompt configuration (written in a specific language).
- For each language, the prompt structure must include the text field and can include the args field:
  - text: the text of the prompt sent to the language model. It includes placeholders (e.g., {query}, {target_language}) that are mandatory for the prompt to work. These placeholders are dynamically replaced with specific values when the prompt is executed.
  - args: optional dictionary of arguments used to replace the placeholders in the text field.
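The following is a minimal sketch of the lookup described above: walk to a stage, or to a named prompt inside a stage, take the entry for the requested language, and fall back to default. The text placeholders are then filled. Python's str.format is assumed as the substitution mechanism, which is consistent with the {{ and }} escapes used in the default prompts. The "#.auto." resolution rule for args is a hypothetical reading of values such as "#.auto.language.user_query".

```python
from typing import Any, Dict, Sequence

AUTO_PREFIX = "#.auto."


def resolve_prompt(prompts: Dict[str, Any], path: Sequence[str], language: str) -> Dict[str, Any]:
    """Walk the stage/prompt keys, then pick the language entry or fall back to "default"."""
    node = prompts
    for key in path:
        node = node[key]
    return node.get(language) or node["default"]


def render_prompt(entry: Dict[str, Any], values: Dict[str, str], auto: Dict[str, str]) -> str:
    """Fill the {placeholders} of entry["text"].

    args values starting with "#.auto." are looked up in `auto`
    (hypothetical resolution rule); other args are used as-is.
    """
    args = {}
    for key, value in entry.get("args", {}).items():
        if isinstance(value, str) and value.startswith(AUTO_PREFIX):
            args[key] = auto[value[len(AUTO_PREFIX):]]
        else:
            args[key] = value
    # str.format turns "{{" and "}}" into literal braces, as in the JSON
    # examples embedded in the default prompts.
    return entry["text"].format(**{**values, **args})
```

For example, resolve_prompt(prompts, ("cleanStg",), "fr") returns the default entry, because no fr entry is defined.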
Default prompts in RAG stages
The following stages are currently defined in RAG:
cleanStg
This stage is responsible for cleaning the user query. It ensures that the query is in a proper format before further processing.
See the cleanStg entry in the RAG default prompt below.
translationStg
This stage handles the translation of the user query into the target language, if necessary.
See the translationStg entry in the RAG default prompt below.
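The translationStg prompt asks the model to return a single JSON object. The following sketch shows how a caller could parse that object. Stripping Markdown code fences is a defensive assumption, because models sometimes wrap JSON in them. The default prompt spells one key "source_languge", so the sketch accepts both spellings. parse_translation is hypothetical.

```python
import json
import re
from typing import Optional

# Leading ```json (or ```) fence and trailing ``` fence, if present.
_FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


def parse_translation(raw: str) -> Optional[dict]:
    """Parse the translationStg answer; return None if invalid or not possible."""
    text = _FENCE.sub("", raw.strip())
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict) or result.get("possible") is not True:
        return None
    source = result.get("source_language", result.get("source_languge"))
    return {
        "source_language": source,
        "target_language": result.get("target_language"),
        "translation": result.get("translation"),
    }
```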
contextStg
This stage determines the context of the user query, ensuring it is aligned with the previous conversation or context.
Default prompts in this stage:
- sameContext: configuration to check whether the query belongs to the same context as the conversation.
- recreatedQuestion: configuration to rewrite the original question. It is composed of the following prompts:
  - default: configuration for rewriting the original question.
  - system: system prompt configuration.
  - human: human prompt configuration.
  - order: array of prompt names, sorted in the order in which they are used.
See the contextStg entry in the RAG default prompt below.
postFilteringStg
This stage filters the retrieved documents or data to ensure relevance to the user query.
Default prompts in this stage:
- relevantDocument: configuration to check whether a document is relevant to the query.
- relevantSql: configuration to check whether the SQL data can answer the query.
See the postFilteringStg entry in the RAG default prompt below.
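The sameContext and relevantDocument prompts ask the model to reply with a bracketed label, in English for default and in Spanish for es. The following sketch maps those labels to booleans. If no label or conflicting labels are present, it returns None. The case-insensitive substring match is a design assumption. parse_label is hypothetical.

```python
from typing import Optional

# Labels requested by the default sameContext and relevantDocument prompts.
LABELS = {
    "[SAME CONTEXT]": True,
    "[MISMO CONTEXTO]": True,
    "[DIFFERENT CONTEXT]": False,
    "[DIFERENTE CONTEXTO]": False,
    "[RELEVANT]": True,
    "[RELEVANTE]": True,
    "[IGNORABLE]": False,
}


def parse_label(output: str) -> Optional[bool]:
    """Map a classifier answer to True/False, or None if no single verdict is found."""
    text = output.upper()
    found = {value for label, value in LABELS.items() if label in text}
    return found.pop() if len(found) == 1 else None
```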
generativeStg
This stage generates the final response using the retrieved and filtered data.
Default prompts in this stage:
- stuff: configuration for the “stuff” strategy. It is composed of the following prompts:
  - default: configuration for the “stuff” strategy.
  - system: system prompt configuration.
  - human: human prompt configuration.
- notAnswerResponse: configuration for responses when the question cannot be answered.
- informationExtraction: configuration for extracting information. It is composed of the following prompts:
  - human1: human prompt configuration.
  - ia: AI (assistant) prompt configuration.
  - human: human prompt configuration.
- responseConsolidation: configuration for consolidating the response.
- sqlPrompt: configuration for generating SQL query statements.
See the generativeStg entry in the RAG default prompt below.
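Several stage prompts, such as recreatedQuestion, stuff, informationExtraction and responseConsolidation, split their text into named sub-prompts and list them in order. The following sketch turns such an entry into a list of chat messages with a language fallback. The mapping from prompt names to chat roles (human to user, ia to assistant) is an assumption, and so is build_messages itself. args handling is omitted for brevity.

```python
from typing import Dict, List

# Hypothetical mapping from prompt names to chat roles.
ROLE_BY_PROMPT = {"system": "system", "human": "user", "human1": "user", "ia": "assistant"}


def build_messages(stage_prompt: Dict, language: str, values: Dict[str, str]) -> List[Dict[str, str]]:
    """Assemble chat messages following the prompt's `order` list."""
    messages = []
    for name in stage_prompt["order"]:
        variants = stage_prompt[name]
        entry = variants.get(language) or variants["default"]
        # str.format ignores unused keyword arguments, so one values dict serves all parts
        messages.append({"role": ROLE_BY_PROMPT[name], "content": entry["text"].format(**values)})
    return messages
```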
RAG default prompt
This section includes the default prompts defined for the ATRIA RAG capability.
The YAML file is also available in the GitHub repository.
In case of any discrepancy between the content of this document and that on GitHub, the GitHub version shall always be considered the most up-to-date.
prompts:
cleanStg:
es:
text: |
A continuación hay una consulta del usuario.
Por favor, limpie la consulta y responda solo con la pregunta del usuario o alguna charla informal.
-------
{query}
default:
text:
A user query follows.
Please clean the query and respond with just the user question or small talk. The query must be written in English.
-------
{query}
translationStg:
default:
text: |
Translate the following question to {target_language}: {question}
Instructions:
1. Maintain the formal tone of the original text.
2. Do not translate proper names and specific terms (e.g., company names, product names, countries).
3. Provide the translation in the same format and structure as the original text.
Translated Text:
Finally, return the result as a unique JSON object, with the following structure:
```
{{
"source_languge": The original question language,
"target_language": The target language,
"translation": The translation of the question to the target_language. ,
"possible": true|false,
"reason": The reason why it is possible or not possible to translate the question.
}}
```
contextStg:
sameContext:
default:
text: |
Below is a conversation followed by a question. You must determine if the question corresponds to the same context as the conversation or if it is from a different context.
Respond only with: [SAME CONTEXT] o [DIFFERENT CONTEXT]
Conversation:
{memory}
Question:
{query}
es:
text: |
A continuación hay una conversación y seguidamente una pregunta. Debes responder si la pregunta corresponde al mismo contexto de la conversación o es una pregunta de un contexto diferente.
Responde únicamente con: [MISMO CONTEXTO] o [DIFERENTE CONTEXTO]
Conversación:
{memory}
Pregunta:
{query}
recreatedQuestion:
default:
default:
text: |
Answer with just a new question or the original question.
Rewrite the original question only if it follows the conversation. Always rewritten question in the same language as the user's question.
Conversation:
{memory}
Original question:
{query}
Rewritten question:
es:
text: |
Responde sólamente con una nueva pregunta.
Reescribe la pregunta original si es una continuación de la conversación. Utiliza el idioma de la peticion del usuario para rescribir la pregunta.
Conversación:
{memory}
Pregunta original:
{query}
Pregunta reescrita:
system:
default:
text: |
The user text contains a query, plus the previous conversation turn.
- If the previous conversation is relevant for the current query, incorporate it into the query and produce a rewritten query
- else just repeat the current query.
Always rewrite the question in the same language as the user's question.
es:
text: |
El texto del usuario contiene una consulta, además del turno anterior de la conversación.
- Si la conversación anterior es relevante para la consulta actual, incorpórala en la consulta y produce una consulta reescrita.
- Si no es relevante, simplemente repite la consulta actual.
Reescribe siempre la consulta en el mismo idioma en que está formulada la consulta del usuario.
human:
default:
text: |
Previous conversation:
{memory}
Current query:
{query}
Rewritten query:
es:
text: |
Conversación anterior:
{memory}
Consulta actual:
{query}
Consulta reescrita:
order: ["system", "human"]
postFilteringStg:
relevantDocument:
default:
text: |
Below is an excerpt of text followed by a question. You must determine if the excerpt is relevant or irrelevant for answering the question.
Respond only with: [RELEVANT] o [IGNORABLE]
Excerpt:
{extract}
Question:
{query}
es:
text: |
A continuación hay un extracto de texto y seguidamente una pregunta. Debes responder si el extracto es relevante o ignorable para responder la pregunta.
Responde únicamente con: [RELEVANTE] o [IGNORABLE]
Extracto:
{extract}
Pregunta:
{query}
relevantSql:
default:
text: |
Given the following question:
`{question}`
Is it possible to answer, using the data contain in the following table?:
```sql
{sql_table_definition}
```
**Explain briefly, all your decisions**.
First, identify which tables are necessary to answer the question. Justify why you selected each of these tables.
Use the following format:
```
I need the following tables to answer the question:
- <table_name>: <reasoning>
- <table_name>: <reasoning>
...
```
Then, identify which columns are necessary to answer the question. Justify why you selected each of these columns.
Write the list of columns you identified, and the reasoning after each column, using the following format:
```
I need the following columns to answer the question:
- <table name>:
- <column_name>: <reasoning>
- <column_name>: <reasoning>
...
- <table_name>:
- <column_name>: <reasoning>
- <column_name>: <reasoning>
...
...
```
Then, tell if the tables and columns you identified are enough to answer the question.
Write the answer using the following format:
```
Possible to answer the question using the former columns:
- <reasoning>
- Result: <Yes|No>
```
Then, explain, step by step, how you would write the SQL query to answer the question, using the columns you identified.
**Use the full qualified names of the columns**. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.
Finally, tell if the question can be answered using this format:
```
{{
"possible": true|false,
"reason": The reason why it is possible or not possible to answer the question.
}}
```
generativeStg:
stuff:
default:
default:
text: |
Use the following context extractions to answer the question at the end.
Contexto:
{context}
If the extracted context do not contain the answer avoid coming up with an answer, and response you do not have information for answering and kindly invite the user to make a new question.
Question:
{question}
Never include information by your own using your own knowledge.
{extra_prompt}
es:
text: |
Utilice el siguiente contexto que ha sido extraido para responder la pregunta del final.
Contexto:
{context}
Usando esta información, responde a la pregunta del usuario.
Si la información no contiene la respuesta evita firmemente responder, di que desconoces la respuesta e invita educadamente al usuario a que formule una nueva pregunta.
Pregunta:
{question}
Nunca incluyas información utilizando tus propios conocimientos.
{extra_prompt}
system:
default:
text: |
Respond in language {user_query_language}.
Question:
{question}
args:
user_query_language: "#.auto.language.user_query"
es:
text: |
Responde en el idioma {user_query_language}.
Pregunta:
{question}
args:
user_query_language: "#.auto.language.user_query"
human:
default:
text: |
You are going to generate an answer for a user question or query.
To generate the answer, take always into account all the information available in the context provided.
Context:
{context}
Question:
{question}
Never include information by your own using your own knowledge.
{extra_prompt}
es:
text: |
Vas a generar una respuesta para una pregunta o consulta del usuario.
Para generar la respuesta, ten siempre en cuenta toda la información disponible en el contexto proporcionado.
Pregunta:
{question}
Contexto:
{context}
Nunca incluyas información utilizando tus propios conocimientos.
{extra_prompt}
order: ["system", "human"]
notAnswerResponse:
default:
text: |
You are a question answering agent. You have tried to answer this question: {query}
However you do not have information to answer this.
Please, tell the user that you are not able to answer, apologize and invite the user to make other question.
Avoid any harmful answer, such as sexual, rude, sexist or racist.
Respond in language {user_query_language}.
User question:
{query}
args:
user_query_language: "#.auto.language.user_query"
es:
text: |
Eres un agente de respuesta a preguntas. Has intentado responder a esta pregunta: {query}
Sin embargo, no tienes información para responder a esto.
Por favor, dile al usuario que no puedes responder, discúlpate e invita al usuario a hacer otra pregunta.
Evita cualquier respuesta dañina, como sexual, grosera, sexista o racista.
Responde en el idioma {user_query_language}.
Pregunta del usuario:
{query}
args:
user_query_language: "#.auto.language.user_query"
informationExtraction:
default:
default:
text: |
The original question is this: {question}
We have provided a previous answer: {existing_answer}
Only if necessary, refine the answer exclusively with the context below.
------------
{context_str}
------------
Given the new context, refine the original answer to improve the quality of the response.
If the context is useless, respond with the exact words of the original answer.
{extra_prompt}
es:
text: |
La pregunta original es esta: {question}
Hemos proporcionado una respuesta previa: {existing_answer}
Sólo si es necesario refina la respuesta exclusivamente con el contexto a continuación.
------------
{context_str}
------------
Dado el nuevo contexto, refina la respuesta original para mejorar la calidad de la respuesta.
Si el contexto es inútil responde con las mismas palabras de la respuesta original.
{extra_prompt}
human1:
default:
text: "{question}"
es:
text: "{question}"
ia:
default:
text: "{existing_answer}"
es:
text: "{existing_answer}"
human:
default:
text: |
Refine the existing answer only if necessary, exclusively with the context below.
------------
{context_str}
------------
Given the new context, refine the original answer to improve the quality of the response.
If the context is useless, respond with the exact words of the original answer.
{extra_prompt}
es:
text: |
Refina la respuesta existente, sólo si es necesario, exclusivamente con el contexto a continuación.
------------
{context_str}
------------
Dado el nuevo contexto, refina la respuesta original para mejorar la calidad de la respuesta.
Si el contexto es inútil responde con las mismas palabras de la respuesta original.
{extra_prompt}
order: ["human1", "ia", "human"]
responseConsolidation:
default:
default:
text: |
Below I provide you a context.
---------------------
{context_str}
---------------------
Given exclusively the context, and without using any prior knowledge, respond with a single sentence to the question:
{question}
{extra_prompt}
es:
text: |
A continuación te doy un contexto.
---------------------
{context_str}
---------------------
Dado exclusivamente el contexto, y sin usar ningún conocimiento previo responde con una única frase a la pregunta:
{question}
{extra_prompt}
system:
default:
text: |
Below I provide you a context.
---------------------
{context_str}
---------------------
Given exclusively the context, and without using any prior knowledge, respond with a single sentence to the question:
{question}
{extra_prompt}
es:
text: |
A continuación te doy un contexto.
---------------------
{ context_str }
---------------------
Dado exclusivamente el contexto y sin usar ningún conocimiento previo responde con una única frase a cualquier pregunta.
{ extra_prompt }
human:
default:
text: "{question}"
es:
text: "{question}"
order: ["system", "human"]
sqlPrompt:
default:
text: |
Generate a SQL query statement to answer the following question:
`{question}`
Use the data contained in the following table, as defined in SQL:
```sql
{sql_table_definition}
```
The following tables, containing auxiliary information, are also available:
```sql
CREATE TABLE D_CBD_Static_Geo_Area_v6 (GEO_AREA_ID VARCHAR, CBD_GEO_AREA_LEVEL1_ID VARCHAR, CBD_GEO_AREA_LEVEL2_ID VARCHAR, CBD_GEO_AREA_LEVEL3_ID VARCHAR, CBD_GEO_AREA_LEVEL4_ID VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_CBD_Static_Geo_Area IS 'Geographical areas. This table contains foreign keys to the different levels of geographical areas. In particular, it contains the foreign keys to these tables: CBD_Static_Geo_Area_Level1, CBD_Static_Geo_Area_Level2, CBD_Static_Geo_Area_Level3, CBD_Static_Geo_Area_Level4. Therefore, this table is used, via JOIN, to query the geographical information contained in the different levels of geographical areas. For instance, if you have a table T with a field GEO_AREA_ID and you need to check whether this location corresponds to the region of Asturias, you will need to look for GEO_AREA_ID in this table, then extract the CBD_GEO_AREA_LEVEL4_ID and query the table CBD_Static_Geo_Area_Level4 to get the name of the region.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code. This field does not contain location names.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL1_ID IS 'Identifier of the geographical area Level 1 (max level of detail: CP or similar). FORMAT: string containing a numerical code. This field does not contain location names.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL2_ID IS 'Identifier of the geographical area Level 2 (City/Town). FORMAT: string containing a numerical code. This field does not contain location names.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province). FORMAT: string containing a numerical code. This field does not contain location names.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code. This field does not contain location names.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
COMMENT ON COLUMN D_CBD_Static_Geo_Area.EXTRACTION_TM IS 'Date-time of the record';
CREATE TABLE D_CBD_Static_Geo_Area_Level2_v6 (CBD_GEO_AREA_LEVEL2_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, CBD_GEO_AREA_LEVEL3_ID VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_CBD_Static_Geo_Area_Level2 IS 'Geographical area level 2 (State)';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.CBD_GEO_AREA_LEVEL2_ID IS 'Identifier of the geographical area Level 2 (City/Town). FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 2. FORMAT: alphanumeric string containing the name of the city/town.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province)';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 2';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 2';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.GEO_STD_AREA_CD IS 'Standard code of the geo area';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level2.EXTRACTION_TM IS 'Date-time of the record';
CREATE TABLE D_CBD_Static_Geo_Area_Level3_v6 (CBD_GEO_AREA_LEVEL3_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, CBD_GEO_AREA_LEVEL4_ID VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, ISO_3166_2_CD VARCHAR, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_CBD_Static_Geo_Area_Level3 IS 'Geographical area level 3 (Region)';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.CBD_GEO_AREA_LEVEL3_ID IS 'Identifier of the geographical area Level 3 (Province). FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 3. FORMAT: alphanumeric string containing the name of the province.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 3';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 3';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.ISO_3166_2_CD IS 'ISO 3166-2 associated';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.GEO_STD_AREA_CD IS 'Standard code of the geo area';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level3.EXTRACTION_TM IS 'Date-time of the record';
CREATE TABLE D_CBD_Static_Geo_Area_Level4_v6 (CBD_GEO_AREA_LEVEL4_ID VARCHAR, GEO_AREA_LEVEL_DES VARCHAR, LONGITUDE_LON_CO DOUBLE, LATITUDE_LAT_CO DOUBLE, HASC_1_CD VARCHAR, GEO_AREA_ID VARCHAR, GEO_STD_AREA_CD VARCHAR, OB_ALPHA_ID VARCHAR, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_CBD_Static_Geo_Area_Level4 IS 'Geographical area level 4 (min. Detail)';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.CBD_GEO_AREA_LEVEL4_ID IS 'Identifier of the geographical area Level 4 (State/Region). FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_AREA_LEVEL_DES IS 'Description associated to the identifier level 4. FORMAT: alphanumerical string containing the name of the state/region. EXAMPLE VALUES: ''Asturias'', ''Andalucía'', etc.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.LONGITUDE_LON_CO IS 'Longitude coordinates (in WGS84) associated with level 4';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.LATITUDE_LAT_CO IS 'Latitude coordinates (in WGS84) associated with level 4';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.HASC_1_CD IS 'Hierarchical administrative subdivision codes ';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_AREA_ID IS 'Identifier of the geographical area considered. FORMAT: string containing a numerical code.';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.GEO_STD_AREA_CD IS 'Standard code of the geo area';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.OB_ALPHA_ID IS 'Alphanumeric Organizational Business ID';
COMMENT ON COLUMN D_CBD_Static_Geo_Area_Level4.EXTRACTION_TM IS 'Date-time of the record';
CREATE TABLE D_CBD_Static_Station_Type_v6 (STATION_TYPE_CD VARCHAR, TECH_LEVEL_WEIGHT_QT FLOAT, STATION_TYPE_L2_DES VARCHAR, STATION_TYPE_L1_DES VARCHAR, STATION_TYPE_L2_ORDER_NUM INT, STATION_TYPE_L1_ORDER_NUM INT, STATION_TYPE_ORDER_NUM INT, CONSCIOUS_IND BOOLEAN, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_CBD_Static_Station_Type IS 'Station types';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_CD IS 'Device type';
COMMENT ON COLUMN D_CBD_Static_Station_Type.TECH_LEVEL_WEIGHT_QT IS 'Associated weight for the technologic level of the home';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L2_DES IS 'Station type level 2';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L1_DES IS 'Station type level 1';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L2_ORDER_NUM IS 'Station type order level 2';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_L1_ORDER_NUM IS 'Station type order level 1';
COMMENT ON COLUMN D_CBD_Static_Station_Type.STATION_TYPE_ORDER_NUM IS 'Station type order';
COMMENT ON COLUMN D_CBD_Static_Station_Type.CONSCIOUS_IND IS 'Indicates if the related device type has energy efficiency';
COMMENT ON COLUMN D_CBD_Static_Station_Type.EXTRACTION_TM IS 'Date-time of the record';
CREATE TABLE D_Segment_v8 (OPERATOR_ID VARCHAR, SEGMENT_ID VARCHAR, SEGMENT_DES VARCHAR, GBL_SEGMENT_ID VARCHAR, SEGMENT_GROUP_ID VARCHAR, SEGMENT_GROUP_DES VARCHAR, EXTRACTION_TM VARCHAR);
COMMENT ON TABLE D_Segment IS 'Classifications of the customers, attending to different segmentation criteria, for marketing and management issues, according to OB criteria and its correspondence with the global segment classification';
COMMENT ON COLUMN D_Segment.OPERATOR_ID IS 'Global Operator Identifier (Operator acting as owner of the information present in the current entity)';
COMMENT ON COLUMN D_Segment.SEGMENT_ID IS 'Organisational segment of the client, in the OB. FORMAT: Numerical code.';
COMMENT ON COLUMN D_Segment.SEGMENT_DES IS 'Segment description. This is the actual name of the segment. POSSIBLE VALUES: ''NTT'', ''Residencial'', ''Pymes'', ''Residencial/SC'', ''Autonomos'', ''Operadores'', ''Grandes Clientes'', ''Residencial Prepago'', ''Telefonica'', ''Sin Clasificar'', ''Empresas''';
COMMENT ON COLUMN D_Segment.GBL_SEGMENT_ID IS 'ID of the global segment classification';
COMMENT ON COLUMN D_Segment.SEGMENT_GROUP_ID IS 'ID code of the segmentation group';
COMMENT ON COLUMN D_Segment.SEGMENT_GROUP_DES IS 'Description of the segmentation group. POSSIBLE VALUES: ''0.- OPERADORES'', ''1.- U.N. Empresas'', ''2.-U.N. Gran Público'', ''3.- TELEFONICA'', ''4.- SIN CLASIFICAR''';
COMMENT ON COLUMN D_Segment.EXTRACTION_TM IS 'Date-time of the record';
```
Some of the former tables contain columns in full-qualified format. For instance, these are some examples of full-qualified columns:
```
record_name.field_name
TEC_PLAT_REC.DEVICE_ID
record_name.subrecord_name.field_name
TEC_PLAT_REC.TEC_PLAT_SUBCOMP_REC.DEVICE_ID
...
```
Always use the full-qualified format when referring to columns in the tables. For instance, if you need to use the column 'TEC_PLAT_REC.DEVICE_ID', you should not refer to it as 'DEVICE_ID', but as 'TEC_PLAT_REC.DEVICE_ID'.
**Explain in detail, step by step, all your decisions**.
If you need to filter by a higher-level geographical area, such as a region (Comunidad Autónoma), you will need to:
- join the `GEO_AREA_ID` field of the data table (such as `CBD_HGU_Detail_Daily`) with the `GEO_AREA_ID` field in `D_CBD_Static_Geo_Area` table
- then join the `CBD_GEO_AREA_LEVEL4_ID` field in the `D_CBD_Static_Geo_Area` with the `CBD_GEO_AREA_LEVEL4_ID` field in the `D_CBD_Static_Geo_Area_Level4` table
- then compare the `GEO_AREA_LEVEL_DES` field in the `D_CBD_Static_Geo_Area_Level4` table with the name of the region (e.g., 'Cantabria'), since the DESCRIPTION field does contain the actual name of the geographical area.
**Only perform these joins if explicit filtering or grouping by geographical location is necessary**.
First, identify which tables are necessary to answer the question. Justify why you selected each of these tables.
Use the following format:
```
I need the following tables to answer the question:
- <table_name>: <reasoning>
- <table_name>: <reasoning>
...
```
Then, identify which columns are necessary to answer the question. Justify why you selected each of these columns.
Write the list of columns you identified, and the reasoning after each column, using the following format:
```
I need the following columns to answer the question:
- <table name>:
- <column_name>: <reasoning>
- <column_name>: <reasoning>
...
- <table_name>:
- <column_name>: <reasoning>
- <column_name>: <reasoning>
...
...
```
Then, tell if the tables and columns you identified are enough to answer the question.
Write the answer using the following format:
```
Possible to answer the question using the former columns:
- <reasoning>
- Result: <Yes|No>
```
Then, explain, step by step, how you would write the SQL query to answer the question, using the columns you identified. **Use the full qualified names of the columns**. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.
Finally, write the SQL query to answer the question, using the columns you identified. **DO NOT USE THE `JSON_OBJECT` FUNCTION IN THE QUERY**.
Return the result as a unique JSON object, with the following structure:
{{
"result": <Write the SQL query here. **MAKE SURE THAT THE STATEMENT `SELECT JSON_OBJECT` is not used in the query and use the full qualified names of the columns. Generate a valid SQL sentence in a single line without new line characters.**>,
"status": "OK",
"reason": <a reasoning explaining the query>
}}
If the former table does not contain the necessary data to answer the question, return the following JSON object:
{{
"result": null,
"status": "ERROR",
"reason": <a reasoning explaining the query>
}}
Make sure that the JSON object is correctly formatted, and can be parsed by a JSON parser.
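The prompt entries above pair per-language templates (`default`, `es`) with an `order` list of message roles, and `sqlPrompt` asks the model for a single JSON object with `result`, `status` and `reason`. The following is a minimal sketch of both ends of that exchange, assuming the templates are filled with Python's `str.format` (suggested by the doubled `{{`/`}}` braces in `sqlPrompt`) and that a missing language falls back to `default`. The `render_messages` and `parse_sql_response` helpers and the in-memory dictionary layout are hypothetical illustrations, not ATRIA's actual code.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory view of a prompt entry: role -> language -> {"text": template}.
PromptEntry = Dict[str, Dict[str, Dict[str, str]]]


def render_messages(entry: PromptEntry, order: List[str], lang: str,
                    **values: str) -> List[Tuple[str, str]]:
    """Render the roles listed in `order`, preferring `lang` and falling back to `default`."""
    messages = []
    for role in order:
        variants = entry[role]
        template = variants.get(lang, variants["default"])["text"]
        # str.format fills {name} placeholders and turns {{ and }} into literal braces;
        # a placeholder written as "{ name }" is looked up as the key " name " instead.
        messages.append((role, template.format(**values)))
    return messages


def parse_sql_response(raw: str) -> Dict[str, Any]:
    """Check the JSON object that the sqlPrompt template asks the model to return."""
    obj = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(obj, dict) or not all(k in obj for k in ("result", "status", "reason")):
        raise ValueError("expected an object with result, status and reason")
    if obj["status"] == "OK":
        query = obj["result"]
        if not isinstance(query, str) or not query.strip():
            raise ValueError("status OK requires a SQL string in result")
        if "\n" in query or "JSON_OBJECT" in query.upper():
            raise ValueError("query must be single-line and must not use JSON_OBJECT")
    elif obj["status"] == "ERROR":
        if obj["result"] is not None:
            raise ValueError("status ERROR requires result to be null")
    else:
        raise ValueError(f"unexpected status {obj['status']!r}")
    return obj
```

Under the `str.format` assumption, a placeholder must be written exactly as `{context_str}`; a spaced form such as `{ context_str }` is treated as a different key and raises `KeyError`.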
Injection
Default injection configuration for atria-rag-server. It is used to protect against prompt injection.
Injection fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| heuristics | Heuristic sentences. An object where each key is a language and each value is a list of phrases. By default, the heuristic sentences are defined directly in the configuration; no file path is specified. Note that the phrases added here are also added to those defined in the security stage securityStg of the preset configuration. | object |
| max_length | (Mandatory) Maximum length | number |
Injection by default
The default configuration is described as follows:
injection:
  heuristics:
    es:
      - responde como
      - responda como
      - respondeme como
      - respondame como
    en:
      - answer like
      - forget everything
      - forget your
  max_length: 200
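The injection fields can be read as a simple pre-filter on the user's question. A minimal sketch, assuming that `max_length` is a character limit on the question, that the phrases of every language are checked, and that matching is a case-insensitive substring test; `looks_like_injection` is a hypothetical helper, and the merging with the `securityStg` phrases is not modelled.

```python
from typing import Any, Dict

# Copy of the default `injection` block shown above.
INJECTION_CONFIG: Dict[str, Any] = {
    "heuristics": {
        "es": ["responde como", "responda como", "respondeme como", "respondame como"],
        "en": ["answer like", "forget everything", "forget your"],
    },
    "max_length": 200,
}


def looks_like_injection(question: str, config: Dict[str, Any]) -> bool:
    """Return True if the question exceeds max_length or contains a heuristic phrase."""
    # Assumption: max_length counts the characters of the raw question.
    if len(question) > config["max_length"]:
        return True
    text = question.casefold()
    # Assumption: phrases of all languages are checked as case-insensitive substrings.
    return any(
        phrase.casefold() in text
        for phrases in config["heuristics"].values()
        for phrase in phrases
    )
```

Plain substring matching misses accented variants such as "respóndeme como"; normalising accents before comparing would be one way to widen coverage.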
Service
Default service configuration for atria-rag-server.
Service fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| host | (Mandatory) Host name or address | string |
| port | (Mandatory) Port number | number |
Service by default
The default configuration is described as follows:
service:
  host: 0.0.0.0
  port: <AUTOCOMPLETED>
  log_level: <AUTOCOMPLETED>
Local Storage
Default fields related to the configuration of the local storage for documents.
Local Storage fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| atria_resources_data_folder | (Mandatory) Path of the data resources folder | string |
| atria_shared_data_folder | (Mandatory) Path of the shared data folder | string |
Local Storage by default
The default configuration is described as follows:
local_storage_manager:
  atria_resources_data_folder: "/opt/atria-rag/data"
  atria_shared_data_folder: "/var/atria-rag-data"
Config API
Field with parameters for atria-rag-server API configuration
Config API fields
| Parameter | Definition | Type/Default values |
|---|---|---|
| base_url | (Mandatory) Config API URL | string |
| api_key | (Mandatory) Config API key | string |
Config API by default
The default configuration is described as follows:
aura_config_api:
  base_url: <AUTOCOMPLETED>
  api_key: <AUTOCOMPLETED>
Retrievers
Retrievers are responsible for storing the information generated from the documents. Each retriever is associated with a database, which it feeds with information and from which it retrieves information.
Currently, there are three different retrievers defined in ATRIA:
- qdrant
- tfidf
- elasticsearch
Retriever fields
Each retriever type defines its own specific fields, as shown below:
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| qdrant | host | (Mandatory) Qdrant service host | string |
| | port | (Mandatory) Qdrant service port | number |
| | prefix | (Mandatory) Collection prefix | string |
| tfidf | dump_name | (Mandatory) Dump name (path) of the TF-IDF service | string |
| elasticsearch | host | (Mandatory) Elasticsearch service host | string |
| | ca_crt | (Mandatory) Path to the Elasticsearch certificate | string |
| | username | (Mandatory) Elasticsearch service username | string |
| | password | (Mandatory) Elasticsearch service password | string |
| | index_name | (Mandatory) Elasticsearch index name | string |
Retrievers by default
The default configuration is described as follows:
retrievers:
  qdrant:
    host: <AUTOCOMPLETED>
    port: 6333
    prefix: <AUTOCOMPLETED>
  tfidf:
    dump_name: /var/atria-rag-data/tfidf/dump/
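Retrievers feed information into a store and later retrieve the most relevant entries. To illustrate what the tfidf retriever type refers to, here is a toy in-memory TF-IDF index. It is a generic sketch of the technique (smoothed IDF weights and cosine similarity), not ATRIA's retriever; the `TinyTfidfIndex` class is hypothetical and keeps everything in memory instead of using the `dump_name` location.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _tokens(text: str) -> List[str]:
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyTfidfIndex:
    """Toy TF-IDF index: feed documents, retrieve the best matches by cosine similarity."""

    def __init__(self) -> None:
        self.docs: List[Counter] = []

    def feed(self, text: str) -> None:
        self.docs.append(Counter(_tokens(text)))

    def _idf(self, term: str) -> float:
        df = sum(1 for d in self.docs if term in d)
        # Smoothed IDF, the same formula scikit-learn uses with smooth_idf=True.
        return math.log((1 + len(self.docs)) / (1 + df)) + 1

    def _vector(self, counts: Counter) -> Dict[str, float]:
        return {t: c * self._idf(t) for t, c in counts.items()}

    def retrieve(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        """Return (document index, cosine score) pairs, best first."""
        q = self._vector(Counter(_tokens(query)))
        q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0
        scored = []
        for i, counts in enumerate(self.docs):
            d = self._vector(counts)
            d_norm = math.sqrt(sum(v * v for v in d.values())) or 1.0
            dot = sum(w * d.get(t, 0.0) for t, w in q.items())
            scored.append((i, dot / (q_norm * d_norm)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```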
Metadata
Parameter related to the configuration of metadata in atria-rag-server
It is used to set up how metadata is used when providing responses. The retrieval operation produces a list of candidates, each of which may provide a dictionary of metadata. The metadata is used to filter the candidates and to provide additional information in the response.
Metadata fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| map | filetype | (Optional) Type of file, typically used to specify the format | string |
| | page_number | (Optional) Page number. It can be used to identify particular pages | string |
| group-by | | (Optional) Name of the field used to group the candidates | string |
| aggregate | | (Optional) Determines how the values of duplicated fields are consolidated during grouping | string |
| output_filter | | (Optional) List of fields to be displayed in the metadata | list of string |
| root | | (Optional) Primary fields that structure the final output of the metadata processing | list of string |
Metadata by default
The default configuration for metadata is described as follows:
metadata:
  map:
    filetype: content-type
    page_number: page-number
  group-by: url
  aggregate: page-number
  output_filter:
    - title
    - url
    - content-type
    - page-number
    - _zxcv
  root:
    - title
    - url
    - content-type
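The field descriptions leave some room for interpretation. One possible reading of the default values is sketched below: each candidate is reduced to the `output_filter` keys, candidates are grouped by the `group-by` field (`url`), the `aggregate` field (`page-number`) is collected into a list per group, and each group keeps only the `root` fields plus that list. The `consolidate_metadata` function and this behaviour are assumptions for illustration; the `map` renaming step is not modelled.

```python
from typing import Any, Dict, List

# Copy of the default `metadata` values shown above (without `map`).
METADATA_CONFIG: Dict[str, Any] = {
    "group-by": "url",
    "aggregate": "page-number",
    "output_filter": ["title", "url", "content-type", "page-number", "_zxcv"],
    "root": ["title", "url", "content-type"],
}


def consolidate_metadata(candidates: List[Dict[str, Any]],
                         cfg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Filter, group and aggregate candidate metadata (one possible reading)."""
    key, agg = cfg["group-by"], cfg["aggregate"]
    allowed = set(cfg["output_filter"])
    groups: Dict[Any, Dict[str, Any]] = {}  # dicts keep first-seen group order
    for meta in candidates:
        kept = {k: v for k, v in meta.items() if k in allowed}
        group = groups.get(kept.get(key))
        if group is None:
            # First candidate of this group: copy the root fields, start the aggregate list.
            group = {k: kept[k] for k in cfg["root"] if k in kept}
            group[agg] = []
            groups[kept.get(key)] = group
        if agg in kept and kept[agg] not in group[agg]:
            group[agg].append(kept[agg])
    return list(groups.values())
```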
Language identification
Parameter related to the configuration of Language Identification in atria-rag-server
It is used to identify the language of the user’s question. The result is a dictionary containing the detected language in ISO 639-3 format and its corresponding conversion.
In addition to language identification, the user’s question is preprocessed at this stage: special characters that may cause recognition errors, such as line breaks, are removed. If an error occurs, the default language is returned.
Language identification is performed with the fastText library.
Language identification fields
| Parameter | Subparameters | Definition | Type/Default values |
|---|---|---|---|
| language_default | | (Optional) Default language in ISO 639-1 format (two letters). For example: es | string |
| score_threshold | | (Optional) Score threshold that determines whether the response is given in the identified language or in the default language. For example: 0.85 | float |
| model_path | | (Mandatory) Path of the fastText model. For example: /opt/atria-fasttext/fasttext_model.bin | string |
| chars_to_clean | | (Optional) Characters to be removed from the question. By default: ['\n'] | list of string |
Language Identification by default
The default configuration for language identification is described as follows:
language_identification:
  score_threshold: <AUTOCOMPLETED>
  language_default: <AUTOCOMPLETED>
  model_path: "/opt/atria-fasttext/fasttext_model.bin"
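The stage described above (clean the question, detect its language, fall back to `language_default` when the score is below `score_threshold` or when detection fails) can be sketched as follows. The predictor is passed in as a function so the logic can be exercised without a model; with the `fasttext` package it would wrap `model.predict(text, k=1)`, which returns a tuple of labels (such as `__label__es`) and probabilities and rejects input containing a line break. The `identify_language` name, the `>=` comparison and replacing cleaned characters with a space are assumptions.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Predictor contract modelled on fastText: text -> (labels, probabilities).
Predictor = Callable[[str], Tuple[Sequence[str], Sequence[float]]]


def identify_language(question: str, predict: Predictor, language_default: str,
                      score_threshold: float,
                      chars_to_clean: Iterable[str] = ("\n",)) -> str:
    """Return the detected language code, or language_default if unsure or on error."""
    text = question
    for ch in chars_to_clean:
        # Assumption: cleaned characters become spaces so adjacent words do not merge.
        text = text.replace(ch, " ")
    try:
        labels, scores = predict(text.strip())
        label, score = labels[0], float(scores[0])
    except Exception:
        # Any detection failure falls back to the default language.
        return language_default
    code = label[len("__label__"):] if label.startswith("__label__") else label
    return code if score >= score_threshold else language_default


# Hypothetical wiring with the fasttext package (not executed here):
# import fasttext
# model = fasttext.load_model("/opt/atria-fasttext/fasttext_model.bin")
# identify_language(question, lambda t: model.predict(t, k=1), "es", 0.85)
```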