<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Aura – rag-documents</title>
    <link>/tags/rag-documents/</link>
    <description>Recent content in rag-documents on Aura</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    
	  <atom:link href="/tags/rag-documents/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Docs: Check Hugging Face embedding models downloading</title>
      <link>/docs/deployment/troubleshooting/generate-db-hf-embeddings-models/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/deployment/troubleshooting/generate-db-hf-embeddings-models/</guid>
      <description>
        
        
        &lt;h1 id=&#34;check-hugging-face-embedding-models-downloading&#34;&gt;Check Hugging Face embedding models downloading&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Guidelines to check if the Hugging Face models used in &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt; are downloaded during the generate-db process&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The free embedding models currently used in &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt; are &lt;strong&gt;paraphrase-multilingual-MiniLM-L12-v2&lt;/strong&gt;
and &lt;strong&gt;multi-qa-distilbert-cos-v1&lt;/strong&gt;, both from Hugging Face. These models are the ones behind the following
&lt;a href=&#34;../../docs/atria/technical-guidelines/configuration/atria-default-configuration/#embeddings-by-default&#34;&gt;embeddings available by default in &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;: Local Sentence Transformer and Distilbert-based Local Sentence Transformer.&lt;/p&gt;
&lt;p&gt;During the &lt;a href=&#34;../../docs/atria/technical-components/atria-rag-generate-db/&#34;&gt;generate-db process&lt;/a&gt;,
these models are loaded into memory, and the process may fail if there is a connection problem with Hugging Face. In that case, the only solution is to wait until the service is up and running again.&lt;/p&gt;
&lt;p&gt;This document explains how to check whether the embedding models can be downloaded, so that you can tell whether a connection problem is the cause of the failure.&lt;/p&gt;
&lt;h2 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Install huggingface-cli&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pkgx install huggingface-cli
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;check-if-the-hugging-face-models-are-downloaded-properly&#34;&gt;Check if the Hugging Face models are downloaded properly&lt;/h2&gt;
&lt;p&gt;To check whether the service is up, run the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;huggingface-cli download sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the download starts, the service is up, and you can restart the generate-db process.&lt;/p&gt;
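&lt;p&gt;The command above checks only one of the two models. As a minimal sketch, the loop below runs the same check for both. It assumes that &lt;strong&gt;multi-qa-distilbert-cos-v1&lt;/strong&gt; is published under the same &lt;code&gt;sentence-transformers&lt;/code&gt; organization as the first model. The names &lt;code&gt;check_models&lt;/code&gt; and &lt;code&gt;HF_DOWNLOAD_CMD&lt;/code&gt; are hypothetical, introduced here for illustration only, and are not part of ATRIA.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: verify that both embedding models can be fetched from Hugging Face.
# Assumption: the second model lives under the same "sentence-transformers" org.
MODELS="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
sentence-transformers/multi-qa-distilbert-cos-v1"

# HF_DOWNLOAD_CMD is a hypothetical override (handy for dry runs);
# by default the real huggingface-cli is used.
check_models() {
  local cmd="${HF_DOWNLOAD_CMD:-huggingface-cli download}"
  local failed=0
  local model
  for model in $MODELS; do
    # $cmd is left unquoted on purpose so "huggingface-cli download" splits into words.
    if $cmd "$model" > /dev/null; then
      echo "OK   $model"
    else
      echo "FAIL $model"
      failed=1
    fi
  done
  return "$failed"
}
```

&lt;p&gt;Calling &lt;code&gt;check_models&lt;/code&gt; prints one line per model and returns a non-zero status if any download fails, which suggests waiting before restarting the generate-db process.&lt;/p&gt;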

      </description>
    </item>
    
    <item>
      <title>Docs: Import documents into ATRIA</title>
      <link>/docs/atria/technical-guidelines/configuration/import-documents/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/atria/technical-guidelines/configuration/import-documents/</guid>
      <description>
        
        
        &lt;h1 id=&#34;import-documents-into-atria&#34;&gt;Import documents into ATRIA&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Guidelines for importing documents and new data into &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt; environment&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As described in &lt;a href=&#34;../../docs/atria/capabilities/llm-experiences-builder/rag/general-rag/#functional-overview&#34;&gt;General RAG: functional overview&lt;/a&gt;, when using &lt;a href=&#34;../../docs/atria/capabilities/llm-experiences-builder/rag/&#34;&gt;RAG capability&lt;/a&gt;, different databases are used for lexical and semantic search.&lt;/p&gt;
&lt;p&gt;The documents that feed these knowledge bases must be uploaded into the environment to be used in the RAG chain and updated when required. In this framework, two processes must be considered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#a-data-curation&#34;&gt;a. Curate data (recommended)&lt;/a&gt;: First, curate the data that you are going to upload, to optimize the recognition process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#b-import-documents&#34;&gt;b. Import documents&lt;/a&gt;: Once the data is curated, the documents must be uploaded into the system. In addition to the general method, a hot swapping process can be executed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;a-data-curation&#34;&gt;a. Data curation&lt;/h2&gt;
&lt;p&gt;Data curation is the process of organizing, managing, cleaning up and maintaining data to ensure it stays relevant and valuable. Good practices in this task lead to efficient recognition by the AI model.&lt;/p&gt;
&lt;p&gt;For this purpose, we recommend following these tips, based on research and internal analysis:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Data selection and cleaning&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Include only data relevant to the purpose of the RAG. Redundant, irrelevant or outdated information should be removed to clean up noise that does not add value.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;2. Clarity and consistency in content&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Be concrete and specific&lt;/strong&gt;&lt;/em&gt;: Keep the information to the point. Avoid unnecessary words or complex explanations.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Avoid ambiguous messages&lt;/strong&gt;&lt;/em&gt;: Avoid vague or unclear terms that could lead to confusion. Make sure the meaning is easy to interpret.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Reinforce the message&lt;/strong&gt;&lt;/em&gt;: Make the message clearer by using specific terms related to the category being discussed. Use keywords strategically to reinforce the message.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Make sure procedures are clear and include all the necessary steps&lt;/strong&gt;&lt;/em&gt;: Make sure each step in tutorials is fully described, logically structured and easy to follow. Avoid fragmented or disjointed instructions.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Remove unnecessary reference information&lt;/strong&gt;&lt;/em&gt;: Minimize excessive details between steps that could distract or confuse the LLM. Keep the flow simple and clear.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;3. Improvements in information&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Add missing content&lt;/strong&gt;&lt;/em&gt;: If the product includes features similar to others but with slight variations, add a sentence explaining what is and is not supported to make the LLM more accurate.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Add similar terminology&lt;/strong&gt;&lt;/em&gt;: Although you cannot control what terminology people use, mentioning common alternative terms in your content can help the LLM provide more informative answers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;4. Structure and formatting&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Maintain consistent formatting&lt;/strong&gt;&lt;/em&gt;: Ensure all steps follow a parallel structure (similar sentence formats and style) to improve coherence.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Simplify complex tables&lt;/strong&gt;&lt;/em&gt;: Avoid blank cells and ensure every cell has a complete value. Replace symbols (e.g., checkmarks) with clear text (&amp;ldquo;Yes&amp;rdquo;, &amp;ldquo;Supported&amp;rdquo;) to improve interpretation. Rewrite footnote text to add context. Move complex information in table cells out of the table.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Avoid nested content&lt;/strong&gt;&lt;/em&gt;: LLMs can have difficulty with multiple levels of nesting (e.g., steps within steps). Keep content linear and simple for better understanding.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Add summaries to tutorials or long procedures&lt;/strong&gt;&lt;/em&gt;: LLMs can get &amp;ldquo;lost&amp;rdquo; with long tutorials or procedures due to context window limitations. Including a summary is a simple way to enhance results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;5. Clarification and explanation of concepts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Easy writing&lt;/strong&gt;&lt;/em&gt;: Resolve writing issues such as wordiness, passive voice, and unclear pronouns (with ambiguous references) to make text more understandable.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Explain graphics/images in text&lt;/strong&gt;&lt;/em&gt;: Clearly explain conceptual graphics through text to resolve ambiguities and avoid relying on an image-to-text model.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;b-import-documents&#34;&gt;b. Import documents&lt;/h2&gt;
&lt;p&gt;Once the data is curated, the documents must be uploaded into the system. For that purpose, the following guidelines must be followed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note: The RAG does not support files with whitespace.&lt;/strong&gt;&lt;/p&gt;
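&lt;p&gt;One way to act on this note before uploading is to look for offending files locally. The sketch below assumes that the restriction applies to file names, which the note does not state explicitly. &lt;code&gt;find_whitespace_names&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: list files whose names contain whitespace before uploading them.
# Assumption: the restriction in the note applies to file names.
find_whitespace_names() {
  local dir="${1:-.}"
  # -name matches the base name only, so whitespace in parent folders is ignored.
  find "$dir" -type f -name '*[[:space:]]*'
}
```

&lt;p&gt;An empty result means that no file name in the folder contains whitespace.&lt;/p&gt;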
&lt;h3 id=&#34;1-upload-documents-in-the-azure-container-atria-resources&#34;&gt;1. Upload documents to the Azure container &lt;code&gt;atria-resources&lt;/code&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Insert these documents in the &lt;code&gt;&amp;lt;preset_name&amp;gt;/&amp;lt;retrievalStg.sources.name&amp;gt;/&amp;lt;retrievalStg.sources.docs[i].extension&amp;gt;/&lt;/code&gt; folder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keep in mind the allowed formats for documents, set in the preset&amp;rsquo;s variable &lt;a href=&#34;../../docs/atria/technical-guidelines/configuration/modify-atria-configuration/#:~:text=loader%3A-,loaderType,-%3A%20Mandatory.%20Must%20be&#34;&gt;&lt;code&gt;loader.loaderType&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
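&lt;p&gt;The folder layout above can be sketched as a small helper. This example assumes that the upload is done with the Azure CLI, which this guide does not prescribe, and that the &lt;code&gt;AZURE_STORAGE_ACCOUNT&lt;/code&gt; environment variable names the storage account that holds &lt;code&gt;atria-resources&lt;/code&gt;. &lt;code&gt;blob_prefix&lt;/code&gt; and &lt;code&gt;upload_docs&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: upload local documents into the folder layout described above.
# Assumptions: the Azure CLI ("az") is installed and logged in, and
# AZURE_STORAGE_ACCOUNT names the storage account that holds atria-resources.

# Build the folder prefix: preset_name/retrievalStg.sources.name/extension
blob_prefix() {
  printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# Usage: upload_docs LOCAL_DIR PRESET_NAME SOURCE_NAME EXTENSION
upload_docs() {
  az storage blob upload-batch \
    --account-name "$AZURE_STORAGE_ACCOUNT" \
    --destination atria-resources \
    --destination-path "$(blob_prefix "$2" "$3" "$4")" \
    --source "$1" \
    --pattern "*.$4"
}
```

&lt;p&gt;For example, &lt;code&gt;upload_docs ./faqs my-preset project-de-faqs pdf&lt;/code&gt; would upload the PDF files in &lt;code&gt;./faqs&lt;/code&gt; under &lt;code&gt;my-preset/project-de-faqs/pdf&lt;/code&gt;, where &lt;code&gt;my-preset&lt;/code&gt; is a placeholder preset name.&lt;/p&gt;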
&lt;h3 id=&#34;2-configure-docs-parameter-in-preset&#34;&gt;2. Configure &lt;code&gt;docs&lt;/code&gt; parameter in preset&lt;/h3&gt;
&lt;p&gt;For these documents to be used in your use case, they must be included in the preset, following these instructions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fill in the parameters in the &lt;a href=&#34;../../docs/atria/technical-guidelines/configuration/modify-atria-configuration/#:~:text=is%20associated%20with.-,docs,-%3A&#34;&gt;&lt;code&gt;docs&lt;/code&gt;&lt;/a&gt; key of your preset, which is related to the configuration of documents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here is an example of a documents configuration. In this example, the documents in the preset are split into two folders, because two different types of data (jsonl and pdf) are loaded into this preset.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{
  &amp;quot;retrievalStg&amp;quot;: {
    &amp;quot;sources&amp;quot;: {
      &amp;quot;name&amp;quot;: &amp;quot;project-de-faqs&amp;quot;,
      &amp;quot;embeddings&amp;quot;: &amp;quot;text-embedding-ada-002&amp;quot;,
      &amp;quot;docs&amp;quot;: [
        {
          &amp;quot;extension&amp;quot;: &amp;quot;jsonl&amp;quot;,
          &amp;quot;loader&amp;quot;: {
            &amp;quot;loaderType&amp;quot;: &amp;quot;jsonl&amp;quot;
          }
        },
        {
          &amp;quot;extension&amp;quot;: &amp;quot;pdf&amp;quot;,
          &amp;quot;loader&amp;quot;: {
            &amp;quot;loaderType&amp;quot;: &amp;quot;unstructured&amp;quot;,
            &amp;quot;options&amp;quot;: {
              &amp;quot;loaderMode&amp;quot;: &amp;quot;single&amp;quot;
            }
          }
        }
      ],
      &amp;quot;splitter&amp;quot;: {
        &amp;quot;splitterType&amp;quot;: &amp;quot;recursivechar&amp;quot;,
        &amp;quot;options&amp;quot;: {
          &amp;quot;chunkSize&amp;quot;: 512,
          &amp;quot;chunkOverlap&amp;quot;: 160
        }
      },
      &amp;quot;retrievers&amp;quot;: [
        {
          &amp;quot;retrieverType&amp;quot;: &amp;quot;qdrant&amp;quot;
        },
        {
          &amp;quot;retrieverType&amp;quot;: &amp;quot;tfidf&amp;quot;
        }
      ]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;3-upload-list-of-urls&#34;&gt;3. Upload list of URLs&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you use URLs as documents (&lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;url_list&amp;quot;&lt;/code&gt;), you also need to upload a file with the list of URLs in the &lt;em&gt;preset&lt;/em&gt; folder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Separate each URL with a line break. The file must have the extension &lt;code&gt;.txt&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;http://www.url1.com
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;http://www.url2.com
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
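&lt;p&gt;Before uploading, the two rules above (a &lt;code&gt;.txt&lt;/code&gt; extension and one URL per line) can be checked locally. The following is a minimal sketch: &lt;code&gt;validate_url_list&lt;/code&gt; is a hypothetical helper, and treating blank lines as errors is a conservative assumption rather than a documented requirement.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: check that a URL list is a .txt file with exactly one URL per line.
# Assumption: blank lines are treated as errors (the guide does not say).
validate_url_list() {
  local file="$1" bad
  case "$file" in
    *.txt) ;;
    *) echo "error: $file must have the .txt extension"; return 1 ;;
  esac
  # Collect every line (with its number) that is not a single http(s) URL.
  bad="$(grep -Evn '^https?://[^[:space:]]+$' "$file" || true)"
  if [ -n "$bad" ]; then
    echo "error: lines that are not a single URL:"
    echo "$bad"
    return 1
  fi
  echo "ok: $(grep -c '' "$file") URLs"
}
```

&lt;p&gt;Running it on the two-line example above prints &lt;code&gt;ok: 2 URLs&lt;/code&gt;.&lt;/p&gt;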
&lt;h3 id=&#34;4-upload-jsonl-or-jsond-files&#34;&gt;4. Upload jsonl or jsond files&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you use &lt;code&gt;jsonl&lt;/code&gt; or &lt;code&gt;jsond&lt;/code&gt; files as documents (&lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;jsonl&amp;quot;&lt;/code&gt; or &lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;jsond&amp;quot;&lt;/code&gt;), you also need to upload the file content in the same folder with the extension &lt;code&gt;.jsonl&lt;/code&gt; or &lt;code&gt;.jsond&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In these files, provide the content of each document in the &lt;code&gt;page_content&lt;/code&gt; key.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-jsonl&#34; data-lang=&#34;jsonl&#34;&gt;{&amp;#34;page_content&amp;#34;: &amp;#34;test1&amp;#34;, &amp;#34;metadata&amp;#34;: {&amp;#34;source&amp;#34;: &amp;#34;https://www.dummy1.es/&amp;#34;}, &amp;#34;type&amp;#34;: &amp;#34;Document&amp;#34;}
{&amp;#34;page_content&amp;#34;: &amp;#34;test2&amp;#34;, &amp;#34;metadata&amp;#34;: {&amp;#34;source&amp;#34;: &amp;#34;https://www.dummy2.es/&amp;#34;}, &amp;#34;type&amp;#34;: &amp;#34;Document&amp;#34;}
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ul&gt;
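&lt;p&gt;A quick local check can catch lines that lack the &lt;code&gt;page_content&lt;/code&gt; key before the file is uploaded. The sketch below is a crude textual check that does not parse the JSON, and &lt;code&gt;check_jsonl&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: report lines of a jsonl file that do not mention "page_content".
# This is a textual check; it does not validate that each line is well-formed JSON.
check_jsonl() {
  local file="$1" missing
  missing="$(grep -vn '"page_content"[[:space:]]*:' "$file" || true)"
  if [ -n "$missing" ]; then
    echo "error: lines without a page_content key:"
    echo "$missing"
    return 1
  fi
  echo "ok: every line has page_content"
}
```

&lt;p&gt;A JSON-aware tool would give stronger guarantees. This version only needs &lt;code&gt;grep&lt;/code&gt;.&lt;/p&gt;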
&lt;h3 id=&#34;5-add-projectmetadata-file-optional&#34;&gt;5. Add project.metadata file (optional)&lt;/h3&gt;
&lt;h4 id=&#34;scenario-1-unstructured-csv-or-text-data&#34;&gt;Scenario 1: Unstructured, csv or text data&lt;/h4&gt;
&lt;p&gt;If the &lt;code&gt;loaderType&lt;/code&gt; is &lt;code&gt;url_list&lt;/code&gt;, &lt;code&gt;unstructured&lt;/code&gt; or &lt;code&gt;csv&lt;/code&gt;, you can optionally add a file called &lt;code&gt;project.metadata&lt;/code&gt; with relevant information about each file. This metadata will be stored in the database and is very helpful when we want to modify the source URL.&lt;/p&gt;
&lt;p&gt;&lt;i class=&#34;fa-solid fa-triangle-exclamation fa-xl&#34; style=&#34;color: #f45815;&#34;&gt;&lt;/i&gt; It is important that the file is &lt;strong&gt;correctly tabulated&lt;/strong&gt; and does not contain any invalid characters.&lt;/p&gt;
&lt;p&gt;The file is composed of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Key &lt;code&gt;__global__&lt;/code&gt;, which contains global data that affects all the files.&lt;/li&gt;
&lt;li&gt;The names of the specific files for which we want to add this extra data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not necessary to define metadata for all the files in the folder.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;__global__:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   url: https://www.google.com
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   field1: test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   field2: test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;file1.txt:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   url: https://www.dummy-url.com
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   title: file1 title
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;file2.txt:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   url: https://www.dummy-url.com
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   title: file2 title
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   source: test
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;i class=&#34;fa-solid fa-circle-info fa-xl&#34; style=&#34;color: #3267c3;&#34;&gt;&lt;/i&gt; &lt;strong&gt;NOTE&lt;/strong&gt;: When creating your use case, you can select, from all the information added to &lt;code&gt;project.metadata&lt;/code&gt;, the specific sources that will be shown to the user as part of the response, by adding them to the &lt;a href=&#34;../../docs/atria/technical-guidelines/configuration/modify-atria-configuration/#:~:text=is%20number.-,baseUrl,-%3A%20Mandatory.%20Base&#34;&gt;&lt;code&gt;baseUrl&lt;/code&gt;&lt;/a&gt; field of the preset configuration.&lt;/p&gt;
&lt;h4 id=&#34;scenario-2-url-or-json-documents&#34;&gt;Scenario 2: URL or json documents&lt;/h4&gt;
&lt;p&gt;In this case, there is no need to add the project.metadata file:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;url_list&amp;quot;&lt;/code&gt; &amp;rarr; Metadata information is included in the URLs themselves, uploaded in &lt;a href=&#34;#3-upload-list-of-urls&#34;&gt;step 3&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;jsonl&amp;quot;&lt;/code&gt;, &lt;code&gt;&amp;quot;loaderType&amp;quot;: &amp;quot;jsond&amp;quot;&lt;/code&gt; &amp;rarr; Metadata information is already included in the files uploaded in &lt;a href=&#34;#4-upload-jsonl-or-jsond-files&#34;&gt;step 4&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;6-update-data-into-the-environment&#34;&gt;6. Update the data in the environment&lt;/h3&gt;
&lt;p&gt;Finally, execute the &lt;a href=&#34;../../docs/atria/technical-components/atria-rag-generate-db/#launch-atria-rag-generate-db&#34;&gt;&lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt; job&lt;/a&gt; to update the data in the environment.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Docs: ATRIA RAG Server</title>
      <link>/docs/atria/technical-components/atria-rag-server/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/atria/technical-components/atria-rag-server/</guid>
      <description>
        
        
        &lt;h1 id=&#34;atria-rag-server&#34;&gt;ATRIA RAG Server&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Descriptive documentation regarding the &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt; component &lt;em&gt;&lt;strong&gt;atria-rag-server&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;atria-rag-server&lt;/strong&gt;&lt;/em&gt; is an &lt;a href=&#34;../../docs/atria/&#34;&gt;&lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt; component that manages a RAG-type server. It is called by &lt;a href=&#34;../../docs/atria/technical-components/atria-model-gateway&#34;&gt;&lt;em&gt;&lt;strong&gt;atria-model-gateway&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt; when RAG (Retrieval Augmented Generation) is used.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;atria-rag-server&lt;/strong&gt;&lt;/em&gt; handles each request made to the RAG model by following the predefined RAG chain (pipeline), making successive requests that combine Generative AI technology (LLMs) with semantic and lexical searches to retrieve the required information.&lt;/p&gt;
&lt;h2 id=&#34;associated-documentation&#34;&gt;Associated documentation&lt;/h2&gt;
&lt;p&gt;&lt;i class=&#34;fa-regular fa-file-lines fa-xl&#34; style=&#34;color: #0d5de7;&#34;&gt;&lt;/i&gt; Descriptive technical documentation regarding &lt;em&gt;&lt;strong&gt;atria-rag-server&lt;/strong&gt;&lt;/em&gt; includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;../../docs/atria/technical-components/atria-rag-server/components/&#34;&gt;Architecture and components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;../../docs/atria/technical-components/atria-rag-server/operational-overview/&#34;&gt;Operational overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Docs: ATRIA RAG Generate DB</title>
      <link>/docs/atria/technical-components/atria-rag-generate-db/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/atria/technical-components/atria-rag-generate-db/</guid>
      <description>
        
        
        &lt;h1 id=&#34;atria-rag-generate-db&#34;&gt;ATRIA RAG Generate DB&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Descriptive documentation regarding the &lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt; component &lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt; is an &lt;a href=&#34;../../docs/atria/&#34;&gt;&lt;em&gt;&lt;strong&gt;ATRIA&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt; component that manages a RAG-type database. This component is launched when you want to feed the document database for the first time or when you want to update the database with new information. For more information about these processes, see the guidelines &lt;a href=&#34;../../docs/atria/technical-guidelines/configuration/import-documents/&#34;&gt;Import documents into ATRIA&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt; is in charge of handling the information coming from different sources and feeding the databases the RAG works with.&lt;/p&gt;
&lt;h2 id=&#34;associated-documentation&#34;&gt;Associated documentation&lt;/h2&gt;
&lt;p&gt;&lt;i class=&#34;fa-regular fa-file-lines fa-xl&#34; style=&#34;color: #0d5de7;&#34;&gt;&lt;/i&gt; Descriptive technical documentation regarding &lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt; includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;../../docs/atria/technical-components/atria-rag-generate-db/components/&#34;&gt;Architecture and components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;../../docs/atria/technical-components/atria-rag-generate-db/operational-overview/&#34;&gt;Operational overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;launch-atria-rag-generate-db&#34;&gt;Launch atria-rag-generate-db&lt;/h2&gt;
&lt;p&gt;To launch &lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt;, there are two suitable options:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Option 1&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Send a request to the API so that it launches &lt;em&gt;&lt;strong&gt;atria-rag-generate-db&lt;/strong&gt;&lt;/em&gt;. The endpoint for this is:&lt;br&gt;
&lt;em&gt;/aura-services/v2/operations/data&lt;/em&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl -X POST &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;https://&amp;lt;your-atria-domain&amp;gt;/aura-services/v2/operations/data&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#4e9a06&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&lt;/span&gt;-H &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;Content-Type: application/json&amp;#34;&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;-d &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;  &amp;#34;presetId&amp;#34;: &amp;#34;&amp;lt;name of the project&amp;gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Option 2&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Execute the following commands to update the data in the environment.
The job generates the database for all the projects, but it can also be launched for a specific project, as in this example, which sets &lt;code&gt;ATRIA_PROJECT&lt;/code&gt; before creating the job.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000&#34;&gt;PROJECT&lt;/span&gt;&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#39;project-copilot-reduced&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl patch configmap/atria-rag-generate-db-project --type merge -p &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;{\&amp;#34;data\&amp;#34;:{\&amp;#34;ATRIA_PROJECT\&amp;#34;:\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;${&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;PROJECT&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;\&amp;#34;}}&amp;#34;&lt;/span&gt; -n &amp;lt;namespace&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl create job --from&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;cronjob/atria-rag-generate-db &lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;$(&lt;/span&gt;date +%Y%m%d%H%M%S&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;)&lt;/span&gt;-atria-rag-generate-db-&lt;span style=&#34;color:#4e9a06&#34;&gt;${&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;PROJECT&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;}&lt;/span&gt; -n &amp;lt;namespace&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;(Replace &lt;code&gt;&amp;lt;namespace&amp;gt;&lt;/code&gt; with your namespace.)&lt;/p&gt;
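&lt;p&gt;The commands in Option 2 create the job but do not wait for it. As a minimal sketch, they can be wrapped in a function that also waits for completion and prints the job logs. &lt;code&gt;job_name&lt;/code&gt; and &lt;code&gt;run_generate_db&lt;/code&gt; are hypothetical names, and the 30-minute timeout is an arbitrary value to adjust for your data volume.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch only: launch generate-db for one project and wait for the job to finish.
# Assumes kubectl is already configured for the target cluster.

# Build a job name in the same format as the commands above.
job_name() {
  printf '%s-atria-rag-generate-db-%s\n' "$(date +%Y%m%d%H%M%S)" "$1"
}

# Usage: run_generate_db PROJECT NAMESPACE
run_generate_db() {
  local project="$1" namespace="$2" job
  job="$(job_name "$project")"
  kubectl patch configmap/atria-rag-generate-db-project --type merge \
    -p "{\"data\":{\"ATRIA_PROJECT\":\"${project}\"}}" -n "$namespace" || return 1
  kubectl create job --from=cronjob/atria-rag-generate-db "$job" -n "$namespace" || return 1
  # Block until the job reports the Complete condition (30m is an arbitrary timeout).
  kubectl wait --for=condition=complete "job/$job" -n "$namespace" --timeout=30m || return 1
  kubectl logs "job/$job" -n "$namespace"
}
```

&lt;p&gt;Note that &lt;code&gt;kubectl wait&lt;/code&gt; only returns early on success. If the job fails, the function returns once the timeout expires, so the job status should then be inspected separately.&lt;/p&gt;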

      </description>
    </item>
    
  </channel>
</rss>
