Guidelines for making a request to Aura Generative API

Steps to follow to make a request to the aura-gateway-api Generative API in order to use ATRIA Generative or RAG capabilities

Introduction

Using the ATRIA Generative AI or RAG capabilities requires making a request to the aura-gateway-api Generative API.

For this purpose, constructors must follow the steps below.

aura-generative-api is a synchronous service: if there is no validation error, it calls atria-model-gateway and then returns the response to the application.

Steps in the process

The request from the application must include the following fields to be processed correctly by this API:

application.id or application.name: Id or name of the application used to resolve the request. If this field is empty or the application does not exist in the Generative service, an error is returned.
application.preset: Name of the preset to use in atria-model-gateway.
message: Text of the message with the request to be resolved.
Authorization header: Two-legged token, sent as a Bearer token.
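
As a minimal sketch of the field list above, the following Python helper assembles the request body and rejects it locally when neither application.id nor application.name is set. The name build_prompt_request is hypothetical and not part of the API, and the service still performs its own validation.

```python
import json


def build_prompt_request(message, preset, app_id=None, app_name=None):
    """Assemble the JSON body described above (hypothetical helper)."""
    if not app_id and not app_name:
        # The service returns an error for a missing application; fail early instead.
        raise ValueError("application.id or application.name is required")
    application = {"preset": preset}
    if app_id:
        application["id"] = app_id
    if app_name:
        application["name"] = app_name
    return {"application": application, "message": message}


if __name__ == "__main__":
    body = build_prompt_request(
        "Hola, ¿qué es AURA?", "preset-default", app_name="app-name"
    )
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The Authorization header is not part of the body, so it is left to whatever code sends the request.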

Request

curl --location 'https://api.environment.baikalplatform.com/aura-aiservices/v1/generative/prompts' \
--header 'x-correlator: <uuid>' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer {token}' \
--data '{
  "application": {    
    "name": "app-name",
    "preset": "preset-default"
  },
  "message": "Hola, ¿qué es AURA?",  
  "prompt_params": {
    "preamble": "system 1",
    "template": "template 1",
    "fields_mapped": {},
    "examples": ["example 1"]
  },
  "model_params": {
    "max_tokens": 1,
    "temperature": 2,
    "top_p": 1
  }
}'
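
The same call can be prepared from Python with only the standard library. This is a sketch under assumptions: the URL (including the `environment` host segment) and the `{token}` value are placeholders copied from the curl command, make_request is a hypothetical helper name, and generating the x-correlator with uuid4 is one possible choice. The request object is only built here; nothing is sent until urlopen is called.

```python
import json
import urllib.request
import uuid

# Placeholder URL taken verbatim from the curl example above.
URL = "https://api.environment.baikalplatform.com/aura-aiservices/v1/generative/prompts"


def make_request(body, token, url=URL):
    """Build (but do not send) the POST request for the prompts endpoint."""
    headers = {
        "x-correlator": str(uuid.uuid4()),  # assumption: any UUID is accepted
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    req = make_request(
        {
            "application": {"name": "app-name", "preset": "preset-default"},
            "message": "Hola, ¿qué es AURA?",
        },
        token="{token}",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```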

Response

{
  "message": "Hello I am Aura, how can I help you?",
  "session": {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "sequence": 1,
    "parameters": {
      "window": 10,
      "timeout": 30
    }
  },
  "prompt_info": {
    "sizes": {
      "completion": 100,
      "prompt": 50,
      "total": 150
    },
    "model_params": {
      "max_tokens": 100,
      "temperature": 0.5,
      "top_p": 0.5
    },
    "prompt": [
      {
        "role": "user",
        "content": "I want to know more about the beach"
      }
    ],
    "input": "I want to know more about the beach"
  }
}
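
A short sketch of how an application might read this response. The helper name summarize_response is hypothetical. Treating every field except message as optional is an assumption, since the source shows a single sample and does not say which fields are always present.

```python
import json


def summarize_response(raw):
    """Extract the reply text, session id and size counters from a response body."""
    data = json.loads(raw)
    sizes = data.get("prompt_info", {}).get("sizes", {})
    return {
        "message": data["message"],
        "session_id": data.get("session", {}).get("id"),
        "prompt_size": sizes.get("prompt"),
        "completion_size": sizes.get("completion"),
        "total_size": sizes.get("total"),
    }


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    sample = """
    {
      "message": "Hello I am Aura, how can I help you?",
      "session": {"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "sequence": 1},
      "prompt_info": {"sizes": {"completion": 100, "prompt": 50, "total": 150}}
    }
    """
    print(summarize_response(sample))
```

In the sample, total equals prompt plus completion (50 + 100 = 150), which can serve as a quick sanity check when logging usage.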

Errors

Error 400: Invalid application

  {
    "code": "BAD_REQUEST",
    "message": "Invalid message. Application not found."
  }

Error 400: Preset not found for the application

  {
    "code": "BAD_REQUEST",
    "message": "Invalid message. Preset not valid for application app_name."
  }

Error 400: Invalid Args

{
    "code": "BAD_REQUEST",
    "message": "Bad Request",
    "errors": [
        {
            "code": "InvArg",
            "message": "unknown preset: dfg"
        }
    ]
}

Error 429: Quota

{
    "code": "TOO_MANY_REQUESTS",
    "message": "Too Many Request",
    "errors": [
        {
            "code": "Quota",
            "message": "The system is experiencing operational problems. We apologize for the inconvenience."
        }
    ]
}

Error 500

  {
    "code": "INTERNAL_SERVER_ERROR",
    "message": "Internal Server Error",
    "errors": [
        {
            "code": "Internal",
            "message": "The system is experiencing operational problems. We apologize for the inconvenience."
        }
    ]
  }
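
The error bodies above share a code and message, with an optional errors list. The following sketch flattens them into one structure. The helper name describe_error is hypothetical, and marking 429 and 500 as retryable is an assumption: the source does not define a retry policy.

```python
import json

# Assumption: quota and internal errors are transient; 400 errors are not.
RETRYABLE_STATUSES = {429, 500}


def describe_error(status, raw_body):
    """Flatten an error response into a dict that is easy to log or branch on."""
    data = json.loads(raw_body)
    # The "errors" list is absent in some 400 responses, so default to empty.
    details = [err.get("message", "") for err in data.get("errors", [])]
    return {
        "status": status,
        "code": data.get("code"),
        "message": data.get("message"),
        "details": details,
        "retryable": status in RETRYABLE_STATUSES,
    }


if __name__ == "__main__":
    body = '{"code": "BAD_REQUEST", "message": "Invalid message. Application not found."}'
    print(describe_error(400, body))
```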

Recommendations for using response_format

The response_format parameter is an object that specifies the format that the model must output. It is compatible with Azure OpenAI GPT models newer than gpt-3.5-turbo-1106.

Setting it to { "type": "json_schema", "json_schema": {…} } enables structured outputs, which guarantee that the model output matches the supplied JSON schema. The request below uses the simpler { "type": "json_object" } value (JSON mode).

How to include it in the request:

curl --location 'https://api.environment.baikalplatform.com/aura-aiservices/v1/generative/prompts' \
--header 'x-correlator: <uuid>' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer {token}' \
--data '{
  "application": {    
    "name": "app-name",
    "preset": "preset-default"
  },
  "message": "Hola, ¿qué es AURA?, genera un JSON ",  
  "prompt_params": {
    "preamble": "system 1",
    "template": "template 1",
    "fields_mapped": {},
    "examples": ["example 1"]
  },
  "model_params": {
    "max_tokens": 1,
    "temperature": 2,
    "top_p": 1,
    "response_format":{ "type": "json_object" }
  }
}'

Two conditions must be met to use JSON mode successfully:

response_format is set to { "type": "json_object" }.
The model is told to output JSON as part of the messages. Guidance that the model should produce JSON is required somewhere in the conversation; we recommend adding this instruction to the system message.

According to OpenAI, failing to add this instruction can cause the model to “generate an unending stream of whitespace and the request could run continually until it reaches the token limit.”

If “JSON” is not included within the messages, the following error may occur:

BadRequestError: Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
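
To avoid this error, a client can check for the word "json" before enabling JSON mode. The sketch below is an illustration with several assumptions: with_json_mode is a hypothetical helper, prompt_params.preamble is assumed to act as the system message, and the check only sees text sent in the request, so an instruction already stored in the preset would go unnoticed.

```python
def with_json_mode(body):
    """Return a copy of body with JSON mode enabled, if "json" is mentioned."""
    prompt_params = body.get("prompt_params", {})
    # Assumption: message and preamble are the texts that reach the model.
    texts = [body.get("message", ""), prompt_params.get("preamble", "")]
    if not any("json" in text.lower() for text in texts):
        raise ValueError("message or preamble must mention JSON to use json_object")
    # Copy model_params so the caller's dict is left untouched.
    model_params = dict(body.get("model_params", {}))
    model_params["response_format"] = {"type": "json_object"}
    return {**body, "model_params": model_params}


if __name__ == "__main__":
    request_body = {
        "application": {"name": "app-name", "preset": "preset-default"},
        "message": "Hola, ¿qué es AURA?, genera un JSON",
        "model_params": {"top_p": 1},
    }
    print(with_json_mode(request_body)["model_params"])
```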

Further Reference: Microsoft documentation: Learn how to use JSON mode

Last modified May 14, 2025: feat: Documentation improvement for Prince release #AURA-29163 [RTM] (c69c1272)
