Aura entities definition in CSV format

Aura entities definition in CSV format version 5.0.0.

Introduction

The Aura entities definition describes the entities that Aura currently writes in CSV format. These entities are required to calculate the Aura KPIs, and other teams also use them in their data processes.

The entity types are:

  • Message: stores information about the messages handled by aura-bot and the actions performed on them. Generated by aura-bot.
  • Recognizer: stores each request made to any of the recognizers during the utterance recognition phase of a message. Generated by aura-groot, aura-bot and aura-nlp.
  • Extended Message: stores extra information about a Message. Generated by aura-bot.
  • Groot Message: stores information about the messages handled by aura-groot. Generated by aura-groot.

Entities generation

Until release 9.3.0 (Gwen Stefani), Aura generated all entities in CSV format and uploaded them into Kernel storage in CSV format. However, a script provided by Kernel, and running in Kernel, converts some of them into Avro entities. These use a deprecated Avro format that is not URM compliant.

In particular, the following conversions are provided:

As can be seen, each Avro entity is duplicated so that the interactions of authenticated users and anonymous users are held separately. This is necessary because the field USER_4P_ID cannot be null if it exists in a dataset definition.

CSV files format

All entity files must follow these rules:

  • File format: UNIX line endings (LF), UTF-8 encoding without BOM
  • Date format: ISO 8601
    • Date: 2018-05-02
    • Datetime: 2018-05-02T15:18:11Z (always in UTC)
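The rules above can be sketched with Python's standard library as follows. This is a minimal illustration, not the Aura implementation. The field delimiter and quoting style are not specified in this section, so the sketch assumes the `csv` module defaults (comma-separated, minimal quoting). The function names and the field names in the example are hypothetical.

```python
import csv
import io
from datetime import date, datetime, timedelta, timezone


def format_date(value: date) -> str:
    """ISO 8601 date, e.g. 2018-05-02."""
    return value.isoformat()


def format_datetime(value: datetime) -> str:
    """ISO 8601 datetime converted to UTC, e.g. 2018-05-02T15:18:11Z."""
    if value.tzinfo is None:
        # Assumption: refuse naive datetimes instead of guessing their timezone.
        raise ValueError("datetime must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_csv_bytes(header, rows) -> bytes:
    """Serialize rows with UNIX (LF) line endings, encoded as UTF-8 without BOM."""
    buffer = io.StringIO()
    # Assumption: csv module defaults (comma delimiter); lineterminator="\n"
    # replaces the module's default "\r\n" with UNIX line endings.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    # The plain "utf-8" codec never writes a BOM (unlike "utf-8-sig").
    return buffer.getvalue().encode("utf-8")


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    local = datetime(2018, 5, 2, 17, 18, 11, tzinfo=timezone(timedelta(hours=2)))
    data = to_csv_bytes(["MESSAGE_ID", "CREATED_AT"], [["m-001", format_datetime(local)]])
    print(data)  # b'MESSAGE_ID,CREATED_AT\nm-001,2018-05-02T15:18:11Z\n'
```

The example converts a timestamp from a +02:00 offset to UTC before serializing it, in line with the "always in UTC" rule.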

These files are usually saved with the .txt extension and zipped before uploading.

  • Entities used to calculate KPIs are stored in a Kernel bucket. Each entity must be stored in its own folder, with files grouped by month, under the path:
    [OB]/[ENTITY]/YYYYMM

  • Files generated in aura-bot can be stored as needed.
    Filename: BOT_[HOST_ID]_[OB]_[ENTITY]_YYYYMMDDTHH0000Z.txt

  • Files generated in aura-services (authentication) can be stored as needed.
    Filename: SERVICES_[HOST_ID]_[OB]_[ENTITY]_YYYYMMDDTHH0000Z.txt

  • Files generated in Aura NLP components can be stored as needed.
    Filename: NLP_[HOST_ID]_[OB]_[ENTITY]_YYYYMMDDTHH0000Z.txt

  • Dimension entities:
    Path: [OB]/DIMENSIONS/YYYYMM
    Filename: [OB]_DIM_[DIM_NAME]_YYYYMMDDTHH0000Z.txt
    For example: ES_DIM_CHANNEL_20180612T160000Z.txt
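The naming conventions above can be captured in a few helper functions. This is a minimal sketch, and all function names are hypothetical. It assumes that the hour in the filename and the YYYYMM month folder are both taken from the UTC timestamp. The filename patterns state the Z suffix, but this section does not say which timezone the folder month uses. Entity folder names such as MESSAGE are illustrative only.

```python
from datetime import datetime, timezone

# Component prefixes taken from the filename patterns above.
COMPONENT_PREFIXES = {"BOT", "SERVICES", "NLP"}


def _to_utc(ts: datetime) -> datetime:
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc)


def hour_stamp(ts: datetime) -> str:
    """YYYYMMDDTHH0000Z: the UTC hour, with minutes and seconds always zero."""
    return _to_utc(ts).strftime("%Y%m%dT%H0000Z")


def entity_filename(prefix: str, host_id: str, ob: str, entity: str, ts: datetime) -> str:
    """[PREFIX]_[HOST_ID]_[OB]_[ENTITY]_YYYYMMDDTHH0000Z.txt"""
    if prefix not in COMPONENT_PREFIXES:
        raise ValueError(f"unknown component prefix {prefix!r}")
    return f"{prefix}_{host_id}_{ob}_{entity}_{hour_stamp(ts)}.txt"


def dimension_filename(ob: str, dim_name: str, ts: datetime) -> str:
    """[OB]_DIM_[DIM_NAME]_YYYYMMDDTHH0000Z.txt"""
    return f"{ob}_DIM_{dim_name}_{hour_stamp(ts)}.txt"


def bucket_folder(ob: str, folder: str, ts: datetime) -> str:
    """[OB]/[ENTITY]/YYYYMM, or [OB]/DIMENSIONS/YYYYMM for dimension files."""
    # Assumption: the month folder is derived from the UTC timestamp.
    return f"{ob}/{folder}/{_to_utc(ts):%Y%m}"


if __name__ == "__main__":
    ts = datetime(2018, 6, 12, 16, 42, 5, tzinfo=timezone.utc)
    print(bucket_folder("ES", "DIMENSIONS", ts))    # ES/DIMENSIONS/201806
    print(dimension_filename("ES", "CHANNEL", ts))  # ES_DIM_CHANNEL_20180612T160000Z.txt
```

The example reproduces the dimension filename given above. Any timestamp within the 16:00 UTC hour maps to the same file.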

All the files are refreshed every day.

CSV Entities tables nomenclature

  • #: Field ID
  • FIELD: Specific field of the entity type
  • PK: Indicates whether the field is a Primary Key, that is, whether its value uniquely identifies each record of the entity.
  • NULLABLE: Indicates whether the field is allowed to have a null value.
  • TYPE: Type of the field. It can be one of: text, date, number, boolean
  • DESCRIPTION: Brief description of the field
  • FORMAT: Field mandatory format, if applicable
  • ALLOWED VALUES: Predefined values permitted for this field
  • EXAMPLE: Example value of the field
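One way to make these table columns machine-checkable is to model each row as a small record and validate raw CSV values against it. The following is a hypothetical sketch. This section does not define how nulls or booleans are written in the files, so the sketch assumes an empty string means null and booleans are written as true/false. It also assumes that a PK field may not be nullable, as in SQL. The field names in the usage note are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional, Tuple

VALID_TYPES = {"text", "date", "number", "boolean"}


@dataclass(frozen=True)
class FieldSpec:
    """One row of an entity table, mirroring the columns described above."""
    number: int                            # "#": field ID
    name: str                              # FIELD
    pk: bool                               # PK
    nullable: bool                         # NULLABLE
    type: str                              # TYPE: text, date, number or boolean
    description: str = ""                  # DESCRIPTION
    format: Optional[str] = None           # FORMAT, if applicable
    allowed_values: Tuple[str, ...] = ()   # ALLOWED VALUES (empty tuple = any value)
    example: Optional[str] = None          # EXAMPLE

    def __post_init__(self):
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown TYPE {self.type!r}")
        if self.pk and self.nullable:
            # Assumption: as in SQL, a primary key field cannot be null.
            raise ValueError("a primary key field cannot be nullable")


def _parse_iso8601(raw: str):
    # Accept the two ISO 8601 shapes listed under "CSV files format".
    if "T" in raw:
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return date.fromisoformat(raw)


def check_value(spec: FieldSpec, raw: str) -> list:
    """Return the problems found in one raw CSV value (an empty list if valid)."""
    # Assumption: an empty string in the file represents a null value.
    if raw == "":
        return [] if spec.nullable else [f"{spec.name}: null not allowed"]
    problems = []
    try:
        if spec.type == "number":
            float(raw)
        elif spec.type == "date":
            _parse_iso8601(raw)
        elif spec.type == "boolean" and raw.lower() not in ("true", "false"):
            # Assumption: booleans are written as true/false.
            problems.append(f"{spec.name}: not a boolean: {raw!r}")
    except ValueError:
        problems.append(f"{spec.name}: not a valid {spec.type}: {raw!r}")
    if spec.allowed_values and raw not in spec.allowed_values:
        problems.append(f"{spec.name}: {raw!r} not in ALLOWED VALUES")
    return problems
```

For example, a hypothetical `FieldSpec(1, "CHANNEL", pk=False, nullable=False, type="text", allowed_values=("web", "app"))` accepts "web". It reports a problem for "sms" and for an empty value.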

The following considerations must be taken into account:

  • Numeric values are rounded to two decimal places
  • Monetary amounts must be expressed in local currency
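The rounding rule can be applied when a number is written to a file. The following is a minimal sketch with a hypothetical helper name. This section does not state the rounding mode, so the sketch assumes round-half-up and always renders two decimal places. It uses `decimal` instead of the built-in `round()`, because `round(2.675, 2)` returns 2.67: the binary float 2.675 is slightly below 2.675.

```python
from decimal import ROUND_HALF_UP, Decimal

TWO_PLACES = Decimal("0.01")


def format_number(value) -> str:
    """Round to two decimal places (assumed half-up) and render as text."""
    # Converting through str() avoids binary float artifacts:
    # Decimal(2.675) would be 2.67499999..., but Decimal("2.675") is exact.
    return str(Decimal(str(value)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(format_number(2.675))  # 2.68
    print(format_number(10))     # 10.00
```

Monetary values go through the same helper. No currency conversion is applied, because amounts are already expressed in local currency.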