Aura Analytics 1.1.

Description of Aura Analytics 1.1, the monitoring dataflow that allows active listening in Aura

Introduction

This document describes a joint dataflow between LCDO OB teams and the Aura Global Team for processing Aura log files created in the production environment (i.e., coming from actual Aura users) in order to create PPDs (Privacy-Preserving Datasets). This whole process is known as Active Listening.

Among other elements, the dataflow produces an analytics component, the Aura Analytics Dashboard, that can be used to gather statistics on the production system and to analyze user behavior. This document describes the latest version of this dashboard, 1.1.

The main objectives of the unified dataflow are:

  • Consolidate the processing of Aura logs into a framework.
  • Provide LCDOs and Aura Global Team with a unified common source for analytics, in a privacy-preserving way.
  • Enable extensibility of the dataflow.

In this framework, the current documents provide:  

The target audience of this document includes the following roles both in LCDO Teams and Aura Global Team:

  • Data Scientists and Product teams who wish to access Aura log information and perform analytics on it.
  • Operation teams, for the architectural description and the requirements on OB environments.

Aura Analytics versions

Release 1.0.

The first release, 1.0, sets up the basic paths, deploys the PPD infrastructure and produces:

  • Version 1.0. of the OB Analytics system, which includes the OB Dashboard.
  • The first version of pre-processed datasets (clean PPDs) for training and analytics at Aura Global.

As mentioned, this version enables OBs to go further by:

  • Enhancing the OB Dashboard with new visualizations, as they see fit (given that panels and dashboards can be exported and imported, it is possible to share new ones across all OBs as they are developed).

  • Processing the PPD files as desired (they are standard CSV files, which can be ingested into alternative platforms). Restrictions on them are softer than on the original logs because of the anonymization process they have gone through, although they are still subject to management precautions (a code of conduct is being prepared for that).

Release 1.1.

Version 1.1. introduces the following changes:

  • The table of data has been enlarged with these new fields: AURA_ID, STATUS_CD, sesId, sesSize, sesDuration.
  • An expanded list of test users is used, so that the userType column contains more identifications.
  • The code for data ingestion into a local Kibana, which previously consisted of a single Python script, has been turned into a full installable Python package because of its increasing complexity.

The prerequisites for the use of version 1.1. of Aura Analytics Dashboard are set below:

  • Aura Platform version:

  • Recommended operating system: Ubuntu 18.04

  • Recommended tool for data visualization: ELK stack

1 - Architecture

Aura Analytics 1.1. architecture

Technical architecture of Aura Analytics 1.1.

Architecture description

The following figure shows a full overview of Aura Analytics Dashboard architecture and operation, which is also described below:

Aura Analytics architecture

  1.  Aura logs generated in the local instance are converted to datasets and transferred to the local Kernel via the standard procedure and with the established frequency (typically, daily).

  2.  Once there, the “Active Listening” process flow is triggered daily. A specialized process, which runs on an Aura local instance and has access to the datasets stored in the Kernel local storage space, performs the following steps:

    • PII (Personally Identifiable Information) is removed or encrypted.
    • The result is transferred to a bucket/blob set up for this task and managed by Global Aura team.
    • Here, the PPDs (Privacy-Preserving Datasets) are created. Currently, only MESSAGE, RECOGNIZER and API datasets are involved in this process.

    In order to convert PII data to PPD, every field in these datasets can be:

    • a. Not transferred.
    • b. Pseudo-anonymized. In this situation, the field is transformed through a cryptographic hashing process using a secret set up by the OB.
    • c. Partially anonymized (e.g., credit card number, email, etc.). The field is scanned for specific patterns, which are replaced with a specific tag (idemail, idpassport, etc.). The list of anonymization strings is agreed with each OB.
    • d. Transferred as is.
  3.  After that, the raw PPD datasets stored in the bucket/blob managed by the Global Team are processed to generate clean PPD datasets adapted to the analytics tools.

  4.  From that space, the clean PPD Datasets can be:

  • Accessed by the Aura Global Team, which uses them for several tasks aimed at evaluating Aura quality and making the best decisions regarding product evolution:

    • Perform analytics on Aura behavior and prototype Analytics Dashboard features
    • Improve Aura Platform capabilities (e.g., adapting machine learning models)
  • Accessed by a Local Aura Team, by ingesting the data into a dedicated server managed by the OB with analytics and data visualization capabilities. To that end, the Aura Global Team provides a component with the ELK stack (Elastic Search, Logstash and Kibana) preconfigured with a set of dashboards that can be deployed and adapted by the OB team.
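As an illustration of field options a, b and d in step 2, the following minimal sketch applies a per-field policy to one record. It assumes HMAC-SHA256 as the keyed hash (the source only states that a cryptographic hashing process with an OB-defined secret is used); the policy table, the `PHONE_NUMBER` field and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-field policy; the real assignment is agreed with each OB.
FIELD_POLICY = {
    "AURA_ID_GLOBAL": "hash",  # option b: pseudo-anonymized
    "INTENT": "keep",          # option d: transferred as is
    "PHONE_NUMBER": "drop",    # option a: not transferred (made-up field)
}


def pseudonymize(value: str, secret: bytes) -> str:
    """Keyed hash (assumed HMAC-SHA256): same input and secret give the same token."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_ppd_record(record: dict, secret: bytes) -> dict:
    """Apply the policy to every field; fields without a policy are not transferred."""
    out = {}
    for field, value in record.items():
        policy = FIELD_POLICY.get(field, "drop")
        if policy == "hash":
            out[field] = pseudonymize(value, secret)
        elif policy == "keep":
            out[field] = value
    return out
```

Because a keyed hash is deterministic for a given secret, the same original identifier always maps to the same token, which is what allows user behavior to be followed across time without exposing the original value.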

All the code involved in this process can be found in GitHub. Particularly:

2 - OB analytics

OB analytics

Description of the OB Analytics subsystem that can be managed by OBs.

Introduction

The OB Analytics subsystem is an optional component in the dataflow, which enables the management of clean PPDs (Privacy-Preserving Datasets) by LCDOs for the analysis of Aura behavior.

In order to work with the OB Analytics subsystem, the following requirements must be met:

  1.  The legal agreement for log management and creation of PPDs must be signed between the OB and Aura Global Team.

  2.  The mechanism for PPD creation and transfer must be installed. This requires the deployment of a piece of software (provided by Aura Global Team) inside the OB cloud, with access to the repository (AWS bucket or Azure Blob Storage) holding Aura logs.

  3.  A virtual machine must be deployed on the OB cloud to hold the OB Dashboard. This virtual server must be provisioned by the OB in the same cloud environment (provider and region, e.g., AWS West Europe) as the Kernel cloud, but separated from it in terms of access rights (placing it in the same cloud saves the transfer costs that the cloud provider would charge for PPD access).

Architecture and installation

The basic infrastructure of the OB Analytics subsystem consists of a virtual machine that is fed with the extracted and cleaned PPDs. This virtual machine is set up with a proposed stack of tools based on the open-source ELK framework (see the figure in the Architecture document).

  • Elastic Search: indexing database.

  • Logstash: ingester for PPD data, configured to upload the anonymized clean PPD tables into Elastic Search.

  • Kibana: visualization tool offering dashboards and panels created over Elastic Search data.

The OB is required to set up the base VM, for which an Ubuntu 18.04 system is advised.

On top of this base system, Aura Global Team provides an installation kit that includes:

  • The pre-processing and ingesting configuration for feeding clean PPD data into Logstash.
  • The indexing configuration for Elastic Search.
  • Certain prototype dashboards and panels for Kibana.
  • Basic security provisions (providing web-based secure access to the dashboard).

Once installed, the system automatically ingests any new clean PPD being produced, so that the index and dashboards remain up to date.

In principle, the PPD creation process specifies daily production, since Aura logs are sent to Kernel once a day. This means that information about Aura behavior and user actions on one given day will be available in the dashboards on the following day.

The provided system and installed dashboards are only visualization examples for clean PPDs. The system allows the creation of additional panels that may provide complementary insights on clean PPD elements and OBs are encouraged to explore data as they see fit.

Dashboards can be exported and reimported in a different system, so in addition to the LCDO team adding new analysis features, it is possible to provide later updates to the OB Analytics system. These updates can be provided by the Aura Global Team or shared between OBs.

Outside the dashboard stack, it is also possible to process clean PPD with alternative tools (PPDs are essentially CSV files with a defined structure, so they can be processed with a variety of tools).
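As a minimal sketch of processing a clean PPD outside the dashboard stack, the snippet below counts rows per value of a column using only the Python standard library. It assumes a comma-delimited file with a header row; the three sample rows and their values are made up for illustration, and a real clean PPD carries many more columns.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row extract standing in for a clean PPD file.
SAMPLE_PPD = """CORR_ID,CHANNEL_CD,INTENT
c1,whatsapp,intent.bill
c2,app,intent.bill
c3,app,intent.greeting
"""


def count_by(csv_file, column):
    """Count how many rows carry each distinct value of `column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = count_by(io.StringIO(SAMPLE_PPD), "INTENT")
print(counts.most_common())
```

With a real file, `io.StringIO(SAMPLE_PPD)` would be replaced by an open file handle, e.g. `open(path, newline="", encoding="utf-8")`.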

Kibana dataflow

The Aura Analytics dashboard follows a standard ELK deployment:

  1. An Elastic Search index has been created. It is called aura-message-COUNTRY, and its index schema contains a cleaned version of the AURA MESSAGE table (which registers input and output messages). For details on the fields that this index contains, go to the document Data model.

  2. A Logstash configuration ingests into this index the cleaned sets of datapoints that are produced daily as a result of the transfer and processing of Aura logs. This is usually done in the early morning (which will then upload data for the previous day).

  3. A Kibana index pattern has been created, matching the uploaded Elastic Search index. An Elastic Search index is how the data is stored inside the DB; a Kibana index pattern is how it is visualized in the interface. Typically, Kibana index patterns match Elastic Search indices, but it is, for example, possible to create a Kibana index pattern that matches more than one Elastic Search index and hence combines different data sources.

  4. A small set of visualizations have been pre-installed in Kibana over that index pattern, as a means to get a default peek on the index data. See the section preinstalled visual elements to check them.

This configuration is deployed in the Kibana default space (the only one available on a freshly created Aura Analytics dashboard). If additional spaces are needed to better organize visualizations, the Kibana index pattern must also be installed in those additional spaces.

Preinstalled visual elements

Kibana offers many possibilities to visualize the ingested data and there are many resources and tutorials around explaining its mechanics. We therefore refer to the official Kibana documentation, or tutorials available on the web, for generic information.

In the particular case of the Aura Analytics deployment, there is an Elastic Search index that is automatically ingested daily. It is called aura-message-COUNTRY and contains a cleaned version of the AURA MESSAGE table (which registers input and output messages).

Over this index, three types of panels/visualizations have been preinstalled, to provide a starting point:

  • Discover panel
  • Visualizations
  • Dashboards

These preinstalled elements are described in the following subsections. To access them, select the appropriate icon in the left navigation panel.

Elastic search index

Discover panel

The Discover panel in Kibana is an essential tool for querying an Elastic Search index (and saving those searches if desired) and for exploring users’ interactions with Aura in detail, log by log. The results shown are determined by:

  1.  Search terms or conditions
  2.  A time interval
  3.  Additional filters applied to the query results
  4.  A set of index fields to show in the result table

These 4 steps are represented in the following figure:

Discover panel

As shown in the previous figure, the starting point is the Elastic Search index holding all the data. The first three steps in the chain reduce the amount of data handled, by pruning out elements that do not satisfy the defined conditions. The fourth step is just a display adjustment: it defines which of the available fields are shown in the output table that appears in the panel. The retrieved data still contains all fields (clicking on any of the rows shows them).
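The four steps can be approximated outside Kibana as an Elastic Search Query DSL request body. The sketch below only builds that body as a Python dictionary; the function name and the `@timestamp` time field are assumptions, and `_source` filtering is merely the closest equivalent of step 4 (Kibana itself hides columns rather than dropping fields).

```python
def build_discover_query(search=None, start="now-7d", end="now",
                         filters=None, fields=None, time_field="@timestamp"):
    """Build a Query DSL body mirroring the four Discover steps."""
    # Step 1: search terms, or a blank query that returns everything.
    must = [{"query_string": {"query": search}}] if search else [{"match_all": {}}]
    # Step 2: time interval (the time field name is an assumption).
    filter_clauses = [{"range": {time_field: {"gte": start, "lte": end}}}]
    # Step 3: additional exact-match filters on keyword fields.
    for field, value in (filters or {}).items():
        filter_clauses.append({"term": {field: value}})
    body = {"query": {"bool": {"must": must, "filter": filter_clauses}}}
    # Step 4: restrict the fields that are returned.
    if fields:
        body["_source"] = list(fields)
    return body


body = build_discover_query("bill", filters={"CHANNEL_CD": "whatsapp"},
                            fields=["INTENT", "MESSAGE_USR"])
```

The channel value and search term in the usage line are illustrative only.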

In the Aura Dashboard default set, there is one Discover panel preinstalled. It is called question-answer pairs and has the following characteristics:

  • A blank query (i.e., provide all the results)
  • A time interval for the last 7 days
  • An “only user” filter: it filters out all intents that correspond to non-user queries (suggestions, help commands from the client application, etc.)
  • A visualization that includes: the timestamp, the (cleaned) user message, the detected aura intent, associated entities (if applicable), the dialog that was invoked and Aura’s response

The following figure shows a snapshot of this panel. To load it, select the Discover tool in the left navigation bar and then click on the “Open” menu option in the top menu bar. A list of saved panels will be shown, including the already mentioned “question-answer pairs”.

question-answer pairs panel

Once the panel is loaded, each one of the aforementioned four elements can be freely modified. For example, the interface allows:

  • Adding new filters with the “+Add Filters” button
  • Deactivating the current filters by clicking on the predefined filter and selecting the “Temporarily Disable” option
  • Modifying the query interval with the “calendar” button or “Dates Box”
  • Adding a specific query on a given index field(s) by using the “Search Box”, instead of the (default) blank query.

Discover panels can be saved as named objects, to be later loaded at will. So, if needed, any panel (a modified panel or a newly created one) can be saved with a new name to have it available for later loading.

Visualizations

A total of 7 visualizations come preinstalled with the base Aura Dashboard. The list can be obtained from the “visualizations” item in the left menu bar, as shown in the figure, and they are:

  • Three “Stats” type visualizations, which provide general statistics on platform usage.
  • Four “User” type visualizations, which provide insights on user behavior.

Visualizations

Note that this distinction between “User” and “Stats” is purely conceptual and based on the fields used to generate the visualizations; from Kibana’s point of view, they are all regular visualizations. They can be loaded directly by clicking on their names, and they can also be integrated into dashboards, as described in the next section.

Dashboards

A dashboard in Kibana is essentially a spatial arrangement of visualizations. To construct a dashboard, place visualizations on a page, resizing them as required, so that they can be viewed in a single place.

All visualizations in a dashboard are linked: if, for example, the time interval is changed or a filter is added through the interface, the change applies to every visualization in the dashboard and all of them are updated.

Elements in the dashboard visualizations can also generate instant filters when graph or table elements are clicked. Those filters are then added to the top of the page and can therefore be modified or removed.

The Aura Analytics default installation preloads two dashboards. Those are available for selection when we click on the “dashboard” icon in the left navigation bar:

Default dashboards

Both dashboards are described in the following sections.

System dashboard

This dashboard integrates the three predefined “Stats” visualizations (generic statistics):

  • A timeline of interactions (user messages sent and answered), segmented by channel
  • A heatmap of interactions by weekday and time of day (hour)
  • A bar graph classifying the interactions produced in the period by detected intent

The following figure shows a screenshot of this dashboard:

System dashboard

User dashboard

The user dashboard contains the four “User” visualizations:

  • Most Frequent User Utterances: list of the most frequent user sentences (for the time interval and filters active at the moment). It uses the usrMsgSig field to group together very similar utterances.
  • Aura Answer Groups: list of the most frequent answers that Aura generates, grouped by the semantic categories in the AuraMsgGroup field.
  • Words per query: distribution of sizes for the user messages, measured as number of words in the utterance and segmented by channel.
  • Tag cloud: set of most frequent user utterances, as a tag cloud in which the font size represents the utterance frequency. The MESSAGE_USR_NORM field is used for its representation, so it contains normalized utterances.

The next screenshots show the dashboard with all these visualizations (it is a large dashboard, so typically it needs scrolling to visualize all its components).

User dashboard

Note that those four visualizations are linked as they correspond to the same subset of the data (as given by filters and time interval) but they are NOT linked at the individual item level (i.e., a given most frequent user utterance in the left table does not correspond to any specific Aura answer in the right bar graph).

Instead, the dashboard can be manipulated by selecting one specific item in any of the visualizations, which creates a filter for the others. For instance, as the following image shows, if we select <CHURN> in the Aura answer group visualization, the others show the user utterances that led Aura to generate that answer (i.e., an answer about contract cancelation).

Aura answer groups in User dashboard

3 - Data model

Aura Analytics data model

Data model of Aura Analytics 1.1. that can be used as the base for building new elements

Introduction

New elements can be built (or the current elements modified) by making use of the available fields in Kibana through the ingested Elastic Search index.

In this document, we provide a reference of the schema that the index follows, so that it can be used to build such new visualizations, or to better understand the existing ones.

Elements in the Aura-message data model have 3 different types:

  • Numeric: single numbers, integer or real. Suitable for numerical statistics, such as averages, or for plotting variation across time in graphs.

  • Keyword: they are opaque strings, i.e., terms that cannot be searched within (it is not possible to look for words inside a keyword field). They can, however, be used to create some term-level queries, such as prefix queries (find all instances that begin with) and they usually work great for aggregations, since most of them are categorical variables (fields that only have a limited number of possible values) and can therefore be bucketed and counted.

  • Text: these fields are divided into separate terms (words), and they are pre-processed by an Elastic Search analyzer before indexing to improve access. Text fields cannot be used in aggregated visualizations, since they cannot be grouped. They are most useful for queries, because they allow searching for fragments (only a few words) and fuzzy searches.

Fields list

The following table lists all the fields available in the aura-message-COUNTRY Elastic Search index, together with their type and a brief description.

The most relevant ones include a more detailed description in the section fields explanations.

Note that some fields of text type have a mirror field of type keyword, with the same content. Having the same data indexed in two different ways at the same time (as text and as keyword) makes it possible to perform different types of analysis by choosing the right field.
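In Elastic Search, a text field with a keyword mirror is typically declared as a multi-field in the index mapping. The following sketch shows what such a mapping could look like, written as a Python dictionary; it is not the actual Aura Analytics mapping, and the `integer` type for DURATION_NU and the helper name are assumptions.

```python
def text_with_keyword_mirror(analyzer=None):
    """A text field plus a `.keyword` sub-field holding the same content."""
    text_field = {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    if analyzer:
        text_field["analyzer"] = analyzer
    return text_field


# Illustrative fragment only; the real index defines many more fields.
MAPPING_SKETCH = {
    "mappings": {
        "properties": {
            "INTENT": {"type": "keyword"},
            "DURATION_NU": {"type": "integer"},  # assumed numeric type
            "MESSAGE_USR_NORM": text_with_keyword_mirror(),
        }
    }
}
```

With this layout, full-text queries target `MESSAGE_USR_NORM`, while aggregations such as "top values" target `MESSAGE_USR_NORM.keyword`.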

The “Raw” column indicates if this field is already present in the Aura raw PPD files:

  • Yes: field contained in raw PPDs.

  • No: generated field, produced when creating clean PPDs. They can be recognized as lowercase fields.

  • Partial: the field exists in the raw PPDs, but in a somewhat different form.

| Field | Type | Raw | Contents |
|---|---|---|---|
| CORR_ID | keyword | yes | Unique identifier for each interaction |
| VERSION_ID | keyword | yes | Aura Platform version |
| CHANNEL_CD | keyword | yes | Identifier for the channel this interaction corresponds to |
| STATUS_CD | keyword | yes | Internal code related to operation status |
| AURA_ID_GLOBAL | keyword | yes | (Mostly) unique identifier for the user |
| AURA_ID | keyword | yes | (Mostly) local identifier for the user |
| INTENT | keyword | yes | Detected user intent, including “system” intents |
| MESSAGE_USR | text | partial | Text request sent by the user |
| MESSAGE_USR_NORM | text | no | A normalized version of MESSAGE_USR |
| MESSAGE_USR_NORM.keyword | keyword | no | Keyword version of MESSAGE_USR_NORM, to enable aggregating on it |
| MESSAGE_AURA | text | partial | Text message sent by Aura to the user |
| MESSAGE_AURA.keyword | keyword | partial | Keyword version of MESSAGE_AURA, to enable aggregating on it |
| MODALITY_CD_USR | text | partial | Modality of the user message |
| MODALITY_CD_AURA | text | partial | Modality of Aura response |
| ENTITIES | text | yes | Comma-separated list of the entities recognized in the user message |
| DIALOG_ID | text | yes | Identifier for the dialog that produced Aura response |
| DIALOG_ID.keyword | keyword | yes | Keyword version of DIALOG_ID, to enable aggregating on it |
| DURATION_NU | number | yes | Elapsed time, in ms, between the reception of the user message and the moment the response is generated to be sent to the channel |
| userType | keyword | no | A single-character identifier that flags whether the user is a test user |
| usrMsgWc | number | no | Message word count: number of words contained in the user message |
| usrMsgSig | keyword | no | Message signature: a string that helps clustering user messages |
| AuraMsgGroup | keyword | no | Cluster the Aura response belongs to |
| weekday | number | no | Day of the week the interaction happened (0=Monday to 6=Sunday) |
| hour | number | no | (Integer) hour the interaction happened |
| country | keyword | partial | Two-letter code for the country |
| sesId | keyword | no | Session information |
| sesSize | number | no | Session information |
| sesDuration | number | no | Session information |
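Some of the generated fields in the table can be derived directly from the raw data. The following minimal sketch computes weekday, hour and usrMsgWc for one interaction; it assumes whitespace-based word counting and a timestamp already expressed in the desired time zone, neither of which is specified in this document, and the function name is hypothetical.

```python
from datetime import datetime


def derived_fields(received: datetime, user_message: str) -> dict:
    """Compute three of the generated (non-raw) fields for one interaction."""
    return {
        "weekday": received.weekday(),  # Python also uses 0=Monday .. 6=Sunday
        "hour": received.hour,
        "usrMsgWc": len(user_message.split()),  # assumed: whitespace-separated words
    }


# 1 January 2024 was a Monday.
print(derived_fields(datetime(2024, 1, 1, 9, 30), "when does my contract end"))
```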

Fields explanations

This subsection contains more detailed descriptions of some of the key fields in the schema.

AURA_ID_GLOBAL

This element (mostly) uniquely identifies the user generating the interaction.

Note that the actual value of this field is not the same as the identifier used within Aura and uploaded to Kernel: for privacy reasons, the identifier was hashed when generating the PPD and bears no resemblance to the original one. The correspondence is, however, maintained across time, so it is possible to analyse user behavior.

The “mostly” qualifier reflects one quirk of the original Aura identifier: its generation depends on the authentication method used by the channel, so if two channels follow different authentication methods (e.g., MobileConnect vs. User/Password) then the AURA_ID_GLOBAL identifier for the same user will be different. In summary:

  • The identifier stays the same for a given user across time.

  • Different users will not have the same identifier.

  • But the same user could produce two different identifiers if connected to two channels that use a different authentication method.

AURA_ID

This is a “local” identifier, i.e., one that is generated inside the channel according to specific channel characteristics, and it is not tied to user authentication as closely as AURA_ID_GLOBAL is.

Its main disadvantage is its transient nature: the same user, on the same channel, could generate different AURA_ID strings when connecting at different times in different sessions. Therefore, for user accounting and tracing, AURA_ID_GLOBAL is usually preferred.

However, there are instances in which AURA_ID works better, namely for anonymous access (when the user is not authenticated). This depends on the channel:

  • In the WhatsApp channel, the initial use of the channel will be anonymous from the Aura side (i.e., no authentication is done), hence AURA_ID_GLOBAL will also be empty (at least until the user authenticates, which depends on the use case). But in this channel, AURA_ID has a permanent value, linked to the WhatsApp user, so here it is a good substitute for a persistent id, even for unauthenticated users.

MESSAGE_USR

This field includes the message sent by the user.

It has been partially processed to enhance anonymization by replacing some standard identifiers contained in it with <idxxx> strings (e.g., phone numbers appear as <idphone>).

Removal is done mostly through regular expressions, so there might be occasional glitches (such as identifying as phone a number that does not really correspond to a phone, just because it follows the phone number pattern).
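The following sketch illustrates regular-expression tagging of this kind, including the sort of false positive mentioned above. The two patterns are hypothetical simplifications (a nine-digit phone number and a basic email shape); the real list of patterns and tags is the one agreed with each OB.

```python
import re

# Hypothetical patterns, applied in order; not the production list.
PATTERNS = [
    ("<idemail>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<idphone>", re.compile(r"\b\d{9}\b")),
]


def tag_identifiers(message: str) -> str:
    """Replace every match of each pattern with its tag."""
    for tag, pattern in PATTERNS:
        message = pattern.sub(tag, message)
    return message


print(tag_identifiers("call me at 600123456"))     # a phone number is tagged
print(tag_identifiers("order 123456789 is late"))  # glitch: an order number is tagged too
```

The second call shows why purely pattern-based removal can mislabel a value: any nine-digit number matches the phone pattern regardless of what it actually represents.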

MESSAGE_USR is a field of text type. As such, it is searchable: it is possible to search for specific words the user might have said.

Furthermore, it has been processed through an ElasticSearch analyzer adapted to the specific language used. This means that searches are able to match related words (e.g., plural versions of a singular query word, or verb conjugations). Phrase searches are also possible (by using double quotes around the phrase). If a phrase (several words) is used as a query without the quotes, ElasticSearch interprets it as a query for any of the words, so it will return all data elements that contain any of the words in the query.

In Kibana, more sophisticated text searches can be made by switching to Lucene query syntax: proximity queries (words close to each other), fuzzy searches (query words allowing typos), wildcards, etc.
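A few Lucene-syntax query strings of the kinds just listed are shown below, built with a small hypothetical helper. The search words are made up for illustration; the strings would be typed into the Kibana search box once Lucene syntax is selected.

```python
def field_query(field: str, expression: str) -> str:
    """Scope a Lucene expression to a single index field."""
    return f"{field}:{expression}"


EXAMPLES = {
    # Exact phrase: the words must appear together and in order.
    "phrase": field_query("MESSAGE_USR", '"cancel my contract"'),
    # Proximity: both words within 3 positions of each other.
    "proximity": field_query("MESSAGE_USR", '"cancel contract"~3'),
    # Fuzzy: matches terms within one edit of "contract" (typos).
    "fuzzy": field_query("MESSAGE_USR", "contract~1"),
    # Wildcard: any term starting with "contr".
    "wildcard": field_query("MESSAGE_USR", "contr*"),
}

for name, query in EXAMPLES.items():
    print(f"{name:10} {query}")
```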

MESSAGE_USR_NORM

This is a normalized version of MESSAGE_USR, in which the user text has been streamlined by:

  • Converting all the sentence to lowercase
  • Removing all punctuation
  • Removing any extra spaces

Furthermore, this field is not processed through a language-dependent analyzer as MESSAGE_USR is, so queries on this field must match words exactly. It is still a text type field, however, so the same query language can be used.
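The three normalization steps can be sketched as follows. This is an approximation: the exact set of characters treated as punctuation is not specified in this document, so the sketch assumes that everything other than letters, digits, underscores and whitespace is removed.

```python
import re


def normalize_utterance(text: str) -> str:
    """Lowercase, strip punctuation and collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # assumed definition of punctuation
    return " ".join(text.split())


print(normalize_utterance("  When does my   contract END?! "))
```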

MESSAGE_AURA

This contains the text message generated by Aura and sent to the user as response to the user query. It is a text type field, so it is possible to search for specific words in it.

In the current version of the Aura KPI logs, this field only contains the text response. Some Aura use cases do not generate a purely textual message but a more elaborate one (e.g., a card with text and graphics). These complex answers are inserted as attachments into Aura’s response to the channel and, since attachments are not logged into the MESSAGE field, this field will appear empty in those cases. So, an empty MESSAGE_AURA field does not necessarily mean that Aura did not provide an answer. As an alternative for those situations, looking at the DIALOG_ID field (or INTENT) may give a hint of the type of answer that Aura delivered.

MODALITY_CD_USR

This field contains the modality in which the user sent the message.

It is a slightly transformed field: because there are some variations across Aura versions, the modalities are unified by consolidating them into only four different keywords: audio (spoken message), text (written free-text message) or form (commands sent via automatic processing or menus).

DIALOG_ID

This field contains the identifier of the use case dialog module in the aura-bot Framework that was selected to construct the Aura response.

Dialog identifiers have two components (library and dialog) separated by a colon, e.g., services:service-usage

This field uses a custom analyser that splits the identifier at the colon, generating two terms. This makes it possible to construct queries with one of the terms, e.g., “give me all the elements for the domain services”. But being a text field makes it impossible to do aggregations on it, so it cannot be used for statistics like bar charts (use DIALOG_ID.keyword for that).
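When processing clean PPD files outside Elastic Search, the same split can be reproduced by hand. The sketch below emulates the effect of the analyser (it is not its actual configuration) and counts interactions per library; apart from services:service-usage, the identifiers in the usage line are made up.

```python
from collections import Counter


def dialog_terms(dialog_id: str):
    """Split a dialog identifier into its (library, dialog) components."""
    library, _, dialog = dialog_id.partition(":")
    return library, dialog


def count_by_library(dialog_ids):
    """Aggregate interactions by the library part of DIALOG_ID."""
    return Counter(dialog_terms(d)[0] for d in dialog_ids)


print(dialog_terms("services:service-usage"))
print(count_by_library(["services:service-usage", "services:other", "billing:check"]))
```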

DURATION_NU

This number reflects the time it took Aura to understand, process and respond to the user message. It is the difference (in milliseconds) between the timestamp of the moment the user message was received and the timestamp at which Aura’s response was finalized and sent to the channel.

Note that it is not a complete end-to-end delay time from the user’s point of view, since it does not include either the time it took the request to arrive to Aura through the channel or the time it took the response to travel back through the channel and get rendered at the client application (those times are outside Aura, and as such not registered by it).

Session Information

Session information includes the fields: sesId, sesSize, sesDuration.

These fields are generated by running a process over the time series formed by interactions from each user at each channel.

A session is automatically identified as a consecutive list of such user’s interactions, each separated from the next by a time interval shorter than 5 minutes. Once each session is identified, it is tabulated and labelled with three fields:

  1. sesId: string, forming a unique identifier for the session. It should be considered as an opaque identifier and the guarantee is that no other session in the data stream carries the same identifier.
    As an aside, interactions that do not correspond to actual user interactions (because no user could be identified or because the datapoint corresponds to an interaction not triggered by the user) are all labelled with a <void> sesId.

  2. sesSize: number of interactions this session contains. This is labelled only for the first interaction in the session, all other interactions carry a 0 in this field. Non-sessions such as the ones with <void> sesId will be left empty. This facilitates computing averages or other statistics on valid sessions, by just first filtering out all zero and empty values.

  3. sesDuration: time duration of each session, counted from the instant the first user message was received to the instant the last Aura message was sent. For single-interaction sessions its value will be the same as DURATION_NU; for sessions with multiple interactions it will contain the time span covering all of them.

As with sesSize, only the first interaction in a session is annotated with sesDuration; the remaining interactions will be assigned a 0 value (and interactions that do not correspond to a session will be left empty). Therefore, to compute statistics on sesDuration, remove the 0 and empty values first.
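The session labelling can be sketched as follows for the interactions of one user on one channel, already sorted by time. Several details are assumptions: the gap is measured between consecutive user-message timestamps, the identifier is a random UUID, and the <void> case is not handled. The `Interaction` class and `label_sessions` function are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

SESSION_GAP = timedelta(minutes=5)
ONE_MS = timedelta(milliseconds=1)


@dataclass
class Interaction:
    received: datetime         # when the user message was received
    duration_ms: int           # DURATION_NU
    ses_id: str = ""
    ses_size: int = 0
    ses_duration_ms: int = 0


def label_sessions(interactions: List[Interaction]) -> List[Interaction]:
    """Group time-sorted interactions into sessions and annotate them in place."""
    sessions: List[List[Interaction]] = []
    for item in interactions:
        # Same session if the gap to the previous interaction is under 5 minutes.
        if sessions and item.received - sessions[-1][-1].received < SESSION_GAP:
            sessions[-1].append(item)
        else:
            sessions.append([item])
    for session in sessions:
        ses_id = uuid.uuid4().hex  # opaque identifier (assumed format)
        first, last = session[0], session[-1]
        end = last.received + timedelta(milliseconds=last.duration_ms)
        for item in session:
            item.ses_id, item.ses_size, item.ses_duration_ms = ses_id, 0, 0
        # Only the first interaction carries the size and duration.
        first.ses_size = len(session)
        first.ses_duration_ms = (end - first.received) // ONE_MS
    return interactions
```

For example, three interactions at 09:00, 09:02 and 09:10 form two sessions: the first two share one identifier, and the third starts a new single-interaction session whose duration equals its own DURATION_NU.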

userType

This field may be used, in certain cases, to help identify rows that do not correspond to real users but to test users (internal users that belong to test/QA teams and whose behaviour is, therefore, not representative of actual Aura users).

The field contains a single character, which is s for standard (real) users, and can be Q or T for QA/Test users respectively (there are also lowercased versions q and t, referring to unconfirmed test users).

Note that test user identification is not available in every country, since it depends on having a register of the AURA_ID_GLOBAL identifiers that QA/Test users authenticate with, and this is not always available.
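Combining userType with the session fields, the sketch below computes the mean session size over rows read from a clean PPD, skipping test users as well as the zero and empty sesSize values. Rows are assumed to be dictionaries of strings (as produced by a CSV reader); the function name is hypothetical.

```python
TEST_USER_TYPES = {"Q", "T", "q", "t"}  # confirmed and unconfirmed QA/Test users


def mean_session_size(rows, exclude_test_users=True):
    """Average sesSize over session-opening rows; None if there are none."""
    sizes = []
    for row in rows:
        if exclude_test_users and row.get("userType") in TEST_USER_TYPES:
            continue
        raw = (row.get("sesSize") or "").strip()
        if not raw:
            continue  # empty: the row does not belong to a session
        size = float(raw)
        if size > 0:  # zero: not the first interaction of its session
            sizes.append(size)
    return sum(sizes) / len(sizes) if sizes else None


rows = [
    {"userType": "s", "sesSize": "3"},
    {"userType": "s", "sesSize": "0"},
    {"userType": "s", "sesSize": ""},
    {"userType": "Q", "sesSize": "5"},
    {"userType": "s", "sesSize": "1"},
]
print(mean_session_size(rows))
```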

usrMsgSig

This field is not useful by itself. Instead, it is intended to help group together very similar user utterances. It does so by generating a signature of the utterance that is (hopefully) insensitive to small variations in the sentence.

This is an experimental field; it might change if we reach a variant that is better suited for its purpose.

The way to generate this signature is by following these steps with the utterance:

  • Start with the normalized utterance (i.e., MESSAGE_USR_NORM).

  • Perform stemming (removal of word suffixes) on all the words. This makes bills and bill the same word.

  • Replace words from a fixed list of very common, uninformative tokens (stopwords) with an asterisk. For example, this converts both “get my bill” and “get the bill” to the same phrase “get * bill”.

  • Group words in sets of 3 elements (trigrams) and sort them alphabetically. This removes the global structure of the sentence, while retaining local structure.
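The steps above can be sketched as follows. This is not the production implementation: the stemmer is a crude suffix stripper, the stopword list is a tiny hypothetical sample, the "|" separator is arbitrary, and "sort them alphabetically" is interpreted here as sorting the list of trigrams. Utterances with fewer than three words are kept as a single group, which is also an assumption:

```python
# Hypothetical stopword sample; the real fixed list is not given in this document.
STOPWORDS = {"my", "the", "a", "i", "is", "how", "when"}

def naive_stem(word):
    """Crude stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def signature(utterance_norm):
    """Build a signature from an already normalized (lowercased) utterance."""
    tokens = [naive_stem(word) for word in utterance_norm.split()]
    tokens = ["*" if token in STOPWORDS else token for token in tokens]
    if len(tokens) < 3:
        trigrams = [tuple(tokens)]
    else:
        trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    trigrams.sort()  # discards global word order, keeps local structure
    return "|".join(" ".join(trigram) for trigram in trigrams)

print(signature("get my bill"))
print(signature("get the bills"))
```

Both calls produce the same signature, since "bills" is stemmed to "bill" and both "my" and "the" are replaced by an asterisk.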

The resulting string is an unreadable version of the original utterance (hence, it cannot be used by itself), but because very similar utterances produce the same signature, it helps cluster them. One example is the preinstalled visualization “Most Frequent User Utterances”, which uses this field to group very similar utterances.

Another example is provided in the following figure, which shows message utterances generating the same signature:

Message utterances

As can be seen, the signature is the same for “how can I upgrade” and “when can I upgrade”, for “when does my contract end” and “when is my contract ending”, and for “live chat” and “live chats”. Each of these pairs would therefore be counted together when aggregating by signature.
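Aggregating by signature could look like the sketch below. The signature values (sig-1, sig-2, sig-3) are placeholders standing in for real usrMsgSig contents, and the row layout is an assumption:

```python
from collections import Counter, defaultdict

# (utterance, usrMsgSig) pairs; signature values are placeholders.
rows = [
    ("how can I upgrade", "sig-1"),
    ("when can I upgrade", "sig-1"),
    ("live chat", "sig-2"),
    ("live chats", "sig-2"),
    ("when does my contract end", "sig-3"),
]

counts = Counter(sig for _, sig in rows)

# Keep the utterances seen for each signature, so that a readable
# representative can be shown instead of the signature itself.
examples = defaultdict(list)
for text, sig in rows:
    examples[sig].append(text)

for sig, n in counts.most_common():
    print(n, examples[sig][0])
```

Showing one original utterance per group is a presentation choice: the signature itself is not meant to be read.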

The procedure has its limitations and, as noted, it is experimental and still being improved, but it already reduces some of the inherent variability in user expressions.

AuraMsgGroup

Messages produced by Aura are generated from its text resource database. In some cases, the same category of message produces different output texts, either because the message includes some user-dependent parameter or because the text database contains several variants of the same text (and Aura picks one at random).

The AuraMsgGroup field is a keyword field that helps categorize Aura answers by abstracting away some of this variation. It classifies each response given by Aura as one of two types of elements:

  • Generic group: a name such as <NONE>, <GREETING> or <NOTFOUND>, which corresponds to a response category (see Table 3)

  • Truncated answer: for answers that do not have a defined generic group, as a fallback the literal answer text is inserted, after substituting all numbers in it with a placeholder and truncating it (i.e., retain only the first characters).
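A minimal sketch of this two-way classification, under several assumptions: the translation table entries, the exact-match lookup, the `<NUM>` placeholder and the 30-character limit are all hypothetical, since the actual values used by the PPD cleaner are not given here:

```python
import re

# Hypothetical translation table: literal Aura answer -> generic group.
GENERIC_GROUPS = {
    "": "<EMPTY>",
    "Hello! How can I help you?": "<GREETING>",
    "Sorry, I did not understand that.": "<NONE>",
}
NUMBER_PLACEHOLDER = "<NUM>"  # assumed placeholder token
MAX_CHARS = 30                # assumed truncation length

def aura_msg_group(answer):
    """Return the generic group if one is defined, else a truncated answer."""
    text = answer.strip()
    if text in GENERIC_GROUPS:
        return GENERIC_GROUPS[text]
    # Fallback: mask numbers (including decimals), then keep the first characters.
    masked = re.sub(r"\d+(?:[.,]\d+)*", NUMBER_PLACEHOLDER, text)
    return masked[:MAX_CHARS]

print(aura_msg_group("Hello! How can I help you?"))
print(aura_msg_group("Your balance is 12.50 euros as of today, 3 May"))
```

Masking numbers before truncating means that answers differing only in a user-dependent amount fall into the same fallback group.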

The following table contains the generic groups defined so far. They correspond to the most frequent Aura messages. The list is country-dependent, since it also depends on the use cases deployed in each country. As noted above, responses that do not fall into these groups are assigned a truncated version of the response text.

Note that the list of most frequent Aura messages can be enlarged over time. Also, the correspondence between Aura messages and groups is not static: if the text database is updated with new variants, the translation table in the PPD cleaner process that generates this field must be updated as well.

Group            Meaning
EMPTY            No textual answer from Aura (see note in Section MESSAGE_AURA for the usual meaning of no text answer)
NONE             Aura says it did not understand the user utterance
ERR              There was a processing error of some kind at Aura side, and the request could not be fulfilled
GREETING         Aura is greeting the user
GOODBYE          Aura is acknowledging a conversation end
YOU-ARE-WELCOME  Aura is accepting a compliment
CHURN            Aura recognizes the user intention to terminate a contract
NOTFOUND         Aura tried to search for some bit of data concerning the user query, and could not find it
CANNOT           Aura cannot fulfil the user request because of insufficient information (in the query, or on user data)
BILL-INFO        The user requested information about her bill, and Aura is returning it
DATA-INFO        The user requested information about her data usage, and Aura is returning it


4 - Annex: Dataset fields

Annex: Dataset fields detail

Explanation of the processing that each field of the data model undergoes on its way to a clean PPD

Introduction

The following tables explain the processing that each field undergoes along this flow:

AURA DATASET → PPD_RAW → PPD_CLEAN


  • Each cell of the table describes the processing applied to the data field before it reaches the corresponding stage (table column).

  • For example, the field GLOBAL_AURA_ID is hashed before it is stored in PPD_RAW. After this, the hashed data is passed on to PPD_CLEAN without any further processing.
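The per-field treatment can be pictured as a small policy table applied to each record. The sketch below is an assumption-laden illustration: SHA-256 with a salt is used only as an example (this document does not state which hash function the dataflow uses), the policy covers three fields taken from the MESSAGE table, the sample record is invented, and dropping unknown fields by default is a design choice of this sketch:

```python
import hashlib

# Illustrative policy for the PPD_RAW stage; field names come from the MESSAGE table.
FIELD_POLICY = {
    "USER_ID": "drop",         # NOT transferred
    "AURA_ID_GLOBAL": "hash",  # Hashed
    "MSG_DT": "keep",          # passed on unchanged
}

def to_ppd_raw(record, salt=b"example-salt"):
    """Apply the policy to one record; fields without a policy are dropped."""
    out = {}
    for field, value in record.items():
        action = FIELD_POLICY.get(field, "drop")
        if action == "keep":
            out[field] = value
        elif action == "hash":
            digest = hashlib.sha256(salt + str(value).encode("utf-8"))
            out[field] = digest.hexdigest()
    return out

# Invented sample record.
record = {"USER_ID": "u-1", "AURA_ID_GLOBAL": "abc", "MSG_DT": "2020-01-01T00:00:00Z"}
raw = to_ppd_raw(record)
print(sorted(raw))
```

Because the hash is deterministic for a given salt, the same user still maps to the same value across records, which keeps per-user aggregation possible without exposing the original identifier.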

Tables used in the Active Listening process are described in the following sections. They belong to the Aura Entities database.

  • Columns “FIELD” and “DESCRIPTION”: instances managed by the OB

  • Columns “PPD RAW” and “PPD CLEAN”: instances managed by Aura Global Team

MESSAGE dataset

Message dataset (stored in local Kernel)

  • Columns “FIELD” and “DESCRIPTION”: instances managed by the OB

  • Columns “PPD RAW” and “PPD CLEAN”: instances managed by Aura Global Team

# FIELD                 DESCRIPTION   PPD RAW       PPD CLEAN
1 USER_ID               Unique user ID in the OB systems NOT transferred NOT transferred
2 MSG_DT                 Timestamp of the data
3 MSG_ID                 Unique ID of the message                         NOT transferred
4 ACTION_CD             Code of the action that produces the data                     NOT transferred
5 AURA_ID               User logging ID in Aura. The user will have a new Aura_id each time she logs in Aura. Hashed
6 PHONE_ID               Phone number of the user    NOT transferred NOT transferred
7 CHANNEL_CD             Code of the channel where the action happened
8 SUBSCRIPTION_CD       Code of the subscription type of the user in the OB           NOT transferred
9 DOMAIN_CD             Code of the domain where the action happened                 NOT transferred
10 CATEGORY_CD           Code of the category where the action happened               NOT transferred
11 COUNTRY_CD             Code of the country NOT transferred
12 CORR_ID               Correlator ID of the request that produces this data
13 IS_CACHED             Shows if the entity content was already cached or not     NOT transferred
14 STATUS_CD             Status code of the action, if meaningful
15 REASON                 Result of the action in error case, code of the error   NOT transferred
16 VERSION_ID             Aura version that produces this data
17 LANG_CD               Language configured by the user for communication   NOT transferred
18 TZ_CD                 Timezone where the communication happened NOT transferred
19 DURATION_NU           Duration in milliseconds of the action
20 MESSAGE               Content of the message   Anonymized
21 DIALOG_ID             Id of the dialog where the message happens
22 CONVERSATION_ID       Id of the conversation where the message happens               NOT transferred
23 WIN_RECOGNIZER_CD     Code of the recognizer that wins for this message             NOT transferred
24 WIN_RECOGNIZER_SCORE_NU Score of the recognizer that wins for this message             NOT transferred
25 INTENT                 Selected intent
26 ENTITIES               List of entities determined by the recognizer
27 MODALITY_CD           How does the user communicate with Aura
28 AURA_ID_GLOBAL         Identifies the same user_id logged with the same authentication method Hashed
29 ACCOUNT_NUMBER         Unique account number of the user               NOT transferred NOT transferred

RECOGNIZER dataset

Recognizer dataset (stored in local Kernel)

  • Columns “FIELD” and “DESCRIPTION”: instances managed by the OB

  • Columns “PPD RAW” and “PPD CLEAN”: instances managed by Aura Global Team

# FIELD                   DESCRIPTION       PPD RAW       PPD CLEAN
1 USER_ID                 Unique user ID in the OB systems   NOT transferred NOT transferred             
2 RECOGNIZER_DT           Timestamp of the data                              
3 RECOGNIZER_ID           Unique ID of the recognizer                              
4 ACTION_CD               Code of the action that produces the data                 NOT transferred
5 AURA_ID                 User logging ID in Aura. The user will have a new Aura_id each time she logs in Aura. Hashed              
6 PHONE_ID               Phone number of the user   NOT transferred NOT transferred 
7 CHANNEL_CD             Code of the channel where the action happened                              
8 DOMAIN_CD               Code of the domain where the action happened     NOT transferred      
9 CATEGORY_CD             Code of the category where the action happened     NOT transferred  
10 COUNTRY_CD             Code of the country                NOT transferred         
11 CORR_ID                 Correlator ID of the request that produces this data                            
12 IS_CACHED               Shows if the entity content was already cached or not NOT transferred     
13 STATUS_CD               Status code of the action, if meaningful                            
14 REASON                 Result of the action in error case, code of the error                              
15 VERSION_ID             Aura version that produces this data                              
16 LANG_CD                 Language configured by the user for communication NOT transferred             
17 TZ_CD                   Timezone where the communication happened    NOT transferred            
18 DURATION_NU             Duration in milliseconds of the action                              
19 SCORE_NU               Score returned by the recognizer                                  
20 INPUT                   User input sent to the recognizer. Null if incoming message is an AuraCommand Anonymized              
21 OUTPUT                 Complete output generated by the recognizer                            
22 INTENT                 Intent returned by the recognizer                               
23 ENTITIES               Entities returned by the recognizer due to the intent                            
24 COMMON_THRESHOLD_NU     Common threshold used to determine the best answer of all recognizers                 NOT transferred
25 THRESHOLD               Specific threshold of the specific recognizer being executed  NOT transferred            
26 EXPECTED_INTENT         Intent expected to be returned by the recognizer  NOT transferred            
27 EXPECTED_ENTITIES       Entities expected to be returned by the recognizer due to the intent             NOT transferred
28 AURA_ID_GLOBAL         Identifies the same user_id logged with the same authentication method Hashed              
29 ACCOUNT_NUMBER         Unique account number of the user    NOT transferred NOT transferred             


API dataset

API request dataset (stored in local Kernel)

  • Columns “FIELD” and “DESCRIPTION”: instances managed by the OB

  • Columns “PPD RAW” and “PPD CLEAN”: instances managed by Aura Global Team

#   FIELD            DESCRIPTION     PPD RAW       PPD CLEAN
1   USER_ID               Unique user ID in the OB systems                                   NOT transferred NOT transferred             
2   REQUEST_DT           Timestamp of the data                                                                        
3   REQUEST_ID           Unique ID of the request                                                                      
4   ACTION_CD             Code of the action that produces the data                                       NOT transferred
5   AURA_ID               User logging ID in Aura. The user will have a new Aura_id each time she logs in Aura Hashed NOT transferred
6   PHONE_ID             Phone number of the user                                           NOT transferred NOT transferred
7   CHANNEL_CD           Code of the channel where the action happened                                   NOT transferred
8   DOMAIN_CD             Code of the domain where the action happened                                   NOT transferred
9   CATEGORY_CD           Code of the category where the action happened                                 NOT transferred
10 COUNTRY_CD           Code of the country                                                             NOT transferred
11 CORR_ID               Correlator ID of the request that produces this data                                        
12 IS_CACHED             Shows if the entity content was already cached or not             NOT transferred NOT transferred             
13 STATUS_CD             Status code of the API request                                                                
14 REASON               Result of the action in error case, code of the error                                        
15 VERSION_ID           Aura version that produces this data                               NOT transferred
16 LANG_CD               Language configured by the user for communication                    NOT transferred          
17 TZ_CD                 Timezone where the communication happened                                      
18 DURATION_NU           Duration in milliseconds of the action                                                        
19 HOST                 Host of the API                                                                              
20 PATH                 Specific path of the API being called                               NOT transferred           
21 HTTP_STATUS           HTTP status of the server response                                  NOT transferred            
22 RESPONSE             Response body                                                     Anonymized                  
23 AURA_ID_GLOBAL       Identifies the same user_id logged with the same authentication method Hashed NOT transferred
24 ACCOUNT_NUMBER       Unique account number of the user         NOT transferred NOT transferred             
25 REQUEST               Request body