Prerequisites

Prerequisites for working with Aura NLP

Key requirements that are essential to configure the Aura NLP development environment, prior to the generation and training of an understanding model

Introduction

Before developing use cases on Aura NLP, complete the following tasks to install and configure this component:

1 - Technical resources

Technical resources for working with Aura NLP

Mandatory resources required by NLP experts or linguists in order to work with Aura NLP

Resources list

🔹 Aura NLP technical resources 🔹
Aura installation
- Latest Aura Platform release
Operating systems
- Linux: Ubuntu 18.04 LTS distribution (with Java preinstalled)
Configuration of development environment
- Python 3.9
- pip3
- virtualenv
- For Linux distributions: libsqlite3-dev, liblzma-dev, libbz2-dev
Software
- GitHub licence
- Text editor: PyCharm or similar
- Use of grammars: Unitex/GramLab open-source corpus processing suite
- Grammars engine: GrapeNLP
- CLU stage: Microsoft CLU account
- OpenAI stage: Azure OpenAI Service account
NLP training and testing tool
- ABACUS 1.0.0
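
As a quick sanity check of the list above, the following sketch reports which command-line prerequisites are available on the PATH. The helper names (`have_tool`, `check_prerequisites`) are hypothetical and not part of Aura NLP, and the default tool list is an assumption derived from the resources listed above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints one line per tool and returns 1 if any is missing.
# With no arguments, it checks an assumed default list taken from the resources above.
check_prerequisites() {
    [ "$#" -gt 0 ] || set -- python3 pip3 virtualenv java
    missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# On Ubuntu, the development libraries from the list can be installed with:
#   sudo apt-get install libsqlite3-dev liblzma-dev libbz2-dev
# The interpreter version can be checked with:
#   python3 --version
```

Calling `check_prerequisites` without arguments checks the assumed default list; passing tool names checks only those.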

2 - Generate a local branch

Generate a local branch for the NLP data repository

Discover the structure of Aura NLP data repository and learn how to clone it for working purposes in local environment

Introduction to Aura NLP data repository

The GitHub Aura NLP data repositories for use cases are defined per country, with the following naming pattern:

  • Use cases: aura-nlpdata-[country_code]

All of them share the same structure of folders and files, as shown in the section Aura NLP data repository structure.

Local NLP experts must work on a local branch of a clone of the intended global repository, following the steps in section Generate a local branch.

As part of the continuous optimization of Aura NLP, the Aura Global Team offers the possibility of splitting the NLP repository into several repositories for a more efficient way of working. Find the details in section Split Aura NLP repository.

The following sections show the content of each folder and file in the Aura NLP repository, for use cases.

As an example, access https://github.com/Telefonica/aura-nlpdata-es

.github

GitHub config files

config/etc

This folder includes files for the configuration of the Aura NLP model:

| config/etc/ | Description | Modifiable for use cases development? | Detailed information |
|---|---|---|---|
| bootstrap.cfg | General purpose config file. | No | NLP system configuration |
| nlp_config/nlp.json | File that contains the configuration by language and channel for each stage of the pipeline. | Yes | Configure your NLP model |
| build_catalogs.cfg.tpl | File to configure source data for dictionaries. Only required if the NLP model includes stages using dictionaries. | Yes | Guidelines for the generation of dictionaries in Aura NLP |
| api_trainings.cfg.tpl | File only used in ABACUS tool. It is a configuration template that will be filled automatically with the values defined in build_local_variables.sh. | No | ABACUS documentation |
| env.js.tpl | File only used in ABACUS tool. This template will be filled automatically. | No | ABACUS documentation |

data/

This folder includes the resources and files required for the generation of the Aura NLP pipeline and for the training of every NLP stage:

| data/ | Description | Modifiable for use cases development? | Detailed information |
|---|---|---|---|
| pipeline.json | File for building up the NLP dynamic pipeline | Yes | Build the NLP dynamic pipeline |
| Training files | Specific training files for each NLP stage | Yes | Define your data resources |
| sdict_items.json, sdict_aliases.json | Dictionary files automatically generated per language and channel | Yes | Guidelines for the generation of dictionaries in Aura NLP |

delivery

Internal folder containing scripts and resources related to Continuous Integration.

⚠️ Do not modify this folder when developing new use cases.

pipeline_eval

| pipeline_eval/ | Description | Modifiable for use cases development? | Detailed information |
|---|---|---|---|
| pipeline_eval/ob/[country_code]/resources/[language]/[channel]/ | End-to-end tests for evaluation of the pipeline accuracy per country, language and channel | Yes | Define your E2E tests |
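
To illustrate the path pattern above, here is a minimal sketch that builds the E2E test folder for a given country, language and channel. The helper name `e2e_test_dir` is hypothetical, and the language and channel values in the usage comment are placeholders, since the source does not list the valid values.

```shell
# Hypothetical helper: prints the E2E test folder for the given
# country code, language and channel, following the pattern
# pipeline_eval/ob/[country_code]/resources/[language]/[channel]/
e2e_test_dir() {
    country_code=$1
    language=$2
    channel=$3
    printf 'pipeline_eval/ob/%s/resources/%s/%s/\n' "$country_code" "$language" "$channel"
}

# Usage with placeholder values (not real language or channel names):
#   ls "$(e2e_test_dir es my_language my_channel)"
```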

tools

Scripts for local training and testing of the Aura NLP model:

| tools/ | Description | Used for use cases development? | Detailed information |
|---|---|---|---|
| build_local_variables.sh.tpl | File for configuration purposes, specifically for the definition of CLU and other connection parameters. | Yes | Set up configuration properties |
| build_local.sh | Script that automatically generates the local training environment and results files. | Yes | Execute the training script |
| build_local_testset.sh | Script for the definition of specific E2E testset files for an isolated stage. Currently available for the OpenAI embeddings stage. | Yes | Define stage-specific E2E testset files |
| run_local_pipeline.sh | Script used to test the system in live mode during the pipeline launching stage. | Yes | Launch and test your pipeline locally |
| build_local_catalogs.sh | Script used to generate dictionaries using local catalogs data. | Yes | Guidelines for the generation of dictionaries in Aura NLP |
| run_web_training.sh | Script used to run ABACUS tool. | Yes | ABACUS documentation |
| import_nlpdata_tools.sh | Auxiliary script used by other scripts. This script must not be executed by the user. | No | |

ℹ️ All the scripts now need to connect to the centralized GitHub repository aura-nlp-tools, so your GitHub user must have read access to it. Ask the APE Team for this permission.
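
One way to verify read access before running the scripts is to list the remote references of the repository, as in the sketch below. The helper name `can_read_repo` is hypothetical, and the SSH address in the usage comment is an assumption: the source names the repository (aura-nlp-tools) but not its full URL, so the Telefonica organization is inferred from the other repository URLs on this page.

```shell
# Hypothetical helper: succeeds if the repository at the given URL or path
# can be read. "git ls-remote" lists the references of a remote repository
# and fails when the repository is unreachable or access is denied.
can_read_repo() {
    git ls-remote "$1" >/dev/null 2>&1
}

# Usage (the address below is an assumption, see the note above):
#   if can_read_repo git@github.com:Telefonica/aura-nlp-tools.git; then
#       echo "read access OK"
#   else
#       echo "no read access: ask the APE Team for permission"
#   fi
```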

catalogs

Folder required only if the Aura NLP model uses manual catalogs.

| catalogs/ | Description | Used for use cases development? | Detailed information |
|---|---|---|---|
| catalogs/[language]/[channel]/ | Files for the manual update of catalogs | Yes | Guidelines for the generation or update of entities catalogs |

validation

Configuration files for different validators.

⚠️ These files must not be modified.

.gitignore

Config file listing the files to be ignored by the version control system.

CODEOWNERS

Config file indicating which user or group is the code owner responsible for merging the code.

⚠️ This file must not be modified.

config.txt

File containing the branch name of the current working release, used by different scripts.

⚠️ This file must not be modified.

requirements.txt

File containing Python module dependencies. These dependencies are installed automatically during the training process.

⚠️ This file must not be modified.

Generate a local branch

Local branches are generated from the master branch of the GitHub repository.

Local NLP experts must carry out the NLP customization on a local branch of their clone of the NLP GitHub repository and, afterwards, create a Pull Request (PR) to merge that branch into the master or release branch of the corresponding Aura release.

For this purpose, follow these steps:

  1. Create the working directory:

    mkdir -p ~/Telefonica
    cd ~/Telefonica 
    
  2. To be able to clone the Aura NLP data project (Step 3), generate an SSH key and add it to your GitHub account.
    For this purpose, follow the instructions in the GitHub documentation or see the document SSH configuration guidelines.

  3. Clone the Aura NLP data project of your country. The repository URL follows this pattern: https://github.com/Telefonica/aura-nlpdata-[country_code]-[optional:channelName].git

    Where [country_code] is the code of a specific country, for example: es, br, de, gb

    To clone the repository, use a git client such as GitKraken or run the following command from a console:
    git clone <url_repo>

    The project is cloned into the folder where the command is executed, in a new folder with the same name as the repository. For example, using the SSH URL:
    git clone git@github.com:Telefonica/aura-nlpdata-[country_code].git

  4. Once the repository is cloned on the local machine, create a new git branch every time modifications are needed (new use case implementation, bug fixing, etc.).

    The name of the branch should start with one of the following reserved words, depending on the purpose of the modification, followed by a slash and a brief description:

    • feat/: new functionalities (for example: feat/weather_forecast_UC_#56624)
    • fix/: bugs or minor modifications (for example: fix/balance_light_on_#117076)
    • release/: release synchronization

    The command to create this new branch must follow this pattern:

    cd ~/Telefonica/aura-nlpdata-gb
    git checkout -b "[feat|fix|release]/<change_description>"
    

    For detailed information, see Semantic Commit Messages.
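
The steps above can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the Aura NLP tooling: `repo_ssh_url` and `is_valid_branch_name` are hypothetical names, and the SSH key commands in the comments follow GitHub's general documentation rather than the SSH configuration guidelines mentioned in step 2.

```shell
# Step 2 (per GitHub's general documentation): create an SSH key and,
# once it is added to your GitHub account, verify the connection.
#   ssh-keygen -t ed25519 -C "you@example.com"
#   ssh -T git@github.com

# Hypothetical helper: prints the SSH clone URL for a country's repository.
repo_ssh_url() {
    printf 'git@github.com:Telefonica/aura-nlpdata-%s.git\n' "$1"
}

# Hypothetical helper: succeeds only for names that start with a reserved
# prefix (feat/, fix/, release/) followed by a non-empty description.
is_valid_branch_name() {
    case "$1" in
        feat/?*|fix/?*|release/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not executed here):
#   cd ~/Telefonica
#   git clone "$(repo_ssh_url gb)"
#   cd aura-nlpdata-gb
#   branch="feat/weather_forecast_UC_#56624"
#   is_valid_branch_name "$branch" && git checkout -b "$branch"
```

Checking the name before calling `git checkout -b` keeps branches consistent with the reserved prefixes; the check only covers the prefix, not the content of the description.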

Split Aura NLP repository

As a recommendation, the OB's aura-nlpdata repository can be split by groups of channels with similar use cases. This provides greater flexibility and independence to constructors.

At the same time, this functionality optimizes training times, as only the pipelines of the repositories that undergo modifications are retrained.

In this scenario, the format of the repository name must be: aura-nlpdata-[country_code]-[repo_name]
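
As an illustration of this naming format, the sketch below splits a repository name into its country code and optional repository name. The helper `parse_nlpdata_repo` and the example name `aura-nlpdata-es-voice` are hypothetical; the sketch assumes that the country code itself contains no hyphen.

```shell
# Hypothetical helper: splits "aura-nlpdata-[country_code]-[repo_name]" into
# its parts. The repo_name part is empty for a non-split repository such as
# "aura-nlpdata-es". Assumes the country code contains no hyphen.
parse_nlpdata_repo() {
    case "$1" in
        aura-nlpdata-?*) ;;
        *) echo "not an aura-nlpdata repository: $1" >&2; return 1 ;;
    esac
    rest=${1#aura-nlpdata-}        # strip the fixed prefix
    country_code=${rest%%-*}       # text before the first hyphen
    case "$rest" in
        *-*) repo_name=${rest#*-} ;;   # text after the first hyphen
        *)   repo_name= ;;
    esac
    printf 'country_code=%s repo_name=%s\n' "$country_code" "$repo_name"
}

# Usage with a hypothetical split repository name:
#   parse_nlpdata_repo aura-nlpdata-es-voice
```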

If OBs want to organize their NLP repository in this way, they must contact the Aura Global Team.

Finally, dedicated processing capacity of the CI system can be allocated if necessary, but only after a joint analysis with the Aura Global Team.