Review of a Pull Request by NLP Global Team

Procedure followed by the NLP Global Team in order to validate the Pull Request including the NLP model

This process takes place once the Pull Request has been opened, so that the NLP Global Team can evaluate the NLP model. It corresponds to the step “Certify NLP model accuracy: review by the NLP Global Team”.

Introduction

The review of the Pull Request that includes the NLP model, carried out by the NLP Global Team, consists of the processes explained in the following sections.

Knowing the processes and criteria used by the NLP Global Team can help Local Teams focus on the critical points.

Categories of errors and problems

Detected errors are classified into three categories:

BLOCK: Blocking task. It must be resolved before the PR can be approved and merged. When blockers must be fixed, the system dismisses the GitHub Pull Request and publishes a comment that describes the problem and the procedure to resolve it. This case requires re-training the NLP model.
NON-BLOCK: Mandatory but non-blocking task. It must be resolved following the guidelines and best practices, either in the current PR or in later PRs.
SUGG: Recommended but not mandatory modifications, which should also be taken into account in subsequent PRs. It is recommended to inform the NLP Global Team whether each suggestion is adopted or not.
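
The three categories can be modelled as a small data structure. The following is only a minimal sketch: the names `Category` and `can_merge` are hypothetical and do not correspond to any existing tool. It encodes the single rule stated above, that a PR cannot be merged while a BLOCK finding remains.

```python
from enum import Enum


class Category(Enum):
    BLOCK = "BLOCK"          # must be fixed before the PR can be approved and merged
    NON_BLOCK = "NON-BLOCK"  # mandatory, but may be fixed in this PR or a later one
    SUGG = "SUGG"            # recommended, not mandatory


def can_merge(findings):
    """Return True when no finding in the list is a blocker.

    `findings` is a list of (Category, description) pairs.
    """
    return all(category is not Category.BLOCK for category, _ in findings)


findings = [
    (Category.NON_BLOCK, "Alphabetic order missing"),
    (Category.SUGG, "New values for entities"),
]
print(can_merge(findings))  # True: nothing blocks the merge
findings.append((Category.BLOCK, "Ill-formed json files"))
print(can_merge(findings))  # False: a blocker is present
```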

The appropriate accuracy threshold for the NLP system depends on the use case. For a specific use case, the minimum accuracy should therefore be agreed between L-CDO and the NLP Global Team.

Best practices for the Pull Request validation

These best practices should be followed by the NLP Global Team and also by the local linguists, if they take part in the validation process.

Take into account the following icons, which indicate the status of a comment to reviewers:
- 👍 The reported problem has been seen and the fix will be included in a later commit.
- 👀 Linguists have read the comment but it is not resolved yet. In this case, linguists must add a short explanation of the status (for example, to be resolved later, disagreement, etc.).
Comments should be added from the corresponding file or from the general screen (Conversation). To resolve a comment, click Resolve conversation, or select Hide from the drop-down menu and then select the option Resolved.
If the comment cannot be marked as resolved, edit it and replace its text with “OK”.
In general, reviewers are in charge of changing the comment status to Resolved.
Comments should be as clear as possible and include screenshots or other references where helpful.
If the resolution of a comment is pending, the local developer must be informed, and it is recommended to change the status to still pending.
If the reviewer's answer to a comment is not clear, the local team can contact the reviewer.
If modifications affect several channels, the changes can be uploaded to one channel and then copied to the other channels.
Comparison of branches:
- When merging a large PR, it is recommended to compare the corresponding branches so that no information is lost. PyCharm can be useful for this purpose.
- The compare tool performs this comparison: right-click the folder or file, select Git and then Compare with branch, and click the branch to compare against.
- Files appear in different colors: existing files in blue, added files in green and deleted files in grey. Clicking a file opens a new window showing the differences between branches.
- It is also possible to compare branches and versions from GitHub: https://github.com/Telefonica/[REPO]/compare/
For the PR review, the use of regular expressions (regex) is recommended. Some examples are included below:
- Finding duplicates: `^(.*?)$\s+?^(?=.*^\1$)`
- No space after an entity: `\[ent\.[a-z_]+\][a-z]+`
- No space before an entity: `[a-z]+\[ent\.[a-z_]+\]`
- No extra spaces after values: \h+$
- Sentences missing the closing question mark: `"\¿[a-záéíóúñ _\[\]\.]+"`
- Sentences missing the opening question mark: `"[a-záéíóúñ _\[\]\.]+\?"`
The PR is reviewed by different members of the team, within an ongoing process.
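
The regex checks listed above can also be scripted. The sketch below uses Python's `re` module and rests on several assumptions: the brackets and the dot of an entity tag are meant literally and are therefore escaped, entity tags look like `[ent.name]`, and the sample lines are invented for illustration. Python's `re` does not support `\h`, so trailing spaces are matched with `[ \t]+$`, and duplicates are detected with a set, which is simpler in code than the multi-line regex.

```python
import re

# Assumed escaped forms of the patterns listed above.
CHECKS = {
    "no space after an entity": re.compile(r"\[ent\.[a-z_]+\][a-z]+"),
    "no space before an entity": re.compile(r"[a-z]+\[ent\.[a-z_]+\]"),
    # Python's `re` has no \h, so space and tab are listed explicitly.
    "trailing spaces": re.compile(r"[ \t]+$"),
}


def find_problems(lines):
    """Return (line_number, check_name) pairs for every check that matches."""
    problems = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                problems.append((number, name))
    return problems


def find_duplicates(lines):
    """Return lines that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for line in lines:
        if line in seen and line not in duplicates:
            duplicates.append(line)
        seen.add(line)
    return duplicates


# Hypothetical training phrases, one per line.
sample = [
    "call [ent.contact_name] now",
    "call [ent.contact_name]now",
    "call[ent.contact_name] now",
    "call [ent.contact_name] now ",
    "call [ent.contact_name] now",
]
print(find_problems(sample))
print(find_duplicates(sample))
```

Here lines 2, 3 and 4 are reported by one check each, and the first line is reported as a duplicate because it appears again as line 5.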

Most frequent comments in the review process

The following tables include some of the most frequent comments reported during the review of Pull Requests by the NLP Global Team, organized by category.

⚠️ Please take the categories in the following tables as indicative only: depending on the specific scenario and the use case specifications, a comment can move from one category (“block”, “non-block” or “sugg”) to another.

Review of CLU training and testset

The following best practices are valid for the CLU intent recognition stage.

Entities

Block	Non-block	Sugg
Ill-formed (incorrect name, missing ‘[‘, blank space missing before/after the entity; blank space before ‘:’ in the entity name)	Alphabetic order missing (by type and by value)	Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.)
Value declared in phraselist but not tagged in training set	“Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents	New values for entities
Values with an incorrect entity	Indentation	Suggestions on phrases for training and test set files
Repeated values in two entities		Suggestions on new entities
Repeated values for a specific entity		Suggestions on patterns for the test set file
Value tagged but not declared in a phraselist
Typographical errors (if not on purpose), missing words
Values representativeness: as far as possible, the training set must contain all the different values of entities. At least, it must include a representative list of them
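
Several of the blocking entity checks above are set comparisons between declared and tagged values. The following sketch illustrates them under stated assumptions: the function `check_entities` and its input shape (dictionaries that map an entity name to its values) are hypothetical, and parsing the real phraselist and training files is left out because their format is not described here.

```python
from collections import Counter


def check_entities(phraselists, tagged):
    """Compare declared phraselist values with values tagged in the training set.

    phraselists: dict mapping entity name -> list of declared values
    tagged:      dict mapping entity name -> set of values tagged in training phrases
    Returns a dict of finding name -> sorted list of offending items.
    """
    declared = {(e, v) for e, values in phraselists.items() for v in values}
    used = {(e, v) for e, values in tagged.items() for v in values}

    # Collect, for each value, the entities that declare it.
    owners = {}
    for entity, value in declared:
        owners.setdefault(value, set()).add(entity)

    # Values listed more than once within the same entity.
    repeated = [
        (entity, value)
        for entity, values in phraselists.items()
        for value, count in Counter(values).items()
        if count > 1
    ]

    return {
        "declared but not tagged": sorted(declared - used),
        "tagged but not declared": sorted(used - declared),
        "repeated in two entities": sorted(v for v, es in owners.items() if len(es) > 1),
        "repeated within an entity": sorted(repeated),
    }


# Invented example data.
phraselists = {
    "ent.color": ["red", "blue", "blue"],
    "ent.fruit": ["orange", "apple"],
    "ent.shade": ["orange"],
}
tagged = {
    "ent.color": {"red", "green"},
    "ent.fruit": {"orange", "apple"},
    "ent.shade": {"orange"},
}
for name, items in check_entities(phraselists, tagged).items():
    print(name, items)
```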

Intents

Block	Non-block	Sugg
Intent name not agreed by the Global Team	Alphabetic order missing (by type and by value)	Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.)
Not all intents represented in the training set and test set files	“Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents	New values for entities
Overlap between intents	Indentation	Suggestions on phrases for training and test set files
Phrases with out-of-scope intent		Suggestions on new entities
Typographical errors, missing words		Suggestions on patterns for the test set file
Repeated phrases
Illogical phrases
Ratio of 80%-20% between training and test statements not met
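
Some of the blocking intent checks above (intents missing from one of the files, repeated phrases, and the 80%-20% ratio) are easy to automate. The sketch below is only illustrative: `check_intents`, the `(phrase, intent)` input shape, the intent names and the `tolerance` around 80% are assumptions, not part of the documented process.

```python
from collections import Counter


def check_intents(training, test, tolerance=0.05):
    """training/test: lists of (phrase, intent) pairs. Returns a list of findings."""
    findings = []

    # Every intent should appear in both files.
    train_intents = {intent for _, intent in training}
    test_intents = {intent for _, intent in test}
    for intent in sorted(train_intents ^ test_intents):
        findings.append(f"intent not in both files: {intent}")

    # Repeated phrases within the training set.
    for phrase, count in Counter(p for p, _ in training).items():
        if count > 1:
            findings.append(f"repeated phrase: {phrase}")

    # The share of training statements should be close to 80%.
    total = len(training) + len(test)
    share = len(training) / total if total else 0.0
    if abs(share - 0.8) > tolerance:
        findings.append(f"training share is {share:.0%}, expected about 80%")

    return findings


# Invented example data.
training = [("hi", "intent.greet"), ("hi", "intent.greet")]
test = [("bye", "intent.bye")]
for finding in check_intents(training, test):
    print(finding)
```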

Files

Block	Non-block	Sugg
Ill-formed json files
Date not updated
Different information between channels (for shared intents)
Modifications to configuration files (except for agreed changes)
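
An ill-formed json file can be detected before the review with a standard JSON parser. The following is a minimal sketch that uses Python's `json` module; the helper name `json_error` is hypothetical.

```python
import json


def json_error(text):
    """Return None if `text` is well-formed JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None


print(json_error('{"phrase": "hello"}'))   # None
print(json_error('{"phrase": "hello",}'))  # message pointing at the trailing comma
```

To check a file, read its text first, for example with `pathlib.Path(path).read_text(encoding="utf-8")`, and pass the result to `json_error`.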

Review of E2E testset

Block	Non-block	Sugg
Ill-formed json files	“Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents	Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.)
Wrong position of entities	Lack of representativeness of the different structures	New values for entities
Incorrect tags	Alphabetic order missing (by domain, intent & utterance)	Suggestions on phrases for training and test set files
Not represented intent	“Default” domain	Suggestions on new entities
Wrong order for keys: phrase, domain, intent, entities		Suggestions on patterns for the test set file
Typographical errors (if not on purpose), missing words
Accuracy lower than 80% (default value set by the Aura Global Team)
Result validation: Review of results from the PR, identification of errors and improvement suggestions
Regression file: Bugs or specific phrases not included in the `testset.json` file that must be recognized
Canonical phrase not included in E2E testset
Recommended number of test statements in the E2E test set not met: 20 statements (CLU); 30 statements (CLU + Grammar); 3 statements (Grammar)
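
Three of the blocking E2E checks above (key order, minimum number of statements and the 80% accuracy threshold) can be sketched in code. Everything beyond the table is an assumption here: the E2E testset is taken to be a JSON array of objects, accuracy is measured on the intent only, and `check_e2e` and its `predict` argument are hypothetical stand-ins for the real evaluation.

```python
import json

EXPECTED_KEYS = ["phrase", "domain", "intent", "entities"]
MIN_ACCURACY = 0.80  # default threshold named in the table above
MIN_STATEMENTS = {"CLU": 20, "CLU + Grammar": 30, "Grammar": 3}


def check_e2e(text, kind, predict):
    """Check an E2E testset given as JSON text.

    kind:    one of the MIN_STATEMENTS keys
    predict: callable mapping a phrase to the predicted intent (hypothetical)
    """
    entries = json.loads(text)  # dicts keep the key order of the file
    findings = []

    # Keys must appear in the order phrase, domain, intent, entities.
    for index, entry in enumerate(entries):
        if list(entry) != EXPECTED_KEYS:
            findings.append(f"entry {index}: keys are {list(entry)}")

    # Recommended minimum number of statements for this kind of test set.
    if len(entries) < MIN_STATEMENTS[kind]:
        findings.append(
            f"only {len(entries)} statements, {MIN_STATEMENTS[kind]} recommended"
        )

    # Intent accuracy against the assumed default threshold.
    hits = sum(predict(e.get("phrase")) == e.get("intent") for e in entries)
    accuracy = hits / len(entries) if entries else 0.0
    if accuracy < MIN_ACCURACY:
        findings.append(f"accuracy {accuracy:.0%} is below {MIN_ACCURACY:.0%}")

    return findings


# Invented single-entry test set with the keys in the wrong order.
text = '[{"domain": "d", "phrase": "hi", "intent": "intent.greet", "entities": []}]'
for finding in check_e2e(text, "Grammar", lambda phrase: "intent.other"):
    print(finding)
```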

Last modified May 14, 2025: feat: Documentation improvement for Prince release #AURA-29163 [RTM] (c69c1272)