Review of a Pull Request by NLP Global Team
Procedure followed by the NLP Global Team to validate a Pull Request that includes the NLP model
This process starts once the Pull Request has been opened, so that the NLP Global Team can evaluate the NLP model. It corresponds to the step Certify NLP model accuracy: review by the NLP Global Team.
Introduction
The review of the Pull Request including the NLP model carried out by the NLP Global Team includes the processes explained in the following sections.
It can be very useful for Local Teams to know the processes and criteria used by the NLP Global Team, so that they can focus on the critical points.
Categories of errors and problems
Detected errors are classified into three categories:
-
BLOCK: Blocking task. It must be resolved before the PR can be approved and merged. When blockers are found, the system dismisses the GitHub Pull Request and publishes a comment describing the problem and the procedure to resolve it. This case requires re-training the NLP model.
-
NON-BLOCK: Mandatory but non-blocking task. It must be resolved following the guidelines and best practices, either in the current PR or in subsequent PRs.
-
SUGG: Recommended but not mandatory modifications, which should also be taken into account in subsequent PRs. It is recommended to inform the NLP Global Team whether each suggestion has been applied or not.
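The rule that only BLOCK findings prevent a merge can be expressed as a minimal Python sketch. The `Category` enum and the `can_be_merged` helper are hypothetical names used for illustration only; they are not part of any tooling described here.

```python
from enum import Enum


class Category(Enum):
    """The three categories used to classify review findings."""
    BLOCK = "block"
    NON_BLOCK = "non-block"
    SUGG = "sugg"


def can_be_merged(findings):
    """Return True when no unresolved finding is a blocker.

    `findings` is an iterable of (Category, description) pairs
    describing the comments that are still unresolved.
    """
    return all(category is not Category.BLOCK for category, _ in findings)
```

NON-BLOCK and SUGG findings do not stop the merge in this sketch, although NON-BLOCK findings remain mandatory and still have to be tracked.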
An adequate accuracy threshold for the NLP system depends on the use case. Therefore, the minimum accuracy for a specific use case should be agreed between L-CDO and the NLP Global Team.
Best practices for the Pull Request validation
These best practices should be followed both by the NLP Global Team and the local linguists, if they participate in the validation process.
-
Use the following icons to indicate the status of a comment to reviewers:
- 👍 The reported problem has been seen and the fix will be included in further commits.
- 👀 Linguists have gone over the comment but it is not resolved yet. In this situation, linguists must add an explanatory text justifying this status (for example, to be resolved later, disagreement, etc.).
-
Comments should be posted from the corresponding file or from the general screen (conversation). To resolve a comment, click Resolve conversation, or select Hide from the drop-down menu and then select the option Resolved.
-
If the comment cannot be resolved, it is edited and replaced with “OK”.
-
In general, reviewers are in charge of changing the comment status to Resolved.
-
Comments should be as clear as possible, including screenshots or other references where helpful.
-
If the resolution of a comment is pending, the local developer must be informed, and it is recommended to change the status to still pending.
-
If the reviewer's answer to a comment is not clearly understood, the local team can contact the reviewer.
-
If modifications affect several channels, changes can be uploaded to one channel and then copied to the other channels.
-
Comparison of branches:
- When merging a large PR, it is recommended to compare the corresponding branches to avoid losing information. PyCharm can be useful for this purpose.
- The compare tool performs this comparison: right-click the folder or file, select Git and then Compare with Branch, and click the branch to compare against.
- The files appear in different colors: existing files in blue, added files in green and deleted ones in grey. Clicking on a file opens a new window showing the differences between branches.
- It is also possible to compare branches and versions from GitHub: https://github.com/Telefonica/[REPO]/compare/
-
For the PR review, the use of regular expressions (regex) is recommended. Some examples are included below:
- Finding duplicates: `^(.*?)$\s+?^(?=.*^\1$)`
- No space after an entity: `\[ent\.[a-z_]+\][a-z]+`
- No space before an entity: `[a-z]+\[ent\.[a-z_]+\]`
- Extra spaces after values: `\h+$`
- Sentences missing the closing question mark: `"\¿[a-záéíóúñ _\[\].]+"`
- Sentences missing the opening question mark: `"[a-záéíóúñ _\[\].]+\?"`
-
The PR is reviewed by different members of the team, within an ongoing process.
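The regular expressions listed above can also be run outside an editor. The following is a minimal Python sketch, assuming that entities are written as `[ent.name]` inside training phrases (an assumption inferred from the patterns, not a documented format). The function names are hypothetical. Python's `re` module does not support `\h`, so the trailing-space pattern lists space and tab explicitly, and duplicates are found by counting lines rather than with a backreference.

```python
import re
from collections import Counter

# Assumed entity notation: [ent.some_name]; adjust if your files differ.
NO_SPACE_AFTER = re.compile(r"\[ent\.[a-z_]+\][a-z]+")
NO_SPACE_BEFORE = re.compile(r"[a-z]+\[ent\.[a-z_]+\]")
# Python's re has no \h, so space and tab are listed explicitly.
TRAILING_SPACE = re.compile(r"[ \t]+$")


def find_spacing_issues(lines):
    """Return (line_number, issue) tuples for the given phrases."""
    issues = []
    for number, line in enumerate(lines, start=1):
        if NO_SPACE_AFTER.search(line):
            issues.append((number, "no space after entity"))
        if NO_SPACE_BEFORE.search(line):
            issues.append((number, "no space before entity"))
        if TRAILING_SPACE.search(line):
            issues.append((number, "trailing whitespace"))
    return issues


def find_duplicates(lines):
    """Return the phrases that appear more than once, ignoring outer spaces."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(phrase for phrase, count in counts.items() if count > 1)
```

Both helpers take a list of strings, so they can be fed with `open(path, encoding="utf-8").read().splitlines()` for any file under review.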
Most frequent comments in the review process
The following tables include some of the most frequent comments reported by the NLP Global Team during the review of Pull Requests, organized by category.
⚠️ Please take the category assigned to each comment in the following tables as merely indicative. Depending on the specific scenario and the use case specifications, a comment can be moved from one category (“block”, “non-block” or “sugg”) to another.
Review of CLU training and testset
The following best practices are valid for the CLU intent recognition stage.
Entities
| Block | Non-block | Sugg |
|---|---|---|
| Ill-formed (incorrect name, missing ‘[‘, blank space missing before/after the entity; blank space before ‘:’ in the entity name) | Alphabetic order missing (by type and by value) | Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.) |
| Value declared in phraselist but not tagged in training set | “Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents | New values for entities |
| Values with an incorrect entity | Indentation | Suggestions on phrases for training and test set files |
| Repeated values in two entities | | Suggestions on new entities |
| Repeated values for a specific entity | | Suggestions on patterns for the test set file |
| Value tagged but not declared in a phraselist | | |
| Typographical errors (if not on purpose), missing words | | |
| Values representativeness: as far as possible, the training set must contain all the different values of entities. At least, it must include a representative list of them | | |
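The phraselist-related blockers in the table above reduce to set comparisons. The sketch below assumes that the declared phraselist values and the values tagged in the training set have already been extracted into dictionaries that map an entity name to a set of values. This data shape and the function name are hypothetical, since the file formats are not described here.

```python
def check_entity_values(phraselists, tagged):
    """Compare declared phraselist values with values tagged in the training set.

    Both arguments map an entity name to a set of values.
    """
    report = {
        "declared_not_tagged": {},
        "tagged_not_declared": {},
        "shared_between_entities": {},
    }
    for entity in sorted(set(phraselists) | set(tagged)):
        declared = phraselists.get(entity, set())
        used = tagged.get(entity, set())
        if declared - used:
            report["declared_not_tagged"][entity] = sorted(declared - used)
        if used - declared:
            report["tagged_not_declared"][entity] = sorted(used - declared)

    # Collect the entities that declare each value to spot repeated values.
    owners = {}
    for entity, values in phraselists.items():
        for value in values:
            owners.setdefault(value, set()).add(entity)
    report["shared_between_entities"] = {
        value: sorted(entities)
        for value, entities in owners.items()
        if len(entities) > 1
    }
    return report
```

An empty dictionary under every key means that none of these three blockers was detected.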
Intents
| Block | Non-block | Sugg |
|---|---|---|
| Intent name not agreed by the Global Team | Alphabetic order missing (by type and by value) | Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.) |
| Not all intents represented in the training set and test set files | “Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents | New values for entities |
| Overlap between intents | Indentation | Suggestions on phrases for training and test set files |
| Phrases with out-of-scope intent | | Suggestions on new entities |
| Typographical errors, missing words | | Suggestions on patterns for the test set file |
| Repeated phrases | | |
| Illogical phrases | | |
| 80%-20% ratio between training and test statements not met | | |
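The 80%-20% ratio between training and test statements can be checked with simple arithmetic. In the sketch below, the function names and the `tolerance` parameter are assumptions: the text does not state how strictly the ratio is enforced, so the tolerance should be agreed with the NLP Global Team.

```python
def split_ratio(train_count, test_count):
    """Return the (training, test) shares of the total number of statements."""
    total = train_count + test_count
    if total == 0:
        raise ValueError("no statements to evaluate")
    return train_count / total, test_count / total


def meets_80_20(train_count, test_count, tolerance=0.05):
    """Return True when the training share is within `tolerance` of 80%.

    The tolerance is an illustrative assumption, not a documented value.
    """
    train_share, _ = split_ratio(train_count, test_count)
    return abs(train_share - 0.80) <= tolerance
```

For example, 80 training statements and 20 test statements give shares of 0.8 and 0.2, which satisfies the check.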
Files
| Block | Non-block | Sugg |
|---|---|---|
| Ill-formed JSON files | | |
| Date not updated | | |
| Different information between channels (for shared intents) | | |
| Modification of configuration files (except for agreed changes) | | |
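Ill-formed JSON files can be detected before opening the PR. The following minimal sketch uses Python's standard `json` module. The function name and the idea of passing file contents as a dictionary are illustrative assumptions.

```python
import json


def find_ill_formed_json(documents):
    """Return {file_name: error message} for contents that are not valid JSON.

    `documents` maps a file name to the file content as a string.
    """
    errors = {}
    for name, text in documents.items():
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            errors[name] = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return errors
```

In practice, the dictionary could be built by reading each `.json` file of the channel with `pathlib.Path.read_text(encoding="utf-8")`.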
Review of E2E testset
| Block | Non-block | Sugg |
|---|---|---|
| Ill-formed JSON files | “Cosmetic changes”: uppercase letters, question marks, unnecessary blank spaces, accents | Structuring of training and test set files in blocks (for example, verbs, use cases, entities, etc.) |
| Wrong position of entities | Lack of representativeness of the different structures | New values for entities |
| Incorrect tags | Alphabetic order missing (by domain, intent & utterance) | Suggestions on phrases for training and test set files |
| Not represented intent | “Default” domain | Suggestions on new entities |
| Wrong order for keys: phrase, domain, intent, entities | | Suggestions on patterns for the test set file |
| Typographical errors (if not on purpose), missing words | | |
| Accuracy lower than 80% (default value set by the Aura Global Team) | | |
| Result validation: review of results from the PR, identification of errors and improvement suggestions | | |
| Regression file: bugs or specific phrases not included in the testset.json file that must be recognized | | |
| Canonical phrase not included in E2E testset | | |
| Recommended number of testing statements in the E2E test set not met: 20 statements (CLU); 30 statements (CLU + Grammar); 3 statements (Grammar) | | |
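Two of the blockers above, the key order and the default 80% accuracy threshold, lend themselves to an automated check. The sketch below assumes that the E2E test set is a JSON array of objects. That layout and the function names are hypothetical, since the actual structure of testset.json is not shown here.

```python
import json

# Key order required by the review: phrase, domain, intent, entities.
EXPECTED_KEYS = ["phrase", "domain", "intent", "entities"]


def check_key_order(text):
    """Return the indexes of entries whose keys are not in the expected order.

    Assumes `text` is a JSON array of objects; json.loads keeps the key
    order found in the file.
    """
    entries = json.loads(text)
    return [
        index
        for index, entry in enumerate(entries)
        if list(entry) != EXPECTED_KEYS
    ]


def accuracy_is_blocking(correct, total, threshold=0.80):
    """Return True when accuracy falls below the threshold (80% by default)."""
    if total == 0:
        raise ValueError("no statements were evaluated")
    return correct / total < threshold
```

The threshold is a parameter because the minimum accuracy can be agreed per use case.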