Norikura normalization pipeline

Description and stages

Norikura is a pipeline that normalizes the user’s utterance by running the following normalizers:

  • PunctuationNormalizer
  • SplitPunctNormalizer
  • SpaceNormalizer
  • CurrencyNormalizer
  • UnicodeNormalizer
  • LowercaseNormalizer
  • StopWordsFromFileNormalizer
  • WordReplacerFromFileNormalizer
[Figure: diagram of the Norikura pipeline showing the normalizers listed above.]
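As a rough illustration of how a chain of normalizers transforms an utterance, here is a minimal sketch. The functions below are hypothetical toy stand-ins loosely named after some of the normalizers listed above. They are not the auracog_pipelines implementations, and both the behavior and the execution order of the real classes may differ.

```python
import re
import unicodedata
from typing import Callable, Iterable

# A normalizer is modeled here as a plain function from text to text
# (an assumption for this sketch, not the auracog_pipelines interface).
Normalizer = Callable[[str], str]


def split_punct(text: str) -> str:
    """Toy stand-in: surround common punctuation marks with spaces."""
    return re.sub(r"([.,;:!?])", r" \1 ", text)


def collapse_spaces(text: str) -> str:
    """Toy stand-in: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def unicode_nfc(text: str) -> str:
    """Toy stand-in: apply Unicode NFC composition."""
    return unicodedata.normalize("NFC", text)


def lowercase(text: str) -> str:
    """Toy stand-in: lowercase the whole utterance."""
    return text.lower()


def run_pipeline(text: str, normalizers: Iterable[Normalizer]) -> str:
    """Apply each normalizer in turn, feeding its output to the next one."""
    for normalize in normalizers:
        text = normalize(text)
    return text


# Hypothetical ordering chosen only for this example.
toy_pipeline = [split_punct, collapse_spaces, unicode_nfc, lowercase]
result = run_pipeline("  Hello,   WORLD!", toy_pipeline)
print(result)
```

Modeling each stage as a text-to-text function keeps the stages independent, so in a design like this one a stage can be added, removed, or reordered without touching the others.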

Configuration

This pipeline is enabled through the nlp.json configuration file.

For the target language and channel (es-es and mp in the example below), set the key normalizer_pipeline_class inside the nlp field to the value: auracog_pipelines.pipelines.normalization.norikura.NorikuraPipeline

{
  "es-es": {
    "mp": {
      "nlp": {
        "normalizer_pipeline_class": "auracog_pipelines.pipelines.normalization.norikura.NorikuraPipeline"
      }
    }
  }
}
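To make the role of this key more concrete, the following sketch shows one way a dotted class path such as the one above could be read from the configuration and resolved to a Python class. The helper names get_pipeline_class_path and load_class are hypothetical and are not taken from auracog_pipelines; the actual loading mechanism is not described here and may work differently.

```python
import importlib
import json

# Same structure as the nlp.json example above, inlined for the sketch.
CONFIG_TEXT = """
{
  "es-es": {
    "mp": {
      "nlp": {
        "normalizer_pipeline_class": "auracog_pipelines.pipelines.normalization.norikura.NorikuraPipeline"
      }
    }
  }
}
"""


def get_pipeline_class_path(config: dict, language: str, channel: str) -> str:
    """Return the configured dotted class path for a language and channel."""
    return config[language][channel]["nlp"]["normalizer_pipeline_class"]


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


config = json.loads(CONFIG_TEXT)
class_path = get_pipeline_class_path(config, "es-es", "mp")
print(class_path)

# load_class(class_path) would only succeed where auracog_pipelines is
# installed, so a standard-library class demonstrates the resolution step.
decoder_class = load_class("json.JSONDecoder")
```

A missing language, channel, or key raises KeyError in this sketch, which surfaces a misconfigured nlp.json early rather than falling back silently.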