Niseko normalization pipeline

Description and stages

Niseko is a pipeline that normalizes the user’s utterance by running the following normalizers:

  • PunctuationNormalizer
  • SplitPunctNormalizer
  • SpaceNormalizer
  • CurrencyNormalizer
  • UnicodeNormalizer
  • LowercaseNormalizer
  • CardinalityNormalizer
  • PunctuationNormalizer
  • SpaceNormalizer
  • StopWordsFromFileNormalizer
  • WordReplacerFromFileNormalizer
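The chaining idea behind a normalization pipeline can be sketched as follows. This is a minimal illustration, not the auracog implementation: it assumes the normalizers run in the order listed, each one receiving the output of the previous one. The names SketchPipeline, lowercase and collapse_spaces are hypothetical stand-ins, and the behavior of the real normalizers is not described in this document.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins: the real auracog normalizers are not shown in this
# document, so these two functions only illustrate the chaining idea.
Normalizer = Callable[[str], str]


def lowercase(text: str) -> str:
    """Lowercase the whole utterance."""
    return text.lower()


def collapse_spaces(text: str) -> str:
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


class SketchPipeline:
    """Applies each normalizer to the output of the previous one."""

    def __init__(self, normalizers: List[Normalizer]) -> None:
        self.normalizers = list(normalizers)

    def normalize(self, utterance: str) -> str:
        for normalizer in self.normalizers:
            utterance = normalizer(utterance)
        return utterance


pipeline = SketchPipeline([lowercase, collapse_spaces])
result = pipeline.normalize("  Quiero   VER una Película ")
print(result)
```

Under that assumption, a normalizer that appears twice in the list (such as SpaceNormalizer) would simply be applied again to clean up whatever the intermediate steps introduced.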

Configuration

This pipeline requires the following configuration in the nlp.json configuration file:

Under the entry for the target language and channel (es-es and mp in the example below), the nlp field must set the key normalizer_pipeline_class to the value: auracog_pipelines.pipelines.normalization.niseko.NisekoPipeline

{
  "es-es": {
    "mp": {
      "nlp": {
        "normalizer_pipeline_class": "auracog_pipelines.pipelines.normalization.niseko.NisekoPipeline"
      }
    }
  }
}
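A dotted class path like the one above is typically resolved at runtime with a dynamic import. The sketch below shows one way this could work. It is an assumption, not the documented behavior of the framework: load_pipeline_class is a hypothetical helper, and the demo uses the standard-library class collections.OrderedDict as a placeholder value so that it runs without auracog_pipelines installed.

```python
import importlib
import json


def load_pipeline_class(config: dict, language: str, channel: str):
    """Resolve normalizer_pipeline_class for a language and channel.

    Hypothetical helper: how the framework itself reads nlp.json is not
    described in this document.
    """
    dotted_path = config[language][channel]["nlp"]["normalizer_pipeline_class"]
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Same shape as nlp.json, with a stdlib class as a placeholder value.
demo_config = json.loads(
    '{"es-es": {"mp": {"nlp": '
    '{"normalizer_pipeline_class": "collections.OrderedDict"}}}}'
)
demo_class = load_pipeline_class(demo_config, "es-es", "mp")
print(demo_class.__name__)
```

With the real configuration, the same lookup would receive the NisekoPipeline path. A missing language, channel or key raises KeyError here, which makes a misconfigured nlp.json easy to spot.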