You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nlpcraft.apache.org by se...@apache.org on 2022/10/19 06:11:26 UTC
[incubator-nlpcraft-website] 02/02: WIP.
This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-513
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
commit e54bd225c0de70b42c9be7bd8d1998068c30622b
Author: skhdl <sk...@gmail.com>
AuthorDate: Wed Oct 19 10:11:13 2022 +0400
WIP.
---
examples/light_switch_ru.html | 468 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 468 insertions(+)
diff --git a/examples/light_switch_ru.html b/examples/light_switch_ru.html
new file mode 100644
index 0000000..80d8d44
--- /dev/null
+++ b/examples/light_switch_ru.html
@@ -0,0 +1,468 @@
+---
+active_crumb: Light Switch RU <code><sub>ex</sub></code>
+layout: documentation
+id: light_switch_ru
+fa_icon: fa-cube
+---
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<div class="col-md-8 second-column example">
+ <section id="overview">
+ <h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ This example provides a very simple Russian language implementation for NLI-powered light switch. You can say something like
+ "Выключи свет по всем доме" or "Включи свет в детской".
+ By modifying intent callbacks using, for example, HomeKit or Arduino-based controllers you can provide the
+ actual light switching.
+ </p>
+ <p>
+ <b>Complexity:</b> <span class="complexity-two-star"><i class="fas fa-square"></i> <i class="fas fa-square"></i> <i class="far fa-square"></i></span><br/>
+ <span class="ex-src">Source code: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/lightswitch_ru">GitHub <i class="fab fa-fw fa-github"></i></a><br/></span>
+ <span class="ex-review-all">Review: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">All Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span>
+ </p>
+ </section>
+ <section id="new_project">
+ <h2 class="section-title">Create New Project <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ You can create new Scala projects in many ways - we'll use SBT
+ to accomplish this task. Make sure that <code>build.sbt</code> file has the following content:
+ </p>
+ <pre class="brush: js, highlight: [8, 9, 10]">
+ ThisBuild / version := "0.1.0-SNAPSHOT"
+ ThisBuild / scalaVersion := "3.1.3"
+ lazy val root = (project in file("."))
+ .settings(
+ name := "NLPCraft LightSwitch FR Example",
+ version := "{{site.latest_version}}",
+ libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "{{site.latest_version}}",
+ libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2",
+ libraryDependencies += "org.languagetool" % "languagetool-core" % "5.8",
+ libraryDependencies += "org.languagetool" % "language-ru" % "5.8"
+ libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.14" % "test"
+ )
+ </pre>
+
+ <p>
+ On <code>lines 8, 9 and 10</code> added libraries, which used for support base NLP operations with Russian language.
+ </p>
+
+ <p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p>
+ <p>Create the following files so that resulting project structure would look like the following:</p>
+ <ul>
+ <li><code>lightswitch_model_ru.yaml</code> - YAML configuration file, which contains model description.</li>
+ <li><code>LightSwitchRuModel.scala</code> - Scala class, model implementation.</li>
+ <li><code>NCRuSemanticEntityParser.scala</code> - Scala class, semantic entity parser, custom implementation for Russian language.</li>
+ <li><code>NCRuLemmaPosTokenEnricher.scala</code> - Scala class, lemma and point of speech token enricher, custom implementation for Russian language.</li>
+ <li><code>NCRuStopWordsTokenEnricher.scala</code> - Scala class, stop-words token enricher, custom implementation for Russian language.</li>
+ <li><code>NCRuTokenParser.scala</code> - Scala class, token parser, custom implementation for Russian language.</li>
+ <li><code>LightSwitchRuModelSpec.scala</code> - Scala tests class, which allows to test your model.</li>
+ </ul>
+ <pre class="brush: plain, highlight: [7, 10, 14, 17, 18, 20, 24]">
+ | build.sbt
+ +--project
+ | build.properties
+ \--src
+ +--main
+ | +--resources
+ | | lightswitch_model_ru.yaml
+ | \--scala
+ | \--demo
+ | | LightSwitchRuModel.scala
+ | \--nlp
+ | +--entity
+ | | \--parser
+ | | NCRuSemanticEntityParser.scala
+ | \--token
+ | +--enricher
+ | | NCRuLemmaPosTokenEnricher.scala
+ | | NCRuStopWordsTokenEnricher.scala
+ | \--parser
+ | NCRuTokenParser.scala
+ \--test
+ \--scala
+ \--demo
+ LightSwitchRuModelSpec.scala
+ </pre>
+ </section>
+ <section id="model">
+ <h2 class="section-title">Data Model<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ We are going to start with declaring the static part of our model using YAML which we will later load using
+ <code>NCModelAdapter</code> in our Scala-based model implementation.
+ Open <code>src/main/resources/<b>light_switch_ru.yaml</b></code>
+ file and replace its content with the following YAML:
+ </p>
+ <pre class="brush: js, highlight: [1, 8, 13, 21]">
+ macros:
+ "<TURN_ON>" : "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}"
+ "<TURN_OFF>" : "{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}"
+ "<ENTIRE_OPT>" : "{весь|все|всё|повсюду|вокруг|полностью|везде|_}"
+ "<LIGHT_OPT>" : "{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}"
+
+ elements:
+ - id: "ls:loc"
+ description: "Location of lights."
+ synonyms:
+ - "<ENTIRE_OPT> {здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная} комната}"
+
+ - id: "ls:on"
+ groups:
+ - "act"
+ description: "Light switch ON action."
+ synonyms:
+ - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_ON>"
+ - "<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>"
+
+ - id: "ls:off"
+ groups:
+ - "act"
+ description: "Light switch OFF action."
+ synonyms:
+ - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_OFF>"
+ - "<TURN_OFF> <ENTIRE_OPT> <LIGHT_OPT>"
+ - "без <ENTIRE_OPT> <LIGHT_OPT>"
+ </pre>
+ <p>There are number of important points here:</p>
+ <ul>
+ <li>
+ <code>Line 1</code> defines several macros that are used later on throughout the model's elements
+ to shorten the synonym declarations. Note how macros coupled with option groups
+ shorten overall synonym declarations 1000:1 vs. manually listing all possible word permutations.
+ </li>
+ <li>
+ <code>Lines 8, 13, 21</code> define three model elements: the location of the light, and actions to turn
+ the light on and off. Action elements belong to the same group <code>act</code> which
+ will be used in our intent, defined in <code>LightSwitchRuModel</code> class. Note that these model
+ elements are defined mostly through macros we have defined above.
+
+ </li>
+ </ul>
+ <div class="bq info">
+ <p><b>YAML vs. API</b></p>
+ <p>
+ As usual, this YAML-based static model definition is convenient but totally optional. All elements definitions
+ can be provided programmatically inside Scala model <code>LightSwitchRuModel</code> class as well.
+ </p>
+ </div>
+ </section>
+ <section id="code">
+ <h2 class="section-title">Model Class <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ Open <code>src/main/scala/demo/<b>LightSwitchRuModel.scala</b></code> file and replace its content with the following code:
+ </p>
+ <pre class="brush: scala, highlight: [11, 12, 13, 20, 21, 24, 25, 32]">
+ package demo
+
+ import com.google.gson.Gson
+ import org.apache.nlpcraft.*
+ import org.apache.nlpcraft.annotations.*
+ import demo.nlp.entity.parser.NCRuSemanticEntityParser
+ import demo.lightswitch.nlp.token.enricher.*
+ import demo.lightswitch.nlp.token.parser.NCRuTokenParser
+ import scala.jdk.CollectionConverters.*
+
+ class LightSwitchRuModel extends NCModelAdapter(
+ NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch Example Model RU", "1.0"),
+ new NCPipelineBuilder().
+ withTokenParser(new NCRuTokenParser()).
+ withTokenEnricher(new NCRuLemmaPosTokenEnricher()).
+ withTokenEnricher(new NCRuStopWordsTokenEnricher()).
+ withEntityParser(new NCRuSemanticEntityParser("lightswitch_model_ru.yaml")).
+ build
+ ):
+ @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
+ def onMatch(
+ ctx: NCContext,
+ im: NCIntentMatch,
+ @NCIntentTerm("act") actEnt: NCEntity,
+ @NCIntentTerm("loc") locEnts: List[NCEntity]
+ ): NCResult =
+ val action = if actEnt.getId == "ls:on" then "включить" else "выключить"
+ val locations = if locEnts.isEmpty then "весь дом" else locEnts.map(_.mkText).mkString(", ")
+
+ // Add HomeKit, Arduino or other integration here.
+ // By default - just return a descriptive action string.
+ NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava))
+ </pre>
+ <p>
+ The intent callback logic is very simple - we return a descriptive confirmation message
+ back (explaining what lights were changed). With action and location detected, you can add
+ the actual light switching using HomeKit or Arduino devices. Let's review this implementation step by step:
+ </p>
+ <ul>
+ <li>
+ On <code>line 11</code> our class extends <code>NCModelAdapter</code> that allows us to pass
+ prepared configuration and pipeline into model.
+ </li>
+ <li>
+ On <code>line 12</code> created model configuration with most default parameters.
+ </li>
+ <li>
+ On <code>line 13</code> created pipeline, based on custom Russian language components, which are described below.
+ <ul>
+ <li><code>NCRuTokenParser</code>. Token parser.</li>
+ <li><code>NCRuLemmaPosTokenEnricher</code>. Lemma and point of speech token enricher.</li>
+ <li><code>NCRuStopWordsTokenEnricher</code>. Stop-words token enricher.</li>
+ <li><code>NCRuSemanticEntityParser</code>. Semantic entity parser extending.</li>
+ </ul>
+ Note that <code>NCRuSemanticEntityParser</code> is based on semantic model definition,
+ described in <code>lightswitch_model_ru.yaml</code> file.
+ </li>
+ <li>
+ <code>Lines 20 and 21</code> annotates intents <code>ls</code> and its callback method <code>onMatch</code>.
+ Intent <code>ls</code> requires one action (a token belonging to the group act) and optional list of light locations
+ (zero or more tokens with ID ls:loc) - by default we assume the entire house as a default location.
+ </li>
+ <li>
+ <code>Lines 24 and 25</code> map terms from detected intent to the formal method parameters of the
+ <code>onMatch</code> method.
+ </li>
+ <li>
+ On the <code>line 32</code> the intent callback simply returns a confirmation message.
+ </li>
+ </ul>
+
+ <p>
+ Lets review each custom pipeline components.
+ </p>
+
+ <p>
+ Open <code>src/main/scala/demo/nlp/token/parser/<b>NCRuTokenParser.scala</b></code> file and replace its content with the following code:
+ </p>
+
+ <pre class="brush: scala, highlight: [19]">
+ package demo.nlp.token.parser
+
+ import org.apache.nlpcraft.*
+ import org.languagetool.tokenizers.WordTokenizer
+ import scala.jdk.CollectionConverters.*
+
+ class NCRuTokenParser extends NCTokenParser:
+ private val tokenizer = new WordTokenizer
+
+ override def tokenize(text: String): List[NCToken] =
+ val toks = collection.mutable.ArrayBuffer.empty[NCToken]
+ var sumLen = 0
+
+ for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex)
+ val start = sumLen
+ val end = sumLen + word.length
+
+ if word.strip.nonEmpty then
+ toks += new NCPropertyMapAdapter with NCToken:
+ override def getText: String = word
+ override def getIndex: Int = idx
+ override def getStartCharIndex: Int = start
+ override def getEndCharIndex: Int = end
+
+ sumLen = end
+
+ toks.toList
+ </pre>
+ <p>
+ <code>NCRuTokenParser</code> is simple wrapper, which implements <code>NCTokenParser</code> based on
+ open source solution <a href="https://languagetool.org">Language Tool</a> solution.
+ On <code>line 19</code> <code>NCToken</code> instances created.
+ </p>
+
+ <p>
+ Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuLemmaPosTokenEnricher.scala</b></code> file and replace its content with the following code:
+ </p>
+ <pre class="brush: scala, highlight: [27, 28]">
+ package demo.nlp.token.enricher
+
+ import org.apache.nlpcraft.*
+ import org.languagetool.AnalyzedToken
+ import org.languagetool.tagging.ru.RussianTagger
+ import scala.jdk.CollectionConverters.*
+
+ class NCRuLemmaPosTokenEnricher extends NCTokenEnricher:
+ private def nvl(v: String, dflt : => String): String = if v != null then v else dflt
+
+ override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
+ val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala
+
+ require(toks.size == tags.size)
+
+ toks.zip(tags).foreach { case (tok, tag) =>
+ val readings = tag.getReadings.asScala
+
+ val (lemma, pos) = readings.size match
+ // No data. Lemma is word as is, POS is undefined.
+ case 0 => (tok.getText, "")
+ // Takes first. Other variants ignored.
+ case _ =>
+ val aTok: AnalyzedToken = readings.head
+ (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, ""))
+
+ tok.put("pos", pos)
+ tok.put("lemma", lemma)
+
+ () // Otherwise NPE.
+ }
+ </pre>
+ <p>
+ <code>NCRuLemmaPosTokenEnricher</code> lemma and point of speech tokens enricher, based on
+ open source solution <a href="https://languagetool.org">Language Tool</a> solution.
+ On <code>line 27 and 28</code> tokens are enriched by <code>pos</code> and <code>lemma</code> data.
+ </p>
+
+ <p>
+ Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuStopWordsTokenEnricher.scala</b></code> file and replace its content with the following code:
+ </p>
+
+ <pre class="brush: scala, highlight: [17]">
+ package demo.nlp.token.enricher
+
+ import org.apache.lucene.analysis.ru.RussianAnalyzer
+ import org.apache.nlpcraft.*
+
+ class NCRuStopWordsTokenEnricher extends NCTokenEnricher:
+ private val stops = RussianAnalyzer.getDefaultStopSet
+
+ private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token."))
+ private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token."))
+
+ override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
+ for (t <- toks)
+ val lemma = getLemma(t)
+ lazy val pos = getPos(t)
+
+ t.put(
+ "stopword",
+ lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) ||
+ stops.contains(lemma.toLowerCase) ||
+ pos.startsWith("PARTICLE") ||
+ pos.startsWith("INTERJECTION") ||
+ pos.startsWith("PREP")
+ )
+ </pre>
+ <p>
+ <code>NCRuStopWordsTokenEnricher</code> stop-words tokens enricher, based on
+ open source solution <a href="https://lucene.apache.org/">Apache Lucene</a> solution.
+ On <code>line 17</code> tokens are enriched by <code>stopword</code> flags data.
+ </p>
+
+ <p>
+ Open <code>src/main/scala/demo/nlp/entity/parser/<b>NCRuSemanticEntityParser.scala</b></code> file and replace its content with the following code:
+ </p>
+
+ <pre class="brush: scala, highlight: [8, 12]">
+ package demo.nlp.entity.parser
+
+ import opennlp.tools.stemmer.snowball.SnowballStemmer
+ import demo.nlp.token.parser.NCRuTokenParser
+ import org.apache.nlpcraft.nlp.parsers.*
+
+ class NCRuSemanticEntityParser(src: String) extends NCSemanticEntityParser(
+ new NCSemanticStemmer:
+ private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN)
+ override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
+ ,
+ new NCRuTokenParser(),
+ mdlSrcOpt = Option(src)
+ )
+ </pre>
+ <p>
+ <code>NCRuSemanticEntityParser</code> extends <code>NCSemanticEntityParser</code>
+ It uses stemmer implementation from <a href="https://opennlp.apache.org/">Apache OpenNLP</a> solution
+ on <code>line 8</code> and already described <code>NCRuTokenParser</code> token parser implementation on <code>line 12</code>.
+ </p>
+ </section>
+
+ <section id="testing">
+ <h2 class="section-title">Testing <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ The test defined in <code>LightSwitchRuModelSpec</code> allows to check that all input test sentences are
+ processed correctly and trigger the expected intent <code>ls</code>:
+ </p>
+ <pre class="brush: scala, highlight: [9, 11]">
+ package demo
+
+ import org.apache.nlpcraft.*
+ import org.scalatest.funsuite.AnyFunSuite
+ import scala.util.Using
+
+ class LightSwitchRuModelSpec extends AnyFunSuite:
+ test("test") {
+ Using.resource(new NCModelClient(new LightSwitchRuModel)) { client =>
+ def check(txt: String): Unit =
+ require(client.debugAsk(txt, "userId", true).getIntentId == "ls")
+
+ check("Выключи свет по всем доме")
+ check("Выруби электричество!")
+ check("Включи свет в детской")
+ check("Включай повсюду освещение")
+ check("Включайте лампы в детской комнате")
+ check("Свет на кухне, пожалуйста, приглуши")
+ check("Нельзя ли повсюду выключить свет?")
+ check("Пожалуйста без света")
+ check("Отключи электричество в ванной")
+ check("Выключи, пожалуйста, тут всюду свет")
+ check("Выключай все!")
+ check("Свет пожалуйста везде включи")
+ check("Зажги лампу на кухне")
+ }
+ }
+ </pre>
+ <ul>
+ <li>
+ On <code>line 9</code> the client for our model is created.
+ </li>
+ <li>
+ On <code>line 11</code> a special method <code>debugAsk</code> is called.
+ It allows to check the winning intent and its callback parameters without actually
+ calling the intent.
+ </li>
+ <li>
+ <code>Lines 13-25</code> define all the test input sentences that should all
+ trigger <code>ls</code> intent.
+ </li>
+ </ul>
+ <p>
+ You can run this test via SBT task <code>executeTests</code> or using IDE.
+ </p>
+ <pre class="brush: scala, highlight: []">
+ PS C:\apache\incubator-nlpcraft-examples\lightswitch_ru> sbt executeTests
+ </pre>
+ </section>
+ <section>
+ <h2 class="section-title">Done! 👌 <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ You've created light switch data model and tested it.
+ </p>
+ </section>
+</div>
+<div class="col-md-2 third-column">
+ <ul class="side-nav">
+ <li class="side-nav-title">On This Page</li>
+ <li><a href="#overview">Overview</a></li>
+ <li><a href="#new_project">New Project</a></li>
+ <li><a href="#model">Data Model</a></li>
+ <li><a href="#code">Model Class</a></li>
+ <li><a href="#testing">Testing</a></li>
+ {% include quick-links.html %}
+ </ul>
+</div>
+
+
+
+
+
+