Posted to commits@nlpcraft.apache.org by ar...@apache.org on 2023/03/01 20:39:40 UTC
[incubator-nlpcraft-website] branch master updated: Fixes.
This is an automated email from the ASF dual-hosted git repository.
aradzinski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/master by this push:
new d9f6b3a Fixes.
d9f6b3a is described below
commit d9f6b3a8d85b22a67c7ec54df37b690471ed916e
Author: Aaron Radzinski <ar...@datalingvo.com>
AuthorDate: Wed Mar 1 12:39:34 2023 -0800
Fixes.
---
data-model.html | 2997 -------------------------------------------------------
docs.html | 3 -
2 files changed, 3000 deletions(-)
diff --git a/data-model.html b/data-model.html
deleted file mode 100644
index 3cf569a..0000000
--- a/data-model.html
+++ /dev/null
@@ -1,2997 +0,0 @@
----
-active_crumb: Data Model
-layout: documentation
-id: data_model
----
-
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-<div class="col-md-8 second-column">
- <section id="overview">
- <h2 class="section-title">Model Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- The data model is a central concept in NLPCraft: it defines the natural language interface to your data sources,
- such as a database or a SaaS application.
- NLPCraft employs a <em>model-as-a-code</em> approach where the entire data model is an implementation of the
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface, which
- can be developed using any JVM programming language like Java, Scala, Kotlin, or Groovy.
- </p>
- <p>
- A data model defines:
- </p>
- <ul>
- <li>Set of model <a href="#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li>
- <li>Zero or more intents and their callbacks.</li>
- <li>Common model configuration and various life-cycle callbacks.</li>
- </ul>
- <p>
- Note that model-as-a-code approach natively supports any software life
- cycle tools and frameworks like various build tools, CI/SCM tools, IDEs, etc.
- You don't have to use additional web-based tools to manage some aspects of your
- data models - your entire model and all of its components are part of your project source code.
- </p>
- <p>
- Here are two quick examples of fully functional data model implementations (from the <a href="/examples/light_switch.html">Light Switch</a> and
- <a href="/examples/alarm_clock.html">Alarm Clock</a> examples). You will find specific details about these
- implementations in the following sections:
- </p>
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch" role="tab"><b>LightSwitch <code><sub>ex</sub></code></b></a>
- <a class="nav-item nav-link" data-toggle="tab" href="#alarm" role="tab"><b>Alarm <code><sub>ex</sub></code></b></a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="lightswitch" role="tabpanel">
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch_scala_model" role="tab"><code>LightSwitchModel.scala</code></a>
- <a class="nav-item nav-link" data-toggle="tab" href="#lightswitch_yaml_model" role="tab"><code>lightswitch_model.yaml</code></a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="lightswitch_scala_model" role="tabpanel">
- <pre class="brush: scala">
-package org.apache.nlpcraft.examples.lightswitch
-
-import org.apache.nlpcraft.model.{NCIntentTerm, _}
-
-class LightSwitchModel extends NCModelFileAdapter("lightswitch_model.yaml") {
- @NCIntentRef("ls")
- @NCIntentSample(Array(
- "Turn the lights off in the entire house.",
- "Switch on the illumination in the master bedroom closet.",
- "Get the lights on.",
- "Lights up in the kitchen.",
- "Please, put the light out in the upstairs bedroom.",
- "Set the lights on in the entire house.",
- "Turn the lights off in the guest bedroom.",
- "Could you please switch off all the lights?",
- "Dial off illumination on the 2nd floor.",
- "Please, no lights!",
- "Kill off all the lights now!",
- "No lights in the bedroom, please.",
- "Light up the garage, please!"
- ))
- def onMatch(
- @NCIntentTerm("act") actTok: NCToken,
- @NCIntentTerm("loc") locToks: List[NCToken]
- ): NCResult = {
- val status = if (actTok.getId == "ls:on") "on" else "off"
- val locations =
- if (locToks.isEmpty)
- "entire house"
- else
- locToks.map(_.meta[String]("nlpcraft:nlp:origtext")).mkString(", ")
-
- // Add HomeKit, Arduino or other integration here.
-
- // By default - return a descriptive action string.
- NCResult.text(s"Lights are [$status] in [${locations.toLowerCase}].")
- }
-}
- </pre>
- </div>
- <div class="tab-pane fade show" id="lightswitch_yaml_model" role="tabpanel">
- <pre class="brush: js">
-id: "nlpcraft.lightswitch.ex"
-name: "Light Switch Example Model"
-version: "1.0"
-description: "NLI-powered light switch example model."
-macros:
- - name: "<ACTION>"
- macro: "{turn|switch|dial|let|set|get|put}"
- - name: "<KILL>"
- macro: "{shut|kill|stop|eliminate}"
- - name: "<ENTIRE_OPT>"
- macro: "{entire|full|whole|total|_}"
- - name: "<FLOOR_OPT>"
- macro: "{upstairs|downstairs|{1st|first|2nd|second|3rd|third|4th|fourth|5th|fifth|top|ground} floor|_}"
- - name: "<TYPE>"
- macro: "{room|closet|attic|loft|{store|storage} {room|_}}"
- - name: "<LIGHT>"
- macro: "{all|_} {it|them|light|illumination|lamp|lamplight}"
-enabledBuiltInTokens: [] # This example doesn't use any built-in tokens.
-
-#
-# Allows for multi-word synonyms in this entire model
-# to be sparse and permutate them for better detection.
-# These two properties generally enable a free-form
-# natural language comprehension.
-#
-permutateSynonyms: true
-sparse: true
-
-elements:
- - id: "ls:loc"
- description: "Location of lights."
- synonyms:
- - "<ENTIRE_OPT> <FLOOR_OPT> {kitchen|library|closet|garage|office|playroom|{dinning|laundry|play} <TYPE>}"
- - "<ENTIRE_OPT> <FLOOR_OPT> {master|kid|children|child|guest|_} {bedroom|bathroom|washroom|storage} {<TYPE>|_}"
- - "<ENTIRE_OPT> {house|home|building|{1st|first} floor|{2nd|second} floor}"
-
- - id: "ls:on"
- groups:
- - "act"
- description: "Light switch ON action."
- synonyms:
- - "<ACTION> {on|up|_} <LIGHT> {on|up|_}"
- - "<LIGHT> {on|up}"
-
- - id: "ls:off"
- groups:
- - "act"
- description: "Light switch OFF action."
- synonyms:
- - "<ACTION> <LIGHT> {off|out|down}"
- - "{<ACTION>|<KILL>} {off|out|down} <LIGHT>"
- - "<KILL> <LIGHT>"
- - "<LIGHT> <KILL>"
- - "{out|no|off|down} <LIGHT>"
- - "<LIGHT> {out|off|down}"
-
-intents:
- - "intent=ls term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*"
- </pre>
- </div>
- </div>
- </div>
- <div class="tab-pane fade show" id="alarm" role="tabpanel">
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab" href="#alarm_java_model" role="tab"><code>AlarmModel.java</code></a>
- <a class="nav-item nav-link" data-toggle="tab" href="#alarm_intents_idl" role="tab"><code>intents.idl</code></a>
- <a class="nav-item nav-link" data-toggle="tab" href="#alarm_json_model" role="tab"><code>alarm_model.json</code></a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="alarm_java_model" role="tabpanel">
- <pre class="brush: java">
-package org.apache.nlpcraft.examples.alarm;
-
-import org.apache.nlpcraft.model.*;
-
-import java.time.*;
-import java.time.format.DateTimeFormatter;
-import java.util.*;
-
-import static java.time.temporal.ChronoUnit.MILLIS;
-
-public class AlarmModel extends NCModelFileAdapter {
- private static final DateTimeFormatter FMT =
- DateTimeFormatter.ofPattern("HH'h' mm'm' ss's'").withZone(ZoneId.systemDefault());
-
- private final Timer timer = new Timer();
-
- public AlarmModel() {
- // Loading the model from the file.
- super("alarm_model.json");
- }
-
- @NCIntentRef("alarm") // Intent is defined in JSON model file (alarm_model.json and intents.idl).
- @NCIntentSampleRef("alarm_samples.txt") // Samples supplied in an external file.
- NCResult onMatch(
- NCIntentMatch ctx,
- @NCIntentTerm("nums") List<NCToken> numToks
- ) {
- long ms = calculateTime(numToks);
-
- assert ms >= 0;
-
- timer.schedule(
- new TimerTask() {
- @Override
- public void run() {
- System.out.println(
- "BEEP BEEP BEEP for: " + ctx.getContext().getRequest().getNormalizedText() + ""
- );
- }
- },
- ms
- );
-
- return NCResult.text("Timer set for: " + FMT.format(LocalDateTime.now().plus(ms, MILLIS)));
- }
-
- @Override
- public void onDiscard() {
- // Clean up when model gets discarded (e.g. during testing).
- timer.cancel();
- }
-
- public static long calculateTime(List<NCToken> numToks) {
- LocalDateTime now = LocalDateTime.now();
- LocalDateTime dt = now;
-
- for (NCToken num : numToks) {
- String unit = num.meta("nlpcraft:num:unit");
-
- // Skip possible fractional to simplify.
- long v = ((Double)num.meta("nlpcraft:num:from")).longValue();
-
- if (v <= 0)
- throw new NCRejection("Value must be positive: " + unit);
-
- switch (unit) {
- case "second": { dt = dt.plusSeconds(v); break; }
- case "minute": { dt = dt.plusMinutes(v); break; }
- case "hour": { dt = dt.plusHours(v); break; }
- case "day": { dt = dt.plusDays(v); break; }
- case "week": { dt = dt.plusWeeks(v); break; }
- case "month": { dt = dt.plusMonths(v); break; }
- case "year": { dt = dt.plusYears(v); break; }
-
- default:
- // It shouldn't be an assertion, because 'datetime' unit can be extended outside.
- throw new NCRejection("Unsupported time unit: " + unit);
- }
- }
-
- return now.until(dt, MILLIS);
- }
-}
- </pre>
- </div>
- <div class="tab-pane fade show" id="alarm_intents_idl" role="tabpanel">
- <pre class="brush: idl">
-// Fragments (mostly for demo purposes here).
-fragment=buzz term~{# == 'x:alarm'}
-fragment=when
- term(nums)~{
- // Demonstrating term variables.
- @type = meta_tok('nlpcraft:num:unittype')
- @iseq = meta_tok('nlpcraft:num:isequalcondition') // Excludes conditional statements.
-
- # == 'nlpcraft:num' && @type == 'datetime' && @iseq == true
- }[1,7]
-
-// Intents (using fragments).
-intent=alarm
- fragment(buzz)
- fragment(when)
- </pre>
- </div>
- <div class="tab-pane fade show" id="alarm_json_model" role="tabpanel">
- <pre class="brush: js">
-{
- "id": "nlpcraft.alarm.ex",
- "name": "Alarm Example Model",
- "version": "1.0",
- "description": "Alarm example model.",
- "enabledBuiltInTokens": [
- "nlpcraft:num"
- ],
- "elements": [
- {
- "id": "x:alarm",
- "description": "Alarm token indicator.",
- "synonyms": [
- "{ping|buzz|wake|call|hit} {me|up|me up|_}",
- "{set|_} {my|_} {wake|wake up|_} {alarm|timer|clock|buzzer|call} {clock|_} {up|_}"
- ]
- }
- ],
- "intents": [
- "import('intents.idl')" // Import intents from external file.
- ]
-}
- </pre>
- </div>
- </div>
- </div>
- </div>
- <p>
- Further sub-sections will provide details on model's static configuration and dynamic programmable
- logic implementation.
- </p>
- </section>
- <section id="dataflow">
- <h2 class="section-title">Model Dataflow <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <figure>
- <img alt="data model dataflow" class="img-fluid" src="/images/homepage-fig1.1.png">
- <figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption>
- </figure>
- <p>
- Let's review the general dataflow of the user request in NLPCraft (from right to left).
- User request starts with the user application (like a chatbot or NLI-based system) making a
- REST call using <a href="/using-rest.html">NLPCraft REST API</a>. That REST call carries, among
- other things, the input text and data model ID, and it arrives first at the REST server.
- </p>
- <p>
- Upon receiving the user request, the REST server performs NLP pre-processing converting the input
- text into a sequence of tokens and enriching them with additional information.
- Once finished, the sequence of tokens is sent further down to the probe where the requested data model
- is deployed.
- </p>
- <p>
- Upon receiving that sequence of tokens, the data probe further
- enriches it based on the user data model and <a href="/intent-matching.html">matches</a> it against declared intents. When a matching
- intent is found its callback method is called and its result travels back from the data probe to the
- REST server and eventually to the user that made the REST call.
- </p>
- <div class="bq info">
- <p>
- <b>Security <span class="amp">&</span> Isolation</b>
- </p>
- <p>
- Note that in this architecture the user-defined data model is fully isolated from the REST server accepting
- user calls. Users never access data probes and hence data models directly. Typically REST server
- should be deployed in DMZ and only <em>ingress connectivity is required</em> from the REST server to data probes.
- </p>
- </div>
- </section>
- <section id="lifecycle">
- <h2 class="section-title">Model Lifecycle <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data model is an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface.
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface has
- defaults for most of its methods. These are the only methods that must be implemented by its sub-class:
- </p>
- <ul>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">getId()</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">getName()</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">getVersion()</a></li>
- </ul>
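- <p>
- The shape described above (an interface with defaults for nearly everything and three mandatory methods) can be
- sketched without any NLPCraft dependency. In the sketch below <code>SketchModel</code> is a hypothetical stand-in,
- not the real <code>NCModel</code>, and the default values shown are illustrative assumptions rather than NLPCraft's actual defaults.
- </p>

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for NCModel: three abstract methods, everything else defaulted.
interface SketchModel {
    String getId();
    String getName();
    String getVersion();

    // Illustrative defaults only; these are NOT NLPCraft's documented default values.
    default Set<String> getEnabledBuiltInTokens() { return Collections.emptySet(); }
    default boolean isSparse() { return false; }
}

final class MinimalModelSketch {
    // Implements only the three mandatory methods and inherits the rest.
    static SketchModel newModel() {
        return new SketchModel() {
            @Override public String getId() { return "my.model.id"; }
            @Override public String getName() { return "My Model"; }
            @Override public String getVersion() { return "1.0"; }
        };
    }

    public static void main(String[] args) {
        SketchModel mdl = newModel();
        System.out.println(mdl.getId() + " v" + mdl.getVersion() + ", sparse=" + mdl.isSparse());
    }
}
```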
- <p>
- You can either implement <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
- interface directly or use one of the adapters (recommended in most cases):
- </p>
- <ul>
- <li>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelAdapter.html">NCModelAdapter</a> - when
- entire model definition is in sub-class source code.
- </li>
- <li>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> - when
- using external JSON/YAML declaration for model definition.
- </li>
- </ul>
- <p>
- Note that you can also use 3rd party IoC frameworks like <a target=_ href="https://spring.io">Spring</a> to construct your data models. See
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFactory.html">NCModelFactory</a> for more information.
- </p>
- <div class="bq success">
- <div class="bq-idea-container">
- <div><div>💡</div></div>
- <div>
- <p>
- <b>Using Adapters</b>
- </p>
- <p>
- It is recommended to use one of the adapter classes when defining your
- own data model in most use cases.
- </p>
- </div>
- </div>
- </div>
- <h2 id="deployment" class="section-sub-title">Deployment <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data models get <a href="/server-and-probe.html">deployed</a> to and hosted by the data probes - a lightweight
- container whose job is to host data models and securely transfer requests between REST server and the data
- models. When a data probe starts it reads its <a href="/server-and-probe.html">configuration</a>
- to see which models to deploy.
- </p>
- <p>
- Note that data probes don't support hot-redeployment. To redeploy the data model you need to restart
- the data probe. Note also that a data probe can be started in <a href="/tools/embedded_probe.html">embedded mode</a>, i.e. it can be started
- from within an existing JVM process such as a user application.
- </p>
- <h2 id="callbacks" class="section-sub-title">Callbacks <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- There are two lifecycle callbacks on
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface
- (by way of extending <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html">NCLifecycle</a> interface) that you can override to affect the default lifecycle behavior:
- </p>
- <ul>
- <li>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onInit()">onInit()</a> - called
- right after the model was loaded and deployed.
- </li>
- <li>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onDiscard()">onDiscard()</a> - called to
- discard the data model when, and only when, the data probe is shutting down in an orderly fashion.
- </li>
- </ul>
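- <p>
- The ordering of the two lifecycle callbacks can be illustrated with a small self-contained sketch.
- <code>SketchLifecycle</code> and the simulated probe below are hypothetical stand-ins written for this illustration;
- they only mimic the call order described above and do not use the real <code>NCLifecycle</code> interface.
- </p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NCLifecycle: both callbacks default to no-ops.
interface SketchLifecycle {
    default void onInit() {}
    default void onDiscard() {}
}

final class LifecycleSketch {
    // A model that records which callbacks were invoked, in order.
    static final class RecordingModel implements SketchLifecycle {
        final List<String> events = new ArrayList<>();

        @Override public void onInit() { events.add("init"); }       // Acquire resources here.
        @Override public void onDiscard() { events.add("discard"); } // Release resources here.
    }

    // Simulates a probe: deploy the model, serve requests, then shut down in order.
    static List<String> runProbe(RecordingModel mdl) {
        mdl.onInit();              // Called right after the model is loaded and deployed.
        mdl.events.add("serving"); // Placeholder for request processing.
        mdl.onDiscard();           // Called during orderly shutdown.
        return mdl.events;
    }

    public static void main(String[] args) {
        System.out.println(runProbe(new RecordingModel()));
    }
}
```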
- <p>
- There are also several callbacks that you can override to affect model behavior during
- <a href="/intent-matching.html#model_callbacks">intent matching</a>
- to perform logging, debugging, statistic or usage collection, explicit update or initialization of
- conversation context, security audit or validation:
- </p>
- <ul>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onParsedVariant(org.apache.nlpcraft.model.NCVariant)">onParsedVariant(...)</a>
- </li>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a>
- </li>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a>
- </li>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onResult(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCResult)">onResult(...)</a>
- </li>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onError(org.apache.nlpcraft.model.NCContext,java.lang.Throwable)">onError(...)</a>
- </li>
- <li>
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onRejection(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCRejection)">onRejection(...)</a>
- </li>
- </ul>
- <div class="bq info">
- <b>Conversation Reset</b>
- <p>
- Callbacks
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a> and
- <a target="javadoc"
- href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a>
- are especially handy to perform a soft reset on the conversation context. Read their Javadoc documentation
- to understand the protocol of these callbacks.
- </p>
- </div>
-
- <div class="bq info">
- <b>Lifecycle Components</b>
- <p>
- Note that both the server and the probe provide their own lifecycle components support. When registered in
- the probe or server configuration the lifecycle components will be called
- during various stages of the probe or server startup or shutdown procedures. These callbacks can be used
- to control the lifecycle of external libraries and systems that the data probe or the server rely on, such as
- <a href="metrics-and-tracing.html">OpenCensus exporters</a>, security environment, devops hooks, etc.
- </p>
- <p>
- See server and probe <a href="">configuration</a>.
- </p>
- </div>
- </section>
- <section id="config">
- <h2 class="section-title">Model Configuration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Apart from mandatory model <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">ID</a>,
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">name</a> and
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">version</a>
- there are a number of static model configurations that you can set. All of these properties have sensible
- defaults that you can override, when required, in either sub-classes or via external JSON/YAML declaration:
- </p>
- <ul>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getAdditionalStopWords()">getAdditionalStopWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">getEnabledBuiltInTokens</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExcludedStopWords()">getExcludedStopWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxFreeWords()">getMaxFreeWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxSuspiciousWords()">getMaxSuspiciousWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTokens()">getMaxTokens</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTotalSynonyms()">getMaxTotalSynonyms</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxUnknownWords()">getMaxUnknownWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxWords()">getMaxWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMetadata()">getMetadata</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinNonStopwords()">getMinNonStopwords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinTokens()">getMinTokens</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinWords()">getMinWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getSuspiciousWords()">getSuspiciousWords</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isDupSynonymsAllowed()">isDupSynonymsAllowed</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()">isNonEnglishAllowed</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoNounsAllowed()">isNoNounsAllowed</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNotLatinCharsetAllowed()">isNotLatinCharsetAllowed</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoUserTokensAllowed()">isNoUserTokensAllowed</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isPermutateSynonyms()">isPermutateSynonyms</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSparse()">isSparse</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()">isSwearWordsAllowed</a></li>
- </ul>
- <h2 class="section-sub-title">External JSON/YAML Declaration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- You can move out all the static model configuration into an external JSON or YAML file. To load that
- configuration you need to use <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
- adapter when creating your data model. Here are JSON and YAML sample templates and you can find more details in
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Javadoc and in
- <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">examples</a>.
- </p>
-
- <nav>
- <div class="nav nav-tabs" role="tablist">
- <a class="nav-item nav-link active" data-toggle="tab" href="#model-json" role="tab">JSON</a>
- <a class="nav-item nav-link" data-toggle="tab" href="#model-yaml" role="tab">YAML</a>
- </div>
- </nav>
- <div class="tab-content">
- <div class="tab-pane fade show active" id="model-json" role="tabpanel">
- <pre class="brush: js">
-{
- "id": "user.defined.id",
- "name": "User Defined Name",
- "version": "1.0",
- "description": "Short model description.",
- "enabledBuiltInTokens": ["google:person", "google:location"],
- "macros": [],
- "metadata": {},
- "elements": [
- {
- "id": "x:id",
- "description": "",
- "groups": [],
- "parentId": "",
- "synonyms": [],
- "metadata": {},
- "values": []
- }
- ],
- ...
- "intents": []
-}
- </pre>
- </div>
- <div class="tab-pane fade show" id="model-yaml" role="tabpanel">
- <pre class="brush: js">
-id: "user.defined.id"
-name: "User Defined Name"
-version: "1.0"
-description: "Short model description."
-macros:
-enabledBuiltInTokens:
-elements:
- - id: "x:id"
- description: ""
- synonyms:
- groups:
- values:
- parentId:
- metadata:
-...
-intents:
- </pre>
- </div>
- </div>
- <div class="bq success">
- <div class="bq-idea-container">
- <div><div>💡</div></div>
- <div>
- Note that using JSON/YAML-based configuration is a <b>canonical way</b> of
- creating data models in NLPCraft because it cleanly separates static configuration from the model's
- programmable logic.
- </div>
- </div>
- </div>
- </section>
- <section id="ne">
- <h2 class="section-title">Named Entities <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- A named entity, also known as a model element or a token, is one of the main components defined by the NLPCraft data model.
- A named entity is one or more individual words that have a consistent semantic meaning and typically denote a
- real-world object, such as a person, location, number, date and time, organization, product, etc. Such an
- object can be abstract or have a physical existence.
- </p>
- <p>
- For example, in the following sentence:
- </p>
- <figure>
- <img alt="named entities" class="img-fluid" src="/images/named-entities.png">
- <figcaption><b>Fig 2.</b> Named Entities</figcaption>
- </figure>
- <p>
- the following named entities can be detected:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Words</th>
- <th>Type</th>
- <th>Normalized Value</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><b>Top 20</b></td>
- <td><code>nlpcraft:limit</code></td>
- <td>top 20</td>
- </tr>
- <tr>
- <td><b>best pages</b></td>
- <td><code>user:element</code></td>
- <td>best pages</td>
- </tr>
- <tr>
- <td><b>California USA</b></td>
- <td><code>nlpcraft:geo</code></td>
- <td>USA, California</td>
- </tr>
- <tr>
- <td><b>last 3 months</b></td>
- <td><code>nlpcraft:date</code></td>
- <td>1/1/2021 - 4/1/2021</td>
- </tr>
- </tbody>
- </table>
- <p>
- In most cases named entities will have associated <em>normalized value</em>. It is especially important for named entities that have many
- notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>,
- <code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location which is a standard normalized form.
- </p>
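- <p>
- As a rough illustration of normalization, the sketch below maps several notational forms to one normalized value
- using the New York example from the text. It is a plain lookup table written for this page, an assumption about the
- general idea only, and not how NLPCraft implements normalization internally.
- </p>

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class NormalizedValueSketch {
    // Notational forms (lower-cased) mapped to a single normalized value.
    private static final Map<String, String> FORMS = new HashMap<>();

    static {
        String normalized = "New York City, NY USA";

        FORMS.put("new york", normalized);
        FORMS.put("new york city", normalized);
        FORMS.put("nyc", normalized);
    }

    // Returns the normalized value for a known form, or empty if the form is unknown.
    static Optional<String> normalize(String text) {
        return Optional.ofNullable(FORMS.get(text.trim().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(normalize("NYC").orElse("<unknown>"));
        System.out.println(normalize("Boston").orElse("<unknown>"));
    }
}
```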
- <p>
- The process of detecting named entities is called Named Entity Recognition (NER). A named entity can be detected in many ways: through a list of synonyms, by name, with rules, or by using
- statistical techniques like neural networks with a large corpus of predefined data. NLPCraft natively supports synonym-based
- named entity definitions as well as the ability to compose new named entities through the powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL)
- by combining other named entities, including named entities from
- <a href="/integrations.html">external projects</a> such as OpenNLP, spaCy or Stanford CoreNLP.
- </p>
- <p>
- Named entities allow you to abstract from basic linguistic forms like nouns and verbs to deal with the higher level semantic
- abstractions like geographical location or time when you are trying to understand the meaning of the sentence.
- One of the main goals of named entities is to act as input for <a href="/intent-matching.html">intent matching</a>.
- </p>
- <div class="bq info">
- <p>
- <b>😀 User Input → Named Entities → Parsing Variants → Intent Matcher → Winning Intent 🚀</b>
- </p>
- <p>
- User input is parsed into the list of named entities. That list is then further transformed into one or more
- parsing variants where each variant represents a particular order and combination of detected named entities.
- Finally, the list of variants acts as an input to intent matching where each variant is matched against every intent
- in the process of detecting the best matching intent for the original user input.
- </p>
- </div>
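- <p>
- The last step of that pipeline, matching every variant against every intent, can be sketched in a few lines.
- This is a deliberately simplified assumption: here an intent is just a set of required element IDs and the "best"
- match is the one that satisfies the most required IDs. The real matcher, described in the intent matching section,
- is considerably more involved. The <code>ping</code> intent is hypothetical; the element IDs come from the Alarm example.
- </p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class IntentMatchSketch {
    // Hypothetical, simplified intent: a name plus the element IDs it requires.
    static final class Intent {
        final String name;
        final Set<String> required;

        Intent(String name, String... required) {
            this.name = name;
            this.required = new HashSet<>(Arrays.asList(required));
        }
    }

    // Tries every variant against every intent and keeps the match that
    // satisfies the largest number of required element IDs.
    static Optional<String> match(List<List<String>> variants, List<Intent> intents) {
        String best = null;
        int bestScore = -1;

        for (List<String> variant : variants)
            for (Intent intent : intents)
                if (variant.containsAll(intent.required) && intent.required.size() > bestScore) {
                    best = intent.name;
                    bestScore = intent.required.size();
                }

        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Two parsing variants of the same input, each a list of detected element IDs.
        List<List<String>> variants = Arrays.asList(
            Arrays.asList("x:alarm", "nlpcraft:num"),
            Arrays.asList("x:alarm")
        );
        List<Intent> intents = Arrays.asList(
            new Intent("ping", "x:alarm"),
            new Intent("alarm", "x:alarm", "nlpcraft:num")
        );

        System.out.println(match(variants, intents).orElse("<no match>"));
    }
}
```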
- </section>
- <section id="elements">
- <h2 class="section-title">Model Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data model element defines a named entity that will be detected in the user input.
- Model element is an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a>
- interface. <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> provides
- its elements via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getElements()">getElements()</a> method.
- Typically, you create model elements by either:
- </p>
- <ul>
- <li>
- Implementing <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> interface directly, or
- </li>
- <li>
- Using JSON or YAML static model configuration (the preferred way in most cases).
- </li>
- </ul>
- <p>
- Note that when you use external static model configuration with JSON or YAML you can still modify it after it was loaded
- using <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
- adapter. It is particularly convenient when synonyms or values are loaded separately from, or in
- addition to, the model elements themselves, e.g. from a database or another file.
- </p>
- <div class="bq info">
- <p>
- <b>Model Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b>
- </p>
- <p>
- The terms 'model element', 'named entity' and 'token' are used throughout this documentation more or less interchangeably:
- </p>
- <dl>
- <dt>Model Element</dt>
- <dd>
- Denotes a named entity <em>declared</em> in NLPCraft model.
- </dd>
- <dt>Token</dt>
- <dd>
- Denotes a model element that was <em>detected</em> by NLPCraft in the user input.
- </dd>
- <dt>Named Entity</dt>
- <dd>
- Denotes a classic term, i.e. one or more individual words that have a
- consistent semantic meaning and typically define a real-world object.
- </dd>
- </dl>
- </div>
- <p>
- Although model element and named entity describe a similar concept, the NLPCraft model
- elements provide a much more powerful instrument. Unlike named entity support in other projects,
- NLPCraft model elements have a number of unique capabilities:
- </p>
- <ul>
- <li>
- New model elements can be added declaratively via a subset of NLPCraft <a href="/intent-matching.html">IDL</a>, regex and macro expansion.
- </li>
- <li>
- New model elements can also be added programmatically for ultimate flexibility.
- </li>
- <li>
- Model elements can have many-to-many group memberships.
- </li>
- <li>
- Model elements can form a hierarchical structure.
- </li>
- <li>
- Model elements are composable, i.e. a model element can use other model elements in its definition.
- </li>
- <li>
- Model elements can be declared with user defined metadata.
- </li>
- <li>
- Model elements provide normalized values and can define their own "proper nouns".
- </li>
- <li>
- Model elements can compose named entities from many <a href="integrations.html#nlp">3rd party libraries</a>.
- </li>
- <li>
- All properties of model elements (id, groups, parent & ancestors, values, and metadata) can be used in NLPCraft <a href="/intent-matching.html">IDL</a>.
- </li>
- </ul>
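- <p>
- As a rough sketch of a few of these capabilities, the fragment below declares one element with explicit group
- membership and user-defined metadata. The <code>metadata</code> property name, the group name and the metadata keys
- are assumptions made for illustration only:
- </p>
- <pre class="brush: js">
- {
-     "elements": [
-         {
-             "id": "transport.vehicle",
-             "description": "Transportation vehicle",
-             "groups": ["transport"],
-             "metadata": {
-                 "wheels": 4,
-                 "motorized": true
-             },
-             "synonyms": ["car", "truck"]
-         }
-     ]
- }
- </pre>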
- <h2 class="section-title">User vs. Built-In Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- In addition to the model elements that are defined by the user in the data model (i.e. <em>user model elements</em>),
- NLPCraft provides its own <a href="#builtin">built-in named entities</a> as well as integration with a number of <a href="integrations.html#nlp">3rd party projects</a>. You can think of these built-in elements as if they were implicitly defined in your model: you
- can use them in exactly the same way as if you had defined them yourself.
- You can find more information on how to configure external token providers
- in the <a href="/integrations.html#nlp">Integrations</a> section.
- </p>
- <p>
- Note that you can't directly change group membership, parent-child relationship or metadata of the
- built-in elements. You can, however, "wrap" a built-in entity in your own element by using the <code>^^{tok_id() == 'external.id'}^^</code>
- <a href="/intent-matching.html">IDL</a> expression as its synonym and defining all the necessary additional
- configuration properties on the wrapping element (more on that below).
- </p>
- <span id="synonyms" class="section-sub-title">Synonyms <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that
- would require pre-existing marked up data sets and extensive training. For each model element you can either provide a
- set of synonyms to match on or specify a piece of code that is responsible for detecting that named
- entity (discussed below). A synonym can consist of one or more individual words. Note that an element's ID is its
- implicit synonym, so even if no additional synonyms are defined at least one synonym always exists. Note
- also that synonym matching is performed on the <em>normalized</em> and <em>stemmatized</em> forms of both
- the synonym and the user input.
- </p>
- <p>
- Here's an example of a simple model element definition in JSON:
- </p>
- <pre class="brush: js, highlight: [6,7,8,9,10,11,12]">
- ...
- "elements": [
- {
- "id": "transport.vehicle",
- "description": "Transportation vehicle",
- "synonyms": [
- "car",
- "truck",
- "light duty truck",
- "heavy duty truck",
- "sedan",
- "coupe"
- ]
- }
- ]
- ...
- </pre>
- <p>
- While adding multi-word synonyms may look trivial, in real models the naive approach can lead to
- thousands and even tens of thousands of possible synonyms due to word, grammar, and linguistic
- permutations, which quickly becomes untenable if
- performed manually.
- </p>
- <p>
- NLPCraft provides an effective tool for compact synonym representation. Instead of listing all possible
- multi-word synonyms one by one you can use a combination of the following techniques:
- </p>
- <ul>
- <li><a href="#macros">Macros</a></li>
- <li><a href="#regex">Regular expressions</a></li>
- <li><a href="#option-groups">Option Groups</a></li>
- <li><a href="#dsl">IDL expressions</a></li>
- <li><a href="#custom_ners">Programmable NERs</a></li>
- </ul>
- <p>
- Each whitespace-separated string in a synonym can be either a regular word (as in the above transportation example,
- where it is matched using its normalized and stemmatized form) or one of the above expressions.
- </p>
- <p>
- Note that this synonyms definition is also used in the following
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> methods:
- </p>
- <ul>
- <li><code>getSynonyms()</code> - gets synonyms to match on.</li>
- <li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li>
- </ul>
- <span id="values" class="section-sub-title">Element Values <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- A model element can have an optional set of special synonyms called <em>values</em> or "proper nouns" for this element.
- Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value,
- and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an
- implicit synonym even when no additional synonyms are added for that value.
- </p>
- <p>
- When a model element is recognized it is made available to the model's matching logic as an instance of
- the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface.
- This interface has a method
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue()">getValue()</a> which
- returns the name of the value, if any, by which
- that model element was recognized. That value name can be further used in intent matching.
- </p>
- <p>
- To understand the importance of the values consider the following changes to our transportation
- example model:
- </p>
- <pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]">
- ...
- "macros": [
- {
- "name": "<TRUCK_TYPE>",
- "macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}"
- }
- ],
- "elements": [
- {
- "id": "transport.vehicle",
- "description": "Transportation vehicle",
- "synonyms": [
- "car",
- "{<TRUCK_TYPE>|_} {pickup|_} truck",
- "sedan",
- "coupe"
- ],
- "values": [
- {
- "value": "mercedes",
- "synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"]
- },
- {
- "value": "bmw",
- "synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"]
- },
- {
- "value": "chevrolet",
- "synonyms": ["chevy"]
- }
- ]
- }
- ]
- ...
- </pre>
- <p>
- With that setup the <code>transport.vehicle</code> element will be recognized by any of the following input strings:
- </p>
- <ul>
- <li><code>car</code></li>
- <li><code>benz</code> (with value <code>mercedes</code>)</li>
- <li><code>3/4 ton pickup truck</code></li>
- <li><code>light duty truck</code></li>
- <li><code>chevy</code> (with value <code>chevrolet</code>)</li>
- <li><code>bimmer</code> (with value <code>bmw</code>)</li>
- <li><code>transport.vehicle</code></li>
- </ul>
- <span id="groups" class="section-sub-title">Element Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- Each model element always belongs to one or more groups. A model element provides its groups via the
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getGroups()">getGroups()</a> method.
- By default, if an element's group is not specified, the element ID acts as its default group ID.
- Group membership is a quick and easy way to organise similar model elements together and use this
- categorization in <a href="/intent-matching.html">IDL</a> intents.
- </p>
- <p>
- Note that the proper grouping of the elements is also necessary for the correct operation of
- Short-Term-Memory (STM) in the conversational context. Consider an
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> that
- represents a previously found model element that is stored in the conversation. Such a token
- will be overridden in the conversation by a more <b>recent token</b>
- from the <b>same group</b>, a critical rule for maintaining the proper conversational context.
- See
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
- for more details.
- </p>
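- <p>
- As a minimal sketch, the fragment below places two elements in one shared group. The element IDs, the group name and
- the synonyms are hypothetical. Because both elements belong to the <code>transport</code> group, a newly detected
- token of either element would replace an older token of the other one in the conversation:
- </p>
- <pre class="brush: js">
- {
-     "elements": [
-         {
-             "id": "transport.car",
-             "description": "Passenger car",
-             "groups": ["transport"],
-             "synonyms": ["car", "sedan", "coupe"]
-         },
-         {
-             "id": "transport.truck",
-             "description": "Truck",
-             "groups": ["transport"],
-             "synonyms": ["truck", "{pickup|_} truck"]
-         }
-     ]
- }
- </pre>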
- <span id="parent" class="section-sub-title">Element Parent <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- Each model element can form an optional hierarchical relationship with another element by specifying its
- parent element ID via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getParentId()">getParentId()</a> method.
- The main idea here is that sometimes not only the individual model elements matter but
- their place in the hierarchy as well.
- </p>
- <p>
- For example, we could have designed our transportation example model in a different way by using
- multiple model elements linked with this hierarchy:
- </p>
- <pre>
-+-- vehicle
-| +--truck
-| | |-- light.duty.truck
-| | |-- heavy.duty.truck
-| | +-- medium.duty.truck
-| +--car
-| | |-- coupe
-| | |-- sedan
-| | |-- hatchback
-| | +-- wagon
- </pre>
- <p>
- Then in our intent, for example, we could look for any token with root parent ID <code>vehicle</code>
- or immediate parent ID <code>truck</code> or <code>car</code> without a need to match on all current and
- future individual sub-IDs. For example:
- </p>
- <pre class="brush: idl">
- intent=vehicle.intent term~{has(tok_ancestors(), 'vehicle')}
- intent=truck.intent term~{tok_parent() == 'truck'}
- intent=car.intent term~{tok_parent() == 'car'}
- </pre>
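- <p>
- On the declaration side, part of the hierarchy shown above could be sketched in the JSON model roughly as follows.
- The <code>parentId</code> property name is an assumption that mirrors the <code>getParentId()</code> method, and
- the descriptions and synonyms are placeholders:
- </p>
- <pre class="brush: js">
- {
-     "elements": [
-         {
-             "id": "vehicle",
-             "description": "Any vehicle",
-             "synonyms": ["vehicle"]
-         },
-         {
-             "id": "truck",
-             "description": "Any truck",
-             "parentId": "vehicle",
-             "synonyms": ["truck"]
-         },
-         {
-             "id": "light.duty.truck",
-             "description": "Light duty truck",
-             "parentId": "truck",
-             "synonyms": ["light duty {pickup|_} truck"]
-         }
-     ]
- }
- </pre>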
- </section>
- <section id="syns-tools">
- <span id="macros" class="section-sub-title">Macros <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros
- together with option groups allow for significant simplification of this task.
- Macros allow you to give a name to an often used set of words or option groups and reuse it without
- repeating those words or option groups again and again. A model provides a list of macros via
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMacros()">getMacros()</a> method.
- Each macro has a name in the form of <code><X></code> where <code>X</code>
- is any string, and a string value. Note that macros can be nested (but not recursive), i.e. a macro value can include
- references to other macros. When the macro name <code><X></code> is encountered in a synonym it is
- replaced with its value, and any macros nested in that value are expanded in turn.
- </p>
- <p>
- Here's an example of macro definitions in JSON:
- </p>
- <pre class="brush: js">
- "macros": [
- {
- "name": "<A>",
- "macro": "aaa"
- },
- {
- "name": "<B>",
- "macro": "<A> bbb"
- },
- {
- "name": "<C>",
- "macro": "<A> bbb {z|w}"
- }
- ]
- </pre>
- <span id="option-groups" class="section-sub-title">Option Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- Option groups are similar to wildcard patterns that operate on a per-word basis. One line with an
- option group expands into one or more individual synonyms. Option groups are the key mechanism for shortened
- synonym notation. The following examples demonstrate how to use option groups.
- </p>
- <p>
- Consider the following macros (note that macros <code><B></code> and <code><C></code>
- are nested, i.e. they reference macro <code><A></code>):
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Name</th>
- <th>Value</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><A></code></td>
- <td><code>aaa</code></td>
- </tr>
- <tr>
- <td><code><B></code></td>
- <td><code><A> bbb</code></td>
- </tr>
- <tr>
- <td><code><C></code></td>
- <td><code><A> bbb {z|w}</code></td>
- </tr>
- </tbody>
- </table>
- <p>
- Then the following option group expansions will occur in these examples:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Synonym</th>
- <th>Synonym Expansions</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><A> {b|_} c</code></td>
- <td>
- <code>"aaa b c"</code><br>
- <code>"aaa c"</code>
- </td>
- </tr>
- <tr>
- <td><code><A> {b|a}[1,2] c</code></td>
- <td>
- <code>"aaa b c"</code><br>
- <code>"aaa b b c"</code><br>
- <code>"aaa a c"</code><br>
- <code>"aaa a a c"</code><br>
- <code>"aaa c"</code>
- </td>
- </tr>
- <tr>
- <td>
- <code><B> {b|_} c</code><br>
- or<br>
- <code><B> {b}[0,1] c</code>
- </td>
- <td>
- <code>"aaa bbb b c"</code><br>
- <code>"aaa bbb c"</code>
- </td>
- </tr>
- <tr>
- <td><code>{b|\{\_\}}</code></td>
- <td>
- <code>"b"</code><br>
- <code>"b {_}"</code>
- </td>
- </tr>
- <tr>
- <td><code>a {b|_}. c</code></td>
- <td>
- <code>"a b. c"</code><br>
- <code>"a . c"</code>
- </td>
- </tr>
- <tr>
- <td><code>a .{b, |_}. c</code></td>
- <td>
- <code>"a .b, . c"</code><br>
- <code>"a .. c"</code>
- </td>
- </tr>
- <tr>
- <td><code>
- {% raw %}a {{b|c}|_}.{% endraw %}</code></td>
- <td>
- <code>"a ."</code><br>
- <code>"a b."</code><br>
- <code>"a c."</code>
- </td>
- </tr>
- <tr>
- <td><code>a {% raw %}{{{<C>}}|{_}}{% endraw %} c</code></td>
- <td>
- <code>"a aaa bbb z c"</code><br>
- <code>"a aaa bbb w c"</code><br>
- <code>"a c"</code>
- </td>
- </tr>
- <tr>
- <td><code>{% raw %}{{{a}}} {b||_|{{_}}||_}{% endraw %}</code></td>
- <td>
- <code>"a b"</code><br>
- <code>"a"</code>
- </td>
- </tr>
- </tbody>
- </table>
- <p>
- Specifically:
- </p>
- <ul>
- <li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li>
- <li>
- <code>{A|B|_}</code> denotes either <code>A</code> or <code>B</code> or nothing.
- <ul>
- <li>Symbol <code>_</code> can appear anywhere in the list of options, i.e. <code>{A|B|_}</code> is equal to <code>{A|_|B}</code>.</li>
- </ul>
- </li>
- <li>
- <code>{C}[x,y]</code> denotes an option group with quantifier, i.e. group <code>C</code> appearing from <code>x</code> to <code>y</code> times inclusive.
- <ul>
- <li>For example, <code>{C}[1,3]</code> is the same as <code>{C|C C|C C C}</code> notation.</li>
- <li>Note that <code>{C|_}</code> is equal to <code>{C}[0,1]</code>.</li>
- </ul>
- </li>
- <li>Redundant curly brackets are ignored when it is safe to do so.</li>
- <li>Macros cannot be recursive but can be nested.</li>
- <li>Option groups can be nested.</li>
- <li>
- <code>'\'</code> (backslash) can be used to escape <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and
- <code>'_'</code> special symbols used by the option groups.
- </li>
- <li>Excess whitespace is trimmed when expanding option groups.</li>
- </ul>
- <p>
- We can rewrite our transportation model element in a more efficient way using macros and option groups.
- Even though the actual length of the definition hasn't changed much, it now auto-generates many dozens of synonyms
- that we would otherwise have to write out manually:
- </p>
- <pre class="brush: js, highlight: [4,5,14]">
- ...
- "macros": [
- {
- "name": "<TRUCK_TYPE>",
- "macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}"
- }
- ],
- "elements": [
- {
- "id": "transport.vehicle",
- "description": "Transportation vehicle",
- "synonyms": [
- "car",
- "{<TRUCK_TYPE>|_} {pickup|_} truck",
- "sedan",
- "coupe"
- ]
- }
- ]
- ...
- </pre>
- <span id="regex" class="section-sub-title">Regular Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is
- considered to be a Java regular expression as defined in <code>java.util.regex.Pattern</code>. Note that
- a regular expression can only span a single word, i.e. only individual words from the user input will be
- matched against the given regular expression and no whitespace is allowed within the regular expression. Note
- also that option group special symbols <code>{</code>, <code>}</code>,
- <code>|</code> and <code>_</code> have to be escaped in the regular expression using <code>\</code>
- (backslash).
- </p>
- <p>
- For example, the following synonym:
- </p>
- <pre class="brush: js">
- "synonyms": [
- "{foo|//[bar].+//}"
- ]
- </pre>
- <p>
- will match the word <code>foo</code> or any other word of at least two characters that starts with
- <code>b</code>, <code>a</code> or <code>r</code> (the character class <code>[bar]</code> followed by <code>.+</code>).
- </p>
- <div class="bq info">
- <b>Regular Expressions Performance</b>
- <p>
- It's important to note that regular expressions can significantly affect the performance of
- NLPCraft processing if used without restraint. Use them with caution and test the performance
- of your model to ensure it meets your requirements.
- </p>
- </div>
- <h2 id="dsl" class="section-sub-title">IDL Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Any individual synonym word that starts and ends with <code>^^</code> is an
- <a href="/intent-matching.html#idl">IDL expression.</a> An IDL
- expression inside the <code>^^ ... ^^</code> markers allows you to define a predicate on an already parsed and detected token.
- It is very important to note that unlike all other synonyms the IDL expression operates on an
- already detected <em>token</em>, not on an individual unparsed <em>word</em>.
- </p>
- <p>
- IDL expressions allow you to <em>compose</em> named entities, i.e. use one named entity when defining another one. For example,
- we could define a model element for the race car using our previous transportation example (note how the synonym on
- <b>line 18</b>
- references the element defined on <b>line 4</b>):
- </p>
- <pre class="brush: js, highlight: [4, 18]">
- ...
- "elements": [
- {
- "id": "transport.vehicle",
- "description": "Transportation vehicle",
- "synonyms": [
- "car",
- "truck",
- "{light|heavy|super|medium} duty {pickup|_} truck",
- "sedan",
- "coupe"
- ]
- },
- {
- "id": "race.vehicle",
- "description": "Race vehicle",
- "synonyms": [
- "{race|speed|track} ^^{# == 'transport.vehicle'}^^"
- ]
- }
-
- ]
- ...
- </pre>
- <div class="bq warn">
- <p>
- <b>Greedy NERs <span class="amp">&</span> Synonyms Conflicts</b>
- </p>
- <p>
- Note that in the above example you need to ensure that words <code>race</code>,
- <code>speed</code> or <code>track</code> are not part of the <code>transport.vehicle</code>
- token. This is particularly important for 3rd party NERs where the specific rules about what
- words can or cannot be part of the token are unclear or undefined. In such cases the only remedy is
- to test extensively with 3rd party NERs and verify the synonym recognition in the data probe logs.
- </p>
- </div>
- <p>
- Another use case is to wrap 3rd party named entities to add group membership, metadata or hierarchical
- relationship to the externally defined named entity. For example, you can wrap <code>google:location</code>
- token and add group membership for <code>my_group</code> group:
- </p>
- <pre class="brush: js, highlight: [6,8]">
- ...
- "elements": [
- {
- "id": "google.loc.wrap",
- "description": "Wrapper for google location",
- "groups": ["my_group"],
- "synonyms": [
- "^^{# == 'google:location'}^^"
- ]
- }
- ]
- ...
- </pre>
- <b>IDL Expression Syntax</b>
- <p>
- IDL expressions are a subset of the overall <a href="/intent-matching.html#idl">IDL syntax</a>. You can
- review the formal
- <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft/src/main/scala/org/apache/nlpcraft/model/intent/compiler/antlr4/NCIdl.g4">ANTLR4 grammar</a>
- but basically
- an IDL expression for a synonym is a term expression with an optional alias at the beginning.
- Here's an example of an IDL expression defining a synonym for the population of any city in France:
- </p>
- <pre class="brush: js">
- "synonyms": [
- "population {of|for} ^^[city]{# == 'nlpcraft:city' && lowercase(meta_tok('city:country')) == 'france'}^^"
- ]
- </pre>
- <b>NOTES:</b>
- <ul>
- <li>Optional alias <code>city</code> can be used to access a constituent part token (with ID <code>nlpcraft:city</code>).</li>
- <li>
- The expression between <code>{</code> and <code>}</code> brackets is a standard IDL term expression.
- </li>
- </ul>
- <h2 id="custom_ners" class="section-sub-title">Custom NERs <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- By default, the data model detects its elements by their synonyms, regular expressions or IDL expressions. However, in some cases
- these methods are either not expressive enough or cannot be used, for example, when detecting model elements
- with neural networks or when integrating with non-standard 3rd party NER components. In such cases, a user-defined parser
- can be provided for the model, allowing you to implement arbitrary NER logic that detects the model elements
- in the user input programmatically. Note that a custom parser can detect any number of model elements.
- </p>
- <p>
- A model provides its custom parsers via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getParsers()">getParsers()</a> method.
- </p>
- </section>
- <section id="logic">
- <h2 class="section-title">Model Logic <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- When a user sends a request via the REST API it is received by the REST server. Upon receipt,
- the REST server performs the basic NLP processing and enrichment. Once finished, the REST server
- sends the enriched request down to a specific data probe selected based on the requested data model.
- </p>
- <p>
- The model logic is defined in <a href="intent-matching.html">intents</a>, specifically in the intent callbacks that get called when
- their intent is chosen as the winning match against the user request.
- Below we will quickly discuss the key APIs that are essential for developing intent callbacks.
- Note that this does not replace the more detailed <a target=_ href="/apis/latest/index.html">Javadoc</a>
- documentation that you are encouraged to read through as well:
- </p>
- <ul>
- <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li>
- <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li>
- <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li>
- <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li>
- <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li>
- <li>Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li>
- </ul>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- This interface provides a read-only view of the data model. The model view defines the declarative, or configurable, part of the model.
- All properties in this interface can be defined or overridden in the external JSON/YAML
- representation when used with the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> adapter.
- </p>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- This interface defines a context of a particular intent match. It can be passed into the callback of the matched intent
- and provides the following:
- </p>
- <ul>
- <li>ID of the matched intent.</li>
- <li>Specific parsing variant that was matched against this intent.</li>
- <li>Access to the original query context (<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a>).</li>
- <li>Various access APIs for intent tokens.</li>
- </ul>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- This interface provides all available data about the parsed user input and all its
- supplemental information. It's accessible from the <code>NCIntentMatch</code> interface and
- provides a large amount of information to the intent callback logic:
- </p>
- <ul>
- <li>
- Server request ID. Server request is defined as a processing of one user input sentence.
- </li>
- <li>
- Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
- for controlling STM of conversation manager and dialog flow.
- </li>
- <li>
- Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a>
- instance that the intent callback method belongs to, giving access to the entire static model configuration.
- </li>
- <li>
- Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> that
- provides detailed information about the user input.
- </li>
- <li>
- List of parsing variants provided
- by <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a>
- method. When the user sentence gets parsed into individual tokens (i.e. detected model elements) there is generally
- more than one way to do it. This ambiguity is perfectly fine because only the data model has all the
- necessary information to select one parsing variant that fits that model the best. Without the data model
- there isn't enough context to determine which variant is the best fitting.
- Method <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a>
- returns the list of all parsing variants for a given user input.
- </li>
- </ul>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> interface
- is one of several important entities in the Data Model API that you as a model developer will be working with. You
- should review its <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">Javadoc</a> but
- here is an outline of the information it provides:
- </p>
- <ul>
- <li>
- Information about the user that issued the request.
- </li>
- <li>
- User agent and remote address, if available, of the user's application that made the initial REST call.
- </li>
- <li>
- Original request text, timestamp of its receipt, and server request ID.
- </li>
- </ul>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> object is another
- key abstraction in the Data Model API. A token is a detected model element and is a part of a fully parsed user input.
- A sequence of tokens represents the parsed user input. A single token corresponds to one or more words, sequential
- or not, in the user sentence.
- </p>
- <p>
- Most of the token's information is stored in map-based metadata accessible via
- <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">getMetadata()</a> method.
- Depending on the token ID each token will have a different set of <a href="#meta">metadata properties</a>. Some common NLP properties
- are always present for tokens of all types.
- </p>
- <h2 class="section-sub-title">Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- This class defines the result returned from the model's intent callbacks. The result consists of the
- text body and the type. Result types are similar in notion to MIME types and have a specific meaning only for the REST applications
- that interpret them. For example, a REST client interfacing between NLPCraft and Amazon Alexa or Apple HomeKit could
- accept only the text result type and ignore everything else.
- </p>
- <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a>
- provides support for mutable runtime-only metadata. This interface can be used to attach user-defined runtime data
- to a variety of different objects in the NLPCraft API. This interface is implemented by the following types:
- </p>
- <ul>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCompany.html">NCCompany</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCUser.html">NCUser</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomElement.html">NCCustomElement</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCDialogFlowItem.html">NCDialogFlowItem</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li>
- <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCVariant.html">NCVariant</a></li>
- </ul>
- </section>
- <section id="builtin">
- <h2 class="section-title">Built-In Tokens <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- NLPCraft provides a number of built-in model elements (i.e. tokens) including
- <a href="integrations.html">integration</a> with several popular 3rd party NER frameworks. The table
- below provides information about these built-in tokens. The section on <a href="#meta">token metadata</a> provides
- further information about the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">metadata</a> that each type of token carries.
- </p>
- <p>
- Built-in tokens have to be explicitly enabled on both the REST server and in the model. See the
- <code>nlpcraft.server.tokenProviders</code> configuration property and the
- <a target="javadoc" href="apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView#getEnabledBuiltInTokens()</a>
- method for more details. By default, only NLPCraft tokens are enabled (token IDs
- starting with <code>nlpcraft</code>).
- </p>
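- <p>
- On the model side, enabling a specific set of built-in tokens in a JSON model might look like the sketch below.
- The <code>enabledBuiltInTokens</code> property name is an assumption that mirrors the
- <code>getEnabledBuiltInTokens()</code> method; the token IDs are taken from the table that follows:
- </p>
- <pre class="brush: js">
- {
-     "enabledBuiltInTokens": [
-         "nlpcraft:date",
-         "nlpcraft:num"
-     ]
- }
- </pre>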
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Token ID</th>
- <th>Description</th>
- <th>Example</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code>nlpcraft:nlp</code></td>
- <td>
- <p>
- This token denotes a word (always a single word) that is not a part of any other token. It's
- also called a free-word, i.e. a word that is not linked to any other detected model element.
- </p>
- <p>
- <b>NOTE:</b> the metadata from this token defines a common set of NLP properties and
- is present in every other token as well.
- </p>
- </td>
- <td>
- <ul>
- <li>Jamie goes <code>home</code> (assuming that the word 'home' does not belong to any model element).</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:date</code></td>
- <td>
- This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not
- currently recognize the time component.
- </td>
- <td>
- <ul>
- <li>Meeting <code>next tuesday</code>.</li>
- <li>Report for entire <code>2018 year</code>.</li>
- <li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:num</code></td>
- <td>
- This token denotes a single numeric value or numeric condition.
- </td>
- <td>
- <ul>
- <li>Price <code>&gt; 100</code>.</li>
- <li>Price is <code>less than $100</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:continent</code></td>
- <td>
- This token denotes a geographical continent.
- </td>
- <td>
- <ul>
- <li>Population of <code>Africa</code>.</li>
- <li>Surface area of <code>America</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:subcontinent</code></td>
- <td>
- This token denotes a geographical subcontinent.
- </td>
- <td>
- <ul>
- <li>Population of <code>Alaskan peninsula</code>.</li>
- <li>Surface area of <code>South America</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:region</code></td>
- <td>
- This token denotes a geographical region/state.
- </td>
- <td>
- <ul>
- <li>Population of <code>California</code>.</li>
- <li>Surface area of <code>South Dakota</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:country</code></td>
- <td>
- This token denotes a country.
- </td>
- <td>
- <ul>
- <li>Population of <code>France</code>.</li>
- <li>Surface area of <code>USA</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:city</code></td>
- <td>
- This token denotes a city.
- </td>
- <td>
- <ul>
- <li>Population of <code>Paris</code>.</li>
- <li>Surface area of <code>Washington DC</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:metro</code></td>
- <td>
- This token denotes a metro area.
- </td>
- <td>
- <ul>
- <li>Population of <code>Cedar Rapids-Waterloo-Iowa City &amp; Dubuque, IA</code> metro area.</li>
- <li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:sort</code></td>
- <td>
- This token denotes a sorting or ordering.
- </td>
- <td>
- <ul>
- <li>Report <code>sorted from top to bottom</code>.</li>
- <li>Analysis <code>sorted in descending order</code>.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:limit</code></td>
- <td>
- This token denotes a numerical limit.
- </td>
- <td>
- <ul>
- <li>Show <code>top 5</code> brands.</li>
- <li>Show <code>several</code> brands.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:coordinate</code></td>
- <td>
- This token denotes latitude and longitude coordinates.
- </td>
- <td>
- <ul>
- <li>Route the path to <code>55.7558, 37.6173</code> location.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>nlpcraft:relation</code></td>
- <td>
- This token denotes a relation function:
- <code>compare</code> or
- <code>correlate</code>. Note that this token always needs two other tokens that it references.
- </td>
- <td>
- <ul>
- <li>
- What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code>
- (assuming that 'price' and 'location' are also detected tokens).
- </li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>google:xxx</code></td>
- <td>
- <p>
- In these token IDs, <code>xxx</code> is the lowercase name of the named entity
- in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g.
- <code>google:person</code> or <code>google:location</code>.
- </p>
- <p>
- See the <a href="integrations.html#google">integration</a> section for more details on how
- to configure the Google named entity provider.
- </p>
- </td>
- <td>
- <ul>
- <li>
- Articles by <code>Ken Thompson</code>.
- </li>
- <li>
- Best restaurants in <code>Paris</code>.
- </li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>opennlp:xxx</code></td>
- <td>
- <p>
- In these token IDs, <code>xxx</code> is the lowercase name of the named entity
- in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g.
- <code>opennlp:person</code> or <code>opennlp:money</code>.
- </p>
- <p>
- See the <a href="integrations.html#opennlp">integration</a> section for more details on how
- to configure the Apache OpenNLP named entity provider.
- </p>
- </td>
- <td>
- <ul>
- <li>
- Articles by <code>Ken Thompson</code>.
- </li>
- <li>
- Best restaurants under <code>100$</code>.
- </li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>spacy:xxx</code></td>
- <td>
- <p>
- In these token IDs, <code>xxx</code> is the lowercase name of the named entity
- in <a target=_ href="https://spacy.io/">spaCy</a>, e.g.
- <code>spacy:person</code> or <code>spacy:location</code>.
- </p>
- <p>
- See the <a href="integrations.html#spacy">integration</a> section for more details on how
- to configure the spaCy named entity provider.
- </p>
- </td>
- <td>
- <ul>
- <li>
- Articles by <code>Ken Thompson</code>.
- </li>
- <li>
- Best restaurants in <code>Paris</code>.
- </li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><code>stanford:xxx</code></td>
- <td>
- <p>
- In these token IDs, <code>xxx</code> is the lowercase name of the named entity
- in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g.
- <code>stanford:person</code> or <code>stanford:location</code>.
- </p>
- <p>
- See the <a href="integrations.html#stanford">integration</a> section for more details on how
- to configure the Stanford CoreNLP named entity provider.
- </p>
- </td>
- <td>
- <ul>
- <li>
- Articles by <code>Ken Thompson</code>.
- </li>
- <li>
- Best restaurants in <code>Paris</code>.
- </li>
- </ul>
- </td>
- </tr>
- </tbody>
- </table>
- </section>
- <section id="meta">
- <h2 class="section-title">Token Metadata <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Each token has a different set of metadata. The sections below describe the metadata for each built-in token
- supported by NLPCraft:
- </p>
- <ul>
- <li><a href="#nlpcraft:nlp">Token ID <code>nlpcraft:nlp</code></a></li>
- <li><a href="#nlpcraft:date">Token ID <code>nlpcraft:date</code></a></li>
- <li><a href="#nlpcraft:num">Token ID <code>nlpcraft:num</code></a></li>
- <li><a href="#nlpcraft:city">Token ID <code>nlpcraft:city</code></a></li>
- <li><a href="#nlpcraft:continent">Token ID <code>nlpcraft:continent</code></a></li>
- <li><a href="#nlpcraft:subcontinent">Token ID <code>nlpcraft:subcontinent</code></a></li>
- <li><a href="#nlpcraft:region">Token ID <code>nlpcraft:region</code></a></li>
- <li><a href="#nlpcraft:country">Token ID <code>nlpcraft:country</code></a></li>
- <li><a href="#nlpcraft:metro">Token ID <code>nlpcraft:metro</code></a></li>
- <li><a href="#nlpcraft:coordinate">Token ID <code>nlpcraft:coordinate</code></a></li>
- <li><a href="#nlpcraft:sort">Token ID <code>nlpcraft:sort</code></a></li>
- <li><a href="#nlpcraft:limit">Token ID <code>nlpcraft:limit</code></a></li>
- <li><a href="#nlpcraft:relation">Token ID <code>nlpcraft:relation</code></a></li>
- <li><a href="#stanford:xxx">Token ID <code>stanford:xxx</code></a></li>
- <li><a href="#spacy:xxx">Token ID <code>spacy:xxx</code></a></li>
- <li><a href="#google:xxx">Token ID <code>google:xxx</code></a></li>
- <li><a href="#opennlp:xxx">Token ID <code>opennlp:xxx</code></a></li>
- </ul>
- <div class="bq info">
- <p>
- <b>Metadata Name Conflicts</b>
- </p>
- <p>
- Note that model element metadata gets merged into the same map container as common NLP token metadata
- (see <code>nlpcraft:nlp:xxx</code> properties below).
- In other words, they share the same namespace. It is important to remember this and choose unique names
- for user-defined metadata properties. One possible approach, used by NLPCraft internally, is to prefix
- each metadata name with a unique prefix based on the token ID.
- </p>
- </div>
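The prefixing convention described in the note above can be sketched in plain Java. This is a minimal illustration only: the class `MetaKeySketch`, its `key` helper, and the token ID `x:order` are hypothetical names, not part of the NLPCraft API, and a plain `HashMap` stands in for the shared metadata container.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: namespacing user-defined metadata keys by token ID (hypothetical helper). */
final class MetaKeySketch {
    /** Builds a key such as "x:order:index" from token ID "x:order" and property "index". */
    static String key(String tokenId, String prop) {
        return tokenId + ":" + prop;
    }

    public static void main(String[] args) {
        // Stand-in for the single map that holds both system and user metadata.
        Map<String, Object> meta = new HashMap<>();

        // Common NLP property supplied by the system.
        meta.put("nlpcraft:nlp:index", 3);

        // User-defined property; the token ID prefix keeps it from clashing with system keys.
        meta.put(key("x:order", "index"), 42);

        // Both entries coexist because their full names differ.
        System.out.println(meta);
    }
}
```

Without the prefix, a user-defined property named like an existing key would overwrite it, since both live in the same map.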
- <span id="nlpcraft:nlp" class="section-sub-title">Token ID <code>nlpcraft:nlp</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token's metadata provides the common set of basic NLP properties.
- <b>All tokens</b> without exception have these metadata properties, and
- all of them are <b>mandatory</b>.
- Note also that the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface
- provides direct access to most of these properties.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:nlp:unid</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Internal globally unique system ID of the token.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:bracketed</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>Whether or not this token is surrounded by any of <code>'['</code>, <code>']'</code>, <code>'{'</code>, <code>'}'</code>, <code>'('</code>, <code>')'</code> brackets.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:freeword</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>Whether or not this token represents a free word. A free word is a token that was detected neither as part of a user-defined token nor as part of a system token.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:direct</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>Whether or not this token was matched on direct (not permutated) synonym.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:english</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this token represents an English word. Note that this only checks that the token's text
- consists of characters of the English alphabet, i.e. the text does not necessarily have to be a
- known valid English word. See the <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()" target="javadoc">NCModelView.isNonEnglishAllowed()</a> method
- for the corresponding model configuration.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:lemma</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Lemma of this token, i.e. a canonical form of this word. Note that stemming and
- lemmatization reduce inflectional forms and sometimes derivationally related forms
- of a word to a common base form. Lemmatization refers to the use of a vocabulary and
- morphological analysis of words, normally aiming to remove inflectional endings only and to
- return the base or dictionary form of a word, which is known as the lemma.
- Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:stem</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Stem of this token. Note that stemming and lemmatization reduce inflectional forms
- and sometimes derivationally related forms of a word to a common base form. Unlike lemmatization,
- stemming is a basic heuristic process that chops off the ends of words in the hope of
- achieving this goal correctly most of the time, and often includes the removal of derivational
- affixes.
- Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:pos</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Penn Treebank POS tag for this token. Note that in addition to the standard Penn Treebank POS
- tags, NLPCraft introduces a synthetic '---' tag to indicate the POS tag for multiword tokens.
- Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:posdesc</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Description of Penn Treebank POS tag.
- Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:swear</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not this token is a swear word. NLPCraft has a built-in list of common English swear words.
- See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()" target="javadoc">NCModelView.isSwearWordsAllowed()</a> for the corresponding model configuration.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:origtext</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Original user input text for this token.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:normtext</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Normalized user input text for this token.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:sparsity</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Numeric value of how sparse the token is. A sparsity of zero means that all individual words in
- the token follow each other.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:minindex</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Index of the first word in this token. Note that a token may not be contiguous.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:maxindex</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Index of the last word in this token. Note that a token may not be contiguous.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:wordindexes</b></code></td>
- <td><code>java.util.List&lt;Integer&gt;</code></td>
- <td>
- List of original word indexes in this token. Note that a token can have words that are not
- contiguous in the original sentence. Always has at least one element in it.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:wordlength</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Number of individual words in this token. Equal to the size of <code>wordindexes</code> list.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:contiguous</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not this token has zero sparsity, i.e. consists of contiguous words.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:start</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Start character index of this token.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:end</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- End character index of this token.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:index</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Index of this token in the sentence.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:charlength</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Character length of this token.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:quoted</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not this token is surrounded by single or double quotes.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:stopword</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not this token is a stopword. Stopwords are extremely common words that
- add little value in understanding user input and are excluded from the processing entirely.
- For example, words like a, the, can, of, about, over, etc. are typical stopwords in English.
- NLPCraft has a built-in set of stopwords.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:nlp:dict</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not this token is found in the Princeton WordNet database.
- </td>
- </tr>
- </tbody>
- </table>
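The relationship between <code>wordindexes</code>, <code>minindex</code>, <code>maxindex</code>, <code>wordlength</code> and <code>contiguous</code> in the table above can be illustrated with a small standalone sketch. The class `WordIndexSketch` is hypothetical and assumes the word indexes are distinct; NLPCraft computes these properties itself, so this only shows how they relate to each other.

```java
import java.util.Collections;
import java.util.List;

/** Sketch: deriving index-related properties from a list of word indexes (hypothetical helper). */
final class WordIndexSketch {
    /** True when the distinct indexes form an unbroken run, e.g. [2, 3, 4]. */
    static boolean isContiguous(List<Integer> wordIndexes) {
        int min = Collections.min(wordIndexes);
        int max = Collections.max(wordIndexes);
        // For distinct integers, the span equals the count only when there are no gaps.
        return max - min + 1 == wordIndexes.size();
    }

    public static void main(String[] args) {
        List<Integer> idxs = List.of(1, 2, 4); // Words 1, 2 and 4 of the sentence.

        System.out.println("minindex   = " + Collections.min(idxs));
        System.out.println("maxindex   = " + Collections.max(idxs));
        System.out.println("wordlength = " + idxs.size());
        System.out.println("contiguous = " + isContiguous(idxs));
    }
}
```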
- <br/>
- <span id="nlpcraft:date" class="section-sub-title">Token ID <code>nlpcraft:date</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a date range including single days.
- In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b>.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:date:from</b></code></td>
- <td><code>java.lang.Long</code></td>
- <td>
- Start timestamp of the datetime range.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:date:to</b></code></td>
- <td><code>java.lang.Long</code></td>
- <td>
- End timestamp of the datetime range.
- </td>
- </tr>
- </tbody>
- </table>
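As an illustration of consuming the two timestamps above, the sketch below converts them to calendar dates. It assumes that the <code>java.lang.Long</code> values are milliseconds since the Unix epoch and that UTC is the desired time zone; neither is stated in this section, so verify both before relying on them. The class `DateRangeSketch` and the sample values are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Sketch: turning from/to timestamps into dates (assumes epoch milliseconds, UTC). */
final class DateRangeSketch {
    /** Converts an epoch-millisecond timestamp to a calendar date in UTC. */
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Hypothetical values of nlpcraft:date:from and nlpcraft:date:to.
        long from = 1483228800000L;
        long to = 1514764800000L;

        System.out.println(toUtcDate(from) + " .. " + toUtcDate(to));
    }
}
```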
- <br/>
- <span id="nlpcraft:num" class="section-sub-title">Token ID <code>nlpcraft:num</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a single numerical value or a numeric condition.
- In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:num:from</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>
- Start of the numeric range that satisfies the condition (exclusive). Note that if <code>from</code>
- and <code>to</code> are the same, this token represents a single value (whole or fractional), in
- which case <code>isequalcondition</code> will be <code>true</code>.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:to</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>
- End of the numeric range that satisfies the condition (exclusive). Note that if <code>from</code>
- and <code>to</code> are the same, this token represents a single value (whole or fractional), in
- which case <code>isequalcondition</code> will be <code>true</code>.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:fromincl</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not the start of the numeric range is inclusive.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:toincl</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether or not the end of the numeric range is inclusive.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:isequalcondition</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this is an equality condition. Note that single numeric values also default to equality
- condition and this property will be <code>true</code>. Indeed, <code>A is equal to 2</code> and
- <code>A is 2</code> have the same meaning.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:isnotequalcondition</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this is a not-equality condition.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:isfromnegativeinfinity</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this range is from negative infinity.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:israngecondition</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this is a range condition.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:istopositiveinfinity</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this range is to positive infinity.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:isfractional</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- Whether this token's value (a single numeric value or a range) is a fractional rather than a whole number.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:unit</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>
- Optional numeric value unit name (see below).
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:num:unittype</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>
- Optional numeric value unit type (see below).
- </td>
- </tr>
- </tbody>
- </table>
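One way to read the range properties above is as a bounds check on a candidate value. The sketch below is a hypothetical helper, not NLPCraft code: it takes the values of <code>from</code>, <code>to</code>, <code>fromincl</code> and <code>toincl</code> as plain arguments and ignores the infinity and not-equal flags, which a real consumer would also need to honor.

```java
/** Sketch: testing a value against from/to/fromincl/toincl (hypothetical helper). */
final class NumConditionSketch {
    /** Returns true when v lies within the range, honoring the inclusive flags on each end. */
    static boolean matches(double v, double from, double to, boolean fromIncl, boolean toIncl) {
        boolean aboveFrom = fromIncl ? v >= from : v > from;
        boolean belowTo = toIncl ? v <= to : v < to;
        return aboveFrom && belowTo;
    }

    public static void main(String[] args) {
        // Single value 100: from == to and both ends inclusive.
        System.out.println(matches(100.0, 100.0, 100.0, true, true));

        // Range (100, 200]: the start is excluded, the end is included.
        System.out.println(matches(100.0, 100.0, 200.0, false, true));
        System.out.println(matches(200.0, 100.0, 200.0, false, true));
    }
}
```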
- <p>
- The following table lists the possible values of the <code><b>nlpcraft:num:unit</b></code> and <code><b>nlpcraft:num:unittype</b></code>
- properties:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>num:unittype</th>
- <th>num:unit <sub>possible values</sub></th>
- </tr>
- </thead>
- <tbody>
- <tr><td><code>mass</code></td><td><code>feet per second</code><br/><code>grams</code><br/><code>kilogram</code><br/><code>grain</code><br/><code>dram</code><br/><code>ounce</code><br/><code>pound</code><br/><code>hundredweight</code><br/><code>ton</code><br/><code>tonne</code><br/><code>slug</code></td>
- <tr><td><code>torque</code></td><td><code>newton meter</code></td>
- <tr><td><code>area</code></td><td><code>square meter</code><br/><code>acre</code><br/><code>are</code><br/><code>hectare</code><br/><code>square inches</code><br/><code>square feet</code><br/><code>square yards</code><br/><code>square miles</code></td>
- <tr><td><code>paper quantity</code></td><td><code>paper bale</code></td>
- <tr><td><code>force</code></td><td><code>kilopond</code><br/><code>pond</code></td>
- <tr><td><code>pressure</code></td><td><code>pounds per square inch</code></td>
- <tr><td><code>solid angle</code></td><td><code>steradian</code></td>
- <tr><td><code>pressure</code><br/><code>stress</code></td><td><code>pascal</code></td>
- <tr><td><code>luminous</code></td><td><code>flux</code><br/><code>lumen</code></td>
- <tr><td><code>amount of substance</code></td><td><code>mole</code></td>
- <tr><td><code>luminance</code></td><td><code>candela per square metre</code></td>
- <tr><td><code>angle</code></td><td><code>radian</code><br/><code>degree</code></td>
- <tr><td><code>magnetic flux density</code><br/><code>magnetic field</code></td><td><code>tesla</code></td>
- <tr><td><code>power</code><br/><code>radiant flux</code></td><td><code>watt</code></td>
- <tr><td><code>datetime</code></td><td><code>second</code><br/><code>minute</code><br/><code>hour</code><br/><code>day</code><br/><code>week</code><br/><code>month</code><br/><code>year</code></td>
- <tr><td><code>electrical inductance</code></td><td><code>henry</code></td>
- <tr><td><code>electric charge</code></td><td><code>coulomb</code></td>
- <tr><td><code>temperature</code></td><td><code>kelvin</code><br/><code>centigrade</code><br/><code>fahrenheit</code></td>
- <tr><td><code>voltage</code><br/><code>electrical</code></td><td><code>volt</code></td>
- <tr><td><code>momentum</code></td><td><code>kilogram meters per second</code></td>
- <tr><td><code>amount of heat</code></td><td><code>calorie</code></td>
- <tr><td><code>electrical capacitance</code></td><td><code>farad</code></td>
- <tr><td><code>radioactive decay</code></td><td><code>becquerel</code></td>
- <tr><td><code>electrical conductance</code></td><td><code>siemens</code></td>
- <tr><td><code>luminous intensity</code></td><td><code>candela</code></td>
- <tr><td><code>work</code><br/><code>energy</code></td><td><code>joule</code></td>
- <tr><td><code>quantities</code></td><td><code>dozen</code></td>
- <tr><td><code>density</code></td><td><code>density</code></td>
- <tr><td><code>sound</code></td><td><code>decibel</code></td>
- <tr><td><code>electrical resistance</code><br/><code>impedance</code></td><td><code>ohm</code></td>
- <tr><td><code>force</code><br/><code>weight</code></td><td><code>newton</code></td>
- <tr><td><code>light quantity</code></td><td><code>lumen seconds</code></td>
- <tr><td><code>length</code></td><td><code>meter</code><br/><code>millimeter</code><br/><code>centimeter</code><br/><code>decimeter</code><br/><code>kilometer</code><br/><code>astronomical unit</code><br/><code>light year</code><br/><code>parsec</code><br/><code>inch</code><br/><code>foot</code><br/><code>yard</code><br/><code>mile</code><br/><code>nautical mile</code></td>
- <tr><td><code>refractive index</code></td><td><code>diopter</code></td>
- <tr><td><code>frequency</code></td><td><code>hertz</code><br/><code>angular frequency</code></td>
- <tr><td><code>power</code></td><td><code>kilowatt</code><br/><code>horsepower</code><br/><code>bar</code></td>
- <tr><td><code>magnetic flux</code></td><td><code>weber</code></td>
- <tr><td><code>current</code></td><td><code>ampere</code></td>
- <tr><td><code>acceleration of gravity</code></td><td><code>gravity imperial</code><br/><code>gravity metric</code></td>
- <tr><td><code>volume</code></td><td><code>cubic meter</code><br/><code>liter</code><br/><code>milliliter</code><br/><code>centiliter</code><br/><code>deciliter</code><br/><code>hectoliter</code><br/><code>cubic inch</code><br/><code>cubic foot</code><br/><code>cubic yard</code><br/><code>acre-foot</code><br/><code>teaspoon</code><br/><code>tablespoon</code><br/><code>fluid ounce</code><br/><code>cup</code><br/><code>gill</code><br/><code>pint</code><br/><code>quart</code> [...]
- <tr><td><code>speed</code></td><td><code>miles per hour</code><br/><code>meters per second</code></td>
- <tr><td><code>illuminance</code></td><td><code>lux</code></td>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:city" class="section-sub-title">Token ID <code>nlpcraft:city</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a city.
- In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:city:city</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Name of the city.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:city:continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Continent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:city:subcontinent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Subcontinent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:city:countrymeta</b></code></td>
- <td><code>java.util.Map</code></td>
- <td>
- Supplemental metadata for city's country (see below).
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:city:citymeta</b></code></td>
- <td><code>java.util.Map</code></td>
- <td>
- Supplemental metadata for city (see below).
- </td>
- </tr>
- </tbody>
- </table>
- <p>
- The following table lists the possible keys of the <code><b>nlpcraft:city:countrymeta</b></code> map. The data is
- obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Key</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>iso</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>iso3</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO 3166 country code.</td>
- </tr>
- <tr>
- <td><code><b>isocode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>capital</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country capital city name.</td>
- </tr>
- <tr>
- <td><code><b>area</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Double</code></td>
- <td>Optional country surface area.</td>
- </tr>
- <tr>
- <td><code><b>population</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Long</code></td>
- <td>Optional country population.</td>
- </tr>
- <tr>
- <td><code><b>continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country continent.</td>
- </tr>
- <tr>
- <td><code><b>currencycode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency code.</td>
- </tr>
- <tr>
- <td><code><b>currencyname</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency name.</td>
- </tr>
- <tr>
- <td><code><b>phone</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country phone code.</td>
- </tr>
- <tr>
- <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code format.</td>
- </tr>
- <tr>
- <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code regular expression.</td>
- </tr>
- <tr>
- <td><code><b>languages</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of languages.</td>
- </tr>
- <tr>
- <td><code><b>neighbours</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of neighbours.</td>
- </tr>
- </tbody>
- </table>
- <p>
- The following table lists the possible keys of the <code><b>nlpcraft:city:citymeta</b></code> map. The data is
- obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Key</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>latitude</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>City latitude.</td>
- </tr>
- <tr>
- <td><code><b>longitude</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>City longitude.</td>
- </tr>
- <tr>
- <td><code><b>population</b></code></td>
- <td><code>java.lang.Long</code></td>
- <td>City population.</td>
- </tr>
- <tr>
- <td><code><b>elevation</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Integer</code></td>
- <td>Optional city elevation in meters.</td>
- </tr>
- <tr>
- <td><code><b>timezone</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>City timezone.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:continent" class="section-sub-title">Token ID <code>nlpcraft:continent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a continent.
- In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:continent:continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Name of the continent.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:subcontinent" class="section-sub-title">Token ID <code>nlpcraft:subcontinent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a subcontinent.
- In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:subcontinent:continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Name of the continent.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:subcontinent:subcontinent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Name of the subcontinent.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:metro" class="section-sub-title">Token ID <code>nlpcraft:metro</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a metro area.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:metro:metro</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Name of the metro area.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:region" class="section-sub-title">Token ID <code>nlpcraft:region</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a geographical region.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:region:region</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Name of the region.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:region:continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Continent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:region:subcontinent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Subcontinent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:region:countrymeta</b></code></td>
- <td><code>java.util.Map</code></td>
- <td>
- Supplemental metadata for the region's country (see below).
- </td>
- </tr>
- </tbody>
- </table>
- <p>
- The following table lists the possible keys of the <code><b>nlpcraft:region:countrymeta</b></code> map. The data is
- obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Key</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>iso</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>iso3</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO 3166 country code.</td>
- </tr>
- <tr>
- <td><code><b>isocode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>capital</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country capital city name.</td>
- </tr>
- <tr>
- <td><code><b>area</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Double</code></td>
- <td>Optional country surface area.</td>
- </tr>
- <tr>
- <td><code><b>population</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Long</code></td>
- <td>Optional country population.</td>
- </tr>
- <tr>
- <td><code><b>continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country continent.</td>
- </tr>
- <tr>
- <td><code><b>currencycode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency code.</td>
- </tr>
- <tr>
- <td><code><b>currencyname</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency name.</td>
- </tr>
- <tr>
- <td><code><b>phone</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country phone code.</td>
- </tr>
- <tr>
- <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code format.</td>
- </tr>
- <tr>
- <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code regular expression.</td>
- </tr>
- <tr>
- <td><code><b>languages</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of languages.</td>
- </tr>
- <tr>
- <td><code><b>neighbours</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of neighbours.</td>
- </tr>
- </tbody>
- </table>
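- <p>
- A minimal Java sketch of reading the <code><b>countrymeta</b></code> map defensively. It assumes that optional
- keys are simply absent from the map when no value is known. The class, the method names and the sample values
- are hypothetical and are not part of the NLPCraft API:
- </p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of NLPCraft: 'countryMeta' stands in for the
// value of the 'nlpcraft:region:countrymeta' metadata property.
class CountryMetaExample {
    // Mandatory key: cast directly to the documented Java type.
    static String iso(Map<String, Object> countryMeta) {
        return (String) countryMeta.get("iso");
    }

    // Optional key: assumed to be absent (null) when the value is unknown.
    static Optional<Long> population(Map<String, Object> countryMeta) {
        return Optional.ofNullable((Long) countryMeta.get("population"));
    }

    // Builds a short summary that tolerates a missing optional key.
    static String summary(Map<String, Object> countryMeta) {
        String pop = population(countryMeta).map(p -> p.toString()).orElse("n/a");
        return iso(countryMeta) + ": population=" + pop;
    }

    public static void main(String[] args) {
        // Sample values for illustration only, not taken from the actual dataset.
        Map<String, Object> meta = new HashMap<>();
        meta.put("iso", "XX");
        meta.put("population", 1000L);
        System.out.println(summary(meta));

        meta.remove("population");
        System.out.println(summary(meta));
    }
}
```

- <p>
- Wrapping optional keys in <code>java.util.Optional</code> keeps the handling of missing values in one place.
- </p>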
- <br/>
- <span id="nlpcraft:country" class="section-sub-title">Token ID <code>nlpcraft:country</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a country.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:country:country</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Name of the country.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:country:continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Continent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:country:subcontinent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Subcontinent name.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:country:countrymeta</b></code></td>
- <td><code>java.util.Map</code></td>
- <td>
- Supplemental metadata for the country (see below).
- </td>
- </tr>
- </tbody>
- </table>
- <p>
- The following table lists the possible keys of the <code><b>nlpcraft:country:countrymeta</b></code> map. The data is
- obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Key</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>iso</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>iso3</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO 3166 country code.</td>
- </tr>
- <tr>
- <td><code><b>isocode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>ISO country code.</td>
- </tr>
- <tr>
- <td><code><b>capital</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country capital city name.</td>
- </tr>
- <tr>
- <td><code><b>area</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Double</code></td>
- <td>Optional country surface area.</td>
- </tr>
- <tr>
- <td><code><b>population</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.Long</code></td>
- <td>Optional country population.</td>
- </tr>
- <tr>
- <td><code><b>continent</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country continent.</td>
- </tr>
- <tr>
- <td><code><b>currencycode</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency code.</td>
- </tr>
- <tr>
- <td><code><b>currencyname</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>Country currency name.</td>
- </tr>
- <tr>
- <td><code><b>phone</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country phone code.</td>
- </tr>
- <tr>
- <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code format.</td>
- </tr>
- <tr>
- <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country postal code regular expression.</td>
- </tr>
- <tr>
- <td><code><b>languages</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of languages.</td>
- </tr>
- <tr>
- <td><code><b>neighbours</b></code> <sub>opt.</sub></td>
- <td><code>java.lang.String</code></td>
- <td>Optional country list of neighbours.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:coordinate" class="section-sub-title">Token ID <code>nlpcraft:coordinate</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a latitude and longitude coordinate.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:coordinate:latitude</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>Coordinate latitude.</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:coordinate:longitude</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>Coordinate longitude.</td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:sort" class="section-sub-title">Token ID <code>nlpcraft:sort</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a sorting or ordering function.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:sort:subjindexes</b></code></td>
- <td><code>java.util.List&lt;Integer&gt;</code></td>
- <td>One or more indexes of the target tokens (i.e. the tokens being sorted).</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:sort:byindexes</b></code></td>
- <td><code>java.util.List&lt;Integer&gt;</code></td>
- <td>Zero or more (i.e. optional) indexes of the reference tokens (i.e. the tokens being sorted by).</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:sort:asc</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- <code>true</code> if sorting is in ascending order, <code>false</code> if in descending order.
- </td>
- </tr>
- </tbody>
- </table>
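- <p>
- A minimal Java sketch of interpreting the <code><b>nlpcraft:sort</b></code> properties listed above. A plain
- <code>java.util.Map</code> stands in for the token metadata. The class and method names are hypothetical, and the
- sketch assumes that <code>nlpcraft:sort:asc</code> is <code>true</code> for ascending order and that
- <code>nlpcraft:sort:byindexes</code> may be absent:
- </p>

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NLPCraft: 'meta' stands in for the
// metadata of a token with ID 'nlpcraft:sort'.
class SortMetaExample {
    @SuppressWarnings("unchecked")
    static String describeSort(Map<String, Object> meta) {
        // Indexes of the tokens being sorted (one or more).
        List<Integer> subj = (List<Integer>) meta.get("nlpcraft:sort:subjindexes");
        // Indexes of the tokens being sorted by (zero or more; assumed optional).
        List<Integer> by = (List<Integer>) meta.getOrDefault("nlpcraft:sort:byindexes", List.of());
        // Assumption: true means ascending order.
        boolean asc = Boolean.TRUE.equals(meta.get("nlpcraft:sort:asc"));

        return "sort tokens " + subj
            + (by.isEmpty() ? "" : " by tokens " + by)
            + (asc ? " ascending" : " descending");
    }

    public static void main(String[] args) {
        // Sample metadata for illustration only.
        Map<String, Object> meta = Map.of(
            "nlpcraft:sort:subjindexes", List.of(1),
            "nlpcraft:sort:byindexes", List.of(3),
            "nlpcraft:sort:asc", false
        );
        System.out.println(describeSort(meta));
    }
}
```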
- <br/>
- <span id="nlpcraft:limit" class="section-sub-title">Token ID <code>nlpcraft:limit</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a numeric limit value (like in "top 10" or "bottom five").
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:limit:indexes</b></code></td>
- <td><code>java.util.List&lt;Integer&gt;</code></td>
- <td>Index (always only one) of the reference token (i.e. the token being limited).</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:limit:asc</b></code></td>
- <td><code>java.lang.Boolean</code></td>
- <td>
- <code>true</code> if the limit order is ascending, <code>false</code> if it is descending.
- </td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:limit:limit</b></code></td>
- <td><code>java.lang.Integer</code></td>
- <td>
- Numeric value of the limit.
- </td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="nlpcraft:relation" class="section-sub-title">Token ID <code>nlpcraft:relation</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- This token denotes a relation function, i.e. a comparison or a correlation with another token.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>nlpcraft:relation:indexes</b></code></td>
- <td><code>java.util.List&lt;Integer&gt;</code></td>
- <td>Index (always only one) of the reference token (i.e. the token being related to).</td>
- </tr>
- <tr>
- <td><code><b>nlpcraft:relation:type</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Type of the relation. One of the following values:
- <ul>
- <li><code>compare</code></li>
- <li><code>correlate</code></li>
- </ul>
- </td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="google:xxx" class="section-sub-title">Token ID <code>google:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- These tokens denote named entities recognized by
- <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, where <code>xxx</code> is the
- lower case name of the named entity, e.g. <code>google:person</code>, <code>google:location</code>, etc.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>google:salience</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>Salience of this token as reported by Google Natural Language, i.e. the importance of the entity to the overall text.</td>
- </tr>
- <tr>
- <td><code><b>google:meta</b></code></td>
- <td><code>java.util.Map</code></td>
- <td>
- Map-based container for Google Natural Language specific properties.
- </td>
- </tr>
- <tr>
- <td><code><b>google:mentionsbeginoffsets</b></code></td>
- <td><code>java.util.List&lt;String&gt;</code></td>
- <td>
- List of the mention begin offsets in the original normalized text.
- </td>
- </tr>
- <tr>
- <td><code><b>google:mentionscontents</b></code></td>
- <td><code>java.util.List&lt;String&gt;</code></td>
- <td>
- List of the mentions.
- </td>
- </tr>
- <tr>
- <td><code><b>google:mentionstypes</b></code></td>
- <td><code>java.util.List&lt;String&gt;</code></td>
- <td>
- List of the mention types.
- </td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="stanford:xxx" class="section-sub-title">Token ID <code>stanford:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- These tokens denote named entities recognized by
- <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, where <code>xxx</code> is the
- lower case name of the named entity, e.g. <code>stanford:person</code>, <code>stanford:location</code>, etc.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>stanford:confidence</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>Probability, as reported by Stanford CoreNLP, that this token is correct.</td>
- </tr>
- <tr>
- <td><code><b>stanford:nne</b></code></td>
- <td><code>java.lang.String</code></td>
- <td>
- Normalized Named Entity (NNE) text.
- </td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="spacy:xxx" class="section-sub-title">Token ID <code>spacy:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
- <p>
- These tokens denote named entities recognized by
- <a target=_ href="https://spacy.io/">spaCy</a>, where <code>xxx</code> is the
- lower case name of the named entity, e.g. <code>spacy:person</code>, <code>spacy:location</code>, etc.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>spacy:vector</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>spaCy span vector. </td>
- </tr>
- <tr>
- <td><code><b>spacy:sentiment</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>
- A scalar value indicating the positivity or negativity of the token.
- </td>
- </tr>
- </tbody>
- </table>
- <br/>
- <span id="opennlp:xxx" class="section-sub-title">Token ID <code>opennlp:xxx</code></span>
- <p>
- These tokens denote named entities recognized by
- <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, where <code>xxx</code> is the
- lower case name of the named entity, e.g. <code>opennlp:person</code>, <code>opennlp:money</code>, etc.
- In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
- metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Property</th>
- <th>Java Type</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code><b>opennlp:probability</b></code></td>
- <td><code>java.lang.Double</code></td>
- <td>Probability, as reported by OpenNLP, that this token is correct.</td>
- </tr>
- </tbody>
- </table>
- </section>
-</div>
-<div class="col-md-2 third-column">
- <ul class="side-nav">
- <li class="side-nav-title">On This Page</li>
- <li><a href="#overview">Model Overview</a></li>
- <li><a href="#dataflow">Model Dataflow</a></li>
- <li><a href="#lifecycle">Model Lifecycle</a></li>
- <li><a href="#config">Model Configuration</a></li>
- <li><a href="#ne">Named Entities</a></li>
- <li><a href="#elements">Model Elements</a></li>
- <li><a class="toc2" href="#macros">Macros</a></li>
- <li><a class="toc2" href="#regex">Regular Expressions</a></li>
- <li><a class="toc2" href="#option-groups">Option Groups</a></li>
- <li><a class="toc2" href="#dsl">IDL Expression</a></li>
- <li><a class="toc2" href="#custom_ners">Custom NERs</a></li>
- <li><a href="#logic">Model Logic</a></li>
- <li><a href="#builtin">Built-In Tokens</a></li>
- <li><a href="#meta">Token Metadata</a></li>
- <li><a class="toc2" href="#nlpcraft:nlp"><code><b>nlpcraft:nlp</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:date"><code><b>nlpcraft:date</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:num"><code><b>nlpcraft:num</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:city"><code><b>nlpcraft:city</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:continent"><code><b>nlpcraft:continent</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:subcontinent"><code><b>nlpcraft:subcontinent</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:region"><code><b>nlpcraft:region</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:country"><code><b>nlpcraft:country</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:metro"><code><b>nlpcraft:metro</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:coordinate"><code><b>nlpcraft:coordinate</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:sort"><code><b>nlpcraft:sort</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:limit"><code><b>nlpcraft:limit</b></code></a></li>
- <li><a class="toc2" href="#nlpcraft:relation"><code><b>nlpcraft:relation</b></code></a></li>
- <li><a class="toc2" href="#stanford:xxx"><code><b>stanford:xxx</b></code></a></li>
- <li><a class="toc2" href="#spacy:xxx"><code><b>spacy:xxx</b></code></a></li>
- <li><a class="toc2" href="#google:xxx"><code><b>google:xxx</b></code></a></li>
- <li><a class="toc2" href="#opennlp:xxx"><code><b>opennlp:xxx</b></code></a></li>
- {% include quick-links.html %}
- </ul>
-</div>
diff --git a/docs.html b/docs.html
index db5d193..59d3808 100644
--- a/docs.html
+++ b/docs.html
@@ -63,9 +63,6 @@ id: overview
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
- <li><a href="#data-model">Data Model</a></li>
- <li><a href="#data-probe">Data Probe</a></li>
- <li><a href="#server">REST Server</a></li>
{% include quick-links.html %}
</ul>
</div>