Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2015/02/02 15:51:34 UTC

[jira] [Created] (OPENNLP-755) Add support to use stop word list in query building

Joern Kottmann created OPENNLP-755:
--------------------------------------

             Summary: Add support to use stop word list in query building
                 Key: OPENNLP-755
                 URL: https://issues.apache.org/jira/browse/OPENNLP-755
             Project: OpenNLP
          Issue Type: Improvement
          Components: Entity Linker
            Reporter: Joern Kottmann


The geocoder in its current version can build queries that match on terms which should not drive the match. Such terms could be collected in a stop word list, and that list could then be used to construct queries which match only on the desired terms.

For example:
<START> New York City <END> is not in Slovenia

This currently matches a hotel called "BTC City".

The index is searched for all terms in the mention. The problem is that a hit on "City" alone, or on only "New" and "City", gives a poor result.

Many place names contain the word "City" and that doesn't help much to disambiguate the matches.

There should be some special logic dealing with stop words.
The stop words could be removed from the location mention or, better, used only for boosting.

For the case above it could be like this:
- MUST match York
- SHOULD match New OR City

If a name consists only of stop words, e.g. "New City", we could require that the mention matches the entire name.
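
A rough sketch of the MUST/SHOULD rule above, in plain Java. It assumes the query ends up as a string in Lucene's classic query syntax, where a leading "+" marks a required term, a bare term is optional and a quoted string is a phrase. The class name, the method name and the stop word set are made up for illustration; this is not the existing geocoder code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: turns a location mention into a
// query string where stop words are optional and all other terms are required.
final class StopWordQuerySketch {

    static String buildQuery(String mention, Set<String> stopWords) {
        List<String> must = new ArrayList<>();
        List<String> should = new ArrayList<>();

        for (String token : mention.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank mention
            }
            // Assumption: the stop word list is stored in lower case.
            if (stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                should.add(token); // SHOULD: only contributes to the score
            } else {
                must.add(token);   // MUST: a hit is required
            }
        }

        if (must.isEmpty()) {
            // The name consists only of stop words: require the whole
            // mention as one phrase instead of matching single terms.
            return should.isEmpty() ? "" : "+\"" + String.join(" ", should) + "\"";
        }

        StringBuilder query = new StringBuilder();
        for (String term : must) {
            query.append('+').append(term).append(' ');
        }
        for (String term : should) {
            query.append(term).append(' ');
        }
        return query.toString().trim();
    }
}
```

With the stop words "new" and "city", buildQuery("New York City", ...) gives +York New City, and buildQuery("New City", ...) gives +"New City". A required phrase is only an approximation of "matches the entire name"; a strict whole-name match would probably need an untokenized name field in the index, which is also an assumption here.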



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)