Posted to issues@opennlp.apache.org by "Xiang Zhang (JIRA)" <ji...@apache.org> on 2018/03/25 21:57:00 UTC

[jira] [Commented] (OPENNLP-755) Add support to use a stop word list in query building

    [ https://issues.apache.org/jira/browse/OPENNLP-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413210#comment-16413210 ] 

Xiang Zhang commented on OPENNLP-755:
-------------------------------------

Hi, is there any clarification on how I can start working on this?

> Add support to use a stop word list in query building
> -----------------------------------------------------
>
>                 Key: OPENNLP-755
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-755
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Entity Linker
>            Reporter: Joern Kottmann
>            Priority: Major
>
> The geocoder in its current version might create queries that match on terms that should not drive the match. These terms could be listed in a stop word list, which could then be used to construct queries that match only the desired terms.
> For example:
> <START> New York City <END> is not in Slovenia
> This currently matches a hotel called "BTC City".
> The index is searched for all terms in the mention. The problem is that if only "City" matches, or only "New" and "City" match, the result is poor.
> Many place names contain the word "City", so it does little to disambiguate the matches.
> There should be some special logic dealing with stop words.
> The stop words could be removed from the location mention or, better, used only for boosting.
> For the case above it could be like this:
> - MUST match York
> - SHOULD match New OR City
> If a name consists only of stop words, e.g. "New City", we could require that the mention matches only the entire name.
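
As a starting point, the rule described above could be sketched roughly as follows. This is only an illustration in plain Java and not the Entity Linker's actual code: the class name, the method name and the stop word list are all made up for the example. It emits a string in the classic Lucene query parser syntax, where a leading "+" marks a required (MUST) term and an unprefixed term is optional (SHOULD), and it approximates "match the entire name" with a required phrase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordQuerySketch {

    // Hypothetical stop word list, for illustration only; not shipped with OpenNLP.
    static final Set<String> STOP_WORDS = Set.of("new", "city");

    /**
     * Builds a Lucene-style query string for a location mention.
     * Non-stop-words become required terms ("+term"); stop words stay
     * optional, so they only contribute to the score.
     * Note: query syntax special characters are not escaped in this sketch.
     */
    static String buildQuery(String mention, Set<String> stopWords) {
        String trimmed = mention.trim();
        if (trimmed.isEmpty()) {
            return "";
        }

        List<String> clauses = new ArrayList<>();
        boolean hasContentTerm = false;

        for (String term : trimmed.split("\\s+")) {
            if (stopWords.contains(term.toLowerCase(Locale.ROOT))) {
                clauses.add(term);        // SHOULD match: boosts only
            } else {
                clauses.add("+" + term);  // MUST match
                hasContentTerm = true;
            }
        }

        // A name made up only of stop words: require the whole name as a
        // phrase (an approximation of "matches only the entire name").
        if (!hasContentTerm) {
            return "+\"" + trimmed + "\"";
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("New York City", STOP_WORDS)); // New +York City
        System.out.println(buildQuery("New City", STOP_WORDS));      // +"New City"
    }
}
```

With this sketch a hotel called "BTC City" would no longer match the mention "New York City", because the required term "York" is missing. Whether the real fix should build a query string or construct query objects directly depends on how the geocoder currently talks to its index.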


