Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/06/05 18:32:00 UTC

[jira] [Resolved] (SOLR-12376) New TaggerRequestHandler (aka SolrTextTagger)

     [ https://issues.apache.org/jira/browse/SOLR-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved SOLR-12376.
---------------------------------
    Resolution: Fixed

> New TaggerRequestHandler (aka SolrTextTagger)
> ---------------------------------------------
>
>                 Key: SOLR-12376
>                 URL: https://issues.apache.org/jira/browse/SOLR-12376
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>             Fix For: 7.4
>
>         Attachments: SOLR-12376.patch, SOLR-12376.patch, SOLR-12376.patch
>
>
> This issue introduces a new RequestHandler: {{TaggerRequestHandler}}, AKA the SolrTextTagger from the OpenSextant project [https://github.com/OpenSextant/SolrTextTagger]. It's used for named entity recognition (NER) of text passed to it. It doesn't do any NLP (outside of Lucene text analysis), so it's said to be a "naive tagger", but it's definitely useful as-is, and a more complete NER or ERD (entity recognition and disambiguation) system can be built with this as a key component. The SolrTextTagger has been used on queries for query understanding, on full text, and with dictionaries that number tens of millions of entries. Since it's small and has seen a lot of use (including helping win an ERD competition and in [Apache Stanbol|https://stanbol.apache.org/]), several people have asked me when it will be in Solr, or why it isn't already. So here it is.
> To use it, first you need a collection of documents that have a name-like field (short text) indexed with the ConcatenateFilter (LUCENE-8323) at the end of the analysis chain. We call this the dictionary. Once that's in place, you simply post text to a {{TaggerRequestHandler}} and it returns the offset pairs into that text for matches in the dictionary, along with the uniqueKey of the matching documents. It can also return other document data, if desired. That's the gist; I'll add more details on use to the Solr Reference Guide.
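
To make the "offset pairs plus uniqueKey" idea concrete, here is a minimal in-memory sketch of naive dictionary tagging. It only illustrates the shape of the result described above; it is not how the handler is implemented and it does not call Solr. The function names, the whitespace-and-case normalization, the longest-match rule, and the document ids are all assumptions made for this example.

```python
import re
from typing import Dict, List, Tuple

Tag = Tuple[int, int, List[str]]  # (start offset, end offset, matching ids)


def build_dictionary(docs: Dict[str, str]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each name, as a tuple of lowercased tokens, to the ids that carry it."""
    dictionary: Dict[Tuple[str, ...], List[str]] = {}
    for doc_id, name in docs.items():
        key = tuple(tok.lower() for tok in re.findall(r"\w+", name))
        if key:
            dictionary.setdefault(key, []).append(doc_id)
    return dictionary


def tag(text: str, dictionary: Dict[Tuple[str, ...], List[str]]) -> List[Tag]:
    """Return non-overlapping, longest-first matches as character offset pairs."""
    # Each token keeps its character offsets so matches can point back into text.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    max_len = max((len(key) for key in dictionary), default=0)
    tags: List[Tag] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[0] for tok in tokens[i:i + n])
            if key in dictionary:
                start = tokens[i][1]
                end = tokens[i + n - 1][2]
                tags.append((start, end, dictionary[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionary: id -> name-like field value.
EXAMPLE_DOCS = {"doc-nyc": "New York City", "doc-ny": "New York", "doc-lon": "London"}
EXAMPLE_TAGS = tag("Hello New York City and London", build_dictionary(EXAMPLE_DOCS))
```

In this sketch each tag is a (start, end, ids) triple, so `text[start:end]` recovers the matched substring. Preferring the longest phrase means "New York City" is reported once rather than also reporting the shorter "New York" inside it; that choice is specific to this example.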


