Posted to dev@lucene.apache.org by cbeer <gi...@git.apache.org> on 2018/01/17 17:33:09 UTC

[GitHub] lucene-solr pull request #308: Add a suggester that operates on tokenized va...

GitHub user cbeer opened a pull request:

    https://github.com/apache/lucene-solr/pull/308

    Add a suggester that operates on tokenized values from a field

    The `TokenizingSuggester` is suspiciously similar to the `AnalyzingInfixSuggester` (and presumably it could be merged into or extend the `AnalyzingInfixSuggester`), but it adds one feature, the `tokenizingAnalyzer`, which allows us to pre-tokenize suggestions into a manageable size (perhaps single words, shingles of multiple words, or even NLP-extracted noun phrases).
    
    Our use case is providing autocomplete suggestions for searching within the OCR text of a document (searching within is powered by highlighting), and we're dealing with some page-level OCR that can easily exceed the 32k size limit for the `AnalyzingInfixSuggester`'s `exacttext` string field.
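
To make the size argument concrete, here is a minimal sketch in plain Java of what pre-tokenizing a long value into word shingles buys. It is not the code from this pull request and does not use the Lucene analysis API: the class `ShingleSketch` and its methods are hypothetical, whitespace splitting stands in for a real `tokenizingAnalyzer`, and the constant assumes the "32k" limit refers to Lucene's maximum indexed term length of 32766 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the pull request.
class ShingleSketch {
    // Assumption: the "32k" limit is Lucene's maximum term length in bytes.
    static final int MAX_TERM_BYTES = 32766;

    // Split text on whitespace and emit every run of 1..maxShingleSize
    // consecutive words, joined by single spaces.
    static List<String> shingles(String text, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int n = 0; n < maxShingleSize && start + n < words.length; n++) {
                if (n > 0) {
                    shingle.append(' ');
                }
                shingle.append(words[start + n]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    // True if the value's UTF-8 encoding fits in a single indexed term.
    static boolean fitsInTerm(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        // Stand-in for one page of OCR text: 50,000 bytes of ASCII.
        String page = "word ".repeat(10000);
        System.out.println("whole page fits: " + fitsInTerm(page));

        boolean allFit = true;
        for (String s : shingles(page, 3)) {
            allFit &= fitsInTerm(s);
        }
        System.out.println("all shingles fit: " + allFit);
    }
}
```

The whole page is too large to store as one term, while each shingle is only a few words long and stays far below the limit, which is the property the description above relies on.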
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cbeer/lucene-solr tokenizing-suggester-upstreamable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308
    
----
commit c516bcaabbe6214ba4938859d6775ae7992fed0a
Author: Chris Beer <ca...@...>
Date:   2018-01-16T21:29:51Z

    Add a suggester that operates on tokenized values from a field

----


---
