Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/03/06 01:02:38 UTC

[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester

    [ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349653#comment-14349653 ] 

Michael McCandless commented on LUCENE-6339:
--------------------------------------------

This looks really nice!

I think AutomatonUtil is (nearly?) the same thing as
TokenStreamToAutomaton?  Can we somehow consolidate the two?

When I try to "ant test" with the patch on current 5.x some things are
angry:

{noformat}
    [mkdir] Created dir: /l/areek/lucene/build/suggest/classes/java
    [javac] Compiling 65 source files to /l/areek/lucene/build/suggest/classes/java
    [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java:597: warning: [cast] redundant cast to TopFieldDocs
    [javac]       TopFieldDocs hits = (TopFieldDocs) c.topDocs();
    [javac]                           ^
    [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java:208: error: local variable collector is accessed from within inner class; needs to be declared final
    [javac]               collector.collect(docID);
    [javac]               ^
    [javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionFieldsProducer.java:164: error: CompletionFieldsProducer.CompletionsTermsReader is not abstract and does not override abstract method getChildResources() in Accountable
    [javac]   private class CompletionsTermsReader implements Accountable {
    [javac]           ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 2 errors
    [javac] 1 warning
{noformat}
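
The NRTSuggester.java error in the output above looks like the pre-Java 8 rule that a local variable captured by an anonymous inner class must be declared final (that the build uses Java 7 source level is an inference from the message, not something stated here). A minimal, self-contained sketch of the usual fix follows. DocCollector, FinalCaptureSketch, visitAll and collectAll are hypothetical names for illustration, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector; not a Lucene type.
interface DocCollector {
  void collect(int docID);
}

class FinalCaptureSketch {
  // Both parameters are final so the anonymous class below may capture them
  // under Java 7 rules (Java 8+ also accepts effectively final locals).
  static void visitAll(final int maxDoc, final DocCollector collector) {
    Runnable visitor = new Runnable() {
      @Override
      public void run() {
        for (int docID = 0; docID < maxDoc; docID++) {
          collector.collect(docID); // legal: collector is final
        }
      }
    };
    visitor.run();
  }

  // Collects every doc ID in [0, maxDoc) into a list.
  static List<Integer> collectAll(int maxDoc) {
    final List<Integer> seen = new ArrayList<>();
    visitAll(maxDoc, new DocCollector() {
      @Override
      public void collect(int docID) {
        seen.add(docID);
      }
    });
    return seen;
  }
}
```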

Not sure why we need an FSTBuilder inside the NRTSuggesterBuilder;
can't the former be absorbed into the latter?  Can NRTSuggesterBuilder
be package private?  I.e., is the public API here just the postings format
and SuggestIndexSearcher / TopSuggestDocs?  I think other things can be
private, e.g. CompletionTokenStream.

Can you use CodecUtil.writeIndexHeader when storing the FST?  It also
stores the segment ID and file extension in the header.  And then
CodecUtil.checkIndexHeader at read-time.
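
To illustrate why a header carrying the segment ID and suffix helps, here is a rough plain-Java sketch of the write-then-verify idea. It does not call CodecUtil and does not reproduce Lucene's byte layout; the MAGIC value, the 16-byte ID length, the "completion" codec name and the "lkp" suffix are all assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class IndexHeaderSketch {
  static final int MAGIC = 0x53554747; // arbitrary marker for this sketch
  static final int ID_LENGTH = 16;     // assumed segment ID length

  // Writes marker, codec name, version, segment ID and file suffix.
  static void writeHeader(DataOutputStream out, String codec, int version,
                          byte[] segmentId, String suffix) throws IOException {
    if (segmentId.length != ID_LENGTH) {
      throw new IllegalArgumentException("segment ID must be " + ID_LENGTH + " bytes");
    }
    out.writeInt(MAGIC);
    out.writeUTF(codec);
    out.writeInt(version);
    out.write(segmentId);
    out.writeUTF(suffix);
  }

  // Verifies every header field and returns the version that was read.
  static int checkHeader(DataInputStream in, String codec, int minVersion, int maxVersion,
                         byte[] expectedId, String expectedSuffix) throws IOException {
    if (in.readInt() != MAGIC) throw new IOException("bad magic");
    String actualCodec = in.readUTF();
    if (!actualCodec.equals(codec)) throw new IOException("codec mismatch: " + actualCodec);
    int version = in.readInt();
    if (version < minVersion || version > maxVersion) {
      throw new IOException("unsupported version: " + version);
    }
    byte[] id = new byte[ID_LENGTH];
    in.readFully(id);
    if (!Arrays.equals(id, expectedId)) throw new IOException("file belongs to another segment");
    String suffix = in.readUTF();
    if (!suffix.equals(expectedSuffix)) throw new IOException("suffix mismatch: " + suffix);
    return version;
  }

  // Writes a header plus a dummy payload, then reads it back expecting expectedId.
  static boolean roundTrips(byte[] writtenId, byte[] expectedId) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      writeHeader(out, "completion", 1, writtenId, "lkp");
      out.writeInt(42); // stands in for the serialized FST
      out.flush();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      int version = checkHeader(in, "completion", 1, 1, expectedId, "lkp");
      return version == 1 && in.readInt() == 42;
    } catch (IOException e) {
      return false;
    }
  }
}
```

The point of the ID check is that a file copied in from a different segment fails fast at open time instead of being decoded as if it were valid.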

Should CompletionsTermsReader.lookup() be synchronized?  Otherwise two
threads could try to use the IndexInput (dictIn) at once.
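
A small sketch of the concern, in plain Java rather than against the patch: a shared input with a current position makes "seek, then read" a two-step operation, so the pair has to be atomic. SharedInputSketch is a hypothetical class, and a ByteBuffer stands in for the shared input here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicBoolean;

class SharedInputSketch {
  // Stateful: has a current position, like a shared file input.
  private final ByteBuffer dictIn;

  SharedInputSketch(int[] values) {
    dictIn = ByteBuffer.allocate(values.length * 4);
    for (int v : values) {
      dictIn.putInt(v);
    }
  }

  // synchronized makes the seek and the read one atomic step; without it a
  // second thread could move the position between the two calls.
  synchronized int lookup(int ord) {
    dictIn.position(ord * 4); // seek
    return dictIn.getInt();   // read at the current position
  }

  // Hammers lookup() from several threads and reports whether every read
  // returned the value stored at the requested ordinal.
  static boolean concurrentLookupsAgree(int threads, final int iterations) {
    final int[] values = new int[64];
    for (int i = 0; i < values.length; i++) {
      values[i] = i * 31;
    }
    final SharedInputSketch reader = new SharedInputSketch(values);
    final AtomicBoolean ok = new AtomicBoolean(true);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int seed = t;
      workers[t] = new Thread(new Runnable() {
        @Override
        public void run() {
          for (int i = 0; i < iterations; i++) {
            int ord = (seed + i) % values.length;
            if (reader.lookup(ord) != values[ord]) {
              ok.set(false);
            }
          }
        }
      });
      workers[t].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return ok.get();
  }
}
```

Giving each caller its own clone of the input is another common way to avoid the shared position, at the cost of a clone per lookup.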

Maybe we should move the code in SuggestIndexSearcher.suggest into
a new TopSuggestDocs.merge method?
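
Roughly what such a merge could look like, in the spirit of TopDocs.merge: a k-way merge of per-segment hit lists that are each already sorted by descending weight. SuggestHit and MergeSketch are hypothetical stand-ins, not the patch's classes, and the sketch assumes doc IDs are already global.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: a doc ID plus the suggestion weight.
class SuggestHit {
  final int doc;
  final long weight;

  SuggestHit(int doc, long weight) {
    this.doc = doc;
    this.weight = weight;
  }
}

class MergeSketch {
  // Merges per-leaf lists (each sorted by descending weight) into the global top N.
  static List<SuggestHit> merge(int topN, final List<List<SuggestHit>> perLeaf) {
    // Queue entries are {leafIndex, positionInLeaf}; highest weight first,
    // ties broken by leaf index.
    PriorityQueue<int[]> queue = new PriorityQueue<>(
        Math.max(1, perLeaf.size()),
        new Comparator<int[]>() {
          @Override
          public int compare(int[] a, int[] b) {
            long wa = perLeaf.get(a[0]).get(a[1]).weight;
            long wb = perLeaf.get(b[0]).get(b[1]).weight;
            int cmp = Long.compare(wb, wa);
            return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
          }
        });
    for (int leaf = 0; leaf < perLeaf.size(); leaf++) {
      if (!perLeaf.get(leaf).isEmpty()) {
        queue.add(new int[] {leaf, 0});
      }
    }
    List<SuggestHit> merged = new ArrayList<>();
    while (merged.size() < topN && !queue.isEmpty()) {
      int[] top = queue.poll();
      List<SuggestHit> hits = perLeaf.get(top[0]);
      merged.add(hits.get(top[1]));
      if (top[1] + 1 < hits.size()) {
        queue.add(new int[] {top[0], top[1] + 1}); // advance within that leaf
      }
    }
    return merged;
  }

  // Merges two sample leaves and returns the weights of the merged hits.
  static long[] sampleWeights(int topN) {
    List<List<SuggestHit>> perLeaf = new ArrayList<>();
    perLeaf.add(Arrays.asList(new SuggestHit(0, 5), new SuggestHit(1, 2)));
    perLeaf.add(Arrays.asList(new SuggestHit(7, 4), new SuggestHit(8, 3), new SuggestHit(9, 1)));
    List<SuggestHit> merged = merge(topN, perLeaf);
    long[] weights = new long[merged.size()];
    for (int i = 0; i < weights.length; i++) {
      weights[i] = merged.get(i).weight;
    }
    return weights;
  }
}
```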

Do we really need the separate SegmentLookup interface?  It seems like we
could just invoke the lookup method directly on CompletionTerms.

Why do we allow a weight of -1?  And why do we restrict the weight to int
rather than long (I think the other suggesters use long, though it does
seem like overkill!)?


> [suggest] Near real time Document Suggester
> -------------------------------------------
>
>                 Key: LUCENE-6339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6339
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.0
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: 5.0
>
>         Attachments: LUCENE-6339.patch
>
>
> The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key.
> A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.
> Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.
> A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.
> h4. Usage
> {code:java}
>   // hook up custom postings format
>   // indexAnalyzer for SuggestField
>   Analyzer analyzer = ...
>   IndexWriterConfig config = new IndexWriterConfig(analyzer);
>   Codec codec = new Lucene50Codec() {
>     @Override
>     public PostingsFormat getPostingsFormatForField(String field) {
>       if (isSuggestField(field)) {
>         return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
>       }
>       return super.getPostingsFormatForField(field);
>     }
>   };
>   config.setCodec(codec);
>   IndexWriter writer = new IndexWriter(dir, config);
>   // index some documents with suggestions
>   Document doc = new Document();
>   doc.add(new SuggestField("suggest_title", "title1", 2));
>   doc.add(new SuggestField("suggest_name", "name1", 3));
>   writer.addDocument(doc);
>   ...
>   // open an nrt reader for the directory
>   DirectoryReader reader = DirectoryReader.open(writer, false);
>   // SuggestIndexSearcher is a thin wrapper over IndexSearcher
>   // queryAnalyzer will be used to analyze the query string
>   SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
>   
>   // suggest 10 documents for "titl" on "suggest_title" field
>   TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
> {code}
> h4. Indexing
> The index analyzer is set through *IndexWriterConfig*.
> {code:java}
> SuggestField(String name, String value, long weight) 
> {code}
> h4. Query
> The query analyzer is set through *SuggestIndexSearcher*.
> Hits are collected in descending order of the suggestion's weight.
> {code:java}
> // full options for TopSuggestDocs (TopDocs)
> TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)
> // full options for Collector
> // note: only collects, does not score
> void suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
> {code}
> h4. Analyzer
> *CompletionAnalyzer* can be used instead to wrap another analyzer and tune parameters that apply only to suggest fields.
> {code:java}
> CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
> completionAnalyzer.setPreserveSep(..)
> completionAnalyzer.setPreservePositionsIncrements(..)
> completionAnalyzer.setMaxGraphExpansions(..)
> {code}


