Posted to dev@lucene.apache.org by "Areek Zillur (JIRA)" <ji...@apache.org> on 2015/03/04 23:47:38 UTC
[jira] [Updated] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Areek Zillur updated LUCENE-6339:
---------------------------------
Description:
The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key.
Each *SuggestField* can be assigned a numeric weight, which is used to score the suggestion at query time.
Suggestions are made against an indexed *SuggestField*. The suggester filters out deleted documents in near real time, and can also filter documents with a Filter at query time (note: this may change to a non-scoring query?).
A custom postings format (CompletionPostingsFormat) is used to index *SuggestField*s and perform document suggestions.
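The behaviour described above (prefix match on a key, ranking by weight, skipping deleted or filtered documents) can be modelled with a small in-memory sketch. This is a toy illustration only, not the patch's implementation, which relies on a custom postings format. The names ToySuggester, Entry, and the accept predicate are hypothetical and do not come from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

// Toy in-memory model of a weighted document suggester (not Lucene code).
class ToySuggester {
    // One indexed suggestion: the owning document, its text, and its weight.
    static final class Entry {
        final int docId;
        final String value;
        final long weight;

        Entry(int docId, String value, long weight) {
            this.docId = docId;
            this.value = value;
            this.weight = weight;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(int docId, String value, long weight) {
        entries.add(new Entry(docId, value, weight));
    }

    // Returns up to num entries whose value starts with key, highest weight first.
    // accept stands in for the live-docs check plus any query-time filter.
    List<Entry> suggest(String key, int num, IntPredicate accept) {
        return entries.stream()
            .filter(e -> e.value.startsWith(key))
            .filter(e -> accept.test(e.docId))
            .sorted((a, b) -> Long.compare(b.weight, a.weight))
            .limit(num)
            .collect(Collectors.toList());
    }
}
```

Passing a predicate such as d -> d != 1 to suggest mimics a deleted document 1: it no longer appears in the results even though its entry is still stored.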
h4. Usage
{code:java}
// hook up the custom postings format
// analyzer is the index-time analyzer for SuggestField values
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    // isSuggestField(..) is not defined in this snippet; supply your own check
    if (isSuggestField(field)) {
      return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);
IndexWriter writer = new IndexWriter(dir, config);
// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...
// open an NRT reader on the writer
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
// suggest 10 documents for "titl" on the "suggest_title" field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}
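The usage snippet calls isSuggestField(field) without defining it. One possible way to fill that gap, assuming suggest fields follow the "suggest_" naming convention seen in the example field names, is sketched below. The class name SuggestFields and the prefix convention are assumptions for illustration and are not part of the patch.

```java
// Hypothetical helper: the usage snippet leaves isSuggestField undefined.
class SuggestFields {
    // Assumed naming convention, taken from the example field names
    // "suggest_title" and "suggest_name".
    static final String PREFIX = "suggest_";

    // Returns true when the field name follows the assumed convention.
    static boolean isSuggestField(String field) {
        return field != null && field.startsWith(PREFIX);
    }
}
```

An explicit set of field names would work equally well; the prefix check simply avoids keeping a registry in sync with the schema.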
h4. Indexing
The index analyzer is set through *IndexWriterConfig*.
{code:java}
new SuggestField(name, suggestion, weight)
{code}
h4. Query
The query analyzer is set through *SuggestIndexSearcher*.
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest = suggestIndexSearcher.suggest(String field, CharSequence key, int num, Filter filter)
// full options for Collector
// note: only collects, does not score
suggestIndexSearcher.suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
{code}
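The maxNumPerLeaf parameter implies that hits are gathered per index leaf (segment). How the patch combines them is not stated here, so the following is only a sketch under the assumption that per-leaf hits are merged into one global top-N list ordered by weight. The names TopNMerge and Hit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: merges per-leaf hit lists into one global top-N list.
class TopNMerge {
    static final class Hit {
        final int doc;
        final long weight;

        Hit(int doc, long weight) {
            this.doc = doc;
            this.weight = weight;
        }
    }

    // Keeps the num highest-weight hits across all leaves, best first.
    static List<Hit> merge(List<List<Hit>> perLeaf, int num) {
        // Min-heap on weight: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (List<Hit> leaf : perLeaf) {
            for (Hit h : leaf) {
                heap.offer(h);
                if (heap.size() > num) {
                    heap.poll(); // drop the current weakest hit
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Long.compare(b.weight, a.weight));
        return out;
    }
}
```

A bounded min-heap keeps memory proportional to num rather than to the total number of hits, which is the usual reason for choosing it over sorting everything.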
h4. Analyzer
*CompletionAnalyzer* can be used instead to wrap another analyzer and tune parameters that apply only to suggest fields.
{code:java}
CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
completionAnalyzer.setPreserveSep(..)
completionAnalyzer.setPreservePositionsIncrements(..)
completionAnalyzer.setMaxGraphExpansions(..)
{code}
> [suggest] Near real time Document Suggester
> -------------------------------------------
>
> Key: LUCENE-6339
> URL: https://issues.apache.org/jira/browse/LUCENE-6339
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.0
> Reporter: Areek Zillur
> Assignee: Areek Zillur
> Fix For: 5.0