Posted to dev@lucene.apache.org by "Areek Zillur (JIRA)" <ji...@apache.org> on 2015/05/09 23:43:01 UTC

[jira] [Comment Edited] (LUCENE-6459) [suggest] Query Interface for suggest API

    [ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536908#comment-14536908 ] 

Areek Zillur edited comment on LUCENE-6459 at 5/9/15 9:42 PM:
--------------------------------------------------------------

Thanks [~mikemccand] for taking a look!
{quote}
It seems like the overall idea is to make a generic index-time and
search-time API that other suggesters could use, but for now it's just
NRTSuggester using it? Do we expect the non-document based suggesters
to also eventually be able to use this API?
{quote}

At the moment, only NRTSuggester is using it. We should use this API for other suggesters, 
but maybe in a separate issue :). For [LUCENE-6464|https://issues.apache.org/jira/browse/LUCENE-6464], the existing {{ContextQuery}} needs 
to add support for {{BooleanClause.Occur}}.

{quote}
This patch also adds new capabilities to NRTSuggester, like fuzzy and
regexp suggestions? What other new functions are exposed? What
use-cases do you see for RegexCompletionQuery?
{quote} 

In terms of new functionality, fuzzy, regex and context queries are added. One thing to
note: for fuzzy and context queries, the suggestion scores are influenced by the common
prefix length (w.r.t. the query term) and the query-time context boosts respectively,
along with the index-time suggestion weights.
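
To make the scoring concrete, here is a minimal sketch of the formulas given in the issue description below ({{suggestion_weight + (global_maximum_weight * boost)}}). It is a plain-Java model, not the patch's code; the class name, method names and sample weights are assumptions made up for illustration.

```java
// Illustrative model of the scoring formulas in the issue description; not Lucene code.
final class SuggestScoreSketch {

    // Fuzzy boost: number of leading characters the suggestion shares with the query term.
    static int commonPrefixLength(String query, String suggestion) {
        int limit = Math.min(query.length(), suggestion.length());
        int i = 0;
        while (i < limit && query.charAt(i) == suggestion.charAt(i)) {
            i++;
        }
        return i;
    }

    // suggestion_weight + (global_maximum_weight * boost)
    static long score(int suggestionWeight, int globalMaximumWeight, int boost) {
        return suggestionWeight + (long) globalMaximumWeight * boost;
    }

    // ContextQuery wrapping a FuzzyCompletionQuery: the two boosts are added before scaling.
    static long scoreWithContext(int suggestionWeight, int globalMaximumWeight,
                                 int contextBoost, int fuzzyBoost) {
        return score(suggestionWeight, globalMaximumWeight, contextBoost + fuzzyBoost);
    }

    public static void main(String[] args) {
        int globalMaximumWeight = 10; // hypothetical largest index-time weight
        String query = "seper";
        // Hypothetical index-time weights: "separate" = 2, "superstitious" = 9.
        long separate = score(2, globalMaximumWeight, commonPrefixLength(query, "separate"));
        long superstitious = score(9, globalMaximumWeight, commonPrefixLength(query, "superstitious"));
        System.out.println("separate=" + separate + " superstitious=" + superstitious);
        // prints "separate=32 superstitious=19"
    }
}
```

Since the boost is multiplied by the global maximum weight, a suggestion with a larger boost is never outranked by one with a smaller boost, as long as every index-time weight lies between 0 and the global maximum (an assumption of this sketch).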

IMO RegexCompletionQuery can be used to query multiple prefixes in one go. This allows
for a simpler query analyzer while still giving the power to query for synonyms, domain-specific typos, etc.
In the future, we can also add boosting (like in ContextQuery), where query-time boosts can be
specified for some matched prefixes of the regex pattern.
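
As a rough illustration of "multiple prefixes in one go", the sketch below keeps every suggestion that starts with some string matched by a single pattern. It uses {{java.util.regex}} over an in-memory list purely for demonstration; the patch intersects an automaton with the FST, and Lucene's regular expression syntax differs from Java's. The class name, method name and the pattern {{star(wa|tr)}} are assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative only: models prefix-of-regex matching with java.util.regex, not Lucene's automata.
final class RegexPrefixSketch {

    // Keeps the suggestions whose beginning is matched by the regular expression.
    static List<String> matchPrefixes(String regex, List<String> suggestions) {
        Pattern pattern = Pattern.compile(regex);
        return suggestions.stream()
                // lookingAt() anchors at the start but does not require a full match
                .filter(s -> pattern.matcher(s).lookingAt())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("starwars", "startrek", "stargate", "superstar");
        System.out.println(matchPrefixes("star(wa|tr)", indexed));
        // prints "[starwars, startrek]"
    }
}
```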

{quote}
Should FSTPath.toString also include the context?
{quote}
It should, will change.

{quote}
but there are some differences, e.g. we pass an Analyzer to the
completion queries (so they can build the automaton)
{quote}

Open to suggestions on improving this :)

{quote}
If you try to use ContextQuery against a field that you had not
indexed contexts with (using ContextSuggestField) do you see any
error? Maybe this is too hard.
{quote}
 
There should not be any error. A ContextQuery will never be run on a SuggestField;
CompletionQuery rewrites appropriately given the type of the field (context-enabled or not).
This also makes non-context queries work as expected when run against a ContextSuggestField
(the query is wrapped in a ContextQuery with no context filtering/boosting).

If a ContextSuggestField is indexed with no context, then a null context is extracted at query
time for the entry. Entries with no context will only be returned if a wildcard context '*' is
specified (the default behaviour of ContextQuery).
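
The null-context behaviour can be pictured with a small in-memory model. This is only a sketch of the semantics described above, not the patch's implementation: the {{Entry}} class, the convention that an empty context set stands for the wildcard '*', and the sample data are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Illustrative model of context filtering; not Lucene code.
final class ContextFilterSketch {

    static final class Entry {
        final String value;
        final String context; // null when the entry was indexed without a context

        Entry(String value, String context) {
            this.value = value;
            this.context = context;
        }
    }

    // An empty context set stands in for the wildcard '*' (assumed to be the default).
    static List<String> query(List<Entry> entries, Set<String> contexts) {
        boolean wildcard = contexts.isEmpty();
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            // Null-context entries can only be reached through the wildcard.
            if (wildcard || (e.context != null && contexts.contains(e.context))) {
                hits.add(e.value);
            }
        }
        return hits;
    }

    // Hypothetical sample data: two entries with contexts, one without.
    static List<Entry> sample() {
        return Arrays.asList(
                new Entry("finance.xls", "user1"),
                new Entry("finals.doc", "user2"),
                new Entry("findings.txt", null));
    }
}
```

With this model, {{query(sample(), Set.of())}} returns all three values, while {{query(sample(), Set.of("user1"))}} returns only {{finance.xls}}.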

{quote}
Are you allowed to mix ContextSuggestField and SuggestField even for
the same field name, within one suggester?
{quote}

No, you are not. If mixed, the CompletionQuery rewrite will throw an IllegalStateException
when a query is run against the mixed field. Ideally, it should error out at indexing time?




> [suggest] Query Interface for suggest API
> -----------------------------------------
>
>                 Key: LUCENE-6459
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6459
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.1
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: Trunk, 5.x, 5.1
>
>         Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch
>
>
> This patch factors out the common indexing/search API used by the recently introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. The motivation is to provide a query interface for FST-based fields (*SuggestField* and *ContextSuggestField*) to enable suggestion scoring and more powerful automaton queries.
> Previously, only prefix ‘queries’ with index-time weights were supported, but we can now also support:
> * Prefix queries expressed as regular expressions:  get suggestions that match multiple prefixes
>       ** Example: _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo-tolerant suggestions scored by how close they are to the query prefix
>     ** Example: querying for _seper_ will score _separate_ higher than _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their indexed contexts (metadata)
>     ** Example: get typo-tolerant suggestions on song names with prefix _like a roling_, boosting songs with genre _rock_ and _indie_
>     ** Example: get suggestions for all file names starting with _finan_ only for _user1_ and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A *CompletionQuery* produces a *CompletionWeight*, which allows *CompletionQuery* implementations to pass in an automaton that will be intersected with an FST, and allows boosting and metadata extraction from the intersected partial paths. A *CompletionWeight* produces a *CompletionScorer*. A *CompletionScorer* executes a Top N search against the FST with the provided automaton, scoring and filtering all matched paths.
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text.
> Documents are sorted according to their suggest field weight.
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h4. RegexCompletionQuery
> Return documents with values that match the prefix of a regular expression.
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit distance of an analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the suggestion with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all integers. 
> {{boost = # of prefix characters matched}}
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by provided context(s). 
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query suggestions boosted and/or filtered by contexts
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} are all integers
> When used with {{FuzzyCompletionQuery}},
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of {{SuggestField}}. Any {{CompletionQuery}} can be used with {{ContextSuggestField}}; the default behaviour is to return suggestions from *all* contexts. The context for every completion hit can be accessed through {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight) 
> {code}


