Posted to java-user@lucene.apache.org by Nils Knappmeier <n....@i-views.de> on 2013/03/11 11:31:55 UTC

AutoSuggest with Query-Filters

Dear all,

I have a request to implement an auto-suggest feature for our
Lucene-based product.
We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester,
but we cannot determine the correct way of using it for our request.

We have problems with two aspects:

1) The suggester should suggest original (stored) field values. The API
is built such that a LuceneDictionary is used to provide terms to the
suggester. A Dictionary provides a BytesRefIterator, which (e.g. in
LuceneDictionary) is implemented to return the tokenized and analyzed
terms (with reduced umlauts and plural forms).
What is the intended use here?

2) We do want to suggest terms that have an empty search result. There
are a number of filters that can be set (zip-code, categories). Our
problem is that there is no way to tell the suggester about these
filters. Do we have to iterate all suggested terms and check for each
one, if it provides results with the given filter settings?

Thanks in advance,
Nils Knappmeier

--

Nils Knappmeier | Software Engineer
intelligent views gmbh
Julius-Reiber-Str. 17 |64293 Darmstadt

Tel ++49(0)6151 - 5006-228 | Fax ++49(0)6151 - 5006-138
e-mail: n.knappmeier@i-views.de | www.i-views.de

Managing Directors: Achim Gärtner, Jörg Kleinz, Klaus Reichenberger.
The company is registered at the Amtsgericht Darmstadt (district court;
registered office of the company), No. HRB 7965

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorised copying, disclosure or distribution of the contents in this e-mail is strictly forbidden.

Re: AutoSuggest with Query-Filters

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Mar 11, 2013 at 7:33 AM, Nils Knappmeier
<n....@i-views.de> wrote:
> Hi,
>
>> This is tricky.
>>
>> You could build a separate suggester per category/zip code (or,
>> possibly prefix-code each suggestion with the category/zip code into
>> one suggester), but likely this will blow up (ie, if the same
>> suggestion often appears across zip codes / categories).  If your
>> suggestions are already highly orthogonal across category / zip code
>> then it may not blow up...
>>
>> Alternatively maybe you could store some info per-suggestion about
>> which zip code / category it appears in, using upcoming payloads
>> addition (see LUCENE-4820), and use that to filter each suggestion as
>> it arrives.
>>
>> But: have you confirmed this is really a problem in practice?  Ie,
>> typically suggestions have a strong a-priori rank based on eg how
>> often that query was asked (if suggestions come from your query logs,
>> like Google) or based on how popular that item is (if your suggestions
>> come from your content, like Netflix), in which case, if suggestions
>> are not that orthogonal, the risk of a bad suggestion may be very low?
>
> Maybe we had a misconception of the intended use case of the
> AnalyzingSuggester or the auto-suggest feature in general.
>
> Our suggestions should come solely from the index and not from a query log.
> I haven't even thought about using a query log as source. I think, in this
> case, it would be better to work on the index directly (using a
> PrefixTermEnum or so)...

It's fine for the source of the suggestions to be the index, but then
those input strings are necessarily whatever you had previously
indexed/analyzed/tokenized.

Ie, if you normalize accents and stem your tokens, then the input to
the suggester will be the normalized form not the surface form, and it
will suggest only those normalized forms.

Whereas the power of the AnalyzingSuggester is to take the surface
forms (unanalyzed) as input, yet make suggestions based on the
analyzed form.  So the user will see suggestions with accents and with
plurals.
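
As a rough illustration of that idea, here is a minimal sketch in plain
Java. It does not use the Lucene API: the class SurfaceFormSuggester and
its analyze() method are hypothetical stand-ins, and analyze() only
lower-cases and strips diacritics, which is far less than a real
Analyzer does (no stemming, for example). Matching happens on the
analyzed form, while the original text is what gets returned.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: match on the analyzed form, return the surface form. */
final class SurfaceFormSuggester {
    // analyzed form -> all surface forms that normalize to it
    private final TreeMap<String, List<String>> byAnalyzedForm = new TreeMap<>();

    /** Stand-in for an Analyzer: lower-cases and strips combining diacritics. */
    static String analyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Registers an unanalyzed (surface) suggestion. */
    void add(String surfaceForm) {
        byAnalyzedForm
            .computeIfAbsent(analyze(surfaceForm), k -> new ArrayList<>())
            .add(surfaceForm);
    }

    /** Returns the surface forms whose analyzed form starts with the analyzed input. */
    List<String> suggest(String userInput) {
        String prefix = analyze(userInput);
        List<String> result = new ArrayList<>();
        // tailMap starts at the first key >= prefix; stop at the first non-matching key.
        for (Map.Entry<String, List<String>> e
                : byAnalyzedForm.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            result.addAll(e.getValue());
        }
        return result;
    }
}
```

A user typing an unaccented prefix would then still be offered the
accented original, because only the lookup key is normalized.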

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: AutoSuggest with Query-Filters

Posted by Nils Knappmeier <n....@i-views.de>.

Hi,
> This is tricky.
>
> You could build a separate suggester per category/zip code (or,
> possibly prefix-code each suggestion with the category/zip code into
> one suggester), but likely this will blow up (ie, if the same
> suggestion often appears across zip codes / categories).  If your
> suggestions are already highly orthogonal across category / zip code
> then it may not blow up...
>
> Alternatively maybe you could store some info per-suggestion about
> which zip code / category it appears in, using upcoming payloads
> addition (see LUCENE-4820), and use that to filter each suggestion as
> it arrives.
>
> But: have you confirmed this is really a problem in practice?  Ie,
> typically suggestions have a strong a-priori rank based on eg how
> often that query was asked (if suggestions come from your query logs,
> like Google) or based on how popular that item is (if your suggestions
> come from your content, like Netflix), in which case, if suggestions
> are not that orthogonal, the risk of a bad suggestion may be very low?

Maybe we had a misconception of the intended use case of the
AnalyzingSuggester or the auto-suggest feature in general.

Our suggestions should come solely from the index and not from a query 
log. I haven't even thought about using a query log as source. I think, 
in this case, it would be better to work on the index directly (using a 
PrefixTermEnum or so)...

Re: AutoSuggest with Query-Filters

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Mar 11, 2013 at 6:31 AM, Nils Knappmeier
<n....@i-views.de> wrote:
> Dear all,
>
> I have a request to implement an auto-suggest feature for our lucene based
> product.
> We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester, but
> we cannot determine the correct way of using it for our request.
>
> We have problems with two aspects:
>
> 1) The suggester should suggest original (stored) field values. The API is
> built such that a LuceneDictionary is used to provide terms to the
> suggester. A Dictionary provides a BytesRefIterator, which (e.g. in
> LuceneDictionary) is implemented to return the tokenized and analyzed terms
> (with reduced umlauts and plural forms).
> What is the intended use here?

You shouldn't use LuceneDictionary, since it just enumerates the
tokens from the index.

Instead, make your own TermFreqIterator that provides the original
suggestion, and pass an Analyzer to AnalyzingSuggester to normalize
the surface forms.
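
To make the input side concrete, here is a minimal sketch in plain Java
of the data such an iterator would enumerate: each original (stored)
value paired with a weight. It does not implement Lucene's
TermFreqIterator itself. The class SurfaceFormSource, its Entry type,
and the choice of "number of occurrences" as the weight are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects original (stored) values and their weights. */
final class SurfaceFormSource {

    /** One suggester input entry: the unanalyzed text plus a ranking weight. */
    static final class Entry {
        final String surfaceForm;
        final long weight;

        Entry(String surfaceForm, long weight) {
            this.surfaceForm = surfaceForm;
            this.weight = weight;
        }
    }

    /**
     * Counts how often each stored value occurs and uses that count as the
     * weight (an assumption; any popularity measure could be used instead).
     */
    static List<Entry> fromStoredValues(List<String> storedValues) {
        // LinkedHashMap keeps first-seen order, so the output is deterministic.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String value : storedValues) {
            counts.merge(value, 1L, Long::sum);
        }
        List<Entry> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            entries.add(new Entry(e.getKey(), e.getValue()));
        }
        return entries;
    }
}
```

The stored values would be read from the documents' stored fields rather
than from the term dictionary, so they keep their accents and plurals.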

> 2) We do want to suggest terms that have an empty search result. There are a

I think you meant "do not"?

> number of filters that can be set (zip-code, categories). Our problem is
> that there is no way to tell the suggester about these filters. Do we have
> to iterate all suggested terms and check for each one, if it provides
> results with the given filter settings?

This is tricky.

You could build a separate suggester per category/zip code (or,
possibly prefix-code each suggestion with the category/zip code into
one suggester), but likely this will blow up (ie, if the same
suggestion often appears across zip codes / categories).  If your
suggestions are already highly orthogonal across category / zip code
then it may not blow up...
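
The prefix-coding variant can be sketched in plain Java as follows. This
is not Lucene code: the class PrefixCodedSuggestions and the separator
character are assumptions, and a sorted set stands in for the suggester.
The size() method makes the blow-up visible, since a suggestion that
occurs under several filter values is stored once per value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: one structure, each key prefixed with its filter value. */
final class PrefixCodedSuggestions {
    // Separator assumed never to occur in a filter value or a suggestion.
    private static final char SEP = '\u001f';

    private final TreeSet<String> keys = new TreeSet<>();

    /** Stores the suggestion under the given filter value (e.g. a zip code). */
    void add(String filterValue, String suggestion) {
        keys.add(filterValue + SEP + suggestion);
    }

    /** Returns suggestions for this filter value that start with the typed prefix. */
    List<String> suggest(String filterValue, String typedPrefix) {
        String keyPrefix = filterValue + SEP + typedPrefix;
        List<String> result = new ArrayList<>();
        // Keys are sorted, so all matches are contiguous starting at keyPrefix.
        for (String key : keys.tailSet(keyPrefix, true)) {
            if (!key.startsWith(keyPrefix)) {
                break;
            }
            result.add(key.substring(filterValue.length() + 1));
        }
        return result;
    }

    /** Number of stored keys; grows with every (filter value, suggestion) pair. */
    int size() {
        return keys.size();
    }
}
```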

Alternatively maybe you could store some info per-suggestion about
which zip code / category it appears in, using upcoming payloads
addition (see LUCENE-4820), and use that to filter each suggestion as
it arrives.
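
A minimal plain-Java sketch of that post-filtering step is shown below.
It does not use the payload API from LUCENE-4820: the class
PayloadFilter and its Suggestion type are hypothetical, and a set of
category names stands in for whatever would be encoded in the payload.
The input list is assumed to be already ranked by the suggester.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: drop ranked suggestions that do not match the active filter. */
final class PayloadFilter {

    /** A suggestion plus the categories it occurs in (stand-in for a payload). */
    static final class Suggestion {
        final String text;
        final Set<String> categories;

        Suggestion(String text, Set<String> categories) {
            this.text = text;
            this.categories = categories;
        }
    }

    /**
     * Walks the ranked list in order and keeps at most 'limit' suggestions
     * whose categories contain the active category.
     */
    static List<Suggestion> filter(List<Suggestion> ranked, String activeCategory, int limit) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion s : ranked) {
            if (kept.size() >= limit) {
                break;
            }
            if (s.categories.contains(activeCategory)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Because some candidates are discarded, the caller would likely need to
request more suggestions than it intends to display.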

But: have you confirmed this is really a problem in practice?  Ie,
typically suggestions have a strong a-priori rank based on eg how
often that query was asked (if suggestions come from your query logs,
like Google) or based on how popular that item is (if your suggestions
come from your content, like Netflix), in which case, if suggestions
are not that orthogonal, the risk of a bad suggestion may be very low?

Mike McCandless

http://blog.mikemccandless.com
