Posted to solr-user@lucene.apache.org by Hamish Campbell <ha...@koordinates.com> on 2014/01/15 21:58:33 UTC

Search Suggestion Filtering

Hi all,

I'm looking into options for filtering the search suggestions dictionary.

We're on Solr 4.6.0, using the Suggester component with
fst.FuzzyLookupFactory and a field-based dictionary, and indexing records
for a multi-tenanted SaaS platform. SearchHandler records are always
filtered by the particular client warehouse (e.g. by domain); however, we
need a way to apply a similar filter to the spell check dictionary to
prevent leaking terms between clients. In other words: when client A
searches for a document title, they should not receive spelling suggestions
for client B's document titles.

This has been asked a couple of times, on the mailing list and on
StackOverflow. Some of the suggested approaches:

1. Use dynamic fields to create dictionaries per-warehouse (mentioned here:
http://lucene.472066.n3.nabble.com/Filtering-down-terms-in-suggest-tt4069627.html
)

That might be a reasonable option for us (we already considered a similar
approach), but at what point does this stop scaling efficiently? How many
dynamic fields are too many?

2. Run a query to populate the suggestion list (also mentioned in that
thread)

If I understand this correctly, this would give us a lot of flexibility and
power: for example, to give a more nuanced result set using the user's
permissions to expose private documents in their spelling suggestions.

I expect this would be a slow query, but our total document count is
currently relatively small (on the order of 10^3 objects) and I imagine you
could create a specific word index with the appropriate fields to keep this
in check. Is this a feasible approach, and if so, how do you build a
dynamic suggestion list?

3. Other options:

It seems like this is a common problem - and we could throw some
resources at building an extension to provide some limited suggestion
dictionary filtering. Is anyone already doing something similar, or has
anyone found a clever hack around this, or can anyone suggest a starting point?
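
For option 2, one possible shape is a facet query restricted by the tenant filter. The following is only a minimal sketch under stated assumptions: the `warehouse` and `title_words` field names and the helper functions are hypothetical, while `fq`, `facet.field`, `facet.prefix`, `facet.limit` and `facet.mincount` are standard Solr request parameters. It gives prefix completion only, not the fuzzy matching that FuzzyLookupFactory provides.

```python
from urllib.parse import urlencode


def suggestion_params(prefix, warehouse, limit=10):
    """Build Solr select parameters that return terms starting with `prefix`,
    counted only over documents that belong to `warehouse`."""
    escaped = warehouse.replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("q", "*:*"),
        ("rows", "0"),                    # only facet counts are needed, not documents
        ("fq", 'warehouse:"%s"' % escaped),  # hypothetical tenant field
        ("facet", "true"),
        ("facet.field", "title_words"),   # hypothetical field of lowercased title terms
        ("facet.prefix", prefix.lower()),
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),          # hide terms that only occur for other tenants
        ("wt", "json"),
    ]


def suggestion_url(base_url, prefix, warehouse, limit=10):
    """Assemble the full request URL for a core's /select handler."""
    query = urlencode(suggestion_params(prefix, warehouse, limit))
    return base_url.rstrip("/") + "/select?" + query


def parse_suggestions(response):
    """Solr's default JSON facet layout is a flat [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"]["title_words"]
    return list(zip(flat[0::2], flat[1::2]))
```

The `facet.mincount` setting matters here: without it, terms that exist only in other tenants' documents can still be returned with a count of zero, which is exactly the leak being avoided.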

Thanks everyone!

-- 
Hamish Campbell
Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
PH   +64 9 966 0433
FAX +64 9 966 0045

Re: Search Suggestion Filtering

Posted by Areek Zillur <ar...@gmail.com>.
Hey Hamish,

You might want to check out LUCENE-5402. I added support for
index-time pruning for suggesters that consume from the index itself.
I plan to add this support to file-based suggesters as well.
In order to use this functionality from Solr, more changes are required. I
am planning to support this in the new SuggesterComponent (SOLR-5378) in
Solr.

Hope that helps!

Areek


On Wed, Jan 15, 2014 at 6:10 PM, Hamish Campbell <
hamish.campbell@koordinates.com> wrote:

> Thanks Tomás, I'll take a look.
>
> Still interested to hear from anyone about using queries to populate the
> list - I'm willing to give up a bit of performance for the flexibility it
> would provide.
>
>
> On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe <
> tomasflobbe@gmail.com> wrote:
>
> > I think your use case is the one described in LUCENE-5350, maybe you want
> > to take a look to the patch and comments there.
> >
> > Tomás
>
>
>
> --
> Hamish Campbell
> Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
> PH   +64 9 966 0433
> FAX +64 9 966 0045
>

Re: Search Suggestion Filtering

Posted by Areek Zillur <ar...@gmail.com>.
Regarding LUCENE-5350, the context is the filter, i.e. the context is
prefixed to every entry (suggestion) that belongs to that context. So when
a user looks up the entry "foo" in the context of "bar", the actual lookup is
bar(ctx_seperator)foo. This filters out entries that match "foo" in another
context. For details on use cases, see the description of the JIRA issue.
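
To make the key layout concrete, here is a toy illustration. It is not the LUCENE-5350 implementation: the class, the separator character and the sorted-list storage are all assumptions made for this sketch; only the idea of storing and looking up context + separator + entry comes from the description above.

```python
import bisect

# Hypothetical separator; it must never occur inside a context or an entry.
CTX_SEPARATOR = "\x1f"


class ContextSuggester:
    """Toy prefix lookup where every entry is stored as context + separator + entry."""

    def __init__(self):
        self._keys = []  # kept sorted so all matches for a prefix are contiguous

    def add(self, context, entry):
        key = context + CTX_SEPARATOR + entry
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def lookup(self, context, prefix, limit=5):
        # The context is prepended to the user's prefix, so keys stored under
        # any other context can never match.
        full_prefix = context + CTX_SEPARATOR + prefix
        start = bisect.bisect_left(self._keys, full_prefix)
        results = []
        for key in self._keys[start:]:
            if not key.startswith(full_prefix) or len(results) >= limit:
                break
            results.append(key[len(context) + len(CTX_SEPARATOR):])
        return results
```

With entries "foo" and "food" added under "bar" and "foot" under "baz", a lookup of "foo" in the context "bar" sees only the first two.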

Regarding SOLR-5378, the final documentation is unfortunately unavailable
as of now (will try to fix that ASAP). For now, you can look at the
description of SOLR-5378 (outdated response format) along with the
description of SOLR-5529 (updated response format). I have linked the
related JIRAs in SOLR-5378.

Hope that helps,
Areek


On Mon, Jan 20, 2014 at 2:23 AM, Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> Hi guys, following this thread I have some question :
>
> 1) regarding LUCENE-5350, what is the context quoted ? Is it the context a
> filter query ?
>
> 2) regarding https://issues.apache.org/jira/browse/SOLR-5378, do we have
> the final documentation available ?
>
> Cheers
>

Re: Search Suggestion Filtering

Posted by Alessandro Benedetti <be...@gmail.com>.
Hi guys, following this thread I have some questions:

1) Regarding LUCENE-5350, what is the context mentioned there? Is the context a
filter query?

2) Regarding https://issues.apache.org/jira/browse/SOLR-5378, do we have
the final documentation available?

Cheers


2014/1/16 Hamish Campbell <ha...@koordinates.com>

> Thank you Jorge. We looked at phrase suggestions from previous user
> queries, but they're not so useful in our case. However, I have a follow-up
> question about similar functionality that I'll post shortly.
>
> The list might like to know that I've come up with a quick and exceedingly
> dirty <strike>hack</strike> solution that works for our limited case.
>
> You have been warned!
>
> Note that we're using django-haystack to actually interact with Solr:
>
> 1. Set nonFuzzyPrefix of the Suggester to 4.
> 2. At index time, the haystack index will build suggestion terms by
> extracting the relevant terms and prefixing with a 4 (alpha) character
> reference for the target instance.
> 3. At search time, the user's query is split, terms are prefixed and
> concatenated. The new query is sent to solr and the results are cleaned of
> references before returned to the front end.
>
> I'm not proud of it, but it works. =D
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Search Suggestion Filtering

Posted by Hamish Campbell <ha...@koordinates.com>.
Thank you Jorge. We looked at phrase suggestions from previous user
queries, but they're not so useful in our case. However, I have a follow-up
question about similar functionality that I'll post shortly.

The list might like to know that I've come up with a quick and exceedingly
dirty <strike>hack</strike> solution that works for our limited case.

You have been warned!

Note that we're using django-haystack to actually interact with Solr:

1. Set nonFuzzyPrefix of the Suggester to 4.
2. At index time, the haystack index builds suggestion terms by
extracting the relevant terms and prefixing each with a 4-character
(alphabetic) reference for the target instance.
3. At search time, the user's query is split, and the terms are prefixed and
concatenated. The new query is sent to Solr, and the references are stripped
from the results before they are returned to the front end.

I'm not proud of it, but it works. =D
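
The three steps above can be sketched roughly as follows. This is an illustration only, not the actual django-haystack code: the function names and the base-26 scheme for deriving the 4-letter reference from a numeric tenant id are assumptions, since the post does not say how the reference is built.

```python
import re

PREFIX_LEN = 4  # must match the suggester's nonFuzzyPrefix setting


def tenant_ref(tenant_id):
    """Map a numeric tenant id to a fixed-width, letters-only reference.
    The base-26 encoding is a hypothetical scheme chosen for this sketch."""
    letters = []
    for _ in range(PREFIX_LEN):
        tenant_id, rem = divmod(tenant_id, 26)
        letters.append(chr(ord("a") + rem))
    if tenant_id:
        raise ValueError("tenant id does not fit in %d letters" % PREFIX_LEN)
    return "".join(reversed(letters))


def index_terms(tenant_id, title):
    """Step 2: extract terms at index time and prefix each with the reference."""
    ref = tenant_ref(tenant_id)
    return [ref + word for word in re.findall(r"\w+", title.lower())]


def build_query(tenant_id, user_query):
    """Step 3a: split the user's query, prefix every term, and rejoin."""
    ref = tenant_ref(tenant_id)
    return " ".join(ref + word for word in user_query.lower().split())


def clean_suggestions(tenant_id, suggestions):
    """Step 3b: strip the reference from results; drop anything from another tenant."""
    ref = tenant_ref(tenant_id)
    cleaned = []
    for suggestion in suggestions:
        words = suggestion.split()
        if all(word.startswith(ref) for word in words):
            cleaned.append(" ".join(word[PREFIX_LEN:] for word in words))
    return cleaned
```

Because the first four characters are excluded from fuzzy matching, a misspelled term can still be corrected while the tenant reference itself has to match exactly.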



On Fri, Jan 17, 2014 at 3:13 AM, Jorge Luis Betancourt González <
jlbetancourt@uci.cu> wrote:

> In a custom application we have, we use a separated core (under Solr
> 3.6.1) to store the queries used by the users and then provide the
> autocomplete feauture. In our case we need to filter some phrases, that we
> don't need to be suggested to the users. I build a custom
> UpdateRequestProcessor to implement this logic, so we define this "blocking
> patterns" in some external source of information (DB, files, etc.). For the
> suggestions per-se we use as a base
> https://github.com/cominvent/autocomplete configuration, described in
> www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
> which is pretty usable as it comes. I found (personally) this approach way
> more flexible than the original suggester component, but it involves
> storing the user's queries into a separated core.
>
> Greetings,
>



-- 
Hamish Campbell
Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
PH   +64 9 966 0433
FAX +64 9 966 0045

Re: Search Suggestion Filtering

Posted by Jorge Luis Betancourt González <jl...@uci.cu>.
In a custom application we have, we use a separate core (under Solr 3.6.1) to store the queries used by the users and then provide the autocomplete feature. In our case we need to filter out some phrases that we don't want suggested to the users. I built a custom UpdateRequestProcessor to implement this logic, so we define these "blocking patterns" in some external source of information (DB, files, etc.). For the suggestions per se we use as a base the https://github.com/cominvent/autocomplete configuration, described in www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/, which is pretty usable as it comes. I found (personally) this approach way more flexible than the original suggester component, but it involves storing the user's queries in a separate core.
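
As a rough illustration of the "blocking patterns" idea only: the processor described above is a Java UpdateRequestProcessor running inside Solr, whereas this Python sketch just shows the filtering decision applied before a query is stored. The patterns and function names are hypothetical.

```python
import re

# Hypothetical blocking patterns; in the setup described they would be loaded
# from an external source such as a database or a file.
BLOCKING_PATTERNS = [r"\bpassword\b", r"^\s*$", r"\d{13,16}"]


def compile_patterns(patterns):
    """Compile the patterns once so they can be reused for every query."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def should_store(query, compiled):
    """Return False when the query matches any blocking pattern."""
    return not any(p.search(query) for p in compiled)


def filter_queries(queries, patterns=BLOCKING_PATTERNS):
    """Keep only the queries that are safe to add to the suggestions core."""
    compiled = compile_patterns(patterns)
    return [q for q in queries if should_store(q, compiled)]
```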

Greetings,

----- Original Message -----
From: "Hamish Campbell" <ha...@koordinates.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, January 15, 2014 9:10:16 PM
Subject: Re: Search Suggestion Filtering

Thanks Tomás, I'll take a look.

Still interested to hear from anyone about using queries to populate the
list - I'm willing to give up a bit of performance for the flexibility it
would provide.


On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe <
tomasflobbe@gmail.com> wrote:

> I think your use case is the one described in LUCENE-5350, maybe you want
> to take a look to the patch and comments there.
>
> Tomás
>



-- 
Hamish Campbell
Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
PH   +64 9 966 0433
FAX +64 9 966 0045
________________________________________________________________________________________________
III International Winter School at UCI, February 17 to 28, 2014. See www.uci.cu

Re: Search Suggestion Filtering

Posted by Hamish Campbell <ha...@koordinates.com>.
Thanks Tomás, I'll take a look.

Still interested to hear from anyone about using queries to populate the
list - I'm willing to give up a bit of performance for the flexibility it
would provide.


On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe <
tomasflobbe@gmail.com> wrote:

> I think your use case is the one described in LUCENE-5350, maybe you want
> to take a look to the patch and comments there.
>
> Tomás
>



-- 
Hamish Campbell
Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
PH   +64 9 966 0433
FAX +64 9 966 0045

Re: Search Suggestion Filtering

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I think your use case is the one described in LUCENE-5350; you may want
to take a look at the patch and comments there.

Tomás

