Posted to solr-user@lucene.apache.org by André Maldonado <an...@gmail.com> on 2013/04/26 21:50:43 UTC

Not In query

Hi all.

We have an index with 300,000 documents and a very large number of fields.

We're planning a module where users will choose some documents to exclude
from their search results. So, these documents will be excluded for UserA
but still visible to UserB.

We have a few options for doing this. The simplest is a "Not In" query on
the document id, but we don't know what performance impact it will have.
Is this a viable option?

Is there another reasonable way to accomplish this?

Thanks

*
----------------------------------------------------------------------------------------------
*
*"And you shall know the truth, and the truth shall set you free." (John 8:32)*

 *andre.maldonado*@gmail.com <an...@gmail.com>
 (11) 9112-4227

<http://www.orkut.com.br/Main#Profile?uid=2397703412199036664>
<http://www.facebook.com/profile.php?id=100000659376883>
  <http://twitter.com/andremaldonado> <http://www.delicious.com/andre.maldonado>
  <https://profiles.google.com/105605760943701739931>
<http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3>
  <http://www.youtube.com/andremaldonado>

Re: Not In query

Posted by André Maldonado <an...@gmail.com>.
Hi Jan. Thanks again for your reply.

You're right. It is almost impossible for a user to exclude 200,000 documents.

I'll run some tests with the NOT IN query.

Thank you again.

On Tue, Apr 30, 2013 at 6:09 PM, Jan Høydahl <ja...@cominvent.com> wrote:

> Hi,
>
> How, practically, would a user end up with 200,000 documents excluded? Is
> there some way in your application to exclude "categories" of documents
> with one click? If so, I would index those category IDs on all docs in that
> category, and then do &fq=-cat:123 instead of adding all the individual
> docids. Anyway, I'd start with the simple approach and then optimize once
> you (perhaps, perhaps not) bump into problems. Most likely it will work
> like a charm :)

Re: Not In query

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

How, practically, would a user end up with 200,000 documents excluded? Is there some way in your application to exclude "categories" of documents with one click? If so, I would index those category IDs on all docs in that category, and then do &fq=-cat:123 instead of adding all the individual docids. Anyway, I'd start with the simple approach and then optimize once you (perhaps, perhaps not) bump into problems. Most likely it will work like a charm :)
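
A minimal sketch of the category idea, assuming the index has a `cat` field as in the `&fq=-cat:123` example and that ids and category ids are plain numbers. The helper name `exclusion_filters` and the sample values are hypothetical. It emits one negative `fq` per kind of exclusion; Solr accepts several `fq` parameters on a request and caches each one separately, so a category filter can be reused by every user who hides that category.

```python
from urllib.parse import urlencode


def exclusion_filters(excluded_categories, excluded_ids):
    """Return a list of negative fq values, one per kind of exclusion."""
    filters = []
    if excluded_categories:
        cats = " ".join(str(c) for c in excluded_categories)
        filters.append("-cat:({})".format(cats))
    if excluded_ids:
        ids = " ".join(str(i) for i in excluded_ids)
        filters.append("-id:({})".format(ids))
    return filters


# Hypothetical user: hides category 123 plus two individual documents.
fqs = exclusion_filters([123], [729, 640])
params = [("q", "foo bar")] + [("fq", fq) for fq in fqs]
print(urlencode(params))
```

Keeping the two exclusions in separate `fq` values is a design choice, not a requirement: a single combined filter would also hide the documents, but it would be specific to one user.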

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 30 Apr 2013, at 16:21, André Maldonado <an...@gmail.com> wrote:

> Thanks, Jan, for your reply.
> 
> My application has thousands of users, and I don't know yet how many of them
> will use this feature. They can exclude one document from their search
> results or as many as 200,000. It is much more likely that they will
> exclude something like 50 to 300 documents; more than that would be unusual.
> 
> However, I don't know how the cache will behave, because a large number of
> users can use this feature. Even if the query for user 1 is cached, that
> cache entry won't help other users.
> 
> Do you see another solution for this case?
> 
> Thanks

Re: Not In query

Posted by André Maldonado <an...@gmail.com>.
Thanks, Jan, for your reply.

My application has thousands of users, and I don't know yet how many of them
will use this feature. They can exclude one document from their search
results or as many as 200,000. It is much more likely that they will
exclude something like 50 to 300 documents; more than that would be unusual.

However, I don't know how the cache will behave, because a large number of
users can use this feature. Even if the query for user 1 is cached, that
cache entry won't help other users.
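
The caching concern can be made concrete with a small sketch. It assumes, as a simplification, that a cached filter is only reused when a later request sends the same filter, so users with different exclusion lists get no benefit from each other's entries. The helper `per_user_fq` is hypothetical. `{!cache=false}` is Solr's local parameter for skipping the filter cache on a single `fq`; whether it helps here would need benchmarking.

```python
def per_user_fq(excluded_ids, cache=True):
    """Build a negative id filter; optionally ask Solr not to cache it."""
    # Sort and de-duplicate so the same exclusion list always yields
    # the same filter text, whatever order the ids were stored in.
    ids = " ".join(str(i) for i in sorted(set(excluded_ids)))
    clause = "-id:({})".format(ids)
    if cache:
        return clause
    return "{!cache=false}" + clause


# Two hypothetical users with different exclusion lists.
user_a = per_user_fq([729, 123])
user_b = per_user_fq([640])

# Different filter text, so neither user can reuse the other's entry.
print(user_a != user_b)

# The same list without caching, for filters unlikely to be repeated.
print(per_user_fq([729, 123], cache=False))
```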

Do you see another solution for this case?

Thanks



On Fri, Apr 26, 2013 at 6:18 PM, Jan Høydahl <ja...@cominvent.com> wrote:

> I would start with the way you propose, a negative filter
>
> q=foo bar&fq=-id:(123 729 640 112...)
>
> This will effectively hide those doc ids, and a benefit is that it is
> cached so if the list of ids is long, you'll only take the performance hit
> the first time. I don't know your application, but if it is highly likely
> that a single user will add excludes for several thousand ids then you
> should perhaps consider other options and benchmark up front.

Re: Not In query

Posted by Jan Høydahl <ja...@cominvent.com>.
I would start with the way you propose, a negative filter:

q=foo bar&fq=-id:(123 729 640 112...)

This will effectively hide those doc ids, and a benefit is that it is cached so if the list of ids is long, you'll only take the performance hit the first time. I don't know your application, but if it is highly likely that a single user will add excludes for several thousand ids then you should perhaps consider other options and benchmark up front.
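
As a minimal sketch of the negative filter described here, the following Python builds the `fq` value from a user's excluded ids and URL-encodes the request parameters. The helper names `build_exclusion_fq` and `build_params` and the sample ids are hypothetical; only the `q` and `fq` parameters and the `-id:(...)` syntax come from the example in this message. It assumes ids are plain numbers that need no query escaping.

```python
from urllib.parse import urlencode


def build_exclusion_fq(excluded_ids, field="id"):
    """Return a negative filter such as '-id:(123 729)', or None if empty."""
    ids = [str(i) for i in excluded_ids]
    if not ids:
        return None  # nothing to exclude, so send no fq at all
    return "-{}:({})".format(field, " ".join(ids))


def build_params(user_query, excluded_ids):
    """Assemble request parameters as a list of (name, value) pairs."""
    params = [("q", user_query)]
    fq = build_exclusion_fq(excluded_ids)
    if fq is not None:
        params.append(("fq", fq))
    return params


params = build_params("foo bar", [123, 729, 640, 112])
query_string = urlencode(params)
print(query_string)
```

Returning `None` for an empty list avoids sending an empty `-id:()` clause for users who have not excluded anything.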
