Posted to solr-user@lucene.apache.org by Tri Cao <tm...@me.com> on 2014/02/13 09:38:29 UTC

Re: filtering/faceting by a big list of IDs

Hi Joel,

Thanks a lot for the suggestion.

After thinking more about this, I think I could skip the facet counts for now
and just provide a filtering option without displaying how many items would
remain after filtering. After all, even Google Shopping product search doesn't
display the facet counts :) Given that, I think the easiest way is to add a new
PostFilter to the query.

Thanks again,
Tri 

On Feb 12, 2014, at 12:03 PM, Joel Bernstein <jo...@gmail.com> wrote:

Tri,

You will most likely need to implement a custom QParserPlugin to
efficiently handle what you described. Inside of this QParserPlugin you
could create the logic that would bring in your outside list of IDs and
build a DocSet that could be applied to the fq and the facet.query. I
haven't attempted to use a QParserPlugin with a facet.query, but in theory
it would work.

With the filter query you also have the option of implementing your Query
as a PostFilter. PostFilter logic is applied at collect time, so the logic
only needs to be applied to the documents that match the query. In many
cases this can be faster, especially when result sets are relatively small
but the index is large.
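
To illustrate the collect-time idea described above, here is a minimal
sketch in plain Java. It deliberately uses no Solr classes: the names
PostFilterSketch and IdFilteringCollector are hypothetical, and the
"collector" is modeled as a simple Consumer. In Solr itself this role would
be played by a Query implementing PostFilter together with a delegating
collector, so treat this only as an outline of the control flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Conceptual sketch only: plain Java, no Solr dependencies.
// All class and method names here are hypothetical.
class PostFilterSketch {

    /** Wraps a downstream collector and forwards only documents whose ID is allowed. */
    static final class IdFilteringCollector implements Consumer<String> {
        private final Set<String> allowedIds;
        private final Consumer<String> delegate;

        IdFilteringCollector(Set<String> allowedIds, Consumer<String> delegate) {
            this.allowedIds = allowedIds;
            this.delegate = delegate;
        }

        @Override
        public void accept(String docId) {
            // Called once per document that already matched the main query,
            // so the cost scales with the result set, not with the index.
            if (allowedIds.contains(docId)) {
                delegate.accept(docId);
            }
        }
    }

    /** Runs the already-matched documents through the ID filter. */
    static List<String> filter(List<String> matchedDocIds, Set<String> allowedIds) {
        List<String> kept = new ArrayList<>();
        IdFilteringCollector collector = new IdFilteringCollector(allowedIds, kept::add);
        for (String docId : matchedDocIds) {
            collector.accept(docId);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Externally stored IDs for the current user (made-up values).
        Set<String> readByUser = new HashSet<>(Arrays.asList("a1", "a7", "a9"));
        // Documents that matched the main query (made-up values).
        List<String> queryMatches = Arrays.asList("a1", "a2", "a7");
        System.out.println(filter(queryMatches, readByUser)); // prints [a1, a7]
    }
}
```

The hash-set lookup is constant time per collected document, which is why
this approach tends to pay off when the query matches few documents
relative to the size of the index.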


Joel Bernstein
Search Engineer at Heliosearch


On Wed, Feb 12, 2014 at 2:12 PM, Tri Cao <tm...@me.com> wrote:

Hi all,
I am running a Solr application and I would need to implement a feature
that requires faceting and filtering on a large list of IDs. The IDs are
stored outside of Solr and are specific to the currently logged-in user. An
example of this is the articles/tweets the user has read in the last few
weeks. Note that the IDs here are the real document IDs and not Lucene
internal docids.
So the question is what would be the best way to implement this in Solr?
The list could be as large as tens of thousands of IDs. The obvious way of
rewriting the Solr query to add the ID list as "facet.query" and "fq" doesn't
seem to be the best way because: a) the query would be very long, and b) it
would surely exceed the default limit of 1024 Boolean clauses, and I
am sure the limit is there for a reason.
I had a similar problem before, but back then I was using Lucene directly,
and the way I solved it was to use a MultiTermQuery to retrieve the internal
docids from the ID list and then apply the resulting DocSet to counting and
filtering. With proper caching, it worked reasonably well for lists of size
~10K. My current application is so invested in Solr that going back to
Lucene is not an option anymore.
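
The docid-set approach described above can be outlined roughly as follows.
This is a sketch in plain Java under stated assumptions: java.util.BitSet
stands in for a DocSet, a HashMap stands in for the term lookup that
resolves external IDs to internal docids, and the names DocSetSketch,
toDocSet, filter and count are hypothetical. It is not the original Lucene
code.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: BitSet stands in for a DocSet and a Map stands in
// for the index lookup from external ID to internal docid.
class DocSetSketch {

    /** Resolves external IDs to internal docids; IDs not in the index are skipped. */
    static BitSet toDocSet(List<String> externalIds, Map<String, Integer> idToDocid) {
        BitSet docSet = new BitSet();
        for (String id : externalIds) {
            Integer docid = idToDocid.get(id);
            if (docid != null) {
                docSet.set(docid);
            }
        }
        return docSet;
    }

    /** Filtering: intersect the query's matches with the user's doc set. */
    static BitSet filter(BitSet queryMatches, BitSet userDocSet) {
        BitSet result = (BitSet) queryMatches.clone(); // leave the inputs untouched
        result.and(userDocSet);
        return result;
    }

    /** Counting: the facet count is the size of the intersection. */
    static int count(BitSet queryMatches, BitSet userDocSet) {
        return filter(queryMatches, userDocSet).cardinality();
    }

    public static void main(String[] args) {
        // Made-up index of four documents.
        Map<String, Integer> idToDocid = new HashMap<>();
        idToDocid.put("a", 0);
        idToDocid.put("b", 1);
        idToDocid.put("c", 2);
        idToDocid.put("d", 3);

        // The user's external ID list; "zzz" is not in the index.
        BitSet userDocSet = toDocSet(Arrays.asList("b", "d", "zzz"), idToDocid);

        // Documents matching the main query: docids 0, 1 and 2.
        BitSet queryMatches = new BitSet();
        queryMatches.set(0, 3);

        System.out.println(count(queryMatches, userDocSet)); // prints 1
    }
}
```

Because the doc set depends only on the user's ID list and the index state,
it can be cached per user and reused for both filtering and counting.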
All advice and suggestions are welcome.
Thanks,
Tri

Re: filtering/faceting by a big list of IDs

Posted by Roman Chyla <ro...@gmail.com>.
Hi Tri,
Look at this:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3CCAEN8dyX_Am_v4f=5614eu35FNHb5h7dzKMKzdFWvRRM1xpqTLw@mail.gmail.com%3E
Roman