Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2012/03/01 14:41:39 UTC

Re: performance between ExternalFileField and Join

Hmmm. ExternalFileFields can only hold float values, so I'm not
sure "the necessary data" is straightforward. Additionally, they
are used in function queries. Does this still work?
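
To make the float-only restriction concrete, here is a minimal sketch of how an access list could be encoded for an ExternalFileField. Solr reads the values from a file named external_<fieldname> in the index data directory, one key=value line per document. The field name acl_user42, the doc IDs, and the one-field-per-user layout are assumptions for illustration, not something proposed in this thread.

```python
from pathlib import Path


def write_external_file(data_dir, field_name, allowed_ids, value=1.0):
    """Write an ExternalFileField source file.

    Each allowed document gets `value`; documents that are not listed
    fall back to the field type's default value (assumed to be 0 here).
    """
    path = Path(data_dir) / f"external_{field_name}"
    # One "key=value" line per document, sorted for stable output.
    lines = [f"{doc_id}={value}" for doc_id in sorted(allowed_ids)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Since only floats can be stored, set membership has to be expressed as a number (1.0 for "allowed" in this sketch).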

I really don't know the performance characteristics of SOLR-2272 if,
say, you have users with access to all documents, but I'd make
sure to test first; that's kinda scary.

You might also be able to work something out with
"no cache" filter queries, see:
http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
and
http://www.lucidimagination.com/blog/2012/02/22/custom-security-filtering-in-solr/
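
As a rough sketch of what a "no cache" filter query looks like on the wire: the cache=false local param keeps the filter out of the filterCache, and cost orders non-cached filters so cheaper ones run first. The acl field and the user42 value are hypothetical.

```python
from urllib.parse import urlencode


def build_acl_params(user_query, acl_filter, cost=100):
    """Build a Solr query string with a non-cached filter query.

    `acl_filter` is whatever query expresses the access check; it is
    an assumption of this sketch, not a fixed Solr convention.
    """
    fq = f"{{!cache=false cost={cost}}}{acl_filter}"
    return urlencode({"q": user_query, "fq": fq})


# Example: restrict a search for "laptop" to documents user42 may see.
query_string = build_acl_params("laptop", "acl:user42")
```

Whether a high cost also turns the filter into a post filter depends on the query type supporting it, so that part needs checking against the posts above.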

Best
Erick


On Mon, Feb 27, 2012 at 3:13 PM, Kevin Osborn
<ke...@cbsinteractive.com> wrote:
> I am looking at two different options to filter results in Solr, basically
> a per-user access control list. Our index is about 2.5 million documents.
>
> The first option is to use ExternalFileField. It seems pretty
> straightforward. Just put the necessary data in the files and query against
> that data.
>
> I was also intrigued by the Join feature in 4.0 trunk (SOLR-2272). In this
> case, I would keep my access data in a separate core, and do cross-core
> join queries. The two cores would have about the same number of documents
> (2.5 million), but one core would have the actual data and the other core
> would have the access information. So, the number of unique terms on the
> key would be quite high. Would this be too slow?
>
> If anyone knows about the performance characteristics of these two
> methods, please advise. Thanks.
>
> --
> KEVIN OSBORN
> LEAD SOFTWARE ENGINEER
> T 949.399.8714      C 949.310.4677
> 5 Park Plaza, Suite 600, Irvine, CA 92614

Re: performance between ExternalFileField and Join

Posted by Chris Hostetter <ho...@fucit.org>.
: unique terms) but I agree with Erick on the ExternalFileField as you can use
: it just inside a function query, for example, for boosting.

with {!frange} it would be trivial to filter based on values in an
ExternalFileField ... whether that would be *faster* than a custom
plugin that worked similarly to ExternalFileField but only provided boolean
logic for set membership would require some testing.
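
A minimal sketch of the {!frange} filter, assuming membership is stored as 1.0 in an ExternalFileField and everything else defaults to 0. The field name acl_user42 is hypothetical.

```python
def frange_filter(field_name, lower=1, upper=1):
    """Return an fq value that keeps documents whose field value lies
    in the inclusive range [lower, upper]."""
    return f"{{!frange l={lower} u={upper}}}{field_name}"


# Example: keep only documents flagged 1.0 for this user.
fq = frange_filter("acl_user42")
```

The result is passed as an ordinary fq parameter; frange evaluates the field as a function, which is why it works with a field that can only be used in function queries.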

: > > I was also intrigued by the Join feature in 4.0 trunk (SOLR-2272). In this
: > > case, I would keep my access data in a separate core, and do cross-core
: > > join queries. The two cores would have about the same number of documents

watch out with this approach, cross-core joins have remained undocumented 
because they are fairly broken...

https://issues.apache.org/jira/browse/SOLR-2824
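
For reference, the cross-core join described in the quoted message would be expressed roughly like this. The core name acl and the fields doc_id, id, and user are hypothetical, and given the issue above this is a sketch of the syntax only, not a recommendation.

```python
def cross_core_join(from_field, to_field, from_index, inner_query):
    """Return a {!join} query: run `inner_query` against the core named
    `from_index`, collect its `from_field` values, and match documents
    in the current core whose `to_field` equals one of them."""
    return (
        f"{{!join from={from_field} to={to_field} "
        f"fromIndex={from_index}}}{inner_query}"
    )


# Example: documents whose id appears as doc_id on an ACL row for user42.
fq = cross_core_join("doc_id", "id", "acl", "user:user42")
```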



-Hoss

Re: performance between ExternalFileField and Join

Posted by Tommaso Teofili <to...@gmail.com>.
Also, regarding the Join functionality, I remember Yonik pointed out it's
O(# unique terms), but I agree with Erick on the ExternalFileField, as you
can use it only inside a function query, for example, for boosting.
Tommaso
