Posted to java-user@lucene.apache.org by Nigel <ni...@gmail.com> on 2010/03/08 02:49:17 UTC

Fields with cardinality = 1?

Does Lucene have any special optimization for a field that has the same
value for all documents in the index?  For example, rather than storing a
list of all doc ids for the single term, it could in theory note this
special case and not save any ids for that field.

(You might well ask what the point of doing this is, anyway.  In my case, I
have a collection of index shards, and all documents within one shard would
have the same value for one field.  I could simply omit that field, but then
queries referring to that field wouldn't work, unless the queries were
automatically rewritten somehow to remove references to the field that's
assumed to be present.  If this case is optimized somehow, then I can
include the field without taking up any space in the index.  If it's not
optimized, then it might be worth omitting the field and rewriting the
queries instead.)
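
For illustration, the query-rewriting alternative could look roughly like
the sketch below. It uses a toy query tree rather than Lucene's own query
classes, and every name in it (ShardQueryRewriter, Term, And, the "shard"
field) is hypothetical. The assumption is that each shard knows the single
value its documents would have had for the omitted field: a matching term
clause becomes "match all", and a non-matching one becomes "match none".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy query rewriter; these are illustrative classes, not Lucene's. */
final class ShardQueryRewriter {
    interface Query {}

    // Singleton markers for "every document" and "no document".
    static final Query MATCH_ALL = new Query() {
        @Override public String toString() { return "*:*"; }
    };
    static final Query MATCH_NONE = new Query() {
        @Override public String toString() { return "<none>"; }
    };

    static final class Term implements Query {
        final String field;
        final String value;
        Term(String field, String value) { this.field = field; this.value = value; }
        @Override public String toString() { return field + ":" + value; }
    }

    static final class And implements Query {
        final List<Query> clauses;
        And(Query... clauses) { this.clauses = Arrays.asList(clauses); }
        And(List<Query> clauses) { this.clauses = clauses; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(" AND ");
                sb.append(clauses.get(i));
            }
            return sb.append(")").toString();
        }
    }

    /** Removes references to the field that is constant within this shard. */
    static Query rewrite(Query q, String shardField, String shardValue) {
        if (q instanceof Term) {
            Term t = (Term) q;
            if (!t.field.equals(shardField)) return t;      // unrelated field: keep
            return t.value.equals(shardValue) ? MATCH_ALL : MATCH_NONE;
        }
        if (q instanceof And) {
            List<Query> kept = new ArrayList<>();
            for (Query clause : ((And) q).clauses) {
                Query r = rewrite(clause, shardField, shardValue);
                if (r == MATCH_NONE) return MATCH_NONE;      // one dead clause kills the AND
                if (r != MATCH_ALL) kept.add(r);             // "match all" clauses drop out
            }
            if (kept.isEmpty()) return MATCH_ALL;
            return kept.size() == 1 ? kept.get(0) : new And(kept);
        }
        return q;                                            // other query types untouched
    }
}
```

With this sketch, rewriting (shard:s1 AND body:lucene) on the shard whose
value is "s1" leaves just body:lucene, while on any other shard the whole
query collapses to "match none" and that shard can be skipped.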

Thanks,
Chris

Re: Fields with cardinality = 1?

Posted by Nigel <ni...@gmail.com>.
Thanks, Mike -- that makes sense. Yes, the fields would be known in advance
so the codec would know to ignore them at index time.

Thanks,
Chris

Re: Fields with cardinality = 1?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Lucene doesn't optimize for this today.

But with flexible indexing ("flex", still on a branch but hopefully landing
on trunk soon) you could implement a codec that optimizes such fields.  You
would know, in advance, which field(s) do this, right?

And at search time the codec would pretend that all docs appear in the
postings for that one value...
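
To make the "pretend" part concrete, here is a minimal sketch in plain Java.
It is not the flex codec API; AllDocsPostings and its methods are
hypothetical names, loosely modeled on a doc-id iterator. It assumes the
field really is present on every document, so the doc ids 0..maxDoc-1 can
be synthesized on the fly instead of being read from disk. Deleted
documents would still have to be skipped somewhere, which this sketch
leaves out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical postings iterator that stores no doc ids at all. */
final class AllDocsPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int maxDoc;   // number of documents in the segment
    private int doc = -1;       // current position; -1 means "not started"

    AllDocsPostings(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Moves to the next doc id, or NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        if (doc == NO_MORE_DOCS) return doc;
        doc++;
        if (doc >= maxDoc) doc = NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first doc id >= target that is also past the current one. */
    int advance(int target) {
        if (doc == NO_MORE_DOCS || target >= maxDoc) {
            doc = NO_MORE_DOCS;
        } else {
            doc = Math.max(target, doc + 1);
            if (doc >= maxDoc) doc = NO_MORE_DOCS;
        }
        return doc;
    }

    /** Convenience helper: lists every doc id the iterator would produce. */
    static List<Integer> collect(int maxDoc) {
        List<Integer> ids = new ArrayList<>();
        AllDocsPostings postings = new AllDocsPostings(maxDoc);
        for (int d = postings.nextDoc(); d != NO_MORE_DOCS; d = postings.nextDoc()) {
            ids.add(d);
        }
        return ids;
    }
}
```

The only per-field state is maxDoc, which the index already tracks, so a
field handled this way would need no stored postings.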

Mike

