Posted to solr-user@lucene.apache.org by lee carroll <le...@googlemail.com> on 2011/07/14 13:36:10 UTC

Stored Field

Hi
Do stored field values get added to the index literally for each
document/field combination, or is a pointer used?
I've been reading http://lucene.apache.org/java/2_4_0/fileformats.pdf
and I think the values are stored literally, but I'm not 100% sure, so
I thought I'd ask.

In logical terms for stored fields do we get this sort of storage:

doc0 field0 > "xxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxx"
doc0 field1 > "yyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyy"
doc1 field0 > "xxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxx"
doc1 field1 > "yyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyy"

or this:

doc0 field0 > {1}
doc0 field1 > {2}
doc1 field0 > {1}
doc1 field1 > {2}

val1 > "xxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxx"
val2 > "yyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyyyyy yyy"

I'm trying to understand the possible impact of storing fields which
have a small set of repeating values. I was hoping it would not have an
impact on file size, but I now think it will?

thanks in advance

Re: Stored Field

Posted by Erick Erickson <er...@gmail.com>.
Well, it all depends upon what you mean by "size" <G>...

This page http://lucene.apache.org/java/3_0_2/fileformats.html#file-names
explains what goes where in the files created by Lucene. The
point is that the raw text (i.e. *stored* data) is put in separate files
from the indexed (i.e. searched) data. So search times won't be
affected. I'm pretty sure (but not 100%) that the verbatim text is
simply stored for each document, with no sharing of identical values
that occur in other docs.

But this doesn't really affect searching. The primary impact will be
on replicating the index since you're copying more bytes.
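
To put rough numbers on the two layouts from the question, here is a
minimal sketch in Python. It only models the logical layouts, not
Lucene's actual on-disk format: it ignores field headers, length
prefixes and any compression. The function names and the 4-byte pointer
size are assumptions made up for illustration.

```python
def literal_size(docs):
    """Bytes used when every document stores its own copy of each value."""
    return sum(len(value.encode("utf-8"))
               for doc in docs
               for value in doc.values())


def pointer_size(docs, pointer_bytes=4):
    """Bytes used when each distinct value is stored once and every
    document/field slot holds a fixed-size pointer to it.
    pointer_bytes is an assumed, hypothetical pointer width."""
    distinct = {value for doc in docs for value in doc.values()}
    table = sum(len(value.encode("utf-8")) for value in distinct)
    pointers = sum(len(doc) for doc in docs) * pointer_bytes
    return table + pointers


# The two repeating values from the question (63 and 56 bytes).
val1 = "xxx " + "xxxxxx " * 8 + "xxx"
val2 = "yyy " + "yyyyyy " * 7 + "yyy"

# 1000 documents, each with the same two stored field values.
docs = [{"field0": val1, "field1": val2} for _ in range(1000)]

print(literal_size(docs))  # 119000
print(pointer_size(docs))  # 8119
```

Under these assumptions the literal layout grows with the number of
documents times the value length, which is the extra data that has to
be copied on replication.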

Best
Erick
