You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Koji Sekiguchi <ko...@r.email.ne.jp> on 2010/05/01 05:18:11 UTC

FieldCache memory estimation - term values are interned?

Hello,

Are Strings that are got via FieldCache.DEFAULT.getStrings( reader,
field ) interned?

Since I have a requirement for having FieldCaches of some
fields in 250M docs index, I'd like to estimate memory
consumed by FieldCache.

By looking at FieldCacheImpl source code, it seems that
field names are interned, but values are not?

Than you,

Koji

-- 
http://www.rondhuit.com/en/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FieldCache memory estimation - term values are interned?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Yonik Seeley wrote:
> On Sat, May 1, 2010 at 8:23 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
>   
>> Yonik Seeley wrote:
>>     
>>> Values are not interned, but in a single field cache entry (String[])
>>> the same String object is used for all docs with that same value.
>>>       
>> Yeah, you are right. Because I could see the arbitrary two Strings that have
>> same value were identical (== is true) in String[] gotten by getStrings()
>> but
>> I coudn't see explicit intern() for them, I asked. Can you elaborate who
>> done it?
>>     
>
> FieldCacheImpl:675 on trunk.
>
> For every term, a string value in constructed... then TermDocs (now
> DocsEnum in trunk) steps over every doc with that value and sets the
> same value.
>
>   
Uh, I looked that code but I didn't wise up.
Thank you very much for taking your time!

Koji

-- 
http://www.rondhuit.com/en/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FieldCache memory estimation - term values are interned?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, May 1, 2010 at 8:23 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
> Yonik Seeley wrote:
>> Values are not interned, but in a single field cache entry (String[])
>> the same String object is used for all docs with that same value.
>
> Yeah, you are right. Because I could see the arbitrary two Strings that have
> same value were identical (== is true) in String[] gotten by getStrings()
> but
> I coudn't see explicit intern() for them, I asked. Can you elaborate who
> done it?

FieldCacheImpl:675 on trunk.

For every term, a string value in constructed... then TermDocs (now
DocsEnum in trunk) steps over every doc with that value and sets the
same value.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague




>> But... I think StringIndex is more commonly used in both Lucene and
>> Solr than String[] (sorting, faceting, etc) so double check that it's
>> not StringIndex you should be looking at.
>>
>>
>
> Yes, I think StringIndex is better than String[] for my requirement in terms
> of memory consumption as well (an extreme case, male/female field etc.).
>
> Thank you,
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FieldCache memory estimation - term values are interned?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Yonik Seeley wrote:
> 2010/4/30 Koji Sekiguchi <ko...@r.email.ne.jp>:
>   
>> Are Strings that are got via FieldCache.DEFAULT.getStrings( reader,
>> field ) interned?
>>
>> Since I have a requirement for having FieldCaches of some
>> fields in 250M docs index, I'd like to estimate memory
>> consumed by FieldCache.
>>
>> By looking at FieldCacheImpl source code, it seems that
>> field names are interned, but values are not?
>>     
>
> Values are not interned, but in a single field cache entry (String[])
> the same String object is used for all docs with that same value.
>
>   
Yeah, you are right. Because I could see the arbitrary two Strings that have
same value were identical (== is true) in String[] gotten by 
getStrings() but
I coudn't see explicit intern() for them, I asked. Can you elaborate who 
done it?

> But... I think StringIndex is more commonly used in both Lucene and
> Solr than String[] (sorting, faceting, etc) so double check that it's
> not StringIndex you should be looking at.
>
>   
Yes, I think StringIndex is better than String[] for my requirement in terms
of memory consumption as well (an extreme case, male/female field etc.).

Thank you,

Koji

-- 
http://www.rondhuit.com/en/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FieldCache memory estimation - term values are interned?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
2010/4/30 Koji Sekiguchi <ko...@r.email.ne.jp>:
> Are Strings that are got via FieldCache.DEFAULT.getStrings( reader,
> field ) interned?
>
> Since I have a requirement for having FieldCaches of some
> fields in 250M docs index, I'd like to estimate memory
> consumed by FieldCache.
>
> By looking at FieldCacheImpl source code, it seems that
> field names are interned, but values are not?

Values are not interned, but in a single field cache entry (String[])
the same String object is used for all docs with that same value.

But... I think StringIndex is more commonly used in both Lucene and
Solr than String[] (sorting, faceting, etc) so double check that it's
not StringIndex you should be looking at.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org