Posted to java-user@lucene.apache.org by Bernd Fehling <be...@uni-bielefeld.de> on 2011/06/21 11:32:27 UTC

questions about fieldCache

I'm trying to understand the logic of/behind fieldCache.

Who wrote this piece of code, or who has good knowledge of it?

Why is it under the hood of jetty?

I see FieldCache$StringIndex with
- f_dccollection
- f_dcyear
- f_dctype
but also
- dctitle --> f_dctitle --> f_dccreator
- title --> f_dcyear

There are some entries without further reference like the first examples
and some that have references to further HashMaps like a chain.
Why is it this way, what is the purpose?

What does fieldCache do when a server is replicated? Will all old content
be cleaned up because of a new index with new content?

Regards
Bernd


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: questions about fieldCache

Posted by Bernd Fehling <be...@uni-bielefeld.de>.

OK, after some sorting the fieldCache has some entries, as do all the other caches.
Next I called optimize, which started a new searcher.
All caches are cleared, _except_ fieldCache.
I then triggered a GC with jconsole and the logfile reported "Full GC".
The heap shrank, but the fieldCache is _still_ holding its entries.

From this point of view the fieldCache will never be cleaned up by GC,
holding more and more data until OOM, which is very bad.

Why is fieldCache not under the hood of Lucene/Solr so that it will be cleaned
up with a new searcher or a Full GC?

Regards,
Bernd



-- 
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehling@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************


Re: questions about fieldCache

Posted by Erick Erickson <er...@gmail.com>.

> So an action that starts a new searcher and closes the old one (like
> replication) should release the cache from fieldCache through garbage
> collection?

Absolutely. It won't be immediate, because the JVM has some
heuristics it uses to initiate garbage collection. You could try
attaching to the Solr instance with jConsole and use that to trigger
garbage collections to see what that could tell you...
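The same check can be scripted instead of clicked. The following is a minimal sketch using only the standard java.lang.management API; the class name HeapProbe and its methods are hypothetical names chosen for illustration. It measures the heap of the JVM it runs in, so to say anything about Solr it would have to run inside the Solr JVM or be adapted to a remote JMX connection. As with the jConsole button, the gc() call is only a request to the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper: reports heap usage around a requested garbage collection. */
public class HeapProbe {

    /** Returns the number of heap bytes currently in use in this JVM. */
    public static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Requests a collection and returns { usedBefore, usedAfter } in bytes. */
    public static long[] gcAndMeasure() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long before = memory.getHeapMemoryUsage().getUsed();
        memory.gc(); // documented as effectively equivalent to System.gc()
        long after = memory.getHeapMemoryUsage().getUsed();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] used = gcAndMeasure();
        System.out.println("heap used before GC: " + used[0] + " bytes");
        System.out.println("heap used after GC:  " + used[1] + " bytes");
    }
}
```

If the "after" figure stays high across several runs while old searchers have supposedly been closed, that suggests something still holds a strong reference to the cached data.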

Best
Erick



Re: questions about fieldCache

Posted by Bernd Fehling <be...@uni-bielefeld.de>.

Currently I'm using version 3.2.
I already used 4.x some months ago, but there was too much change at that time,
so I decided to go with 3.0.x and updated to 3.1 and now to 3.2.

I'm still dealing with my fieldCache OOM issue and want to understand
why things are as they are.
I have already removed/solved one "insane message" from fieldCache and
three ReadOnlyDirectoryReader entries from fieldCache.
Now only sorting produces an entry.

So an action that starts a new searcher and closes the old one (like replication)
should release the cache from fieldCache through garbage collection?

Regards
Bernd



Re: questions about fieldCache

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, I'm not going to even try to talk about the code itself, but I will add
a couple of clarifications:

Jetty has nothing to do with it. It's in Lucene, and it's used for sorting and
sometimes faceting. The cache is associated with a reader on a machine
used to search. When replication happens, that searcher should be closed
and any data associated with the cache is returned to the system.
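To make "associated with a reader" concrete, here is a minimal sketch of a cache that is weakly keyed by reader, with a nested map of per-field entries underneath. It is an illustration, not Lucene's code: the class ReaderKeyedCacheSketch, its methods, and the int[] payload are all hypothetical, and it rests on the assumption that the 3.x field cache is organized roughly this way (a weak outer map per reader, an ordinary HashMap per field inside it). Under that assumption, entries can only be collected once nothing else holds a strong reference to the reader, which is why a Full GC alone does not necessarily empty the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a per-reader field cache; not Lucene's actual class. */
public class ReaderKeyedCacheSketch {

    // Outer map: weakly keyed by reader, so an entry can disappear only after
    // nothing else holds a strong reference to that reader.
    private final Map<Object, Map<String, int[]>> readerCache =
            new WeakHashMap<Object, Map<String, int[]>>();

    /** Returns the cached array for (reader, field), building it on first use. */
    public synchronized int[] get(Object reader, String field, int maxDoc) {
        Map<String, int[]> perReader = readerCache.get(reader);
        if (perReader == null) {
            perReader = new HashMap<String, int[]>();
            readerCache.put(reader, perReader);
        }
        int[] values = perReader.get(field);
        if (values == null) {
            values = new int[maxDoc]; // placeholder for per-document sort data
            perReader.put(field, values);
        }
        return values;
    }

    /** Explicitly drops everything cached for one reader. */
    public synchronized void purge(Object reader) {
        readerCache.remove(reader);
    }

    /** Number of readers that currently have cached entries. */
    public synchronized int readerCount() {
        return readerCache.size();
    }

    public static void main(String[] args) {
        ReaderKeyedCacheSketch cache = new ReaderKeyedCacheSketch();
        Object oldReader = new Object(); // stands in for an index reader
        cache.get(oldReader, "f_dcyear", 1000);
        cache.get(oldReader, "f_dctype", 1000);

        System.gc(); // the reader is still strongly referenced here
        System.out.println("readers cached after GC: " + cache.readerCount());

        cache.purge(oldReader); // explicit release, independent of GC timing
        System.out.println("readers cached after purge: " + cache.readerCount());
    }
}
```

The design point is that the cached values must not themselves reference the reader strongly; otherwise a weakly keyed map can never release the entry.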

Someone else will have to chime in on the underlying details <G>..

By the way, what version of Solr are you using? Because the memory
requirements for string sorting and faceting have been drastically
reduced on the trunk version. In a really rough test I've seen 75%
reductions in memory requirements (note I was doing the worst things
I could think of, so I don't necessarily expect your results to be as
drastic).

Best
Erick

