Posted to solr-user@lucene.apache.org by Isaac Hebsh <is...@gmail.com> on 2013/08/29 20:47:35 UTC

documentCache and lazyFieldLoading

Hi,
We've investigated a memory dump, which was taken after some frequent OOM
incidents.

The main issue we found was millions of LazyField instances, taking ~2GB
of memory, even though queries request only about 10 small fields.

We've found that LazyDocument creates a LazyField object for every item in
a multivalued field, even if we do not want this field.

For example, documents contain a multivalued field named "f" with a lot
of values (let's say 100 values per document). Queries set fl=id (requesting
only the document id). The documentCache will still grow in memory :(

In our case, documentCache was configured to 32000. There are 2 cores per
node, so 64000 LazyDocument instances are in memory. This is a pretty big
number, and we'll reduce it.
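
As a rough back-of-the-envelope check of these numbers, here is a minimal
sketch. The class and method names are hypothetical (not part of Solr), and
the 100 values per document is the illustrative figure from the example
above, not a measurement:

```java
// Hypothetical helper, not part of Solr: estimates how many LazyField
// placeholders a full documentCache could hold for ONE unrequested
// multivalued field.
final class LazyFieldEstimate {

    // cacheSizePerCore:  configured documentCache size (entries per core)
    // coresPerNode:      number of cores sharing the same JVM
    // valuesPerDocument: assumed number of values in the multivalued field
    static long estimateInstances(long cacheSizePerCore,
                                  long coresPerNode,
                                  long valuesPerDocument) {
        long cachedDocuments = cacheSizePerCore * coresPerNode;
        return cachedDocuments * valuesPerDocument;
    }

    public static void main(String[] args) {
        // Figures from this message: 32000 entries, 2 cores, ~100 values.
        long instances = estimateInstances(32_000, 2, 100);
        System.out.println("Estimated LazyField instances: " + instances);
    }
}
```

With these inputs the estimate is 6.4 million placeholders for a single
multivalued field, which is in line with the "millions" seen in the dump.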


I'm curious whether this is a known issue. And why should the
LazyDocument know the number of values in a multivalued field which is not
requested?

Another thought I had: is it reasonable to add something like
"{!cache=false}" which would affect documentCache? For example, if my query
requests "id" only, with a big rows parameter, I don't want documentCache to
hold these big LazyDocument objects.

Did anyone else encounter this?

Re: documentCache and lazyFieldLoading

Posted by Chris Hostetter <ho...@fucit.org>.

: 2. I understand this architecture of LazyFields, but i did not understand
: why multiple LazyFields should be created for the multivalued field. You
: can't load a part of them. If you request the field, you will get ALL of
: its values. so 100 (or more) placeholders are not necessary in this case.
: Moreover, why should Solr KNOW how many values are in that unloaded field?

It's been a while since i looked at it closely, but i believe the crux of
the reasoning has to do with the way the lucene Document API is
structured -- each document consists of a list of IndexableField objects
which contain the field name and the field value -- there is no single
object representing a field name and all of its individual values hanging
off of it, so the lucene Documents produced by LazyDocument have to
register a LazyField instance as a placeholder for each of those
IndexableField instances, so that if/when the Document API is used to
access them, they can be used to fetch the corresponding value.

there just isn't really any other way that the LazyDocument class can 
modify the Document object to know about the lazy fields.
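
To make the "flat list of fields" point concrete, here is a toy model. It is
only a sketch: FlatDocumentSketch and FieldEntry are hypothetical stand-ins
and are not Lucene's Document, IndexableField, or LazyDocument classes. It
assumes a document with an "id" field that was requested and a multivalued
field "f" that was not:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, NOT the Lucene/Solr implementation.
final class FlatDocumentSketch {

    // One entry per (name, value) pair. There is no single object that
    // groups a field name with all of its values.
    static final class FieldEntry {
        final String name;
        final String value;   // null while the entry is only a placeholder
        final boolean lazy;

        FieldEntry(String name, String value, boolean lazy) {
            this.name = name;
            this.value = value;
            this.lazy = lazy;
        }
    }

    // Builds the flat field list for a document whose "id" was requested
    // and whose multivalued field "f" (valuesInF values) was not.
    static List<FieldEntry> buildDocument(String id, int valuesInF) {
        List<FieldEntry> fields = new ArrayList<>();
        fields.add(new FieldEntry("id", id, false));      // loaded eagerly
        for (int i = 0; i < valuesInF; i++) {
            fields.add(new FieldEntry("f", null, true));  // one placeholder per value
        }
        return fields;
    }

    static long countPlaceholders(List<FieldEntry> fields) {
        return fields.stream().filter(entry -> entry.lazy).count();
    }

    public static void main(String[] args) {
        List<FieldEntry> doc = buildDocument("doc-1", 100);
        System.out.println("entries=" + doc.size()
                + " placeholders=" + countPlaceholders(doc));
    }
}
```

In this model, a document with 100 values in "f" carries 100 placeholder
entries even though only "id" was asked for, because each value occupies its
own slot in the list.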

But as i mentioned before: these LazyField objects are *tiny*.  Unless a
subsequent request that reuses the doc from the cache asks to fetch the
underlying value, having 100K+ of them in RAM shouldn't amount to much.
(And if the underlying field values are requested, then the space the
placeholders take up should be a fairly insignificant amount more than the
underlying values themselves -- if the underlying values are small enough
that it's noticeable overhead, you probably don't want to bother using
lazy field loading at all, even if you frequently don't need those values.)

FWIW: if/when you ask for one LazyField's real value, it goes ahead and
populates the values of all the other LazyFields with the same name (so
no redundant work is done when iterating over all the values of a field in
the typical flow).
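
The "load one, populate all siblings" behaviour can be sketched as follows.
This is a toy model under stated assumptions: LazyGroupSketch and Placeholder
are hypothetical names, the in-memory list stands in for the stored values in
the index, and none of this is the actual Solr/Lucene code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, NOT the Solr/Lucene implementation.
final class LazyGroupSketch {
    private final List<String> storedValues;  // stands in for the index
    private final List<Placeholder> placeholders = new ArrayList<>();
    private int fetchCount = 0;               // how often the "index" was read

    LazyGroupSketch(List<String> storedValues) {
        this.storedValues = storedValues;
        for (int i = 0; i < storedValues.size(); i++) {
            placeholders.add(new Placeholder(i));
        }
    }

    // One placeholder per value of the same field name.
    final class Placeholder {
        private final int index;
        private String value;
        private boolean loaded = false;

        Placeholder(int index) {
            this.index = index;
        }

        // First access on ANY placeholder loads the values of ALL of them.
        String stringValue() {
            if (!loaded) {
                loadAll();
            }
            return value;
        }
    }

    private void loadAll() {
        fetchCount++;
        for (Placeholder p : placeholders) {
            p.value = storedValues.get(p.index);
            p.loaded = true;
        }
    }

    Placeholder get(int i) {
        return placeholders.get(i);
    }

    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        LazyGroupSketch group = new LazyGroupSketch(List.of("a", "b", "c"));
        for (int i = 0; i < 3; i++) {
            System.out.println(group.get(i).stringValue());
        }
        System.out.println("fetches=" + group.fetchCount());
    }
}
```

Iterating over every value in this model touches the backing store only once,
which is the "no redundant work" property described above.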

: What do you think about temporarily disabling documentCache, for a specific
: query?

I don't see anything wrong with the idea conceptually, but I'm not sure
how feasible that would be, and I don't have any suggestions as to
how/where to implement it.

I still think you should really consider dialing your documentCache size 
way, way down and test the performance -- even with your multiple 
concurrent requests asking for rows=2000 i suspect you won't see any 
painful increases in response time, and it will most certainly help your 
OOM problems.


-Hoss

Re: documentCache and lazyFieldLoading

Posted by Isaac Hebsh <is...@gmail.com>.

Thanks Hoss.

1. We currently use Solr 4.3.0.
2. I understand this architecture of LazyFields, but i did not understand
why multiple LazyFields should be created for the multivalued field. You
can't load a part of them. If you request the field, you will get ALL of
its values. so 100 (or more) placeholders are not necessary in this case.
Moreover, why should Solr KNOW how many values are in that unloaded field?
3. In our case, we might handle several concurrent queries, each one
requesting rows=2000.

What do you think about temporarily disabling documentCache, for a specific
query?

On Thu, Aug 29, 2013 at 10:11 PM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : The main issue we found was millions of LazyField instances, taking ~2GB
> : of memory, even though queries request only about 10 small fields.
>
> which version of Solr are you using?  there was a really bad bug with
> lazyFieldLoading fixed in Solr 4.2.1 (SOLR-4589)
>
> : We've found that LazyDocument creates a LazyField object for every item in
> : a multivalued field, even if we do not want this field.
>
> right, that's exactly how lazyFieldLoading is designed to work -- instead
> of loading the full field values into ram, only a small LazyField object
> is loaded in its place and that LazyField only fetches the underlying
> data if/when it's requested.
>
> If the LazyField instances weren't created as placeholders, subsequent
> requests for the document that *might* request additional fields (beyond
> the "10 small fields" that were requested the first time) would have no
> way of knowing if/when those additional fields existed to be able to fetch
> them from the index.
>
> : In our case, documentCache was configured to 32000. There are 2 cores per
> : node, so 64000 LazyDocument instances are in memory. This is a pretty big
> : number, and we'll reduce it.
>
> FWIW: Even at 1/10 that size, that seems like a ridiculously large
> documentCache to me.
>
>
> -Hoss
>

Re: documentCache and lazyFieldLoading

Posted by Chris Hostetter <ho...@fucit.org>.

: The main issue we found was millions of LazyField instances, taking ~2GB
: of memory, even though queries request only about 10 small fields.

which version of Solr are you using?  there was a really bad bug with 
lazyFieldLoading fixed in Solr 4.2.1 (SOLR-4589)

: We've found that LazyDocument creates a LazyField object for every item in
: a multivalued field, even if we do not want this field.

right, that's exactly how lazyFieldLoading is designed to work -- instead 
of loading the full field values into ram, only a small LazyField object 
is loaded in its place and that LazyField only fetches the underlying
data if/when it's requested.

If the LazyField instances weren't created as placeholders, subsequent 
requests for the document that *might* request additional fields (beyond 
the "10 small fields" that were requested the first time) would have no 
way of knowing if/when those additional fields existed to be able to fetch 
them from the index.

: In our case, documentCache was configured to 32000. There are 2 cores per
: node, so 64000 LazyDocument instances are in memory. This is a pretty big
: number, and we'll reduce it.

FWIW: Even at 1/10 that size, that seems like a ridiculously large 
documentCache to me.


-Hoss