Posted to solr-user@lucene.apache.org by Isaac Hebsh <is...@gmail.com> on 2013/08/29 20:47:35 UTC
documentCache and lazyFieldLoading
Hi,
We've investigated a memory dump taken after several frequent OOM
incidents.
The main issue we found was many millions of LazyField instances,
taking ~2GB of memory, even though our queries request only about 10
small fields.
We've found that LazyDocument creates a LazyField object for every value in
a multivalued field, even if we do not request that field.
For example, suppose documents contain a multivalued field named "f" with
many values (let's say 100 values per document), and queries set fl=id
(requesting only the document id). The documentCache still grows in memory :(
In our case, documentCache was configured with a size of 32000. There are 2
cores per node, so 64000 LazyDocument instances are in memory. This is a
pretty big number, and we'll reduce it.
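A quick back-of-the-envelope count using the figures above shows how many placeholders the multivalued field alone can account for. The 100-values-per-document figure is the illustrative example from this message, not a measurement, and the class and method names are made up for this sketch.

```java
// Back-of-the-envelope estimate using the figures from the message above.
// Assumption: 100 values of "f" per document is the illustrative example, not a measurement.
public class PlaceholderEstimate {

    // Placeholders for one multivalued field across every cached document:
    // (cache size per core) x (cores per node) x (values per document).
    static long placeholderCount(long cacheSizePerCore, long coresPerNode, long valuesPerDoc) {
        return cacheSizePerCore * coresPerNode * valuesPerDoc;
    }

    public static void main(String[] args) {
        long cachedDocs = 32_000L * 2;                  // LazyDocuments per node
        long forF = placeholderCount(32_000, 2, 100);   // LazyFields for "f" alone
        System.out.println(cachedDocs + " docs, " + forF + " placeholders"); // 64000 docs, 6400000 placeholders
    }
}
```

With these inputs the estimate is 6,400,000 placeholders for "f" across the node's two caches; placeholders for any other unrequested fields would come on top of that.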
Is this a known issue? And why should LazyDocument know the number of
values in a multivalued field that is not requested?
Another thought: would it be reasonable to add something like
"{!cache=false}" that affects the documentCache? For example, if my query
requests only "id" with a big rows parameter, I don't want the documentCache
to hold these big LazyDocument objects.
Has anyone else encountered this?
Re: documentCache and lazyFieldLoading
Posted by Chris Hostetter <ho...@fucit.org>.
: 2. I understand the architecture of LazyFields, but I did not understand
: why multiple LazyFields should be created for the multivalued field. You
: can't load a part of them. If you request the field, you will get ALL of
: its values, so 100 (or more) placeholders are not necessary in this case.
: Moreover, why should Solr KNOW how many values are in that unloaded field?
It's been a while since I looked at it closely, but I believe the crux of
the reasoning has to do with the way the Lucene Document API is
structured -- each document consists of a list of IndexableField objects,
each of which contains a field name and a field value. There is no single
object representing a field name with all of its individual values hanging
off of it, so the Lucene Documents produced by LazyDocument have to register
a LazyField instance as a placeholder for each of those IndexableField
instances, so that if/when the Document API is used to access them, they
can be used to fetch the corresponding value.
There just isn't really any other way that the LazyDocument class can
modify the Document object to know about the lazy fields.
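The flat-list structure described above can be sketched with a small self-contained Java model. The `Placeholder` class and the `lazyDocument` and `count` methods are hypothetical stand-ins, not Lucene's `Document`/`IndexableField` or Solr's `LazyDocument`; the sketch assumes only that there is one entry per stored value, which is why a 100-valued field yields 100 placeholders even for an fl=id request, and why a later request reusing the cached document can still find and load "f".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified model, NOT Lucene's or Solr's real classes: a document is a flat
// list of (name, value) entries, one per stored value. Nothing groups the
// values of "f" into a single object.
public class FlatDocumentModel {

    // Hypothetical lazy placeholder: keeps the field name and a loader, and
    // only runs the loader the first time the value is asked for.
    static final class Placeholder {
        final String name;
        private final Supplier<String> loader;
        private boolean loaded = false;
        private String value;

        Placeholder(String name, Supplier<String> loader) {
            this.name = name;
            this.loader = loader;
        }

        boolean isLoaded() {
            return loaded;
        }

        String value() {
            if (!loaded) {
                value = loader.get(); // stands in for reading the stored value from the index
                loaded = true;
            }
            return value;
        }
    }

    // Builds the lazy view of one stored document. Because the structure is a
    // flat list, each value of the multivalued field gets its own placeholder.
    static List<Placeholder> lazyDocument(String id, int valuesOfF) {
        List<Placeholder> fields = new ArrayList<>();
        fields.add(new Placeholder("id", () -> id));
        for (int i = 0; i < valuesOfF; i++) {
            final int ord = i;
            fields.add(new Placeholder("f", () -> "value-" + ord));
        }
        return fields;
    }

    // Counts the placeholders registered under one field name.
    static long count(List<Placeholder> doc, String name) {
        return doc.stream().filter(p -> p.name.equals(name)).count();
    }

    public static void main(String[] args) {
        List<Placeholder> doc = lazyDocument("doc1", 100);
        System.out.println(doc.get(0).value()); // fl=id: only "id" is loaded -> doc1
        System.out.println(count(doc, "f"));    // 100 placeholders for "f"
        // A later request reusing this cached doc can still locate and load "f".
        System.out.println(doc.get(1).value()); // value-0
    }
}
```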
But as I mentioned before: these LazyField objects are *tiny*. Unless a
subsequent request that reuses the doc from the cache asks to fetch the
underlying values, having 100+K of them in RAM shouldn't amount to much.
(And if the underlying field values are requested, then the space they
take up should be only insignificantly more than the underlying values
themselves -- if the underlying values are small enough that the overhead is
noticeable, you probably don't want to bother using lazy loading at all, even
if you frequently don't need those values.)
FWIW: if/when you ask for one LazyField's real value, it goes ahead and
populates the values of all the other LazyFields with the same name (so
no redundant work is done when iterating over all the values of a field in
the typical flow).
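The shared-loading behaviour described above can be modelled by having all placeholders with the same name point at one per-field group that reads every value on first access. This is a hypothetical sketch (`FieldGroup`, `Placeholder`, and the `readAllValues` loader are invented names), not the actual LazyDocument implementation; it only shows why iterating over all values of a field costs a single underlying read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified model, NOT Solr's LazyDocument code: placeholders that share a
// field name also share one group, so the first access fetches every value
// of that field and the remaining placeholders are already filled.
public class SharedFieldLoad {

    // One group per field name; counts how many times the underlying read runs.
    static final class FieldGroup {
        private final Supplier<List<String>> readAllValues; // hypothetical index read
        private List<String> values; // null until the first access
        int reads = 0;

        FieldGroup(Supplier<List<String>> readAllValues) {
            this.readAllValues = readAllValues;
        }

        String valueAt(int ord) {
            if (values == null) {
                values = readAllValues.get(); // loads all values of the field at once
                reads++;
            }
            return values.get(ord);
        }
    }

    // A placeholder only knows its group and its position within the field.
    static final class Placeholder {
        final FieldGroup group;
        final int ord;

        Placeholder(FieldGroup group, int ord) {
            this.group = group;
            this.ord = ord;
        }

        String value() {
            return group.valueAt(ord);
        }
    }

    public static void main(String[] args) {
        List<String> stored = List.of("a", "b", "c");
        FieldGroup f = new FieldGroup(() -> new ArrayList<>(stored));
        List<Placeholder> placeholders = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            placeholders.add(new Placeholder(f, i));
        }

        StringBuilder sb = new StringBuilder();
        for (Placeholder p : placeholders) {
            sb.append(p.value()); // iterate over all values of "f"
        }
        System.out.println(sb + " after " + f.reads + " read(s)"); // abc after 1 read(s)
    }
}
```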
: What do you think about temporarily disabling the documentCache for a
: specific query?
I don't see anything wrong with the idea conceptually, but I'm not sure
how feasible that would be, and I don't have any suggestions for how/where
to implement it.
I still think you should really consider dialing your documentCache size
way, way down and testing the performance -- even with your multiple
concurrent requests asking for rows=2000, I suspect you won't see any
painful increases in response time, and it will most certainly help your
OOM problems.
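To illustrate why a smaller documentCache caps memory use, here is a generic LRU map built on `java.util.LinkedHashMap` in access order. It is not Solr's cache implementation, and the size of 512 is an arbitrary example, not a recommendation; the point is that the configured size, rather than the rows requested, bounds how many cached documents (and their LazyField placeholders) stay reachable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU map, NOT Solr's own cache class: once the configured size is
// exceeded, the least recently used entry is dropped on each insert.
public class BoundedDocCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedDocCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedDocCache<Integer, String> cache = new BoundedDocCache<>(512); // hypothetical size
        for (int docId = 0; docId < 2000; docId++) { // one simulated rows=2000 request
            cache.put(docId, "doc-" + docId);
        }
        System.out.println(cache.size()); // 512
        System.out.println(cache.containsKey(0) + " " + cache.containsKey(1999)); // false true
    }
}
```

After one simulated rows=2000 request, only the 512 most recently used entries remain; an evicted document would simply be fetched from the index again if a later request needed it.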
-Hoss
Re: documentCache and lazyFieldLoading
Posted by Isaac Hebsh <is...@gmail.com>.
Thanks Hoss.
1. We currently use Solr 4.3.0.
2. I understand the architecture of LazyFields, but I did not understand
why multiple LazyFields should be created for the multivalued field. You
can't load a part of them. If you request the field, you will get ALL of
its values, so 100 (or more) placeholders are not necessary in this case.
Moreover, why should Solr KNOW how many values are in that unloaded field?
3. In our unfortunate case, we might handle several concurrent queries, each
one requesting rows=2000.
What do you think about temporarily disabling the documentCache for a specific
query?
On Thu, Aug 29, 2013 at 10:11 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : The main issue we found was many millions of LazyField instances,
> : taking ~2GB of memory, even though our queries request only about 10
> : small fields.
>
> Which version of Solr are you using? There was a really bad bug with
> lazyFieldLoading fixed in Solr 4.2.1 (SOLR-4589).
>
> : We've found that LazyDocument creates a LazyField object for every value in
> : a multivalued field, even if we do not request that field.
>
> Right, that's exactly how lazyFieldLoading is designed to work -- instead
> of loading the full field values into RAM, only a small LazyField object
> is loaded in its place, and that LazyField only fetches the underlying
> data if/when it's requested.
>
> If the LazyField instances weren't created as placeholders, subsequent
> requests for the document that *might* request additional fields (beyond
> the "10 small fields" that were requested the first time) would have no
> way of knowing whether those additional fields existed, and so no way to
> fetch them from the index.
>
> : In our case, documentCache was configured with a size of 32000. There are 2
> : cores per node, so 64000 LazyDocument instances are in memory. This is a
> : pretty big number, and we'll reduce it.
>
> FWIW: Even at 1/10 that size, that seems like a ridiculously large
> documentCache to me.
>
>
> -Hoss
>