Posted to solr-user@lucene.apache.org by tradergene <no...@krevets.com> on 2014/03/19 22:01:41 UTC

Excessive Heap Usage from docValues?

Hello All,

I'm hoping to get your assistance in debugging what seems like a memory
issue.

I have a Solr index with about 32 million docs.  Each doc is relatively
small but has multiple dynamic fields that are storing INTs.  The initial
problem that I had to resolve is that we were running into OOMs (on a 48GB
heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
filling up the heap due to all the dynamic fields.  To mitigate this, I
enabled docValues on the schema for many of the dynamicField culprits.  This
dropped the FieldCache down to almost nothing.

Now, when re-indexing for docValues functionality, I ran into OOMs as soon
as I reached 12 million of the 32 million documents.  Before enabling
docValues, I was able to load up Solr on a 48GB heap but ran into problems
after enough unique searches occurred (normal FieldCache issue).  Now, with
docValues, a 48GB heap is giving me OOM after 12 million docs indexed.  I
split the collection into 10 shards and with 2 nodes (48GB heap each) was
able to get up to 21 million docs indexed.  Now, I've had to move the shards
to more nodes and am up to 10 shards across 4 nodes and am hoping to be able
to get all 32 million docs indexed.  This will be 48GB x 4 heap which seems
really excessive for an index that was only 132GB pre-docValues.

I would love some thoughts as to whether I'm expecting too much efficiency
with docValues enabled.  I was under the impression that docValues would
increase storage requirements on disk (which it has), but I thought that RAM
usage would go down during searching (which I haven't tested) as well as
indexing.

Thanks for any assistance anyone can provide.

Gene




Re: Excessive Heap Usage from docValues?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote:
> I have a Solr index with about 32 million docs.  Each doc is relatively
> small but has multiple dynamic fields that are storing INTs.  The initial
> problem that I had to resolve is that we were running into OOMs (on a 48GB
> heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
> filling up the heap due to all the dynamic fields.

48GB heap for a 130GB, 32M docs index sounds excessive.  Could you tell
us how many unique fields your searcher uses in total for faceting and
maybe the overall layout of your index? Is this perhaps a case of many
distinct groups of data put in the same index, where the searches are
always within a single group and each group has its own fields for
faceting? Are the fields single- or multi-valued?

- Toke Eskildsen, State and University Library, Denmark
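
Editorial aside on why the number of distinct fields matters here. The sketch below is only a back-of-envelope estimate, not a measurement of this index. It assumes that each int field pulled into the FieldCache costs about 4 bytes per document in the index, whether or not a given document has a value for that field. The document count and heap size come from the original post; the function names and the 4-byte figure are illustrative assumptions.

```python
# Back-of-envelope estimate only. The 4 bytes per document per field
# figure is an assumption, not something measured on this index.
DOCS = 32_000_000            # document count from the original post
BYTES_PER_INT = 4            # assumed: one 32-bit slot per doc per field
HEAP_BYTES = 48 * 1024**3    # 48GB heap from the original post


def fieldcache_bytes(num_fields, docs=DOCS, bytes_per_value=BYTES_PER_INT):
    """Estimated heap used when num_fields int fields are all cached."""
    return num_fields * docs * bytes_per_value


def fields_to_fill(heap_bytes=HEAP_BYTES, docs=DOCS,
                   bytes_per_value=BYTES_PER_INT):
    """Number of distinct int fields whose cached arrays fit in the heap."""
    return heap_bytes // (docs * bytes_per_value)


if __name__ == "__main__":
    per_field_mb = fieldcache_bytes(1) / 1024**2
    print(f"estimated cost per cached field: {per_field_mb:.0f} MB")
    print(f"fields needed to fill the heap: {fields_to_fill()}")
```

Under these assumptions a single cached field costs on the order of 120 MB, so it would take roughly 400 distinct dynamic fields being sorted or faceted on to fill a 48GB heap by themselves. That is why the count of unique fields is the first thing to establish.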



Re: Excessive Heap Usage from docValues?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Which type of doc values? See Wiki or reference guide for a list of types.

Otis
Solr & ElasticSearch Support
http://sematext.com/