Posted to solr-user@lucene.apache.org by Norgorn <ls...@mail.ru> on 2015/01/22 08:00:19 UTC

Field collapsing memory usage

We are trying to run SOLR with a big index, using as little RAM as possible.
Simple searches for our cases work nicely, but field collapsing (group=true)
queries fail with OOM.

Our setup is several shards per SOLR instance, each shard on its own HDD.
We've tried the same queries, but against one specific shard, and those
queries worked well (no OOMs).

Then we changed which shard was being queried and measured RAM usage. We saw
that, even though only one shard was being queried at a time, the RAM in use
increased significantly.

So, as we see, the memory used by the first shard for grouping wasn't
released. Caches are already nearly zero.

By switching shards this way, we've managed to make SOLR crash.

My question is, why is this so? What do we need to do to release that memory,
so that in the end we can query the shards alternately (because parallel
grouping queries fail nearly always)?
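
For illustration, a grouping request of the kind described here might look
roughly like the sketch below (host, collection, and field names are
placeholders, not taken from this setup):

  # distributed grouping across all shards (the case that hits OOM)
  curl 'http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=site&rows=10'

  # the same query restricted to one specific shard
  curl 'http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=site&rows=10&shards=localhost:8983/solr/collection1'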




Re: Field collapsing memory usage

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2015-01-22 at 22:52 +0100, Erick Erickson wrote:
> What do you think about folding this into the Solr (or Lucene?) code
> base? Or is it too specialized?

(writing under the assumption that DVEnabler actually works as it should
for everyone and not just us)

Right now it is an explicit tool. As such, users need to find it and
learn how to use it, which is a large barrier. Most of the time it is
easier just to re-index everything.

It seems to me that it should be possible to do this seamlessly instead:
Simply change the schema and reload. Old segments would have emulated
DocValues (high speed, high memory overhead), new segments would have
pure DVs. An optimize would be optional, but highly recommended.
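
As a sketch of what that workflow would look like from the user's side (the
field and core names here are illustrative), the only steps would be a schema
edit and a core reload:

  <!-- schema.xml: add docValues to the grouping field -->
  <field name="site" type="string" indexed="true" stored="true" docValues="true"/>

  # reload the core; under this proposal, old segments would emulate
  # DocValues in memory and new segments would write real ones
  curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1'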

- Toke Eskildsen, State and University Library, Denmark



Re: Field collapsing memory usage

Posted by Erick Erickson <er...@gmail.com>.
Toke:

What do you think about folding this into the Solr (or Lucene?) code
base? Or is it too specialized?

Not sure one way or the other, just askin'....

Erick

On Thu, Jan 22, 2015 at 3:47 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> Norgorn [lsunnydayl@mail.ru] wrote:
>> Is there any way to make 'docValues="true"' without reindexing?
>
> Depends on how brave you are :-)
>
> We recently had the same need and made https://github.com/netarchivesuite/dvenabler
> To my knowledge that is the only existing tool for that task, and as we are the only ones who have used it, robustness is not guaranteed. Warnings aside, it works without problems in our tests as well as on the few real corpuses we have tested. It does use a fairly memory-hungry structure during the conversion. If the number of _unique_ values in your grouping field approaches 1b, I loosely guess that you will need 40GB+ of heap. Do read https://github.com/netarchivesuite/dvenabler/issues/14 if you want to try it.
>
> - Toke Eskildsen

RE: Field collapsing memory usage

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Norgorn [lsunnydayl@mail.ru] wrote:
> Nice, thanks!
> If you'd like, I'll write up our results with that amazing utility.

By all means, please do. Good as well as bad. Independent testing is needed to ensure the tool works properly.

- Toke Eskildsen

RE: Field collapsing memory usage

Posted by Norgorn <ls...@mail.ru>.
Nice, thanks!
If you'd like, I'll write up our results with that amazing utility.




RE: Field collapsing memory usage

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Norgorn [lsunnydayl@mail.ru] wrote:
> Is there any way to make 'docValues="true"' without reindexing?

Depends on how brave you are :-)

We recently had the same need and made https://github.com/netarchivesuite/dvenabler
To my knowledge that is the only existing tool for that task, and as we are the only ones who have used it, robustness is not guaranteed. Warnings aside, it works without problems in our tests as well as on the few real corpuses we have tested. It does use a fairly memory-hungry structure during the conversion. If the number of _unique_ values in your grouping field approaches 1b, I loosely guess that you will need 40GB+ of heap. Do read https://github.com/netarchivesuite/dvenabler/issues/14 if you want to try it.
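
(As a back-of-the-envelope reading of that guess: 40 GB for roughly 1 billion
unique values works out to around 40 bytes of transient bookkeeping per unique
value during the conversion; treat that as a rough rule of thumb, not a
measured figure.)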

- Toke Eskildsen

RE: Field collapsing memory usage

Posted by Norgorn <ls...@mail.ru>.
Thank you for your answer.
We've found out that the problem was in our SOLR distribution (Heliosearch
0.08). There are no crashes after changing to 4.10.3 (although there are a
lot of OOMs while handling queries, which is not really strange for 1.1
billion documents). Now we are going to try the latest Heliosearch.

Is there any way to make 'docValues="true"' without reindexing?




RE: Field collapsing memory usage

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Norgorn [lsunnydayl@mail.ru] wrote:
> So, as we see, the memory used by the first shard for grouping wasn't
> released. Caches are already nearly zero.

It should be one or the other: Either the memory is released or there is something in the caches. Anyway, DocValues is the way to go, so ensure that it is turned on for your group field: We do grouping on indexes with 250M documents (and 200M+ unique values in the group field) without any significant memory overhead, using DocValues.

Caveat: If you ask for very large result sets, the memory usage will be high. But only temporarily.
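
For example (field name illustrative), the size of the requested result set is
controlled by parameters such as rows (the number of groups returned) and
group.limit (documents per group), so a modest request keeps that temporary
usage small:

  curl 'http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=site&rows=10&group.limit=2'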

- Toke Eskildsen