Posted to solr-user@lucene.apache.org by Chetas Joshi <ch...@gmail.com> on 2017/04/11 20:56:48 UTC

Long GC pauses while reading Solr docs using Cursor approach

Hello,

I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
with number of shards = 80 and replication factor = 2.

Solr JVM heap size = 20 GB
solr.hdfs.blockcache.enabled = true
solr.hdfs.blockcache.direct.memory.allocation = true
MaxDirectMemorySize = 25 GB

I am querying a solr collection with index size = 500 MB per core.

The off-heap memory (25 GB) is large enough to hold the entire index.

Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
bytes per solr doc) from the Solr docs that satisfy the query. The docs are
sorted by "id" and then by those 2 fields.

I am not able to understand why the heap memory is getting full and Full
GCs keep running back to back with long GC pauses (> 30 seconds). I am
using the CMS collector.

-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled


Please guide me in debugging the heap usage issue.


Thanks!

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Toke Eskildsen <to...@kb.dk>.
Chetas Joshi <ch...@gmail.com> wrote:
> Thanks for the insights into the memory requirements. Looks like cursor
> approach is going to require a lot of memory for millions of documents.

Sorry, that is a premature conclusion from your observations.

> If I run a query that returns only 500K documents still keeping 100K docs
> per page, I don't see long GC pauses.

500K docs is far less than your worst-case 80*100K. You are not keeping the effective page size constant across your tests. You need to do that in order to conclude that it is the result set size that is the problem.

> So it is not really the number of rows per page but the overall number of
> docs.

It is the effective maximum number of document results handled at any point (the merger really) during the transaction. If your page size is 100K and you match 8M documents, then the maximum is 8M (as you indirectly calculated earlier). If you match 800M documents, the maximum is _still_ 8M.

(note: Okay, it is not just the maximum number of results as the internal structures for determining the result sets at the individual nodes are allocated from the page size. However, that does not affect the merging process)

The high number 8M might be the reason for your high GC activity. Effectively 2 or 3 times that many tiny objects need to be allocated, be alive at the same time, then be de-allocated. A very short time after de-allocation, a new bunch needs to be allocated, so a guess is that the garbage collector has a hard time keeping up with this pattern. One strategy for coping is to allocate more memory and hope for the barrage to end, which would explain your jump in heap. But I'm in guess-land here.


Hopefully it is simple for you to turn the page size way down - to 10K or even 1K. Why don't you try that, then see how it affects speed and memory requirements?

- Toke

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/13/2017 11:51 AM, Chetas Joshi wrote:
> Thanks for the insights into the memory requirements. Looks like cursor
> approach is going to require a lot of memory for millions of documents.
> If I run a query that returns only 500K documents still keeping 100K docs
> per page, I don't see long GC pauses. So it is not really the number of
> rows per page but the overall number of docs. May be I can reduce the
> document cache and the field cache. What do you think?

Lucene handles the field cache automatically and as far as I am aware,
it is not configurable in any way.  Having docValues on fields that you
are using will reduce the amount of memory required for the field cache.
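
(For example, a docValues-enabled field in schema.xml looks roughly like the sketch
below; the field and type names here are illustrative, not taken from your schema.)

<field name="fieldA" type="string" indexed="true" stored="true" docValues="true"/>
<field name="fieldB" type="long"   indexed="true" stored="true" docValues="true"/>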

The filterCache is typically going to be much larger than any of the
other configurable caches.  Each entry in filterCache will be 25 million
bytes on a 200 million document index.  The filterCache should not be
configured with a large size -- typical example defaults have a size of
512 ... 512 entries that are each 25 million bytes will use 12
gigabytes.  The other caches typically have much smaller entries and
therefore can usually be configured with fairly large sizes.
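
(As a sketch only -- the cache class and autowarm values below are the usual example
defaults, not taken from your solrconfig.xml -- a deliberately small filterCache would
look like this. At ~25 MB per entry on a 200 million document core, 64 entries cap the
worst case around 1.6 GB.)

<filterCache class="solr.FastLRUCache"
             size="64"
             initialSize="64"
             autowarmCount="0"/>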

Thanks,
Shawn


Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Chetas Joshi <ch...@gmail.com>.
Hi Shawn,

Thanks for the insights into the memory requirements. Looks like the cursor
approach is going to require a lot of memory for millions of documents.
If I run a query that returns only 500K documents, still keeping 100K docs
per page, I don't see long GC pauses. So it is not really the number of
rows per page but the overall number of docs. Maybe I can reduce the
document cache and the field cache. What do you think?

Erick,

I was using the streaming approach to get results back from Solr, but I was
running into some runtime exceptions. That bug has been fixed in Solr 6.0.
However, for other reasons I won't be able to move to Java 8, so I will have
to stick with Solr 5.5.0. That is why I switched to the cursor approach.

Thanks!

On Wed, Apr 12, 2017 at 8:37 PM, Erick Erickson <er...@gmail.com>
wrote:

> You're missing the point of my comment. Since they already are
> docValues, you can use the /export functionality to get the results
> back as a _stream_ and avoid all of the overhead of the aggregator
> node doing a merge sort and all of that.
>
> You'll have to do this from SolrJ, but see CloudSolrStream. You can
> see examples of its usage in StreamingTest.java.
>
> this should
> 1> complete much, much faster. The design goal is 400K rows/second but YMMV
> 2> use vastly less memory on your Solr instances.
> 3> only require _one_ query
>
> Best,
> Erick
>
> On Wed, Apr 12, 2017 at 7:36 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> > On 4/12/2017 5:19 PM, Chetas Joshi wrote:
> >> I am getting back 100K results per page.
> >> The fields have docValues enabled and I am getting sorted results based
> on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes).
> >>
> >> I have a solr Cloud of 80 nodes. There will be one shard that will get
> top 100K docs from each shard and apply merge sort. So, the max memory
> usage of any shard could be 40 bytes * 100K * 80 = 320 MB. Why would heap
> memory usage shoot up from 8 GB to 17 GB?
> >
> > From what I understand, Java overhead for a String object is 56 bytes
> > above the actual byte size of the string itself.  And each character in
> > the string will be two bytes -- Java uses UTF-16 for character
> > representation internally.  If I'm right about these numbers, it means
> > that each of those id values will take 120 bytes -- and that doesn't
> > include the size the actual response (xml, json, etc).
> >
> > I don't know what the overhead for a long is, but you can be sure that
> > it's going to take more than eight bytes total memory usage for each one.
> >
> > Then there is overhead for all the Lucene memory structures required to
> > execute the query and gather results, plus Solr memory structures to
> > keep track of everything.  I have absolutely no idea how much memory
> > Lucene and Solr use to accomplish a query, but it's not going to be
> > small when you have 200 million documents per shard.
> >
> > Speaking of Solr memory requirements, under normal query circumstances
> > the aggregating node is going to receive at least 100K results from
> > *every* shard in the collection, which it will condense down to the
> > final result with 100K entries.  The behavior during a cursor-based
> > request may be more memory-efficient than what I have described, but I
> > am unsure whether that is the case.
> >
> > If the cursor behavior is not more efficient, then each entry in those
> > results will contain the uniqueKey value and the score.  That's going to
> > be many megabytes for every shard.  If there are 80 shards, it would
> > probably be over a gigabyte for one request.
> >
> > Thanks,
> > Shawn
> >
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Erick Erickson <er...@gmail.com>.
You're missing the point of my comment. Since they already are
docValues, you can use the /export functionality to get the results
back as a _stream_ and avoid all of the overhead of the aggregator
node doing a merge sort and all of that.

You'll have to do this from SolrJ, but see CloudSolrStream. You can
see examples of its usage in StreamingTest.java.

this should
1> complete much, much faster. The design goal is 400K rows/second but YMMV
2> use vastly less memory on your Solr instances.
3> only require _one_ query
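
A rough sketch of that CloudSolrStream usage against the SolrJ 5.x API (the ZooKeeper
address, collection, and field names below are made up, and /export only returns and
sorts on docValues fields):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class ExportStreamSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical ZK ensemble and collection -- adjust for your cluster.
    String zkHost = "zk1:2181,zk2:2181/solr";

    Map<String, String> props = new HashMap<>();
    props.put("q", "*:*");               // your query
    props.put("fl", "id,fieldA,fieldB"); // docValues fields only
    props.put("sort", "id asc");         // /export requires a sort on docValues fields
    props.put("qt", "/export");          // stream every match, no paging

    CloudSolrStream stream = new CloudSolrStream(zkHost, "myCollection", props);
    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {
          break;
        }
        // one document at a time -- nothing like a 100K-row page is buffered
        String id = tuple.getString("id");
      }
    } finally {
      stream.close();
    }
  }
}

Each tuple is consumed and discarded immediately, which is why the memory profile stays
much flatter than when building 100K-row pages on an aggregator node.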

Best,
Erick

On Wed, Apr 12, 2017 at 7:36 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 4/12/2017 5:19 PM, Chetas Joshi wrote:
>> I am getting back 100K results per page.
>> The fields have docValues enabled and I am getting sorted results based on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes).
>>
>> I have a solr Cloud of 80 nodes. There will be one shard that will get top 100K docs from each shard and apply merge sort. So, the max memory usage of any shard could be 40 bytes * 100K * 80 = 320 MB. Why would heap memory usage shoot up from 8 GB to 17 GB?
>
> From what I understand, Java overhead for a String object is 56 bytes
> above the actual byte size of the string itself.  And each character in
> the string will be two bytes -- Java uses UTF-16 for character
> representation internally.  If I'm right about these numbers, it means
> that each of those id values will take 120 bytes -- and that doesn't
> include the size the actual response (xml, json, etc).
>
> I don't know what the overhead for a long is, but you can be sure that
> it's going to take more than eight bytes total memory usage for each one.
>
> Then there is overhead for all the Lucene memory structures required to
> execute the query and gather results, plus Solr memory structures to
> keep track of everything.  I have absolutely no idea how much memory
> Lucene and Solr use to accomplish a query, but it's not going to be
> small when you have 200 million documents per shard.
>
> Speaking of Solr memory requirements, under normal query circumstances
> the aggregating node is going to receive at least 100K results from
> *every* shard in the collection, which it will condense down to the
> final result with 100K entries.  The behavior during a cursor-based
> request may be more memory-efficient than what I have described, but I
> am unsure whether that is the case.
>
> If the cursor behavior is not more efficient, then each entry in those
> results will contain the uniqueKey value and the score.  That's going to
> be many megabytes for every shard.  If there are 80 shards, it would
> probably be over a gigabyte for one request.
>
> Thanks,
> Shawn
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/12/2017 5:19 PM, Chetas Joshi wrote:
> I am getting back 100K results per page.
> The fields have docValues enabled and I am getting sorted results based on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes).
>
> I have a solr Cloud of 80 nodes. There will be one shard that will get top 100K docs from each shard and apply merge sort. So, the max memory usage of any shard could be 40 bytes * 100K * 80 = 320 MB. Why would heap memory usage shoot up from 8 GB to 17 GB?

From what I understand, Java overhead for a String object is 56 bytes
above the actual byte size of the string itself.  And each character in
the string will be two bytes -- Java uses UTF-16 for character
representation internally.  If I'm right about these numbers, it means
> that each of those id values will take 120 bytes -- and that doesn't
> include the size of the actual response (XML, JSON, etc.).

I don't know what the overhead for a long is, but you can be sure that
it's going to take more than eight bytes total memory usage for each one.

Then there is overhead for all the Lucene memory structures required to
execute the query and gather results, plus Solr memory structures to
keep track of everything.  I have absolutely no idea how much memory
Lucene and Solr use to accomplish a query, but it's not going to be
small when you have 200 million documents per shard.

Speaking of Solr memory requirements, under normal query circumstances
the aggregating node is going to receive at least 100K results from
*every* shard in the collection, which it will condense down to the
final result with 100K entries.  The behavior during a cursor-based
request may be more memory-efficient than what I have described, but I
am unsure whether that is the case.

If the cursor behavior is not more efficient, then each entry in those
results will contain the uniqueKey value and the score.  That's going to
be many megabytes for every shard.  If there are 80 shards, it would
probably be over a gigabyte for one request.
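
A rough back-of-the-envelope using those assumptions (estimates only, not measurements):

  ~56 bytes String overhead + 32 chars * 2 bytes   ~= 120 bytes per id
  100,000 entries * ~120 bytes                     ~= 12 MB per shard
  80 shards merged on the aggregating node         ~= 960 MB per page

That is before counting the long values, scores, Lucene/Solr bookkeeping, or the
serialized response itself.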

Thanks,
Shawn


Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Chetas Joshi <ch...@gmail.com>.
I am getting back 100K results per page.
The fields have docValues enabled and I am getting sorted results based on
"id" and 2 more fields (String: 32 bytes and Long: 8 bytes).

I have a SolrCloud of 80 nodes. There will be one node (the aggregator) that
gets the top 100K docs from each shard and applies a merge sort. So, the max
memory usage of that node could be 40 bytes * 100K * 80 = 320 MB. Why would
heap memory usage shoot up from 8 GB to 17 GB?

Thanks!

On Wed, Apr 12, 2017 at 1:32 PM, Erick Erickson <er...@gmail.com>
wrote:

> Oh my. Returning 100K rows per request is usually poor practice.
> One hopes these are very tiny docs.
>
> But this may well be an "XY" problem. What kinds of information
> are you returning in your docs and could they all be docValues
> types? In which case you would be waaay far ahead by using
> the various Streaming options.
>
> Best,
> Erick
>
> On Wed, Apr 12, 2017 at 12:59 PM, Chetas Joshi <ch...@gmail.com>
> wrote:
> > I am running a query that returns 10 MM docs in total and the number of
> > rows per page is 100K.
> >
> > On Wed, Apr 12, 2017 at 12:53 PM, Mikhail Khludnev <gg...@gmail.com>
> wrote:
> >
> >> And what is the rows parameter?
> >>
> >> 12 апр. 2017 г. 21:32 пользователь "Chetas Joshi" <
> chetas.joshi@gmail.com>
> >> написал:
> >>
> >> > Thanks for your response Shawn and Wunder.
> >> >
> >> > Hi Shawn,
> >> >
> >> > Here is the system config:
> >> >
> >> > Total system memory = 512 GB
> >> > each server handles two 500 MB cores
> >> > Number of solr docs per 500 MB core = 200 MM
> >> >
> >> > The average heap usage is around 4-6 GB. When the read starts using
> the
> >> > Cursor approach, the heap usage starts increasing with the base of the
> >> > sawtooth at 8 GB and then shooting up to 17 GB. Even after the full
> GC,
> >> the
> >> > heap usage remains around 15 GB and then it comes down to 8 GB.
> >> >
> >> > With 100K docs, the requirement will be in MBs so it is strange it is
> >> > jumping from 8 GB to 17 GB while preparing the sorted response.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > On Tue, Apr 11, 2017 at 8:48 PM, Walter Underwood <
> wunder@wunderwood.org
> >> >
> >> > wrote:
> >> >
> >> > > JVM version? We’re running v8 update 121 with the G1 collector and
> it
> >> is
> >> > > working really well. We also have an 8GB heap.
> >> > >
> >> > > Graph your heap usage. You’ll see a sawtooth shape, where it grows,
> >> then
> >> > > there is a major GC. The maximum of the base of the sawtooth is the
> >> > working
> >> > > set of heap that your Solr installation needs. Set the heap to that
> >> > value,
> >> > > plus a gigabyte or so. We run with a 2GB eden (new space) because so
> >> much
> >> > > of Solr’s allocations have a lifetime of one request. So, the base
> of
> >> the
> >> > > sawtooth, plus a gigabyte breathing room, plus two more for eden.
> That
> >> > > should work.
> >> > >
> >> > > I don’t set all the ratios and stuff. When were running CMS, I set a
> >> size
> >> > > for the heap and a size for the new space. Done. With G1, I don’t
> even
> >> > get
> >> > > that fussy.
> >> > >
> >> > > wunder
> >> > > Walter Underwood
> >> > > wunder@wunderwood.org
> >> > > http://observer.wunderwood.org/  (my blog)
> >> > >
> >> > >
> >> > > > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org>
> >> wrote:
> >> > > >
> >> > > > On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> >> > > >> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold
> >> > collection
> >> > > >> with number of shards = 80 and replication Factor=2
> >> > > >>
> >> > > >> Sold JVM heap size = 20 GB
> >> > > >> solr.hdfs.blockcache.enabled = true
> >> > > >> solr.hdfs.blockcache.direct.memory.allocation = true
> >> > > >> MaxDirectMemorySize = 25 GB
> >> > > >>
> >> > > >> I am querying a solr collection with index size = 500 MB per
> core.
> >> > > >
> >> > > > I see that you and I have traded messages before on the list.
> >> > > >
> >> > > > How much total system memory is there per server?  How many of
> these
> >> > > > 500MB cores are on each server?  How many docs are in a 500MB
> core?
> >> > The
> >> > > > answers to these questions may affect the other advice that I give
> >> you.
> >> > > >
> >> > > >> The off-heap (25 GB) is huge so that it can load the entire
> index.
> >> > > >
> >> > > > I still know very little about how HDFS handles caching and
> memory.
> >> > You
> >> > > > want to be sure that as much data as possible from your indexes is
> >> > > > sitting in local memory on the server.
> >> > > >
> >> > > >> Using cursor approach (number of rows = 100K), I read 2 fields
> >> (Total
> >> > 40
> >> > > >> bytes per solr doc) from the Solr docs that satisfy the query.
> The
> >> > docs
> >> > > are sorted by "id" and then by those 2 fields.
> >> > > >>
> >> > > >> I am not able to understand why the heap memory is getting full
> and
> >> > Full
> >> > > >> GCs are consecutively running with long GC pauses (> 30
> seconds). I
> >> am
> >> > > >> using CMS GC.
> >> > > >
> >> > > > A 20GB heap is quite large.  Do you actually need it to be that
> >> large?
> >> > > > If you graph JVM heap usage over a long period of time, what are
> the
> >> > low
> >> > > > points in the graph?
> >> > > >
> >> > > > A result containing 100K docs is going to be pretty large, even
> with
> >> a
> >> > > > limited number of fields.  It is likely to be several megabytes.
> It
> >> > > > will need to be entirely built in the heap memory before it is
> sent
> >> to
> >> > > > the client -- both as Lucene data structures (which will probably
> be
> >> > > > much larger than the actual response due to Java overhead) and as
> the
> >> > > > actual response format.  Then it will be garbage as soon as the
> >> > response
> >> > > > is done.  Repeat this enough times, and you're going to go through
> >> even
> >> > > > a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB
> heap
> >> > > > are slow.
> >> > > >
> >> > > > You could try switching to G1, as long as you realize that you're
> >> going
> >> > > > against advice from Lucene experts.... but honestly, I do not
> expect
> >> > > > this to really help, because you would probably still need full
> GCs
> >> due
> >> > > > to the rate that garbage is being created.  If you do try it, I
> would
> >> > > > strongly recommend the latest Java 8, either Oracle or OpenJDK.
> >> Here's
> >> > > > my wiki page where I discuss this:
> >> > > >
> >> > > > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_
> >> > > First.29_Collector
> >> > > >
> >> > > > Reducing the heap size (which may not be possible -- need to know
> the
> >> > > > answer to the question about memory graphing) and reducing the
> number
> >> > of
> >> > > > rows per query are the only quick solutions I can think of.
> >> > > >
> >> > > > Thanks,
> >> > > > Shawn
> >> > > >
> >> > >
> >> > >
> >> >
> >>
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Erick Erickson <er...@gmail.com>.
Oh my. Returning 100K rows per request is usually poor practice.
One hopes these are very tiny docs.

But this may well be an "XY" problem. What kinds of information
are you returning in your docs and could they all be docValues
types? In which case you would be waaay far ahead by using
the various Streaming options.

Best,
Erick

On Wed, Apr 12, 2017 at 12:59 PM, Chetas Joshi <ch...@gmail.com> wrote:
> I am running a query that returns 10 MM docs in total and the number of
> rows per page is 100K.
>
> On Wed, Apr 12, 2017 at 12:53 PM, Mikhail Khludnev <gg...@gmail.com> wrote:
>
>> And what is the rows parameter?
>>
>> 12 апр. 2017 г. 21:32 пользователь "Chetas Joshi" <ch...@gmail.com>
>> написал:
>>
>> > Thanks for your response Shawn and Wunder.
>> >
>> > Hi Shawn,
>> >
>> > Here is the system config:
>> >
>> > Total system memory = 512 GB
>> > each server handles two 500 MB cores
>> > Number of solr docs per 500 MB core = 200 MM
>> >
>> > The average heap usage is around 4-6 GB. When the read starts using the
>> > Cursor approach, the heap usage starts increasing with the base of the
>> > sawtooth at 8 GB and then shooting up to 17 GB. Even after the full GC,
>> the
>> > heap usage remains around 15 GB and then it comes down to 8 GB.
>> >
>> > With 100K docs, the requirement will be in MBs so it is strange it is
>> > jumping from 8 GB to 17 GB while preparing the sorted response.
>> >
>> > Thanks!
>> >
>> >
>> >
>> > On Tue, Apr 11, 2017 at 8:48 PM, Walter Underwood <wunder@wunderwood.org
>> >
>> > wrote:
>> >
>> > > JVM version? We’re running v8 update 121 with the G1 collector and it
>> is
>> > > working really well. We also have an 8GB heap.
>> > >
>> > > Graph your heap usage. You’ll see a sawtooth shape, where it grows,
>> then
>> > > there is a major GC. The maximum of the base of the sawtooth is the
>> > working
>> > > set of heap that your Solr installation needs. Set the heap to that
>> > value,
>> > > plus a gigabyte or so. We run with a 2GB eden (new space) because so
>> much
>> > > of Solr’s allocations have a lifetime of one request. So, the base of
>> the
>> > > sawtooth, plus a gigabyte breathing room, plus two more for eden. That
>> > > should work.
>> > >
>> > > I don’t set all the ratios and stuff. When were running CMS, I set a
>> size
>> > > for the heap and a size for the new space. Done. With G1, I don’t even
>> > get
>> > > that fussy.
>> > >
>> > > wunder
>> > > Walter Underwood
>> > > wunder@wunderwood.org
>> > > http://observer.wunderwood.org/  (my blog)
>> > >
>> > >
>> > > > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org>
>> wrote:
>> > > >
>> > > > On 4/11/2017 2:56 PM, Chetas Joshi wrote:
>> > > >> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold
>> > collection
>> > > >> with number of shards = 80 and replication Factor=2
>> > > >>
>> > > >> Sold JVM heap size = 20 GB
>> > > >> solr.hdfs.blockcache.enabled = true
>> > > >> solr.hdfs.blockcache.direct.memory.allocation = true
>> > > >> MaxDirectMemorySize = 25 GB
>> > > >>
>> > > >> I am querying a solr collection with index size = 500 MB per core.
>> > > >
>> > > > I see that you and I have traded messages before on the list.
>> > > >
>> > > > How much total system memory is there per server?  How many of these
>> > > > 500MB cores are on each server?  How many docs are in a 500MB core?
>> > The
>> > > > answers to these questions may affect the other advice that I give
>> you.
>> > > >
>> > > >> The off-heap (25 GB) is huge so that it can load the entire index.
>> > > >
>> > > > I still know very little about how HDFS handles caching and memory.
>> > You
>> > > > want to be sure that as much data as possible from your indexes is
>> > > > sitting in local memory on the server.
>> > > >
>> > > >> Using cursor approach (number of rows = 100K), I read 2 fields
>> (Total
>> > 40
>> > > >> bytes per solr doc) from the Solr docs that satisfy the query. The
>> > docs
>> > > are sorted by "id" and then by those 2 fields.
>> > > >>
>> > > >> I am not able to understand why the heap memory is getting full and
>> > Full
>> > > >> GCs are consecutively running with long GC pauses (> 30 seconds). I
>> am
>> > > >> using CMS GC.
>> > > >
>> > > > A 20GB heap is quite large.  Do you actually need it to be that
>> large?
>> > > > If you graph JVM heap usage over a long period of time, what are the
>> > low
>> > > > points in the graph?
>> > > >
>> > > > A result containing 100K docs is going to be pretty large, even with
>> a
>> > > > limited number of fields.  It is likely to be several megabytes.  It
>> > > > will need to be entirely built in the heap memory before it is sent
>> to
>> > > > the client -- both as Lucene data structures (which will probably be
>> > > > much larger than the actual response due to Java overhead) and as the
>> > > > actual response format.  Then it will be garbage as soon as the
>> > response
>> > > > is done.  Repeat this enough times, and you're going to go through
>> even
>> > > > a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
>> > > > are slow.
>> > > >
>> > > > You could try switching to G1, as long as you realize that you're
>> going
>> > > > against advice from Lucene experts.... but honestly, I do not expect
>> > > > this to really help, because you would probably still need full GCs
>> due
>> > > > to the rate that garbage is being created.  If you do try it, I would
>> > > > strongly recommend the latest Java 8, either Oracle or OpenJDK.
>> Here's
>> > > > my wiki page where I discuss this:
>> > > >
>> > > > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_
>> > > First.29_Collector
>> > > >
>> > > > Reducing the heap size (which may not be possible -- need to know the
>> > > > answer to the question about memory graphing) and reducing the number
>> > of
>> > > > rows per query are the only quick solutions I can think of.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> > >
>> > >
>> >
>>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Chetas Joshi <ch...@gmail.com>.
I am running a query that returns 10 MM docs in total and the number of
rows per page is 100K.

On Wed, Apr 12, 2017 at 12:53 PM, Mikhail Khludnev <gg...@gmail.com> wrote:

> And what is the rows parameter?
>
> 12 апр. 2017 г. 21:32 пользователь "Chetas Joshi" <ch...@gmail.com>
> написал:
>
> > Thanks for your response Shawn and Wunder.
> >
> > Hi Shawn,
> >
> > Here is the system config:
> >
> > Total system memory = 512 GB
> > each server handles two 500 MB cores
> > Number of solr docs per 500 MB core = 200 MM
> >
> > The average heap usage is around 4-6 GB. When the read starts using the
> > Cursor approach, the heap usage starts increasing with the base of the
> > sawtooth at 8 GB and then shooting up to 17 GB. Even after the full GC,
> the
> > heap usage remains around 15 GB and then it comes down to 8 GB.
> >
> > With 100K docs, the requirement will be in MBs so it is strange it is
> > jumping from 8 GB to 17 GB while preparing the sorted response.
> >
> > Thanks!
> >
> >
> >
> > On Tue, Apr 11, 2017 at 8:48 PM, Walter Underwood <wunder@wunderwood.org
> >
> > wrote:
> >
> > > JVM version? We’re running v8 update 121 with the G1 collector and it
> is
> > > working really well. We also have an 8GB heap.
> > >
> > > Graph your heap usage. You’ll see a sawtooth shape, where it grows,
> then
> > > there is a major GC. The maximum of the base of the sawtooth is the
> > working
> > > set of heap that your Solr installation needs. Set the heap to that
> > value,
> > > plus a gigabyte or so. We run with a 2GB eden (new space) because so
> much
> > > of Solr’s allocations have a lifetime of one request. So, the base of
> the
> > > sawtooth, plus a gigabyte breathing room, plus two more for eden. That
> > > should work.
> > >
> > > I don’t set all the ratios and stuff. When were running CMS, I set a
> size
> > > for the heap and a size for the new space. Done. With G1, I don’t even
> > get
> > > that fussy.
> > >
> > > wunder
> > > Walter Underwood
> > > wunder@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> > > > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org>
> wrote:
> > > >
> > > > On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> > > >> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold
> > collection
> > > >> with number of shards = 80 and replication Factor=2
> > > >>
> > > >> Sold JVM heap size = 20 GB
> > > >> solr.hdfs.blockcache.enabled = true
> > > >> solr.hdfs.blockcache.direct.memory.allocation = true
> > > >> MaxDirectMemorySize = 25 GB
> > > >>
> > > >> I am querying a solr collection with index size = 500 MB per core.
> > > >
> > > > I see that you and I have traded messages before on the list.
> > > >
> > > > How much total system memory is there per server?  How many of these
> > > > 500MB cores are on each server?  How many docs are in a 500MB core?
> > The
> > > > answers to these questions may affect the other advice that I give
> you.
> > > >
> > > >> The off-heap (25 GB) is huge so that it can load the entire index.
> > > >
> > > > I still know very little about how HDFS handles caching and memory.
> > You
> > > > want to be sure that as much data as possible from your indexes is
> > > > sitting in local memory on the server.
> > > >
> > > >> Using cursor approach (number of rows = 100K), I read 2 fields
> (Total
> > 40
> > > >> bytes per solr doc) from the Solr docs that satisfy the query. The
> > docs
> > > are sorted by "id" and then by those 2 fields.
> > > >>
> > > >> I am not able to understand why the heap memory is getting full and
> > Full
> > > >> GCs are consecutively running with long GC pauses (> 30 seconds). I
> am
> > > >> using CMS GC.
> > > >
> > > > A 20GB heap is quite large.  Do you actually need it to be that
> large?
> > > > If you graph JVM heap usage over a long period of time, what are the
> > low
> > > > points in the graph?
> > > >
> > > > A result containing 100K docs is going to be pretty large, even with
> a
> > > > limited number of fields.  It is likely to be several megabytes.  It
> > > > will need to be entirely built in the heap memory before it is sent
> to
> > > > the client -- both as Lucene data structures (which will probably be
> > > > much larger than the actual response due to Java overhead) and as the
> > > > actual response format.  Then it will be garbage as soon as the
> > response
> > > > is done.  Repeat this enough times, and you're going to go through
> even
> > > > a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
> > > > are slow.
> > > >
> > > > You could try switching to G1, as long as you realize that you're
> going
> > > > against advice from Lucene experts.... but honestly, I do not expect
> > > > this to really help, because you would probably still need full GCs
> due
> > > > to the rate that garbage is being created.  If you do try it, I would
> > > > strongly recommend the latest Java 8, either Oracle or OpenJDK.
> Here's
> > > > my wiki page where I discuss this:
> > > >
> > > > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_
> > > First.29_Collector
> > > >
> > > > Reducing the heap size (which may not be possible -- need to know the
> > > > answer to the question about memory graphing) and reducing the number
> > of
> > > > rows per query are the only quick solutions I can think of.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > >
> > >
> >
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Mikhail Khludnev <gg...@gmail.com>.
And what is the rows parameter?

On Apr 12, 2017 at 21:32, "Chetas Joshi" <ch...@gmail.com>
wrote:

> Thanks for your response Shawn and Wunder.
>
> Hi Shawn,
>
> Here is the system config:
>
> Total system memory = 512 GB
> each server handles two 500 MB cores
> Number of solr docs per 500 MB core = 200 MM
>
> The average heap usage is around 4-6 GB. When the read starts using the
> Cursor approach, the heap usage starts increasing with the base of the
> sawtooth at 8 GB and then shooting up to 17 GB. Even after the full GC, the
> heap usage remains around 15 GB and then it comes down to 8 GB.
>
> With 100K docs, the requirement will be in MBs so it is strange it is
> jumping from 8 GB to 17 GB while preparing the sorted response.
>
> Thanks!
>
>
>
> On Tue, Apr 11, 2017 at 8:48 PM, Walter Underwood <wu...@wunderwood.org>
> wrote:
>
> > JVM version? We’re running v8 update 121 with the G1 collector and it is
> > working really well. We also have an 8GB heap.
> >
> > Graph your heap usage. You’ll see a sawtooth shape, where it grows, then
> > there is a major GC. The maximum of the base of the sawtooth is the
> working
> > set of heap that your Solr installation needs. Set the heap to that
> value,
> > plus a gigabyte or so. We run with a 2GB eden (new space) because so much
> > of Solr’s allocations have a lifetime of one request. So, the base of the
> > sawtooth, plus a gigabyte breathing room, plus two more for eden. That
> > should work.
> >
> > I don’t set all the ratios and stuff. When were running CMS, I set a size
> > for the heap and a size for the new space. Done. With G1, I don’t even
> get
> > that fussy.
> >
> > wunder
> > Walter Underwood
> > wunder@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> > >
> > > On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> > >> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold
> collection
> > >> with number of shards = 80 and replication Factor=2
> > >>
> > >> Sold JVM heap size = 20 GB
> > >> solr.hdfs.blockcache.enabled = true
> > >> solr.hdfs.blockcache.direct.memory.allocation = true
> > >> MaxDirectMemorySize = 25 GB
> > >>
> > >> I am querying a solr collection with index size = 500 MB per core.
> > >
> > > I see that you and I have traded messages before on the list.
> > >
> > > How much total system memory is there per server?  How many of these
> > > 500MB cores are on each server?  How many docs are in a 500MB core?
> The
> > > answers to these questions may affect the other advice that I give you.
> > >
> > >> The off-heap (25 GB) is huge so that it can load the entire index.
> > >
> > > I still know very little about how HDFS handles caching and memory.
> You
> > > want to be sure that as much data as possible from your indexes is
> > > sitting in local memory on the server.
> > >
> > >> Using cursor approach (number of rows = 100K), I read 2 fields (Total
> 40
> > >> bytes per solr doc) from the Solr docs that satisfy the query. The
> docs
> > are sorted by "id" and then by those 2 fields.
> > >>
> > >> I am not able to understand why the heap memory is getting full and
> Full
> > >> GCs are consecutively running with long GC pauses (> 30 seconds). I am
> > >> using CMS GC.
> > >
> > > A 20GB heap is quite large.  Do you actually need it to be that large?
> > > If you graph JVM heap usage over a long period of time, what are the
> low
> > > points in the graph?
> > >
> > > A result containing 100K docs is going to be pretty large, even with a
> > > limited number of fields.  It is likely to be several megabytes.  It
> > > will need to be entirely built in the heap memory before it is sent to
> > > the client -- both as Lucene data structures (which will probably be
> > > much larger than the actual response due to Java overhead) and as the
> > > actual response format.  Then it will be garbage as soon as the
> response
> > > is done.  Repeat this enough times, and you're going to go through even
> > > a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
> > > are slow.
> > >
> > > You could try switching to G1, as long as you realize that you're going
> > > against advice from Lucene experts.... but honestly, I do not expect
> > > this to really help, because you would probably still need full GCs due
> > > to the rate that garbage is being created.  If you do try it, I would
> > > strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
> > > my wiki page where I discuss this:
> > >
> > > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_
> > First.29_Collector
> > >
> > > Reducing the heap size (which may not be possible -- need to know the
> > > answer to the question about memory graphing) and reducing the number
> of
> > > rows per query are the only quick solutions I can think of.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Chetas Joshi <ch...@gmail.com>.
Thanks for your response Shawn and Wunder.

Hi Shawn,

Here is the system config:

Total system memory = 512 GB
each server handles two 500 MB cores
Number of solr docs per 500 MB core = 200 MM

The average heap usage is around 4-6 GB. When the read starts using the
Cursor approach, the heap usage starts increasing with the base of the
sawtooth at 8 GB and then shooting up to 17 GB. Even after the full GC, the
heap usage remains around 15 GB and then it comes down to 8 GB.

With 100K docs, the requirement should be in the MBs, so it is strange that the heap
jumps from 8 GB to 17 GB while preparing the sorted response.

Thanks!



On Tue, Apr 11, 2017 at 8:48 PM, Walter Underwood <wu...@wunderwood.org>
wrote:

> JVM version? We’re running v8 update 121 with the G1 collector and it is
> working really well. We also have an 8GB heap.
>
> Graph your heap usage. You’ll see a sawtooth shape, where it grows, then
> there is a major GC. The maximum of the base of the sawtooth is the working
> set of heap that your Solr installation needs. Set the heap to that value,
> plus a gigabyte or so. We run with a 2GB eden (new space) because so much
> of Solr’s allocations have a lifetime of one request. So, the base of the
> sawtooth, plus a gigabyte breathing room, plus two more for eden. That
> should work.
>
> I don’t set all the ratios and stuff. When were running CMS, I set a size
> for the heap and a size for the new space. Done. With G1, I don’t even get
> that fussy.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> > On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> >> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold collection
> >> with number of shards = 80 and replication Factor=2
> >>
> >> Sold JVM heap size = 20 GB
> >> solr.hdfs.blockcache.enabled = true
> >> solr.hdfs.blockcache.direct.memory.allocation = true
> >> MaxDirectMemorySize = 25 GB
> >>
> >> I am querying a solr collection with index size = 500 MB per core.
> >
> > I see that you and I have traded messages before on the list.
> >
> > How much total system memory is there per server?  How many of these
> > 500MB cores are on each server?  How many docs are in a 500MB core?  The
> > answers to these questions may affect the other advice that I give you.
> >
> >> The off-heap (25 GB) is huge so that it can load the entire index.
> >
> > I still know very little about how HDFS handles caching and memory.  You
> > want to be sure that as much data as possible from your indexes is
> > sitting in local memory on the server.
> >
> >> Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
> >> bytes per solr doc) from the Solr docs that satisfy the query. The docs
> are sorted by "id" and then by those 2 fields.
> >>
> >> I am not able to understand why the heap memory is getting full and Full
> >> GCs are consecutively running with long GC pauses (> 30 seconds). I am
> >> using CMS GC.
> >
> > A 20GB heap is quite large.  Do you actually need it to be that large?
> > If you graph JVM heap usage over a long period of time, what are the low
> > points in the graph?
> >
> > A result containing 100K docs is going to be pretty large, even with a
> > limited number of fields.  It is likely to be several megabytes.  It
> > will need to be entirely built in the heap memory before it is sent to
> > the client -- both as Lucene data structures (which will probably be
> > much larger than the actual response due to Java overhead) and as the
> > actual response format.  Then it will be garbage as soon as the response
> > is done.  Repeat this enough times, and you're going to go through even
> > a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
> > are slow.
> >
> > You could try switching to G1, as long as you realize that you're going
> > against advice from Lucene experts.... but honestly, I do not expect
> > this to really help, because you would probably still need full GCs due
> > to the rate that garbage is being created.  If you do try it, I would
> > strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
> > my wiki page where I discuss this:
> >
> > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_
> First.29_Collector
> >
> > Reducing the heap size (which may not be possible -- need to know the
> > answer to the question about memory graphing) and reducing the number of
> > rows per query are the only quick solutions I can think of.
> >
> > Thanks,
> > Shawn
> >
>
>

Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Walter Underwood <wu...@wunderwood.org>.
JVM version? We’re running v8 update 121 with the G1 collector and it is working really well. We also have an 8GB heap.

Graph your heap usage. You’ll see a sawtooth shape, where it grows, then there is a major GC. The maximum of the base of the sawtooth is the working set of heap that your Solr installation needs. Set the heap to that value, plus a gigabyte or so. We run with a 2GB eden (new space) because so much of Solr’s allocations have a lifetime of one request. So, the base of the sawtooth, plus a gigabyte breathing room, plus two more for eden. That should work.

I don’t set all the ratios and stuff. When we were running CMS, I set a size for the heap and a size for the new space. Done. With G1, I don’t even get that fussy.
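
(As a hypothetical translation of that advice for the numbers mentioned earlier in this
thread -- a sawtooth base around 8 GB -- the whole GC config could shrink to roughly:)

# CMS variant: just a heap size and a new-space size
-Xms11g -Xmx11g     # ~8 GB working set + ~1 GB breathing room + 2 GB eden
-Xmn2g              # fixed new space (eden + survivors)
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC

# G1 variant: only the heap size
-Xms11g -Xmx11g
-XX:+UseG1GC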

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 11, 2017, at 8:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 4/11/2017 2:56 PM, Chetas Joshi wrote:
>> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold collection
>> with number of shards = 80 and replication Factor=2
>> 
>> Sold JVM heap size = 20 GB
>> solr.hdfs.blockcache.enabled = true
>> solr.hdfs.blockcache.direct.memory.allocation = true
>> MaxDirectMemorySize = 25 GB
>> 
>> I am querying a solr collection with index size = 500 MB per core.
> 
> I see that you and I have traded messages before on the list.
> 
> How much total system memory is there per server?  How many of these
> 500MB cores are on each server?  How many docs are in a 500MB core?  The
> answers to these questions may affect the other advice that I give you.
> 
>> The off-heap (25 GB) is huge so that it can load the entire index.
> 
> I still know very little about how HDFS handles caching and memory.  You
> want to be sure that as much data as possible from your indexes is
> sitting in local memory on the server.
> 
>> Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
>> bytes per solr doc) from the Solr docs that satisfy the query. The docs are sorted by "id" and then by those 2 fields.
>> 
>> I am not able to understand why the heap memory is getting full and Full
>> GCs are consecutively running with long GC pauses (> 30 seconds). I am
>> using CMS GC.
> 
> A 20GB heap is quite large.  Do you actually need it to be that large? 
> If you graph JVM heap usage over a long period of time, what are the low
> points in the graph?
> 
> A result containing 100K docs is going to be pretty large, even with a
> limited number of fields.  It is likely to be several megabytes.  It
> will need to be entirely built in the heap memory before it is sent to
> the client -- both as Lucene data structures (which will probably be
> much larger than the actual response due to Java overhead) and as the
> actual response format.  Then it will be garbage as soon as the response
> is done.  Repeat this enough times, and you're going to go through even
> a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
> are slow.
> 
> You could try switching to G1, as long as you realize that you're going
> against advice from Lucene experts.... but honestly, I do not expect
> this to really help, because you would probably still need full GCs due
> to the rate that garbage is being created.  If you do try it, I would
> strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
> my wiki page where I discuss this:
> 
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
> 
> Reducing the heap size (which may not be possible -- need to know the
> answer to the question about memory graphing) and reducing the number of
> rows per query are the only quick solutions I can think of.
> 
> Thanks,
> Shawn
> 


Re: Long GC pauses while reading Solr docs using Cursor approach

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Sold collection
> with number of shards = 80 and replication Factor=2
>
> Sold JVM heap size = 20 GB
> solr.hdfs.blockcache.enabled = true
> solr.hdfs.blockcache.direct.memory.allocation = true
> MaxDirectMemorySize = 25 GB
>
> I am querying a solr collection with index size = 500 MB per core.

I see that you and I have traded messages before on the list.

How much total system memory is there per server?  How many of these
500MB cores are on each server?  How many docs are in a 500MB core?  The
answers to these questions may affect the other advice that I give you.

> The off-heap (25 GB) is huge so that it can load the entire index.

I still know very little about how HDFS handles caching and memory.  You
want to be sure that as much data as possible from your indexes is
sitting in local memory on the server.

> Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
> bytes per solr doc) from the Solr docs that satisfy the query. The docs are sorted by "id" and then by those 2 fields.
>
> I am not able to understand why the heap memory is getting full and Full
> GCs are consecutively running with long GC pauses (> 30 seconds). I am
> using CMS GC.

A 20GB heap is quite large.  Do you actually need it to be that large? 
If you graph JVM heap usage over a long period of time, what are the low
points in the graph?

A result containing 100K docs is going to be pretty large, even with a
limited number of fields.  It is likely to be several megabytes.  It
will need to be entirely built in the heap memory before it is sent to
the client -- both as Lucene data structures (which will probably be
much larger than the actual response due to Java overhead) and as the
actual response format.  Then it will be garbage as soon as the response
is done.  Repeat this enough times, and you're going to go through even
a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
are slow.

You could try switching to G1, as long as you realize that you're going
against advice from Lucene experts.... but honestly, I do not expect
this to really help, because you would probably still need full GCs due
to the rate that garbage is being created.  If you do try it, I would
strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
my wiki page where I discuss this:

https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector

Reducing the heap size (which may not be possible -- need to know the
answer to the question about memory graphing) and reducing the number of
rows per query are the only quick solutions I can think of.

Thanks,
Shawn