Posted to solr-user@lucene.apache.org by Joseph Hagerty <jo...@gmail.com> on 2014/01/30 23:20:27 UTC

JVM heap constraints and garbage collection

Greetings esteemed Solr-ites,

I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.

Since my average load during peak hours is becoming quite high, and since
I'm finally starting to notice a little bit of performance degradation and
intermittent errors (e.g. "Solr returned response 0" on perfectly valid
reads during load spikes), I think it's time to tune my Slave box before
things get out of control.

In particular, *I am curious how others are tuning their JVM heap
constraints (Xms, Xmx, etc.) and garbage collection (parallel or
concurrent) to meet the needs of Solr*. I am using the Sun JVM version 6,
not the fancy third-party offerings.

Some more info, FWIW:

- Average document size in my index is probably around 6k
- Using CentOS
- Master-Slave setup. Master gets all the writes, Slave gets all the read
requests. It is the *Slave* that is suffering-- the Master seems fine.
- The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
- DaemonThreads skyrocket during the aforementioned load spikes

Thanks for reading, and to the devs: thanks for an excellent product.

-- 
- Joe

RE: JVM heap constraints and garbage collection

Posted by Michael Della Bitta <mi...@appinions.com>.
> i2.xlarge looks vastly better than m2.2xlarge at about the same price, so
> I must be missing something: Is it the 120 IPs that explains why anyone
> would choose m2.2xlarge?

i2.xlarge is a relatively new instance type (December 2013). In our case,
we're partway through a yearlong reservation of m2.2xlarges and won't be up
for reconsidering that for a few months. I don't think that Amazon has ever
dropped a legacy instance type, so there's bound to be some overlap as they
roll out new ones. And I imagine someone setting up a huge memcached pool
might rather have the extra RAM over the SSD, so it still makes sense for
the m2.2xlarge to be around.

It can be kind of hard to understand how the various parameters that make
up an instance type get decided on, though. I have to consult that
ec2instances.info link all the time to make sure I'm not missing something
regarding what types we should be using.


On Feb 1, 2014 1:51 PM, "Toke Eskildsen" <te...@statsbiblioteket.dk> wrote:

> Michael Della Bitta [michael.della.bitta@appinions.com] wrote:
> > Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look
> > pretty tasty primarily because of the SSD, and I'll probably push for a
> > switch to those when our reservations run out.
>
> > http://www.ec2instances.info/
>
> i2.xlarge looks vastly better than m2.2xlarge at about the same price, so
> I must be missing something: Is it the 120 IPs that explains why anyone
> would choose m2.2xlarge?
>
> Anyhow, it is good to see that Amazon now has 11 different setups with
> SSD. The IOPS looks solid at around 40K/s (estimated) for the i2.xlarge and
> they even have TRIM (
> http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance/).
>
> - Toke Eskildsen

RE: JVM heap constraints and garbage collection

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Michael Della Bitta [michael.della.bitta@appinions.com] wrote:
> Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look
> pretty tasty primarily because of the SSD, and I'll probably push for a
> switch to those when our reservations run out.

> http://www.ec2instances.info/

i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge?

Anyhow, it is good to see that Amazon now has 11 different setups with SSD. The IOPS looks solid at around 40K/s (estimated) for the i2.xlarge and they even have TRIM ( http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance/ ). 

- Toke Eskildsen

Re: JVM heap constraints and garbage collection

Posted by Michael Della Bitta <mi...@appinions.com>.
Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look
pretty tasty primarily because of the SSD, and I'll probably push for a
switch to those when our reservations run out.

http://www.ec2instances.info/

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>
>> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>>
>
> <snip>
>
>
>  - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>>
>
> One detail that you did not provide was how much of your 7.5GB RAM you are
> allocating to the Java heap for Solr, but I actually don't think I need
> that information, because for your index size, you simply don't have
> enough. If you're sticking with Amazon, you'll want one of the instances
> with at least 30GB of RAM, and you might want to consider more memory than
> that.
>
> An ideal RAM size for Solr is equal to the size of on-disk data plus the
> heap space used by Solr and other programs.  This means that if your java
> heap for Solr is 4GB and there are no other significant programs running on
> the same server, you'd want a minimum of 34GB of RAM for an ideal setup
> with your index.  4GB of that would be for Solr itself, the remainder would
> be for the operating system to fully cache your index in the OS disk cache.
>
> Depending on your query patterns and how your schema is arranged, you
> *might* be able to get away with as little as half of your index size just
> for
> the OS disk cache, but it's better to make it big enough for the whole
> index, plus room for growth.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Many people are *shocked* when they are told this information, but if you
> think about the relative speeds of getting a chunk of data from a hard disk
> vs. getting the same information from memory, it's not all that shocking.
>
> Thanks,
> Shawn
>
>

Re: JVM heap constraints and garbage collection

Posted by Erick Erickson <er...@gmail.com>.
Be a little careful when looking at on-disk index sizes.
The *.fdt and *.fdx files are pretty irrelevant for the in-memory
requirements. They are just read to assemble the response (usually
10-20 docs). That said, you can _make_ them more relevant by
specifying very large document cache sizes.
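
For reference, the document cache Erick mentions lives in solrconfig.xml;
a typical entry looks something like the sketch below (the sizes are
illustrative, not recommendations from this thread):

```xml
<!-- solrconfig.xml: the larger this cache, the more of the stored-field
     data (*.fdt / *.fdx) is pulled onto the JVM heap, making those files
     "count" toward in-memory sizing -->
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
```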

Best,
Erick

On Fri, Jan 31, 2014 at 9:49 AM, Michael Della Bitta
<mi...@appinions.com> wrote:
> Joseph:
>
> Not so much after using some of the settings available on Shawn's Solr Wiki
> page: https://wiki.apache.org/solr/ShawnHeisey
>
> This is what we're running with right now:
>
> -Xmx6g
> -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=80
>
>
>
>
>
> On Fri, Jan 31, 2014 at 10:58 AM, Joseph Hagerty <jo...@gmail.com> wrote:
>
>> Thanks, Shawn. This information is actually not all that shocking to me.
>> It's always been in the back of my mind that I was "getting away with
>> something" in serving from the m1.large. Remarkably, however, it has served
>> me well for nearly two years; also, although the index has not always been
>> 30GB, it has always been much larger than the RAM on the box. As you
>> suggested, I can only suppose that usage patterns and the index schema have
>> in some way facilitated minimal heap usage, up to this point.
>>
>> For now, we're going to increase the heap size on the instance and see
>> where that gets us; if it still doesn't suffice for now, then we'll upgrade
>> to a more powerful instance.
>>
>> Michael, thanks for weighing in. Those i2 instances look delicious indeed.
>> Just curious -- have you struggled with garbage collection pausing at all?
>>
>>
>>
>> On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <so...@elyograg.org> wrote:
>>
>> > On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>> >
>> >> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>> >>
>> >
>> > <snip>
>> >
>> >
>> >  - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>> >>
>> >
>> > One detail that you did not provide was how much of your 7.5GB RAM you
>> are
>> > allocating to the Java heap for Solr, but I actually don't think I need
>> > that information, because for your index size, you simply don't have
>> > enough. If you're sticking with Amazon, you'll want one of the instances
>> > with at least 30GB of RAM, and you might want to consider more memory
>> than
>> > that.
>> >
>> > An ideal RAM size for Solr is equal to the size of on-disk data plus the
>> > heap space used by Solr and other programs.  This means that if your java
>> > heap for Solr is 4GB and there are no other significant programs running
>> on
>> > the same server, you'd want a minimum of 34GB of RAM for an ideal setup
>> > with your index.  4GB of that would be for Solr itself, the remainder
>> would
>> > be for the operating system to fully cache your index in the OS disk
>> cache.
>> >
>> > Depending on your query patterns and how your schema is arranged, you
>> > *might* be able to get away with as little as half of your index size
>> > just for
>> > the OS disk cache, but it's better to make it big enough for the whole
>> > index, plus room for growth.
>> >
>> > http://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Many people are *shocked* when they are told this information, but if you
>> > think about the relative speeds of getting a chunk of data from a hard
>> disk
>> > vs. getting the same information from memory, it's not all that shocking.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>>
>> --
>> - Joe
>>

Re: JVM heap constraints and garbage collection

Posted by Michael Della Bitta <mi...@appinions.com>.
Joseph:

Not so much after using some of the settings available on Shawn's Solr Wiki
page: https://wiki.apache.org/solr/ShawnHeisey

This is what we're running with right now:

-Xmx6g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=80
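
In a Tomcat deployment like the original poster's, flags like these would
typically be set via CATALINA_OPTS in a setenv.sh. A sketch, under the
assumption that pinning -Xms to match -Xmx (our addition, not something
stated in this thread) is desired to avoid heap-resize pauses:

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced automatically by catalina.sh
# -Xms equal to -Xmx keeps the heap at a fixed size from startup
CATALINA_OPTS="$CATALINA_OPTS -Xms6g -Xmx6g"
CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC"
CATALINA_OPTS="$CATALINA_OPTS -XX:CMSInitiatingOccupancyFraction=80"
export CATALINA_OPTS
```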





On Fri, Jan 31, 2014 at 10:58 AM, Joseph Hagerty <jo...@gmail.com> wrote:

> Thanks, Shawn. This information is actually not all that shocking to me.
> It's always been in the back of my mind that I was "getting away with
> something" in serving from the m1.large. Remarkably, however, it has served
> me well for nearly two years; also, although the index has not always been
> 30GB, it has always been much larger than the RAM on the box. As you
> suggested, I can only suppose that usage patterns and the index schema have
> in some way facilitated minimal heap usage, up to this point.
>
> For now, we're going to increase the heap size on the instance and see
> where that gets us; if it still doesn't suffice for now, then we'll upgrade
> to a more powerful instance.
>
> Michael, thanks for weighing in. Those i2 instances look delicious indeed.
> Just curious -- have you struggled with garbage collection pausing at all?
>
>
>
> On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <so...@elyograg.org> wrote:
>
> > On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
> >
> >> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
> >>
> >
> > <snip>
> >
> >
> >  - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
> >>
> >
> > One detail that you did not provide was how much of your 7.5GB RAM you
> are
> > allocating to the Java heap for Solr, but I actually don't think I need
> > that information, because for your index size, you simply don't have
> > enough. If you're sticking with Amazon, you'll want one of the instances
> > with at least 30GB of RAM, and you might want to consider more memory
> than
> > that.
> >
> > An ideal RAM size for Solr is equal to the size of on-disk data plus the
> > heap space used by Solr and other programs.  This means that if your java
> > heap for Solr is 4GB and there are no other significant programs running
> on
> > the same server, you'd want a minimum of 34GB of RAM for an ideal setup
> > with your index.  4GB of that would be for Solr itself, the remainder
> would
> > be for the operating system to fully cache your index in the OS disk
> cache.
> >
> > Depending on your query patterns and how your schema is arranged, you
> > *might* be able to get away with as little as half of your index size
> > just for
> > the OS disk cache, but it's better to make it big enough for the whole
> > index, plus room for growth.
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Many people are *shocked* when they are told this information, but if you
> > think about the relative speeds of getting a chunk of data from a hard
> disk
> > vs. getting the same information from memory, it's not all that shocking.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> - Joe
>

Re: JVM heap constraints and garbage collection

Posted by Joseph Hagerty <jo...@gmail.com>.
Thanks, Shawn. This information is actually not all that shocking to me.
It's always been in the back of my mind that I was "getting away with
something" in serving from the m1.large. Remarkably, however, it has served
me well for nearly two years; also, although the index has not always been
30GB, it has always been much larger than the RAM on the box. As you
suggested, I can only suppose that usage patterns and the index schema have
in some way facilitated minimal heap usage, up to this point.

For now, we're going to increase the heap size on the instance and see
where that gets us; if it still doesn't suffice for now, then we'll upgrade
to a more powerful instance.

Michael, thanks for weighing in. Those i2 instances look delicious indeed.
Just curious -- have you struggled with garbage collection pausing at all?



On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>
>> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>>
>
> <snip>
>
>
>  - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>>
>
> One detail that you did not provide was how much of your 7.5GB RAM you are
> allocating to the Java heap for Solr, but I actually don't think I need
> that information, because for your index size, you simply don't have
> enough. If you're sticking with Amazon, you'll want one of the instances
> with at least 30GB of RAM, and you might want to consider more memory than
> that.
>
> An ideal RAM size for Solr is equal to the size of on-disk data plus the
> heap space used by Solr and other programs.  This means that if your java
> heap for Solr is 4GB and there are no other significant programs running on
> the same server, you'd want a minimum of 34GB of RAM for an ideal setup
> with your index.  4GB of that would be for Solr itself, the remainder would
> be for the operating system to fully cache your index in the OS disk cache.
>
> Depending on your query patterns and how your schema is arranged, you
> *might* be able to get away with as little as half of your index size just
> for
> the OS disk cache, but it's better to make it big enough for the whole
> index, plus room for growth.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Many people are *shocked* when they are told this information, but if you
> think about the relative speeds of getting a chunk of data from a hard disk
> vs. getting the same information from memory, it's not all that shocking.
>
> Thanks,
> Shawn
>
>


-- 
- Joe

Re: JVM heap constraints and garbage collection

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.

<snip>

> - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM

One detail that you did not provide was how much of your 7.5GB RAM you 
are allocating to the Java heap for Solr, but I actually don't think I 
need that information, because for your index size, you simply don't 
have enough. If you're sticking with Amazon, you'll want one of the 
instances with at least 30GB of RAM, and you might want to consider more 
memory than that.

An ideal RAM size for Solr is equal to the size of on-disk data plus the 
heap space used by Solr and other programs.  This means that if your 
java heap for Solr is 4GB and there are no other significant programs 
running on the same server, you'd want a minimum of 34GB of RAM for an 
ideal setup with your index.  4GB of that would be for Solr itself, the 
remainder would be for the operating system to fully cache your index in 
the OS disk cache.

Depending on your query patterns and how your schema is arranged, you 
*might* be able to get away with as little as half of your index size just 
for the OS disk cache, but it's better to make it big enough for the 
whole index, plus room for growth.
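
The sizing rule above is easy to sanity-check with the numbers from this
thread (the 30GB index, plus a hypothetical 4GB Solr heap):

```python
def recommended_ram_gb(index_gb, heap_gb, other_gb=0.0, cache_fraction=1.0):
    """RAM needed = Solr heap + other programs + OS page cache for the index.

    cache_fraction=1.0 caches the whole index; 0.5 is the aggressive
    lower bound described above.
    """
    return heap_gb + other_gb + index_gb * cache_fraction

print(recommended_ram_gb(30, 4))                      # whole index cached
print(recommended_ram_gb(30, 4, cache_fraction=0.5))  # half the index cached
```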

http://wiki.apache.org/solr/SolrPerformanceProblems

Many people are *shocked* when they are told this information, but if 
you think about the relative speeds of getting a chunk of data from a 
hard disk vs. getting the same information from memory, it's not all 
that shocking.

Thanks,
Shawn