Posted to solr-user@lucene.apache.org by samarth s <sa...@gmail.com> on 2014/02/06 17:56:36 UTC

Commit Issue in Solr 3.4

Hi,

I have been using Solr version 3.4 in a project for more than a year. It is
only now that I have started facing a weird problem of never-ending,
back-to-back commit cycles. Looking at the InfoStream logs, I can see that
as soon as one commit cycle is done, another one spawns almost immediately.
My writer processes, which use SolrJ as the client, do not get a chance to
write even a single document between these commits. I have waited for hours
to let these commits run their course and finish naturally, but they don't.
Finally, I had to restart the Solr server. After that, my writers could get
away with writing a few thousand docs, after which the same infinite commit
cycles start again. I could not find any related JIRA issue on this.

Size of index = 260 GB
Total docs = 100 million
Usual writing speed = 50K per hour
autoCommit-maxDocs = 400,000
autoCommit-maxTime = 1,500,000 ms (25 mins)
merge factor = 10

Machine memory = 30 GB, Xmx = 20 GB
Server - Jetty
OS - CentOS 6
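
For reference, a sketch of how these settings typically map onto a Solr 3.x
solrconfig.xml (element names are from the stock 3.x example config; this is
an illustration, not a copy of my actual file):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>400000</maxDocs>   <!-- commit after 400,000 added docs -->
        <maxTime>1500000</maxTime>  <!-- or after 25 minutes (value in ms) -->
      </autoCommit>
    </updateHandler>

    <indexDefaults>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>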


Please let me know if any other details are needed on the setup. Any help
is highly appreciated. Thanks.

-- 
Regards,
Samarth

Re: Commit Issue in Solr 3.4

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/8/2014 10:22 AM, Shawn Heisey wrote:
> Can you share your solrconfig.xml file?  I may be able to confirm a
> couple of things I suspect, and depending on what's there, may be able
> to offer some ideas to help a little bit.  It's best if you use a file
> sharing site like dropbox - the list doesn't deal with attachments very
> well.  Sometimes they work, but most of the time they don't.

One additional idea:  Unless you know through actual testing that you
really do need a 20GB heap, try reducing it.  You have 100 million
documents, so perhaps you really do need a heap that big.

In addition to the solrconfig.xml, I would also be interested in knowing
what memory/GC tuning options you've used to start your java instance,
and I'd like to see a sampling of typical and worst-case query
parameters or URL strings.  I'd need to see all parameters and know
which request handler you used, so I can cross-reference with the config.

Thanks,
Shawn


Re: Commit Issue in Solr 3.4

Posted by Roman Chyla <ro...@gmail.com>.
Thanks for the links. I think it would be worth getting more detailed info,
because it could be the performance threshold, or it could be something else
(such as an updated Java version, or something else loosely related to RAM,
e.g. what is held in memory before the commit, what is cached, or leaked
custom query objects holding on to some big object). By the way, if I study
the graph, I see that there *are* warning signs. That's the point of
testing/measuring after all, IMHO.

--roman
On 8 Feb 2014 13:51, "Shawn Heisey" <so...@elyograg.org> wrote:

> On 2/8/2014 11:02 AM, Roman Chyla wrote:
> > I would be curious what the cause is. Samarth says that it worked for
> > over a year (and supposedly docs were being added all the time). Did the
> > index grow considerably in the last period? Perhaps he could attach
> > VisualVM while it is in the 'black hole' state to see what is actually
> > going on. I don't know if the instance is also used for searching, but
> > if it's only indexing, maybe just shorter commit intervals would
> > alleviate the problem. To add context, our indexer is configured with a
> > 16 GB heap, on a machine with 64 GB of RAM, but a busy one, so sometimes
> > there is no cache to spare for the OS. The index is 300 GB (of which
> > 140 GB is stored values), and it is working just 'fine' - 30 docs/s on
> > average, but our docs are large (0.5 MB on avg) and fetched from two
> > databases, so the slowness is outside Solr. I didn't see big
> > improvements with a bigger heap, but I don't remember exact numbers.
> > This is Solr 4.
>
> For this discussion, refer to this image, or the Google Books link where
> I originally found it:
>
> https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png
>
> http://books.google.com/books?id=dUiNGYCiWg0C&pg=PA33#v=onepage&q&f=false
>
> Computer systems have had a long history of performance curves like
> this.  Everything goes really well, possibly for a really long time,
> until you cross some threshold where a resource cannot keep up with the
> demands being placed on it.  That threshold is usually something you
> can't calculate in advance.  Once it is crossed, even by a tiny amount,
> performance drops VERY quickly.
>
> I do recommend that people closely analyze their GC characteristics, but
> jconsole, jvisualvm, and other tools like that are actually not very
> good at this task.  You can only get summary info -- how many GCs
> occurred and total amount of time spent doing GC, often with a useless
> granularity -- jconsole reports the time in minutes on a system that has
> been running for any length of time.
>
> I *was* having occasional super-long GC pauses (15 seconds or more), but
> I did not know it, even though I had religiously looked at GC info in
> jconsole and jstat.  I discovered the problem indirectly, and had to
> find additional tools to quantify it.  After discovering it, I tuned my
> garbage collection and have not had the problem since.
>
> If you have detailed GC logs enabled, this is a good free tool for
> offline analysis:
>
> https://code.google.com/p/gclogviewer/
>
> I have also had good results with this free tool, but it requires a
> little more work to set up:
>
> http://www.azulsystems.com/jHiccup
>
> Azul Systems has an alternate Java implementation for Linux that
> virtually eliminates GC pauses, but it isn't free.  I do not have any
> information about how much it costs.  We found our own solution, but for
> those who can throw money at the problem, I've heard good things about it.
>
> Thanks,
> Shawn
>
>

Re: Commit Issue in Solr 3.4

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/8/2014 11:02 AM, Roman Chyla wrote:
> I would be curious what the cause is. Samarth says that it worked for over
> a year (and supposedly docs were being added all the time). Did the index
> grow considerably in the last period? Perhaps he could attach VisualVM
> while it is in the 'black hole' state to see what is actually going on. I
> don't know if the instance is also used for searching, but if it's only
> indexing, maybe just shorter commit intervals would alleviate the problem.
> To add context, our indexer is configured with a 16 GB heap, on a machine
> with 64 GB of RAM, but a busy one, so sometimes there is no cache to spare
> for the OS. The index is 300 GB (of which 140 GB is stored values), and it
> is working just 'fine' - 30 docs/s on average, but our docs are large
> (0.5 MB on avg) and fetched from two databases, so the slowness is outside
> Solr. I didn't see big improvements with a bigger heap, but I don't
> remember exact numbers. This is Solr 4.

For this discussion, refer to this image, or the Google Books link where
I originally found it:

https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png

http://books.google.com/books?id=dUiNGYCiWg0C&pg=PA33#v=onepage&q&f=false

Computer systems have had a long history of performance curves like
this.  Everything goes really well, possibly for a really long time,
until you cross some threshold where a resource cannot keep up with the
demands being placed on it.  That threshold is usually something you
can't calculate in advance.  Once it is crossed, even by a tiny amount,
performance drops VERY quickly.

I do recommend that people closely analyze their GC characteristics, but
jconsole, jvisualvm, and other tools like that are actually not very
good at this task.  You can only get summary info -- how many GCs
occurred and total amount of time spent doing GC, often with a useless
granularity -- jconsole reports the time in minutes on a system that has
been running for any length of time.

I *was* having occasional super-long GC pauses (15 seconds or more), but
I did not know it, even though I had religiously looked at GC info in
jconsole and jstat.  I discovered the problem indirectly, and had to
find additional tools to quantify it.  After discovering it, I tuned my
garbage collection and have not had the problem since.
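
Purely as an illustration of the kind of tuning involved (not necessarily
the exact options I ended up with), CMS-based settings along these lines
were a common starting point for large Solr heaps on the Java 6/7 JVMs of
that era:

    java -Xmx20g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:CMSInitiatingOccupancyFraction=75 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -jar start.jar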

If you have detailed GC logs enabled, this is a good free tool for
offline analysis:

https://code.google.com/p/gclogviewer/
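
(The detailed logs it reads can be produced with the standard HotSpot
logging flags, added to whatever command starts your Jetty/Solr instance;
the log path here is only an example:)

    -verbose:gc -Xloggc:/var/log/solr/gc.log
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime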

I have also had good results with this free tool, but it requires a
little more work to set up:

http://www.azulsystems.com/jHiccup

Azul Systems has an alternate Java implementation for Linux that
virtually eliminates GC pauses, but it isn't free.  I do not have any
information about how much it costs.  We found our own solution, but for
those who can throw money at the problem, I've heard good things about it.

Thanks,
Shawn


Re: Commit Issue in Solr 3.4

Posted by Roman Chyla <ro...@gmail.com>.
I would be curious what the cause is. Samarth says that it worked for over
a year (and supposedly docs were being added all the time). Did the index
grow considerably in the last period? Perhaps he could attach VisualVM
while it is in the 'black hole' state to see what is actually going on. I
don't know if the instance is also used for searching, but if it's only
indexing, maybe just shorter commit intervals would alleviate the problem.
To add context, our indexer is configured with a 16 GB heap, on a machine
with 64 GB of RAM, but a busy one, so sometimes there is no cache to spare
for the OS. The index is 300 GB (of which 140 GB is stored values), and it
is working just 'fine' - 30 docs/s on average, but our docs are large
(0.5 MB on avg) and fetched from two databases, so the slowness is outside
Solr. I didn't see big improvements with a bigger heap, but I don't
remember exact numbers. This is Solr 4.

Roman
On 8 Feb 2014 12:23, "Shawn Heisey" <so...@elyograg.org> wrote:

> On 2/8/2014 1:40 AM, samarth s wrote:
> > Yes, it is Amazon EC2 indeed.
> >
> > To expand on that:
> > This Solr deployment was working fine, handling the same load, on a
> > 34 GB instance with EBS storage for quite some time. To reduce the time
> > taken by a commit, I shifted it to a 30 GB SSD instance. It performed
> > better in writes and commits for sure. But since last week I have been
> > facing this problem of infinite back-to-back commits. Not being able to
> > resolve this, I have finally switched back to a 34 GB machine with EBS
> > storage, and now the commits are working fine, though slow.
>
> The extra 4GB of RAM is almost guaranteed to be the difference.  If your
> index continues to grow, you'll probably be having problems very soon
> even with 34GB of RAM.  If you could put it on a box with 128 to 256GB
> of RAM, you'd likely see your performance increase dramatically.
>
> Can you share your solrconfig.xml file?  I may be able to confirm a
> couple of things I suspect, and depending on what's there, may be able
> to offer some ideas to help a little bit.  It's best if you use a file
> sharing site like dropbox - the list doesn't deal with attachments very
> well.  Sometimes they work, but most of the time they don't.
>
> I will reiterate my main point -- you really need a LOT more memory.
> Another option is to shard your index across multiple servers.  This
> doesn't actually reduce the TOTAL memory requirement, but it is
> sometimes easier to get management to agree to buy more servers than it
> is to get them to agree to buy really large servers.  It's a paradox
> that doesn't make any sense to me, but I've seen it over and over.
>
> Thanks,
> Shawn
>
>

Re: Commit Issue in Solr 3.4

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/8/2014 1:40 AM, samarth s wrote:
> Yes, it is Amazon EC2 indeed.
> 
> To expand on that:
> This Solr deployment was working fine, handling the same load, on a 34 GB
> instance with EBS storage for quite some time. To reduce the time taken by
> a commit, I shifted it to a 30 GB SSD instance. It performed better in
> writes and commits for sure. But since last week I have been facing this
> problem of infinite back-to-back commits. Not being able to resolve this,
> I have finally switched back to a 34 GB machine with EBS storage, and now
> the commits are working fine, though slow.

The extra 4GB of RAM is almost guaranteed to be the difference.  If your
index continues to grow, you'll probably be having problems very soon
even with 34GB of RAM.  If you could put it on a box with 128 to 256GB
of RAM, you'd likely see your performance increase dramatically.

Can you share your solrconfig.xml file?  I may be able to confirm a
couple of things I suspect, and depending on what's there, may be able
to offer some ideas to help a little bit.  It's best if you use a file
sharing site like dropbox - the list doesn't deal with attachments very
well.  Sometimes they work, but most of the time they don't.

I will reiterate my main point -- you really need a LOT more memory.
Another option is to shard your index across multiple servers.  This
doesn't actually reduce the TOTAL memory requirement, but it is
sometimes easier to get management to agree to buy more servers than it
is to get them to agree to buy really large servers.  It's a paradox
that doesn't make any sense to me, but I've seen it over and over.

Thanks,
Shawn


Re: Commit Issue in Solr 3.4

Posted by samarth s <sa...@gmail.com>.
Yes, it is Amazon EC2 indeed.

To expand on that:
This Solr deployment was working fine, handling the same load, on a 34 GB
instance with EBS storage for quite some time. To reduce the time taken by a
commit, I shifted it to a 30 GB SSD instance. It performed better in writes
and commits for sure. But since last week I have been facing this problem of
infinite back-to-back commits. Not being able to resolve this, I have
finally switched back to a 34 GB machine with EBS storage, and now the
commits are working fine, though slow.

Any thoughts?
On 6 Feb 2014 23:00, "Shawn Heisey" <so...@elyograg.org> wrote:

> On 2/6/2014 9:56 AM, samarth s wrote:
> > Size of index = 260 GB
> > Total docs = 100 million
> > Usual writing speed = 50K per hour
> > autoCommit-maxDocs = 400,000
> > autoCommit-maxTime = 1,500,000 ms (25 mins)
> > merge factor = 10
> >
> > Machine memory = 30 GB, Xmx = 20 GB
> > Server - Jetty
> > OS - CentOS 6
>
> With 30GB of RAM (is it Amazon EC2, by chance?) and a 20GB heap, you
> have about 10GB of RAM left for caching your Solr index.  If that server
> has all 260GB of index, I am really surprised that you have only been
> having problems for a short time.  I would have expected problems from
> day one.  Even if it only has half or one quarter of the index, there is
> still a major discrepancy in RAM vs. index size.
>
> You either need more memory or you need to reduce the size of your
> index.  The size of the indexed portion generally has more of an impact
> on performance than the size of the stored portion, but they do both
> have an impact, especially on indexing and committing.  With regular
> disks, it's best to have at least 50% of your index size available to
> the OS disk cache, but 100% is better.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> If you are already using SSD, you might think there can't be
> memory-related performance problems ... but you still need a pretty
> significant chunk of disk cache.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#SSD
>
> Thanks,
> Shawn
>
>

Re: Commit Issue in Solr 3.4

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/6/2014 9:56 AM, samarth s wrote:
> Size of index = 260 GB
> Total docs = 100 million
> Usual writing speed = 50K per hour
> autoCommit-maxDocs = 400,000
> autoCommit-maxTime = 1,500,000 ms (25 mins)
> merge factor = 10
> 
> Machine memory = 30 GB, Xmx = 20 GB
> Server - Jetty
> OS - CentOS 6

With 30GB of RAM (is it Amazon EC2, by chance?) and a 20GB heap, you
have about 10GB of RAM left for caching your Solr index.  If that server
has all 260GB of index, I am really surprised that you have only been
having problems for a short time.  I would have expected problems from
day one.  Even if it only has half or one quarter of the index, there is
still a major discrepancy in RAM vs. index size.

You either need more memory or you need to reduce the size of your
index.  The size of the indexed portion generally has more of an impact
on performance than the size of the stored portion, but they do both
have an impact, especially on indexing and committing.  With regular
disks, it's best to have at least 50% of your index size available to
the OS disk cache, but 100% is better.
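
For your numbers, that rule of thumb works out to roughly 130GB of disk
cache for the 260GB index (260GB * 0.5), and ideally the full 260GB -- while
a 30GB machine running a 20GB heap leaves only about 10GB for the OS to
cache with.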

http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

If you are already using SSD, you might think there can't be
memory-related performance problems ... but you still need a pretty
significant chunk of disk cache.

https://wiki.apache.org/solr/SolrPerformanceProblems#SSD

Thanks,
Shawn