Posted to solr-user@lucene.apache.org by Brian Whitman <br...@variogr.am> on 2007/02/24 00:02:05 UTC
lots of inserts very fast, out of heap or file descs
I'm trying to add lots of documents at once (hundreds of thousands)
in a loop. I don't need these docs to appear as results until I'm
done, though.
For a simple test, I call the post.sh script in a loop with the same
moderately sized xml file. This adds a 20K doc and then commits.
Repeat hundreds of thousands of times.
This works fine for a while, but eventually (only 10K docs in or so)
the Solr instance starts taking longer and longer to respond to my
<add>s (I print out the curl time; near the end it takes 10s per add)
and the web server (Resin 3.0) eventually dumps "out of heap space"
to its log (my max heap is 1GB on a 4GB machine).
I also see the "(Too many open files in system)" stack trace coming
from Lucene's SegmentReader during this test. My fs.file-max was
361990, which I bumped up to 2M, but I don't know how/why Solr/Lucene
would open that many.
My question is about best practices for this sort of "bulk add."
Since insert time is not a concern, I have some leeway. Should I
commit after every add? Should I optimize every so many commits? Is
there some reaper on a thread or timer that I should let breathe?
Re: lots of inserts very fast, out of heap or file descs
Posted by Brian Whitman <br...@variogr.am>.
On Feb 24, 2007, at 1:16 AM, Chris Hostetter wrote:
>
> Based on Brian's email, it sounds like it didn't work in *exactly* the
> same way, because it caused some file descriptor leaks (and possibly
> some memory leaks)
> Hopefully Ryan will be a rock star and spot the problem
> immediately --
Thanks for the explanation, Chris. I think I get it now, and also why I
had that line in there to begin with. I'll let the rock star take
over, but if anyone was up all night worrying whether using the default
handler on /update (putting the /xml back in solrconfig) would solve
my heap problem: it did. :)
With no other change but adding those four chars, I am able to add
400K real-world unique docs (1-50K each), one after another, with
no fdesc leaks, heap problems or slowdowns. I have autocommit on for
10s / 10K docs. Famous last words, but all is well. Thanks Chris,
Yonik, Mike & Ryan!
-Brian
Re: lots of inserts very fast, out of heap or file descs
Posted by Ryan McKinley <ry...@gmail.com>.
>
> it sounds like we may have a very bad bug in the XmlUpdateRequestHandler
>
I haven't looked at this yet, but if I understand the description, it
would have to be a problem with the SolrDispatchFilter and/or the
SolrRequestParsers.
The part that *is* exactly the same is the XmlUpdateRequestHandler.
My guess is that it is something in the SolrRequestParsers, but I
don't see anything obvious. I'll take a closer look sometime today.
ryan
Re: lots of inserts very fast, out of heap or file descs
Posted by Chris Hostetter <ho...@fucit.org>.
it sounds like we may have a very bad bug in the XmlUpdateRequestHandler
to clarify for people who may not know: the long-standing "/update" URL
has historically been handled using a custom servlet; recently some of that
code was refactored into a RequestHandler along with a new Dispatcher
for RequestHandlers that works based on path mapping -- the goal being to
allow more customizable update processing and start accepting updates in a
variety of input formats ... if XmlUpdateRequestHandler is mapped to the
name "/update" it intercepts requests to the legacy update servlet, and
should have functioned exactly the same way.
Based on Brian's email, it sounds like it didn't work in *exactly* the
same way, because it caused some file descriptor leaks (and possibly some
memory leaks)
Hopefully Ryan will be a rock star and spot the problem immediately --
but i'll try to look into it later this weekend.
: Date: Fri, 23 Feb 2007 22:33:10 -0500
: From: Brian Whitman <br...@variogr.am>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: lots of inserts very fast, out of heap or file descs
:
: On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote:
:
: >> -- it does not go down until I
: >> restart solr. This would be the cause of my too many files open
: >> problem. Turning off autocommit / not committing after every add keeps
: >> this count steady at 100-200. The files are all of type:
: > [...]
: >> Bug or feature?
: >
: > If the searchers holding these index files open are still working,
: > then this is a problem, but not exactly a bug. If not, you may have
: > hit a new bug in searcher synchronization.
:
: It doesn't look like it. I hope I'm not getting a reputation on here
: for "discovering" bugs that seem to be my own fault, you'd all laugh
: if you knew how much time I wasted before posting about it this
: evening...
:
: But I just narrowed this down to a bad line in my solrconfig.xml.
:
: The one I was using said this for some reason :
:
: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"
:
: and the trunk version says this:
:
: <requestHandler name="/update/xml"
: class="solr.XmlUpdateRequestHandler"
:
: changing my line to the trunk line fixed the fdesc problem.
:
: The confounding thing to me is that the solr install worked fine
: otherwise. I don't know what would make removing the /xml path make a
: ton of files open but everything else work OK.
:
: If you want to reproduce it:
:
: 1) Download trunk/nightly
: 2) Change line 347 of example/solr/conf/solrconfig.xml to
: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
: 3) java -jar start.jar...
: 4) Run post.sh a bunch of times on the same xml file... (in a shell
: script or whatever)
: 5) After a few seconds/minutes jetty will crash with "too many open
: files"
:
: Now, to see if this also caused my heap overflow problems. Thanks
: Mike and Yonik...
:
:
:
-Hoss
Re: lots of inserts very fast, out of heap or file descs
Posted by Brian Whitman <br...@variogr.am>.
On Feb 24, 2007, at 1:26 PM, gmail wrote:
>
> do you have a script/data that makes this happen?
all you've got to do is
apache-solr-nightly/example/exampledocs ryan$ while [ 0 -lt 1 ]; do ./post.sh hd.xml; done
with the request handler pointing to /update. Use
# lsof | grep solr | wc -l
to watch the fdescs fly.
Tested on both Mac OS X and Linux on the nightly package.
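For watching the count over time rather than running the one-liner by hand, a small sketch along these lines may help. It assumes lsof is installed and that you pass the servlet container's PID yourself; fd_count and watch_fds are hypothetical helper names, not anything shipped with Solr. Note that lsof -p also lists entries such as the working directory and memory-mapped files, so the number is best read as a trend, not an exact descriptor count.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) for tracking open files of one process.

# Print the number of lsof entries for PID $1.
# lsof prints one header line, which is skipped with NR > 1.
fd_count() {
  lsof -p "$1" 2>/dev/null | awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Print "HH:MM:SS count" for PID $1, $2 times (default 5),
# sleeping $3 seconds (default 1) between samples.
watch_fds() {
  pid=$1
  samples=${2:-5}
  interval=${3:-1}
  i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$(fd_count "$pid")"
    i=$((i + 1))
    sleep "$interval"
  done
}
```

Something like `watch_fds 32254 60 5` in a second terminal while the post.sh loop runs should show whether the count climbs steadily or levels off.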
Re: lots of inserts very fast, out of heap or file descs
Posted by gmail <ry...@squid-labs.com>.
do you have a script/data that makes this happen?
I'm on a Windows dev box - it does not get "too many open files" but
I'll figure it out.
ryan
Re: lots of inserts very fast, out of heap or file descs
Posted by Brian Whitman <br...@variogr.am>.
On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote:
>> -- it does not go down until I
>> restart solr. This would be the cause of my too many files open
>> problem. Turning off autocommit / not committing after every add keeps
>> this count steady at 100-200. The files are all of type:
> [...]
>> Bug or feature?
>
> If the searchers holding these index files open are still working,
> then this is a problem, but not exactly a bug. If not, you may have
> hit a new bug in searcher synchronization.
It doesn't look like it. I hope I'm not getting a reputation on here
for "discovering" bugs that seem to be my own fault, you'd all laugh
if you knew how much time I wasted before posting about it this
evening...
But I just narrowed this down to a bad line in my solrconfig.xml.
The one I was using said this for some reason :
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler"
and the trunk version says this:
<requestHandler name="/update/xml"
class="solr.XmlUpdateRequestHandler"
changing my line to the trunk line fixed the fdesc problem.
The confounding thing to me is that the solr install worked fine
otherwise. I don't know what would make removing the /xml path make a
ton of files open but everything else work OK.
If you want to reproduce it:
1) Download trunk/nightly
2) Change line 347 of example/solr/conf/solrconfig.xml to
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
3) java -jar start.jar...
4) Run post.sh a bunch of times on the same xml file... (in a shell
script or whatever)
5) After a few seconds/minutes jetty will crash with "too many open
files"
Now, to see if this also caused my heap overflow problems. Thanks
Mike and Yonik...
Re: lots of inserts very fast, out of heap or file descs
Posted by Yonik Seeley <yo...@apache.org>.
On 2/23/07, Brian Whitman <br...@variogr.am> wrote:
> >
> > Try not committing so often (perhaps until you are done).
> > Don't use post.sh, or modify it to remove the commit.
> >
>
> OK, I modified it to not commit after and I also realized I had
> SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb
> on when to commit / optimize?
There is a map entry (UniqueKey->Integer) per document added/deleted,
and that's really the only in-memory state that is kept. So you
should be fine adding at least 100K docs between commits.
> If either autoCommit is on, or I commit after every add, the number
> of open file descriptors from Lucene goes up way high and does not go
> back down.
Do you have any warming configured?
Too many searchers trying to initialize fieldcache entries can blow
out the memory, causing most of the CPU to be consumed by the garbage
collector.
> I just ran
>
> # lsof | grep solr | wc -l
>
> after adding 1000 docs and got 125265 open fdescs. If I stop adding
> docs this count does not go down
Is there detectable activity going on (like CPU usage)?
Does the admin page list all of these open searchers? (Check the
statistics page under "CORE".)
> -- it does not go down until I
> restart solr. This would be the cause of my too many files open
> problem. Turning off autocommit / not committing after every add keeps
> this count steady at 100-200. The files are all of type:
[...]
> Bug or feature?
If the searchers holding these index files open are still working,
then this is a problem, but not exactly a bug. If not, you may have
hit a new bug in searcher synchronization.
A workaround is to limit the number of warming searchers (see
maxWarmingSearchers in solrconfig.xml)
-Yonik
Re: lots of inserts very fast, out of heap or file descs
Posted by Brian Whitman <br...@variogr.am>.
>
> Try not committing so often (perhaps until you are done).
> Don't use post.sh, or modify it to remove the commit.
>
OK, I modified it to not commit after and I also realized I had
SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb
on when to commit / optimize?
> Part of the problem might be repeatedly inserting the same doc over
> and over again-- that is an odd pattern of deletes, which might be
> triggering a bad performance case on the lucene or solr side (I'm
> assuming the doc has a unique key).
Could be, but the same issue occurs on 400K unique docs. I made the
post.sh test case to see what exactly the issue is.
I've discovered something wonky with commits and open files...
If either autoCommit is on, or I commit after every add, the number
of open file descriptors from Lucene goes up way high and does not go
back down. I just ran
# lsof | grep solr | wc -l
after adding 1000 docs and got 125265 open fdescs. If I stop adding
docs this count does not go down -- it does not go down until I
restart solr. This would be the cause of my too many files open
problem. Turning off autocommit / not committing after every add keeps
this count steady at 100-200. The files are all of type:
java 32254 bwhitman 3654r REG 8,1 12 15417767 /home/bwhitman/solr/working/data/index/_86u.nrm (deleted)
java 32254 bwhitman 3655r REG 8,1 42024 15417813 /home/bwhitman/solr/working/data/index/_86t.fdt (deleted)
java 32254 bwhitman 3656r REG 8,1 16 15417814 /home/bwhitman/solr/working/data/index/_86t.fdx (deleted)
java 32254 bwhitman 3657r REG 8,1 27420 15417817 /home/bwhitman/solr/working/data/index/_86t.tis (deleted)
java 32254 bwhitman 3658r REG 8,1 368 15417818 /home/bwhitman/solr/working/data/index/_86t.tii (deleted)
java 32254 bwhitman 3659r REG 8,1 7652 15417815 /home/bwhitman/solr/working/data/index/_86t.frq (deleted)
java 32254 bwhitman 3660r REG 8,1 24860 15417816 /home/bwhitman/solr/working/data/index/_86t.prx (deleted)
java 32254 bwhitman 3661r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3662r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3663r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3664r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3665r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3666r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3667r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3668r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
java 32254 bwhitman 3669r REG 8,1 210120000 15417669 /home/bwhitman/solr/working/data/index/_85y.fdt
java 32254 bwhitman 3670r REG 8,1 80000 15417670 /home/bwhitman/solr/working/data/index/_85y.fdx
java 32254 bwhitman 3671r REG 8,1 46736 15417673 /home/bwhitman/solr/working/data/index/_85y.tis
java 32254 bwhitman 3672r REG 8,1 503 15417674 /home/bwhitman/solr/working/data/index/_85y.tii
java 32254 bwhitman 3673r REG 8,1 43936224 15417671 /home/bwhitman/solr/working/data/index/_85y.frq
java 32254 bwhitman 3674r REG 8,1 124300000 15417672 /home/bwhitman/solr/working/data/index/_85y.prx
java 32254 bwhitman 3675r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3676r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3677r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3678r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3679r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3680r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3681r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
java 32254 bwhitman 3682r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
Bug or feature?
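To put a number on how many of those handles point at segment files that have already been deleted, a rough filter over the lsof output could look like the sketch below. It assumes the index lives under a path containing /data/index/ (as in the listing above) and that lsof marks unlinked files with a trailing "(deleted)", which is how it appears on Linux; summarize_index_fds is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and report how many
# entries are index files, and how many of those are already deleted.
summarize_index_fds() {
  awk '
    /\/data\/index\// {
      total++
      # lsof appends "(deleted)" as the last field for unlinked files
      if ($NF == "(deleted)") deleted++
    }
    END { printf "open=%d deleted=%d\n", total, deleted }
  '
}
```

It would be fed with something like `lsof -p 32254 | summarize_index_fds`. A deleted count that keeps growing between commits would suggest old readers are not being closed.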
Re: lots of inserts very fast, out of heap or file descs
Posted by Mike Klaas <mi...@gmail.com>.
On 2/23/07, Yonik Seeley <yo...@apache.org> wrote:
> On 2/23/07, Brian Whitman <br...@variogr.am> wrote:
> > I'm trying to add lots of documents at once (hundreds of thousands)
> > in a loop. I don't need these docs to appear as results until I'm
> > done, though.
> >
> > For a simple test, I call the post.sh script in a loop with the same
> > moderately sized xml file. This adds a 20K doc and then commits.
> > Repeat hundreds of thousands of times.
>
> Try not committing so often (perhaps until you are done).
> Don't use post.sh, or modify it to remove the commit.
Part of the problem might be repeatedly inserting the same doc over
and over again-- that is an odd pattern of deletes, which might be
triggering a bad performance case on the lucene or solr side (I'm
assuming the doc has a unique key).
-Mike
Re: lots of inserts very fast, out of heap or file descs
Posted by Yonik Seeley <yo...@apache.org>.
On 2/23/07, Brian Whitman <br...@variogr.am> wrote:
> I'm trying to add lots of documents at once (hundreds of thousands)
> in a loop. I don't need these docs to appear as results until I'm
> done, though.
>
> For a simple test, I call the post.sh script in a loop with the same
> moderately sized xml file. This adds a 20K doc and then commits.
> Repeat hundreds of thousands of times.
Try not committing so often (perhaps until you are done).
Don't use post.sh, or modify it to remove the commit.
-Yonik
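A minimal sketch of that advice, for anyone who wants a starting point: post every file without committing, then send a single <commit/> at the end. The URL is an assumption (the example server's usual default; override SOLR_URL for your setup), and post_body and bulk_add are hypothetical names. The curl flags are the same kind post.sh uses, so this is post.sh with the per-file commit moved out of the loop, not a different mechanism.

```shell
#!/bin/sh
# Assumed default update URL; set SOLR_URL to match your install.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"

# POST one body to the update URL.
# "$1" is either "@file" (curl reads the file) or a literal XML string.
post_body() {
  curl -s "$SOLR_URL" \
    -H 'Content-Type: text/xml; charset=utf-8' \
    --data-binary "$1"
}

# Add every file given as an argument, then commit exactly once.
# curl's exit status only reflects transport failures here, so the
# XML responses still need to be checked for errors.
bulk_add() {
  for f in "$@"; do
    post_body "@$f" || return 1   # add only, no commit
  done
  post_body '<commit/>'           # single commit after all adds
}
```

Usage would be along the lines of `bulk_add docs/*.xml`. For very large runs, committing every so many files instead of only once is an easy variation.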