Posted to solr-user@lucene.apache.org by Frederico Azeiteiro <Fr...@cision.com> on 2010/01/12 14:03:55 UTC

Problem comitting on 40GB index

Hi all,

I started working with Solr about a month ago, and everything was
running well, both indexing and searching documents.

I have a 40GB index with about 10,000,000 documents. I index 3k docs
every 10 minutes and commit after each insert.
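
For context, my indexing loop is roughly the pattern below (a SolrJ
sketch only; the URL and field names are placeholders, not the real
client code):

    // SolrJ (Solr 1.4) sketch: add a batch of documents, then commit.
    // The server URL and field names are placeholders.
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            for (int i = 0; i < 3000; i++) {          // ~3k docs per batch
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "article-" + i);
                doc.addField("text", "article body " + i);
                server.add(doc);
            }
            server.commit();                          // commit after the batch
        }
    }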

Since yesterday, I can't commit any articles to the index. I can still
search fine, and I can index documents without committing. But when I
start the commit it takes a long time and eats all of the available
disk space left (60GB). The commit eventually fails with a full disk
and I have to restart Solr to get the 60GB returned to the system.

Before this, the commit was taking a few seconds to complete.

Can someone help me debug the problem? Where should I start? Should I
copy the index to another machine with more free space and try the
commit there? Should I try an optimize?

Log for the last commit I tried:

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)
(Then, after a long time...)
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: No space left on device

I'm using Ubuntu 9.04 and Solr 1.4.0.

Thanks in advance,

Frederico

RE: Problem comitting on 40GB index

Posted by Sven Maurmann <sv...@kippdata.de>.
Hi!

Garbage collection is handled by the underlying JVM. You can pass
-XX:+PrintGCDetails as an argument to your JVM in order to collect
details of the garbage collection. If you also add the parameter
-XX:+PrintGCTimeStamps you get timestamps for each garbage
collection.
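
A sketch of how the flags might be added when launching the stock Solr
1.4 example with Jetty (the start.jar layout, log file name and heap
size below are only assumptions about your setup):

    java -Xmx1024m \
         -XX:+PrintGCDetails \
         -XX:+PrintGCTimeStamps \
         -Xloggc:gc.log \
         -jar start.jar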

For further information you may want to refer to the paper

<http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf>

which points you to a few other utilities related to GC.

Best,

Sven Maurmann

--On Wednesday, 13 January 2010 18:03 +0000 Frederico Azeiteiro 
<Fr...@cision.com> wrote:

> The hanging didn't happen again since yesterday. I never run out of space
> again. This is still a dev environment, so the number of searches is very
> low. Maybe I'm just lucky...
>
> Where can I see the garbage collection info?
>
> -----Original Message-----
> From: Marc Des Garets [mailto:marc.desgarets@192.com]
> Sent: quarta-feira, 13 de Janeiro de 2010 17:20
> To: solr-user@lucene.apache.org
> Subject: RE: Problem comitting on 40GB index
>
> Just curious, have you checked if the hanging you are experiencing is not
> garbage collection related?
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 13 January 2010 13:33
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> That's my understanding...... But fortunately disk space is cheap <G>....
>
>
> On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro <
> Frederico.Azeiteiro@cision.com> wrote:
>
>> Sorry, my bad... I replied to a current mailing list message only
>> changing the subject... Didn't know about this " Hijacking" problem.
>> Will not happen again.
>>
>> Just for close this issue, if I understand correctly, for an index of
>> 40G, I will need, for running an optimize:
>> - 40G if all activity on index is stopped
>> - 80G if index is being searched...)
>> - 120G if index is being searched and if a commit is performed.
>>
>> Is this correct?
>>
>> Thanks.
>> Frederico
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: terça-feira, 12 de Janeiro de 2010 19:18
>> To: solr-user@lucene.apache.org
>> Subject: Re: Problem comitting on 40GB index
>>
>> Huh?
>>
>> On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
>> <ho...@fucit.org>wrote:
>>
>> >
>> > : Subject: Problem comitting on 40GB index
>> > : In-Reply-To: <
>> > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
>> >
>> > http://people.apache.org/~hossman/#threadhijack
>> > Thread Hijacking on Mailing Lists
>> >
>> > When starting a new discussion on a mailing list, please do not reply
>> > to an existing message, instead start a fresh email.  Even if you
>> > change the subject line of your email, other mail headers still track
>> > which thread you replied to and your question is "hidden" in that
>> > thread and gets less attention.   It makes following discussions in
>> > the mailing list archives particularly difficult.
>> > See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
>> >
>> >
>> >
>> > -Hoss
>> >
>> >
>>

RE: Problem comitting on 40GB index

Posted by Frederico Azeiteiro <Fr...@cision.com>.
The hanging hasn't happened again since yesterday, and I haven't run out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky...

Where can I see the garbage collection info?

-----Original Message----- 
From: Marc Des Garets [mailto:marc.desgarets@192.com] 
Sent: Wednesday, 13 January 2010 17:20
To: solr-user@lucene.apache.org
Subject: RE: Problem comitting on 40GB index

Just curious, have you checked if the hanging you are experiencing is not garbage collection related?

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

That's my understanding...... But fortunately disk space is cheap <G>....


On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Sorry, my bad... I replied to a current mailing list message only changing
> the subject... Didn't know about this " Hijacking" problem. Will not happen
> again.
>
> Just for close this issue, if I understand correctly, for an index of 40G,
> I will need, for running an optimize:
> - 40G if all activity on index is stopped
> - 80G if index is being searched...)
> - 120G if index is being searched and if a commit is performed.
>
> Is this correct?
>
> Thanks.
> Frederico
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 19:18
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> Huh?
>
> On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
> <ho...@fucit.org>wrote:
>
> >
> > : Subject: Problem comitting on 40GB index
> > : In-Reply-To: <
> > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
> >
> > http://people.apache.org/~hossman/#threadhijack
> > Thread Hijacking on Mailing Lists
> >
> > When starting a new discussion on a mailing list, please do not reply to
> > an existing message, instead start a fresh email.  Even if you change the
> > subject line of your email, other mail headers still track which thread
> > you replied to and your question is "hidden" in that thread and gets less
> > attention.   It makes following discussions in the mailing list archives
> > particularly difficult.
> > See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
> >
> >
> >
> > -Hoss
> >
> >
>

RE: Problem comitting on 40GB index

Posted by Marc Des Garets <ma...@192.com>.
Just curious, have you checked if the hanging you are experiencing is not garbage collection related?

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

That's my understanding...... But fortunately disk space is cheap <G>....


On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Sorry, my bad... I replied to a current mailing list message only changing
> the subject... Didn't know about this " Hijacking" problem. Will not happen
> again.
>
> Just for close this issue, if I understand correctly, for an index of 40G,
> I will need, for running an optimize:
> - 40G if all activity on index is stopped
> - 80G if index is being searched...)
> - 120G if index is being searched and if a commit is performed.
>
> Is this correct?
>
> Thanks.
> Frederico
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 19:18
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> Huh?
>
> On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
> <ho...@fucit.org>wrote:
>
> >
> > : Subject: Problem comitting on 40GB index
> > : In-Reply-To: <
> > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
> >
> > http://people.apache.org/~hossman/#threadhijack
> > Thread Hijacking on Mailing Lists
> >
> > When starting a new discussion on a mailing list, please do not reply to
> > an existing message, instead start a fresh email.  Even if you change the
> > subject line of your email, other mail headers still track which thread
> > you replied to and your question is "hidden" in that thread and gets less
> > attention.   It makes following discussions in the mailing list archives
> > particularly difficult.
> > See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
> >
> >
> >
> > -Hoss
> >
> >
>

Re: Problem comitting on 40GB index

Posted by Erick Erickson <er...@gmail.com>.
That's my understanding...... But fortunately disk space is cheap <G>....


On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Sorry, my bad... I replied to a current mailing list message only changing
> the subject... Didn't know about this " Hijacking" problem. Will not happen
> again.
>
> Just for close this issue, if I understand correctly, for an index of 40G,
> I will need, for running an optimize:
> - 40G if all activity on index is stopped
> - 80G if index is being searched...)
> - 120G if index is being searched and if a commit is performed.
>
> Is this correct?
>
> Thanks.
> Frederico
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 19:18
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> Huh?
>
> On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
> <ho...@fucit.org>wrote:
>
> >
> > : Subject: Problem comitting on 40GB index
> > : In-Reply-To: <
> > 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
> >
> > http://people.apache.org/~hossman/#threadhijack
> > Thread Hijacking on Mailing Lists
> >
> > When starting a new discussion on a mailing list, please do not reply to
> > an existing message, instead start a fresh email.  Even if you change the
> > subject line of your email, other mail headers still track which thread
> > you replied to and your question is "hidden" in that thread and gets less
> > attention.   It makes following discussions in the mailing list archives
> > particularly difficult.
> > See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
> >
> >
> >
> > -Hoss
> >
> >
>

RE: Problem comitting on 40GB index

Posted by Frederico Azeiteiro <Fr...@cision.com>.
Sorry, my bad... I replied to an existing mailing list message, only changing the subject... I didn't know about this "hijacking" problem. It will not happen again.

Just to close this issue: if I understand correctly, for an index of 40G, running an optimize will need:
- 40G if all activity on the index is stopped
- 80G if the index is being searched
- 120G if the index is being searched and a commit is performed

Is this correct?

Thanks.
Frederico
-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, 12 January 2010 19:18
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

Huh?

On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : Subject: Problem comitting on 40GB index
> : In-Reply-To: <
> 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
>
> http://people.apache.org/~hossman/#threadhijack
> Thread Hijacking on Mailing Lists
>
> When starting a new discussion on a mailing list, please do not reply to
> an existing message, instead start a fresh email.  Even if you change the
> subject line of your email, other mail headers still track which thread
> you replied to and your question is "hidden" in that thread and gets less
> attention.   It makes following discussions in the mailing list archives
> particularly difficult.
> See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
>
>
>
> -Hoss
>
>

Re: Problem comitting on 40GB index

Posted by Erick Erickson <er...@gmail.com>.
Huh?

On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : Subject: Problem comitting on 40GB index
> : In-Reply-To: <
> 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98ac6c@mail.gmail.com>
>
> http://people.apache.org/~hossman/#threadhijack
> Thread Hijacking on Mailing Lists
>
> When starting a new discussion on a mailing list, please do not reply to
> an existing message, instead start a fresh email.  Even if you change the
> subject line of your email, other mail headers still track which thread
> you replied to and your question is "hidden" in that thread and gets less
> attention.   It makes following discussions in the mailing list archives
> particularly difficult.
> See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
>
>
>
> -Hoss
>
>

Re: Problem comitting on 40GB index

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Problem comitting on 40GB index
: In-Reply-To: <7a...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss


Re: Problem comitting on 40GB index

Posted by Erick Erickson <er...@gmail.com>.
You'll be able to get some valuable info by monitoring your free space on
disk.

If this occurs again, it'd help if you posted your Solr
configuration and told us about any warmups you're doing...

Of course, there are always gremlins...
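
Something as simple as this, run alongside the indexer, is usually
enough to see when the space goes and how big the index directory gets
(the paths are assumptions about your layout):

    # watch free space and the size of the index directory every minute
    watch -n 60 'df -h /var/solr; du -sh /var/solr/data/index'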

On Tue, Jan 12, 2010 at 12:36 PM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> I restarted the solr and stopped all searches. After that, the commit() was
> normal (2 secs) and it's been working for 3h without problems (indexing and
> a few searches too)... I haven't done any optimize yet, mainly because I had
> no deletes on the index and the performance is ok, so no need to optimize I
> think..
>
> I had tried this procedure a few times in the morning and the commit always
> hanged so.. I have no explanation for it start working suddenly..
> I'm making a commit every 2m (because I need the results updated on
> searches), so propably when I have more searches at the same time the commit
> will hang again right?
>
> Sorry for the newbie questions and thanks for your help and explanation
> Erik.
>
> BR,
> Frederico
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 15:15
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> Rebooting the machine certainly closes the searchers, but
> depending upon how you shut it down there may be stale files....
> After reboot (but before you start SOLR), how much space
> is on your disk? If it's 40G, you have no stale files....
>
> Yes, IR is IndexReader, which is a searcher.
>
> I'll have to leave it to others if you don't have stale files
> hanging around, although if you're optimizing while
> searchers are running, you'll use up to 3X the index size...
>
> Otherwise I'll have to leave it to others for additional insights....
>
> Best
> Erick
>
> On Tue, Jan 12, 2010 at 9:22 AM, Frederico Azeiteiro <
> Frederico.Azeiteiro@cision.com> wrote:
>
> > Hi Erik,
> >
> > I'm a newbie to solr... By IR, you mean searcher? Is there a place where
> I
> > can check the open searchers? And rebooting the machine shouldn't closed
> > that searchers?
> >
> > Thanks,
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: terça-feira, 12 de Janeiro de 2010 13:54
> > To: solr-user@lucene.apache.org
> > Subject: Re: Problem comitting on 40GB index
> >
> > There are several possibilities:
> >
> > 1> you have some process holding open your indexes, probably
> >     other searchers. You *probably* are OK just committing
> >     new changes if there is exactly *one* searcher keeping
> >     your index open. If you have some process whereby
> >     you periodically open a new search but you fail to close
> >     the old one, then you'll use up an extra 40G for every
> >     version of your index held open by your processes. That's
> >    confusing... I'm saying that if you open any number of IRs,
> >    you'll have 40G consumed. Then if you add
> >    some more documents and open *another* IR,  you'll have
> >    another 40G consumed. They'll stay around until you close
> >    your readers.
> >
> > 2> If you optimize, there can be up to 3X the index size being
> >    consumed if you also have a previous reader opened.
> >
> > So I suspect that sometime recently you've opened another
> > IR.....
> >
> > HTH
> > Erick
> >
> >
> >
> > On Tue, Jan 12, 2010 at 8:03 AM, Frederico Azeiteiro <
> > Frederico.Azeiteiro@cision.com> wrote:
> >
> > > Hi all,
> > >
> > > I started working with solr about 1 month ago, and everything was
> > > running well both indexing as searching documents.
> > >
> > > I have a 40GB index with about 10 000 000 documents available. I index
> > > 3k docs for each 10m and commit after each insert.
> > >
> > > Since yesterday, I can't commit no articles to index. I manage to
> search
> > > ok, and index documents without commiting. But when I start the commit
> > > is takes a long time and eats all of the available disk space
> > > left(60GB). The commit eventually stops with full disk and I have to
> > > restart SOLR and get the 60GB returned to system.
> > >
> > > Before this, the commit was taking a few seconds to complete.
> > >
> > > Can someone help to debug the problem? Where should I start? Should I
> > > try to copy the index to other machine with more free space and try to
> > > commit? Should I try an optimize?
> > >
> > > Log for the last commit I tried:
> > >
> > > INFO: start
> > >
> commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=f
> > > alse)
> > > (Then, after a long time...)
> > > Exception in thread "Lucene Merge Thread #0"
> > > org.apache.lucene.index.MergePolicy$MergeException:
> java.io.IOException:
> > > No space left on device
> > >        at
> > >
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(Co
> > > ncurrentMergeScheduler.java:351)
> > >        at
> > >
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurr
> > > entMergeScheduler.java:315)
> > > Caused by: java.io.IOException: No space left on device
> > >
> > > I'm using Ubuntu 9.04 and Solr 1.4.0.
> > >
> > > Thanks in advance,
> > >
> > > Frederico
> > >
> >
>

RE: Problem comitting on 40GB index

Posted by Frederico Azeiteiro <Fr...@cision.com>.
I restarted Solr and stopped all searches. After that, the commit() was normal (2 secs) and it's been working for 3h without problems (indexing and a few searches too)... I haven't done an optimize yet, mainly because I have no deletes on the index and the performance is OK, so no need to optimize, I think...

I had tried this procedure a few times in the morning and the commit always hung, so I have no explanation for why it suddenly started working...
I'm making a commit every 2 minutes (because I need the results updated in searches), so probably when I have more searches at the same time the commit will hang again, right?

Sorry for the newbie questions, and thanks for your help and explanation, Erick.

BR, 
Frederico
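
(Aside: a fixed commit interval like this can also be configured on the
server side with the autoCommit block in solrconfig.xml; a sketch of the
stock Solr 1.4 syntax, with purely illustrative values:)

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- illustrative values: commit every 2 minutes or every 10k docs -->
      <autoCommit>
        <maxDocs>10000</maxDocs>
        <maxTime>120000</maxTime>
      </autoCommit>
    </updateHandler>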

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, 12 January 2010 15:15
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

Rebooting the machine certainly closes the searchers, but
depending upon how you shut it down there may be stale files....
After reboot (but before you start SOLR), how much space
is on your disk? If it's 40G, you have no stale files....

Yes, IR is IndexReader, which is a searcher.

I'll have to leave it to others if you don't have stale files
hanging around, although if you're optimizing while
searchers are running, you'll use up to 3X the index size...

Otherwise I'll have to leave it to others for additional insights....

Best
Erick

On Tue, Jan 12, 2010 at 9:22 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Hi Erik,
>
> I'm a newbie to solr... By IR, you mean searcher? Is there a place where I
> can check the open searchers? And rebooting the machine shouldn't closed
> that searchers?
>
> Thanks,
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 13:54
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> There are several possibilities:
>
> 1> you have some process holding open your indexes, probably
>     other searchers. You *probably* are OK just committing
>     new changes if there is exactly *one* searcher keeping
>     your index open. If you have some process whereby
>     you periodically open a new search but you fail to close
>     the old one, then you'll use up an extra 40G for every
>     version of your index held open by your processes. That's
>    confusing... I'm saying that if you open any number of IRs,
>    you'll have 40G consumed. Then if you add
>    some more documents and open *another* IR,  you'll have
>    another 40G consumed. They'll stay around until you close
>    your readers.
>
> 2> If you optimize, there can be up to 3X the index size being
>    consumed if you also have a previous reader opened.
>
> So I suspect that sometime recently you've opened another
> IR.....
>
> HTH
> Erick
>
>
>
> On Tue, Jan 12, 2010 at 8:03 AM, Frederico Azeiteiro <
> Frederico.Azeiteiro@cision.com> wrote:
>
> > Hi all,
> >
> > I started working with solr about 1 month ago, and everything was
> > running well both indexing as searching documents.
> >
> > I have a 40GB index with about 10 000 000 documents available. I index
> > 3k docs for each 10m and commit after each insert.
> >
> > Since yesterday, I can't commit no articles to index. I manage to search
> > ok, and index documents without commiting. But when I start the commit
> > is takes a long time and eats all of the available disk space
> > left(60GB). The commit eventually stops with full disk and I have to
> > restart SOLR and get the 60GB returned to system.
> >
> > Before this, the commit was taking a few seconds to complete.
> >
> > Can someone help to debug the problem? Where should I start? Should I
> > try to copy the index to other machine with more free space and try to
> > commit? Should I try an optimize?
> >
> > Log for the last commit I tried:
> >
> > INFO: start
> > commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=f
> > alse)
> > (Then, after a long time...)
> > Exception in thread "Lucene Merge Thread #0"
> > org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> > No space left on device
> >        at
> > org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(Co
> > ncurrentMergeScheduler.java:351)
> >        at
> > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurr
> > entMergeScheduler.java:315)
> > Caused by: java.io.IOException: No space left on device
> >
> > I'm using Ubuntu 9.04 and Solr 1.4.0.
> >
> > Thanks in advance,
> >
> > Frederico
> >
>

Re: Problem comitting on 40GB index

Posted by Erick Erickson <er...@gmail.com>.
Rebooting the machine certainly closes the searchers, but
depending upon how you shut it down there may be stale files....
After reboot (but before you start SOLR), how much space
is on your disk? If it's 40G, you have no stale files....
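
One quick way to check for stale files (the process match and paths are
assumptions about your setup) is to look for index files that have been
deleted on disk but are still pinned by open file handles in the Solr
process:

    # index files deleted on disk but still held open by Solr
    lsof -p $(pgrep -f start.jar) | grep -i deleted
    du -sh /path/to/solr/data/index
    df -h

The admin stats page (http://localhost:8983/solr/admin/stats.jsp on the
example port) also shows the currently open searcher.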

Yes, IR is IndexReader, which is a searcher.

I'll have to leave it to others if you don't have stale files
hanging around, although if you're optimizing while
searchers are running, you'll use up to 3X the index size...

Otherwise I'll have to leave it to others for additional insights....

Best
Erick

On Tue, Jan 12, 2010 at 9:22 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Hi Erik,
>
> I'm a newbie to solr... By IR, you mean searcher? Is there a place where I
> can check the open searchers? And rebooting the machine shouldn't closed
> that searchers?
>
> Thanks,
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: terça-feira, 12 de Janeiro de 2010 13:54
> To: solr-user@lucene.apache.org
> Subject: Re: Problem comitting on 40GB index
>
> There are several possibilities:
>
> 1> you have some process holding open your indexes, probably
>     other searchers. You *probably* are OK just committing
>     new changes if there is exactly *one* searcher keeping
>     your index open. If you have some process whereby
>     you periodically open a new search but you fail to close
>     the old one, then you'll use up an extra 40G for every
>     version of your index held open by your processes. That's
>    confusing... I'm saying that if you open any number of IRs,
>    you'll have 40G consumed. Then if you add
>    some more documents and open *another* IR,  you'll have
>    another 40G consumed. They'll stay around until you close
>    your readers.
>
> 2> If you optimize, there can be up to 3X the index size being
>    consumed if you also have a previous reader opened.
>
> So I suspect that sometime recently you've opened another
> IR.....
>
> HTH
> Erick
>
>
>
> On Tue, Jan 12, 2010 at 8:03 AM, Frederico Azeiteiro <
> Frederico.Azeiteiro@cision.com> wrote:
>
> > Hi all,
> >
> > I started working with solr about 1 month ago, and everything was
> > running well both indexing as searching documents.
> >
> > I have a 40GB index with about 10 000 000 documents available. I index
> > 3k docs for each 10m and commit after each insert.
> >
> > Since yesterday, I can't commit no articles to index. I manage to search
> > ok, and index documents without commiting. But when I start the commit
> > is takes a long time and eats all of the available disk space
> > left(60GB). The commit eventually stops with full disk and I have to
> > restart SOLR and get the 60GB returned to system.
> >
> > Before this, the commit was taking a few seconds to complete.
> >
> > Can someone help to debug the problem? Where should I start? Should I
> > try to copy the index to other machine with more free space and try to
> > commit? Should I try an optimize?
> >
> > Log for the last commit I tried:
> >
> > INFO: start
> > commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=f
> > alse)
> > (Then, after a long time...)
> > Exception in thread "Lucene Merge Thread #0"
> > org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> > No space left on device
> >        at
> > org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(Co
> > ncurrentMergeScheduler.java:351)
> >        at
> > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurr
> > entMergeScheduler.java:315)
> > Caused by: java.io.IOException: No space left on device
> >
> > I'm using Ubuntu 9.04 and Solr 1.4.0.
> >
> > Thanks in advance,
> >
> > Frederico
> >
>

RE: Problem comitting on 40GB index

Posted by Frederico Azeiteiro <Fr...@cision.com>.
Hi Erick,

I'm a newbie to Solr... By IR, you mean a searcher? Is there a place where I can check the open searchers? And shouldn't rebooting the machine have closed those searchers?

Thanks,

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, 12 January 2010 13:54
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

There are several possibilities:

1> you have some process holding open your indexes, probably
     other searchers. You *probably* are OK just committing
     new changes if there is exactly *one* searcher keeping
     your index open. If you have some process whereby
     you periodically open a new search but you fail to close
     the old one, then you'll use up an extra 40G for every
     version of your index held open by your processes. That's
    confusing... I'm saying that if you open any number of IRs,
    you'll have 40G consumed. Then if you add
    some more documents and open *another* IR,  you'll have
    another 40G consumed. They'll stay around until you close
    your readers.

2> If you optimize, there can be up to 3X the index size being
    consumed if you also have a previous reader opened.

So I suspect that sometime recently you've opened another
IR.....

HTH
Erick



On Tue, Jan 12, 2010 at 8:03 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Hi all,
>
> I started working with solr about 1 month ago, and everything was
> running well both indexing as searching documents.
>
> I have a 40GB index with about 10 000 000 documents available. I index
> 3k docs for each 10m and commit after each insert.
>
> Since yesterday, I can't commit no articles to index. I manage to search
> ok, and index documents without commiting. But when I start the commit
> is takes a long time and eats all of the available disk space
> left(60GB). The commit eventually stops with full disk and I have to
> restart SOLR and get the 60GB returned to system.
>
> Before this, the commit was taking a few seconds to complete.
>
> Can someone help to debug the problem? Where should I start? Should I
> try to copy the index to other machine with more free space and try to
> commit? Should I try an optimize?
>
> Log for the last commit I tried:
>
> INFO: start
> commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=f
> alse)
> (Then, after a long time...)
> Exception in thread "Lucene Merge Thread #0"
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> No space left on device
>        at
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(Co
> ncurrentMergeScheduler.java:351)
>        at
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurr
> entMergeScheduler.java:315)
> Caused by: java.io.IOException: No space left on device
>
> I'm using Ubuntu 9.04 and Solr 1.4.0.
>
> Thanks in advance,
>
> Frederico
>

Re: Problem comitting on 40GB index

Posted by Erick Erickson <er...@gmail.com>.
There are several possibilities:

1> you have some process holding open your indexes, probably
     other searchers. You *probably* are OK just committing
     new changes if there is exactly *one* searcher keeping
     your index open. If you have some process whereby
     you periodically open a new search but you fail to close
     the old one, then you'll use up an extra 40G for every
     version of your index held open by your processes. That's
    confusing... I'm saying that if you open any number of IRs,
    you'll have 40G consumed. Then if you add
    some more documents and open *another* IR,  you'll have
    another 40G consumed. They'll stay around until you close
    your readers.

2> If you optimize, there can be up to 3X the index size being
    consumed if you also have a previous reader opened.

So I suspect that sometime recently you've opened another
IR.....

HTH
Erick
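
A minimal standalone Lucene sketch of the effect described in point 1>
above (Lucene 2.9, the version Solr 1.4 ships with; the path and the
document counts are made up):

    // Why an old, still-open IndexReader keeps the disk space of
    // superseded segments tied up until it is closed.
    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ReaderDiskSpaceDemo {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/tmp/demo-index"));
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            addDocs(writer, 0, 50000);                  // first batch
            writer.commit();
            IndexReader oldReader = IndexReader.open(dir, true); // point-in-time view

            addDocs(writer, 50000, 100000);             // more docs, forcing merges
            writer.optimize();                          // rewrites everything into new segments
            writer.commit();

            // oldReader still pins the pre-optimize segments: their disk space
            // is not given back to the OS until that reader is closed, so the
            // directory briefly holds roughly two copies of the data.
            IndexReader newReader = oldReader.reopen(); // sees the new segments
            if (newReader != oldReader) {
                oldReader.close();                      // old files can now be reclaimed
            }
            newReader.close();
            writer.close();
        }

        private static void addDocs(IndexWriter writer, int from, int to)
                throws Exception {
            for (int i = from; i < to; i++) {
                Document doc = new Document();
                doc.add(new Field("id", Integer.toString(i),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
        }
    }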



On Tue, Jan 12, 2010 at 8:03 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Hi all,
>
> I started working with solr about 1 month ago, and everything was
> running well both indexing as searching documents.
>
> I have a 40GB index with about 10 000 000 documents available. I index
> 3k docs for each 10m and commit after each insert.
>
> Since yesterday, I can't commit no articles to index. I manage to search
> ok, and index documents without commiting. But when I start the commit
> is takes a long time and eats all of the available disk space
> left(60GB). The commit eventually stops with full disk and I have to
> restart SOLR and get the 60GB returned to system.
>
> Before this, the commit was taking a few seconds to complete.
>
> Can someone help to debug the problem? Where should I start? Should I
> try to copy the index to other machine with more free space and try to
> commit? Should I try an optimize?
>
> Log for the last commit I tried:
>
> INFO: start
> commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=f
> alse)
> (Then, after a long time...)
> Exception in thread "Lucene Merge Thread #0"
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> No space left on device
>        at
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(Co
> ncurrentMergeScheduler.java:351)
>        at
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurr
> entMergeScheduler.java:315)
> Caused by: java.io.IOException: No space left on device
>
> I'm using Ubuntu 9.04 and Solr 1.4.0.
>
> Thanks in advance,
>
> Frederico
>