Posted to solr-user@lucene.apache.org by stockii <st...@shopgate.com> on 2010/11/29 12:07:20 UTC

Large Hdd-Space using during commit/optimize

Hello.

I have ~37 million docs that I want to index.

When I start a full-import, I import only 2 million docs at a time, for
better control over Solr and disk space/heap usage.

When I import 2 million docs and Solr starts the commit and the optimize,
my used disk space jumps into the sky. Reaction: after a Solr restart, the
used space goes back down.

Why is Solr using so much space?

Can I optimize that?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1985807.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Large Hdd-Space using during commit/optimize

Posted by Upayavira <uv...@odoko.co.uk>.

On Mon, 29 Nov 2010 03:07 -0800, "stockii" <st...@shopgate.com> wrote:
> 
> Hello.
> 
> I have ~37 million docs that I want to index.
> 
> When I start a full-import, I import only 2 million docs at a time,
> for better control over Solr and disk space/heap usage.
> 
> When I import 2 million docs and Solr starts the commit and the
> optimize, my used disk space jumps into the sky. Reaction: after a
> Solr restart, the used space goes back down.
> 
> Why is Solr using so much space?
> 
> Can I optimize that?

What do you mean "into the sky"? What percentage increase are you
seeing?

I'd expect it to double at least. I've heard it suggested that you
should have three times the usual space available for an optimise.

Remember, when your index is optimising, you'll want to keep the
original index online and available for searches, so you'll have at
least two copies of your index on disk during an optimise.
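As a back-of-the-envelope check, that rule of thumb is easy to turn into numbers. The 3x factor below is the suggested headroom from this thread, not a measured constant:

```python
def optimize_headroom(index_bytes: int, factor: float = 3.0) -> int:
    """Rule-of-thumb free disk needed to optimise an index of the given
    size: the live index, the optimised copy being written, plus slack
    for transient merge files. The 3x factor is the suggestion above,
    not a guarantee."""
    return int(index_bytes * factor)

# A 40 GiB index would want roughly 120 GiB free by this rule.
print(optimize_headroom(40 * 1024**3) / 1024**3)  # 120.0
```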

Also, it is my understanding that if you commit infrequently, you won't
need to optimise immediately. There's nothing to stop you importing your
entire corpus and then doing a single commit. That will leave you with
only one segment (or at most two: one that existed before and was empty,
and one containing all of your documents). The net result is that you
don't need to optimise at that point.

Note - I'm no solr guru, so I could be wrong with some of the above -
I'm happy to be corrected.

Upayavira

Re: Large Hdd-Space using during commit/optimize

Posted by stockii <st...@shopgate.com>.
Okay.

The query kills the database because there's no index on the modified
column ...
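For reference, adding an index on the timestamp column the delta query filters by is usually the fix. A small sketch using sqlite3 as a stand-in for the real database; the `docs` table and `modified` column names are assumptions, not the actual schema:

```python
import sqlite3

# Hypothetical schema: the delta-import query is assumed to filter on a
# `modified` timestamp column; without an index there, every delta run
# is a full table scan.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, modified TEXT)"
)
con.execute("CREATE INDEX idx_docs_modified ON docs (modified)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM docs WHERE modified > '2010-11-29'"
).fetchone()
print(plan[-1])  # the plan should mention idx_docs_modified, not a scan
```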
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1993750.html

Re: Large Hdd-Space using during commit/optimize

Posted by Erick Erickson <er...@gmail.com>.
Solr doesn't lock anything as far as I know, it just executes the
query you specify. The query you specify may well do bad things
to your database, but that's not Solr's fault. What happens if you
simply try executing the query outside Solr? Do you see the
same "locking" behavior?

You might want to consider using SolrJ along with the jdbc driver
of your choice rather than DIH if DIH is causing you grief.
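The SolrJ-plus-JDBC route boils down to streaming rows from the database yourself and pushing them to Solr in batches. A rough sketch of that loop, with sqlite3 standing in for the real database and the Solr client call left as a comment; both are illustrative assumptions, not the actual stack:

```python
import sqlite3

def fetch_batches(con, batch_size):
    """Stream rows through one cursor with fetchmany() instead of
    loading the whole table at once, so the database does long,
    cheap reads rather than one huge one."""
    cur = con.execute("SELECT id, body FROM docs ORDER BY id")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO docs VALUES (?, ?)",
                [(i, f"doc {i}") for i in range(10)])

sent = 0
for batch in fetch_batches(con, batch_size=4):
    # In a real importer, this is where the Solr client would add the
    # batch of documents (e.g. SolrJ's client.add(...)).
    sent += len(batch)
print(sent)  # 10
```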

Best
Erick

On Tue, Nov 30, 2010 at 3:36 AM, stockii <st...@shopgate.com> wrote:

>
> Aha, I see :D
>
> Hmm, I don't know. We import in 2-million-doc steps because we think
> that Solr locks our database, and we want better control of the
> import ...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1991392.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Large Hdd-Space using during commit/optimize

Posted by Upayavira <uv...@odoko.co.uk>.
I don't know who you are replying to here, but...

There's nothing to stop you doing:

 * import 2m docs
 * sleep 2 days
 * import 2m docs
 * sleep 2 days
 * repeat above until done
 * commit

There's no reason why you should commit regularly. If you need to slow
things down for your DB, do so, but that doesn't mean you need to
increase the frequency of your commits.
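That schedule is just a loop with a pause and a single trailing commit. Sketched below with placeholder callables; `import_batch` and `commit` are hypothetical hooks, not Solr APIs:

```python
import time

def run_import(batches, import_batch, commit, pause_seconds=0.0):
    """Import every batch, optionally pausing between them to go easy
    on the source database, and commit exactly once at the end."""
    for batch in batches:
        import_batch(batch)
        if pause_seconds:
            time.sleep(pause_seconds)
    commit()  # one commit -> few segments -> little need to optimise

imported, commits = [], []
run_import(
    batches=[["doc1", "doc2"], ["doc3"]],
    import_batch=imported.extend,
    commit=lambda: commits.append(True),
)
print(len(imported), len(commits))  # 3 1
```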

Upayavira



On Tue, 30 Nov 2010 00:36 -0800, "stockii" <st...@shopgate.com> wrote:
> 
> Aha, I see :D
> 
> Hmm, I don't know. We import in 2-million-doc steps because we think
> that Solr locks our database, and we want better control of the
> import ...
> -- 
> View this message in context:
> http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1991392.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

Re: Large Hdd-Space using during commit/optimize

Posted by stockii <st...@shopgate.com>.
Aha, I see :D

Hmm, I don't know. We import in 2-million-doc steps because we think that
Solr locks our database, and we want better control of the import ...
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1991392.html

Re: Large Hdd-Space using during commit/optimize

Posted by Upayavira <uv...@odoko.co.uk>.
On Mon, 29 Nov 2010 08:43 -0800, "stockii" <st...@shopgate.com> wrote:
> 
> Ah, okay. Thanks.
> 
> I didn't know that Solr copies the complete index for an optimize.
> Can I tell Solr to start an optimize, but without the copy?

No.

The copy is to keep an index available for searches while the optimise
is happening.

Also, to allow for rollback should something go wrong with the optimise.

The simplest thing is to keep your commits low (I suspect you could
ingest 35m documents with just one commit at the end).

In that case, optimisation is not required. (Optimisation is to reduce
the number of segments in your index, and segments are created by
commits. If you don't do many commits, you won't need to optimise - at
least not at the point of initial ingestion.)

Upayavira

Re: Large Hdd-Space using during commit/optimize

Posted by stockii <st...@shopgate.com>.
Ah, okay. Thanks.

I didn't know that Solr copies the complete index for an optimize. Can I
tell Solr to start an optimize, but without the copy?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1987477.html

Re: Large Hdd-Space using during commit/optimize

Posted by Erick Erickson <er...@gmail.com>.
First, don't optimize after every chunk; it just makes extra work for
your system. If you're using a 3.x or trunk build, optimizing doesn't do
much for you anyway, but if you must, optimize only after your entire
import is done.

Optimizing will pretty much copy the old index into a new set of files,
so you can expect your disk space to at least double, because Solr/Lucene
doesn't delete anything until it's sure the optimize finished
successfully. Imagine the consequence of deleting files as they were
copied to save disk space: hit a program error, a power glitch, or
Ctrl-C, and your indexes would be corrupted.
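The same write-everything-new-before-deleting-anything-old discipline appears in a minimal form below. This is an analogy for the safety property Erick describes, not Lucene's actual merge code:

```python
import os
import tempfile

def rewrite_safely(path: str, new_contents: str) -> None:
    """Write the replacement to a temporary file first; the original is
    untouched until the new copy is complete and durable, so a crash
    mid-write can never corrupt it. Disk use doubles briefly, just as
    an index optimize needs room for two copies."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic swap; only now is the old copy gone

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "segment")
    with open(target, "w") as f:
        f.write("old")
    rewrite_safely(target, "new")
    with open(target) as f:
        print(f.read())  # new
```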

Best
Erick

On Mon, Nov 29, 2010 at 6:07 AM, stockii <st...@shopgate.com> wrote:

>
> Hello.
>
> I have ~37 million docs that I want to index.
>
> When I start a full-import, I import only 2 million docs at a time,
> for better control over Solr and disk space/heap usage.
>
> When I import 2 million docs and Solr starts the commit and the
> optimize, my used disk space jumps into the sky. Reaction: after a
> Solr restart, the used space goes back down.
>
> Why is Solr using so much space?
>
> Can I optimize that?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1985807.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>