Posted to solr-user@lucene.apache.org by Bill Au <bi...@gmail.com> on 2009/08/25 23:29:44 UTC

frequency of commit when building index from scratch

Just curious, how often do folks commit when building their Solr/Lucene
index from scratch for an index with millions of documents?  Should I just
wait and do a single commit at the end, after adding all the documents to
the index?

Bill

RE: frequency of commit when building index from scratch

Posted by Fuad Efendi <fu...@efendi.ca>.
But again, why would anyone get OOM? I never have.

What I discovered is this: committing millions of docs (in Solr 1.4) may take
several days (although adding the docs takes a day) if you somehow have
_many_segments_ and bad I/O with <= 2 CPUs. I am using a large ramBufferSizeMB
instead of a large mergeFactor, and quad cores...
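
To make the ramBufferSizeMB vs. mergeFactor trade-off a bit more concrete, here
is a toy model in Python. It is only a sketch under stated assumptions, not
Lucene's actual merge policy: it assumes every flush writes one segment of
roughly ram_buffer_mb, and that whenever merge_factor segments of the same
level exist they are merged into one segment of the next level. The function
name and its parameters are made up for illustration.

```python
def segments_after_indexing(total_mb, ram_buffer_mb, merge_factor):
    """Toy model: return (segments_left, merges_performed).

    Assumptions (not taken from Lucene's source):
    - each flush writes one level-0 segment of about ram_buffer_mb
    - merge_factor segments of one level merge into one of the next level
    """
    flushes = -(-total_mb // ram_buffer_mb)  # ceiling division
    levels = [0]  # levels[i] = number of segments currently at level i
    merges = 0
    for _ in range(flushes):
        levels[0] += 1
        level = 0
        # Cascade merges upward while a level is full.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            merges += 1
            level += 1
    return sum(levels), merges
```

In this model, 10240 MB of buffered data with a 32 MB buffer and a merge factor
of 10 goes through 320 flushes and 35 merges, while a 1024 MB buffer needs only
10 flushes and a single merge. Real numbers will differ, but it shows why a
larger RAM buffer means fewer segments to write and merge.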


Yes, I am using SolrJ with the binary format. It takes 20 minutes to commit
millions of docs (including overwrites of existing ones with the same
uniqueId); I usually have 2 segments (>10 GB each)
-Fuad
http://www.casaGURU.com
=========


> If you're using SolrJ, it's due to improvements there too:
> 1) binary format by default - no XML parsing
> 2) not used by default, but try using StreamingUpdateSolrServer
>
> -Yonik
> http://www.lucidimagination.com


> Bill, in most cases you probably cannot do one large commit, as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.




Re: frequency of commit when building index from scratch

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Aug 25, 2009 at 8:37 PM, Lance Norskog<go...@gmail.com> wrote:
> The latest Solr 1.4 can index 200k records in several minutes, then commit
> in a few seconds. I don't know but I'm guessing it is due to Lucene
> improvements. It does not use much memory doing this.

If you're using SolrJ, it's due to improvements there too:
1) binary format by default - no XML parsing
2) not used by default, but try using StreamingUpdateSolrServer

-Yonik
http://www.lucidimagination.com
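
StreamingUpdateSolrServer is a SolrJ (Java) class that, as far as I know,
buffers added documents in a queue and sends them to the server from
background threads. The general pattern can be sketched in Python as below.
This is not SolrJ's implementation: the class name StreamingIndexer, the
send callable, and the parameter names are all hypothetical, and error
handling is left out.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class StreamingIndexer:
    """Hypothetical sketch: queue documents and send them from worker threads."""

    def __init__(self, send, queue_size=100, thread_count=2):
        # `send` is an assumed callable that ships one document to the server.
        self._send = send
        self._queue = queue.Queue(maxsize=queue_size)
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(thread_count)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        # Drain the queue until the stop sentinel arrives.
        while True:
            doc = self._queue.get()
            if doc is _STOP:
                break
            self._send(doc)

    def add(self, doc):
        # Blocks when the queue is full, which throttles the producer.
        self._queue.put(doc)

    def close(self):
        # One sentinel per worker, then wait for all of them to finish.
        for _ in self._threads:
            self._queue.put(_STOP)
        for t in self._threads:
            t.join()
```

The caller keeps adding documents without waiting for each request to finish,
and calls close() (followed by a commit on the real server) once everything
has been queued.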

Re: frequency of commit when building index from scratch

Posted by Lance Norskog <go...@gmail.com>.
The latest Solr 1.4 can index 200k records in several minutes, then commit
in a few seconds. I don't know but I'm guessing it is due to Lucene
improvements. It does not use much memory doing this.

Lance

On Tue, Aug 25, 2009 at 2:43 PM, Fuad Efendi <fu...@efendi.ca> wrote:

> I do commit once a day, millions of small docs... it takes 20 minutes on
> average... why OOM? I see only reduced I/O...
>
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: August-25-09 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: frequency of commit when building index from scratch
>
> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au<bi...@gmail.com> wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for an index with millions of documents?  Should I just
> > wait and do a single commit at the end, after adding all the documents to
> > the index?
> >
> > Bill
> >
>
> Bill, in most cases you probably cannot do one large commit, as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.
>
>
>


-- 
Lance Norskog
goksron@gmail.com

RE: frequency of commit when building index from scratch

Posted by Fuad Efendi <fu...@efendi.ca>.
I do commit once a day, millions of small docs... it takes 20 minutes on
average... why OOM? I see only reduced I/O...


-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com] 
Sent: August-25-09 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: frequency of commit when building index from scratch

On Tue, Aug 25, 2009 at 5:29 PM, Bill Au<bi...@gmail.com> wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for an index with millions of documents?  Should I just
> wait and do a single commit at the end, after adding all the documents to
> the index?
>
> Bill
>

Bill, in most cases you probably cannot do one large commit, as you will
hit OOM. How many documents can be uncommitted is based on the size of
the documents. Committing every document is slow. I have done a commit
every 10,000 mostly. Results may vary. Someone might have a better
answer than me.



Re: frequency of commit when building index from scratch

Posted by Bill Au <bi...@gmail.com>.
That's my gut feeling (start big and go lower if OOM occurs) too.

Bill

On Tue, Aug 25, 2009 at 5:34 PM, Edward Capriolo <ed...@gmail.com> wrote:

> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au<bi...@gmail.com> wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for an index with millions of documents?  Should I just
> > wait and do a single commit at the end, after adding all the documents to
> > the index?
> >
> > Bill
> >
>
> Bill, in most cases you probably cannot do one large commit, as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.
>

Re: frequency of commit when building index from scratch

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Aug 25, 2009 at 5:29 PM, Bill Au<bi...@gmail.com> wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for an index with millions of documents?  Should I just
> wait and do a single commit at the end, after adding all the documents to
> the index?
>
> Bill
>

Bill, in most cases you probably cannot do one large commit, as you will
hit OOM. How many documents can be uncommitted is based on the size of
the documents. Committing every document is slow. I have done a commit
every 10,000 mostly. Results may vary. Someone might have a better
answer than me.
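
The commit-every-N-documents approach described above could look roughly like
the Python sketch below. The client object is a hypothetical stand-in with
add() and commit() methods, not a real Solr client library, and
RecordingClient exists only so the example runs on its own. The batch size of
10,000 is simply the figure from this post; the right value depends on
document size and available memory.

```python
def index_in_batches(client, docs, commit_every=10_000):
    """Add docs through `client`, committing after every `commit_every` docs.

    `client` is assumed to expose add(list_of_docs) and commit(); both names
    are hypothetical. Returns the number of commits issued.
    """
    pending = []
    commits = 0
    for doc in docs:
        pending.append(doc)
        if len(pending) >= commit_every:
            client.add(pending)
            client.commit()
            commits += 1
            pending = []
    if pending:
        # Flush and commit whatever is left after the last full batch.
        client.add(pending)
        client.commit()
        commits += 1
    return commits


class RecordingClient:
    """Fake client that only counts calls, for trying out the loop."""

    def __init__(self):
        self.added = 0
        self.commits = 0

    def add(self, docs):
        self.added += len(docs)

    def commit(self):
        self.commits += 1
```

For example, feeding 25,000 documents through index_in_batches with the
default batch size issues three commits: two after full batches and one for
the remaining 5,000. Starting with a large commit_every and lowering it only
if OOM occurs matches the approach Bill mentions elsewhere in this thread.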