Posted to solr-user@lucene.apache.org by Savvas-Andreas Moysidis <sa...@googlemail.com> on 2010/10/28 15:56:25 UTC

Commit/Optimise question

Hello,

We currently index our data through a SQL DataImportHandler (DIH) setup, but
because our model (and therefore our SQL query) is becoming complex, we need
to index our data programmatically. As we didn't have to deal with
commit/optimise before, we are now wondering whether there is an optimal
approach. Is there a batch size after which we should fire a commit, or
should we execute a single commit after indexing all of our data? What about
optimise?

Our document corpus is over 4 million documents, and through DIH the
resulting index is around 1.5GB.

We have searched previous posts but couldn't find a definitive answer. Any
input is much appreciated!

Regards,
-- Savvas

Re: Commit/Optimise question

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.
Thanks Erick. For the record, we are using Solr 1.4.1 and SolrJ.

On 31 October 2010 01:54, Erick Erickson <er...@gmail.com> wrote:

> What version of Solr are you using?
>
> About committing: I'd just let the Solr defaults handle that. You configure
> this in the autocommit section of solrconfig.xml; I'm pretty sure this gets
> triggered even if you're using SolrJ.
>
> That said, it's probably wise to issue a commit after all your data is
> indexed too, just to flush any remaining documents since the last
> autocommit.
>
> Optimize should not be issued until you're all done, if at all. If
> you're not deleting (or updating) documents, don't bother to optimize
> unless the number of files in your index directory gets really large.
> Recent Solr code almost removes the need to optimize unless you
> delete documents, but I confess I don't know the revision number
> "recent" refers to, perhaps only trunk...
>
> HTH
> Erick
>
> On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis <
> savvas.andreas.moysidis@googlemail.com> wrote:
>
> > Hello,
> >
> > We currently index our data through a SQL DataImportHandler (DIH) setup,
> > but because our model (and therefore our SQL query) is becoming complex,
> > we need to index our data programmatically. As we didn't have to deal
> > with commit/optimise before, we are now wondering whether there is an
> > optimal approach. Is there a batch size after which we should fire a
> > commit, or should we execute a single commit after indexing all of our
> > data? What about optimise?
> >
> > Our document corpus is over 4 million documents, and through DIH the
> > resulting index is around 1.5GB.
> >
> > We have searched previous posts but couldn't find a definitive answer.
> > Any input is much appreciated!
> >
> > Regards,
> > -- Savvas
> >
>

Re: Commit/Optimise question

Posted by Erick Erickson <er...@gmail.com>.
What version of Solr are you using?

About committing: I'd just let the Solr defaults handle that. You configure
this in the autocommit section of solrconfig.xml; I'm pretty sure this gets
triggered even if you're using SolrJ.
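
For reference, the autocommit section in solrconfig.xml looks something like
this (the numbers are purely illustrative, not a recommendation):

<autoCommit>
  <maxDocs>10000</maxDocs> <!-- commit after this many docs are added -->
  <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
</autoCommit>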

That said, it's probably wise to issue a commit after all your data is
indexed too, just to flush any remaining documents since the last
autocommit.

Optimize should not be issued until you're all done, if at all. If
you're not deleting (or updating) documents, don't bother to optimize
unless the number of files in your index directory gets really large.
Recent Solr code almost removes the need to optimize unless you
delete documents, but I confess I don't know the revision number
"recent" refers to, perhaps only trunk...

HTH
Erick

On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis <
savvas.andreas.moysidis@googlemail.com> wrote:

> Hello,
>
> We currently index our data through a SQL DataImportHandler (DIH) setup,
> but because our model (and therefore our SQL query) is becoming complex,
> we need to index our data programmatically. As we didn't have to deal
> with commit/optimise before, we are now wondering whether there is an
> optimal approach. Is there a batch size after which we should fire a
> commit, or should we execute a single commit after indexing all of our
> data? What about optimise?
>
> Our document corpus is over 4 million documents, and through DIH the
> resulting index is around 1.5GB.
>
> We have searched previous posts but couldn't find a definitive answer.
> Any input is much appreciated!
>
> Regards,
> -- Savvas
>