Posted to solr-user@lucene.apache.org by Daniel Persson <ma...@gmail.com> on 2012/03/30 19:00:28 UTC

Solr adding and commit

Hi Solr users.

I've been using Solr for a while now and have gotten really good search
results out of it.

The latest assignment was to take a really large dataset, stored in files,
and load it into Solr for searching.

My solution was to build a tool that loads the data with about 20 threads
reading, preprocessing, and submitting it, since the data needs some
preprocessing and the parsing isn't simple either.
At the moment each thread creates up to 5000 documents of about 1 kB each,
adds them to the server, and commits. So small documents, but about
30 million of them :)
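The loader described above could be sketched roughly like this. This is only an illustration: the class name, THREADS/BATCH_SIZE constants, and the submitBatch() stub are mine, standing in for the real "parse, preprocess, add to Solr, commit" call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Loader {
    static final int THREADS = 20;      // ~20 reader/submitter threads
    static final int BATCH_SIZE = 5000; // up to 5000 docs per submission

    // Counts documents instead of talking to Solr; the real tool would
    // build SolrInputDocuments here and add/commit them via SolrJ.
    static final AtomicInteger submitted = new AtomicInteger();
    static void submitBatch(List<String> docs) {
        submitted.addAndGet(docs.size());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                List<String> batch = new ArrayList<>();
                for (int i = 0; i < BATCH_SIZE; i++) {
                    batch.add("doc " + i); // ~1 kB of parsed data in reality
                }
                submitBatch(batch);        // add + commit, in the real tool
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(submitted.get()); // prints 100000 (20 * 5000)
    }
}
```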

So my question to you is: what is the best practice for loading Solr with
small documents?
Should I send larger chunks and commit less often, or smaller chunks and
commit more often?
The load goes into a separate core that isn't in use, so there is no issue
of waiting for a commit.

Eager to hear your suggestions.

best regards

Daniel

Re: Solr adding and commit

Posted by Erick Erickson <er...@gmail.com>.
Commit as rarely as possible. But let's be clear about what "commit" means:
I'm talking about an actual call to _server.commit(), as opposed to
_server.add(List). I don't issue an explicit commit until all the documents
are indexed; I just rely on commitWithin to keep things flowing.
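That pattern, sketched with a stand-in client so the call counts are visible (the FakeSolrServer class is mine; the real calls would be SolrJ's add and commit, with the commitWithin window passed along on the update requests):

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOnce {
    // Stand-in for a SolrJ client; counts calls instead of hitting a server.
    static class FakeSolrServer {
        int addCalls = 0;
        int commitCalls = 0;
        void add(List<String> docs) { addCalls++; }
        void commit() { commitCalls++; }
    }

    public static void main(String[] args) {
        FakeSolrServer server = new FakeSolrServer();
        // Send many packets of docs, relying on commitWithin (handled
        // server-side) to make them searchable along the way...
        for (int packet = 0; packet < 60; packet++) {
            List<String> docs = new ArrayList<>();
            for (int i = 0; i < 5000; i++) {
                docs.add("doc-" + packet + "-" + i);
            }
            server.add(docs);  // no commit here
        }
        // ...and commit explicitly only once, when everything is indexed.
        server.commit();
        System.out.println(server.addCalls + " adds, "
                + server.commitCalls + " commit"); // prints 60 adds, 1 commit
    }
}
```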

I'm guessing you're talking SolrJ here BTW.

By and large I prefer sending fewer packets with lots of docs in each;
I think 5-10K docs per packet is fine. You probably want to set the
commitWithin parameter to some rather long period (I often use 10 minutes
or even longer).
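One way to carve the document stream into packets of that size, as a sketch (the helper name and sizes are mine, not from the thread; each resulting packet would go out as one add call):

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {
    // Split a document list into packets of at most `size` docs each.
    static <T> List<List<T>> partition(List<T> docs, int size) {
        List<List<T>> packets = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            int end = Math.min(i + size, docs.size());
            packets.add(new ArrayList<>(docs.subList(i, end)));
        }
        return packets;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 23000; i++) {
            docs.add(i);
        }
        // 5-10K per packet; 23000 docs at 10000 each -> 3 packets.
        List<List<Integer>> packets = partition(docs, 10000);
        System.out.println(packets.size()); // prints 3
    }
}
```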

But really, the packet size is irrelevant if your parsing/preprocessing
isn't keeping Solr busy. Throw a perf monitor on it (or just use top or
similar) and see whether you're pegging the CPU before worrying about
tweaking your packet size, IMO...

Best
Erick
