Posted to solr-user@lucene.apache.org by erolagnab <tr...@gmail.com> on 2007/09/17 05:01:32 UTC

Indexing Speed

Hi,

Just an FYI.

I've seen some posts mentioning that Solr can index 100-150 docs/s, and
comparisons between embedded Solr and HTTP. I tried indexing 1.7+ million
docs; each doc has 30 fields, of which 10 are indexed and stored and the
rest are stored only. The result was pretty impressive: it took approx.
1.4 hours to finish. Note that the docs were sent synchronously, one after
the other. The Solr server and client were both running on a Pentium Dual
Core 3.2, 2 GB RAM, Ubuntu Feisty.
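
As a rough check on the figures above, the arithmetic can be sketched in a
few lines of Python. The function name is hypothetical, and "approx 1.4
hour" is treated as exactly 1.4 hours, so the result is only an estimate.

```python
def docs_per_second(total_docs, elapsed_hours):
    """Average indexing rate, given a doc count and elapsed wall-clock hours."""
    return total_docs / (elapsed_hours * 3600)


# Figures quoted in the post: 1.7 million docs in roughly 1.4 hours.
run_rate = docs_per_second(1_700_000, 1.4)
print(f"{run_rate:.0f} docs/s")  # prints "337 docs/s"

# For comparison: one million docs spread evenly over a 24-hour day.
daily_rate = docs_per_second(1_000_000, 24)
print(f"{daily_rate:.1f} docs/s")  # prints "11.6 docs/s"
```

On these numbers the run averaged well above the 100-150 docs/s mentioned
in earlier posts, and a load of a few million docs per day is a much lower
average rate than this bulk run sustained.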

The only issue I noticed is that Solr does use a noticeable amount of
memory. In the first run, after indexing around 500 thousand docs, it threw
an OutOfMemory exception. For the second run I set -Xms and -Xmx so that
the JVM ran with 1 GB of memory, and Solr ran through to the finish.

Some questions:
1) Is it good practice to let Solr index docs in real time (millions of
docs per day)? What I'm worried about is that Solr may eat up memory as it
goes.
2) If docs are sent asynchronously, how well could Solr index?

Any comments are highly appreciated.

Trung


Re: Indexing Speed

Posted by Mike Klaas <mi...@gmail.com>.
On 16-Sep-07, at 8:01 PM, erolagnab wrote:

> Hi,
>
> Just an FYI.
>
> I've seen some posts mentioning that Solr can index 100-150 docs/s, and
> comparisons between embedded Solr and HTTP. I tried indexing 1.7+ million
> docs; each doc has 30 fields, of which 10 are indexed and stored and the
> rest are stored only. The result was pretty impressive: it took approx.
> 1.4 hours to finish. Note that the docs were sent synchronously, one after
> the other. The Solr server and client were both running on a Pentium Dual
> Core 3.2, 2 GB RAM, Ubuntu Feisty.
>
> The only issue I noticed is that Solr does use a noticeable amount of
> memory. In the first run, after indexing around 500 thousand docs, it threw
> an OutOfMemory exception. For the second run I set -Xms and -Xmx so that
> the JVM ran with 1 GB of memory, and Solr ran through to the finish.

You can tune memory usage by setting maxBufferedDocs to a lower value.
Also, watch out for large individual docs.

> Some questions:
> 1) Is it good practice to let Solr index docs in real time (millions of
> docs per day)? What I'm worried about is that Solr may eat up memory as it
> goes.

You can tune max memory usage (see above).

> 2) If docs are sent asynchronously, how well could Solr index?

As long as you don't send all 1.7 million docs at once, you should see a
performance improvement.

-Mike
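
The advice about sending docs asynchronously, but not all at once, could be
sketched roughly as follows in Python. This is an illustration under
assumptions, not something from the thread: the helper names (chunked,
to_add_xml, index_all), the batch size and the worker count are all
hypothetical, and the caller supplies a send function that actually posts
each payload to Solr. The payload uses Solr's XML <add><doc><field> update
format, with several docs per <add>.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape, quoteattr


def chunked(docs, size):
    """Yield lists of at most `size` docs from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_add_xml(batch):
    """Build one Solr <add> message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in batch:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def index_all(docs, send, batch_size=100, workers=4):
    """Send batches concurrently while capping how many are in flight.

    `send` is supplied by the caller (hypothetical: it would POST the XML
    to the Solr update handler). Results come back in submission order.
    """
    slots = threading.BoundedSemaphore(workers * 2)  # cap on queued batches

    def task(payload):
        try:
            return send(payload)
        finally:
            slots.release()

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(docs, batch_size):
            slots.acquire()  # blocks the producer when too many are pending
            futures.append(pool.submit(task, to_add_xml(batch)))
    return [f.result() for f in futures]
```

In real use, send would POST each payload to the update URL of the Solr
instance (deployment-specific, so not shown here) and a <commit/> would
follow once all batches are in. The semaphore keeps only a handful of
batches pending, so neither the client nor the server is handed the whole
collection at once.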