Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/03/04 21:24:51 UTC

[SolrCloud] Slow indexing

 Hi,

 With auto-committing disabled we can now index many millions of 
 documents in our test environment on a 5-node cluster with 5 shards and 
 a replication factor of 2. The documents are uploaded from map/reduce. 
 No significant changes were made to solrconfig and there are no update 
 processors enabled. We are using a trunk revision from this weekend.

 The indexing speed is well below what we are used to seeing; we can 
 easily index 5 million documents on a non-cloud-enabled Solr 3.x 
 instance within an hour. What could be going on? There aren't many 
 open TCP connections, the number of file descriptors is stable, and 
 I/O is low, but CPU time is high! Each node has two Solr cores, each 
 writing to its own dedicated disk.

 The indexing speed is stable: it was slow at the start and still is. 
 It has now been running for well over 6 hours and only 3.5 million 
 documents have been indexed. Another strange detail is that the node 
 receiving all incoming documents (we're not yet using a client-side 
 Solr server pool) has much larger disk usage than all the other 
 nodes. This is peculiar, as we expected all replicas to be about the 
 same size.
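
 Once we add the client-side pool, the idea is to spread updates over 
 all nodes instead of sending everything to one, conceptually 
 something like the sketch below (node URLs are placeholders):

     import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
     import org.apache.solr.common.SolrInputDocument;

     public class PooledUpdateSketch {
       public static void main(String[] args) throws Exception {
         // Placeholder URLs for the five nodes; requests are
         // spread across the servers that are alive.
         LBHttpSolrServer pool = new LBHttpSolrServer(
             "http://node1:80/solr", "http://node2:80/solr",
             "http://node3:80/solr", "http://node4:80/solr",
             "http://node5:80/solr");

         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", "doc-1");
         pool.add(doc);
         pool.shutdown();
       }
     }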

 The receiving node has slightly higher CPU usage than the other 
 nodes, but the thread dump shows a very large number of threads of 
 the form cmdDistribExecutor-8-thread-292260 (295090), each with 
 0-100 ms of CPU time. At the top of the list these threads all have 
 < 20 ms, but near the bottom it rises to just over 100 ms. All nodes 
 have a couple of http-80-30 (121994) threads with very high CPU time 
 each.
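
 The per-prefix thread counts above come from a thread dump; a rough 
 in-JVM way to get the same kind of per-prefix counts would be 
 something like this (illustrative only):

     import java.util.Map;
     import java.util.TreeMap;

     public class ThreadCountSketch {
       public static void main(String[] args) {
         // Count live threads per name prefix (name up to the last '-'),
         // e.g. "cmdDistribExecutor-8-thread" or "http-80".
         Map<String, Integer> counts = new TreeMap<String, Integer>();
         for (Thread t : Thread.getAllStackTraces().keySet()) {
           String name = t.getName();
           int cut = name.lastIndexOf('-');
           String prefix = cut > 0 ? name.substring(0, cut) : name;
           Integer n = counts.get(prefix);
           counts.put(prefix, n == null ? 1 : n + 1);
         }
         System.out.println(counts);
       }
     }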

 Is this a known issue? Did I miss something? Any ideas?

 Thanks

Re: [SolrCloud] Slow indexing

Posted by eks dev <ek...@googlemail.com>.
Hmm, looks like you are facing exactly the phenomenon I asked about.
See my question here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326
