Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/03/04 21:24:51 UTC
[SolrCloud] Slow indexing
Hi,
With auto-committing disabled we can now index many millions of
documents in our test environment on a 5-node cluster with 5 shards and
a replication factor of 2. The documents are uploaded from map/reduce.
No significant changes were made to solrconfig and there are no update
processors enabled. We are using a trunk revision from this weekend.
The indexing speed is well below what we are used to seeing; we can easily
index 5 million documents on a non-cloud Solr 3.x instance within an
hour. What could be going on? There aren't many open TCP connections,
the number of file descriptors is stable, and I/O is low, but CPU time
is high! Each node has two Solr cores, both writing to their own
dedicated disk.
The indexing speed is stable: it was slow at the start and still is.
It has now been running for well over 6 hours and only 3.5 million
documents are indexed. Another strange detail is that the node receiving
all incoming documents (we're not yet using a client-side Solr server
pool) has much larger disk usage than all the other nodes. This is
peculiar, as we expected all replicas to be about the same size.
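Since all documents currently go through one node, a client-side pool that rotates uploads across the nodes could spread that load. A minimal sketch of such a round-robin pool (the class name and node URLs are hypothetical placeholders, not part of the setup described above):

```python
# Minimal sketch of a client-side round-robin pool: each update request
# is sent to the next node in turn, so no single node receives all
# incoming documents. Node URLs below are hypothetical.
from itertools import cycle


class SolrNodePool:
    """Rotate update requests across a fixed list of Solr node base URLs."""

    def __init__(self, base_urls):
        self._urls = cycle(base_urls)

    def next_url(self):
        # Each call yields the next node's update endpoint, wrapping around.
        return next(self._urls) + "/update"


pool = SolrNodePool([
    "http://solr1:8983/solr",  # hypothetical node addresses
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
])
```

An HTTP client would then POST each batch to `pool.next_url()` instead of a single fixed node.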
The receiving node has slightly higher CPU usage than the other nodes,
but the thread dump shows a very large number of threads of type
cmdDistribExecutor-8-thread-292260 (295090) with 0-100 ms of CPU time.
At the top of the list these threads all have < 20 ms, but near the
bottom it rises to just over 100 ms. All nodes have a couple of
http-80-30 (121994) threads, each with very high CPU time.
Is this a known issue? Did I miss something? Any ideas?
Thanks
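For reference, the auto-commit that was disabled above lives in solrconfig.xml; a hedged sketch of what a bounded hard-commit configuration there could look like, in case the update log on the receiving node is what is growing (all values are illustrative, and `openSearcher` assumes a trunk/4.x-era option):

```xml
<!-- Illustrative values only: periodically hard-commit so segments are
     flushed and the transaction log can be truncated, without opening a
     new searcher on every commit. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50000</maxDocs>
    <maxTime>300000</maxTime> <!-- 5 minutes, in ms -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```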
Re: [SolrCloud] Slow indexing
Posted by eks dev <ek...@googlemail.com>.
Hmm, looks like you are facing exactly the phenomenon I asked about.
See my question here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326