Posted to solr-user@lucene.apache.org by "kowish.adamosh" <ko...@gmail.com> on 2013/09/18 11:40:33 UTC

Solr Cloud dataimport freezes

Hi guys,

I have a problem with data import (based on database SQL) in Solr Cloud. I'm
trying to import ~500,000,000 documents and I've created 30 logical
shards on 2 physical machines. Documents are distributed by composite ID.
After some time (5-10 minutes; about 400,000 documents) Solr Cloud stops
indexing documents. This is because the indexing thread parks and waits on a
semaphore:
org.apache.solr.update.SolrCmdDistributor#semaphore.acquire() in the submit
method.

While indexing is running I see JDBC calls in the stack trace, but once the
thread parks on the semaphore I no longer see any JDBC calls (only Solr and
JDK method calls).

Version of Solr: 4.4
Version of Lucene: 4.4

*With one shard and one physical machine everything is OK*
*With one shard and two physical machines (one leader, one replica)
everything is OK*

This is a really big problem for us: because of the large number of
documents, we have to shard the index. Our queries are unique and use
sorting, so without sharding response times are about 1 minute.

Best,
Kowish



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-dataimport-freezes-tp4090812.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Cloud dataimport freezes

Posted by "kowish.adamosh" <ko...@gmail.com>.
Update:
- it works with 8 shards.
I'm going to test it on 16 shards.

Any ideas what is going on? :-)




Re: Solr Cloud dataimport freezes

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/18/2013 3:40 AM, kowish.adamosh wrote:
> I have a problem with data import (based on database sql) in Solr Cloud. I'm
> trying to import ~500 000 000 of documents and I've created 30 logical
> shards on 2 physical machines. Documents are distributed by composite id.
> After some time (5-10 minutes; about 400 000 documents) Solr Cloud stops
> indexing documents. This is because indexing thread parks and waits on
> semaphore:
> org.apache.solr.update.SolrCmdDistributor#semaphore.acquire() in method
> submit.

There are some SolrCloud bugs that we expect will be fixed in version
4.5.  Basically what happens is that when a large number of updates are
being distributed from whichever core receives them to the appropriate
shard replicas, managing all those requests results in a deadlock.  If
everything goes well with the release, 4.5 will be out sometime within
the next two weeks.
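
As a rough illustration of the symptom (this is NOT Solr's actual
SolrCmdDistributor code; the class and method names below are hypothetical),
here is a minimal sketch of a submit path that is capped by a semaphore. If
the completion side never gets to release permits, every later submit stalls.
The sketch uses a timed tryAcquire() so the stall shows up as a `false`
return; with a plain acquire() the calling thread would simply park, which
matches what the stack trace above shows.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustration only: not Solr code. All names here are hypothetical.
public class BoundedSubmitDemo {
    // Caps the number of in-flight requests.
    private final Semaphore permits;

    public BoundedSubmitDemo(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Takes a permit before "sending" a request. Returns false if no permit
    // frees up within the timeout. A plain acquire() at this point would
    // park the caller until a permit is released, possibly forever.
    public boolean submit(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Must run when a request finishes. If completions never run, no permit
    // is ever returned and all later submits stall.
    public void complete() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedSubmitDemo demo = new BoundedSubmitDemo(2);
        System.out.println(demo.submit(100)); // true
        System.out.println(demo.submit(100)); // true
        System.out.println(demo.submit(100)); // false: both permits are held
        demo.complete();
        System.out.println(demo.submit(100)); // true again after a release
    }
}
```

The point of the sketch is only that a permit-bounded submitter depends
entirely on the release side making progress; it says nothing about where
the real bug sits in 4.4.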

You can always download and build the "branches/lucene_solr_4_5" code
branch from SVN if you want to try out what will become Solr 4.5:

http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code

SOLR-4816 is semi-related, because it helps avoid the problem in the
first place when using CloudSolrServer in a Java program.  I'm having a
hard time finding the Jira issue number(s) for the underlying
problem(s), but I know some changes were committed recently specifically
for this problem.

Thanks,
Shawn