Posted to solr-user@lucene.apache.org by roySolr <ro...@gmail.com> on 2013/03/14 14:22:38 UTC

Advice: solrCloud + DIH

Hello,

I need some advice on my SolrCloud cluster and the DIH. I have a cluster
with 3 cloud servers. Every server runs a Solr instance and a ZooKeeper
instance. I start Solr with the -DzkHost parameter. It works great; I send
updates as XML with curl, like this:

curl http://ip:SOLRport/solr/update -H "Content-Type: text/xml" --data-binary \
'<add><doc><field name="id">223232</field><field name="content">test</field></doc></add>'
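
For anyone scripting this instead of calling curl by hand, here is a minimal Python sketch of the same update request. The function names build_add_xml and post_update are made up for illustration, and ip:SOLRport is the placeholder from the command above, so the actual POST is left commented out.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build a Solr <add><doc>...</doc></add> payload from a dict of field values."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


def post_update(base_url, payload, timeout=30):
    """POST an XML update message to <base_url>/update and return the HTTP status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/update",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_add_xml({"id": "223232", "content": "test"})
    print(payload)
    # Placeholder host and port, as in the curl command; fill in before running.
    # post_update("http://ip:SOLRport/solr", payload)
```

Building the XML with ElementTree rather than string concatenation keeps field values containing & or < from breaking the request.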

Solr has 2 million docs in the index. Now I want an extra field: content2. I
added it to my schema and uploaded the config to the cluster again with
-Dbootstrap_confdir and -Dcollection.configName. It was replicated to the
whole cluster.

Now I need a re-index to add the field to every doc. I have a database with
all the data and want to use the DIH full-import (this is how I did it in
previous Solr versions). When I run it, it indexes at 3 docs/s (really
slow). When I run Solr alone (not SolrCloud), it indexes at 600 docs/s.
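
To put the two rates in perspective, here is a small back-of-the-envelope sketch using only the numbers quoted in this message (2 million docs, 3 docs/s and 600 docs/s). It assumes the rate stays constant for the whole import, which is a simplification.

```python
def full_import_seconds(num_docs, docs_per_second):
    """Estimated duration of a full import at a constant indexing rate."""
    return num_docs / docs_per_second


if __name__ == "__main__":
    for rate in (3, 600):
        secs = full_import_seconds(2_000_000, rate)
        print(f"{rate} docs/s: {secs / 3600:.1f} hours")
```

At 3 docs/s the full import would take roughly 7.7 days; at 600 docs/s it takes a little under an hour.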

What's the best way to do a full re-index with SolrCloud? Does SolrCloud
support DIH?

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/Advice-solrCloud-DIH-tp4047339.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Advice: solrCloud + DIH

Posted by "Rollin.R.Ma (lab.sh04.Newegg) 41099" <Ro...@newegg.com>.
My result is 2000 docs/s, which is close to embedded Solr. It can be tuned further.


Yes, you can find that out, but you need to understand how the shards are partitioned.




Re: Advice: solrCloud + DIH

Posted by rulinma <ru...@gmail.com>.
Yes, you can find that out, but you need to understand how the shards are partitioned.




Re: Advice: solrCloud + DIH

Posted by roySolr <ro...@gmail.com>.
Thanks for the support so far,

I was running the dataimport on a replica! Now I start it on the leader and
it runs at 590 docs/s. I think all docs were going to another node and then
coming back.

Is there a way to find out which node is the leader? If there is, I can
detect the leader with a script and start the DIH every night on the right
server.

Roy
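
One possible way to script the leader lookup asked about above, as a sketch only: SolrCloud keeps the cluster state as JSON in ZooKeeper (the /clusterstate.json node in Solr 4.x), and each replica entry carries a "leader" flag. The code below assumes the Solr 4.1+ layout (collection, then "shards", then "replicas") and only parses a state that has already been fetched and decoded; check the layout against your own cluster. The function names and the "/dataimport" handler path are assumptions for illustration; the handler path is whatever is registered in solrconfig.xml.

```python
def find_leaders(cluster_state, collection):
    """Return {shard_name: replica_info} for the replica flagged as leader in each shard."""
    leaders = {}
    shards = cluster_state[collection]["shards"]
    for shard_name, shard in shards.items():
        for replica in shard.get("replicas", {}).values():
            # The flag is stored as the string "true", not a JSON boolean.
            if str(replica.get("leader", "")).lower() == "true":
                leaders[shard_name] = replica
    return leaders


def dataimport_url(replica, handler="/dataimport"):
    """Build a full-import URL for the core hosted by the given replica.

    The handler path is an assumption; use the one from your solrconfig.xml.
    """
    base = replica["base_url"].rstrip("/")
    return f"{base}/{replica['core']}{handler}?command=full-import"
```

A nightly script could load the state, call find_leaders(state, "collection1") and request dataimport_url(...) for the leader it finds. With more than one shard there is one leader per shard, so docs that hash to other shards would presumably still be forwarded.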






Re: Advice: solrCloud + DIH

Posted by Mark Miller <ma...@gmail.com>.
On Mar 14, 2013, at 9:22 AM, roySolr <ro...@gmail.com> wrote:

> Hello,
> 
>  When I run it, it indexes at 3 docs/s (really
> slow). When I run Solr alone (not SolrCloud), it indexes at 600 docs/s.
> 
> What's the best way to do a full re-index with SolrCloud? Does SolrCloud
> support DIH?
> 
> Thanks
> 

SolrCloud supports DIH, but not fully and happily. DIH is set up to work pretty nicely without SolrCloud, where it loads pretty quickly. With SolrCloud, a few things can happen. One is that you might be running DIH on a replica rather than a leader, and which node is the leader can change without your consent. In that case, all docs go to another node and then come back. SolrCloud also really works best with multiple indexing threads, and DIH will only use one, to my knowledge.

Still, at 3 docs/s, something sounds wrong. That's too slow.

- Mark
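
As an illustration of the multi-threaded indexing mentioned above, here is a minimal sketch of a client that splits documents into batches and sends them from a pool of worker threads. It is not DIH and not a Solr API: send_batch is a hypothetical callable you would supply (for example, one that POSTs a batch to the update handler and returns the number of docs sent), and the batch_size and workers defaults are arbitrary placeholders, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_in_parallel(docs, send_batch, batch_size=500, workers=4):
    """Send batches of docs from a pool of worker threads; return total docs sent.

    send_batch(batch) must return the number of docs it sent.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # An exception raised in any worker is re-raised here when results are consumed.
        return sum(pool.map(send_batch, batches(docs, batch_size)))
```

Passing the sender in as a function keeps the batching and threading logic testable without a running cluster.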


Re: Advice: solrCloud + DIH

Posted by rulinma <ru...@gmail.com>.
3 docs/s is low. I tested with 4 nodes and got more than 1000 docs/s at 4 KB per
doc with SolrCloud. Every leader has a replica.

I am tuning to get to 3000 docs/s. 3 docs/s is too slow.

Thanks!


