Posted to solr-user@lucene.apache.org by jimtronic <ji...@gmail.com> on 2013/03/13 16:39:55 UTC

Scaling SolrCloud and DIH

I'm curious how people are using DIH with SolrCloud.

I have cron jobs set up to trigger the dataimports, which come from both XML
files and a SQL database. Some are frequent small delta imports, while others
are larger daily XML imports.

Here's what I've tried:

1. Set up a micro box that sends the dataimport requests to a load balancer
using cron. This didn't work because frequent requests would get spread
around and at one point all my nodes were doing the dataimport requests at
the same time.

2. Designate one box as the indexer and call dataimport via localhost. The
problem here is that I now have a single point of failure for indexing -- I
always have to have that box running. I love that SolrCloud is distributed
so I can have 3 boxes in my cluster and I don't care which one goes down.

I don't really know what the solution is, but it would be nice if
the dataimport were cloud-aware, meaning that the cluster knows an update is
happening on one of the boxes and won't let another one start. That way I
could just send the dataimport request through the load balancer and
forget about it.
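
Until something like that exists, a rough client-side approximation is for the
cron job to ask every node for its DIH status first and only start an import
when nobody reports "busy". The sketch below is only that, a sketch: the node
URLs, the /dataimport handler path, and the helper names are assumptions for
illustration, and it is not a real lock (two cron clients checking at the same
moment could still both start an import).

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical node URLs and handler path; adjust to the actual cluster.
NODES = [
    "http://solr1:8983/solr/collection1",
    "http://solr2:8983/solr/collection1",
    "http://solr3:8983/solr/collection1",
]
HANDLER = "/dataimport"


def dih_status(base_url, timeout=5):
    """Return the status DIH reports ("idle" or "busy"), or None if unreachable."""
    query = urlencode({"command": "status", "wt": "json"})
    try:
        with urllib.request.urlopen(base_url + HANDLER + "?" + query, timeout=timeout) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError):
        # Node is down or returned something that is not JSON.
        return None


def pick_node(nodes, get_status=dih_status):
    """Return the first idle node, or None if any node is busy or none is reachable."""
    statuses = [(node, get_status(node)) for node in nodes]
    if any(status == "busy" for _, status in statuses):
        return None  # an import is already running somewhere; skip this run
    for node, status in statuses:
        if status == "idle":
            return node
    return None


def trigger(base_url, command="delta-import", timeout=5):
    """Send the import command to one node and return its JSON response."""
    query = urlencode({"command": command, "wt": "json"})
    with urllib.request.urlopen(base_url + HANDLER + "?" + query, timeout=timeout) as resp:
        return json.load(resp)
```

The cron script would call pick_node(NODES) and, if it gets a node back, call
trigger(node). Since any reachable idle node can be picked, losing one box
doesn't stop indexing, which is the part I care about.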

Anyway, I thought I would see how others are handling this issue.

Cheers, Jim



--
View this message in context: http://lucene.472066.n3.nabble.com/Scaling-SolrCloud-and-DIH-tp4047049.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Scaling SolrCloud and DIH

Posted by Mark Miller <ma...@gmail.com>.
There is still some work to be done to make DIH play nicely with SolrCloud in terms of failover.

https://issues.apache.org/jira/browse/SOLR-4058 is one of the issues that should be addressed.

I think I made another issue or two, but I don't remember them offhand.

- Mark
