Posted to solr-user@lucene.apache.org by Timothy Potter <th...@gmail.com> on 2013/04/21 17:57:44 UTC

Solr cloud and batched updates

There's no problem here, but I'm curious how batches of updates are
handled on the server side in SolrCloud.

Going over the code for DistributedUpdateProcessor and
SolrCmdDistributor, it appears that the batch is broken down and docs
are processed one-by-one. By processed, I mean that each doc in the
batch from the client is sent to replicas individually.

This makes sense but I wonder if the forwarding on to replicas could
be done in sub-batches? For instance, if the client sends a batch of
100 documents to a cluster with 4 shards, I wonder if it would be more
efficient to calculate the shard assignments to create 4 sub-batches
and then forward those 4 sub-batches on to their respective leaders?
Maybe I'm overthinking it too ;-)

Cheers,
Tim
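
[Editor's sketch] The sub-batching idea above can be illustrated with a
few lines of Java. This is only a minimal sketch under stated assumptions:
the class SubBatcher, its methods, and the modulo-of-hashCode routing rule
are hypothetical stand-ins chosen for illustration. They are not Solr's
API and not how SolrCloud actually assigns documents to shards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr: groups a client batch into
// one sub-batch per shard so each leader could receive a single request.
final class SubBatcher {

    // ASSUMPTION: a toy routing rule (hash of the id modulo shard count).
    // SolrCloud's real document routing is different.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Returns shard index -> ids of the documents routed to that shard.
    // Shards that receive no documents do not appear in the map.
    static Map<Integer, List<String>> partition(List<String> docIds, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        Map<Integer, List<String>> subBatches = new TreeMap<>();
        for (String id : docIds) {
            subBatches
                .computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>())
                .add(id);
        }
        return subBatches;
    }
}
```

With a batch of 100 ids and 4 shards, partition(ids, 4) yields at most 4
lists whose sizes sum to 100, so the receiving node would make at most 4
forwarding requests instead of 100.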

Re: Solr cloud and batched updates

Posted by Erick Erickson <er...@gmail.com>.
Thanks Yonik! You see how behind the times I get....

On Sun, Apr 21, 2013 at 5:07 PM, Timothy Potter <th...@gmail.com> wrote:
> That's awesome! Thanks Yonik.
>
> Tim
>
> On Sun, Apr 21, 2013 at 1:30 PM, Yonik Seeley <yo...@lucidworks.com> wrote:
>> On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter <th...@gmail.com> wrote:
>>> There's no problem here, but I'm curious about how batches of updates
>>> are handled on the Solr server side in Solr cloud?
>>>
>>> Going over the code for DistributedUpdateProcessor and
>>> SolrCmdDistributor, it appears that the batch is broken down and docs
>>> are processed one-by-one. By processed, I mean that each doc in the
>>> batch from the client is sent to replicas individually.
>>>
>>> This makes sense but I wonder if the forwarding on to replicas could
>>> be done in sub-batches?
>>
>> Good news... they already are sent in batches!  The docs are processed
>> one-by-one, but then buffered (into batches) for forwarding to
>> replicas.
>>
>> -Yonik
>> http://lucidworks.com

Re: Solr cloud and batched updates

Posted by Timothy Potter <th...@gmail.com>.
That's awesome! Thanks Yonik.

Tim

On Sun, Apr 21, 2013 at 1:30 PM, Yonik Seeley <yo...@lucidworks.com> wrote:
> On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter <th...@gmail.com> wrote:
>> There's no problem here, but I'm curious about how batches of updates
>> are handled on the Solr server side in Solr cloud?
>>
>> Going over the code for DistributedUpdateProcessor and
>> SolrCmdDistributor, it appears that the batch is broken down and docs
>> are processed one-by-one. By processed, I mean that each doc in the
>> batch from the client is sent to replicas individually.
>>
>> This makes sense but I wonder if the forwarding on to replicas could
>> be done in sub-batches?
>
> Good news... they already are sent in batches!  The docs are processed
> one-by-one, but then buffered (into batches) for forwarding to
> replicas.
>
> -Yonik
> http://lucidworks.com

Re: Solr cloud and batched updates

Posted by Yonik Seeley <yo...@lucidworks.com>.
On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter <th...@gmail.com> wrote:
> There's no problem here, but I'm curious about how batches of updates
> are handled on the Solr server side in Solr cloud?
>
> Going over the code for DistributedUpdateProcessor and
> SolrCmdDistributor, it appears that the batch is broken down and docs
> are processed one-by-one. By processed, I mean that each doc in the
> batch from the client is sent to replicas individually.
>
> This makes sense but I wonder if the forwarding on to replicas could
> be done in sub-batches?

Good news... they already are sent in batches!  The docs are processed
one-by-one, but then buffered (into batches) for forwarding to
replicas.

-Yonik
http://lucidworks.com
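
[Editor's sketch] The pattern Yonik describes, processing documents one
at a time while buffering them into batches before forwarding, might look
roughly like the Java below. Everything here is an assumption made for
illustration: the class BufferingForwarder, the size-based flush
threshold, and the sender callback are hypothetical, and the sketch does
not reproduce SolrCmdDistributor or its actual flush rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration, not Solr code: documents arrive one by one
// but are forwarded to each destination in batches.
final class BufferingForwarder {
    private final int maxBatchSize; // ASSUMPTION: flush is triggered by size only
    private final BiConsumer<String, List<String>> sender; // (destination, batch)
    private final Map<String, List<String>> buffers = new LinkedHashMap<>();

    BufferingForwarder(int maxBatchSize, BiConsumer<String, List<String>> sender) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    // Called once per document; sends only when the buffer is full.
    void add(String destination, String doc) {
        List<String> buffer = buffers.computeIfAbsent(destination, k -> new ArrayList<>());
        buffer.add(doc);
        if (buffer.size() >= maxBatchSize) {
            flush(destination);
        }
    }

    // Sends whatever is buffered for one destination, if anything.
    void flush(String destination) {
        List<String> buffer = buffers.remove(destination);
        if (buffer != null && !buffer.isEmpty()) {
            sender.accept(destination, buffer);
        }
    }

    // Sends all remaining partial batches, e.g. at the end of a request.
    void finish() {
        for (String destination : new ArrayList<>(buffers.keySet())) {
            flush(destination);
        }
    }
}
```

Adding 7 documents for one destination with maxBatchSize set to 3 and
then calling finish() results in three sends, of 3, 3, and 1 documents.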

Re: Solr cloud and batched updates

Posted by Erick Erickson <er...@gmail.com>.
I'm pretty sure there's a JIRA to do just that; it just hasn't been
implemented yet.

I guess it's one of those things that would undoubtedly be more efficient, but
whether it would really be noticeable is an open question. At any rate, there
are more important fish to fry, but if you'd like to submit a patch.....

Best
Erick

On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter <th...@gmail.com> wrote:
> There's no problem here, but I'm curious about how batches of updates
> are handled on the Solr server side in Solr cloud?
>
> Going over the code for DistributedUpdateProcessor and
> SolrCmdDistributor, it appears that the batch is broken down and docs
> are processed one-by-one. By processed, I mean that each doc in the
> batch from the client is sent to replicas individually.
>
> This makes sense but I wonder if the forwarding on to replicas could
> be done in sub-batches? For instance, if the client sends a batch of
> 100 documents to a cluster with 4 shards, I wonder if it would be more
> efficient to calculate the shard assignments to create 4 sub-batches
> and then forward those 4 sub-batches on to their respective leaders?
> Maybe I'm overthinking it too ;-)
>
> Cheers,
> Tim