Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/07/25 20:08:46 UTC

Does SELECT … IN () use parallel dispatch?

Say I have about 50 primary keys I need to fetch.

I'd like to use parallel dispatch, so that if I have 50 hosts and each
has a record, I can read from all 50 at once.

I assume Cassandra does the right thing here? I believe it does, at least
from reading the docs, but it's still a bit unclear.

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: Does SELECT … IN () use parallel dispatch?

Posted by Kevin Burton <bu...@spinn3r.com>.
On Fri, Jul 25, 2014 at 11:14 AM, DuyHai Doan <do...@gmail.com> wrote:

> Nope. Select ... IN() sends one request to a coordinator. This coordinator
> dispatches the request to 50 nodes, as in your example, and waits for 50
> responses before sending back the final result. As you can guess, this
> approach is not optimal since the global request latency is bound by the
> slowest of the 50 nodes.
>
>
Maybe it's the wording but it sounds like the coordinator is doing parallel
dispatch?

Kevin


Re: Does SELECT … IN () use parallel dispatch?

Posted by "Laing, Michael" <mi...@nytimes.com>.
Except then you have to merge results if you want them ordered.
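
To illustrate the merge step mentioned above: if each per-key query returns
rows already sorted by some column, the client can combine them with a k-way
merge instead of re-sorting everything. This is only a sketch; the row
layout and the "ts" sort column are assumptions made up for the example,
not anything from the thread.

```python
import heapq
from operator import itemgetter


def merge_ordered(per_key_results, sort_field):
    """Merge several lists, each already sorted by sort_field, into one sorted list."""
    return list(heapq.merge(*per_key_results, key=itemgetter(sort_field)))


if __name__ == "__main__":
    # Hypothetical results of two single-key queries, each sorted by "ts".
    results = [
        [{"pk": "a", "ts": 1}, {"pk": "a", "ts": 4}],
        [{"pk": "b", "ts": 2}, {"pk": "b", "ts": 3}],
    ]
    for row in merge_ordered(results, "ts"):
        print(row)
```

heapq.merge assumes every input list is already in order; if the per-key
results are unsorted, a plain sort of the concatenated rows is needed
instead.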


On Fri, Jul 25, 2014 at 2:15 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> Ah.. ok. Nice.  That should work.  Parallel dispatch on the client would
> work too.. using async.

Re: Does SELECT … IN () use parallel dispatch?

Posted by Kevin Burton <bu...@spinn3r.com>.
Ah.. ok. Nice.  That should work.  Parallel dispatch on the client would
work too.. using async.


On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael <mi...@nytimes.com>
wrote:

> We use IN (keeping the number down). The coordinator does parallel
> dispatch AND applies ORDER BY to the aggregate results, which we would
> otherwise have to do ourselves. Anyway, worth it for us.
>
> ml



Re: Does SELECT … IN () use parallel dispatch?

Posted by "Laing, Michael" <mi...@nytimes.com>.
We use IN (keeping the number down). The coordinator does parallel dispatch
AND applies ORDER BY to the aggregate results, which we would otherwise
have to do ourselves. Anyway, worth it for us.

ml


On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> Perhaps the best strategy is to have the DataStax java-driver do this and
> I just wait for each result individually.  This will give me parallel
> dispatch.

Re: Does SELECT … IN () use parallel dispatch?

Posted by Kevin Burton <bu...@spinn3r.com>.
Perhaps the best strategy is to have the DataStax java-driver do this and I
just wait for each result individually.  This will give me parallel dispatch.


On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson <gr...@vast.com> wrote:

> Of course the driver in question is allowed to be smarter, and can do so if
> you use a ? parameter for a list or even individual elements.
>
> I'm not sure which, if any, drivers currently do this, but we plan to combine
> this with token-aware routing in our Scala driver in the future.



Re: Does SELECT … IN () use parallel dispatch?

Posted by Graham Sanderson <gr...@vast.com>.
Of course the driver in question is allowed to be smarter, and can do so if you use a ? parameter for a list or even individual elements.

I'm not sure which, if any, drivers currently do this, but we plan to combine this with token-aware routing in our Scala driver in the future.
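
As a rough illustration of what "smarter" could mean here: a client could group the values of an IN list by the node that owns each key, then send one smaller request per node. The sketch below is a toy model only. The MD5-based token function, the one-token-per-node ring, and all function names are assumptions for the example; a real cluster uses its configured partitioner and replication settings, and no existing driver is claimed to work this way.

```python
import bisect
import hashlib
from collections import defaultdict


def token(key):
    """Toy token function (MD5-based); a real cluster uses its configured partitioner."""
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16)


def build_ring(nodes):
    """Place each node at a single token and sort, forming a simple ring."""
    return sorted((token(n), n) for n in nodes)


def owner(ring, key):
    """Return the first node whose token is >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]


def split_by_owner(ring, keys):
    """Group the values of an IN list by the node that owns each key."""
    groups = defaultdict(list)
    for k in keys:
        groups[owner(ring, k)].append(k)
    return dict(groups)


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3"])
    for node, keys in split_by_owner(ring, range(10)).items():
        print(node, keys)
```

Each group could then be sent directly to its owning node, either as a smaller IN query or as individual queries.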


> On Jul 25, 2014, at 1:14 PM, DuyHai Doan <do...@gmail.com> wrote:
> 
> Nope. Select ... IN() sends one request to a coordinator. This coordinator dispatches the request to 50 nodes, as in your example, and waits for 50 responses before sending back the final result. As you can guess, this approach is not optimal since the global request latency is bound by the slowest of the 50 nodes.
> 
> On the other hand, if you use the async feature of the native protocol, your client will issue 50 requests in parallel and the answers arrive as soon as they are fetched from the different nodes.
> 
> Clearly the only advantage of using an IN() clause is ease of querying. I would advise using IN() only when you have a "few" values, not 50.

Re: Does SELECT … IN () use parallel dispatch?

Posted by DuyHai Doan <do...@gmail.com>.
Nope. Select ... IN() sends one request to a coordinator. This coordinator
dispatches the request to 50 nodes, as in your example, and waits for 50
responses before sending back the final result. As you can guess, this
approach is not optimal since the global request latency is bound by the
slowest of the 50 nodes.

On the other hand, if you use the async feature of the native protocol, your
client will issue 50 requests in parallel and the answers arrive as soon as
they are fetched from the different nodes.

Clearly the only advantage of using an IN() clause is ease of querying. I would
advise using IN() only when you have a "few" values, not 50.
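
The client-side approach can be sketched roughly as follows. This is not
driver code: fetch_row is a hypothetical stand-in that sleeps to simulate
one single-key SELECT, and a thread pool stands in for the driver's async
execution. With a real driver you would use its own asynchronous execute
call instead.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_row(key):
    """Hypothetical stand-in for one single-key SELECT; sleeps to simulate latency."""
    time.sleep(random.uniform(0.01, 0.05))
    return {"id": key, "value": f"row-{key}"}


def fetch_all(keys, max_in_flight=50):
    """Issue one request per key in parallel and collect rows as they complete."""
    rows = {}
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {pool.submit(fetch_row, k): k for k in keys}
        for future in as_completed(futures):
            key = futures[future]
            rows[key] = future.result()  # re-raises if that request failed
    return rows


if __name__ == "__main__":
    result = fetch_all(range(50))
    print(len(result), "rows fetched")
```

Each row becomes available as soon as its own request finishes, so the
client can start processing early results while slower nodes are still
responding. Collecting all 50 still takes as long as the slowest request.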


On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> Say I have about 50 primary keys I need to fetch.
>
> I'd like to use parallel dispatch, so that if I have 50 hosts and each
> has a record, I can read from all 50 at once.
>
> I assume Cassandra does the right thing here? I believe it does, at
> least from reading the docs, but it's still a bit unclear.
>
> Kevin