Posted to user@kudu.apache.org by Amit Adhau <am...@globant.com> on 2016/09/01 14:11:52 UTC

Re: No order by in kudu java api

Thanks Todd, we will try that and hope it will not affect
performance.
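
For the small-result-set case, the client-side sort suggested in the quoted
reply could look roughly like the sketch below. The Row class, its fields,
and the sample values are hypothetical stand-ins; a real client would fill
the list while iterating a KuduScanner and copying out the columns it needs
(for example with RowResult.getLong and RowResult.getString).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class ClientSideSortSketch {

    /** Hypothetical holder for the values copied out of each scanned row. */
    static final class Row {
        final long id;
        final String city;

        Row(long id, String city) {
            this.id = id;
            this.city = city;
        }
    }

    /** Returns a new list ordered by city, then id; the input list is left untouched. */
    static List<Row> sortByCityThenId(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        Collections.sort(sorted,
                Comparator.comparing((Row r) -> r.city).thenComparingLong(r -> r.id));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumption: these rows stand in for a small, fully fetched result set.
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(3L, "Pune"));
        rows.add(new Row(1L, "Pune"));
        rows.add(new Row(2L, "Austin"));

        for (Row r : sortByCityThenId(rows)) {
            System.out.println(r.city + " " + r.id);
        }
    }
}
```

This only makes sense while the whole result set fits comfortably in client
memory; beyond that, the Impala or Spark SQL route is the one to use.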

We are using hash partitioning for our table. Can you suggest any other
config flags that we should look into to improve scan performance? In the
past we used some of the flags you suggested in your Kudu insert
performance blog post, and they helped with our Kudu writes.

Thanks,
Amit

On Aug 31, 2016 10:36 PM, "Todd Lipcon" <to...@cloudera.com> wrote:

> Hi Amit,
>
> That's correct, there is no "order by" support in the Java API, because
> this is an arbitrarily complex operation. Imagine a table with a trillion
> rows, and asking for "order by" from a Java client. It would have to either
> download and sort the entire table on your client node (which is
> infeasible) or would have to somehow ask the servers to perform a huge
> shuffle and sort, which isn't something Kudu's designed to do.
>
> The recommendation is:
> - if you just need to sort small sets of rows, then grab the whole
> result set and use a normal Java-based sort (Collections.sort)
> - if you need to sort a large number of rows, use something like
> Impala or Spark SQL to perform the sort.
>
> -Todd
>
> On Wed, Aug 31, 2016 at 8:06 AM, Amit Adhau <am...@globant.com>
> wrote:
>
>> Hi Kudu Team,
>>
>> Using the Kudu Java API, we want to sort the data in a Kudu table by a
>> table column, but we have not found an option for this in the API.
>> Can you please help us with this?
>>
>> --
>> Thanks & Regards,
>>
>> *Amit Adhau* | Data Architect
>>
>> *GLOBANT* | IND:+91 9821518132
>>
>> The information contained in this e-mail may be confidential. It has been
>> sent for the sole use of the intended recipient(s). If the reader of this
>> message is not an intended recipient, you are hereby notified that any
>> unauthorized review, use, disclosure, dissemination, distribution or
>> copying of this communication, or any of its contents,
>> is strictly prohibited. If you have received it by mistake please let us
>> know by e-mail immediately and delete it from your system. Many thanks.
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: No order by in kudu java api

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Sep 1, 2016 at 7:11 AM, Amit Adhau <am...@globant.com> wrote:

> Thanks Todd, we will try that and hope it will not affect
> performance.
>
> We are using hash partitioning for our table. Can you suggest any other
> config flags that we should look into to improve scan performance? In the
> past we used some of the flags you suggested in your Kudu insert
> performance blog post, and they helped with our Kudu writes.
>

Are you using a single Java client to read large amounts of data? If so,
note that you're getting a single-threaded read, so you are most likely not
limited by the server side. What you could consider is using the ScanToken
API to retrieve a bunch of scan tokens for your query, and then feed them
into a thread pool, starting a new scanner for each token. That should give
you parallelism on the client side.
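
A minimal sketch of that fan-out pattern follows. To keep it runnable without
a cluster it is written generically: scanAll, its parameters, and the
stand-in tokens in main are hypothetical names, not part of the Kudu client.
The assumption is that with Kudu the token list would come from
client.newScanTokenBuilder(table).build(), and that the per-token function
would call token.intoScanner(client) and drain the scanner with
hasMoreRows() and nextRows().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelScanSketch {

    /**
     * Runs scanOne once per token on a fixed-size thread pool and
     * concatenates the per-token results in token order.
     */
    static <T, R> List<R> scanAll(List<T> tokens, Function<T, List<R>> scanOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per token so the scans can run concurrently.
            List<Future<List<R>>> pending = new ArrayList<>();
            for (T token : tokens) {
                Callable<List<R>> task = () -> scanOne.apply(token);
                pending.add(pool.submit(task));
            }
            // Collect results; get() blocks until that token's scan has finished.
            List<R> rows = new ArrayList<>();
            for (Future<List<R>> f : pending) {
                try {
                    rows.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a scan", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a scan task failed", e.getCause());
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in tokens and rows; a real client would pass KuduScanToken
        // objects and a function that opens and drains one scanner per token.
        List<Integer> tokens = Arrays.asList(0, 1, 2, 3);
        List<String> rows = scanAll(
                tokens,
                t -> Arrays.asList("row-" + t + "-a", "row-" + t + "-b"),
                2);
        System.out.println(rows.size() + " rows read");
    }
}
```

Rows gathered this way are not in any global order across tokens, so a
client-side sort would still be needed afterwards if ordering matters.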

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera