Posted to user@kudu.apache.org by Jean-Daniel Cryans <jd...@gmail.com> on 2017/07/10 14:40:05 UTC

Re: Load Data Question

(sending to user@ and putting dev@ in bcc)

Hi,

Kudu by itself doesn't really have file-loading capabilities; you'd have to
write your own code that reads a file and then uses either the Java or C++
API to insert the data.

Hope this helps,

J-D

On Mon, Jul 10, 2017 at 1:55 AM, sky <x_...@163.com> wrote:

> Hi all,
>     How can I load data into Kudu from a file? I know that Kudu can insert
> data from Impala, but is there any other way? Not through Impala, but
> executed by Kudu alone.
>     Thanks.

Re: Load Data Question

Posted by Brock Noland <br...@phdata.io>.
FWIW, StreamSets and Kafka Connect can both write data to
Kudu without writing code.

Disclaimer - I work at a StreamSets and Confluent partner.


Re: Load Data Question

Posted by Alexey Serbin <as...@cloudera.com>.
If you use the Kudu API and set the session's flush mode to anything but
AUTO_FLUSH_SYNC, the inserts are accumulated into batches on the client
side and sent to the corresponding tablet servers in chunks.
Consider using the AUTO_FLUSH_BACKGROUND mode when working with the
KuduSession API (MANUAL_FLUSH would require you to flush those
batches yourself before the accumulated data reaches the maximum
allowed size, which is configurable).

Also, if the lines in your file(s) contain data for independent rows
(i.e. you are not expecting to perform upserts for some lines), you
could split those lines into ranges (e.g., 0 -- 99999, 100000 --
199999, etc.) and run multiple Kudu sessions (one per line range in the
file) in parallel.
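
A minimal sketch of the range-splitting idea, in Java using only the
standard library. The names ParallelRangeLoader, Range, split,
loadInParallel and rangeLoader are hypothetical and exist only for
illustration. The callback passed as rangeLoader is where each worker
would presumably open its own Kudu session and apply its inserts; that
part is deliberately left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: splits file lines into ranges and loads them in parallel.
class ParallelRangeLoader {

    /** Half-open range of line indexes: [start, end). */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [0, totalLines) into consecutive ranges of at most chunkSize lines. */
    static List<Range> split(long totalLines, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalLines; start += chunkSize) {
            ranges.add(new Range(start, Math.min(start + chunkSize, totalLines)));
        }
        return ranges;
    }

    /**
     * Hands each range of lines to rangeLoader on a worker thread and waits
     * for all ranges to finish. Assumption: rangeLoader is where one Kudu
     * session per range would be created, used, and flushed.
     */
    static void loadInParallel(List<String> lines, int chunkSize, int threads,
                               Consumer<List<String>> rangeLoader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Range r : split(lines.size(), chunkSize)) {
                // Read-only view of this range's lines.
                List<String> slice = lines.subList((int) r.start, (int) r.end);
                pending.add(pool.submit(() -> rangeLoader.accept(slice)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // surfaces any failure from the worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while loading", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a range failed to load", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, split(250000, 100000) yields three ranges: [0, 100000),
[100000, 200000) and [200000, 250000). Holding every line in memory is
only for brevity; for a large file, each worker could instead seek to
its own range and stream from there.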

Hope this helps.


Best regards,

Alexey



On 7/10/17 7:54 PM, sky wrote:
> Hi,
>      If I load data from a CSV file, is the only option to traverse the file and insert the rows one by one through the API?


Re:Re: Load Data Question

Posted by sky <x_...@163.com>.
Hi,
    If I load data from a CSV file, is the only option to traverse the file and insert the rows one by one through the API?

 





