Posted to user@kudu.apache.org by Jean-Daniel Cryans <jd...@gmail.com> on 2017/07/10 14:40:05 UTC
Re: Load Data Question
(sending to user@ and putting dev@ in bcc)
Hi,
Kudu by itself doesn't really have file-loading capabilities; you'd have to
write your own code that reads a file and then uses either the Java or C++
API to insert the data.
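As a rough illustration of the "read a file, then insert each row" shape, here is a minimal Java sketch. It uses only the JDK: `RowSink` and `CsvLoader` are hypothetical names, not part of the Kudu client, and the CSV handling is deliberately naive (plain commas, no quoting). With the real Java client, the sink would presumably wrap `KuduTable.newInsert()`, fill the row returned by `getRow()`, and pass the operation to `KuduSession.apply()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for "insert one row"; not part of the Kudu API. */
interface RowSink {
    void insert(String[] fields);
}

class CsvLoader {
    /**
     * Reads simple comma-separated lines (no quoting or escaping support)
     * and hands each non-empty line to the sink. Returns the row count.
     */
    static long load(Reader source, RowSink sink) {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // The -1 limit keeps trailing empty fields.
                sink.insert(line.split(",", -1));
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumption: a real sink would build a Kudu Insert per row and
        // apply it to a KuduSession; here rows are only collected.
        List<String[]> received = new ArrayList<>();
        long n = load(new StringReader("1,alice\n2,bob\n"), received::add);
        System.out.println(n + " rows handed to the sink");
    }
}
```

Keeping the file parsing separate from the insert call makes it easy to swap the in-memory sink for one backed by a Kudu session later.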
Hope this helps,
J-D
On Mon, Jul 10, 2017 at 1:55 AM, sky <x_...@163.com> wrote:
> Hi all,
> How can I load data from a file into Kudu? I know that Kudu can insert data
> from Impala, but is there any other way? Not through Impala, but executed by
> Kudu alone.
> Thanks.
Re: Load Data Question
Posted by Brock Noland <br...@phdata.io>.
FWIW - StreamSets and Kafka Connect both have the ability to write data to
Kudu without writing code.
Disclaimer - I work at a StreamSets and Confluent partner.
Re: Load Data Question
Posted by Alexey Serbin <as...@cloudera.com>.
If you use the Kudu API and set the flush mode for a session to anything but
AUTO_FLUSH_SYNC, inserts are accumulated into batches on the client side and
sent to the corresponding tablet servers in chunks. Consider using the
AUTO_FLUSH_BACKGROUND mode when working with the KuduSession API (using
MANUAL_FLUSH would require you to flush those batches yourself before the
size of the accumulated data reaches the maximum allowed size, which is
configurable).
Also, if the lines in your file(s) contain data for independent rows
(i.e. you are not expecting to perform upserts for some lines), you
could split those lines into ranges (e.g., 0 -- 99999, 100000 --
199999, etc.) and run multiple Kudu sessions (one per line range in the
file) in parallel.
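To make the range-splitting idea a bit more concrete, here is a minimal Java sketch using only the JDK. `ParallelRangeLoader`, `Range`, and `RangeWorker` are hypothetical names for illustration; the worker is where each thread would, under my assumptions, open its own KuduSession (for example in AUTO_FLUSH_BACKGROUND mode), insert the rows for its line range, and report how many rows it handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRangeLoader {
    /** Half-open range of line indexes: [start, end). */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical per-range loader; returns the number of rows it loaded. */
    interface RangeWorker {
        long load(Range range);
    }

    /** Splits [0, totalLines) into consecutive chunks of at most chunkSize lines. */
    static List<Range> split(long totalLines, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalLines; start += chunkSize) {
            ranges.add(new Range(start, Math.min(start + chunkSize, totalLines)));
        }
        return ranges;
    }

    /** Runs one task per range on a fixed-size pool and sums the row counts. */
    static long loadAll(List<Range> ranges, int threads, RangeWorker worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Range range : ranges) {
                // Assumption: in real use, each task creates and closes its
                // own session rather than sharing one across threads.
                Callable<Long> task = () -> worker.load(range);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a range failed to load", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Range> ranges = split(250_000, 100_000);
        // Placeholder worker: pretends every line in the range was loaded.
        long total = loadAll(ranges, 2, range -> range.end - range.start);
        System.out.println(ranges.size() + " ranges, " + total + " rows");
    }
}
```

This only works cleanly when the rows are independent, as noted above, since the ranges may be applied in any order.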
Hope this helps.
Best regards,
Alexey
On 7/10/17 7:54 PM, sky wrote:
> Hi,
> When loading data from a CSV file, is the only option to traverse the file and insert the rows one by one through the API?
Re:Re: Load Data Question
Posted by sky <x_...@163.com>.
Hi,
When loading data from a CSV file, is the only option to traverse the file and insert the rows one by one through the API?