Posted to user@hive.apache.org by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn> on 2020/03/20 07:30:07 UTC

Can hive bear high throughput streaming data ingest?

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app log to Hive using Flume on the edge app server.
Since HDFS is not friendly to frequent writes, I am afraid this approach cannot handle high throughput.

Any suggestions on this?

Thanks,
Lei



wanglei2@geekplus.com.cn 


Re: Can hive bear high throughput streaming data ingest?

Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
Hi Prasanth,

 I tried to run your test example but got errors and submitted an issue: 
  https://github.com/prasanthj/culvert/issues/1
I am using Hive 3.1.1.

Thanks,
Lei




wanglei2@geekplus.com.cn
 
From: Prasanth Jayachandran
Sent: 2020-03-20 15:41
To: user@hive.apache.org
Subject: Re: Can hive bear high throughput streaming data ingest?
Use a higher transaction batch size? Begin transaction opens a file; commit transaction writes an intermediate footer, but the file is kept open until the entire batch completes. So a bigger batch size with less frequent commits can avoid creating too many small files in HDFS. Here is a test application for Hive streaming v2, https://github.com/prasanthj/culvert/blob/v2/README.md, that ingested ~1.5 million rows/sec with 64 threads and a 100K-row commit interval in HDFS: https://github.com/prasanthj/culvert/blob/v2/report.txt
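The mechanism described above implies that each writer creates one file per transaction batch, so the file count shrinks linearly as the batch size grows. A rough back-of-the-envelope model of that effect (plain Python, not the Hive API; the function name and the figures are illustrative only):

```python
import math

def files_created(total_txns, batch_size, num_writers=1):
    # Each writer opens one file per transaction batch; commits inside
    # a batch only add intermediate footers to the already-open file.
    return num_writers * math.ceil(total_txns / batch_size)

# Same 1000 commits, different batch sizes:
print(files_created(1000, batch_size=1))    # 1000 small files
print(files_created(1000, batch_size=100))  # 10 files
# With 64 concurrent writers, as in the culvert benchmark:
print(files_created(1000, batch_size=100, num_writers=64))  # 640 files
```

With batch size 1 every commit closes a file, which is exactly the small-files problem; batching 100 commits per file cuts the file count by 100x at the cost of less frequent durability points.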

Thanks
Prasanth


From: wanglei2@geekplus.com.cn <wa...@geekplus.com.cn>
Sent: Friday, March 20, 2020 12:30:07 AM
To: user <us...@hive.apache.org>
Subject: Can hive bear high throughput streaming data ingest? 
 
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app log to Hive using Flume on the edge app server.
Since HDFS is not friendly to frequent writes, I am afraid this approach cannot handle high throughput.

Any suggestions on this?

Thanks,
Lei



wanglei2@geekplus.com.cn 


Re: Can hive bear high throughput streaming data ingest?

Posted by Prasanth Jayachandran <pj...@cloudera.com>.
Use a higher transaction batch size? Begin transaction opens a file; commit transaction writes an intermediate footer, but the file is kept open until the entire batch completes. So a bigger batch size with less frequent commits can avoid creating too many small files in HDFS. Here is a test application for Hive streaming v2, https://github.com/prasanthj/culvert/blob/v2/README.md, that ingested ~1.5 million rows/sec with 64 threads and a 100K-row commit interval in HDFS: https://github.com/prasanthj/culvert/blob/v2/report.txt

Thanks
Prasanth
________________________________
From: wanglei2@geekplus.com.cn <wa...@geekplus.com.cn>
Sent: Friday, March 20, 2020 12:30:07 AM
To: user <us...@hive.apache.org>
Subject: Can hive bear high throughput streaming data ingest?

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app log to Hive using Flume on the edge app server.
Since HDFS is not friendly to frequent writes, I am afraid this approach cannot handle high throughput.

Any suggestions on this?

Thanks,
Lei

________________________________
wanglei2@geekplus.com.cn


Re: Can hive bear high throughput streaming data ingest?

Posted by Jörn Franke <jo...@gmail.com>.
Why don’t you write them to local storage first and then move them to HDFS in bulk?

Then you can create an external table in Hive over them and run your analyses.
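A minimal sketch of that approach, assuming a hypothetical log layout (the table name, columns, delimiter, and HDFS path below are placeholders, not taken from this thread):

```sql
-- Hypothetical schema and path; adjust to the actual log format.
CREATE EXTERNAL TABLE app_log (
  ts        STRING,
  log_level STRING,
  message   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/app_log';
```

New files copied into /data/app_log become queryable on the next read, and because the table is external, dropping it does not delete the underlying data.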

> On 2020-03-20 at 08:30, "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn> wrote:
> 
> 
> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2
> 
> I want to stream my app log to Hive using Flume on the edge app server.
> Since HDFS is not friendly to frequent writes, I am afraid this approach cannot handle high throughput.
> 
> Any suggestions on this?
> 
> Thanks,
> Lei
> 
> wanglei2@geekplus.com.cn 
>