Posted to user@flink.apache.org by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn> on 2020/03/20 03:55:19 UTC

Streaming kafka data sink to hive

We have many app logs on our app servers and want to parse the logs into a structured table format and then sink them to Hive.
It seems natural to use batch mode: the app logs are compressed hourly, so it is convenient to do partitioning.
However, we want to use streaming mode: tail the app logs into Kafka, then use Flink to read the Kafka topic and sink to Hive.
I have several questions.

1  Is there a flink-hive connector that I can use to write to Hive in a streaming fashion?
2  Since HDFS is not friendly to frequent appends, and Hive's data is stored on HDFS, is it OK if the throughput is high?
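As a sketch of the hourly-partitioning idea, a minimal mapping from an event timestamp to a Hive-style partition directory might look like the following; the dt=/hr= layout and the helper name are illustrative conventions I am assuming, not anything Hive itself mandates:

```python
from datetime import datetime, timezone

def hive_partition_path(table_root: str, epoch_ms: int) -> str:
    """Map an event timestamp (ms since epoch, UTC) to an hourly
    Hive-style partition directory.

    The dt=.../hr=... layout is a common convention, not a Hive requirement.
    """
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return f"{table_root}/dt={ts:%Y-%m-%d}/hr={ts:%H}"
```

For example, an event at 2020-03-20 03:55:19 UTC would land in .../dt=2020-03-20/hr=03.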

Thanks,
Lei



wanglei2@geekplus.com.cn 

Re: Streaming kafka data sink to hive

Posted by Jingsong Li <ji...@gmail.com>.
Hi wanglei,

> 1  Is there a flink-hive connector that I can use to write to Hive in a
streaming fashion?

"Streaming kafka data sink to hive" is under discussion.[1]
And POC work is ongoing.[2] We want to support it in release-1.11.

> 2  Since HDFS is not friendly to frequent appends, and Hive's data is
stored on HDFS, is it OK if the throughput is high?

We should be careful about small files; it's better to have ~128 MB per file.
If the throughput is high, I think you can roll files every 5 or 10 minutes.
You can learn more in [3].

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-115-Filesystem-connector-in-Table-td38870.html
[2] https://github.com/apache/flink/pull/11457
[3] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/streamfile_sink.html
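To make the small-file advice concrete, here is a minimal plain-Python simulation of the kind of roll decision Flink's DefaultRollingPolicy makes for the streaming file sink: close the in-progress part file when it reaches a target size or has been open for the rollover interval. The constants mirror the ~128 MB / 5-minute figures above; this is an illustration, not the Flink API:

```python
# Illustrative simulation (plain Python, not the Flink API) of a
# size-or-time roll decision like Flink's DefaultRollingPolicy.

MAX_PART_SIZE = 128 * 1024 * 1024      # ~128 MB: the HDFS-friendly target size
ROLLOVER_INTERVAL_MS = 5 * 60 * 1000   # 5 minutes: roll even if the file is small

def should_roll(part_size_bytes: int, open_for_ms: int) -> bool:
    """Roll the current part file if it is big enough OR old enough."""
    return (part_size_bytes >= MAX_PART_SIZE
            or open_for_ms >= ROLLOVER_INTERVAL_MS)
```

Rolling on a time bound keeps data visible downstream soon after it arrives, while the size bound avoids producing many tiny HDFS files under high throughput.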

Best,
Jingsong Lee

