Posted to dev@flink.apache.org by Chen Qin <qi...@gmail.com> on 2020/11/19 06:30:39 UTC

Hive Streaming write compaction

Hi there,

We are testing writing from Kafka to a Hive table in Parquet format.
Currently, users have to accept lots of small files in minute-level
folders to get the latency benefits. I recall folks at FF2020 Global
mentioned implementing compaction logic at checkpoint time. How is that
going? We would love to collaborate on this topic.

Chen
Pinterest

Re: Hive Streaming write compaction

Posted by Kurt Young <yk...@gmail.com>.
We just added this feature in 1.12 [1][2]. It would be great if you could
download the 1.12 RC, test it out, and give us some feedback.

In case you wonder why I linked two JIRAs: the FileSystem and Hive
connectors share the same options and the same implementation.
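
As a rough sketch of what those shared options might look like on a sink
table: the option keys `auto-compaction` and `compaction.file-size` are taken
from the 1.12 filesystem connector documentation, while the table name,
columns, path, the 128MB target size, and the `render_sink_ddl` helper are all
hypothetical and only there to make the example self-contained.

```python
# Hypothetical helper: renders a Flink SQL CREATE TABLE statement as a string.
# Nothing here talks to a Flink cluster; it only shows where the options go.
def render_sink_ddl(table, columns, options):
    cols = ",\n".join(f"  {name} {typ}" for name, typ in columns)
    opts = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {table} (\n{cols}\n) WITH (\n{opts}\n)"


# Assumed values: path, format, and target file size are illustrative only.
compaction_options = {
    "connector": "filesystem",
    "path": "hdfs:///tmp/events",
    "format": "parquet",
    "auto-compaction": "true",        # merge small files written within a checkpoint
    "compaction.file-size": "128MB",  # target size of the compacted files
}

ddl = render_sink_ddl(
    "events_sink",
    [("user_id", "STRING"), ("ts", "TIMESTAMP(3)")],
    compaction_options,
)
print(ddl)
```

For a Hive table, the same keys would presumably be set as table properties
rather than in a WITH clause, since both connectors read the same options.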

[1] https://issues.apache.org/jira/browse/FLINK-19875
[2] https://issues.apache.org/jira/browse/FLINK-19886

Best,
Kurt


Re: Hive Streaming write compaction

Posted by Jingsong Li <ji...@gmail.com>.
Hi Chen,

File compaction for the Table Filesystem/Hive sink has been merged into
master; details are in [1]. It is included in Flink 1.12.

I hope you can give it a try and test it.

[1] https://issues.apache.org/jira/browse/FLINK-19345

Best,
Jingsong


-- 
Best, Jingsong Lee