Posted to user@flink.apache.org by Qi Luo <lu...@gmail.com> on 2019/09/04 09:23:06 UTC

Streaming write to Hive

Hi guys,

Flink 1.9 added HiveTableSink to support writing to Hive, but it only
supports batch mode. StreamingFileSink can write to HDFS in streaming mode,
but it has no Hive-related functionality (e.g. adding Hive partitions).

Is there an easy way to write to Hive in streaming mode (with an
exactly-once guarantee)?

Thanks,
Qi
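
For readers unfamiliar with what "adding a Hive partition" involves, here is
a minimal sketch in plain Java. It only builds the key=value directory that
Hive uses for a partition and the HiveQL statement that would register it.
The table name, partition columns, and warehouse location are made up for
illustration, values are not escaped, and none of this is Flink or Hive
client API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HivePartitionSketch {

    // Builds the Hive-style relative directory for a partition spec,
    // joining "key=value" segments with "/". Values are not escaped here.
    public static String partitionPath(Map<String, String> spec) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> e : spec.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    // Builds the HiveQL statement that registers the partition and points it
    // at the directory under the (hypothetical) table location.
    public static String addPartitionDdl(String table, Map<String, String> spec, String tableLocation) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> e : spec.entrySet()) {
            cols.add(e.getKey() + "='" + e.getValue() + "'");
        }
        return "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION (" + cols + ")"
                + " LOCATION '" + tableLocation + "/" + partitionPath(spec) + "'";
    }

    public static void main(String[] args) {
        // Insertion order matters: it must match the table's partition column order.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "2019-09-04");
        spec.put("hr", "09");
        System.out.println(partitionPath(spec));
        System.out.println(addPartitionDdl("events", spec, "hdfs:///warehouse/events"));
    }
}
```

A file sink that writes into such directories still leaves the partition
invisible to Hive queries until a statement like the one above (or an
equivalent metastore call) has run, which is the gap described in the
question.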

Re: Streaming write to Hive

Posted by Qi Luo <lu...@gmail.com>.
Hi JingsongLee,

Fantastic! We'll look into it.

Thanks,
Qi

On Fri, Sep 6, 2019 at 10:52 AM JingsongLee <lz...@aliyun.com> wrote:

> Hi luoqi:
>
> With partition support[1], I want to introduce a FileFormatSink to
> cover streaming exactly-once and partition-related logic for the Flink
> file connectors and the Hive connector. You can take a look.
>
> [1]
> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>
> Best,
> Jingsong Lee
>
> ------------------------------------------------------------------
> From:Bowen Li <bo...@gmail.com>
> Send Time: Friday, September 6, 2019, 05:21
> To:Qi Luo <lu...@gmail.com>
> Cc:user <us...@flink.apache.org>; snake.fly318 <sn...@gmail.com>;
> lichang.bd <li...@gmail.com>
> Subject:Re: Streaming write to Hive
>
> Hi,
>
> I'm not sure if there's one yet. Feel free to create one if not.

Re: Streaming write to Hive

Posted by JingsongLee <lz...@aliyun.com>.
Hi luoqi:

With partition support[1], I want to introduce a FileFormatSink to
cover streaming exactly-once and partition-related logic for the Flink
file connectors and the Hive connector. You can take a look.

[1] https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing

Best,
Jingsong Lee
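
To make "streaming exactly-once and partition-related logic" a little more
concrete, below is a toy model in plain Java of the usual commit-on-checkpoint
idea: data written between checkpoints stays invisible, is sealed when a
checkpoint is taken, and is published (with its partition registered) only
once that checkpoint is confirmed. This is a sketch under stated assumptions,
not the FileFormatSink design from the linked document and not Flink API.
The class and method names are hypothetical, and the "metastore" is just an
in-memory set.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CheckpointCommitSketch {

    // (partition, record) pairs written since the last checkpoint.
    private final List<Map.Entry<String, String>> inProgress = new ArrayList<>();
    // Data sealed at a checkpoint but not yet visible, keyed by checkpoint id.
    private final TreeMap<Long, List<Map.Entry<String, String>>> pending = new TreeMap<>();
    // Data visible to readers, and partitions registered in the simulated metastore.
    private final List<Map.Entry<String, String>> committed = new ArrayList<>();
    private final Set<String> registeredPartitions = new LinkedHashSet<>();

    void write(String partition, String record) {
        inProgress.add(new SimpleEntry<>(partition, record));
    }

    // Phase 1: when a checkpoint is taken, seal the in-progress data under its id.
    void snapshot(long checkpointId) {
        pending.put(checkpointId, new ArrayList<>(inProgress));
        inProgress.clear();
    }

    // Phase 2: once the checkpoint is confirmed, publish everything sealed up to
    // that id and register the partitions that received data.
    void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<Map.Entry<String, String>>> ready = pending.headMap(checkpointId, true);
        for (List<Map.Entry<String, String>> batch : ready.values()) {
            for (Map.Entry<String, String> e : batch) {
                committed.add(e);
                registeredPartitions.add(e.getKey());
            }
        }
        ready.clear(); // headMap is a view, so this removes the published entries from pending
    }

    // Simplified failure handling: unsealed data is dropped, on the assumption
    // that the source replays it from the last checkpoint.
    void failAndRestore() {
        inProgress.clear();
    }

    List<Map.Entry<String, String>> committed() {
        return committed;
    }

    Set<String> registeredPartitions() {
        return registeredPartitions;
    }

    public static void main(String[] args) {
        CheckpointCommitSketch sink = new CheckpointCommitSketch();
        sink.write("dt=2019-09-04", "a");
        sink.write("dt=2019-09-04", "b");
        sink.snapshot(1L);
        sink.notifyCheckpointComplete(1L);
        sink.write("dt=2019-09-05", "c");
        sink.failAndRestore();            // "c" was never sealed, so it is dropped
        sink.write("dt=2019-09-05", "c"); // replayed by the source after recovery
        sink.snapshot(2L);
        sink.notifyCheckpointComplete(2L);
        System.out.println(sink.committed().size());
        System.out.println(sink.registeredPartitions());
    }
}
```

In this model the record "c" is written twice but published once, because
only sealed and confirmed data ever becomes visible.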


------------------------------------------------------------------
From:Bowen Li <bo...@gmail.com>
Send Time: Friday, September 6, 2019, 05:21
To:Qi Luo <lu...@gmail.com>
Cc:user <us...@flink.apache.org>; snake.fly318 <sn...@gmail.com>; lichang.bd <li...@gmail.com>
Subject:Re: Streaming write to Hive

Hi, 

I'm not sure if there's one yet. Feel free to create one if not.

Re: Streaming write to Hive

Posted by Bowen Li <bo...@gmail.com>.
Hi,

I'm not sure if there's one yet. Feel free to create one if not.

On Wed, Sep 4, 2019 at 11:28 PM Qi Luo <lu...@gmail.com> wrote:

> Hi Bowen,
>
> Thank you for the information! Streaming write to Hive is a very common
> use case for our users. Is there any open issue for this to which we can
> try contributing?
>
> +Yufei and Chang who are also interested in this.
>
> Thanks,
> Qi

Re: Streaming write to Hive

Posted by Qi Luo <lu...@gmail.com>.
Hi Bowen,

Thank you for the information! Streaming write to Hive is a very common use
case for our users. Is there any open issue for this to which we can try
contributing?

+Yufei and Chang who are also interested in this.

Thanks,
Qi

On Thu, Sep 5, 2019 at 12:16 PM Bowen Li <bo...@gmail.com> wrote:

> Hi Qi,
>
> With 1.9 out of the box, I'm afraid not. You can make HiveTableSink
> implement AppendStreamTableSink (an empty interface for now) so it can be
> picked up in a streaming job. Also, streaming requires checkpointing, and
> the Hive sink doesn't do that yet. There might be other tweaks you need to
> make.
>
> It's on our list for 1.10, though it's not high priority.
>
> Bowen

Re: Streaming write to Hive

Posted by Bowen Li <bo...@gmail.com>.
Hi Qi,

With 1.9 out of the box, I'm afraid not. You can make HiveTableSink
implement AppendStreamTableSink (an empty interface for now) so it can be
picked up in a streaming job. Also, streaming requires checkpointing, and
the Hive sink doesn't do that yet. There might be other tweaks you need to
make.

It's on our list for 1.10, though it's not high priority.

Bowen
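
As a rough illustration of why implementing an empty interface can be enough
for a sink to be "picked up", here is a small plain-Java sketch of the marker
interface pattern. The types below are hypothetical stand-ins, not Flink's
TableSink, AppendStreamTableSink, or planner code, and the check is only an
assumption about the general shape of such a dispatch.

```java
public class MarkerInterfaceSketch {

    // Stand-in for a generic sink abstraction (hypothetical).
    interface Sink {
        String name();
    }

    // Stand-in for an empty marker interface: it adds no methods, only a type.
    interface AppendOnlyStreamSink extends Sink {
    }

    // A batch-only sink: it does not carry the marker.
    static class BatchOnlySink implements Sink {
        public String name() {
            return "batch-only";
        }
    }

    // The same kind of sink after opting in by implementing the marker.
    static class StreamCapableSink implements AppendOnlyStreamSink {
        public String name() {
            return "stream-capable";
        }
    }

    // Planner-like check: only sinks carrying the marker are accepted for streaming jobs.
    static boolean acceptedInStreamingJob(Sink sink) {
        return sink instanceof AppendOnlyStreamSink;
    }

    public static void main(String[] args) {
        System.out.println(acceptedInStreamingJob(new BatchOnlySink()));
        System.out.println(acceptedInStreamingJob(new StreamCapableSink()));
    }
}
```

Carrying the marker only changes whether the sink is accepted. It does
nothing for correctness on its own, which is why checkpoint integration is
called out as a separate missing piece above.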

On Wed, Sep 4, 2019 at 2:23 AM Qi Luo <lu...@gmail.com> wrote:

> Hi guys,
>
> Flink 1.9 added HiveTableSink to support writing to Hive, but it only
> supports batch mode. StreamingFileSink can write to HDFS in streaming
> mode, but it has no Hive-related functionality (e.g. adding Hive partitions).
>
> Is there an easy way to write to Hive in streaming mode (with an
> exactly-once guarantee)?
>
> Thanks,
> Qi
>