Posted to community@flink.apache.org by Robert Rapplean <rr...@altitudedigital.com> on 2017/07/16 19:46:51 UTC

Greetings and question

Hey, everyone.

I have a need for Flink to write to ORCFile tables in the near future.
Could someone educate me on the current challenges that might make that
hard to do? I've worked quite a bit with the HCat libraries, and may be
overconfident about how complicated this is. Is anyone currently working on
the issue?

I'd go ahead and submit a Jira ticket for this, but am deterred by the
thought that someone should have already created such a ticket, and
wondering why it isn't already there. It may be a priority thing, but this
is my personal priority at the moment.

Best,

Robert

Re: Greetings and question

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Robert,

I don't think anybody is working on an ORC file sink.
Are you interested in a sink for data streams or a batch sink?

Implementing a batch sink shouldn't be very hard.
You can either implement an OutputFormat that internally uses the ORC Java
API or try to use Flink's HadoopOutputFormat, which can wrap Hadoop
OutputFormats.
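A minimal sketch of the OutputFormat route, written in plain Java so it runs without Flink or ORC on the classpath. SimpleOutputFormat is a hypothetical stand-in that mirrors the configure/open/writeRecord/close lifecycle of Flink's OutputFormat. Row, ColumnarBatchFormat and the "part-N.orc" naming are also made up for illustration. The arrays play the role of the columns of ORC's VectorizedRowBatch, and the comments mark roughly where ORC Java API calls (OrcFile.createWriter, TypeDescription.createRowBatch, Writer.addRowBatch, Writer.close) would go in a real implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrcOutputFormatSketch {

    /** Hypothetical stand-in for the lifecycle of Flink's OutputFormat (not the real interface). */
    interface SimpleOutputFormat<T> {
        void configure(Map<String, String> parameters);
        void open(int taskNumber, int numTasks);
        void writeRecord(T record);
        void close();
    }

    /** Hypothetical two-column record: id BIGINT, name STRING. */
    static final class Row {
        final long id;
        final String name;

        Row(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Buffers rows column by column and hands full batches to a (simulated) file writer. */
    static final class ColumnarBatchFormat implements SimpleOutputFormat<Row> {
        private final int maxBatchSize;
        private String fileName;       // one output file per parallel task instance
        private long[] idColumn;       // plays the role of ORC's LongColumnVector
        private String[] nameColumn;   // plays the role of ORC's BytesColumnVector
        private int batchSize;         // number of rows currently buffered
        final List<Integer> flushedBatches = new ArrayList<>(); // sizes of batches "written"
        boolean closed;

        ColumnarBatchFormat(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
        }

        @Override
        public void configure(Map<String, String> parameters) {
            // A real format would read the target path and the ORC schema here.
        }

        @Override
        public void open(int taskNumber, int numTasks) {
            fileName = "part-" + taskNumber + ".orc"; // real code: OrcFile.createWriter(...)
            idColumn = new long[maxBatchSize];        // real code: schema.createRowBatch()
            nameColumn = new String[maxBatchSize];
            batchSize = 0;
        }

        @Override
        public void writeRecord(Row row) {
            idColumn[batchSize] = row.id;
            nameColumn[batchSize] = row.name;
            batchSize++;
            if (batchSize == maxBatchSize) {
                flush();
            }
        }

        private void flush() {
            if (batchSize == 0) {
                return;
            }
            flushedBatches.add(batchSize); // real code: writer.addRowBatch(batch)
            batchSize = 0;                 // real code: batch.reset()
        }

        @Override
        public void close() {
            flush();       // write the partially filled last batch
            closed = true; // real code: writer.close()
        }

        String fileName() {
            return fileName;
        }
    }

    public static void main(String[] args) {
        ColumnarBatchFormat format = new ColumnarBatchFormat(4);
        format.configure(new HashMap<>());
        format.open(0, 1);
        for (long i = 0; i < 10; i++) {
            format.writeRecord(new Row(i, "name-" + i));
        }
        format.close();
        System.out.println(format.fileName() + " batches=" + format.flushedBatches);
    }
}
```

Running main prints `part-0.orc batches=[4, 4, 2]`: two full batches are written while records arrive, and close() flushes the remaining two rows. Forgetting that final flush in close() would silently drop the tail of each task's output.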

If you need a streaming ORC sink, things become a bit more challenging
because you would need to integrate the sink with Flink's checkpointing
mechanism.
I would recommend having a look at the BucketingSink and its JavaDocs.
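To illustrate why checkpoint integration is the hard part, here is a toy model in plain Java (no Flink dependency) of the pattern BucketingSink is built around: a file is written while in progress, becomes pending when it is rolled, and is only made visible once the checkpoint that covers it has completed. TwoPhaseFileSink and its methods are hypothetical; snapshotState and notifyCheckpointComplete only echo the names of Flink's CheckpointedFunction.snapshotState(FunctionSnapshotContext) and CheckpointListener.notifyCheckpointComplete(long) hooks. A real sink would presumably also have to keep the pending files in operator state so they can be committed or cleaned up after a restore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CommitOnCheckpointSketch {

    /**
     * Toy model of a sink that only publishes files covered by a completed checkpoint,
     * loosely following BucketingSink's in-progress -> pending -> finished states.
     * Not Flink's API; method names only echo Flink's checkpoint hooks.
     */
    static final class TwoPhaseFileSink {
        private final int rollSize;
        private List<String> inProgress = new ArrayList<>();           // currently open part file
        private final List<List<String>> pending = new ArrayList<>();  // rolled, not yet in a checkpoint
        private final Map<Long, List<List<String>>> pendingPerCheckpoint = new TreeMap<>();
        final List<List<String>> finished = new ArrayList<>();         // visible to readers

        TwoPhaseFileSink(int rollSize) {
            this.rollSize = rollSize;
        }

        void invoke(String row) {
            inProgress.add(row);
            if (inProgress.size() == rollSize) {
                roll();
            }
        }

        /** Closes the current part file and moves it to the pending state. */
        private void roll() {
            if (inProgress.isEmpty()) {
                return;
            }
            pending.add(inProgress);
            inProgress = new ArrayList<>();
        }

        /** On a checkpoint barrier: close the open file and tie all pending files to this checkpoint. */
        void snapshotState(long checkpointId) {
            roll(); // simplification: never leave data in an open file across a checkpoint
            pendingPerCheckpoint.put(checkpointId, new ArrayList<>(pending));
            pending.clear();
        }

        /** Once the checkpoint has completed everywhere: publish its files and any older ones. */
        void notifyCheckpointComplete(long checkpointId) {
            Iterator<Map.Entry<Long, List<List<String>>>> it =
                    pendingPerCheckpoint.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, List<List<String>>> entry = it.next();
                if (entry.getKey() > checkpointId) {
                    break; // TreeMap iterates in ascending checkpoint order
                }
                finished.addAll(entry.getValue());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        TwoPhaseFileSink sink = new TwoPhaseFileSink(2);
        for (String row : Arrays.asList("a", "b", "c")) {
            sink.invoke(row);
        }
        sink.snapshotState(1);
        sink.invoke("d");
        sink.invoke("e"); // rolled to pending, but not covered by any checkpoint yet
        System.out.println("before ack: " + sink.finished);
        sink.notifyCheckpointComplete(1);
        System.out.println("after ack:  " + sink.finished);
    }
}
```

After the acknowledgement only the files of checkpoint 1 are finished. Rows d and e sit in a pending file that no reader can see; if the job failed now, it would restore from checkpoint 1 and, assuming a replayable source, re-emit them, so publishing that file early would duplicate them. Rolling the open file at every checkpoint is a simplification of this sketch; the BucketingSink JavaDocs describe how it actually treats in-progress files.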

Best,
Fabian

2017-07-17 6:55 GMT+02:00 Tzu-Li (Gordon) Tai <tz...@apache.org>:

> Hi Robert,
>
> Thanks for your interest in contributing that.
> AFAIK, there aren’t any ongoing efforts on an ORC table sink yet.
> I’ll loop in Fabian (CC'ed), who might know more about this.
> The only complicated consideration in designing a sink is deciding which
> delivery guarantees it will provide and how to provide them using Flink’s
> checkpointing mechanism.
> I would suggest opening a JIRA (if there isn’t one already) and elaborating
> on the details there to collect feedback before jumping right in.
>
> Cheers,
> Gordon
>
> On 17 July 2017 at 3:47:02 AM, Robert Rapplean (rrapplean@altitudedigital.com) wrote:
>
> Hey, everyone.
>
> I have a need for Flink to write to ORCFile tables in the near future.
> Could someone educate me on the current challenges that might make that
> hard to do? I've worked quite a bit with the HCat libraries, and may be
> overconfident about how complicated this is. Is anyone currently working on
> the issue?
>
> I'd go ahead and submit a Jira ticket for this, but am deterred by the
> thought that someone should have already created such a ticket, and
> wondering why it isn't already there. It may be a priority thing, but this
> is my personal priority at the moment.
>
> Best,
>
> Robert
>
>

Re: Greetings and question

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Robert,

Thanks for your interest in contributing that.
AFAIK, there aren’t any ongoing efforts on an ORC table sink yet. I’ll loop in Fabian (CC'ed), who might know more about this.
The only complicated consideration in designing a sink is deciding which delivery guarantees it will provide and how to provide them using Flink’s checkpointing mechanism.
I would suggest opening a JIRA (if there isn’t one already) and elaborating on the details there to collect feedback before jumping right in.

Cheers,
Gordon

On 17 July 2017 at 3:47:02 AM, Robert Rapplean (rrapplean@altitudedigital.com) wrote:

> Hey, everyone.
>
> I have a need for Flink to write to ORCFile tables in the near future.
> Could someone educate me on the current challenges that might make that
> hard to do? I've worked quite a bit with the HCat libraries, and may be
> overconfident about how complicated this is. Is anyone currently working on
> the issue?
>
> I'd go ahead and submit a Jira ticket for this, but am deterred by the
> thought that someone should have already created such a ticket, and
> wondering why it isn't already there. It may be a priority thing, but this
> is my personal priority at the moment.
>
> Best,
>
> Robert