Posted to dev@flink.apache.org by lr...@lyft.com, lr...@lyft.com on 2018/03/27 23:59:48 UTC

Compressing files with the Bucketing Sink

I want to upload a compressed file (preferably gzip) using the Bucketing Sink. What is the best way to do this? Would I have to implement my own Writer that does the compression? Has anyone done something similar?

Re: Compressing files with the Bucketing Sink

Posted by lr...@lyft.com, lr...@lyft.com.

Thanks a lot for the suggestion Till!

I ended up following your suggestion: I extended StreamWriterBase and wrapped the FSDataOutputStream in a GZIPOutputStream.
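
For anyone finding this thread later, here is a minimal sketch of the wrapping idea, using only the JDK so it runs without Flink on the classpath. The class name GzipLineWriter and its methods are hypothetical and are not part of Flink. In an actual Writer, the raw stream passed to the constructor would presumably be the FSDataOutputStream that StreamWriterBase opens, and the methods would map onto the Writer's own open/write/flush/close; that mapping is an assumption and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical stand-alone helper (not a Flink class): writes one
 * UTF-8 line per element into a gzip stream layered over a raw stream.
 */
final class GzipLineWriter implements AutoCloseable {

    private final GZIPOutputStream gzip;

    GzipLineWriter(OutputStream rawStream) throws IOException {
        // syncFlush = true makes flush() push pending compressed bytes
        // through to the underlying stream instead of buffering them.
        this.gzip = new GZIPOutputStream(rawStream, true);
    }

    /** Compresses the element followed by a newline. */
    void write(String element) throws IOException {
        gzip.write(element.getBytes(StandardCharsets.UTF_8));
        gzip.write('\n');
    }

    void flush() throws IOException {
        gzip.flush();
    }

    /** Writes the gzip trailer and closes the underlying stream. */
    @Override
    public void close() throws IOException {
        gzip.close();
    }

    /** Test helper: decompresses a complete gzip byte array to a string. */
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The important detail is that the gzip stream must be closed (or at least finished) before the underlying stream, otherwise the trailer is missing and the file may not decompress cleanly.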


On 2018/03/28 09:44:26, Till Rohrmann <tr...@apache.org> wrote: 
> Hi,
> 
> the SequenceFileWriter and the AvroKeyValueSinkWriter both support
> compressed outputs. Apart from that, I'm not aware of any other Writers
> which support compression. Maybe you could use these two Writers as a
> guiding example. Alternatively, you could try to extend
> StreamWriterBase and wrap the outStream in a GZIPOutputStream.
> 
> Cheers,
> Till
> 
> On Wed, Mar 28, 2018 at 1:59 AM, lrao@lyft.com <lr...@lyft.com> wrote:
> 
> > I want to upload a compressed file (preferably gzip) using the Bucketing
> > Sink. What is the best way to do this? Would I have to implement my own
> > Writer that does the compression? Has anyone done something similar?
> >
> 

Re: Compressing files with the Bucketing Sink

Posted by Till Rohrmann <tr...@apache.org>.
Hi,

the SequenceFileWriter and the AvroKeyValueSinkWriter both support
compressed outputs. Apart from that, I'm not aware of any other Writers
which support compression. Maybe you could use these two Writers as a
guiding example. Alternatively, you could try to extend
StreamWriterBase and wrap the outStream in a GZIPOutputStream.

Cheers,
Till

On Wed, Mar 28, 2018 at 1:59 AM, lrao@lyft.com <lr...@lyft.com> wrote:

> I want to upload a compressed file (preferably gzip) using the Bucketing
> Sink. What is the best way to do this? Would I have to implement my own
> Writer that does the compression? Has anyone done something similar?
>