Posted to dev@beam.apache.org by Frederick Kautz <fk...@pseudocode.cc> on 2016/05/21 18:21:33 UTC

BEAM-64

I implemented a potential solution to "[BEAM-64] General decompression
registry". It still needs more attention to some of the finer details,
e.g. better error handling, better javadocs, and unit tests.

However, before I spend more time on it, I would like a review of the
general design.

https://github.com/apache/incubator-beam/compare/master...fkautz:beam-64?expand=1

Design:

I attempted to implement an approach that requires no code changes from
users. There is an SDK interface change, but it should be backwards
compatible with existing code.

TextIO.withCompression() can now receive a generic compressor operator.
This includes all of the existing enum values (AUTO, UNCOMPRESSED, GZIP,
BZIP2), but it can now also be a user- or library-implemented compressor.
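
As a rough illustration of the idea above (not the code in the branch), a
single-method interface lets the enum values and a lambda be passed
interchangeably. The names CompressorOperator.wrap, CompressorOperatorSketch,
readAll and gzip are assumptions made up for this sketch, and only two enum
constants are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical single-method interface, so a lambda can stand in for it. */
@FunctionalInterface
interface CompressorOperator {
  InputStream wrap(InputStream raw) throws IOException;
}

/** Hypothetical stand-in for the existing enum; AUTO and BZIP2 are omitted. */
enum CompressionType implements CompressorOperator {
  UNCOMPRESSED {
    @Override
    public InputStream wrap(InputStream raw) {
      return raw; // pass the bytes through untouched
    }
  },
  GZIP {
    @Override
    public InputStream wrap(InputStream raw) throws IOException {
      return new GZIPInputStream(raw);
    }
  };
}

final class CompressorOperatorSketch {
  /** Plays the role of a reader configured through withCompression(op). */
  static String readAll(byte[] data, CompressorOperator op) {
    try (InputStream in = op.wrap(new ByteArrayInputStream(data))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Helper that produces gzip-compressed input for the example. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

With this shape, readAll(data, CompressionType.GZIP) and
readAll(data, raw -> new GZIPInputStream(raw)) behave the same, which is
why existing callers that pass an enum value would not need to change.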

CompressionType also gains a new getRegistry() method, which allows the
user to customize the behavior of AUTO by adding, replacing, or removing
registered compressors as necessary.
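
A minimal sketch of what such a registry could look like, assuming it is
keyed by filename extension. The class CompressionRegistry and its methods
register, unregister, resolve and isPassThrough are hypothetical names for
illustration; the actual API in the branch may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;

/** Hypothetical operator type; the real interface in the branch may differ. */
@FunctionalInterface
interface CompressorOperator {
  InputStream wrap(InputStream raw) throws IOException;
}

/** Hypothetical registry that AUTO could consult, keyed by extension. */
final class CompressionRegistry {
  private static final CompressorOperator UNCOMPRESSED = raw -> raw;
  private final Map<String, CompressorOperator> byExtension =
      new ConcurrentHashMap<>();

  CompressionRegistry() {
    // Assumed default, mirroring the built-in GZIP handling.
    register(".gz", GZIPInputStream::new);
  }

  /** Adds a compressor, or replaces the one registered for the extension. */
  void register(String extension, CompressorOperator op) {
    byExtension.put(extension.toLowerCase(Locale.ROOT), op);
  }

  /** Removes a registration; returns true if one existed. */
  boolean unregister(String extension) {
    return byExtension.remove(extension.toLowerCase(Locale.ROOT)) != null;
  }

  /** AUTO behavior: pick by extension, otherwise fall back to pass-through. */
  CompressorOperator resolve(String filename) {
    String lower = filename.toLowerCase(Locale.ROOT);
    for (Map.Entry<String, CompressorOperator> entry : byExtension.entrySet()) {
      if (lower.endsWith(entry.getKey())) {
        return entry.getValue();
      }
    }
    return UNCOMPRESSED;
  }

  /** True if the operator is the pass-through fallback. */
  boolean isPassThrough(CompressorOperator op) {
    return op == UNCOMPRESSED;
  }
}
```

Because register() overwrites any existing entry, the same call covers
both the "add" and the "replace" case.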

Here's a short list of changes:

* Create a new CompressorOperator, compatible with Java 8 lambdas
* CompressionType enum now implements CompressorType
* withCompression now takes a CompressorOperator
* Compression wrapper implementations moved from in-line code to
CompressionType enum
* Compression registry created
* AUTO now supports compressors registered with the registry

Can someone review the design and give me feedback? If the design looks
good, I'll move forward with implementing tests, improving exception
error messages, and improving the javadocs.

Thanks,
Frederick

Re: BEAM-64

Posted by Davor Bonaci <da...@google.com.INVALID>.
[ I'll reply a little bit and leave the details to Dan. ]

First, Frederick, welcome! We look forward to your contributions to Beam.

At first glance, BEAM-64 was a little under-specified. Let me try to
clarify what was intended:
* Add a pipeline-level registry of compression formats with the
corresponding logic to compress/decompress. This is perhaps somewhat
similar in design to CoderRegistry.
* Remove the current logic from CompressedSource, but keep the ability to
override the registry.
* Propagate the ability to override the registry to the users of
CompressedSource, one of which is TextIO.

From the user perspective, the experience would be as follows:
* Add custom compressed formats to the registry, just after creating the
pipeline.
* Use any (applicable) IO without any special considerations. Compression
is selected automatically based on the filename extension.
* Alternatively, override the compression format at any source / sink.
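
The lookup order described in this list could be sketched roughly as
follows. This is only an illustration: PipelineCompressionRegistry,
FileSourceSketch and their methods are hypothetical names, and formats are
represented by plain strings to keep the example short.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical per-pipeline registry: filename extension -> format name. */
final class PipelineCompressionRegistry {
  private final Map<String, String> formatByExtension = new HashMap<>();

  PipelineCompressionRegistry() {
    // Assumed defaults matching the formats already supported.
    formatByExtension.put(".gz", "GZIP");
    formatByExtension.put(".bz2", "BZIP2");
  }

  /** Registers a custom format, typically right after creating the pipeline. */
  void register(String extension, String format) {
    formatByExtension.put(extension.toLowerCase(Locale.ROOT), format);
  }

  /** Finds the format for a filename by its extension, if any is registered. */
  Optional<String> lookup(String filename) {
    String lower = filename.toLowerCase(Locale.ROOT);
    return formatByExtension.entrySet().stream()
        .filter(entry -> lower.endsWith(entry.getKey()))
        .map(Map.Entry::getValue)
        .findFirst();
  }
}

/** Hypothetical source that defers to the registry unless overridden. */
final class FileSourceSketch {
  private final String filename;
  private final String override; // null means "no override"

  private FileSourceSketch(String filename, String override) {
    this.filename = filename;
    this.override = override;
  }

  static FileSourceSketch from(String filename) {
    return new FileSourceSketch(filename, null);
  }

  /** Per-source override; takes precedence over the registry. */
  FileSourceSketch withCompression(String format) {
    return new FileSourceSketch(filename, format);
  }

  /** Override first, then the registry by extension, then uncompressed. */
  String effectiveFormat(PipelineCompressionRegistry registry) {
    if (override != null) {
      return override;
    }
    return registry.lookup(filename).orElse("UNCOMPRESSED");
  }
}
```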

Does this make sense?

On Sun, May 22, 2016 at 3:01 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Frederick,
>
> Thanks for the update. We're going to take a look.
>
> Thanks!
> Regards
> JB

Re: BEAM-64

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Frederick,

Thanks for the update. We're going to take a look.

Thanks!
Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com