Posted to dev@orc.apache.org by Gang Wu <ga...@apache.org> on 2019/02/11 22:57:56 UTC

Re: Add ZStandard compression to ORC -- ORC-306

Thanks David for providing your use cases.

Hi Owen, can we resume review of the PR for ORC-363 mentioned below? Anyone
interested in reviewing it is welcome. Thanks!

Best,
Gang

On Fri, Feb 8, 2019 at 4:02 PM David Christle <dc...@linkedin.com>
wrote:

> Hi,
>
> I am interested in the status of pull request ORC-363 (
> https://github.com/apache/orc/pull/306), which adds the ZStandard
> compression codec to the Java reader/writer. I am very keen on
> experimenting with this codec for large-scale data processing and on driving
> its adoption among my colleagues, but it seems to have been stalled since
> the beginning of November, waiting for review. As you know,
> ZStandard is a newer compression algorithm that generally offers better
> compression than zlib at substantially faster speeds. It was recently
> enabled in the C++ writer/reader in ORC-395 (
> https://github.com/apache/orc/pull/301), but I don’t think that helps with
> using ZStandard within ORC in Apache Spark (my primary data processing
> framework).
>
> I think this addition to ORC is a good one to shepherd through the
> review process, as it will be useful for anyone doing the kind of
> large-scale data processing that ORC is designed to enable. Facebook has
> already implemented ZStandard in ORC, and recently reported double-digit
> improvements in both compression and speed (
> https://code.fb.com/core-data/zstandard/) in their data warehousing
> applications.
>
> Kind regards,
>
> David Christle