Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/27 09:23:00 UTC

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

    [ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609928#comment-17609928 ] 

ASF GitHub Bot commented on PARQUET-2196:
-----------------------------------------

wgtmac opened a new pull request, #1000:
URL: https://github.com/apache/parquet-mr/pull/1000

   This PR implements the LZ4_RAW codec, which was introduced in parquet format v2.9.0. Because the LZ4_RAW and SNAPPY codecs share a lot of logic, this patch moves the common code into NonBlockedCompressor and NonBlockedDecompressor and makes each codec-specific class extend them.
   
   Added a TestLz4RawCodec test to verify that the new codec itself is correct.
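
   For readers who want a picture of the refactoring, here is a minimal sketch of the shared-base-class idea followed by a round-trip check. It is not the code in this PR: the `Sketch*` and `Toy*` class names, the `setInput`/`finish` methods, and the `compressBlock`/`decompressBlock` hooks are all hypothetical, and a toy run-length encoding stands in for the real LZ4_RAW and SNAPPY block routines so that the example runs with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical base class: buffering logic shared by all codecs lives here.
abstract class SketchNonBlockedCompressor {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    // Shared: accumulate input until the caller asks for the result.
    final void setInput(byte[] data, int off, int len) {
        pending.write(data, off, len);
    }

    // Shared: hand everything buffered so far to the codec as one block.
    final byte[] finish() {
        byte[] raw = pending.toByteArray();
        pending.reset();
        return compressBlock(raw);
    }

    // Codec-specific hook (assumed shape, for illustration only).
    protected abstract byte[] compressBlock(byte[] raw);
}

// Hypothetical mirror of the compressor for the read path.
abstract class SketchNonBlockedDecompressor {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    final void setInput(byte[] data, int off, int len) {
        pending.write(data, off, len);
    }

    final byte[] finish() {
        byte[] compressed = pending.toByteArray();
        pending.reset();
        return decompressBlock(compressed);
    }

    protected abstract byte[] decompressBlock(byte[] compressed);
}

// Toy stand-in codec: run-length encoding as (count, value) byte pairs.
class ToyRunLengthCompressor extends SketchNonBlockedCompressor {
    @Override
    protected byte[] compressBlock(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length) {
            int run = 1;
            // Extend the run while the byte repeats, capped at 255 per pair.
            while (i + run < raw.length && raw[i + run] == raw[i] && run < 255) {
                run++;
            }
            out.write(run);
            out.write(raw[i]);
            i += run;
        }
        return out.toByteArray();
    }
}

class ToyRunLengthDecompressor extends SketchNonBlockedDecompressor {
    @Override
    protected byte[] decompressBlock(byte[] compressed) {
        if (compressed.length % 2 != 0) {
            throw new IllegalArgumentException("truncated block");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < compressed.length; i += 2) {
            int run = compressed[i] & 0xFF; // count is stored as an unsigned byte
            for (int k = 0; k < run; k++) {
                out.write(compressed[i + 1]);
            }
        }
        return out.toByteArray();
    }
}

class RoundTripDemo {
    // Compress then decompress, in the spirit of a codec round-trip test.
    static byte[] roundTrip(byte[] input) {
        SketchNonBlockedCompressor c = new ToyRunLengthCompressor();
        c.setInput(input, 0, input.length);
        byte[] compressed = c.finish();
        SketchNonBlockedDecompressor d = new ToyRunLengthDecompressor();
        d.setInput(compressed, 0, compressed.length);
        return d.finish();
    }

    public static void main(String[] args) {
        byte[] input = "aaaabbbcccd".getBytes(StandardCharsets.UTF_8);
        if (!Arrays.equals(input, roundTrip(input))) {
            throw new AssertionError("round trip mismatch");
        }
        System.out.println("round trip ok");
    }
}
```

   In this sketch, adding another codec means overriding one method per direction, which is the kind of sharing the patch description refers to; the real classes may divide the work differently.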




> Support LZ4_RAW codec
> ---------------------
>
>                 Key: PARQUET-2196
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2196
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Gang Wu
>            Priority: Major
>
> There is a long history of LZ4 interoperability problems for Parquet files between parquet-mr and parquet-cpp (which is now part of Apache Arrow); the attached links provide the evidence. In short, a new LZ4_RAW codec type was introduced in parquet format v2.9.0. However, only parquet-cpp supports LZ4_RAW. The parquet-mr library still uses the old Hadoop-provided LZ4 codec and cannot read Parquet files compressed with LZ4_RAW.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)