Posted to hdfs-dev@hadoop.apache.org by Wei-Chiu Chuang <we...@cloudera.com.INVALID> on 2020/06/26 20:39:42 UTC

Re: ZStandard compression crashes

A similar bug was reported: HADOOP-17096
<https://issues.apache.org/jira/browse/HADOOP-17096>

On Mon, May 11, 2020 at 3:48 PM Eric Yang <ey...@apache.org> wrote:

> If I recall this problem correctly, the root cause is that the default
> zstd compression block size is 256 KB, and Hadoop zstd compression will
> attempt to use the OS platform's default compression size, if it is
> available. The recommended output size is slightly bigger than the input
> size, to account for the header size in zstd compression.
> http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
>
> The Hadoop code at
> https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
> sets the output size to the same as the input size if the input size is
> bigger than the output size. By manually setting the buffer size to a
> small value, the input size will be smaller than the recommended output
> size, which keeps the system working. Returning ZSTD_CStreamOutSize()
> from getStreamSize may enable the system to work without a predefined
> default.
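>
> To make the sizing rule concrete, here is a minimal sketch of that
> contract (my illustration using the zstd-jni bindings, not Hadoop's own
> JNI glue; the class name ZstdBoundDemo is hypothetical): the destination
> buffer must hold compressBound(srcSize) bytes, slightly more than the
> input, or compressing incompressible data fails.
>
> import com.github.luben.zstd.Zstd;
>
> public class ZstdBoundDemo {
>     public static void main(String[] args) {
>         byte[] src = new byte[256 * 1024]; // one 256 KB input block
>
>         // Worst-case output is slightly LARGER than the input
>         // (frame and block headers), so never size dst == src.
>         byte[] dst = new byte[(int) Zstd.compressBound(src.length)];
>
>         long written = Zstd.compress(dst, src, 3); // compression level 3
>         if (Zstd.isError(written)) {
>             // e.g. "Destination buffer is too small"
>             System.err.println(Zstd.getErrorName(written));
>             return;
>         }
>         System.out.println(src.length + " -> " + written + " bytes");
>     }
> }
>
> ZSTD_CStreamOutSize() is the streaming-API analogue of that bound, which
> is why returning it from getStreamSize should avoid the undersized
> buffer.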
>
> On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
> <we...@cloudera.com.invalid> wrote:
>
>> Thanks for the pointer, it does look similar. However we are roughly on
>> the latest of branch-3.1 and this fix is in our branch. I'm pretty sure
>> we have all the zstd fixes.
>>
>> I believe the libzstd version used is 1.4.4, but I need to confirm. I
>> suspect it's a library version issue because we've been using zstd
>> compression for over a year, and this (reproducible) bug started
>> happening consistently only recently.
>>
>> On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ay...@gmail.com> wrote:
>>
>> > Hi Wei-Chiu,
>> > What is the Hadoop version being used?
>> > Check whether HADOOP-15822 is there; it fixed a similar error.
>> >
>> > -Ayush
>> >
>> > > On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <we...@apache.org> wrote:
>> > >
>> > > Hadoop devs,
>> > >
>> > > A colleague of mine recently hit a strange issue where the zstd
>> > > compression codec crashes.
>> > >
>> > > Caused by: java.lang.InternalError: Error (generic)
>> > >   at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
>> > >   at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
>> > >   at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
>> > >   at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
>> > >   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>> > >   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> > >   at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
>> > >   at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
>> > >
>> > > Is anyone out there hitting a similar problem?
>> > >
>> > > A temporary workaround is to set the buffer size with
>> > > "set io.compression.codec.zstd.buffersize=8192;"
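>> > >
>> > > For anyone applying that workaround from code rather than from a
>> > > client session, a sketch of the equivalent through the Hadoop
>> > > Configuration API (the class name ZstdBufferWorkaround is
>> > > hypothetical; the config key is the same one as above):
>> > >
>> > > import org.apache.hadoop.conf.Configuration;
>> > > import org.apache.hadoop.io.compress.ZStandardCodec;
>> > >
>> > > public class ZstdBufferWorkaround {
>> > >     public static void main(String[] args) {
>> > >         Configuration conf = new Configuration();
>> > >         // Cap the codec buffer size so the input stays below the
>> > >         // recommended output size, as described earlier in the
>> > >         // thread.
>> > >         conf.setInt("io.compression.codec.zstd.buffersize", 8192);
>> > >
>> > >         ZStandardCodec codec = new ZStandardCodec();
>> > >         codec.setConf(conf);
>> > >     }
>> > > }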
>> > >
>> > > We suspected it's a bug in the zstd library, but couldn't verify.
>> > > Just want to send this out and see if I get lucky.
>> >
>>
>