Posted to hdfs-dev@hadoop.apache.org by Wei-Chiu Chuang <we...@apache.org> on 2020/05/11 16:41:20 UTC

ZStandard compression crashes

Hadoop devs,

A colleague of mine recently hit a strange issue where the zstd compression
codec crashes.

Caused by: java.lang.InternalError: Error (generic)
    at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
    at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
    at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)

Is anyone out there hitting a similar problem?

A temporary workaround is to reduce the buffer size: "set
io.compression.codec.zstd.buffersize=8192;"
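
For anyone who wants to apply the same workaround from Java rather than a
Hive/Tez session, here is a minimal, untested sketch. The property name is the
one above; the rest is just illustrative use of the codec API, not something
we have verified in production:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.ZStandardCodec;

    public class ZstdBufferWorkaround {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Shrink the zstd codec buffer, mirroring the "set ..." workaround above.
        conf.setInt("io.compression.codec.zstd.buffersize", 8192);

        ZStandardCodec codec = new ZStandardCodec();
        codec.setConf(conf);
        // Streams created via codec.createOutputStream(...) should now use the smaller buffer.
      }
    }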

We suspect it's a bug in the zstd library, but couldn't verify it. Just wanted
to send this out and see if I get lucky.

Re: ZStandard compression crashes

Posted by Wei-Chiu Chuang <we...@cloudera.com.INVALID>.
A similar bug was reported: HADOOP-17096
<https://issues.apache.org/jira/browse/HADOOP-17096>

On Mon, May 11, 2020 at 3:48 PM Eric Yang <ey...@apache.org> wrote:

> If I recall this problem correctly, the root cause is that the default zstd
> compression block size is 256 KB, and Hadoop's zstd compression will attempt
> to use the OS platform's default compression size if it is available. The
> recommended output size is slightly bigger than the input size, to account
> for the header size in zstd compression.
> http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
>
> Meanwhile, the Hadoop code at
> https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
> sets the output size to the same as the input size if the input size is
> bigger than the output size. By manually setting the buffer size to a small
> value, the input size will be smaller than the recommended output size,
> which keeps the system working. Returning ZSTD_CStreamOutSize() from
> getStreamSize may enable the system to work without a predefined default.

Re: ZStandard compression crashes

Posted by Eric Yang <ey...@apache.org>.
If I recall this problem correctly, the root cause is that the default zstd
compression block size is 256 KB, and Hadoop's zstd compression will attempt
to use the OS platform's default compression size if it is available. The
recommended output size is slightly bigger than the input size, to account
for the header size in zstd compression.
http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982

Meanwhile, the Hadoop code at
https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
sets the output size to the same as the input size if the input size is
bigger than the output size. By manually setting the buffer size to a small
value, the input size will be smaller than the recommended output size,
which keeps the system working. Returning ZSTD_CStreamOutSize() from
getStreamSize may enable the system to work without a predefined default.
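
To make the sizing issue concrete, here is a rough, untested Java sketch of
the compress-bound arithmetic. The formula is my transcription of zstd's
ZSTD_COMPRESSBOUND macro (around 1.4.x), so treat the exact constants as an
assumption; the point is only that the worst-case output is slightly larger
than the input, which is why an output buffer sized equal to the input can
come up short:

    public class ZstdBoundSketch {
      // Approximation of ZSTD_COMPRESSBOUND(srcSize): worst-case compressed size.
      static long compressBound(long srcSize) {
        long kb128 = 128L << 10;
        long margin = (srcSize < kb128) ? ((kb128 - srcSize) >> 11) : 0;
        return srcSize + (srcSize >> 8) + margin;
      }

      public static void main(String[] args) {
        long input = 256L << 10;              // the 256 KB default block size discussed above
        // Prints 263168: already bigger than the 262144-byte input, before any real data.
        System.out.println(compressBound(input));
      }
    }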

On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
<we...@cloudera.com.invalid> wrote:

> Thanks for the pointer, it does look similar. However, we are roughly on the
> latest of branch-3.1, and that fix is in our branch. I'm pretty sure we have
> all the zstd fixes.
>
> I believe the libzstd version used is 1.4.4, but I need to confirm. I
> suspect it's a library version issue because we've been using zstd
> compression for over a year, and this (reproducible) bug only started
> happening consistently recently.

Re: ZStandard compression crashes

Posted by Wei-Chiu Chuang <we...@cloudera.com.INVALID>.
Thanks for the pointer, it does look similar. However, we are roughly on the
latest of branch-3.1, and that fix is in our branch. I'm pretty sure we have
all the zstd fixes.

I believe the libzstd version used is 1.4.4, but I need to confirm. I
suspect it's a library version issue because we've been using zstd
compression for over a year, and this (reproducible) bug only started
happening consistently recently.

On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ay...@gmail.com> wrote:

> Hi Wei Chiu,
> What is the Hadoop version being used?
> Check if HADOOP-15822 is there; it had a similar error.
>
> -Ayush

Re: ZStandard compression crashes

Posted by Ayush Saxena <ay...@gmail.com>.
Hi Wei Chiu,
What is the Hadoop version being used?
Check if HADOOP-15822 is there; it had a similar error.

-Ayush

> On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <we...@apache.org> wrote:
> 
> Hadoop devs,
> 
> A colleague of mine recently hit a strange issue where zstd compression
> codec crashes.
> 
> Caused by: java.lang.InternalError: Error (generic)
> at
> org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native
> Method)
> at
> org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
> at
> org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
> at
> org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at
> org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
> at
> org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
> 
> Anyone out there hitting the similar problem?
> 
> A temporary workaround is to set buffer size "set
> io.compression.codec.zstd.buffersize=8192;"
> 
> We suspected it's a bug in zstd library, but couldn't verify. Just want to
> send this out and see if I can get some luck.
