Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/06/01 15:06:00 UTC
[jira] [Commented] (TEZ-4295) Could not decompress data. Buffer length is too small.

    [ https://issues.apache.org/jira/browse/TEZ-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355155#comment-17355155 ] 

László Bodor commented on TEZ-4295:
-----------------------------------

[~junnan.yang]: could you please help me understand how exactly [^TEZ-4295.01.patch] solved your issue?
As far as I understand, the original issue was introduced by TEZ-4135 and fixed by TEZ-4234 (a fix that might not be complete).
TEZ-4234 synchronizes on the codec instance while getting an InputStream from it, whereas your patch synchronizes on the configuration object of the codec.
This makes me think that the problem is:
1. we get a reference to the configuration of the codec and work on it (changing the buffer size),
2. different threads get references to that same configuration, and synchronizing on the outer codec has no effect when we e.g. create a new compressor from the codec ([snappy createCompressor|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/SnappyCodec.java#L112]) or reinit an existing one from the pool

compressor.reinit is not blocked by the codec object's monitor, so this makes sense to me:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java#L155
{code}
  public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
    Compressor compressor = borrow(compressorPool, codec.getCompressorType());
    if (compressor == null) {
      compressor = codec.createCompressor();
      LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
    } else {
      compressor.reinit(conf);
      if(LOG.isDebugEnabled()) {
        LOG.debug("Got recycled compressor");
      }
    }
...
{code}
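To illustrate the reasoning above, here is a minimal, self-contained sketch. It is not Tez or Hadoop code: FakeConf, FakeCodec and observedBufferSize are hypothetical names made up for this example, and the buffer sizes are arbitrary. It only shows the general Java rule that a monitor excludes just those threads that synchronize on the same object, so holding the codec's monitor does not stop another thread from changing the shared configuration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CodecLockSketch {

    /** Hypothetical stand-in for the codec's shared, mutable configuration. */
    static final class FakeConf {
        volatile int bufferSize = 64;
    }

    /** Hypothetical stand-in for a codec that exposes its configuration. */
    static final class FakeCodec {
        final FakeConf conf = new FakeConf();
    }

    /**
     * The calling thread sets bufferSize to 1024 while holding readerLock and
     * reads it back before releasing the lock. A second thread tries to set
     * it to 4096 while holding writerLock. Returns the value read back.
     */
    static int observedBufferSize(FakeCodec codec, Object readerLock, Object writerLock) {
        CountDownLatch readerConfigured = new CountDownLatch(1);
        CountDownLatch writerDone = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            try {
                readerConfigured.await();
                synchronized (writerLock) {
                    codec.conf.bufferSize = 4096;
                }
                writerDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            int seen;
            synchronized (readerLock) {
                codec.conf.bufferSize = 1024;
                readerConfigured.countDown();
                // Give the writer a bounded window in which to interfere.
                writerDone.await(500, TimeUnit.MILLISECONDS);
                seen = codec.conf.bufferSize;
            }
            writer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeCodec codec = new FakeCodec();
        // Reader locks the codec, writer locks the conf: different monitors.
        System.out.println("lock on codec only:  " + observedBufferSize(codec, codec, codec.conf));
        // Both lock the conf: the writer has to wait until the reader is done.
        System.out.println("lock on shared conf: " + observedBufferSize(codec, codec.conf, codec.conf));
    }
}
```

In the first call the two threads hold different monitors, so the reader should see its buffer size overwritten while it still holds its lock. In the second call both threads synchronize on the configuration object, so the reader should still see the value it set. Whether this matches what the patch does in Tez is exactly the question asked above.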
Please let me know if I got your point right.

> Could not decompress data. Buffer length is too small.
> ------------------------------------------------------
>
>                 Key: TEZ-4295
>                 URL: https://issues.apache.org/jira/browse/TEZ-4295
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: junnan.yang
>            Priority: Major
>         Attachments: TEZ-4295.01.patch
>
>
> When Tez uses Snappy compression, it fails with a "buffer too small" error:
> java.io.IOException: java.lang.InternalError: Could not decompress data. Buffer length is too small.
>     at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:137)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:550)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:283)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InternalError: Could not decompress data. Buffer length is too small.
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:238)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:210)
>     at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:833)
>     at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
>     ... 12 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)