Posted to common-issues@hadoop.apache.org by "wangchao (JIRA)" <ji...@apache.org> on 2015/12/07 16:19:11 UTC

[jira] [Commented] (HADOOP-12619) Native memory leaks in CompressorStream

    [ https://issues.apache.org/jira/browse/HADOOP-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045070#comment-15045070 ] 

wangchao commented on HADOOP-12619:
-----------------------------------

Hadoop 2.7.1 changed the implementation of GzipCodec.createOutputStream to:

{code}
  @Override
  public CompressionOutputStream createOutputStream(OutputStream out) 
    throws IOException {
    if (!ZlibFactory.isNativeZlibLoaded(conf)) {
      return new GzipOutputStream(out);
    }
    return CompressionCodec.Util.
        createOutputStreamWithCodecPool(this, conf, out);
  }

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out, 
                                                    Compressor compressor) 
  throws IOException {
    return (compressor != null) ?
               new CompressorStream(out, compressor,
                                    conf.getInt("io.file.buffer.size", 
                                                4*1024)) :
               createOutputStream(out);
  }

    static CompressionOutputStream createOutputStreamWithCodecPool(
        CompressionCodec codec, Configuration conf, OutputStream out)
        throws IOException {
      Compressor compressor = CodecPool.getCompressor(codec, conf);
      CompressionOutputStream stream = null;
      try {
        stream = codec.createOutputStream(out, compressor);
      } finally {
        if (stream == null) {
          CodecPool.returnCompressor(compressor);
        } else {
          stream.setTrackedCompressor(compressor);
        }
      }
      return stream;
    }
 
{code}

but CompressorStream overrides the close method and still does not return the compressor to the pool.
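The missing step can be modeled in plain Java, with java.util.zip.Deflater standing in for the native zlib compressor. SimplePool and PooledCompressorStream below are hypothetical names, not Hadoop classes; the sketch only shows the pattern a fixed close() would follow: finish and drain the compressor, then hand it back to the pool in a finally block, mirroring what createOutputStreamWithCodecPool already does on its error path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.zip.Deflater;

// Hypothetical stand-ins (not Hadoop classes) modeling the pool pattern.
class SimplePool {
    private final ArrayDeque<Deflater> idle = new ArrayDeque<>();
    synchronized Deflater get() {
        Deflater d = idle.poll();
        if (d == null) d = new Deflater();
        d.reset();                       // make a pooled compressor reusable
        return d;
    }
    synchronized void release(Deflater d) { idle.push(d); }
    synchronized int idleCount() { return idle.size(); }
}

class PooledCompressorStream extends OutputStream {
    private final OutputStream out;
    private final SimplePool pool;
    private Deflater deflater;

    PooledCompressorStream(OutputStream out, SimplePool pool) {
        this.out = out;
        this.pool = pool;
        this.deflater = pool.get();
    }

    @Override public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        deflater.setInput(b, off, len);
        drain();
    }

    // Write out whatever compressed bytes the deflater has ready.
    private void drain() throws IOException {
        byte[] buf = new byte[512];
        int n;
        while ((n = deflater.deflate(buf)) > 0) {
            out.write(buf, 0, n);
        }
    }

    @Override public void close() throws IOException {
        if (deflater == null) return;    // already closed
        try {
            deflater.finish();
            drain();
            out.close();
        } finally {
            // The step the 2.7.1 CompressorStream.close() skips: give the
            // native-backed compressor back instead of dropping it.
            pool.release(deflater);
            deflater = null;
        }
    }
}
```

With this shape, every stream returns its compressor on close, so the pool (rather than finalization) bounds the amount of native memory in use.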



> Native memory leaks in CompressorStream
> ---------------------------------------
>
>                 Key: HADOOP-12619
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12619
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: wangchao
>
> The constructor of org.apache.hadoop.io.compress.CompressorStream requires an org.apache.hadoop.io.compress.Compressor object to compress bytes, but it never invokes the compressor's end method when the close method is called. This may cause a native memory leak if the compressor is only used by this CompressorStream object.
> I found this when setting up a Flume agent with gzip compression: the native memory grows slowly and is never reclaimed.
> {code}
>   @Override
>   public CompressionOutputStream createOutputStream(OutputStream out) 
>     throws IOException {
>     return (ZlibFactory.isNativeZlibLoaded(conf)) ?
>                new CompressorStream(out, createCompressor(),
>                                     conf.getInt("io.file.buffer.size", 4*1024)) :
>                new GzipOutputStream(out);
>   }
>   @Override
>   public Compressor createCompressor() {
>     return (ZlibFactory.isNativeZlibLoaded(conf))
>       ? new GzipZlibCompressor(conf)
>       : null;
>   }
> {code}
> The relevant methods of CompressorStream are:
> {code}
>   @Override
>   public void close() throws IOException {
>     if (!closed) {
>       finish();
>       out.close();
>       closed = true;
>     }
>   }
>   @Override
>   public void finish() throws IOException {
>     if (!compressor.finished()) {
>       compressor.finish();
>       while (!compressor.finished()) {
>         compress();
>       }
>     }
>   }
> {code}
> Nothing ever ends the compressor, so its native memory is never released.
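The leak pattern the description points at can be reproduced with plain java.util.zip.Deflater, which, like Hadoop's native ZlibCompressor, holds native zlib memory until end() is called. A minimal self-contained sketch of the correct usage (DeflaterEndDemo and its helpers are hypothetical, not Hadoop code):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterEndDemo {
    // Compresses input and always releases the native zlib state.
    // Dropping the end() call in the finally block reproduces the leak:
    // each compressor's native memory lingers until finalization, which
    // may never keep up under load (e.g. a long-running Flume agent).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // the call CompressorStream.close() never makes
        }
    }

    // Round-trip helper to check the compressed bytes are valid.
    static byte[] decompress(byte[] compressed, int originalLen) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLen];
            int n = inflater.inflate(out);
            if (n != originalLen) throw new IllegalStateException("short inflate");
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end(); // Inflater holds native memory too
        }
    }
}
```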



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)