Posted to dev@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2016/10/04 20:22:20 UTC

[jira] [Created] (HBASE-16766) Do not rely on InputStream.available()

Enis Soztutar created HBASE-16766:
-------------------------------------

             Summary: Do not rely on InputStream.available() 
                 Key: HBASE-16766
                 URL: https://issues.apache.org/jira/browse/HBASE-16766
             Project: HBase
          Issue Type: Bug
          Components: wal
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 1.4.0


ProtobufLogReader relies on InputStream.available() to figure out whether it has exhausted the file. However, the InputStream.available() javadoc states:
{code}
     * <p> Note that while some implementations of {@code InputStream} will return
     * the total number of bytes in the stream, many will not.  It is
     * never correct to use the return value of this method to allocate
     * a buffer intended to hold all data in this stream.
{code}

HDFS and many other Hadoop FS's, and things like ByteBufferInputStream, all return the remaining bytes from {{available()}}, so the code works on top of HDFS. On other file systems, however, IS.available() may or may not return the remaining bytes. In one specific case, the ADLS wrapper FS used to implement the {{available()}} call with the correct (javadoc) semantics rather than the HDFS semantics, which ended up causing data loss in WAL recovery. We have since fixed ADLS to implement the HDFS semantics, but we should fix HBase itself so that it does not rely on the available() call.
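
To illustrate the failure mode, here is a minimal sketch in plain Java. It is not the ProtobufLogReader code; the class and method names ({{EofDetectionSketch}}, {{ZeroAvailableStream}}, {{countWithAvailable}}, {{countWithRead}}) are hypothetical and exist only for this example. It assumes a stream whose available() always returns 0, which the InputStream contract permits, and compares an available()-based end-of-file check with one that relies only on read() returning -1.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; none of these names come from HBase.
final class EofDetectionSketch {

    /**
     * Wraps a stream and always reports 0 from available().
     * This is legal: available() is only an estimate of the bytes
     * readable without blocking, not the bytes remaining.
     */
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    /** Fragile: treats available() == 0 as "end of file". */
    static int countWithAvailable(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            if (in.read() == -1) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Robust: only read() returning -1 signals end of stream. */
    static int countWithRead(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }
}
```

With a five-byte ByteArrayInputStream wrapped in {{ZeroAvailableStream}}, the available()-based loop stops immediately and reports no data, while the read()-based loop sees all five bytes. A reader that stops early in this way would silently skip the rest of the file, which is the kind of data loss described above.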



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)