You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Brian Nixon (JIRA)" <ji...@apache.org> on 2012/08/16 20:19:38 UTC

[jira] [Created] (HBASE-6597) Block Encoding Size Estimation

Brian Nixon created HBASE-6597:
----------------------------------

             Summary: Block Encoding Size Estimation
                 Key: HBASE-6597
                 URL: https://issues.apache.org/jira/browse/HBASE-6597
             Project: HBase
          Issue Type: Improvement
          Components: io
    Affects Versions: 0.89-fb
            Reporter: Brian Nixon
            Priority: Minor


Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6597:
--------------------------

    Attachment:     (was: 6597-trunk.txt)
    
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471799#comment-13471799 ] 

Phabricator commented on HBASE-6597:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 Got you.

  Then this implementation is fine.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-6597:
-------------------------------

    Attachment: D5895.5.patch

mbautin updated the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".
Reviewers: Kannan, Karthik, Liyin, aaiyer, avf, JIRA

  Addressing Kannan's comments and rebasing.

REVISION DETAIL
  https://reviews.facebook.net/D5895

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodingState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestIncrementalEncoding.java
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482543#comment-13482543 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  Replying to comments inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 Updated the comment.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 Done.
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 Updated the comment. length < Bytes.SIZEOF_INT is correct (this is the length of the part of the array we are allowed to write into).
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 Done.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 No. The two last parameters are for assertEquals.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481531#comment-13481531 ] 

Phabricator commented on HBASE-6597:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  Mikhail - looks great. Pending comments are very trivial.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470475#comment-13470475 ] 

Phabricator commented on HBASE-6597:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38 Length of prefix is returned.
  Name this method getCommonPrefixLength ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93 Do we need to consider memstoreTS ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 Check / assert that skipLastBytes is not negative ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 prevKey stores the previous key, right ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 Name this variable negativeDiffTimestamp ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 This class can be private, right ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 This class can be private, right ?

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Priority: Minor
>         Attachments: D5895.1.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Work started] (HBASE-6597) Block Encoding Size Estimation

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-6597 started by Mikhail Bautin.

> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485756#comment-13485756 ] 

Ted Yu commented on HBASE-6597:
-------------------------------

Update on my recent fidings.
I came up with patch for 0.94 branch.
Most data block encoding related tests pass.
TestHFileBlockCompatibility poses a little challenge. There is no embedded checksum feature in 0.89-fb branch. So this test is unique to 0.94 / trunk.
In the test, there is a copy of Writer class which I assume shouldn't be modified, at least not for a point release.
The test reuses some code from TestHFileBlock.java where there is some change related to usage of Writer:
{code}
- static int writeTestKeyValues(OutputStream dos, int seed, boolean includesMemstoreTS)
+ static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)  Time elapsed: 0.129 sec  <<< FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859 expected:<\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0...> but was:<\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...>
  at org.junit.Assert.assertEquals(Assert.java:125)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471784#comment-13471784 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76 includesMemstoreTS means that we are memstore timestamp is part of both input and output. We don't change that aspect of the data format on data block encoding/decoding.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119 Added an assertion to BufferedEncodedWriter. The code below won't make currentState null if it is not null initially.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 As you can see, the DataBlockEncoder class does not have a lot of state (unlike the EncodedWriter) so I don't know what else I could include here.
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Oops. That was for debugging. Good catch!

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-6597:
-------------------------------

    Attachment: D5895.3.patch

mbautin updated the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".
Reviewers: Kannan, Karthik, Liyin, aaiyer, avf, JIRA

  Critical bugfixes in updating encoder state.

REVISION DETAIL
  https://reviews.facebook.net/D5895

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestIncrementalEncoding.java
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-6597:
-------------------------------

    Attachment: D5895.2.patch

mbautin updated the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".
Reviewers: Kannan, Karthik, Liyin, aaiyer, avf, JIRA

  Removing code duplication on the encoding codepath. Addressing some of Ted's comments (I will double-check that I have not missed any of them later).

REVISION DETAIL
  https://reviews.facebook.net/D5895

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestIncrementalEncoding.java
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471766#comment-13471766 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:38 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:93 Yes, that's a good catch.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:340 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:420 Done (here and in other encoded writers).
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:422 Yes, the comment was incorrect. Moved this field to the encoder state.
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:497 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:411 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173 Done.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186 Yes, this javadoc was incorrect. Moved this to encoder state.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471355#comment-13471355 ] 

Phabricator commented on HBASE-6597:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  My previous comments for PrefixKeyDeltaEncoder.java were for patch v1.
  Phabricator remembered my unfinished comments and sent them out.

  FYI

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6597:
--------------------------

    Attachment: 6597-trunk.txt

Patch that compiles against trunk.

There're test failures.
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471358#comment-13471358 ] 

Phabricator commented on HBASE-6597:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  I got some test failures in TestCacheOnWrite:

    testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    testStoreFileCacheOnWrite[5](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    testStoreFileCacheOnWrite[8](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    testStoreFileCacheOnWrite[11](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    testStoreFileCacheOnWrite[14](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    testStoreFileCacheOnWrite[17](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite): expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>

  Here is one of the above:

  testStoreFileCacheOnWrite[2](org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite)  Time elapsed: 0.295 sec  <<< FAILURE!
  org.junit.ComparisonFailure: expected:<{ENCODED_DATA=9[65, LEAF_INDEX=121], BLOOM_CHUNK=9, INT...> but was:<{ENCODED_DATA=9[91, LEAF_INDEX=124], BLOOM_CHUNK=9, INT...>
    at org.junit.Assert.assertEquals(Assert.java:123)
    at org.junit.Assert.assertEquals(Assert.java:145)
    at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.readStoreFile(TestCacheOnWrite.java:259)
    at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(TestCacheOnWrite.java:203)

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6597:
--------------------------

    Attachment: 6597-trunk.txt

Patch for trunk.
There are some compilation errors in tests.
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482818#comment-13482818 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has abandoned the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  Committed as rHBASEEIGHTNINEFBBRANCH1401499.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469933#comment-13469933 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  The original review (Facebook-internal, just for the record): https://phabricator.fb.com/D554523

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Priority: Minor
>         Attachments: D5895.1.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484441#comment-13484441 ] 

Ted Yu commented on HBASE-6597:
-------------------------------

I am planning to continue the work.
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484436#comment-13484436 ] 

Andrew Purtell commented on HBASE-6597:
---------------------------------------

[~ted_yu] Are you planning to fix the compilation errors?
                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482737#comment-13482737 ] 

Phabricator commented on HBASE-6597:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  Verified that the effective in-cache block size for encoded blocks is close enough to the configured size during a load test. Committing.

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-6597:
----------------------------------

    Assignee: Mikhail Bautin
    
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481530#comment-13481530 ] 

Phabricator commented on HBASE-6597:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 Regarding the second bullet (in my comment), I suppose it is ok to leave this as is... as it does simplify the calling logic a little bit. We should just add here in comments that

  * this is used for non-encoded blocks.
  * and, keeps blocks in old format (without the DBE specific headers).
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:35 use integer compression for key, value and prefix (7-bit encoding)
  -->
  use integer compression for key, value and prefix lengths (7-bit encoding)

  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java:34 ditto (as other comment)

  s/prefix/prefix lengths

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-6597:
-------------------------------

    Attachment: D5895.1.patch

mbautin requested code review of "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".
Reviewers: Kannan, Karthik, Liyin, aaiyer, avf, JIRA

  Instead of accumulating a block of key-values of predetermined size and then delta-encoding it, we encode key-value pairs as we add them and start a new block when the **encoded** size exceeds the configured block size. This work was done by Brian Nixon. I did necessary testing and bug fixes.

TEST PLAN
  Run unit tests

REVISION DETAIL
  https://reviews.facebook.net/D5895

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestIncrementalEncoding.java
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/13989/

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Priority: Minor
>         Attachments: D5895.1.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480431#comment-13480431 ] 

Phabricator commented on HBASE-6597:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  comments thus far...

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:445 mismath -> mismatch
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:446 don't you need two more %s's.
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java:132 shouldn't this be:

  if (length - offset < Bytes.SIZEOF_INT)

  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java:29 I think this comment is no longer valid.

  * It gets used for ENCODING => 'NONE' case now correct?
  * Wondering now, if that was a correct choice... because we seem to be having to jump through some hoops to handle this encoder as a separate case (such as to not write the headers, etc.).


REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471330#comment-13471330 ] 

Phabricator commented on HBASE-6597:
------------------------------------

tedyu has commented on the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".

  Code is cleaner in patch v2.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:173 This class can be private.
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java:186 Javadoc and code mismatch.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:76 Would memstoreTS always be part of in ?
  If so, do you need to advance the position inside in when includesMemstoreTS is false ?
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:119 Consider adding an assertion.
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java:447 Do you want to include more information in String representation ?
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:96 Why skipping the case of includesMemstoreTS being true ?

REVISION DETAIL
  https://reviews.facebook.net/D5895

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6597) Block Encoding Size Estimation

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-6597:
-------------------------------

    Attachment: D5895.4.patch

mbautin updated the revision "[jira] [HBASE-6597] [89-fb] Incremental data block encoding".
Reviewers: Kannan, Karthik, Liyin, aaiyer, avf, JIRA

  Addressing Ted's comments.

REVISION DETAIL
  https://reviews.facebook.net/D5895

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodingState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestIncrementalEncoding.java
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java

To: Kannan, Karthik, Liyin, aaiyer, avf, JIRA, mbautin
Cc: tedyu

                
> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch
>
>
> Blocks boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira