Posted to issues@hbase.apache.org by "Li Pi (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 00:51:11 UTC

[jira] [Created] (HBASE-4608) HLog Compression

HLog Compression
----------------

                 Key: HBASE-4608
                 URL: https://issues.apache.org/jira/browse/HBASE-4608
             Project: HBase
          Issue Type: New Feature
            Reporter: Li Pi
            Assignee: Li Pi


The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
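A minimal sketch of the dictionary idea, for illustration only and not taken from any attached patch. It assumes a one-byte marker in front of each value, a 2-byte index, values shorter than 64k bytes, and a dictionary that never evicts (the patch itself lists an LRUDictionary). WalDictionarySketch, encode and decode are hypothetical names. As discussed later in the thread, each log starts with an empty dictionary and the reader rebuilds it from the literals it encounters, so the dictionary is not stored separately.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary compression; not the classes from the patch. */
final class WalDictionarySketch {
  // Assumed marker bytes, chosen for this sketch only.
  private static final int NOT_IN_DICTIONARY = 0;
  private static final int IN_DICTIONARY = 1;

  /** Writes each value once as a literal, then as a 2-byte index on every repeat. */
  static byte[] encode(List<String> values) {
    Map<String, Integer> dict = new HashMap<>(); // fresh dictionary per log
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (String value : values) {
      Integer index = dict.get(value);
      if (index != null) {
        out.write(IN_DICTIONARY);
        writeShort(out, index);
      } else {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        out.write(NOT_IN_DICTIONARY);
        writeShort(out, raw.length);
        out.write(raw, 0, raw.length);
        dict.put(value, dict.size()); // next free index
      }
    }
    return out.toByteArray();
  }

  /** Rebuilds the same dictionary while reading, mirroring the order used by encode. */
  static List<String> decode(byte[] encoded) {
    List<String> dict = new ArrayList<>(); // position in the list is the index
    List<String> values = new ArrayList<>();
    ByteBuffer in = ByteBuffer.wrap(encoded); // big-endian by default
    while (in.hasRemaining()) {
      int marker = in.get();
      if (marker == IN_DICTIONARY) {
        values.add(dict.get(in.getShort() & 0xFFFF));
      } else {
        byte[] raw = new byte[in.getShort() & 0xFFFF];
        in.get(raw);
        String value = new String(raw, StandardCharsets.UTF_8);
        dict.add(value);
        values.add(value);
      }
    }
    return values;
  }

  /** Writes the low 16 bits of v in big-endian order. */
  private static void writeShort(ByteArrayOutputStream out, int v) {
    out.write((v >>> 8) & 0xFF);
    out.write(v & 0xFF);
  }
}
```

With repeated table, region and column family names, every repeat costs 3 bytes in this sketch (marker plus index) instead of the full name.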

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228165#comment-13228165 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. Stop making this more complicated than it need be Ted.
It is rare that I see review comments in such a tone: condescending.

And the same comment was posted twice.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229773#comment-13229773 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
<https://reviews.apache.org/r/4328/#comment12946>

    IllegalArgumentException is not needed here.
    I removed it, compiled and ran TestCompressor - it passed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12947>

    A closing ) should be placed either on this line or on line 109.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12948>

    Should read 'byte of index to the ...'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12949>

    Should read 'an array of bytes'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12950>

    Please add javadoc for offset and length.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
<https://reviews.apache.org/r/4328/#comment12958>

    Should we label this class @InterfaceAudience.Private ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12951>

    I don't quite get what the second sentence is supposed to convey ?
    It seems to be same as first sentence.
    
    This version is the minimum version that supports compression.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12952>

    A (slightly) long line.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/4328/#comment12954>

    Long line.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/4328/#comment12955>

    Can we remove 'silly' here ?
    Some user may actually reach this size.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/4328/#comment12956>

    'initiate' is used to start an action or message.
    'initialize' should be used here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/4328/#comment12957>

    Setting reader to null would be desirable after the close() call.
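A small sketch of the pattern suggested in the last comment, assuming a holder class that owns a single reader. ReaderHolderSketch and isOpen are hypothetical names and are not taken from SequenceFileLogReader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical holder showing "close, then null out the reference". */
final class ReaderHolderSketch {
  private Closeable reader;

  ReaderHolderSketch(Closeable reader) {
    this.reader = reader;
  }

  /** Closes the reader at most once; later calls are no-ops. */
  void close() {
    if (reader == null) {
      return;
    }
    try {
      reader.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      reader = null; // drop the reference even if close() failed
    }
  }

  boolean isOpen() {
    return reader != null;
  }
}
```

Clearing the field makes a second close() harmless and lets the closed reader be garbage collected.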


- Ted


On 2012-03-14 22:26:34, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 22:26:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220362#comment-13220362 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516721/4608v15.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178084#comment-13178084 ] 

Li Pi commented on HBASE-4608:
------------------------------

Max size = 64k entries * around 100-200 bytes each. Really not that big: less than 100 megabytes.
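The estimate can be checked with a few lines of arithmetic. This sketch assumes "64k" means 65536 dictionary entries and uses the 100-200 byte per-entry range quoted above; DictionaryFootprintSketch and maxBytes are hypothetical names.

```java
/** Hypothetical helper for the back-of-the-envelope bound. */
final class DictionaryFootprintSketch {
  /** Upper bound in bytes for a dictionary with the given entry count and entry size. */
  static long maxBytes(int entries, int bytesPerEntry) {
    return (long) entries * bytesPerEntry; // widen first to avoid int overflow
  }
}
```

Under these assumptions, 65536 entries at 100 bytes come to about 6.5 million bytes and at 200 bytes to about 13.1 million bytes, comfortably below 100 megabytes.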
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183023#comment-13183023 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 35
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line35>
bq.  >
bq.  >     Add javadoc please.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 84
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line84>
bq.  >
bq.  >     Please give this config parameter better name.
bq.  >     How about 'hbase.regionserver.wal.compressed' ?

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 91
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line91>
bq.  >
bq.  >     Would this be able to hold large number of HLog.Entry's in memory ?

An HLog is at most 400 MB, so this should be okay?


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 229
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line229>
bq.  >
bq.  >     Since short is signed, how do I know that the return value would be positive ?
bq.  >     e.g. (short)0xFE00 == -512

If the high bit is set (we read that first), the value is negative and we do something else, because it's not part of the dictionary. Added an assert anyway.
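A sketch of how the sign question can be handled, under the assumption that the first byte is inspected before the two bytes are combined. IndexMarkerSketch and readIndex are hypothetical names, and returning -1 for "a literal follows" is a choice made for this example only.

```java
/** Hypothetical sketch: the high bit of the first byte separates literals from indexes. */
final class IndexMarkerSketch {
  /**
   * Returns a dictionary index in the range 0..32767, or -1 when the high bit
   * of the first byte is set, meaning the value is not a dictionary index.
   */
  static int readIndex(byte first, byte second) {
    if (first < 0) {
      return -1; // high bit set: take the non-dictionary path
    }
    short index = (short) (((first & 0xFF) << 8) | (second & 0xFF));
    assert index >= 0 : "high bit was checked above";
    return index;
  }
}
```

Because the high bit is ruled out first, the combined short can never be negative, so a value such as (short) 0xFE00 == -512 is never used as an index.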


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 166
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67063#file67063line166>
bq.  >
bq.  >     I suggest naming this class Node.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 74
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67067#file67067line74>
bq.  >
bq.  >     Is compressed a better name ?

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 127
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67067#file67067line127>
bq.  >
bq.  >     White space makes indentation look weird.

fixed.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, line 1
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67068#file67068line1>
bq.  >
bq.  >     Please add Apache license.

fixed.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, line 12
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67068#file67068line12>
bq.  >
bq.  >     Add short javadoc and test category, please.

test category - small?


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 2
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67070#file67070line2>
bq.  >
bq.  >     Please remove year.

Done.


bq.  On 2012-01-07 05:18:45, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 29
bq.  > <https://reviews.apache.org/r/2740/diff/8/?file=67070#file67070line29>
bq.  >
bq.  >     You will add the real test, right ?
bq.  >     
bq.  >     Also, missing test category.

This is actually a really good test. If testWALReplay works after compression is enabled, then the compression/decompression is working. This is the real test.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4232
-----------------------------------------------------------


On 2012-01-10 02:34:06, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-10 02:34:06)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517160/4608v17.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229826#comment-13229826 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518388/4608v29.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 11 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v5.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Attachment: 4608v16.txt

Patch v16 decrements HLogKey.VERSION
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)

Trying hadoopqa on v28.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227929#comment-13227929 ] 

stack commented on HBASE-4608:
------------------------------

It's a regular pattern only.  Perhaps this does some decent testing?  TestWALReplayCompressed?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147225#comment-13147225 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > Cool stuff.
bq.  > 
bq.  > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?
bq.  > 
bq.  > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.
bq.  > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?
bq.  > 
bq.  > (As I said, I am probably missing something).
bq.  > 
bq.  > See minor comments inline.
bq.  
bq.  Li Pi wrote:
bq.      You aren't missing anything! That's exactly how it works.
bq.      
bq.      Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.
bq.  
bq.  Lars Hofhansl wrote:
bq.      Ok... What I cannot find then, is the code that builds the dictionary during read :)
bq.      
bq.      Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.
bq.
bq.  
bq.  Li Pi wrote:
bq.      Oops, somehow I deleted that line. There are comments for it. Added it back in.
bq.      
bq.      //if this isn't in the dictionary, we need to add to the dictionary.
bq.      
bq.      As for the more general concern: HBase won't return a write to the client until the WALEdit write is completely done. So aborting midway won't be an issue - and even if we abort midway, we can recover everything that's been written so far.
bq.      
bq.      For the beginning of the file getting garbled? - True but we'd lose some information with or without compression. With compression we lose more information, but that's the nature of compression. Recovering a partially garbled WAL fully is impossible no matter what approach we use. Either way, it's not a contingency the WAL is built to handle - a partial recovery after all WAL replicas have been corrupted.
bq.  
bq.  Todd Lipcon wrote:
bq.      well, in the non-compressed WAL case, we can re-sync to a SequenceFile "SYNC" marker and continue reading from there in the face of arbitrary corruption.
bq.      
bq.      Perhaps the compression mechanism should have some kind of "maximum lookback" - ie when a dictionary is being built, keep the file offset where each dictionary word was used. Then, when deciding to use a dict reference vs a literal, if the curOffset - lastUsedOffset > MAX_LOOKBACK_THRESHOLD, we re-write the entry. This would bound the size of unrecoverable WAL portions while still providing good compression (similar to what we have today)

That makes sense. Maybe file a separate jira and use this one to get the compression in?


- Lars
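
The build-as-you-go scheme discussed in the quoted exchange above can be illustrated with a small round trip. This is only a sketch under stated assumptions, not the code from the patch: the class name DictRoundTripSketch, the LITERAL/REFERENCE marker bytes, and the length-prefixed layout are all made up for illustration. The writer emits a value in full the first time it sees it and an index afterwards; the reader starts with an empty dictionary and rebuilds it from the literals in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and on-disk layout are assumptions, not the patch. */
final class DictRoundTripSketch {
  static final byte LITERAL = 0;   // marker: the full value follows
  static final byte REFERENCE = 1; // marker: a dictionary index follows

  /** Writes each value as a literal on first sight and as an index afterwards. */
  static byte[] write(List<String> values) {
    Map<String, Integer> dict = new HashMap<>(); // fresh dictionary per log
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(values.size());
      for (String v : values) {
        Integer idx = dict.get(v);
        if (idx == null) {
          // Not in the dictionary yet: add it and write the value itself.
          dict.put(v, dict.size());
          byte[] raw = v.getBytes(StandardCharsets.UTF_8);
          out.writeByte(LITERAL);
          out.writeInt(raw.length);
          out.write(raw);
        } else {
          out.writeByte(REFERENCE);
          out.writeInt(idx);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads the log back, recreating the dictionary from the literals. */
  static List<String> read(byte[] log) {
    List<String> dict = new ArrayList<>(); // starts empty, like the writer's
    List<String> result = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        if (in.readByte() == LITERAL) {
          byte[] raw = new byte[in.readInt()];
          in.readFully(raw);
          String v = new String(raw, StandardCharsets.UTF_8);
          dict.add(v); // same insertion order as the writer, so same index
          result.add(v);
        } else {
          result.add(dict.get(in.readInt()));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }
}
```

Because every reference depends on an earlier literal, losing the start of such a file makes later references unreadable, which is the concern raised in the thread.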


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
-----------------------------------------------------------
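
Todd's "maximum lookback" suggestion in the quoted thread could look roughly like the sketch below. It is an assumption-laden illustration, not a proposed implementation: the threshold value, the REFERENCE_SIZE constant, and the class name are hypothetical, and the sketch tracks the offset of the last literal for each word (one reading of "lastUsedOffset") so that every reference points back at most MAX_LOOKBACK_THRESHOLD bytes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a writer-side lookback bound; not from the patch. */
final class LookbackSketch {
  /** Assumed threshold in bytes; the thread does not propose a value. */
  static final long MAX_LOOKBACK_THRESHOLD = 64;
  /** Assumed encoded size of a dictionary reference, in bytes. */
  static final int REFERENCE_SIZE = 2;

  // Offset at which each word was last written out in full.
  private final Map<String, Long> lastLiteralOffset = new HashMap<>();
  private long curOffset = 0;

  /** Returns true if the word is written as a literal, false if as a reference. */
  boolean append(String word) {
    Long last = lastLiteralOffset.get(word);
    boolean literal = last == null || curOffset - last > MAX_LOOKBACK_THRESHOLD;
    if (literal) {
      // Re-write the entry so readers that resync nearby can decode it.
      lastLiteralOffset.put(word, curOffset);
      curOffset += word.length(); // assumes one byte per character
    } else {
      curOffset += REFERENCE_SIZE;
    }
    return literal;
  }
}
```

A reader that resynchronizes at some offset would, under these assumptions, see every live word as a literal again within the threshold, at the cost of a somewhat lower compression ratio.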


On 2011-11-07 23:12:37, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-07 23:12:37)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228185#comment-13228185 ] 

stack commented on HBASE-4608:
------------------------------

bq. It is rare that I saw review comments in such tone: condescending.

Don't be silly.  Frustrated, yes.  Condescending no.

bq. And the same comment was posted twice.

Sorry about that.  Made a mistake.

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229878#comment-13229878 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 23:54:37, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java, line 28
bq.  > <https://reviews.apache.org/r/4328/diff/3/?file=92429#file92429line28>
bq.  >
bq.  >     Should we label this class @InterfaceAudience.Private ?

Unless a class is public, it doesn't need an interface audience annotation.


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5972
-----------------------------------------------------------


On 2012-03-14 22:26:34, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 22:26:34)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210655#comment-13210655 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-14 02:29:24, Liyin Tang wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 42
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70705#file70705line42>
bq.  >
bq.  >     Looks like there is a side effect to calling findEntry(), since you will put the data into the dictionary.
bq.  >

This is intentional. When we look for an entry, that means we intend to compress with it. If we don't find it, then it's inserted into the dictionary.


- Li
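
The find-or-insert behaviour described in this reply can be sketched as follows. This is a minimal illustration, not LRUDictionary itself: apart from findEntry(), which is named in the review, the class name, the NOT_IN_DICTIONARY constant, the use of String keys, and the LinkedHashMap-based eviction are assumptions made for the example. A miss reports "not found" to the caller (who then writes the literal) and inserts the word as a side effect; when the dictionary is full, the least recently used slot is reused.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical find-or-insert dictionary with LRU slot reuse. */
final class LruFindOrInsertSketch {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used word.
  private final LinkedHashMap<String, Short> indexOf =
      new LinkedHashMap<>(16, 0.75f, true);

  LruFindOrInsertSketch(int capacity) {
    this.capacity = capacity;
  }

  /**
   * Returns the index of the word if present. On a miss, inserts the word
   * (evicting the least recently used one if full) and returns
   * NOT_IN_DICTIONARY so the caller knows to write the literal.
   */
  short findEntry(String word) {
    Short idx = indexOf.get(word); // get() also marks the word as recently used
    if (idx != null) {
      return idx;
    }
    short slot;
    if (indexOf.size() < capacity) {
      slot = (short) indexOf.size(); // next free slot
    } else {
      Iterator<Map.Entry<String, Short>> it = indexOf.entrySet().iterator();
      slot = it.next().getValue();   // reuse the least recently used slot
      it.remove();
    }
    indexOf.put(word, slot);
    return NOT_IN_DICTIONARY;
  }
}
```

For the reader to stay in sync under this scheme, it would have to apply the same insertions and evictions in the same order as the writer.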


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5068
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175310#comment-13175310 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
-----------------------------------------------------------


Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey later.
Looks like hash collisions in SimpleDictionary could be nasty.

Other than that mostly whitespace.

Cool stuff.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9223>

    Should remove the year line.
    Also some extra whitespace in this file.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/2740/#comment9237>

    Bunch of whitespace in here.
    As said above, maybe do HLogKey in a separate jira.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment9236>

    bunch of whitespace in here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment9234>

    whitespace



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment9235>

    I know this is not done, yet... But needs to be a fully qualified config name.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment9233>

    LOG.debug?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
<https://reviews.apache.org/r/2740/#comment9232>

    Hardcoding SimpleDictionary here?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9230>

    year...



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9229>

    What if you have a hash collision?
    You now overwrite the old value that just happens to have the same hash code. Is that OK?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9231>

    Here too; what happens for hash collisions?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment9228>

    Year... And trailing whitespace in here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
<https://reviews.apache.org/r/2740/#comment9225>

    bunch of extra leading whitespace in this file



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
<https://reviews.apache.org/r/2740/#comment9226>

    Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
<https://reviews.apache.org/r/2740/#comment9227>

    I assume you'll test with/without compression.


- Lars
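
The hash collision question raised in the SimpleDictionary comments above can be made concrete with a small example. The code below does not reproduce SimpleDictionary; it assumes, purely for illustration, a dictionary keyed only by hash code and contrasts it with one keyed by content. The byte strings "Aa" and "BB" are used because they have the same Arrays.hashCode value.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the collision concern; not SimpleDictionary. */
final class HashCollisionSketch {
  /** Keyed by hash code only: a colliding value silently replaces the old one. */
  static byte[] lookupByHashOnly(byte[] first, byte[] second, byte[] query) {
    Map<Integer, byte[]> dict = new HashMap<>();
    dict.put(Arrays.hashCode(first), first);
    dict.put(Arrays.hashCode(second), second); // overwrites if the hashes match
    return dict.get(Arrays.hashCode(query));
  }

  /** Keyed by content: ByteBuffer.equals compares the bytes themselves. */
  static byte[] lookupByContent(byte[] first, byte[] second, byte[] query) {
    Map<ByteBuffer, byte[]> dict = new HashMap<>();
    dict.put(ByteBuffer.wrap(first), first);
    dict.put(ByteBuffer.wrap(second), second);
    return dict.get(ByteBuffer.wrap(query));
  }
}
```

With the hash-only variant, looking up "Aa" after inserting "Aa" and then "BB" returns "BB", so a compressed entry would decode to the wrong value. Keying by content avoids that at the cost of comparing bytes on lookup.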


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192619#comment-13192619 ] 

Li Pi commented on HBASE-4608:
------------------------------

I need to run a test against LZO or GZ. I wouldn't be surprised if 4608 is more efficient on some inputs - it's very well tailored for certain kinds of data.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192025#comment-13192025 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-24 09:00:37.768707)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  CHANGES.txt 1d7238e 
  bin/hbase 350abef 
  bin/hbase-daemon.sh 5c42ac1 
  dev-support/findHangingTest.sh PRE-CREATION 
  pom.xml 6566a1c 
  src/docbkx/book.xml c67ca06 
  src/docbkx/configuration.xml 7fd90e7 
  src/docbkx/ops_mgt.xml f93c9f2 
  src/docbkx/performance.xml e61248f 
  src/docbkx/preface.xml 10fa755 
  src/docbkx/troubleshooting.xml 0b7c93a 
  src/docbkx/upgrading.xml c0642f5 
  src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon 24caabd 
  src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java 0477be8 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 8ec5042 
  src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 6cdeec1 
  src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/Delete.java 51bbc63 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8cd9bd0 
  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 0e78d96 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 852a810 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 839d79b 
  src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 0bc9577 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 4135e55 
  src/main/java/org/apache/hadoop/hbase/client/RowMutation.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 9b568e3 
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java 0d4a9e4 
  src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java ba3414d 
  src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java f25ba11 
  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java b47423c 
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 9002a0f 
  src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java 3ad6cd5 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 07ddbca 
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4327a44 
  src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java 39c73f5 
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java bd574b2 
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormat.java 3dcbf74 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java e6f8a6e 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java cb2f084 
  src/main/java/org/apache/hadoop/hbase/master/LoadBalancerFactory.java 89685bb 
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 3938fa7 
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9de1784 
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 667a8b1 
  src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 2dfc3e7 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 2dd497b 
  src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java 493dcdb 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java fb4ec05 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 3917d40 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java 18b6c13 
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java c840e7c 
  src/main/java/org/apache/hadoop/hbase/regionserver/OperationStatus.java b6f7456 
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 7cee17c 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 41f5dff 
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java b928731 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java bd6f70d 
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java e8e95ed 
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java a25ca32 
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRootHandler.java fa38ad6 
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 490694c 
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java 97dd8e6 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 43bfba0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java 7fe0ae5 
  src/main/java/org/apache/hadoop/hbase/rest/MultiRowResource.java 2ba6a0d 
  src/main/java/org/apache/hadoop/hbase/rest/RowResource.java dade6a8 
  src/main/java/org/apache/hadoop/hbase/rest/TableResource.java cc719bc 
  src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java f93c81d 
  src/main/java/org/apache/hadoop/hbase/rest/transform/Base64.java f991121 
  src/main/java/org/apache/hadoop/hbase/rest/transform/NullTransform.java 8492cc6 
  src/main/java/org/apache/hadoop/hbase/rest/transform/Transform.java 9f33bab 
  src/main/java/org/apache/hadoop/hbase/thrift/HThreadedSelectorServerArgs.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java 690a57f 
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 3fa5d41 
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/AlreadyExists.java 0479e31 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/BatchMutation.java be902c9 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/ColumnDescriptor.java 04b42fe 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java 9e31c61 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/IOError.java 778e869 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/IllegalArgument.java 9ae5340 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/Mutation.java 7aa9bcd 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/TCell.java ed420d4 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/TRegionInfo.java 161dedc 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/TRowResult.java 0f31e5e 
  src/main/java/org/apache/hadoop/hbase/thrift/generated/TScan.java 3b894db 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumn.java 3e116e7 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java 8390015 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java 424a87b 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDelete.java 68b4f8e 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDeleteType.java 2abdee0 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TGet.java b1a1a12 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java 272a4a5 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIOError.java 283d430 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIllegalArgument.java 254fbe5 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIncrement.java 3cc82e9 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TPut.java 97ab5dc 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TResult.java 73c8340 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TScan.java d76c355 
  src/main/java/org/apache/hadoop/hbase/thrift2/generated/TTimeRange.java ad9fdc7 
  src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java 11dfbef 
  src/main/java/org/apache/hadoop/hbase/util/Threads.java 6f81b62 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 9b83840 
  src/main/resources/hbase-webapps/static/favicon.ico PRE-CREATION 
  src/main/resources/hbase-webapps/static/hbase_logo.png 03fa793 
  src/site/resources/images/favicon.ico 161bcf7 
  src/site/resources/images/hbase_logo.png 03fa793 
  src/site/resources/images/hbase_logo.svg PRE-CREATION 
  src/site/resources/images/hbase_logo_med.gif 36d3e3c 
  src/site/resources/images/hbase_small.gif 3275765 
  src/site/xdoc/index.xml 9157d6a 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java dada051 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 80d69b4 
  src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java c1a077f 
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java bb077d0 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java ab80020 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 0d38ac9 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 5b64895 
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java 5e3e994 
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan.java 46e1bee 
  src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java cc0f30f 
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java c359f4b 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a348f0c 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 32ad7e8 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 42db18b 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java 0a34371 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestRSStatusServlet.java 64e61bb 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java 2d87567 
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java 853a35f 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 6e89cc4 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/rest/TestTransform.java 2e2ba4c 
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java 12247d0 
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java 477141f 
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 0b45ac1 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228223#comment-13228223 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

@Stack: I brought HFile into this discussion, sorry about that. :)
@Ted: The version you cite is the HFile version, not the compression version, correct?
@Li Pi: You make a good point. WAL_VERSION could imply the compression type. Could call it WAL_TYPE, that way we still have the flexibility to alter compression. We do not regularly change the HLog format, so that is reasonable.

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228991#comment-13228991 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. Out of scope for this issue.
This reminds me of HBASE-4218: from Aug 17th 2011 to Feb 17th 2012, the development took 6 months.

This JIRA doesn't have as many algorithms as HBASE-4218, but we should follow a similar goal:
From Jacek @ 17/Aug/11 21:47:
bq. Once we have common interface you would be able to reuse some of my tests and benchmarks.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517177/4608v18.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: hbase-4608-v28.txt

Reuploading Todd's v28 so we can run hadoopqa on it (it needs to be the most recent file posted).
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228948#comment-13228948 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

The sentence involving COMPRESSION_VERSION was in past tense but I don't see it in patch v23.

Let me elaborate on my comment @ 14/Mar/12 00:26.
As you described, we would use a new constant (COMPRESSION_VERSION) to represent the minimum version that supports dictionary compression.
In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37.

If we later introduce prefix compression, we would introduce another constant representing the minimum version that supports prefix compression.

I agree that both version and compression type should be checked. However, the order should be checking compression type followed by checking version.
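
To make the proposed order concrete, here is a minimal sketch. The enum, its version numbers, and the isReadable helper are all hypothetical illustrations and are not taken from any patch on this issue; PREFIX stands in for a compression type that does not exist yet.

```java
// Hypothetical sketch: every name and version number here is an assumption.
final class WalCompressionCheckSketch {

    // Each compression type carries the minimum WAL version that supports it.
    enum CompressionType {
        NONE(0),
        DICTIONARY(1), // plays the role of COMPRESSION_VERSION
        PREFIX(2);     // a possible later addition

        final int minVersion;

        CompressionType(int minVersion) {
            this.minVersion = minVersion;
        }
    }

    // Check the compression type first, then the version.
    static boolean isReadable(CompressionType type, int fileVersion) {
        if (type == CompressionType.NONE) {
            return true; // uncompressed logs need no version check
        }
        return fileVersion >= type.minVersion;
    }
}
```

With this shape, adding a compression type only adds an enum constant with its own minimum version.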

Regards
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229263#comment-13229263 ] 

Li Pi commented on HBASE-4608:
------------------------------

+1 from here.

Agree w/ Stack. Compression can be generalized later. We can just bump up the version in that case.

Right now, this works, passes tests, and provides a very substantial improvement in certain cases. (See Stack's workload).
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228097#comment-13228097 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

HLog version decision aside, my feeling about the current implementation is -0.5.

First, the compression ratio is not good, at least for the data written by PE.

Second, HLogKey persistence becomes dependent on the compression implementation, which would make it hard to plug in other compression techniques.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230456#comment-13230456 ] 

stack commented on HBASE-4608:
------------------------------

Now that this is in, does that mean we can cut a 0.94RC0?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175317#comment-13175317 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey later.
bq.  > Looks like hash collisions in SimpleDictionary could be nasty.
bq.  > 
bq.  > Other than that mostly whitespace.
bq.  > 
bq.  > Cool stuff.

Just did another test. It looks like SequenceFile doesn't actually do it out of order; there's another bug making HLogKey break.


I'll figure it out later, probably after Christmas.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
-----------------------------------------------------------


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v13.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197521#comment-13197521 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
-----------------------------------------------------------


Only got about halfway through. Will continue to look soon. Overall looking pretty good!


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10459>

    I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" -- since this is just utility functions, not actually an instance of a compressed keyvalue.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10460>

    Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.
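
    The (byte[] buf, int off, int len) shape suggested above could look roughly like the sketch below. It uses plain byte arrays rather than the real KeyValue class, and writeSlice and encodeRow are hypothetical names, not methods from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: names are illustrative, not from the patch under review.
final class OffsetWriteSketch {

    // Writes a length-prefixed slice straight from the backing array.
    // No intermediate byte[] is allocated for the slice.
    static void writeSlice(DataOutputStream out, byte[] buf, int off, int len)
            throws IOException {
        out.writeShort(len);
        out.write(buf, off, len);
    }

    // Encodes one field of a record that lives inside a larger backing buffer.
    static byte[] encodeRow(byte[] backing, int rowOff, int rowLen) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeSlice(out, backing, rowOff, rowLen);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

    A caller would pass the KeyValue's backing buffer plus the offset and length of the row, family, or qualifier, instead of asking for a fresh copy of each.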



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
<https://reviews.apache.org/r/2740/#comment10461>

    Since this is so simple, I'd move it to be a static inner class of KVCompression above



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10462>

    I think we can merge this with the other class that just has static methods as well.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10463>

    this function requires that the whole log data fit in RAM - not a great assumption



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10464>

    why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10465>

    switch order of "in" and "offset" here.
    
    Perhaps clearer to name this as "uncompressIntoArray"?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10467>

    worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary
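
    A rough sketch of the status-byte scheme described above, under some assumptions: the class, the list-backed dictionary, and the 2-byte length prefix are simplifications for illustration and are not the patch's Compressor or LRUDictionary. The point is that a non-negative 16-bit index has a high-order byte in 0..127, so it can never be confused with the -1 marker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an append-only list stands in for the real dictionary.
final class StatusByteSketch {
    static final byte NOT_IN_DICTIONARY = -1;
    static final int MAX_ENTRIES = 32768; // indices must fit in a non-negative short

    static void write(DataOutputStream out, byte[] value, List<byte[]> dict)
            throws IOException {
        int idx = -1;
        for (int i = 0; i < dict.size() && idx < 0; i++) {
            if (Arrays.equals(dict.get(i), value)) idx = i;
        }
        if (idx < 0) {
            out.writeByte(NOT_IN_DICTIONARY); // status: a literal value follows
            out.writeShort(value.length);     // assumes values shorter than 64 KB
            out.write(value);
            if (dict.size() < MAX_ENTRIES) dict.add(value.clone());
        } else {
            // The first byte written is the high-order byte of the index (0..127),
            // so the reader sees it in the "status" position.
            out.writeShort(idx);
        }
    }

    static byte[] read(DataInputStream in, List<byte[]> dict) throws IOException {
        byte status = in.readByte();
        if (status == NOT_IN_DICTIONARY) {
            byte[] value = new byte[in.readUnsignedShort()];
            in.readFully(value);
            if (dict.size() < MAX_ENTRIES) dict.add(value);
            return value;
        }
        // status is the high-order byte of the index; the next byte is the low-order one.
        int idx = ((status & 0xFF) << 8) | in.readUnsignedByte();
        return dict.get(idx);
    }

    static byte[] encode(List<String> values) {
        List<byte[]> dict = new ArrayList<>();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String v : values) write(out, v.getBytes(StandardCharsets.UTF_8), dict);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] data, int count) {
        List<byte[]> dict = new ArrayList<>();
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result.add(new String(read(in, dict), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

    Writer and reader must apply the same rule for adding entries, otherwise their dictionaries drift apart and indices resolve to the wrong values.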



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10466>

    *un*compressed value, right?


- Todd


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213248#comment-13213248 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-21 23:30:35, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 61
bq.  > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line61>
bq.  >
bq.  >     This comment should also be placed at the beginning of compressFile().

removed the comment, not necessary anymore.


bq.  On 2012-02-21 23:30:35, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 88
bq.  > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line88>
bq.  >
bq.  >     Typo: should be output.getFileSystem(outconf)

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5254
-----------------------------------------------------------


On 2012-02-22 03:46:12, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-22 03:46:12)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213316#comment-13213316 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5265
-----------------------------------------------------------


This looks great.  Some small comments below.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11488>

    Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works?  If not here, where else will doc. on how the Compressor works go?
    
    Maybe you should purge all mention of WAL from this class -- e.g. WALDictionary -- because it seems like it could be easily generalized (I suppose we can do that later).



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11489>

    The way the usage is written, -u and -c are optional.  You should fix that.  Looks like they are required, going by the fact that args.length needs to be 3.  Also, it looks like you take --help, the long form, or -u/-c, the short forms.  Either take all short forms or take both long and short forms to be consistent.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11490>

    Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11491>

    This does not need to be an HBaseConfiguration?  There are no configs in hbase-site.xml that might affect what's going on here?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11492>

    Doc the '@return'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11493>

    Doc the return



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/2740/#comment11494>

    White space



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/2740/#comment11495>

    When is this called?  Post construction?  Should it be part of the constructor?  What happens if it's called part way through the writing of a WAL?  Will we start compressing a WAL in the middle?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/2740/#comment11496>

    I don't follow what's going on here.  What happens when len >= 0?  Why is it < 0?  What does that mean?  What's v2 of HLogKey?  What if keyContext is not null?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
<https://reviews.apache.org/r/2740/#comment11497>

    Class comment on what this is about?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
<https://reviews.apache.org/r/2740/#comment11498>

    Why do I clear this?  Why not just throw it away?  Does clearing make it so I can recycle this instance?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/2740/#comment11499>

    Why would I ever let go of terms in the dictionary?  Should you explain why in class comment?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/2740/#comment11501>

    Should this be static?  Does it need reference to outer class?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/2740/#comment11502>

    Class comment?  Should this be static?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment11503>

    Why am I reading whether compression is on or off by looking at config?  Why am I not looking into the head of the WAL file to figure out whether it's compressed and then decompressing?  Otherwise, if config is disabled but I'm fed a compressed file, do I just burp?  See the white space added here.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment11504>

    Should be just called Dictionary. It's in the wal package.  No need for the redundant prefix?



src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
<https://reviews.apache.org/r/2740/#comment11505>

    This will run all the tests in TestWALReplay?  Nice.


- Michael




                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224808#comment-13224808 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

Good test case - maybe it can be turned into a functional test in the code?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228842#comment-13228842 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

In isWALCompressionEnabled():
{code}
+    if (txt == null || Integer.parseInt(txt.toString()) < VERSION) return false;
{code}
What would happen when we have a newer version for WAL_VERSION_KEY ?

Looks like the following check should suffice for isWALCompressionEnabled():
{code}
+    txt = metadata.get(WAL_COMPRESSION_TYPE_KEY);
+    return txt != null && txt.equals(DICTIONARY_COMPRESSION_TYPE);
{code}
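
A minimal sketch of the simpler check proposed above, for illustration only. It assumes the file metadata can be treated as a plain string map; the real metadata type and the constant values used in the patch may differ. The two constant names come from the snippet above, while their values here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; the metadata map stands in for the WAL file's header metadata. */
final class WalCompressionCheckSketch {
  // Constant names are taken from the review comment; the values are assumptions.
  static final String WAL_COMPRESSION_TYPE_KEY = "COMPRESSION_TYPE";
  static final String DICTIONARY_COMPRESSION_TYPE = "DICTIONARY";

  /**
   * Decides from the file's own metadata whether it is dictionary-compressed,
   * without consulting a version number or the reader's configuration.
   */
  static boolean isWALCompressionEnabled(Map<String, String> metadata) {
    String type = metadata.get(WAL_COMPRESSION_TYPE_KEY);
    // Calling equals on the constant keeps a missing key from throwing.
    return DICTIONARY_COMPRESSION_TYPE.equals(type);
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    System.out.println(isWALCompressionEnabled(metadata)); // no compression key set
    metadata.put(WAL_COMPRESSION_TYPE_KEY, DICTIONARY_COMPRESSION_TYPE);
    System.out.println(isWALCompressionEnabled(metadata)); // key set to dictionary type
  }
}
```

Because the answer depends only on what the writer recorded in the file, a reader with compression disabled in its config would still recognize a compressed log under this scheme.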
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180967#comment-13180967 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-06 00:01:44.856233)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

Added an LRU dictionary. Should be more efficient than a 1-way associative cache.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516888/4608v16.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181757#comment-13181757 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1655
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66009#file66009line1655>
bq.  >
bq.  >     Should there be disableCompression ?

Compression is always enabled if it is set in the config. Otherwise the decompressor won't know whether to try to decompress the log or not.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 2
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66013#file66013line2>
bq.  >
bq.  >     Remove this year line, please.

Done.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 137
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66008#file66008line137>
bq.  >
bq.  >     Add javadoc for the parameters, please.

Added.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 1
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66008#file66008line1>
bq.  >
bq.  >     License, please.

Done


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 579
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66005#file66005line579>
bq.  >
bq.  >     This name may refer to the compression algorithm.
bq.  >     I think the word 'enable' should be part of the name.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 2
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line2>
bq.  >
bq.  >     No year needed.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 51
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line51>
bq.  >
bq.  >     This javadoc should be combined with above block.

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 87
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line87>
bq.  >
bq.  >     Should read 'Compresses and ...'

fixed.


bq.  On 2012-01-02 23:39:38, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 1
bq.  > <https://reviews.apache.org/r/2740/diff/5/?file=66007#file66007line1>
bq.  >
bq.  >     Add license, please.

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4172
-----------------------------------------------------------


On 2012-01-07 01:25:20, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-07 01:25:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178085#comment-13178085 ] 

Li Pi commented on HBASE-4608:
------------------------------

I was thinking of replacing the 1-way associative cache with a 127-entry LRU dictionary. This should allow us to save a few bytes, and also be far more efficient with our eviction strategy.
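
A minimal sketch of a fixed-capacity LRU dictionary of the sort described here, as an illustration and not the LRUDictionary from the patch. It assumes the writer and the reader apply the same sequence of lookups and additions, so both evict the same slot and their indexes stay in agreement. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an LRU dictionary with stable slot indexes. */
final class LruDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  private final byte[][] slots;   // slot index -> entry bytes
  private final int[] prev;       // doubly linked recency list over slot indexes
  private final int[] next;
  private final Map<ByteBuffer, Short> index = new HashMap<>();
  private int head = -1;          // most recently used slot
  private int tail = -1;          // least recently used slot
  private int size = 0;

  LruDictionarySketch(int capacity) {
    slots = new byte[capacity][];
    prev = new int[capacity];
    next = new int[capacity];
    Arrays.fill(prev, -1);
    Arrays.fill(next, -1);
  }

  /** Returns the slot of an existing entry and marks it most recently used. */
  short findEntry(byte[] data, int off, int len) {
    Short slot = index.get(ByteBuffer.wrap(data, off, len));
    if (slot == null) return NOT_IN_DICTIONARY;
    moveToHead(slot);
    return slot;
  }

  /** Adds a new entry, reusing the least recently used slot once full. */
  short addEntry(byte[] data, int off, int len) {
    int slot;
    if (size < slots.length) {
      slot = size++;
    } else {
      slot = tail;
      unlink(slot);
      index.remove(ByteBuffer.wrap(slots[slot]));
    }
    slots[slot] = Arrays.copyOfRange(data, off, off + len);
    index.put(ByteBuffer.wrap(slots[slot]), (short) slot);
    linkAtHead(slot);
    return (short) slot;
  }

  /** Reader-side lookup by index; also counts as a use. */
  byte[] getEntry(short slot) {
    moveToHead(slot);
    return slots[slot];
  }

  private void moveToHead(int slot) {
    if (slot != head) {
      unlink(slot);
      linkAtHead(slot);
    }
  }

  private void unlink(int slot) {
    if (prev[slot] != -1) next[prev[slot]] = next[slot]; else head = next[slot];
    if (next[slot] != -1) prev[next[slot]] = prev[slot]; else tail = prev[slot];
    prev[slot] = -1;
    next[slot] = -1;
  }

  private void linkAtHead(int slot) {
    prev[slot] = -1;
    next[slot] = head;
    if (head != -1) prev[head] = slot; else tail = slot;
    head = slot;
  }
}
```

With a capacity of 127, every index fits in seven bits. Evicting by recency rather than by hash collision keeps frequently repeated values, such as table and column family names, resident in the dictionary.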
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192878#comment-13192878 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4585
-----------------------------------------------------------


Nice work.
Will try out the Compressor tool.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10215>

    Should we verify that length is larger than pos ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10216>

    I would expect different implementations to be instantiated based on the prefix of path.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10217>

    Why do we instantiate Configuration again (there is already one @ line 113) ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10218>

    Typo, should read 'to start reading from'.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10219>

    NOT_IN_DICTIONARY should be used here.


- Ted


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228976#comment-13228976 ] 

stack commented on HBASE-4608:
------------------------------

bq. But we don't know if the current dictionary compression API is general enough to cover the new compression type.

Agree that we don't know what the future will bring.  Not going to try.

bq. But the last paragraph above hinges on the scenario of keeping the same WAL version when new compression type is added.

Yes, that's one possible scenario.  There are others where we need to change the version.  We can deal with that when we get there.

bq. Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would WAL version increase or stay at 1 ?

If API doesn't change, no need to up the global file version.  Could add new improved dictionary compression type.

If we need to change the API, then we'll need to change the global version.  At the same time we might add some other facility that has nothing to do with compression -- say, we might decide to intersperse markers for when we flush or compact.  We'd likely bump the version by one point only, though.  This new version would, say, indicate that the WAL was now able to do the extended compression API AND includes flush and compaction markers.  We could bump the version once per feature added, but that buys us nothing; it's the version we ship that counts, the accumulation of features since the last time we shipped.

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226764#comment-13226764 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I cannot go to bed if the answer is still No :-)
With patch v22, I was able to perform decompression/compression round-trip.
See the timestamp of the files below:
{code}
-rwxrwxrwx   1 zhihyu  110088321   99406052 Mar  9 21:38 sea-lab-3.comp
-rwxrwxrwx   1 zhihyu  110088321  100664533 Mar  9 21:36 sea-lab-3.decomp
-rw-r--r--   1 zhihyu  110088321   99406052 Mar  9 21:18 sea-lab-3%2C60020%2C1331337114819.1331337244655
{code}
The fix is the second line below:
{code}
      while ((e = in.next()) != null) {
        if (compress) e.enableCompression(null);
{code}
This is because Entry e would be carrying non-null context after the in.next() call if the input was compressed HLog.
This context needs to be stripped before we pass the Entry to writer.

Patch v22 should be close to the state of checkin.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v24.txt

Address Ted's comments.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228957#comment-13228957 ] 

stack commented on HBASE-4608:
------------------------------

bq. I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339

Because it has metadata the original doesn't have.  When I compress it, it compresses down to the same size.  Notice that the decompressed and decompressed.again files are the same size because they both have the new metadata.

bq. The sentence involving COMPRESSION_VERSION was in past tense but I don't see it in patch v23.

Pardon me.  Should have uploaded v24.

Testing has turned up a minor issue... will upload v25 soon.

bq. In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37

Nope.  This is the global version that introduces compression.  No need for major/minor granularity, and in particular not major/minor on the compression feature itself.  It's overkill.

bq. I agree that both version and compression type should be checked. However, the order should be checking compression type followed by checking version.

Nope.  First figure if we have a file that does compression.  Then figure what type of compression the file does.
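
The ordering argued for here (global version first, compression type second, absent type meaning uncompressed) can be sketched roughly as follows. This is an illustration only, not code from any of the patches: the metadata key names, the enum, and the use of a plain string map for the file metadata are all assumptions.

```java
import java.util.Map;

/** Hypothetical sketch of the check order; names and keys are not from the patch. */
class WalHeaderCheckSketch {
    /** Assumed: first global WAL version whose files may be compressed. */
    static final int COMPRESSION_VERSION = 1;
    /** Hypothetical metadata keys. */
    static final String VERSION_KEY = "version";
    static final String TYPE_KEY = "compression.type";

    enum Compression { NONE, DICTIONARY }

    static Compression detect(Map<String, String> metadata) {
        // Step 1: does this file come from a version that can do compression at all?
        String v = metadata.get(VERSION_KEY);
        int version = (v == null) ? 0 : Integer.parseInt(v);
        if (version < COMPRESSION_VERSION) {
            return Compression.NONE;
        }
        // Step 2: which type of compression does it use? No type field: presume uncompressed.
        String type = metadata.get(TYPE_KEY);
        if (type == null) {
            return Compression.NONE;
        }
        // An unrecognized type name makes valueOf throw IllegalArgumentException.
        return Compression.valueOf(type);
    }
}
```

Under this scheme an improved dictionary format would only need a new enum constant (something like the PREFIX_COMPRESSION_V2 naming suggested elsewhere in this thread), leaving the global version alone unless the API itself changes.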
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226647#comment-13226647 ] 

stack commented on HBASE-4608:
------------------------------

Does v21 fix the bad decompress that you found above testing with PE?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192431#comment-13192431 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

On recovery we'd always have to scan the entire log from the beginning. Maybe that's not a big deal, because log size is limited?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189530#comment-13189530 ] 

Li Pi commented on HBASE-4608:
------------------------------

Are the shutdown hooks slower than TestWALReplay without compression?


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v1.txt

Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228158#comment-13228158 ] 

stack commented on HBASE-4608:
------------------------------

Stop making this more complicated than it need be Ted.

WAL_VERSION is global version on WAL log.

Adding a type metadata field for compression makes sense.  If none, presume uncompressed.

You don't need a compression type version.  If we change the format, we can do PREFIX_COMPRESSION_V2.

HLogKeys are serialized independent of their container.  Don't conflate their versioning w/ the suggested WAL log versioning.

Regarding PE data: it is not amenable to compression.  Its keys are very basic.  It's likely not a good test for evaluating the viability of this feature.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Francke (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231009#comment-13231009 ] 

Lars Francke commented on HBASE-4608:
-------------------------------------

This seems to be missing documentation, no?

Shouldn't the hbase.regionserver.wal.enablecompression key at least be in hbase-default.xml?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230493#comment-13230493 ] 

Hudson commented on HBASE-4608:
-------------------------------

Integrated in HBase-0.94 #32 (See [https://builds.apache.org/job/HBase-0.94/32/])
    HBASE-4608 HLog Compression (Revision 1301167)

     Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131888#comment-13131888 ] 

stack commented on HBASE-4608:
------------------------------

Just to be clear, when we talk of compression, we are not talking about gzip or the like?  Such compressors compress in chunks -- e.g. 32k -- with dictionary as preface. If a machine crashes before it flushes the current chunk, you may lose up to the last 32k of edits.  This is not the type of compression that is being worked on here?  Thanks.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183495#comment-13183495 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I got the following on my MacBook for 4608v9.txt:
{code}
testReplayEditsWrittenViaHRegion(org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed)  Time elapsed: 2.009 sec  <<< FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.regionserver.wal.TestWALReplay.testReplayEditsWrittenViaHRegion(TestWALReplay.java:289)
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228947#comment-13228947 ] 

stack commented on HBASE-4608:
------------------------------

Here are the results of compressing, decompressing, compressing again, decompressing again, and then recompressing a random log file from our front end:

{code}
-rw-r--r--    1 stack  staff   64928728 Mar 13 20:43 sv4r25s8%3A60020.1331661889339
-rwxrwxrwx    1 stack  staff   28540761 Mar 13 20:48 sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx    1 stack  staff   28540761 Mar 13 20:58 sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx    1 stack  staff   28540761 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx    1 stack  staff   64945799 Mar 13 20:57 sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx    1 stack  staff   64945799 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.decompressed.again
{code}

It's 44% of the original size.
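
As a quick check of that figure, here is a minimal sketch that recomputes the ratio from the two file sizes in the listing above. The class and method names are made up for illustration; only the byte counts come from the listing.

```java
/** Hypothetical helper; only the two byte counts come from the listing above. */
class RatioSketch {
  /** Returns the compressed size as a percentage of the original size. */
  static double percentOfOriginal(long originalBytes, long compressedBytes) {
    return 100.0 * compressedBytes / originalBytes;
  }

  public static void main(String[] args) {
    long original = 64928728L;    // sv4r25s8%3A60020.1331661889339
    long compressed = 28540761L;  // ...1331661889339.compressed
    System.out.printf("%.1f%% of original size%n",
        percentOfOriginal(original, compressed));
  }
}
```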
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226558#comment-13226558 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Thanks for the reminder w.r.t. Metadata.

In SequenceFileLogWriter.init(), we can pass Metadata, indicating whether WAL compression is enabled, to SequenceFile.Writer which then gets persisted.
SequenceFile.Reader.getMetadata() would return WAL compression status.

Then we don't need the following in SequenceFileLogReader.init():
{code}
    compression = conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false);
{code}

I think the above is an important part of the review comments.

I will also add a unit test, perhaps in a later iteration.
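
The idea above, recording the compression setting in the file itself so the reader does not depend on its own configuration, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Hadoop's SequenceFile API, and the class name, key name, and header layout are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a file header that records whether the body is compressed. */
class WalHeaderSketch {
  static final String COMPRESSION_KEY = "wal.compression"; // hypothetical key name

  /** Writer side: persist the setting as key/value metadata at the start of the file. */
  static byte[] writeHeader(boolean compressionEnabled) {
    Map<String, String> metadata = new LinkedHashMap<>();
    metadata.put(COMPRESSION_KEY, Boolean.toString(compressionEnabled));
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeInt(metadata.size());
      for (Map.Entry<String, String> entry : metadata.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeUTF(entry.getValue());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return buffer.toByteArray();
  }

  /** Reader side: decide from the file, not from the reader's configuration. */
  static boolean isCompressed(byte[] header) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String key = in.readUTF();
        String value = in.readUTF();
        if (COMPRESSION_KEY.equals(key)) {
          return Boolean.parseBoolean(value);
        }
      }
      return false; // no key present: treat the file as uncompressed
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```

With this shape, a file written with compression on can still be read correctly by a process whose own setting is off, which is the point of dropping the conf.getBoolean() call in the reader.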
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192622#comment-13192622 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-24 22:26:21.830142)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

The last diff was against the wrong (non-trunk) branch.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177845#comment-13177845 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2011-12-31 00:20:40.770066)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

WritableContext makes things cleaner. Some space optimizations to make compression even more efficient.


Summary
-------

Here's what I have so far. Things are written and "should work". I still need to rework the test cases to cover this and add a config option to enable/disable it. This isn't ready for commit yet, but I can get those two things done fairly quickly.

The dictionary is very simple at the moment; I'll come up with something cooler soon. Let me know how this looks.
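
For readers unfamiliar with the approach, here is a minimal sketch of dictionary coding for repeated fields such as table name, region id, and cf name. It is not the code in the attached patches: the class names, marker bytes, and wire layout are assumptions made for illustration. The first time a value is seen it is written in full and remembered; later occurrences are written as a two-byte index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary coding; not the patch's actual classes. */
class DictionarySketch {
  private static final byte LITERAL = 0;  // full value follows
  private static final byte INDEXED = 1;  // only a dictionary index follows
  private static final int MAX_ENTRIES = 1 << 16;  // indexes must fit in two bytes

  /** Writer side: remembers each value the first time it is written in full. */
  static class Encoder {
    private final Map<String, Integer> indexOf = new HashMap<>();

    void write(DataOutputStream out, String value) throws IOException {
      Integer index = indexOf.get(value);
      if (index != null) {
        out.writeByte(INDEXED);
        out.writeShort(index);
        return;
      }
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      if (bytes.length > 0xFFFF) {
        throw new IllegalArgumentException("value too long for this sketch");
      }
      out.writeByte(LITERAL);
      out.writeShort(bytes.length);
      out.write(bytes);
      if (indexOf.size() < MAX_ENTRIES) {
        indexOf.put(value, indexOf.size());
      }
    }
  }

  /** Reader side: rebuilds the same dictionary, in the same order, while reading. */
  static class Decoder {
    private final List<String> entries = new ArrayList<>();

    String read(DataInputStream in) throws IOException {
      if (in.readByte() == INDEXED) {
        return entries.get(in.readUnsignedShort());
      }
      byte[] bytes = new byte[in.readUnsignedShort()];
      in.readFully(bytes);
      String value = new String(bytes, StandardCharsets.UTF_8);
      if (entries.size() < MAX_ENTRIES) {
        entries.add(value);
      }
      return value;
    }
  }

  static byte[] encodeAll(String... values) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    Encoder encoder = new Encoder();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (String value : values) {
        encoder.write(out, value);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return buffer.toByteArray();
  }

  static List<String> decodeAll(byte[] data, int count) {
    Decoder decoder = new Decoder();
    List<String> values = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        values.add(decoder.read(in));
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return values;
  }
}
```

In this sketch the dictionary is never stored separately; the reader reconstructs it from the literals as it reads from the start of the file. Whether the real patch takes the same approach is not stated here.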


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131933#comment-13131933 ] 

Li Pi commented on HBASE-4608:
------------------------------

A form of custom compression. The ability to recover the uncompressed
HLog no matter when the machine crashes is a requirement.

On Thu, Oct 20, 2011 at 11:50 AM, Jonathan Gray (Commented) (JIRA)

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201997#comment-13201997 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4852
-----------------------------------------------------------


I tried to use the command line tool to compress an HLog written by 0.92 and got the following:

Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192)
        at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104)
        at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64)

Also, if you use the command line tool with no arguments, it should print its help (right now it prints an IndexOutOfBoundsException).

I'll try again with an hlog written by trunk - I'm guessing the hlog serialization version might have changed or something.
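
The help-on-bad-arguments behaviour requested above could look roughly like the following sketch. The class name and usage text are hypothetical; the -u and -c flags are the ones used with the tool elsewhere in this thread, and the three-argument form is assumed from those invocations.

```java
/** Hypothetical sketch of argument checking for a compress/decompress tool. */
class CompressorCliSketch {
  static final String USAGE =
      "usage: Compressor [-u | -c] <input> <output>\n"
      + "  -u  decompress <input> into <output>\n"
      + "  -c  compress <input> into <output>";

  /** Returns null when the arguments look valid, otherwise the usage text. */
  static String validate(String[] args) {
    if (args == null || args.length != 3) {
      return USAGE;
    }
    if (!"-u".equals(args[0]) && !"-c".equals(args[0])) {
      return USAGE;
    }
    return null;
  }

  public static void main(String[] args) {
    String problem = validate(args);
    if (problem != null) {
      // Print help instead of failing with an index-out-of-bounds error.
      System.err.println(problem);
      return;
    }
    String action = "-c".equals(args[0]) ? "compress" : "decompress";
    System.out.println("would " + action + " " + args[1] + " into " + args[2]);
  }
}
```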

- Todd


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224786#comment-13224786 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I saw this in region server log:
{code}
2012-03-07 13:01:12,408 INFO  wal.SequenceFileLogWriter (SequenceFileLogWriter.java:init(91)) <<regionserver60020.logRoller>> - WAL compression enabled for hdfs://sea-lab-0:54310/hbase/.logs/sea-lab-5,60020,1331150872956/sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}

After copying the HLog to local, I issued:
{code}
bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -u sea-lab-5%2C60020%2C1331150872956.1331154072399 sea-lab-5.decomp
{code}
I got:
{code}
-rwxr-xr-x 1 hduser hduser 119487372 2012-03-07 14:12 sea-lab-5.decomp
-rw-r--r-- 1 hduser hduser 120660017 2012-03-07 14:11 sea-lab-5%2C60020%2C1331150872956.1331154072399
{code}
When I issued the compression command, I saw:
{code}
$ bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -c sea-lab-5.decomp sea-lab-5.comp
12/03/07 14:14:17 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
12/03/07 14:14:17 INFO wal.SequenceFileLogWriter: WAL compression enabled for sea-lab-5.comp
12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
12/03/07 14:14:17 WARN util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
12/03/07 14:14:17 WARN util.NativeCodeLoader: java.library.path=/apache/hbase/bin/../lib/native/Linux-amd64-64
12/03/07 14:14:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/07 14:14:17 INFO compress.CodecPool: Got brand-new compressor [.deflate]
12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-5.comp, syncFs=true, hflush=true
Exception in thread "main" java.io.IOException: sea-lab-5.decomp, entryStart=124, pos=1406386, end=119487372, edit=0
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:275)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:231)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:200)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:93)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:59)
Caused by: java.io.IOException: //0 read 36 bytes, should read 22
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:229)
	... 3 more
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229294#comment-13229294 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. we don't compress values yet.
Looks like we have something to do in V2 :-)
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Kannan Muthukkaruppan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148679#comment-13148679 ] 

Kannan Muthukkaruppan commented on HBASE-4608:
----------------------------------------------

Li wrote: <<< The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. >>> 

Compression potentially adds some time, but yes, you save elsewhere in the amount of work DFS has to do. I am curious what kind of improvement you are seeing with your changes. Without "sync" (deferred log flushing) the win might be even greater. Could you share some numbers with and without "sync"?


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197591#comment-13197591 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-01 02:50:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>
bq.  >
bq.  >     If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.
bq.  >     This implies that readFile() would be called multiple times for a single file.

That's beside the point. Using a queue here is just silly. Reading a file should probably be a different interface altogether rather than writing to a queue, i.e. it should be a pull interface, not a push.
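
To make the pull-versus-push distinction concrete, here is a minimal sketch of a pull-style reader. All names are hypothetical and the entries are plain strings standing in for log entries. The caller asks for the next entry when it is ready for it, so there is no queue to fill up and no need to re-enter the read method.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a pull-style reader; strings stand in for log entries. */
class PullReaderSketch {
  /** The consumer drives the loop by asking for one entry at a time. */
  interface EntryReader {
    /** Returns the next entry, or null when the input is exhausted. */
    String next();
  }

  /** Builds a reader over an in-memory list, standing in for a file. */
  static EntryReader readerOver(List<String> entries) {
    Iterator<String> iterator = entries.iterator();
    return () -> iterator.hasNext() ? iterator.next() : null;
  }

  /** Reads every entry, transforms it, and collects the results. */
  static List<String> transform(EntryReader reader) {
    List<String> output = new ArrayList<>();
    for (String entry = reader.next(); entry != null; entry = reader.next()) {
      output.add(entry.toUpperCase()); // placeholder for compress or decompress
    }
    return output;
  }
}
```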

I also mentioned to Li offline that it would make sense to add a metadata header to the HLog sequencefiles which indicates that they're compressed. In that case, this code could just use the existing log reader code and log writer code, but vary the output between compressed/uncompressed using the configuration flag.


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4736
-----------------------------------------------------------




                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509633/4608v7.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/677//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/677//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/677//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229663#comment-13229663 ] 

stack commented on HBASE-4608:
------------------------------

Li asked me to lzma some of my logs from the wild.  I did.  W/ lzma --best, they compress down to 12% of their original size.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228042#comment-13228042 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Since WAL compression may be off for the new HLog file version, we would always consult the compression type metadata when reading an HLog file.
WAL_VERSION is written but is not needed when reading the HLog.
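
A minimal sketch of the reader-side check described above, assuming the file header exposes its metadata as a simple string map. The key names, the "dictionary" value, and the class WalHeaderCheck are hypothetical and chosen for illustration; they are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class WalHeaderCheck {
  // Hypothetical metadata keys and value, for illustration only.
  static final String VERSION_KEY = "version";
  static final String COMPRESSION_TYPE_KEY = "compression.type";
  static final String DICTIONARY = "dictionary";

  /** Returns true only when the header explicitly names dictionary compression. */
  static boolean isCompressed(Map<String, String> metadata) {
    return DICTIONARY.equals(metadata.get(COMPRESSION_TYPE_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> header = new HashMap<>();
    header.put(VERSION_KEY, "1"); // new file format, compression left off
    System.out.println("compressed? " + isCompressed(header));

    header.put(COMPRESSION_TYPE_KEY, DICTIONARY); // compression switched on
    System.out.println("compressed? " + isCompressed(header));
  }
}
```

In this sketch the version entry plays no part in the decision; the reader looks only at the compression type entry, which is the behaviour the comment describes.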
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Attachment: 4608v17.txt

Patch v17 from https://reviews.apache.org/r/4185/
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208236#comment-13208236 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-15 05:23:04, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>
bq.  >
bq.  >     FileSystem has the following methods:
bq.  >     
bq.  >       /** Returns the configured filesystem implementation.*/
bq.  >       public static FileSystem get(Configuration conf) throws IOException {
bq.  >     
bq.  >       public static FileSystem get(URI uri, Configuration conf) throws IOException {
bq.  >     
bq.  >     I think the second get() should allow you to read HLog on hdfs

See my earlier comment on this review: path.getFileSystem(conf) is what you want to use.


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5113
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220394#comment-13220394 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

The reason we need to decrement HLogKey.VERSION is that HBASE-2195 (which introduced HLogKey.VERSION starting at -1) went into 0.92
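
A rough sketch of why a decrementing, negative version can work, assuming the version is written ahead of a non-negative length field so that a reader can tell versioned keys from legacy ones. The thread only states that the version starts at -1 and must be decremented; the constant names, the value -2, and the fixed-width int encoding here are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeyVersionSketch {
  static final int UNVERSIONED = 0; // legacy keys carry no version field
  static final int INITIAL = -1;    // starting value mentioned in the comment
  static final int COMPRESSED = -2; // hypothetical value after one decrement

  /** Writes an optional version, then the payload length, then the payload. */
  static byte[] write(int version, byte[] payload) {
    boolean versioned = version != UNVERSIONED;
    ByteBuffer buf = ByteBuffer.allocate((versioned ? 4 : 0) + 4 + payload.length);
    if (versioned) {
      buf.putInt(version);
    }
    buf.putInt(payload.length);
    buf.put(payload);
    return buf.array();
  }

  /** A negative first int is a version; anything else is a legacy length. */
  static int readVersion(byte[] bytes) {
    int first = ByteBuffer.wrap(bytes).getInt();
    return first < 0 ? first : UNVERSIONED;
  }

  public static void main(String[] args) {
    byte[] row = "row-1".getBytes(StandardCharsets.UTF_8);
    System.out.println(readVersion(write(COMPRESSED, row)));
    System.out.println(readVersion(write(UNVERSIONED, row)));
  }
}
```

Because a length is never negative, each new layout can take the next lower number without being mistaken for a legacy record.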
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131955#comment-13131955 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

oops, on the write side you'd also add it to the dict after writing the literal.
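
The write-side step mentioned here can be sketched as follows. This is an illustration under assumptions, not the patch's code: the class DictCodecSketch, the one-byte tags, and the unbounded dictionary are all hypothetical. The point it shows is that the writer records a value in its dictionary right after emitting the literal, and the reader does the same after reading it, so the indexes stay aligned.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictCodecSketch {
  static final byte LITERAL = 0;
  static final byte REFERENCE = 1;

  // Writer state maps value -> index; reader state maps index -> value.
  private final Map<String, Integer> writeDict = new HashMap<>();
  private final List<String> readDict = new ArrayList<>();

  void write(ByteBuffer out, String value) {
    Integer idx = writeDict.get(value);
    if (idx != null) {
      out.put(REFERENCE).putShort(idx.shortValue());
      return;
    }
    byte[] raw = value.getBytes(StandardCharsets.UTF_8);
    out.put(LITERAL).putShort((short) raw.length).put(raw);
    // The step the comment calls out: add to the dictionary after writing the literal.
    writeDict.put(value, writeDict.size());
  }

  String read(ByteBuffer in) {
    if (in.get() == REFERENCE) {
      return readDict.get(in.getShort());
    }
    byte[] raw = new byte[in.getShort()];
    in.get(raw);
    String value = new String(raw, StandardCharsets.UTF_8);
    readDict.add(value); // mirror the writer so indexes line up
    return value;
  }

  public static void main(String[] args) {
    DictCodecSketch writer = new DictCodecSketch();
    DictCodecSketch reader = new DictCodecSketch();
    ByteBuffer buf = ByteBuffer.allocate(64);
    writer.write(buf, "usertable"); // literal, then remembered
    writer.write(buf, "usertable"); // short reference
    buf.flip();
    System.out.println(reader.read(buf) + " " + reader.read(buf));
  }
}
```

The sketch never evicts entries and uses a short index, so it only holds up for small dictionaries; a bounded dictionary would need the same eviction rule on both sides.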
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228971#comment-13228971 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

In HLogKey.java:
{code}
+   * Enables compression.
+   * 
+   * @param tableDict
+   *          dictionary used for compressing table
+   * @param regionDict
+   *          dictionary used for compressing region
+   */
+  public void setCompressionContext(CompressionContext compressionContext) {
{code}
Please adjust the javadoc above.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227969#comment-13227969 ] 

stack commented on HBASE-4608:
------------------------------

bq. I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature.

How does it get in if you don't add it?

If you don't want to add it, just don't.  I'm not going to +1 this patch though if it adds metadata about a new compression feature w/o introducing a general versioning on the WAL.

bq. Should the Compression class in wal package ...

The compression class in wal is Compressor.java.

I have trouble following your responses to my comments because they come in w/o context and also because they are done piecemeal, which means I have to spend way more time than I should have to reviewing your stuff.  I'd suggest you save up your comments and submit them in a lump rather than hit submit per comment; you'll use up less internet.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v29.txt

Patch that addresses Ted and Lars' last set of comments (diff between v28 and v29 is just extra comments and javadoc)
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190189#comment-13190189 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4508
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10102>

    '/less' should be removed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10103>

    javadoc needs update.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10104>

    Either remove the word 'a' or change it into 'an'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10105>

    Please change ourKV to keyval or something similar.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10106>

    Update javadoc to match the context parameter.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment10107>

    I think adding 'the effect of compression would be good' at the end would make the sentence more easily understandable.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment10112>

    Remove whitespace.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment10113>

    This javadoc is more suitable for the init() method.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment10114>

    Please include e in new IOE.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
<https://reviews.apache.org/r/2740/#comment10111>

    Please include e in the new IOE.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment10108>

    Please remove year.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment10109>

    Please put this line at the end of line 34.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment10110>

    'ad' should be 'add'


- Ted
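
For the two "include e in the new IOE" comments above, a small sketch of what passing the caught exception along looks like. The class and method names are hypothetical; the only API relied on is the standard IOException(String, Throwable) constructor.

```java
import java.io.IOException;

final class WrapCauseSketch {
  /** Wraps a lower-level failure while keeping it as the cause. */
  static IOException wrap(Exception e) {
    // Passing e as the second argument preserves the original stack trace.
    return new IOException("Cannot initialize log reader", e);
  }

  public static void main(String[] args) {
    Exception original = new IllegalStateException("bad header");
    IOException wrapped = wrap(original);
    System.out.println(wrapped.getCause() == original);
  }
}
```

Without the cause argument, whoever reads the log sees only the wrapper's message and loses the location of the underlying failure.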


On 2012-01-13 01:37:35, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 01:37:35)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175314#comment-13175314 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 73
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line73>
bq.  >
bq.  >     What if you have a hash collision?
bq.  >     You now overwrite the old value that just happens to have the same hash code. Is that OK?

I overwrite the old value. As long as we do it for both reads and writes, that's okay! (The state of the dictionary must be consistent.)


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 82
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line82>
bq.  >
bq.  >     Here too; what happens for hash collisions?

The old value would have been evicted by the latest value.
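
The collision behaviour discussed in the two replies above can be sketched like this. The thread does not show how SimpleDictionary is laid out, so the slot array indexed by hash code, the tags, and the class name are assumptions. What the sketch shows is that overwriting on collision is safe as long as reader and writer apply the same rule in the same order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HashSlotDictSketch {
  private final String[] slots;

  HashSlotDictSketch(int size) {
    slots = new String[size];
  }

  private int slotOf(String value) {
    return Math.floorMod(value.hashCode(), slots.length);
  }

  void write(ByteBuffer out, String value) {
    int slot = slotOf(value);
    if (value.equals(slots[slot])) {
      out.put((byte) 1).putShort((short) slot); // hit: emit the slot number
      return;
    }
    byte[] raw = value.getBytes(StandardCharsets.UTF_8);
    out.put((byte) 0).putShort((short) raw.length).put(raw);
    slots[slot] = value; // miss or collision: the newest value evicts the old one
  }

  String read(ByteBuffer in) {
    if (in.get() == 1) {
      return slots[in.getShort()];
    }
    byte[] raw = new byte[in.getShort()];
    in.get(raw);
    String value = new String(raw, StandardCharsets.UTF_8);
    slots[slotOf(value)] = value; // same eviction rule as the writer
    return value;
  }

  public static void main(String[] args) {
    // A single slot forces every distinct value to collide.
    HashSlotDictSketch writer = new HashSlotDictSketch(1);
    HashSlotDictSketch reader = new HashSlotDictSketch(1);
    ByteBuffer buf = ByteBuffer.allocate(64);
    for (String v : new String[] {"a", "b", "a", "a"}) {
      writer.write(buf, v);
    }
    buf.flip();
    while (buf.hasRemaining()) {
      System.out.println(reader.read(buf));
    }
  }
}
```

A collision costs compression (the evicted value has to be written as a literal again) but not correctness.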


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java, line 84
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65775#file65775line84>
bq.  >
bq.  >     I assume you'll test with/without compression.

I'm gonna write better tests; this is just sort of a hackish way to make it work.


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 130
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65774#file65774line130>
bq.  >
bq.  >     Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.

Didn't want to create a new KeyValue, or modify it, rather - thus the CompressedKeyValue thing.

I can refactor this.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
-----------------------------------------------------------


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222763#comment-13222763 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I got permission from Pi to complete this feature since he is busy with course work.

I created new review request:
https://reviews.apache.org/r/4185/
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208763#comment-13208763 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

@Li:
Do you have time to address Todd and Liying's comments?

Thanks
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192604#comment-13192604 ] 

Nicolas Spiegelberg commented on HBASE-4608:
--------------------------------------------

I think, if we want to avoid scanning the entire log and seek as an optimization, we should put more effort into rolling logs at a lower size threshold and having log GC be size-based and get rid of (or greatly raise) the file-count-based pressure.

In production, the major bottleneck for us in log replay (after distributed log splitting) has been IO.  We normally don't max out CPU.  Anything we can do to reduce IO size at the expense of CPU would help reduce replay time.

As an aside, do we currently compress the output of our log split?  Having the resulting per-region logs in LZO or GZ format will decrease our replay time, perhaps more than this optimization will.  That said, this feature is very useful; I just want to make sure that we're not missing less cool but potentially more beneficial optimizations.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145924#comment-13145924 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2011-11-07 23:12:37.111204)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197529#comment-13197529 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4736
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10469>

    If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.
    This implies that readFile() would be called multiple times for a single file.
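
A minimal sketch of the timed offer() mentioned in the comment above, assuming a bounded queue of byte arrays. The class name BoundedOfferSketch, the method tryEnqueue and the element type are hypothetical; they are not taken from Compressor.java in the patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not code from the patch under review.
class BoundedOfferSketch {
  private final ArrayBlockingQueue<byte[]> queue;

  BoundedOfferSketch(int capacity) {
    this.queue = new ArrayBlockingQueue<byte[]>(capacity);
  }

  /**
   * Tries to enqueue for up to timeoutMs milliseconds.
   * Returns false if the queue stayed full for the whole wait
   * (or the thread was interrupted), so the caller can tell the
   * queue is full instead of blocking indefinitely.
   */
  boolean tryEnqueue(byte[] item, long timeoutMs) {
    try {
      return queue.offer(item, timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  int size() {
    return queue.size();
  }
}
```

A producer that gets false back can decide to stop reading, retry, or report the condition, which is the signal the comment is asking for.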


- Ted


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229277#comment-13229277 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. HLogKey does not need to know about 'type' of compression.
I agree. But see this code:
{code}
+      Compressor.writeCompressed(this.encodedRegionName, 0,
+          this.encodedRegionName.length, out,
+          compressionContext.regionDict);
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178087#comment-13178087 ] 

Li Pi commented on HBASE-4608:
------------------------------

Guava's MapMaker doesn't guarantee consistent eviction. You'd want to either use two LinkedHashMaps or your own LRU-style system.
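
A minimal sketch of deterministic LRU eviction built on java.util.LinkedHashMap, as one way to read the suggestion above. It shows a single map (one lookup direction) only; the class name LruSketch is hypothetical and this is not the LRUDictionary from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the LRUDictionary from the patch.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int capacity;

  LruSketch(int capacity) {
    // accessOrder = true: iteration runs from least to most recently used.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Invoked after each insertion; drops the least recently used entry
    // once the map grows past its capacity.
    return size() > capacity;
  }
}
```

Because eviction depends only on the sequence of get and put calls, a writer and a reader that replay the same sequence evict the same entries, which is the consistency property a shared dictionary needs.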
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230914#comment-13230914 ] 

Hudson commented on HBASE-4608:
-------------------------------

Integrated in HBase-TRUNK-security #139 (See [https://builds.apache.org/job/HBase-TRUNK-security/139/])
    HBASE-4608 HLog Compression (Revision 1301165)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226532#comment-13226532 ] 

stack commented on HBASE-4608:
------------------------------

bq. The above method allows to start computation at specified offset while existing hashCode() doesn't have this parameter.

It should at least have the same name as the other two methods that do the same (pity WritableComparator.hashBytes w/ start offset doesn't exist).
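
For illustration only, a hash that starts at a given offset could look like the sketch below. The class and method names are hypothetical, and the 31-multiplier scheme is an assumption chosen for the example, not something confirmed by the patch.

```java
// Hypothetical sketch of an offset-aware byte hash; not code from the patch.
class OffsetHashSketch {
  /** Hashes bytes[offset .. offset + length) with a 31-multiplier rolling hash. */
  static int hashCode(byte[] bytes, int offset, int length) {
    int hash = 1;
    for (int i = offset; i < offset + length; i++) {
      hash = (31 * hash) + (int) bytes[i];
    }
    return hash;
  }

  /** Overload with the same name that hashes from the start of the array. */
  static int hashCode(byte[] bytes, int length) {
    return hashCode(bytes, 0, length);
  }
}
```

Keeping the offset variant as an overload of the same name means callers see one family of methods rather than a differently named one-off.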

bq. Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a convenient way for doing above.

When you create a writer on a SequenceFile, you can pass metadata: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Metadata.html

bq. For HLogKey, can we designate version of -2 for representing compressed HLogKey ? If HLogKey isn't compressed, we write -1.

I don't know what this is in response to.

What about my other items?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229032#comment-13229032 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

@Stack: fair enough. Let's get this one done. +1 on generalization only when needed and in another jira.

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228157#comment-13228157 ] 

stack commented on HBASE-4608:
------------------------------

Stop making this more complicated than it need be Ted.

WAL_VERSION is global version on WAL log.

Adding a type metadata field for compression makes sense.  If none, presume uncompressed.

You don't need a compression type version.  If we change the format, we can do PREFIX_COMPRESSION_V2.
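
The metadata-field idea above can be sketched roughly as follows. A plain Map stands in for the file's metadata, and the key name, the enum and its PREFIX_COMPRESSION constant are all hypothetical; only PREFIX_COMPRESSION_V2 is a name that appears in this thread.

```java
import java.util.Map;

// Hypothetical sketch; the key and the enum are illustrative assumptions.
class WalCompressionTypeSketch {
  static final String COMPRESSION_TYPE_KEY = "wal.compression.type";

  // A changed format gets a new type name rather than a separate version number.
  enum Type { NONE, PREFIX_COMPRESSION, PREFIX_COMPRESSION_V2 }

  /** Returns the type recorded in the metadata; a missing field means uncompressed. */
  static Type typeOf(Map<String, String> metadata) {
    String value = metadata.get(COMPRESSION_TYPE_KEY);
    // Type.valueOf throws IllegalArgumentException for a name this reader does not know.
    return value == null ? Type.NONE : Type.valueOf(value);
  }
}
```

Under this scheme old files that carry no field at all are still read as uncompressed, so no migration step is needed for them.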

HLogKeys are serialized independent of their container.  Don't conflate their versioning w/ the suggested WAL log versioning.

Regarding PE: its data is not amenable to compression.  Its keys are very basic.  It's likely not a good test for evaluating the viability of this feature.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228196#comment-13228196 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

From HFileBlock:
{code}
  int getMinorVersion() {
    return this.minorVersion;
  }
{code}
From HFileReaderV2.java:
{code}
  private void validateMinorVersion(Path path, int minorVersion) {
    if (minorVersion < MIN_MINOR_VERSION ||
        minorVersion > MAX_MINOR_VERSION) {
{code}
I think compression type versioning would allow us to perform migration with ease in the future.

PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Attachment: 4608-v20.txt

Uploaded patch v20 onto review board.

keyContext is used by HLogKey to compress its region name and table name.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145950#comment-13145950 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
-----------------------------------------------------------


Cool stuff.

I am probably just missing something... But when is the dictionary itself stored? Don't we need it to read the logs back?

Just so I understand: we build up the dictionary as we go along. In the beginning most things won't be in the dictionary, so we write them out and add them to the dict, and from then on, when we encounter them again, we just write the index.
On the read side we could also build up the dict as we go along, because values that weren't in the dictionary were written into the file, so we can recreate the dictionary as we read. Right?

(As I said, I am probably missing something).
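
A minimal sketch of the build-as-you-go scheme described above, under simplifying assumptions: String values, an unbounded dictionary with no eviction, and a one-byte marker in front of each entry. The class name, the marker values and the on-disk layout are hypothetical and are not the format used by the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the format or classes from the patch.
class DictRoundTripSketch {
  private static final byte LITERAL = 0; // value follows in full
  private static final byte INDEXED = 1; // dictionary index follows

  static byte[] write(List<String> values) {
    Map<String, Integer> dict = new HashMap<String, Integer>();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      DataOutputStream out = new DataOutputStream(bytes);
      for (String v : values) {
        Integer idx = dict.get(v);
        if (idx == null) {
          // First sighting: write the value itself and remember its index.
          out.writeByte(LITERAL);
          out.writeUTF(v);
          dict.put(v, dict.size());
        } else {
          // Seen before: write only the index.
          out.writeByte(INDEXED);
          out.writeInt(idx);
        }
      }
      out.flush();
    } catch (IOException e) {
      throw new IllegalStateException(e); // in-memory streams should not fail
    }
    return bytes.toByteArray();
  }

  static List<String> read(byte[] data, int count) {
    List<String> dict = new ArrayList<String>(); // rebuilt while reading
    List<String> result = new ArrayList<String>();
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      for (int i = 0; i < count; i++) {
        if (in.readByte() == LITERAL) {
          String v = in.readUTF();
          dict.add(v); // gets the same index the writer assigned
          result.add(v);
        } else {
          result.add(dict.get(in.readInt()));
        }
      }
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return result;
  }
}
```

The dictionary is never stored separately: the reader assigns indices in the same order as the writer, so the literals in the stream are enough to rebuild it.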

See minor comments inline.


src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2740/#comment6899>

    This is functionally the same as before, but less readable. I don't think this leads to much performance improvement.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment6900>

    I think we leave out the line with the year now.
    Lots of leading whitespace and weird indentation in this file.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment6901>

    Passing 0 here? I might be missing something, but looking down at readCompressed, that looks wrong.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
<https://reviews.apache.org/r/2740/#comment6902>

    Could we have a no-op compressor instead?


- Lars


On 2011-11-07 23:12:37, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-07 23:12:37)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229054#comment-13229054 ] 

stack commented on HBASE-4608:
------------------------------

I like your changes Todd.  Nice fixup.  Lars, let me post v28 for you up on rb.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229292#comment-13229292 ] 

Li Pi commented on HBASE-4608:
------------------------------

Not that we'd compress a random value well at all anyways.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192613#comment-13192613 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

Nope, we don't currently compress the log-split output. Good idea, Nicolas. We can use both compression mechanisms there - LZO/Snappy on top of the dictionary compression should be very good. The dictionary compression alone will be a big improvement there, though, since we'll save len(region key) bytes per edit guaranteed.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Attachment: 4608-v22.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Assigned] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu reassigned HBASE-4608:
---------------------------------

    Assignee: Zhihong Yu  (was: Li Pi)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Zhihong Yu
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190573#comment-13190573 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

It occurred to me yesterday that we should clear the dictionaries after each successful memstore flush...?
Otherwise we might have to go further back in the log than necessary in order to replay.

I realize memstore flushes are per region, whereas the WAL is per region server; still, it seems prudent to reset the dictionary after each flush. Thoughts?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185377#comment-13185377 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-13 01:34:31.569679)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

fixed bug in dictionary causing another test to fail. passes small tests now. running med tests.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145984#comment-13145984 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > Cool stuff.
bq.  > 
bq.  > I am probably just missing something... But when is the dictionary itself stored? Don't we need it to read the logs back?
bq.  > 
bq.  > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.
bq.  > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?
bq.  > 
bq.  > (As I said, I am probably missing something).
bq.  > 
bq.  > See minor comments inline.
bq.  
bq.  Li Pi wrote:
bq.      You aren't missing anything! That's exactly how it works.
bq.      
bq.      Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.

Ok... What I cannot find then, is the code that builds the dictionary during read :)

Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.


bq.  On 2011-11-07 23:39:59, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
bq.  > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line157>
bq.  >
bq.  >     Could we have a no-op compressor instead?
bq.  
bq.  Li Pi wrote:
bq.      no-op compressor? as in one that does nothing?

Yep... So compression will never be null, and we can save if-statements (and make the code more readable) :)
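
To illustrate the no-op suggestion, here is a minimal sketch of the null-object pattern. It is applied to a dictionary rather than to the patch's actual compressor class, and every name in it (`SketchDict`, `NoOpDict`, `ListDict`, `FieldEncoderSketch`) is hypothetical and not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal dictionary contract (not the patch's interface). */
interface SketchDict {
  int findEntry(byte[] data); // returns -1 when the entry is absent
  void addEntry(byte[] data);
}

/** Null object: never finds and never stores anything. */
final class NoOpDict implements SketchDict {
  @Override public int findEntry(byte[] data) { return -1; }
  @Override public void addEntry(byte[] data) { }
}

/** Naive list-backed dictionary, only here to contrast with NoOpDict. */
final class ListDict implements SketchDict {
  private final List<byte[]> entries = new ArrayList<>();

  @Override public int findEntry(byte[] data) {
    for (int i = 0; i < entries.size(); i++) {
      if (Arrays.equals(entries.get(i), data)) return i;
    }
    return -1;
  }

  @Override public void addEntry(byte[] data) { entries.add(data.clone()); }
}

class FieldEncoderSketch {
  private final SketchDict dict; // never null, so no "is compression on?" checks

  FieldEncoderSketch(SketchDict dict) { this.dict = dict; }

  /** Returns "ref:<index>" for a dictionary hit, "raw:<field>" otherwise. */
  String encode(String field) {
    byte[] data = field.getBytes(StandardCharsets.UTF_8);
    int idx = dict.findEntry(data);
    if (idx >= 0) return "ref:" + idx;
    dict.addEntry(data);
    return "raw:" + field;
  }
}
```

With `NoOpDict` every field takes the raw path, so the encoder needs no branch for the case where compression is disabled; swapping in `ListDict` turns repeats into references without touching `encode`.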


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3093
-----------------------------------------------------------


On 2011-11-07 23:12:37, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-07 23:12:37)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229877#comment-13229877 ] 

stack commented on HBASE-4608:
------------------------------

@Ted Would suggest that in future you not post your reviews piecemeal.  Bulk them up.  When review comes in in dribs and drabs, the whole process takes way longer.

@Lars "What portion of the WAL storage do the current WALs represent?"

Do you mean, how much of our footprint is comprised of WAL logs?  Not sure.  I thought intent of this issue was to speed syncs because there'd be less bytes to shuttle across the datanode replicas pipeline.

I'm now wondering if this patch is worth adding.  If compressible stuff is only shrinking by half, is that a big enough win?  What do you lot think?  LZMA is not viable because it takes forever to compress, though it's turning SU WALs into 11-14% of their original size.

Let me try adding lzo numbers, but we wouldn't want to use lzo anyway because we could lose a bunch of edits off the end if the compression block was not closed off (That's my understanding.  I could be wrong).

Li, what happens if we cut the end off a dictionary-compressed file?  Will we be able to read up to the last byte or word or so?

Good stuff.
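
One way to reason about the truncation question above is a toy model. The sketch below assumes a made-up record format (a marker byte, then either a length-prefixed literal or a dictionary index); this is not the patch's on-disk format, and `TruncationSketch` is a hypothetical name. Under that assumption, every record that was completely written before the cut can still be decoded, because the reader rebuilds the dictionary from the literals it has already seen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TruncationSketch {
  /** First sighting: [0x00][len][bytes]. Repeat: [0x01][dictionary index]. */
  static byte[] encode(List<String> values) throws IOException {
    List<String> dict = new ArrayList<>();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    for (String v : values) {
      int idx = dict.indexOf(v);
      if (idx < 0) {
        byte[] raw = v.getBytes(StandardCharsets.UTF_8);
        out.writeByte(0);
        out.writeInt(raw.length);
        out.write(raw);
        dict.add(v);
      } else {
        out.writeByte(1);
        out.writeInt(idx);
      }
    }
    return buf.toByteArray();
  }

  /** Decodes only the first `length` bytes, as if the file had been cut there. */
  static List<String> decodePrefix(byte[] bytes, int length) throws IOException {
    List<String> dict = new ArrayList<>();
    List<String> result = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes, 0, length));
    try {
      while (true) {
        int marker = in.read();
        if (marker < 0) break; // clean end on a record boundary
        String v;
        if (marker == 0) {
          byte[] raw = new byte[in.readInt()];
          in.readFully(raw);
          v = new String(raw, StandardCharsets.UTF_8);
          dict.add(v); // the reader rebuilds the dictionary as it goes
        } else {
          v = dict.get(in.readInt());
        }
        result.add(v);
      }
    } catch (EOFException e) {
      // Torn final record: keep everything decoded before it.
    }
    return result;
  }
}
```

In this model a cut in the middle of a record loses only that record. A block compressor that buffers many edits before emitting output behaves differently, which is the concern raised about lzo above.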
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208358#comment-13208358 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514600/4608v13.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/966//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178078#comment-13178078 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

@Li:
Hadoop QA is taking a vacation :-) See https://builds.apache.org/job/PreCommit-HBASE-Build/

I ran patch v5 on Linux and didn't observe any notable issues. But then you have a new patch.
Please try to run your latest patch through test suite.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229453#comment-13229453 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>
bq.  >
bq.  >     I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
bq.  >     I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.
bq.  
bq.  Li Pi wrote:
bq.      65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.
bq.      
bq.      If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

Actually halve those amounts, 2^15, not 2^16.
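
The arithmetic in this exchange can be checked with a short sketch. `DictMemoryEstimate` and its parameter names are hypothetical; the estimate counts only the entry payload bytes and ignores any per-entry overhead in the JVM or in the dictionary's own data structures, so it is a lower bound on the worst case rather than a measurement.

```java
class DictMemoryEstimate {
  /** Payload bytes if every slot in every dictionary is full. */
  static long worstCaseBytes(int entriesPerDict, int dictCount, int bytesPerEntry) {
    return (long) entriesPerDict * dictCount * bytesPerEntry;
  }

  public static void main(String[] args) {
    int dicts = 5; // region name, row key, CF, column qualifier, table
    System.out.println(worstCaseBytes(1 << 16, dicts, 100));  // figure quoted first
    System.out.println(worstCaseBytes(1 << 15, dicts, 100));  // after the 2^15 correction
    System.out.println(worstCaseBytes(1 << 15, dicts, 1024)); // 1 KB entries, corrected
  }
}
```

With 2^15 entries per dictionary, the payload comes to 16,384,000 bytes for 100-byte entries and 167,772,160 bytes (160 MiB) for 1 KB entries, which is half of the figures first quoted.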


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
-----------------------------------------------------------


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226784#comment-13226784 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517839/4608-v22.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -121 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131940#comment-13131940 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

One quick sketch of how this might work:

{code}
interface CompressionDictionary {
  public byte[] getEntry(int idx);
  public int findEntry(byte[] data);
  public int addEntry(byte[] data);
}
{code}

while writing:
start each HLog with an empty CompressionDictionary:

{code}
void writeString(byte[] data) {
  int dictIdx = dict.findEntry(data);
  if (dictIdx == -1) {
    // not in dict
    writeByte(0x00);
    WritableUtils.writeString(data); // current implementation
    dict.addEntry(data); // keep the writer's dictionary in step with the reader's
  } else {
    writeInt((1 << 31) | dictIdx);
  }
}
{code}

while reading:
{code}
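byte[] readString(DataInputStream in) {
  in.mark(4); // a dictionary reference is a 4-byte int
  int firstbyte = in.read();
  if ((firstbyte & 0x80) != 0) { // high bit of the first byte marks a dictionary reference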
    in.reset();
    int dictidx = in.readInt() & ~(1 << 31);
    return dict.getEntry(dictidx);
  } else {
    assert firstbyte == 0;
    byte[] ret = WritableUtils.readString();
    dict.addEntry(ret);
    return ret;
  }
}
{code}

then the dictionary could be implemented as a fixed size associative hash... maybe a cuckoo hash or something exotic (they're on my mind since reading the SILT paper last week)
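
To make the round trip concrete, here is a self-contained Java sketch of the scheme above. It is an illustration under assumptions, not the patch: SimpleDict and DictCodecSketch are hypothetical names, the dictionary is unbounded rather than a fixed size hash, and literals use a plain length prefix instead of WritableUtils. The writer adds every literal to its own dictionary so that both sides assign the same indices.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical unbounded dictionary; a real one would be fixed size with eviction. */
class SimpleDict {
  private final List<byte[]> entries = new ArrayList<>();
  private final Map<String, Integer> index = new HashMap<>();

  // ISO-8859-1 maps each byte to exactly one char, so this key is lossless.
  private static String key(byte[] data) {
    return new String(data, StandardCharsets.ISO_8859_1);
  }

  byte[] getEntry(int idx) { return entries.get(idx); }

  int findEntry(byte[] data) {
    Integer idx = index.get(key(data));
    return idx == null ? -1 : idx;
  }

  int addEntry(byte[] data) {
    entries.add(data);
    index.put(key(data), entries.size() - 1);
    return entries.size() - 1;
  }
}

class DictCodecSketch {
  static void write(DataOutputStream out, SimpleDict dict, byte[] data) throws IOException {
    int idx = dict.findEntry(data);
    if (idx == -1) {
      out.writeByte(0x00);            // marker byte: a literal follows
      out.writeInt(data.length);
      out.write(data);
      dict.addEntry(data);            // mirror what the reader will do
    } else {
      out.writeInt((1 << 31) | idx);  // high bit set: dictionary reference
    }
  }

  static byte[] read(DataInputStream in, SimpleDict dict) throws IOException {
    int first = in.readUnsignedByte();
    if ((first & 0x80) != 0) {
      // Rebuild the big-endian int from the low 7 bits plus the next 3 bytes.
      int idx = ((first & 0x7F) << 24) | (in.readUnsignedByte() << 16)
          | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
      return dict.getEntry(idx);
    }
    byte[] data = new byte[in.readInt()];
    in.readFully(data);
    dict.addEntry(data);
    return data;
  }

  static byte[] encode(String... values) {
    SimpleDict dict = new SimpleDict(); // each log starts with an empty dictionary
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      for (String v : values) {
        write(out, dict, v.getBytes(StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  static List<String> decode(byte[] encoded, int count) {
    SimpleDict dict = new SimpleDict();
    List<String> values = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      for (int i = 0; i < count; i++) {
        values.add(new String(read(in, dict), StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return values;
  }
}
```

Encoding "usertable", "cf1", "usertable", "cf1" this way takes 30 bytes: 14 and 8 for the two literals, then 4 for each dictionary reference.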
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213110#comment-13213110 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5254
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11473>

    This comment should also be placed at the beginning of compressFile().



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11472>

    Typo: should be output.getFileSystem(outconf)


- Ted


On 2012-02-21 19:29:20, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-21 19:29:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228216#comment-13228216 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Since Li Pi has done 90% of the coding, I think this JIRA should bear his name at the time of integration.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: No :-()
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v23.txt

Renamed method enableCompression to setCompressionContext in all places.

Gave all instances of compression contexts the same name rather than a new name at every use.

Cleaned up the unused 'compression' data member flags, or made them local where only a single method used them.

Removed define of TRUE and repeat of ENABLE_WAL_COMPRESSION key from
SequenceFileLogReader.  No longer needed.

Rather than have the sequencefile metadata-making code sprinkled over the reader and writer, do it all in the writer and have the reader use the writer's methods.

Added a global WAL type as metadata.

Added a compression type to metadata.

Renamed method WALCompressionEnabled as isWALCompressionEnabled.

Added some small tests to TestLRUDictionary and a new TestCompressor that taught me how this stuff works.  Added documentation to methods where I was surprised; e.g. addEntry will happily add a new entry even though the dictionary already has that entry.

Miscellaneous cleanup.
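
As an illustration of the addEntry surprise noted above, here is a minimal sketch. AppendOnlyDictionary and findOrAdd are hypothetical names made up for this example, not the LRUDictionary from the patch. It only shows why a caller has to look an entry up before adding it when addEntry does no duplicate check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical dictionary whose addEntry never checks for an existing entry. */
class AppendOnlyDictionary {
  private final List<byte[]> entries = new ArrayList<>();

  /** Returns the index of the first matching entry, or -1 if absent. */
  int findEntry(byte[] data) {
    for (int i = 0; i < entries.size(); i++) {
      if (Arrays.equals(entries.get(i), data)) {
        return i;
      }
    }
    return -1;
  }

  /** Always appends, even when an equal entry is already present. */
  int addEntry(byte[] data) {
    entries.add(data.clone());
    return entries.size() - 1;
  }

  /** The pattern a caller needs if it wants each value stored only once. */
  int findOrAdd(byte[] data) {
    int idx = findEntry(data);
    return idx != -1 ? idx : addEntry(data);
  }

  int size() {
    return entries.size();
  }
}
```

Calling addEntry twice with the same bytes yields two entries, while findOrAdd returns the existing index on the second call.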

I ran this compression on one of our production logs and it more than halved its size.  See below.  I then decompressed and recompressed it and got the same compressed size back.

{code}
-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out
-rwxrwxrwx   1 stack  staff  64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out
-rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out
-rw-r--r--   1 stack  staff  64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339
{code}
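
The arithmetic behind these sizes can be checked with a few lines of Java. The class name WalSizes is made up for this sketch; the byte counts are copied from the listing above.

```java
/** Hypothetical helper that checks the file sizes from the listing above. */
class WalSizes {
  static final long ORIGINAL = 64928728L;     // sv4r25s8%3A60020.1331661889339
  static final long COMPRESSED = 28540761L;   // .out
  static final long DECOMPRESSED = 64945799L; // .out.out
  static final long RECOMPRESSED = 28540761L; // .out.out.out

  /** Compressed size as a fraction of the original size. */
  static double ratio(long compressed, long original) {
    return (double) compressed / original;
  }

  /** How many bytes the decompressed copy differs from the original. */
  static long roundTripGrowth() {
    return DECOMPRESSED - ORIGINAL;
  }

  public static void main(String[] args) {
    System.out.printf("compressed/original = %.3f%n", ratio(COMPRESSED, ORIGINAL));
    System.out.println("decompressed - original = " + roundTripGrowth() + " bytes");
    System.out.println("recompressed == compressed: " + (RECOMPRESSED == COMPRESSED));
  }
}
```

The compressed file comes to roughly 44% of the original, and the decompressed copy is 17071 bytes larger than the original log.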

Will run more of our production logs through the compressor this evening to see if I can turn up bugs.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229890#comment-13229890 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518416/4608v30.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 11 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestMetaScanner

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181790#comment-13181790 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-07 03:13:33.237858)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Assigned] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu reassigned HBASE-4608:
---------------------------------

    Assignee: Li Pi  (was: Zhihong Yu)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229893#comment-13229893 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

bq. I'm not wondering if this patch is worth adding? If compressible stuff is only shrinking by half, is that big enough win? What do you lot thing? LZMA is not viable because it takes for ever compressing though its turning SU WALs into 11-14% original size.

You mean you are *now* wondering? :) IMHO: The WAL is probably the greatest source of synchronous IO that we generate, cutting this in half seems quite valuable (maybe this will be less valuable in the future if/when HDFS can do parallel replication instead of chaining - but it is now).
I agree that none of the block based compression schemes would be good options... Was merely curious about HLog archiving, which is quite unrelated to this issue.

+1, let's commit this.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v7.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192024#comment-13192024 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 34
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line34>
bq.  >
bq.  >     '/less' should be removed.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 42
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line42>
bq.  >
bq.  >     javadoc needs update.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 43
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line43>
bq.  >
bq.  >     Either remove the word 'a' or change it into 'an'

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 78
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line78>
bq.  >
bq.  >     Please change ourKV to keyval or something similar.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 82
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line82>
bq.  >
bq.  >     Update javadoc to match the context parameter.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 94
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line94>
bq.  >
bq.  >     I think adding 'the effect of compression would be good' at the end would make the sentence more easily understandable.

fixed


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 60
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line60>
bq.  >
bq.  >     Remove whitespace.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 154
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line154>
bq.  >
bq.  >     This javadoc is more suitable for the init() method.

fixed.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 186
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line186>
bq.  >
bq.  >     Please include e in new IOE.

fixed. I assume you mean store it as the cause.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 93
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68711#file68711line93>
bq.  >
bq.  >     Please include e in the new IOE.

fixed above.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 2
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line2>
bq.  >
bq.  >     Please remove year.

fixed above.


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 35
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line35>
bq.  >
bq.  >     Please put this line at the end of line 34.

fixed


bq.  On 2012-01-20 22:56:07, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 53
bq.  > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line53>
bq.  >
bq.  >     'ad' should be 'add'

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4508
-----------------------------------------------------------


On 2012-01-13 01:37:35, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 01:37:35)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175315#comment-13175315 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 73
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line73>
bq.  >
bq.  >     What if you have a hash collision?
bq.  >     You now overwrite the old value that just happens to have the same hash code. Is that OK?
bq.  
bq.  Li Pi wrote:
bq.      I overwrite the old value. As long as we do it for both reads and writes, that's okay! (The state of the dictionary must be consistent).

I see, because read and write would do that in the same order.
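
The overwrite-on-collision argument above can be illustrated with a minimal sketch. It is not the SimpleDictionary from the patch: the class names, the slot layout and the hit/miss protocol are all assumptions made for illustration. The point it shows is that if the reader replays the same overwrites in the same order as the writer, both dictionaries stay identical even when entries collide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical hash-slot dictionary; not the SimpleDictionary from the patch. */
class SlotDictionary {
    private final byte[][] slots;

    SlotDictionary(int size) {
        slots = new byte[size][];
    }

    /** The slot depends only on the content hash, so colliding values share a slot. */
    int slotFor(byte[] value) {
        return Math.floorMod(Arrays.hashCode(value), slots.length);
    }

    /** Returns true if the value already occupies its slot; otherwise overwrites the slot. */
    boolean lookupOrOverwrite(byte[] value) {
        int slot = slotFor(value);
        boolean hit = Arrays.equals(slots[slot], value);
        if (!hit) {
            slots[slot] = value.clone();
        }
        return hit;
    }

    byte[] get(int slot) {
        return slots[slot];
    }
}

class SlotDictionaryDemo {
    /** Encodes with a writer dictionary and decodes with a separate reader dictionary. */
    static String[] roundTrip(String[] values, int size) {
        SlotDictionary writer = new SlotDictionary(size);
        SlotDictionary reader = new SlotDictionary(size);
        String[] decoded = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            byte[] raw = values[i].getBytes(StandardCharsets.UTF_8);
            int slot = writer.slotFor(raw);
            boolean hit = writer.lookupOrOverwrite(raw);
            byte[] out;
            if (hit) {
                // Only the slot number would go to the log; the reader resolves it.
                out = reader.get(slot);
            } else {
                // The raw bytes would go to the log; the reader mirrors the overwrite.
                reader.lookupOrOverwrite(raw);
                out = raw;
            }
            decoded[i] = new String(out, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}
```

Calling SlotDictionaryDemo.roundTrip with a dictionary size of 1 forces every value into the same slot, which is the worst case for collisions; the decoded values still match the input because both sides evict in lockstep.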


bq.  On 2011-12-23 06:34:53, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 130
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65774#file65774line130>
bq.  >
bq.  >     Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.
bq.  
bq.  Li Pi wrote:
bq.      Didn't want to create a new KeyValue, or modify it, rather - thus the CompressedKeyValue thing.
bq.      
bq.      I can refactor this.

That was just a general comment. I've quite often thought how nice our life would be if KeyValue were just an interface rather than a concrete class. Fixing that would be a huge PITA... Different jira :)


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4100
-----------------------------------------------------------


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v13.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208234#comment-13208234 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5113
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11203>

    FileSystem has the following methods:
    
      /** Returns the configured filesystem implementation.*/
      public static FileSystem get(Configuration conf) throws IOException {
    
      public static FileSystem get(URI uri, Configuration conf) throws IOException {
    
    I think the second get() should allow you to read HLog on hdfs


- Ted


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169114#comment-13169114 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

Are you still working on this Li?
I think this is an important feature to get into HBase, especially if we want to do log archival for backups and PIT restores.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212832#comment-13212832 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>
bq.  >
bq.  >     Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.
bq.  
bq.  Li Pi wrote:
bq.      This is gonna take a while. Since I'm currently relying on default Array.HashCode. Will need to use Bytes.HashCode and do a wrapper for insertion into the dictionary.

fixed.
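
A minimal sketch of the kind of wrapper discussed above, assuming nothing about what the actual fix looked like: ByteRange is a hypothetical name, and a plain HashMap stands in for the dictionary. It shows how a (byte[] buf, int off, int len) slice can be used as a map key with content-based hashCode and equals, so a lookup needs no copy of the row, family or qualifier bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map key over a slice of a byte[]; hashes and compares by content. */
final class ByteRange {
    private final byte[] buf;
    private final int off;
    private final int len;

    ByteRange(byte[] buf, int off, int len) {
        this.buf = buf;
        this.off = off;
        this.len = len;
    }

    @Override
    public int hashCode() {
        // Content-based hash, unlike the identity hash a bare byte[] would give.
        int h = 1;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + buf[i];
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteRange)) {
            return false;
        }
        ByteRange other = (ByteRange) o;
        if (len != other.len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[off + i] != other.buf[other.off + i]) {
                return false;
            }
        }
        return true;
    }
}

class ByteRangeDemo {
    /** Looks up two slices of one backing buffer without copying either of them. */
    static boolean sameEntry() {
        byte[] kv = "row1:cf:row1".getBytes(StandardCharsets.US_ASCII);
        Map<ByteRange, Short> dict = new HashMap<>();
        dict.put(new ByteRange(kv, 0, 4), (short) 0);      // first "row1"
        return dict.containsKey(new ByteRange(kv, 8, 4));  // second "row1"
    }
}
```

One caveat with this design: a key that is stored in the dictionary should own a private copy of its bytes, since the caller's buffer may be reused. Only the lookup path can safely avoid the copy.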


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
-----------------------------------------------------------


On 2012-02-21 19:29:20, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-21 19:29:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229697#comment-13229697 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 107
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line107>
bq.  >
bq.  >     Nit: Comment here that the status byte is the higher order byte of the dict entry.

done in next version


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108>
bq.  >
bq.  >     I assume we're entirely sure that a dictionary will never have > 2^15 entries.
bq.  
bq.  Li Pi wrote:
bq.      It'll start evicting once it hits its max size, which is currently 2 ^ 15.

Added comment to LRUDictionary on what happens when it hits limit as well as a comment on max expected size of dictionary for any one WAL.
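
To make the status-byte remark concrete, here is a minimal sketch of one way such an encoding could work. The class name, the sentinel value and the literal layout are assumptions for illustration, not taken from the patch. Because indices stay below 2^15, the high-order byte of a two-byte index is always in 0..127, which leaves a negative byte value free to mark "not in dictionary".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DictEntryCodec {
    /** Assumed sentinel: a status byte that no valid index can start with. */
    static final byte NOT_IN_DICTIONARY = -1;
    /** Indices stay below 2^15, so their high-order byte is always 0..127. */
    static final int MAX_ENTRIES = 1 << 15;

    static void writeIndex(DataOutputStream out, int index) throws IOException {
        if (index < 0 || index >= MAX_ENTRIES) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        out.writeShort(index); // big-endian: the high-order (status) byte comes first
    }

    static void writeLiteral(DataOutputStream out, byte[] data) throws IOException {
        out.writeByte(NOT_IN_DICTIONARY);
        out.writeInt(data.length);
        out.write(data);
    }

    /** Returns the dictionary index, or -1 after storing a literal in literal[0]. */
    static int read(DataInputStream in, byte[][] literal) throws IOException {
        byte status = in.readByte();
        if (status == NOT_IN_DICTIONARY) {
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            literal[0] = data;
            return -1;
        }
        // The status byte is the high-order byte; the next byte is the low-order byte.
        return ((status & 0xFF) << 8) | (in.readByte() & 0xFF);
    }

    /** Convenience for trying the codec without handling checked exceptions. */
    static int roundTripIndex(int index) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeIndex(new DataOutputStream(bytes), index);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return read(in, new byte[1][]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the dictionary were ever allowed to exceed 2^15 entries, an index could start with a byte that collides with the sentinel, which is why the size cap and the eviction behaviour matter for the format.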


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 128
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line128>
bq.  >
bq.  >     Nit: The naming convention is a bit strange.
bq.  >     This one is called uncompress... whereas the method returning a new byte[] is called readCompressed

It's not the worst.  It's descriptive, I think.


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1678
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92104#file92104line1678>
bq.  >
bq.  >     Have a constructor that takes a compression context too?
bq.  >     It seems like once anything has been written to the HLog this should be immutable.

That won't work for the write case, since WAL compression is internal to the wal package and the HLog.Entry used for writing is made outside of the HLog, which means that for the write case we need the above method.  It might work for the read side, though there we allow 'reuse' of the shell HLog.Entry, so we would need the above method on the read side too.


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>
bq.  >
bq.  >     COMPRESSED is a bit of a strange name.
bq.  >     It happens to be a version of the WAL that supports compression, but it is not necessarily compressed.

Added a comment that this enum value means 'The WAL version that first had compression'


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 303
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line303>
bq.  >
bq.  >     ugly whitespace :)

Fixed in next version.


bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>
bq.  >
bq.  >     I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
bq.  >     I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.
bq.  
bq.  Li Pi wrote:
bq.      65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.
bq.      
bq.      If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.
bq.  
bq.  Li Pi wrote:
bq.      Actually halve those amounts, 2^15, not 2^16.

Added above as class comment on class.
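
The worst-case estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up, and the figure counts key bytes only, ignoring per-object JVM overhead.

```java
class DictionaryMemoryEstimate {
    /** Payload bytes held by all dictionaries when every one of them is full. */
    static long worstCaseBytes(int entriesPerDictionary, int dictionaries, int bytesPerEntry) {
        return (long) entriesPerDictionary * dictionaries * bytesPerEntry;
    }

    public static void main(String[] args) {
        int entries = 1 << 15;  // corrected dictionary size from the thread (2^15, not 2^16)
        int dictionaries = 5;   // region name, row key, CF, column qualifier, table
        System.out.println(worstCaseBytes(entries, dictionaries, 100));   // 16384000
        System.out.println(worstCaseBytes(entries, dictionaries, 1000));  // 163840000
    }
}
```

That is roughly 16 MB at 100 bytes per entry and roughly 160 MB at 1 KB per entry, which is half of the 32 MB and 320 MB figures first quoted.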


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
-----------------------------------------------------------


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228193#comment-13228193 ] 

Li Pi commented on HBASE-4608:
------------------------------

Yo, sorry I can't quite work on this. Finals are finished this week, and once that happens, I'll be able to scram.

There doesn't seem to be that much left - though I said that about 3 months ago. My bad! Feel free to do as you please; there's not much left on this, and I'm happy that work is getting done. I won't be offended at all if somebody else wants to try their hand at finishing this.

My thoughts on it were this. WAL_VERSION is used to indicate compression type. This is pretty good, because enabling compression would immediately tell older versions that the version was wrong, while newer versions with compression disabled could function alongside older versions without support for compression. 
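
A minimal sketch of the compatibility argument in the paragraph above. The constant names and numeric values are assumptions, not the ones in the patch: the writer stamps the log with a higher version only when compression is on, and a reader refuses any version newer than the one it understands.

```java
class WalVersionCheck {
    // Hypothetical version numbers, chosen only for this illustration.
    static final int UNCOMPRESSED = 0;
    static final int COMPRESSED = 1;

    /** A writer only bumps the version when compression is actually enabled. */
    static int versionToWrite(boolean compressionEnabled) {
        return compressionEnabled ? COMPRESSED : UNCOMPRESSED;
    }

    /** A reader rejects any file stamped with a version newer than it supports. */
    static boolean canRead(int readerMaxVersion, int fileVersion) {
        return fileVersion <= readerMaxVersion;
    }
}
```

Under these assumptions an old reader (max version 0) fails fast on a compressed log, while a log written by a new build with compression disabled remains readable by it.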

Also, I had my old benchmarks, and I was getting anywhere from a 20% increase to 40% increase on YCSB loads, depending on the testcase. This seemed pretty impressive to me. Not sure if a bug was introduced. I'll run a few more benchmarks later.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230360#comment-13230360 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

w.r.t. Lars' comment: https://issues.apache.org/jira/browse/HBASE-4608?focusedCommentId=13229010&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229010

I think it makes sense.
How about introducing an enum CompressionType with values of NONE and DICTIONARY ?
HConstants.ENABLE_WAL_COMPRESSION would be replaced by another String: "hbase.regionserver.wal.compressiontype"
If "hbase.regionserver.wal.compressiontype" doesn't appear in conf, CompressionType.NONE is assumed.
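
The proposal above could look roughly like the following sketch. The enum name and the configuration key come from the comment; the fromConf helper and the use of a plain Map in place of Hadoop's Configuration are assumptions made to keep the example self-contained.

```java
import java.util.Locale;
import java.util.Map;

enum CompressionType {
    NONE,
    DICTIONARY;

    /** Configuration key proposed in the comment above. */
    static final String CONF_KEY = "hbase.regionserver.wal.compressiontype";

    /**
     * Resolves the compression type from a configuration.
     * A Map stands in for Hadoop's Configuration here (an assumption for this sketch).
     */
    static CompressionType fromConf(Map<String, String> conf) {
        String value = conf.get(CONF_KEY);
        if (value == null) {
            return NONE; // key absent: compression is off
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

An unrecognised value makes valueOf throw IllegalArgumentException, so a typo in the configuration fails loudly instead of silently disabling compression. Whether that is the desired behaviour is a design choice the thread does not settle.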
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228056#comment-13228056 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I think we may enhance WAL compression using dictionary in the future.
So for DICTIONARY compression type, it is desirable to introduce versioning as well.

I don't have strong opinion about WAL_VERSION actually.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227892#comment-13227892 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

Introducing WAL_VERSION would imply that we may change HLog aspect other than compression in the future.
Is there a plan for the above?
Having another compression type is nice but requires making HLogKey persistence pluggable.

I think it would be better to introduce one meta entry instead of two.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228970#comment-13228970 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

bq. and if all else is equal – same API, etc. – then we don't need to up the global version.
True. But we don't know if the current dictionary compression API is general enough to cover the new compression type.

bq. wal version and compression type. They are not the same thing.
Agreed. But the last paragraph above hinges on the scenario of keeping the same WAL version when a new compression type is added.

Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would the WAL version increase or stay at 1?

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


       

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189520#comment-13189520 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I tried to run TestWALReplayCompressed:
{code}
Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.008 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:50.838s
{code}
Looks like the ShutdownHooks took a long time to finish:
{code}
"main" prio=5 tid=104000800 nid=0x100601000 in Object.wait() [100600000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1210)
	- locked <78e887470> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
	at java.lang.Thread.join(Thread.java:1263)
	at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
	at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
	at java.lang.Shutdown.runHooks(Shutdown.java:79)
	at java.lang.Shutdown.sequence(Shutdown.java:123)
	at java.lang.Shutdown.exit(Shutdown.java:168)
	- locked <7faf9d288> (a java.lang.Class for java.lang.Shutdown)
	at java.lang.Runtime.exit(Runtime.java:90)
	at java.lang.System.exit(System.java:921)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:73)
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229051#comment-13229051 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

btw, +1 on this new patch after you've double-checked with your logs and run it through the full suite. Lars, did you want to take a look tomorrow before it's committed?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229064#comment-13229064 ] 

stack commented on HBASE-4608:
------------------------------

I reran the compress, decompress, compress cycle over my 40-odd random WALs from prod and it seems fine w/ v28.  Sizes look right.  No errors.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212813#comment-13212813 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.
bq.  > 
bq.  > I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.

Checked it out. It looks like in YCSB workloads the 0x00 bytes are actually indexes pointing to the 0th entry of the dictionary.
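
To illustrate why index 0 can dominate such a log, here is a minimal sketch of dictionary encoding. It is an assumption-laden illustration, not the LRUDictionary from the patch: the class name DictionarySketch, the NOT_IN_DICTIONARY sentinel, and the absence of eviction are all hypothetical. The first distinct entry added gets index 0, so a workload that keeps writing the same table name or column family keeps emitting that index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified dictionary; not the LRUDictionary from the patch. */
class DictionarySketch {
    /** Sentinel returned the first time an entry is seen (assumed value). */
    static final short NOT_IN_DICTIONARY = -1;

    private final Map<String, Short> indexByEntry = new HashMap<>();
    private final List<byte[]> entryByIndex = new ArrayList<>();

    /** Returns the existing index, or NOT_IN_DICTIONARY after adding the entry. */
    short findOrAdd(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so the key is lossless.
        String key = new String(data, StandardCharsets.ISO_8859_1);
        Short idx = indexByEntry.get(key);
        if (idx != null) {
            return idx;
        }
        // No eviction in this sketch: indexes are handed out in insertion order.
        indexByEntry.put(key, (short) entryByIndex.size());
        entryByIndex.add(data.clone());
        return NOT_IN_DICTIONARY;
    }

    /** Looks up the bytes previously stored at the given index. */
    byte[] get(short index) {
        return entryByIndex.get(index);
    }
}
```

In this sketch a writer would emit the raw bytes when findOrAdd returns NOT_IN_DICTIONARY and only the short index otherwise; with a single table and family, that index is 0 nearly every time.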


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
bq.  >
bq.  >     invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

fixed.
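
For reference, the ordering issue can be sketched as follows. The exact condition in Compressor.java is not shown in this thread, so the class ArgCheckSketch, the method needsHelp, the expected argument count of 2, and the "--help" flag are assumptions for illustration only. The point is that `||` short-circuits, so the length check has to come first.

```java
/** Hypothetical argument check; the real condition in Compressor.java may differ. */
class ArgCheckSketch {
    /** Returns true when usage help should be printed instead of running. */
    static boolean needsHelp(String[] args) {
        // Length is tested first. Because || short-circuits, args[0] is never
        // read when the array is empty, so no ArrayIndexOutOfBoundsException.
        return args.length != 2 || args[0].equals("--help");
    }
}
```

Written the other way round (`args[0].equals("--help") || args.length != 2`), the first clause would index into an empty array when the tool is run with no arguments.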


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
bq.  >
bq.  >     this code doesn't work properly. Here's what you want to do:
bq.  >     
bq.  >           Configuration conf = new Configuration();
bq.  >           FileSystem fs = path.getFileSystem(conf);
bq.  >

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228181#comment-13228181 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

Just my $0.02 here... I think having a compression type + compression version will be hard to grok for newcomers unfamiliar with this area, whereas having a single compression type field is clear. A new version of a compression algorithm is a new type (IMHO). We do not have compression versions for the HFiles, just compression types.

I think with WAL_VERSION and compression type we have enough flexibility (HLogKey version is really unrelated as it is for other serialization as well).

What do you think, Ted?

I'll do some testing as to what the compression ratio is for a few of our scenarios tomorrow.
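
The two-field idea (one WAL version, one compression type, no separate compression version) could be sketched like this. Everything here is hypothetical: the class WalHeaderSketch, the CompressionType enum values, and the byte layout are assumptions made for illustration, and the patch itself may store this metadata differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical WAL header carrying a format version and a compression type. */
class WalHeaderSketch {
    /** A revised algorithm would be added as a new constant, not a new version. */
    enum CompressionType { NONE, DICTIONARY }

    final int walVersion;
    final CompressionType compression;

    WalHeaderSketch(int walVersion, CompressionType compression) {
        this.walVersion = walVersion;
        this.compression = compression;
    }

    /** Serializes the header as a 4-byte version followed by the type name. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(walVersion);
            out.writeUTF(compression.name());
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses a header produced by toBytes(). */
    static WalHeaderSketch fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int version = in.readInt();
            CompressionType type = CompressionType.valueOf(in.readUTF());
            return new WalHeaderSketch(version, type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this layout a reader only needs to understand the version to find the type, and an improved dictionary scheme would show up as a new enum constant while the version stays put.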

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229776#comment-13229776 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

I'm still +1 :)

The lzma numbers are interesting. Maybe a nice (future) solution would be to dictionary compress the HLog while writing, and then when the log is rolled compress it with lzma (since we know the file won't change any more we can compress it wholesale).
This raises the next question: What portion of the WAL storage do the current WALs represent?

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178580#comment-13178580 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4172
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/HConstants.java
<https://reviews.apache.org/r/2740/#comment9402>

    This name may refer to the compression algorithm.
    I think the word 'enable' should be part of the name.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9403>

    No year needed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9404>

    This javadoc should be combined with above block.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9405>

    Should read 'Compresses and ...'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
<https://reviews.apache.org/r/2740/#comment9407>

    Add license, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
<https://reviews.apache.org/r/2740/#comment9406>

    Add javadoc for this class, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9408>

    License, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9409>

    Add javadoc for the parameters, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/2740/#comment9410>

    Should there be disableCompression ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9411>

    Remove this year line, please.


- Ted


On 2011-12-31 20:19:11, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-31 20:19:11)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v30.txt

Accommodate about 50% of Ted's last review (ignoring the trivial).
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219950#comment-13219950 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-03-01 09:58:44.801420)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

Updated as per stack's review.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 17cb0e3 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java bd31ead 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226614#comment-13226614 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I repeated manual decompression based on patch v20.
Still got:
{code}
12/03/09 15:58:30 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-3.comp, syncFs=true, hflush=true
Exception in thread "main" java.io.IOException: sea-lab-3.decomp, entryStart=124, pos=1406386, end=98439940, edit=0
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:276)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:232)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:201)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:91)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:58)
Caused by: java.io.IOException: //0 read 36 bytes, should read 22
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:230)
	... 3 more
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v25.txt

This includes Ted's review comments (including the suggestion that I shorten a line in HConstants).  Also fixed an issue where an NPE was hiding the real problem when bad paths were passed to the Compressor tool.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517839/4608-v22.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -121 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1154//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208228#comment-13208228 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-02-15 04:57:45.411924)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

Fixed as per Ted Yu's review.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227934#comment-13227934 ] 

stack commented on HBASE-4608:
------------------------------

The tests do not have variety.  I think we should add it here rather than wait for the variety to hit out in the field.

bq. If only compression would evolve, I think checking against compression type metadata would be adequate.

The above begins with a conditional, "If...". 

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213247#comment-13213247 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-02-22 03:46:12.923539)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

fixed typos


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227869#comment-13227869 ] 

stack commented on HBASE-4608:
------------------------------

Is HLog versioned? If not, perhaps instead of a HConstants.WAL_COMPRESSION_VER, add a WAL_VERSION metadata field.  Then have another for compression type (NONE or this)?
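
One way to picture the suggestion above is a pair of metadata fields in the log file header, one for the format version and one for the compression type. The sketch below is illustrative only: the key names, the enum, and the "missing field means uncompressed" rule are assumptions made for the example, not names or behaviour taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of versioned WAL metadata; nothing here comes from the patch. */
class WalMetadataSketch {
    // Assumed metadata keys a writer might place in the log file header.
    static final String WAL_VERSION_KEY = "WAL_VERSION";
    static final String WAL_COMPRESSION_TYPE_KEY = "WAL_COMPRESSION_TYPE";

    /** Assumed set of compression types: none, or the dictionary scheme of this issue. */
    enum CompressionType { NONE, DICTIONARY }

    /** Builds the metadata a writer would emit for the given settings. */
    static Map<String, String> writeMetadata(int version, CompressionType type) {
        Map<String, String> meta = new HashMap<>();
        meta.put(WAL_VERSION_KEY, Integer.toString(version));
        meta.put(WAL_COMPRESSION_TYPE_KEY, type.name());
        return meta;
    }

    /** Logs written before the field existed carry no entry; treat them as uncompressed. */
    static CompressionType readCompressionType(Map<String, String> meta) {
        String raw = meta.get(WAL_COMPRESSION_TYPE_KEY);
        return raw == null ? CompressionType.NONE : CompressionType.valueOf(raw);
    }

    /** A missing version field is reported as 0 (assumed to mean "pre-versioning"). */
    static int readVersion(Map<String, String> meta) {
        String raw = meta.get(WAL_VERSION_KEY);
        return raw == null ? 0 : Integer.parseInt(raw);
    }
}
```

Keeping the two fields separate would let the format version and the compression scheme change independently of each other.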

bq. For TestLRUDictionary, please outline the combinations that should be added.

Does it not look bare to you?   I'd think that we'd try a paragraph of text going in and out... perhaps test multiple dictionaries in the one file?
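
A rough sketch of what a "paragraph of text going in and out" test could exercise. The class below is a simplified stand-in written for this illustration, not the LRUDictionary from the patch: it works on String rather than byte[], and all of its method names are hypothetical. The writer and the reader each keep their own dictionary and apply the same add and refresh operations, so both evict the same entries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical LRU dictionary; a stand-in for illustration only. */
class LruDictionarySketch {
    static final int NOT_IN_DICTIONARY = -1;

    private final int capacity;
    // Access-ordered map: iteration starts at the least recently used word.
    private final LinkedHashMap<String, Integer> slotByWord =
        new LinkedHashMap<>(16, 0.75f, true);
    private final String[] wordBySlot;

    LruDictionarySketch(int capacity) {
        this.capacity = capacity;
        this.wordBySlot = new String[capacity];
    }

    /** Writer side: returns the slot on a hit; on a miss stores the word and returns -1. */
    int findOrAdd(String word) {
        Integer slot = slotByWord.get(word); // get() also refreshes recency
        if (slot != null) {
            return slot;
        }
        add(word);
        return NOT_IN_DICTIONARY;
    }

    /** Stores a new word, reusing the least recently used slot once the dictionary is full. */
    void add(String word) {
        int slot;
        if (slotByWord.size() < capacity) {
            slot = slotByWord.size();
        } else {
            Iterator<Map.Entry<String, Integer>> it = slotByWord.entrySet().iterator();
            slot = it.next().getValue();
            it.remove();
        }
        slotByWord.put(word, slot);
        wordBySlot[slot] = word;
    }

    /** Reader side: resolves a slot and refreshes recency exactly as the writer's hit did. */
    String get(int slot) {
        String word = wordBySlot[slot];
        slotByWord.get(word);
        return word;
    }

    /** Replaces repeated words with Integer slots; first sightings stay as literals. */
    static List<Object> encode(List<String> words, LruDictionarySketch dict) {
        List<Object> out = new ArrayList<>();
        for (String w : words) {
            int slot = dict.findOrAdd(w);
            if (slot == NOT_IN_DICTIONARY) {
                out.add(w);
            } else {
                out.add(Integer.valueOf(slot));
            }
        }
        return out;
    }

    /** Rebuilds the words, growing the reader's dictionary from the literals it sees. */
    static List<String> decode(List<Object> tokens, LruDictionarySketch dict) {
        List<String> out = new ArrayList<>();
        for (Object t : tokens) {
            if (t instanceof Integer) {
                out.add(dict.get((Integer) t));
            } else {
                dict.add((String) t);
                out.add((String) t);
            }
        }
        return out;
    }
}
```

A test along these lines would split a paragraph into words, encode it with a deliberately small capacity so that evictions occur, decode it with a fresh dictionary of the same capacity, and check that the words come back unchanged.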
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227954#comment-13227954 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

bq. Should the Compression class in wal package ...
I only see KeyValueCompression.java under the wal package. Please elaborate on which class should carry more comments.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222830#comment-13222830 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517160/4608v17.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1103//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169117#comment-13169117 ] 

Li Pi commented on HBASE-4608:
------------------------------

Yup. Just finished finals. So I have time again.
On Dec 13, 2011 10:40 PM, "Lars Hofhansl (Commented) (JIRA)" <


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229022#comment-13229022 ] 

stack commented on HBASE-4608:
------------------------------

@Lars Generalizing the compression done here is out of scope for this issue.  The patch was not written that way from the get-go.  The reviews done up to v22 or so made no mention of supporting other compression types.  I'd suggest we do it in another issue if and when it's wanted.

Let me put v27 up on rb.

bq. I forget, do we also SNAPPY/LZO/GZ compress the HLogs?

We don't do this because these compression algorithms work in blocks of 32k or so.  If a block is not tied off properly at the end, we could lose up to 32k of edits.
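
The buffering effect can be illustrated with a small sketch. It uses java.util.zip from the JDK as a stand-in for the SNAPPY/LZO/GZ codecs, so the buffer sizes differ from the 32k figure above, and the class and method names are made up for the example. It compares how many bytes can be recovered from what reached the sink before the stream was closed with what can be recovered after a clean close.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

/** Hypothetical demo: data buffered inside a compressor is lost if the stream is never finished. */
class BlockBufferingSketch {

    /** Counts the bytes that inflate from a possibly truncated compressed prefix. */
    static int recoverableBytes(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[4096];
        int total = 0;
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // inflater wants more input, but the rest never reached the sink
                }
                total += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return total;
    }

    /** Returns {bytes recoverable before close, bytes recoverable after a clean close}. */
    static int[] demo(byte[] edits) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DeflaterOutputStream out = new DeflaterOutputStream(sink);
            out.write(edits);
            byte[] atCrash = sink.toByteArray();    // what a crash at this point would leave behind
            out.close();                            // ties off the final block
            byte[] afterClose = sink.toByteArray();
            return new int[] { recoverableBytes(atCrash), recoverableBytes(afterClose) };
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a small batch of edits the first number falls short of the input length while the second matches it, which is the failure mode described above: whatever the compressor still holds in its buffer is gone if the writer dies before the block is tied off.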
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212829#comment-13212829 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 226
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line226>
bq.  >
bq.  >     NOT_IN_DICTIONARY should be used here.

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4585
-----------------------------------------------------------


On 2012-02-21 19:29:20, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-21 19:29:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181831#comment-13181831 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509763/4608v8fixed.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.regionserver.wal.TestHLog
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/693//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/693//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145322#comment-13145322 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

Review request for Eli Collins and Todd Lipcon.


Summary
-------

Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4608:
---------------------------------

    Fix Version/s: 0.94.0

Marking for 0.94
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228173#comment-13228173 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

From what can one conclude who owns the issue? The Assignee field?

I do have an opinion on compression type versioning. I would wait for a concrete design to form.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226660#comment-13226660 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

No :-(
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207444#comment-13207444 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5066
-----------------------------------------------------------


Nice patch and good job! I have two questions inline; maybe I just misunderstood the code.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11122>

    WritableUtils.getVIntSize could help you decide how many bytes are needed for the entry, so you don't need to pass sizeBytes down into this function.
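
A minimal sketch of the point above, assuming the usual Hadoop-style variable-length integer layout: the encoded size can be derived from the value itself, so no separate sizeBytes argument is needed. The class and method names are hypothetical, and the size rule is re-implemented here only to keep the sketch self-contained; in the patch one would call WritableUtils.getVIntSize directly.

```java
// Hypothetical stand-alone sketch; not part of the patch under review.
class VIntSizeSketch {
    /**
     * Number of bytes a long occupies under a Hadoop-style variable-length
     * encoding: one byte for values in [-112, 127], otherwise one marker
     * byte plus the minimal number of payload bytes.
     */
    static int vIntSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1;
        }
        if (value < 0) {
            value = ~value; // negatives are stored as their one's complement
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1;
    }
}
```

With this, a dictionary index below 128 costs a single byte, and the caller never has to decide the width up front.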



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment11120>

    Should the data be added back to the dict in this case, e.g. with dict.addEntry(data)?
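
To illustrate why the question above matters, here is a rough sketch, assuming a simple append-only dictionary rather than the patch's actual classes. The writer adds an entry whenever it emits a literal, so the reader has to add the same literal when it sees one; otherwise later indices would not resolve to the same values. All names (SymmetricDictSketch, NOT_IN_DICTIONARY, roundTrip) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not the patch's.
class SymmetricDictSketch {
    static final int NOT_IN_DICTIONARY = -1;

    // Append-only dictionary (no eviction): an entry's index is its insertion order.
    private final List<String> byIndex = new ArrayList<>();
    private final Map<String, Integer> byValue = new HashMap<>();

    void addEntry(String value) {
        byValue.put(value, byIndex.size());
        byIndex.add(value);
    }

    // Writer side: emit the index on a hit, or a marker plus the literal on a miss.
    void write(DataOutputStream out, String value) throws IOException {
        Integer idx = byValue.get(value);
        if (idx != null) {
            out.writeInt(idx);
        } else {
            out.writeInt(NOT_IN_DICTIONARY);
            out.writeUTF(value);
            addEntry(value);
        }
    }

    // Reader side: a literal must also be added, or later indices would not resolve.
    String read(DataInputStream in) throws IOException {
        int idx = in.readInt();
        if (idx != NOT_IN_DICTIONARY) {
            return byIndex.get(idx);
        }
        String value = in.readUTF();
        addEntry(value);
        return value;
    }

    // Encodes the values with one dictionary and decodes them with a fresh one.
    static List<String> roundTrip(List<String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            SymmetricDictSketch writerDict = new SymmetricDictSketch();
            for (String v : values) {
                writerDict.write(out, v);
            }
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            SymmetricDictSketch readerDict = new SymmetricDictSketch();
            List<String> decoded = new ArrayList<>();
            for (int i = 0; i < values.size(); i++) {
                decoded.add(readerDict.read(in));
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the addEntry call in read() were dropped, the first repeated value would be decoded from an index the reader never populated.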


- Liyin


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227984#comment-13227984 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

For code-specific review, please use https://reviews.apache.org/r/4185/, where there is context.

I can add WAL_VERSION as v2 in the metadata.
My question is: would HLog v2 be allowed to leave log entries uncompressed?

If desirable, we can discuss in more detail, face to face, on the 27th.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226870#comment-13226870 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

For TestLRUDictionary, please outline the combinations that should be added.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147245#comment-13147245 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3136
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
<https://reviews.apache.org/r/2740/#comment6949>

    Should compression be added to the HLogKey as well, to compress regionName and table? It seems like the biggest wins will come from table + region + family, which all have user-bounded values. It might even make sense to keep these values in a different dictionary from row and column qualifier, which can be unbounded and might accidentally dominate the dictionary.
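
The "dominate the dictionary" concern can be shown with a small sketch. It assumes a tiny LRU-bounded dictionary, which is not necessarily what the patch uses; BoundedDictSketch, SplitDictionaryDemo and the capacity of 2 are all made up for illustration. Each edit carries the same table name plus a fresh row and qualifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the patch.
class BoundedDictSketch {
    private final Map<String, Boolean> entries;

    BoundedDictSketch(final int capacity) {
        // Access-ordered map that drops its least recently used entry when full.
        this.entries = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true on a hit; on a miss the value is added and false is returned. */
    boolean lookupOrAdd(String value) {
        if (entries.get(value) != null) {
            return true;
        }
        entries.put(value, Boolean.TRUE);
        return false;
    }
}

class SplitDictionaryDemo {
    /** Counts table-name hits over a number of edits, each with a fresh row and qualifier. */
    static int tableHits(boolean split, int edits) {
        BoundedDictSketch bounded = new BoundedDictSketch(2);
        // When not split, unbounded values share (and churn) the same dictionary.
        BoundedDictSketch unbounded = split ? new BoundedDictSketch(2) : bounded;
        int hits = 0;
        for (int i = 0; i < edits; i++) {
            if (bounded.lookupOrAdd("usertable")) {
                hits++;
            }
            unbounded.lookupOrAdd("row-" + i);
            unbounded.lookupOrAdd("qualifier-" + i);
        }
        return hits;
    }
}
```

In this toy setup the shared dictionary never gets a table-name hit, because each edit's row and qualifier evict it, while the split layout hits on every edit after the first.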


- Nicolas


On 2011-11-07 23:12:37, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-07 23:12:37)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Heres what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181756#comment-13181756 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-07 01:25:20.762498)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

Addressed Ted Yu's comments. Also switched from SimpleDictionary to LRUDictionary, which has a much smarter eviction algorithm.
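
For readers unfamiliar with the idea, the following is only a sketch of what an LRU-evicting dictionary could look like, not the LRUDictionary in the diff. It assumes a fixed number of slots and reuses the slot of the least recently used entry once the dictionary is full. The class name, method names, and use of String values are assumptions made for brevity.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real LRUDictionary in the patch may differ substantially.
class LruIndexDictSketch {
    static final int NOT_FOUND = -1;

    private final int capacity;
    private final String[] slots;
    // Access-ordered: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, Integer> index =
        new LinkedHashMap<>(16, 0.75f, true);

    LruIndexDictSketch(int capacity) {
        this.capacity = capacity;
        this.slots = new String[capacity];
    }

    /** Returns the slot holding the value, or NOT_FOUND; a hit refreshes recency. */
    int findEntry(String value) {
        Integer slot = index.get(value);
        return slot == null ? NOT_FOUND : slot;
    }

    /** Adds the value, evicting the least recently used entry if full. */
    int addEntry(String value) {
        Integer existing = index.get(value);
        if (existing != null) {
            return existing;
        }
        int slot;
        if (index.size() < capacity) {
            slot = index.size();
        } else {
            Iterator<Map.Entry<String, Integer>> it = index.entrySet().iterator();
            slot = it.next().getValue(); // slot of the least recently used entry
            it.remove();
        }
        slots[slot] = value;
        index.put(value, slot);
        return slot;
    }

    String getEntry(int slot) {
        return slots[slot];
    }
}
```

For the indices to be meaningful on replay, the reader would have to perform the same finds and adds in the same order as the writer, so that both sides evict the same entries.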


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227898#comment-13227898 ] 

stack commented on HBASE-4608:
------------------------------

In TestLRUDictionary, we test a single entry in essence. We should try it with all kinds of rubbish: really long entries, empty entries, null entries, similar entries, a dictionary holding 32k worth of stuff, as we'll see in the wild. So I'd think?

A test for the new class KeyValueCompression would be good to have too.


enableCompression is an odd name for this method. Should it be setCompressionContext, since that is what it does (you pass null if no compression)? It seems odd to pass null to 'enableCompression'.

Should the Compression class in the wal package have more javadoc comments explaining the kind of compression it does? Otherwise it looks like a generic compressor class when in fact it's a one-trick pony.

Should this method, WALCompressionEnabled, be isWALCompressionEnabled?

I like your idea of versioning the WAL

Patch is coming along nicely.  Almost there.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176259#comment-13176259 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4125
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9323>

    public class SequenceFileLogWriter implements HLog.Writer {
    And
      public interface Writer {
    
    Closeable is not mentioned in either declaration above, although Writer has a close() method.


- Ted


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Heres what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175299#comment-13175299 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4098
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9217>

    Apache headers go here.


- Li


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Heres what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228927#comment-13228927 ] 

stack commented on HBASE-4608:
------------------------------

bq. What would happen when we have a newer version for WAL_VERSION_KEY ?

You mean VERSION? You mean when a new feature is added? This code will change. We'll likely have a constant for the version that introduces compression, i.e. version 1, and we'll use it in this expression instead. (I went ahead and added COMPRESSION_VERSION, the version that introduced compression, and will use this new define instead.)


bq. Looks like the following check should suffice for isWALCompressionEnabled()...

Nah.  Verify we have sufficient global version first, then check for the type.

Will fix other issues in next version of patch.
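
The ordering described above (sufficient global version first, then the compression type) could look roughly like the sketch below. It assumes the WAL metadata is available as a string map; the key names, the "dictionary" type value, and the class name are placeholders rather than the patch's actual constants. Only COMPRESSION_VERSION = 1 follows the comment above.

```java
import java.util.Map;

// Hypothetical sketch; key names and the type value are placeholders.
class WalVersionCheckSketch {
    static final String VERSION_KEY = "version";                   // assumed key name
    static final String COMPRESSION_TYPE_KEY = "compression.type"; // assumed key name
    static final String DICTIONARY_TYPE = "dictionary";            // assumed type value
    static final int COMPRESSION_VERSION = 1; // version that introduced compression

    static boolean isWALCompressionEnabled(Map<String, String> metadata) {
        // Step 1: the file must be at least the version that introduced compression.
        String rawVersion = metadata.get(VERSION_KEY);
        if (rawVersion == null) {
            return false; // no version recorded: treat as a pre-compression file
        }
        int version;
        try {
            version = Integer.parseInt(rawVersion);
        } catch (NumberFormatException e) {
            return false;
        }
        if (version < COMPRESSION_VERSION) {
            return false;
        }
        // Step 2: only then look at the compression type.
        return DICTIONARY_TYPE.equals(metadata.get(COMPRESSION_TYPE_KEY));
    }
}
```

Comparing against a named constant instead of the current version means a later version bump for an unrelated feature would not change the outcome of this check.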

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229914#comment-13229914 ] 

Li Pi commented on HBASE-4608:
------------------------------

If a dictionary file gets cut up, you'll be able to read all the way to the end.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177873#comment-13177873 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2011-12-31 02:06:00.510532)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

fixed a failing test.


Summary
-------

Here's what I have so far. Things are written and "should work". I still need to rework the test cases to cover this and add something to the config file to enable/disable it. This obviously isn't ready for commit at the moment, but I can get those two things done pretty quickly.

The dictionary is obviously incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.



[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: I repeated manual decompression based on patch v20.
Still got:
{code}
12/03/09 15:58:30 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-3.comp, syncFs=true, hflush=true
Exception in thread "main" java.io.IOException: sea-lab-3.decomp, entryStart=124, pos=1406386, end=98439940, edit=0
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:276)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:232)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:201)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:91)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:58)
Caused by: java.io.IOException: //0 read 36 bytes, should read 22
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:230)
	... 3 more
{code})
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229448#comment-13229448 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>
bq.  >
bq.  >     I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
bq.  >     I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

65536 * 5 (region name, row key, CF, column qualifier, table) * 100 bytes (these are some big names) = 32,768,000 bytes, or roughly 32 megabytes.

If you want to get silly, even at 1 KB entries (wtf are you naming things?), it maxes out at about 320 megabytes.
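
The arithmetic above can be checked with a short sketch. The class and method names are made up for illustration, and the figure counts only the raw entry bytes; it ignores any per-entry overhead of whatever map or array backs the dictionary.

```java
class DictionaryMemoryEstimate {
    // Worst case: every slot for every kind of name is full and holds an
    // entry of the given size.
    static long worstCaseBytes(long slots, long kindsOfName, long bytesPerEntry) {
        return slots * kindsOfName * bytesPerEntry;
    }

    public static void main(String[] args) {
        long slots = 65536; // slot count used in the estimate above
        long kinds = 5;     // region name, row key, CF, column qualifier, table

        long at100Bytes = worstCaseBytes(slots, kinds, 100);
        long at1Kb = worstCaseBytes(slots, kinds, 1024);

        System.out.println(at100Bytes);             // 32768000
        System.out.println(at1Kb);                  // 335544320
        System.out.println(at1Kb / (1024 * 1024));  // 320
    }
}
```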


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
-----------------------------------------------------------


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228950#comment-13228950 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515472/4608v14.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -134 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 162 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/999//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/999//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/999//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212815#comment-13212815 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 74
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line74>
bq.  >
bq.  >     I think the better way of expressing this usage would be:
bq.  >     
bq.  >     WALCompressor [-u | -c] <input> <output>
bq.  >     
bq.  >       -u - uncompresses the input log
bq.  >       -c - compresses the input log
bq.  >     
bq.  >     Exactly one of -u or -c must be specified
bq.  >     
bq.  >

fixed
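
For reference, a rough sketch of argument handling that matches the usage suggested above. It is not the code in Compressor.java: the class name, the Mode enum, and the error text are hypothetical. Requiring exactly three arguments with the flag first is one simple way to enforce "exactly one of -u or -c".

```java
class WalCompressorArgs {
    enum Mode { COMPRESS, UNCOMPRESS }

    static final String USAGE = "usage: WALCompressor [-u | -c] <input> <output>";

    final Mode mode;
    final String input;
    final String output;

    WalCompressorArgs(Mode mode, String input, String output) {
        this.mode = mode;
        this.input = input;
        this.output = output;
    }

    // Accepts exactly: <flag> <input> <output>, where flag is -u or -c.
    static WalCompressorArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(USAGE);
        }
        Mode mode;
        if ("-c".equals(args[0])) {
            mode = Mode.COMPRESS;
        } else if ("-u".equals(args[0])) {
            mode = Mode.UNCOMPRESS;
        } else {
            throw new IllegalArgumentException(USAGE);
        }
        return new WalCompressorArgs(mode, args[1], args[2]);
    }
}
```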


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220347#comment-13220347 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5525
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/2740/#comment11943>

    HLogKey.VERSION should be decremented to -2.
    
    The if statement should be changed to:
    if (version == -1 || keyContext == null)



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/2740/#comment11944>

    The if statement should be changed to:
    if (version == -1 || keyContext == null)


- Ted


On 2012-03-01 09:58:44, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-01 09:58:44)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 17cb0e3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java bd31ead 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178086#comment-13178086 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

How about using Guava's MapMaker?
From SingleSizeCache.java:
{code}
    backingMap = new MapMaker().maximumSize(numBlocks - 1)
        .evictionListener(listener).makeMap();
{code}
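
For comparison, a size-bounded map with least-recently-used eviction can also be sketched with only the JDK. This swaps in java.util.LinkedHashMap for the Guava MapMaker call above; it is not what SingleSizeCache or the patch does. The class name is hypothetical, the map is not thread-safe, and there is no eviction listener.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedLruMap(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the LRU entry.
        return size() > maxEntries;
    }
}
```

A dictionary used for compression would presumably also need to map an index back to its entry, which this sketch does not attempt.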
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212830#comment-13212830 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>
bq.  >
bq.  >     this function requires that the whole log data fit in RAM - not a great assumption
bq.  
bq.  Li Pi wrote:
bq.      old one. will do eventually...

fixed.
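
To illustrate the concern, here is a sketch of a transform that holds one record in memory at a time instead of the whole log. It is not the code in Compressor.java: the length-prefixed record framing and the RecordTransform interface are assumptions made for the example, and a real HLog is a SequenceFile read through its own reader and writer classes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

class StreamingTransform {
    // Hypothetical per-record step, e.g. compress or decompress one entry.
    interface RecordTransform {
        byte[] apply(byte[] record);
    }

    // Copies records from in to out one at a time and returns how many
    // were processed. Assumed framing: 4-byte length, then the payload.
    static long transform(DataInputStream in, DataOutputStream out,
                          RecordTransform step) throws IOException {
        long count = 0;
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException e) {
                break; // no further length prefix: end of input
            }
            byte[] record = new byte[length];
            in.readFully(record); // throws EOFException on a truncated record
            byte[] result = step.apply(record);
            out.writeInt(result.length);
            out.write(result);
            count++;
        }
        out.flush();
        return count;
    }
}
```

Memory use is then bounded by the largest single record rather than by the size of the log.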


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
-----------------------------------------------------------


On 2012-02-21 19:29:20, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-21 19:29:20)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229291#comment-13229291 ] 

Li Pi commented on HBASE-4608:
------------------------------

Also, figured out why Ted's benchmarks differed from the rest of ours.

The PE tool tests with random writes to a million rows; each row has a single column whose value is 1000 randomly generated bytes.

This is pretty difficult to compress. The number of rows means that rownames won't fit in the dictionary, and we don't compress values yet.
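
To make the dictionary argument concrete, here is a rough simulation, not the actual benchmark. It assumes an LRU dictionary capped at Short.MAX_VALUE entries (a guess based on the 2-byte index), a million distinct rows as in the PE run, and an arbitrary 200,000 writes. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class DictionaryHitRate {
  /** Fraction of writes whose row name is already in an LRU dictionary of the given capacity. */
  static double hitRate(final int capacity, int distinctRows, int writes, long seed) {
    // Access-ordered LinkedHashMap that drops the least recently used entry when full.
    Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
        return size() > capacity;
      }
    };
    Random random = new Random(seed);
    int hits = 0;
    for (int i = 0; i < writes; i++) {
      int row = random.nextInt(distinctRows); // stands in for a random row name
      if (lru.get(row) != null) {
        hits++;                               // would be written as a short index
      } else {
        lru.put(row, Boolean.TRUE);           // would be written in full, then remembered
      }
    }
    return (double) hits / writes;
  }

  public static void main(String[] args) {
    // Assumed numbers: 32767 dictionary slots, one million rows, 200,000 writes.
    System.out.printf("hit rate: %.3f%n", hitRate(Short.MAX_VALUE, 1000000, 200000, 42L));
  }
}
```

With uniformly random rows the hit rate cannot rise much above capacity / distinctRows, which is only a few percent here, so almost every row name is written uncompressed.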
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Status: Open  (was: Patch Available)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227919#comment-13227919 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

bq. Its the test of a single entry only
Please take a look at the following in test:
{code}
    for(int i = 1; i < Short.MAX_VALUE; i++){
      assertTrue(testee.findEntry(BigInteger.valueOf(i).toByteArray(), 0,
          BigInteger.valueOf(i).toByteArray().length) == -1);
    }
{code}
32766 entries of the dictionary are tested.

If compression is the only thing that would evolve, I think checking against compression type metadata would be adequate.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147363#comment-13147363 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-11-09 20:00:52, Nicolas Spiegelberg wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, lines 122-124
bq.  > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line122>
bq.  >
bq.  >     should compression be added to the HLogKey as well to compress regionName & table?  It seems like the biggest wins will come from table + region + family, which all have user-bounded values.  It might even make sense to have these values in a different dictionary from row & column qualifier, which can be unbounded and might accidentally dominate the dictionary

Yes. I'm adding compression for regionname, table, and family as well. For this kind of simple 1-way associative dictionary, it's likely that those two factors will end up dominating, but other more complex dictionaries can be used, perhaps with more interesting eviction strategies.

I do agree using multiple dictionaries is a simple strategy.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review3136
-----------------------------------------------------------


On 2011-11-07 23:12:37, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-07 23:12:37)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514600/4608v13.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/966//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202032#comment-13202032 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------


I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes), it didn't get a huge compression ratio: from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd, and it looks like there are still a number of places where we could use variable-length integers instead of fixed-length ones. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.

I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.
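
The zero-byte count is easy to reproduce. The C program mentioned above is not part of this thread, so the following is only a Java sketch of the same idea; the class name and the 64KB buffer size are arbitrary choices.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ZeroByteCounter {
  /** Counts 0x00 bytes among the first len bytes of buf. */
  static long countZeros(byte[] buf, int len) {
    long zeros = 0;
    for (int i = 0; i < len; i++) {
      if (buf[i] == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: ZeroByteCounter <log-file>");
      return;
    }
    long zeros = 0;
    long total = 0;
    byte[] buf = new byte[64 * 1024];
    // Stream the file in chunks so its size does not matter.
    try (InputStream in = new FileInputStream(args[0])) {
      int n;
      while ((n = in.read(buf)) != -1) {
        zeros += countZeros(buf, n);
        total += n;
      }
    }
    System.out.println(zeros + " zero bytes out of " + total);
  }
}
```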


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10650>

    invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments
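
    The condition under review is not quoted in this thread, so the following is a hypothetical sketch of the ordering issue only. Java evaluates || left to right and stops at the first true operand, so the length test has to come before anything that reads args[0]. The flag names and usage text are taken from the usage suggestion in this review; the class and method names are made up.

```java
final class ArgCheckSketch {
  /**
   * Returns true when the usage message should be printed.
   * The length check comes first, so args[0] is only read when it exists.
   */
  static boolean needsUsage(String[] args) {
    return args.length != 3 || !(args[0].equals("-u") || args[0].equals("-c"));
  }

  public static void main(String[] args) {
    if (needsUsage(args)) {
      System.err.println("Usage: WALCompressor [-u | -c] <input> <output>");
      return;
    }
    System.out.println("mode " + args[0] + ": " + args[1] + " -> " + args[2]);
  }
}
```

    With the operands swapped, running the tool with no arguments would evaluate args[0] first and throw ArrayIndexOutOfBoundsException.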



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10651>

    I think the better way of expressing this usage would be:
    
    WALCompressor [-u | -c] <input> <output>
    
      -u - uncompresses the input log
      -c - compresses the input log
    
    Exactly one of -u or -c must be specified
    
    



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment10649>

    this code doesn't work properly. Here's what you want to do:
    
          Configuration conf = new Configuration();
          FileSystem fs = path.getFileSystem(conf);
    


- Todd


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212850#comment-13212850 ] 

Li Pi commented on HBASE-4608:
------------------------------

@Kannan - here's the quick overview on 4608:

When writing the HLog, it checks a set of dictionaries for the key, cf, qualifier, tablename, and regionname. If these items happen to be in the dictionary, it writes the index, instead of the item. If the item is not in the dictionary, it is added to the dictionary.

When reading from the HLog, it works in the opposite manner. When it encounters an uncompressed item, it adds it to the dictionary. If it encounters an index, it just fetches what it needs from the dictionary. 

The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes per index (shorts). There is a separate dictionary for every different field (e.g. one for tablenames, one for regionnames...).

The dictionary merely must be consistent: if given a bunch of things in a certain order, it should always assign them the same indices, and always evict in the exact same fashion.


This seems to work fairly well - and noticeably cuts down our write sizes on the vast majority of workloads.
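
As a rough illustration of the scheme described above, and not the code in the patch, the sketch below runs one dictionary on the write side and an independent one on the read side. The class and method names, the token list standing in for the log file, and the slot-reuse eviction policy are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WalDictionarySketch {
  /** Hypothetical LRU dictionary: content to short index, with deterministic eviction. */
  static final class Dict {
    private final byte[][] byIndex;
    // Access-ordered map, so iteration starts at the least recently used entry.
    private final LinkedHashMap<ByteBuffer, Short> toIndex =
        new LinkedHashMap<ByteBuffer, Short>(16, 0.75f, true);

    Dict(int capacity) {
      byIndex = new byte[capacity][];
    }

    /** Returns the index of data, or -1 if absent. A hit counts as a use. */
    short findEntry(byte[] data) {
      Short idx = toIndex.get(ByteBuffer.wrap(data));
      return idx == null ? (short) -1 : idx.shortValue();
    }

    /** Returns the entry at idx and counts it as a use, mirroring findEntry. */
    byte[] getEntry(short idx) {
      byte[] data = byIndex[idx];
      toIndex.get(ByteBuffer.wrap(data)); // touch so both sides keep the same LRU order
      return data;
    }

    /** Adds data, reusing the least recently used slot once the dictionary is full. */
    short addEntry(byte[] data) {
      short idx;
      if (toIndex.size() < byIndex.length) {
        idx = (short) toIndex.size();
      } else {
        Iterator<Map.Entry<ByteBuffer, Short>> it = toIndex.entrySet().iterator();
        idx = it.next().getValue();
        it.remove();
      }
      byIndex[idx] = data.clone();
      toIndex.put(ByteBuffer.wrap(byIndex[idx]), idx);
      return idx;
    }
  }

  /** Write side: emit an index for known fields, the literal bytes otherwise. */
  static List<Object> write(List<byte[]> fields, Dict dict) {
    List<Object> log = new ArrayList<Object>();
    for (byte[] field : fields) {
      short idx = dict.findEntry(field);
      if (idx >= 0) {
        log.add(Short.valueOf(idx));
      } else {
        dict.addEntry(field);
        log.add(field);
      }
    }
    return log;
  }

  /** Read side: replay the same adds and uses so the indices line up. */
  static List<byte[]> read(List<Object> log, Dict dict) {
    List<byte[]> fields = new ArrayList<byte[]>();
    for (Object token : log) {
      if (token instanceof Short) {
        fields.add(dict.getEntry((Short) token));
      } else {
        byte[] literal = (byte[]) token;
        dict.addEntry(literal);
        fields.add(literal);
      }
    }
    return fields;
  }

  /** Encodes the names with one dictionary, decodes with a fresh one, and compares. */
  static boolean roundTrips(int capacity, String... names) {
    List<byte[]> fields = new ArrayList<byte[]>();
    for (String name : names) {
      fields.add(name.getBytes(StandardCharsets.UTF_8));
    }
    List<byte[]> back = read(write(fields, new Dict(capacity)), new Dict(capacity));
    if (back.size() != fields.size()) {
      return false;
    }
    for (int i = 0; i < fields.size(); i++) {
      if (!Arrays.equals(fields.get(i), back.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(roundTrips(2, "t1", "t2", "t1", "t3", "t2", "t1"));
  }
}
```

The reader never sees the writer's dictionary. It rebuilds an identical one because every literal triggers the same add and every index triggers the same use, which is the consistency requirement stated above.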
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227961#comment-13227961 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

Uploaded v23 onto review board.
After WAL version metadata design is finalized, will add that.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176983#comment-13176983 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 161
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line161>
bq.  >
bq.  >     use constant

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 48
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line48>
bq.  >
bq.  >     LOG.isDebugEnabled -- or maybe this should even be TRACE level

removed this completely, not needed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 34
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line34>
bq.  >
bq.  >     private final

removed completely.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 32
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line32>
bq.  >
bq.  >     this should be all caps -- but also probably something from the configuration

changed


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 23
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65773#file65773line23>
bq.  >
bq.  >     does it have to be public?

now default.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 57
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line57>
bq.  >
bq.  >     hashCode() on a byte[] is identity-based - you should use Bytes.hashCode()

yup. i just figured this out. cost me a ton of pain. was wondering why things weren't compressing the way they should.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, lines 82-85
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65769#file65769line82>
bq.  >
bq.  >     indentation

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, lines 144-150
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65770#file65770line144>
bq.  >
bq.  >     again the Context object here would make things a little cleaner to integrate:
bq.  >     - you can drop "compression" boolean and just check "if (compressionContext != null)"
bq.  >     - you only add one integration point to the existing code instead of lots of new member vars

will do in a refactoring pass.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 90
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line90>
bq.  >
bq.  >     I'd call this clear()

done.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 64
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line64>
bq.  >
bq.  >     equals is identity based here... should use Bytes.equals()
bq.  >     
bq.  >     Also Bytes.equals I believe handles nulls, so you can collapse two of these three clauses together

also just figured this out.
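
For anyone hitting the same hashCode()/equals() problem: the sketch below shows the identity-based behaviour of byte[] using only the JDK. java.util.Arrays and ByteBuffer are used here as stand-ins for the Bytes.hashCode() and Bytes.equals() helpers named in the review, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ByteArrayKeySketch {
  /** Looks up a copy of the key in a map keyed by raw byte[]; misses because arrays compare by identity. */
  static Integer lookupRaw(byte[] key) {
    Map<byte[], Integer> map = new HashMap<byte[], Integer>();
    map.put(key, 1);
    return map.get(key.clone());
  }

  /** Same lookup with ByteBuffer-wrapped keys, which hash and compare by content. */
  static Integer lookupWrapped(byte[] key) {
    Map<ByteBuffer, Integer> map = new HashMap<ByteBuffer, Integer>();
    map.put(ByteBuffer.wrap(key), 1);
    return map.get(ByteBuffer.wrap(key.clone()));
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {1, 2, 3};
    System.out.println(a.equals(b));                                  // false: identity comparison
    System.out.println(Arrays.equals(a, b));                          // true: content comparison
    System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));     // true: content hash
    System.out.println(Arrays.equals((byte[]) null, (byte[]) null));  // true: nulls are handled
    System.out.println(lookupRaw(a));                                 // null
    System.out.println(lookupWrapped(a));                             // 1
  }
}
```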


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1655
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65768#file65768line1655>
bq.  >
bq.  >     since we have several methods that take all these parameters, and we might want to change the compression scheme in the future, I think it makes sense to introduce a class WALCompressionContext with getters for each of the dictionaries

Will make a compression context during refactoring.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, lines 57-58
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65766#file65766line57>
bq.  >
bq.  >     we should probably use vints here - most keys and many values are <100bytes long, so we could store the lengths in 1 byte instead of the 4 used here

Will do. I didn't bother compressing the size values in KeyValue. Should do that as well - squeeze out extra space.
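
To show the saving, here is a minimal varint sketch. It uses a generic 7-bits-per-byte encoding chosen for the example, which is not necessarily the format Hadoop's vint helpers write. The class and method names are made up.

```java
import java.io.ByteArrayOutputStream;

final class VarIntSketch {
  /** Encodes a non-negative int, 7 bits per byte, low bits first; the high bit means "more bytes follow". */
  static byte[] encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("lengths are never negative: " + value);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream(5);
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  /** Decodes a value produced by encode. */
  static int decode(byte[] buf) {
    int result = 0;
    int shift = 0;
    for (byte b : buf) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }

  public static void main(String[] args) {
    System.out.println(encode(99).length);    // 1 byte instead of a 4-byte int
    System.out.println(encode(1000).length);  // 2 bytes
    System.out.println(decode(encode(1000))); // 1000
  }
}
```

Any length below 128 fits in a single byte under this encoding, which covers the "<100 bytes" case mentioned above.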


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 70
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line70>
bq.  >
bq.  >     should have a finally { in.close(); } probably

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 28
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line28>
bq.  >
bq.  >     extra word "designed"?

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 33
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line33>
bq.  >
bq.  >     example should use arguments like "-u compressed-hlog uncompressed-hlog" rather than "filename" twice

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 37
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line37>
bq.  >
bq.  >     check args.length first and print help if it's not got 3 args

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 43-45
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line43>
bq.  >
bq.  >     should be an 'else if' -- and have a final 'else' clause that gives usage

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 60
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line60>
bq.  >
bq.  >     TODO: need to change this config key to match our others

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 66-69
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line66>
bq.  >
bq.  >     this assumes the whole log's content fits in memory, which shouldn't be necessary... why not loop reading one record from reader and writing one to writer?

will do in optimization pass.
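
For reference, the record-at-a-time loop suggested above could look roughly like this. The length-prefixed record format and all names here are assumptions made for a self-contained example; the real tool would read and write entries through the HLog reader and writer instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class StreamingCopySketch {
  /** Copies length-prefixed records one at a time, so only one record is held in memory. */
  static int copy(DataInputStream in, DataOutputStream out) throws IOException {
    int records = 0;
    while (true) {
      int length;
      try {
        length = in.readInt();
      } catch (EOFException endOfLog) {
        break; // end of input (a truncated length prefix is treated the same way in this sketch)
      }
      byte[] record = new byte[length];
      in.readFully(record);
      // A real tool would compress or decompress the record at this point.
      out.writeInt(length);
      out.write(record);
      records++;
    }
    out.flush();
    return records;
  }

  /** In-memory demo: builds a small log, copies it, and checks the bytes are unchanged. */
  static int countCopied(byte[]... records) {
    try {
      ByteArrayOutputStream source = new ByteArrayOutputStream();
      DataOutputStream sourceOut = new DataOutputStream(source);
      for (byte[] record : records) {
        sourceOut.writeInt(record.length);
        sourceOut.write(record);
      }
      sourceOut.flush();
      ByteArrayOutputStream target = new ByteArrayOutputStream();
      int copied = copy(
          new DataInputStream(new ByteArrayInputStream(source.toByteArray())),
          new DataOutputStream(target));
      if (!Arrays.equals(source.toByteArray(), target.toByteArray())) {
        throw new IllegalStateException("copy changed the data");
      }
      return copied;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(countCopied(new byte[] {1, 2}, new byte[0], new byte[] {3}));
  }
}
```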


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 90
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line90>
bq.  >
bq.  >     should go in finally clause. Also use IOUtils.closeStream as long as "out" implements Closeable (I think it does?)

done.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 114-116
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line114>
bq.  >
bq.  >     why not combine this with the if/else above?

because we need to write our size.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 133
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line133>
bq.  >
bq.  >     most of this byte is wasted - we're only using 2 of the 6 bits... and I think we could actually get rid of EMPTY as well.
bq.  >     
bq.  >     If we limit the dictionaries to 32k entries, then we could use the following:
bq.  >     
bq.  >     If bit 0 == 0: dictionary reference
bq.  >       bits 1 through 15: the dictionary index
bq.  >     if bit 0 == 1: new value
bq.  >       start a varint encoding in this byte
bq.  >     
bq.  >     but let's leave this as is for now just to get the rest of the code-level issues cleaned up

will do optimisation pass next.
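
One possible reading of the layout proposed above, sketched for discussion only. It assumes "bit 0" means the most significant bit of the first byte, that a reference is two bytes with a 15-bit big-endian index, and that a new value starts its length varint in the remaining 7 bits (6 length bits plus a continuation bit). None of these details are confirmed in this thread, and the names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class TaggedFieldSketch {
  /** Dictionary reference: top bit 0, then a 15-bit index (0..32767). */
  static byte[] encodeReference(int index) {
    if (index < 0 || index > 0x7FFF) {
      throw new IllegalArgumentException("index must fit in 15 bits: " + index);
    }
    return new byte[] {(byte) (index >>> 8), (byte) index};
  }

  /** New value: top bit 1, length varint starting in the same byte, then the raw bytes. */
  static byte[] encodeNewValue(byte[] value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int len = value.length;
    int first = 0x80 | (len & 0x3F); // tag bit plus the low 6 bits of the length
    len >>>= 6;
    if (len != 0) {
      first |= 0x40;                 // continuation bit for the first byte
    }
    out.write(first);
    while (len != 0) {
      int b = len & 0x7F;
      len >>>= 7;
      if (len != 0) {
        b |= 0x80;                   // continuation bit for later bytes
      }
      out.write(b);
    }
    out.write(value, 0, value.length);
    return out.toByteArray();
  }

  static boolean isReference(byte[] field) {
    return (field[0] & 0x80) == 0;
  }

  static int decodeReference(byte[] field) {
    return ((field[0] & 0x7F) << 8) | (field[1] & 0xFF);
  }

  static byte[] decodeNewValue(byte[] field) {
    int pos = 0;
    int b = field[pos++] & 0xFF;
    int len = b & 0x3F;
    int shift = 6;
    boolean more = (b & 0x40) != 0;
    while (more) {
      b = field[pos++] & 0xFF;
      len |= (b & 0x7F) << shift;
      shift += 7;
      more = (b & 0x80) != 0;
    }
    return Arrays.copyOfRange(field, pos, pos + len);
  }

  public static void main(String[] args) {
    System.out.println(decodeReference(encodeReference(32767)));  // 32767
    System.out.println(encodeNewValue(new byte[5]).length);       // 6: one header byte plus the value
    System.out.println(encodeNewValue(new byte[100]).length);     // 102: two header bytes plus the value
  }
}
```

Under these assumptions a reference always costs two bytes and a new value of up to 63 bytes costs a single header byte, with no separate EMPTY marker needed.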


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 153-159
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line153>
bq.  >
bq.  >     rather than this, why not use varints here so you don't have to specify up front what the size is?

This is how KeyValue stores its lengths, and I didn't want to change that. Will do during the optimisation pass.
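
For readers unfamiliar with the varint idea Todd raises: the length is written 7 bits at a time, with the high bit of each byte saying whether more bytes follow, so the writer never has to declare a size up front. The sketch below is a generic base-128 varint, assumed for illustration only; it is not necessarily the encoding Hadoop's WritableUtils uses, and VarintSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a generic base-128 varint for non-negative ints. */
final class VarintSketch {

  /** Emits 7 bits per byte, low bits first; the high bit marks "more bytes follow". */
  static byte[] encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative length: " + value);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream(5);
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  /** Reads bytes until one has its high bit clear, accumulating 7 bits each time. */
  static int decode(byte[] buf) {
    int result = 0;
    int shift = 0;
    for (byte b : buf) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }
}
```

Small lengths cost one byte and only large ones grow, which is the saving over a fixed-width length field.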


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4121
-----------------------------------------------------------


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4608:
-------------------------------

    Attachment: hbase-4608-v28.txt
                hbase-4608-v28-delta.txt

I reviewed the latest patch and made some improvements:
- added a new true unit test for KeyValueCompression
- addressed one of my pieces of review feedback from earlier about the API for uncompressIntoArray
- renamed some methods for clarity
- removed some spurious whitespace changes
- added an enum for HLogKey version so that the comparisons are clearer
- renamed MAXSIZE to MAX_SIZE
- redid the linked list inside BidirectionalLRUMap, since the nomenclature was previously backwards and I found the code hard to read. ("next" is supposed to point towards the tail, not towards the head)
- changed getEntry() to throw an error if you pass an index larger than the current size

I ran the related unit tests and they passed, but did not try the full suite.
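
As an illustration of the last item in the list above, a bounds-checked getEntry() might look roughly like the following. Only the idea of rejecting an out-of-range index comes from the comment; the class IndexedEntriesSketch, its add() method, the choice of IndexOutOfBoundsException, and treating an index equal to the current size as invalid are all assumptions made for this sketch.

```java
/** Hypothetical sketch of an index-addressed entry table with a bounds-checked lookup. */
final class IndexedEntriesSketch {
  private final byte[][] entries;
  private int size;

  IndexedEntriesSketch(int capacity) {
    entries = new byte[capacity][];
  }

  /** Stores a value in the next free slot and returns its index. */
  short add(byte[] value) {
    if (size == entries.length) {
      throw new IllegalStateException("table is full");
    }
    entries[size] = value;
    short index = (short) size;
    size++;
    return index;
  }

  /** Fails loudly instead of returning null for a slot that was never filled. */
  byte[] getEntry(short index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", current size " + size);
    }
    return entries[index];
  }
}
```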
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130099#comment-13130099 ] 

Andrew Purtell commented on HBASE-4608:
---------------------------------------

bq. The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes.

A different and complementary avenue of attack for this issue is HDFS-1783.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221347#comment-13221347 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5597
-----------------------------------------------------------


It may be better if 4608v16.txt is uploaded here.


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12170>

    Can we toggle this config param after in.init() ?
    This way we only create one Configuration



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12171>

    Should read 'uncompressed array'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12172>

    This assignment is not necessary.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12173>

    Should read '... start writing to'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12174>

    Should read 'the length of entry'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12175>

    Should we add a check for other sizeBytes values ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12176>

    wrap long line, please.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment12177>

    Remove extra empty line.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
<https://reviews.apache.org/r/2740/#comment12178>

    This sentence is in parentheses.
    People would think it applies to dictionary indexes.
    Strictly speaking, -1 is not an index.
    
    Better rephrase this sentence.


- Ted


On 2012-03-01 09:58:44, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-01 09:58:44)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 17cb0e3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java bd31ead 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178131#comment-13178131 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508998/4608v6.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 78 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/645//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/645//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/645//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178033#comment-13178033 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

@Li:
Please use '--no-prefix' to generate diff.
Otherwise Hadoop QA won't be able to apply your patch.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229454#comment-13229454 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 17:42:21, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108>
bq.  >
bq.  >     I assume we're entirely sure that a dictionary will never have > 2^15 entries.

It'll start evicting once it hits its max size, which is currently 2 ^ 15.
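
The eviction behaviour Li describes (bounded size, oldest-used entry dropped first) can be illustrated with a plain java.util.LinkedHashMap in access order. This is only a sketch under that assumption: it is not the patch's LRUDictionary, it does not show how that class assigns or reuses its short indexes, and BoundedLruSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a size-bounded map that evicts the least recently used entry. */
final class BoundedLruSketch<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;

  /** The maximum size quoted in the reply above. */
  static final int DEFAULT_MAX_SIZE = 1 << 15;

  private final int maxSize;

  BoundedLruSketch(int maxSize) {
    super(16, 0.75f, true); // true = iterate in access order, oldest access first
    this.maxSize = maxSize;
  }

  /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxSize;
  }
}
```

Because the map never grows past maxSize, any index scheme layered on top of it would stay within 15 bits at the default size.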


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
-----------------------------------------------------------


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229248#comment-13229248 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>
bq.  >
bq.  >     Introducing enum is a good idea.
bq.  >     I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

HLogKey does not need to know about 'type' of compression.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 306
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line306>
bq.  >
bq.  >     How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ?

Generalization is out of scope.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189>
bq.  >
bq.  >     Hiding LRUDictionary.class is desirable.
bq.  >     Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?

Out of scope.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 110
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line110>
bq.  >
bq.  >     We introduced compression type in Metadata, how about allowing user to specify compression type using conf ?
bq.  >     Default is dictionary compression.

Customization is out of scope.  "How about..." should have attendant justification.  You can justify generalization of this compression in a new jira.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 139
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line139>
bq.  >
bq.  >     Hiding LRUDictionary.class is desirable.
bq.  >     How about passing conf to CompressionContext ctor ?

The generalization that would require hiding the type of compression being done is out of scope.


This is not a software project that fellas are working on for casual amusement.  New facility should be justified by real-world needs.  This feature is experimental.  It could help w/ our WAL writes.  It may not.  We need to get a basic facility into a release so we can try it.  If it proves its worth, we can spend more time down this avenue.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
-----------------------------------------------------------




                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Attachment: 4608v27.txt

Address the failed TestLRUDictionary test and include the javadoc fix Ted suggested.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229010#comment-13229010 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

@Ted: The HLog compression we're doing here is less complicated and has (far) fewer implications on other modules compared to HBASE-4218. I don't think that is a good comparison.

bq. LRUDictionary.class is passed to the context

You do have a point.
Maybe instead of saying
{code}
boolean compression = reader.isWALCompressionEnabled();
if (compression) {
...
{code}

it could be something like
{code}
HLogCompressionType type = reader.getCompressionType();
if (type == ...) {
...
{code}
(just made that up, but you get the idea, and should be an easy change)
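
Spelled out a little further, the enum-based check could be sketched as below. Everything here is hypothetical: HLogCompressionType and getCompressionType() are the made-up names from the snippet above, and the NONE and DICTIONARY constants, the Reader interface, and the describe() helper are added purely for illustration. Nothing in this thread says the patch adopted this shape.

```java
/** Hypothetical sketch of dispatching on a compression-type enum instead of a boolean. */
final class CompressionTypeSketch {

  /** Made-up constants; the thread does not define any. */
  enum HLogCompressionType { NONE, DICTIONARY }

  /** Stand-in for the reader in the snippet above. Not the HBase API. */
  interface Reader {
    HLogCompressionType getCompressionType();
  }

  /** Picks an action per type; an unknown type fails fast rather than being ignored. */
  static String describe(Reader reader) {
    HLogCompressionType type = reader.getCompressionType();
    switch (type) {
      case NONE:
        return "read entries as-is";
      case DICTIONARY:
        return "set up a dictionary-based compression context";
      default:
        throw new IllegalStateException("unknown compression type: " + type);
    }
  }
}
```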

@Stack: Is v27 up on RB? I looked at the earlier versions but haven't kept track recently. I promise I'll do a review tomorrow. I find it a bit big to just look at the diff.
44% space saving is pretty awesome. I forget, do we also SNAPPY/LZO/GZ compress the HLogs?

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516721/4608v15.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1076//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509763/4608v8fixed.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.regionserver.wal.TestHLog
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/693//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/693//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Attachment: 4608-v19.txt

Patch v19 from review board.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209015#comment-13209015 ] 

Li Pi commented on HBASE-4608:
------------------------------

Doing so right now. Will be done before weekend.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228164#comment-13228164 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Having PREFIX_COMPRESSION_V2 in the future is equivalent to having a compression type version.
It may make compression checking verbose: I think checking against one compression type is better than comparing against every PREFIX_COMPRESSION_Vx.

I agree with the observation about PE data.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228996#comment-13228996 ] 

stack commented on HBASE-4608:
------------------------------

Can I get a +1 from someone else?  It's not a big patch.  Should be a quick review.  Thanks.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178075#comment-13178075 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2011-12-31 20:19:11.951711)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary (updated)
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229884#comment-13229884 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

w.r.t. adding javadoc for offset and length of writeCompressed(), I searched our code base for '@param offset ' and found 48 occurrences.

I like this snippet from HFileReaderV2.java:
{code}
     * @param key key byte array
     * @param offset key offset in the key byte array
     * @param length key length
{code}
Even an empty javadoc tag is better than a missing parameter:
{code}
   * @param offset
{code}
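
As an illustration of the style being asked for, here is a minimal sketch of a method whose offset and length parameters are documented. JavadocParamSketch and slice() are hypothetical names chosen for this example; this is not the writeCompressed() method from the patch.

```java
import java.util.Arrays;

final class JavadocParamSketch {
  /**
   * Copies a slice out of a byte array.
   *
   * @param data source byte array
   * @param offset index in {@code data} at which the slice starts
   * @param length number of bytes in the slice
   * @return a new array holding the {@code length} bytes starting at {@code offset}
   */
  static byte[] slice(byte[] data, int offset, int length) {
    return Arrays.copyOfRange(data, offset, offset + length);
  }
}
```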
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229420#comment-13229420 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5951
-----------------------------------------------------------

Ship it!


Some comments and nits inside. Some extraneous whitespace (can be fixed at commit).


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12915>

    Nit: Comment here that the status byte is the higher order byte of the dict entry.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12916>

    I assume we're entirely sure that a dictionary will never have > 2^15 entries.
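
    For readers following the two comments above, here is a minimal sketch of why the 2^15 bound matters, assuming an encoding where a dictionary hit is written as a two-byte index and a miss is written as a marker byte followed by the literal. StatusByteSketch, the marker value and the int length prefix are assumptions for illustration, not the format in the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class StatusByteSketch {
  /** Marker meaning "a literal follows"; the value is an assumption. */
  static final byte NOT_IN_DICTIONARY = -1;
  /** Indexes below 2^15 have a clear top bit, so their high byte is never 0xFF. */
  static final int MAX_ENTRIES = 1 << 15;

  static void write(ByteBuffer out, byte[] data, List<byte[]> dict) {
    int idx = indexOf(dict, data);
    if (idx >= 0) {
      out.putShort((short) idx); // big-endian: the high-order byte is written first
      return;
    }
    out.put(NOT_IN_DICTIONARY);
    out.putInt(data.length);
    out.put(data);
    if (dict.size() < MAX_ENTRIES) {
      dict.add(data.clone());
    }
  }

  static byte[] read(ByteBuffer in, List<byte[]> dict) {
    byte status = in.get(); // either the marker or the high byte of an index
    if (status == NOT_IN_DICTIONARY) {
      byte[] data = new byte[in.getInt()];
      in.get(data);
      if (dict.size() < MAX_ENTRIES) {
        dict.add(data);
      }
      return data;
    }
    int idx = ((status & 0xFF) << 8) | (in.get() & 0xFF);
    return dict.get(idx);
  }

  private static int indexOf(List<byte[]> dict, byte[] data) {
    for (int i = 0; i < dict.size(); i++) {
      if (Arrays.equals(dict.get(i), data)) {
        return i;
      }
    }
    return -1;
  }
}
```

    If an index could reach 0xFF00 or above, its first byte would collide with the marker, which is the case the 2^15 limit rules out.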



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/4328/#comment12914>

    Nit: The naming convention is a bit strange.
    This one is called uncompress... whereas the method returning a new byte[] is called readCompressed



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/4328/#comment12917>

    Have a constructor that takes a compression context too?
    It seems like once anything has been written to the HLog this should be immutable.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12919>

    COMPRESSED is a bit of a strange name.
    It happens to be a version of the WAL that supports compression, but it is not necessarily compressed.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12920>

    ugly whitespace :)



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/4328/#comment12921>

    I think I already put this question to Li Pi... How much memory do we expect this dictionary to take in the worst case?
    I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
<https://reviews.apache.org/r/4328/#comment12922>

    I'll trust you folks that a PriorityQueue would not work here.


- Lars
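
On the memory question raised in the review, here is a minimal sketch of a capacity-bounded LRU dictionary, assuming evicted slots are reused so that indexes stay within a fixed range. BoundedLruSketch and its methods are hypothetical and are not the LRUDictionary from the patch; it uses a LinkedHashMap in access order rather than whatever structure the patch uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedLruSketch {
  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, Integer> indexByEntry =
      new LinkedHashMap<>(16, 0.75f, true);
  private final String[] entryByIndex;

  BoundedLruSketch(int capacity) {
    this.capacity = capacity;
    this.entryByIndex = new String[capacity];
  }

  /** Returns the slot of the entry, or -1 if absent; a hit marks it recently used. */
  int findIndex(String entry) {
    Integer idx = indexByEntry.get(entry);
    return idx == null ? -1 : idx;
  }

  /** Adds an entry, evicting the least recently used one and reusing its slot when full. */
  int add(String entry) {
    Integer existing = indexByEntry.get(entry);
    if (existing != null) {
      return existing;
    }
    int slot;
    if (indexByEntry.size() < capacity) {
      slot = indexByEntry.size();
    } else {
      Map.Entry<String, Integer> eldest = indexByEntry.entrySet().iterator().next();
      slot = eldest.getValue();
      indexByEntry.remove(eldest.getKey());
    }
    indexByEntry.put(entry, slot);
    entryByIndex[slot] = entry;
    return slot;
  }

  String get(int index) {
    return entryByIndex[index];
  }

  int size() {
    return indexByEntry.size();
  }
}
```

With this shape the dictionary never holds more than `capacity` entries, so its worst-case footprint is roughly the capacity times the largest entry plus map overhead.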


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230449#comment-13230449 ] 

stack commented on HBASE-4608:
------------------------------

I'll commit v30 then.

Thanks all for reviews, etc.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228986#comment-13228986 ] 

stack commented on HBASE-4608:
------------------------------

bq. This would make developing a new compression scheme hard.

Out of scope for this issue.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229024#comment-13229024 ] 

stack commented on HBASE-4608:
------------------------------

Lars, I can't upload a patch to someone else's issue.  Made a new rb at https://reviews.apache.org/r/4328/
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192494#comment-13192494 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

Don't we already have to scan the entire log from the beginning on recovery? Log splitting splits entire segments, afaik. Am I forgetting about some index structure or something?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192580#comment-13192580 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

You know more about that than I do :)
I'm saying that we do not necessarily need to scan the entire log, especially if we add some custom log replaying tools (for example, replaying for a single region).
If we're not careful now, we shut ourselves out from future optimizations.
It might not be a big deal, as the logs are rolled anyway, and that naturally limits the number of WALEdits we have to scan back through to find a dictionary.
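
To illustrate the concern, here is a minimal sketch, assuming a log in which the first occurrence of a value is written as a literal (which also defines the next dictionary slot) and later occurrences are written as references to that slot. ReplayFromStartSketch and Record are hypothetical and only model the idea, not the HLog format.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayFromStartSketch {
  /** One log record: a literal value, or a reference to an earlier dictionary slot. */
  static final class Record {
    final String literal; // null when this record is a reference
    final int ref;

    private Record(String literal, int ref) {
      this.literal = literal;
      this.ref = ref;
    }

    static Record literal(String value) {
      return new Record(value, -1);
    }

    static Record reference(int slot) {
      return new Record(null, slot);
    }
  }

  /** Decodes records from position start, rebuilding the dictionary while reading. */
  static List<String> decode(List<Record> log, int start) {
    List<String> dict = new ArrayList<>();
    List<String> out = new ArrayList<>();
    for (int i = start; i < log.size(); i++) {
      Record r = log.get(i);
      if (r.literal != null) {
        dict.add(r.literal);
        out.add(r.literal);
      } else if (r.ref < dict.size()) {
        out.add(dict.get(r.ref));
      } else {
        // The slot was defined by a record that this read skipped.
        throw new IllegalStateException("record " + i + " needs dictionary slot " + r.ref);
      }
    }
    return out;
  }
}
```

Starting at position 0 always works; starting mid-log can fail, or in this simplified model resolve a reference against the wrong slot, unless the reader first scans back to where the dictionary began.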

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131898#comment-13131898 ] 

Jonathan Gray commented on HBASE-4608:
--------------------------------------

I think the idea is a custom compression where we can do stuff like start the HLog with a dictionary of some known repetitive stuff.  It's very similar to the delta encoding work.



                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202158#comment-13202158 ] 

Li Pi commented on HBASE-4608:
------------------------------

The compression uses 2 byte dictionary indices, so the first 255 entries should start off with 0x00. This might be causing it.
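
To illustrate the 2-byte index point, here is a minimal sketch. It assumes the index is laid out big-endian, like a Java short; the class and method names are hypothetical and not taken from the patch. Under that assumption, any index below 256 has 0x00 as its first byte.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not from the patch: encodes a dictionary index
// as two big-endian bytes (ByteBuffer's default byte order).
class TwoByteIndex {
    static byte[] encode(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("index out of 2-byte range: " + index);
        }
        return ByteBuffer.allocate(2).putShort((short) index).array();
    }

    static int decode(byte[] bytes) {
        // Mask to undo sign extension of the short.
        return ByteBuffer.wrap(bytes).getShort() & 0xFFFF;
    }
}
```

A reader that treats a leading 0x00 as a marker for something else would therefore misread every low-numbered dictionary entry.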

@Karthik, I'll try to get documentation out when I'm less busy. This quarter is pretty painful so far.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229921#comment-13229921 ] 

Li Pi commented on HBASE-4608:
------------------------------

On the other compression options: I looked into those.

Plugging into LZMA was the first thing I thought about doing - performance rules that one out though.

There are other optimizations we can make, such as modifying the dictionary to take frequency into account, assigning the highest-probability entries to the lowest numbers, and then using vints rather than 2 bytes for everything. Note that we shouldn't expect to beat LZMA, because we compress neither the values nor the SequenceFile overhead. On some workloads, those overheads might be substantial - although I haven't checked.
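
A rough sketch of why variable-length integers would help if frequent entries get low index numbers. This uses a generic 7-bits-per-byte varint, which is an assumption for illustration: it is not the encoding in the patch and differs from Hadoop's own vint format. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration: a generic varint where each byte carries
// 7 bits of the value and the high bit signals that more bytes follow.
class VarIntSketch {
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return result;
    }
}
```

With this scheme, indices 0 through 127 take one byte instead of two, while indices of 16384 and above take three, so the saving depends on how skewed the entry frequencies are.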

This is actually pretty close to the challenge posed by caching, in that we want to keep the entries most likely to be repeated in our dictionary, and evict the rest. I used LRU because LRU was simple, and like caching, pretty much anything results in a substantial performance increase over nothing.
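
For readers unfamiliar with the idea, a minimal sketch of an LRU-evicting dictionary built on java.util.LinkedHashMap in access order. This is not the LRUDictionary from the patch; the class, its String keys, and the findOrAdd contract are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps entries to slot numbers and, when full,
// evicts the least recently used entry and reuses its slot.
class LruDictionarySketch {
    static final int NOT_IN_DICTIONARY = -1;

    private final int capacity;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Integer> slots =
            new LinkedHashMap<>(16, 0.75f, true);

    LruDictionarySketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Returns the slot if the entry is known. Otherwise stores the entry
    // and returns NOT_IN_DICTIONARY, so the caller writes it uncompressed.
    int findOrAdd(String entry) {
        Integer slot = slots.get(entry); // get() also marks the entry as recently used
        if (slot != null) {
            return slot;
        }
        int newSlot;
        if (slots.size() < capacity) {
            newSlot = slots.size();
        } else {
            Iterator<Map.Entry<String, Integer>> it = slots.entrySet().iterator();
            newSlot = it.next().getValue(); // slot of the least recently used entry
            it.remove();
        }
        slots.put(entry, newSlot);
        return NOT_IN_DICTIONARY;
    }
}
```

A reader replaying the log would have to apply the same additions and evictions in the same order to keep its slot numbers in step with the writer's.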

I'm pretty happy with cutting the WAL size in half on optimal workloads, though as always, it's nice to work towards future performance goals. I have other ideas, but they involve changing the HLog substantially in order to be more compact. In that case, we might end up abandoning the Hadoop SequenceFile format altogether, and this thing becomes a bit more complex.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192625#comment-13192625 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-24 22:27:32.723446)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
  src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229137#comment-13229137 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12894>

    Introducing enum is a good idea.
    I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/4328/#comment12895>

    How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/4328/#comment12891>

    Hiding LRUDictionary.class is desirable.
    Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
<https://reviews.apache.org/r/4328/#comment12892>

    We introduced the compression type in Metadata; how about allowing the user to specify the compression type using conf?
    Default is dictionary compression.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
<https://reviews.apache.org/r/4328/#comment12893>

    Hiding LRUDictionary.class is desirable.
    How about passing conf to CompressionContext ctor ?
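
    One way to read the two suggestions above (hide LRUDictionary.class, pass conf to the ctor) is sketched below. Everything here is hypothetical: a plain Map stands in for the configuration object, the key name is invented, and none of the type names come from the patch.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins; none of these names come from the patch under review.
interface WalDictionarySketch {
    String describe();
}

class LruDictionaryStandIn implements WalDictionarySketch {
    public String describe() {
        return "lru";
    }
}

enum WalCompressionChoice { NONE, DICTIONARY }

class CompressionContextSketch {
    // Assumed configuration key, for illustration only.
    static final String TYPE_KEY = "example.wal.compression.type";

    final WalCompressionChoice choice;
    final WalDictionarySketch dictionary; // null when compression is off

    // A plain Map stands in for the configuration object.
    CompressionContextSketch(Map<String, String> conf) {
        // Dictionary compression is the default when the key is absent.
        String raw = conf.getOrDefault(TYPE_KEY, WalCompressionChoice.DICTIONARY.name());
        this.choice = WalCompressionChoice.valueOf(raw.toUpperCase(Locale.ROOT));
        // The concrete dictionary class is chosen here, so callers never name it.
        this.dictionary = (choice == WalCompressionChoice.DICTIONARY)
                ? new LruDictionaryStandIn()
                : null;
    }
}
```

    With this shape, the reader and writer would construct the context from conf alone, and a new scheme would only touch the enum and the constructor.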


- Ted


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228983#comment-13228983 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I feel the dictionary compression implementation is pervasive throughout the patch.
e.g.:
{code}
+    boolean compression = reader.isWALCompressionEnabled();
+    if (compression) {
+      try {
+        if (compressionContext == null) {
+          compressionContext = new CompressionContext(LRUDictionary.class);
{code}
While isWALCompressionEnabled() sounds general, LRUDictionary.class is passed to the context.
This would make developing a new compression scheme hard.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180999#comment-13180999 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509633/4608v7.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/677//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/677//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/677//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229432#comment-13229432 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I just thought we should encapsulate LRUDictionary in CompressionContext:
{code}
+    boolean compression = reader.isWALCompressionEnabled();
+    if (compression) {
+      try {
+        if (compressionContext == null) {
+          compressionContext = new CompressionContext(LRUDictionary.class);
{code}
In my opinion CompressionContext shouldn't just be a holder of multiple dictionaries.

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228108#comment-13228108 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

bq. First, compression ratio is not good - at least for the data written by PE.
I saw ~40% compression on a YCSB load, so some workloads may have good results whereas others don't. Did you also re-run the test after fixing the bug? Maybe that skewed the results?

bq. Second, HLogKey persistence becomes dependent on the compression implementation. This would make plugging other compression techniques hard.
I agree we should use a metadata field in the log to describe which compression mechanism is being used.
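
A minimal reader-side sketch of that idea, assuming the log header exposes its metadata as string key/value pairs. The key name, the enum, and the class are all hypothetical, and a plain Map stands in for the real header metadata type.

```java
import java.util.Map;

// Hypothetical sketch: the reader picks a decompression scheme from a
// metadata field in the log header instead of assuming one.
class WalHeaderSketch {
    // Assumed metadata key, for illustration only.
    static final String COMPRESSION_KEY = "WAL_COMPRESSION_TYPE";

    enum Scheme { NONE, DICTIONARY }

    static Scheme schemeFor(Map<String, String> metadata) {
        String value = metadata.get(COMPRESSION_KEY);
        if (value == null) {
            return Scheme.NONE; // logs written before the field existed
        }
        try {
            return Scheme.valueOf(value);
        } catch (IllegalArgumentException e) {
            // Fail loudly rather than misread a log written with an unknown scheme.
            throw new IllegalStateException("Unknown WAL compression type: " + value, e);
        }
    }
}
```

Because the scheme travels with the file, HLogKey persistence would no longer need to know which compression implementation wrote the log.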
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176984#comment-13176984 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2011-12-29 04:38:25.385999)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

Added tests. Fixed code issues as mentioned by Todd.


Summary
-------

Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229069#comment-13229069 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518303/hbase-4608-v28.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 11 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517805/4608-v20.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -121 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1151//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1151//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1151//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222940#comment-13222940 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517177/4608v18.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1108//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219947#comment-13219947 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 37
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line37>
bq.  >
bq.  >     Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works?  If not here, where else will doc. on how the Compressor works go?
bq.  >     
bq.  >     Maybe you should purge all mention of WAL from this class -- e.g. WALDictionary -- because it seems like it could be easily generalized (I suppose we can do that later).

Included!


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 47
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line47>
bq.  >
bq.  >     The way the usage is written, -u and -c are optional.  You should fix that.  Looks like they are required going by fact that args.length needs to be 3.  Also, it looks like you take --help, the long form, or -u/-c the short forms.  Either take all short forms or take both long and short form to be consistent.

System.out.println("Exactly one of -u or -c must be specified"); should take care of the required thing.

Help now takes both short and long forms. Everything else just takes short forms.
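The argument rule discussed above (exactly one of -u or -c followed by two paths, with help accepted in short and long form) could be checked roughly as follows. This is only a sketch: the class name `CompressorArgs`, the `Mode` enum, and the `-h` spelling of the short help flag are assumptions for illustration and are not taken from the patch.

```java
import java.util.Optional;

/** Hypothetical argument holder; not the Compressor class from the patch. */
final class CompressorArgs {
  enum Mode { COMPRESS, UNCOMPRESS }

  final Mode mode;
  final String input;
  final String output;

  private CompressorArgs(Mode mode, String input, String output) {
    this.mode = mode;
    this.input = input;
    this.output = output;
  }

  /**
   * Parses the command line.
   *
   * @return the parsed arguments, or empty if the caller should print usage
   *         (help requested, wrong argument count, or neither -u nor -c given)
   */
  static Optional<CompressorArgs> parse(String[] args) {
    // "-h" is an assumed short form; the thread only says help takes both forms.
    if (args.length == 1 && (args[0].equals("-h") || args[0].equals("--help"))) {
      return Optional.empty();
    }
    // Exactly one mode flag plus input and output paths.
    if (args.length != 3) {
      return Optional.empty();
    }
    Mode mode;
    if (args[0].equals("-c")) {
      mode = Mode.COMPRESS;
    } else if (args[0].equals("-u")) {
      mode = Mode.UNCOMPRESS;
    } else {
      return Optional.empty();
    }
    return Optional.of(new CompressorArgs(mode, args[1], args[2]));
  }
}
```

Because the mode is the first positional flag, supplying both -u and -c fails the length check, so "exactly one" holds without a separate test.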


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 66
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line66>
bq.  >
bq.  >     Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?

Probably should be called compressor.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 79
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line79>
bq.  >
bq.  >     This does not need to be an HBaseConfiguration?  There are no configs in hbase-site.xml that might affect what's going on here?

Not really. All that matters is whether compression is on or off.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line108>
bq.  >
bq.  >     Doc the '@return'

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 141
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line141>
bq.  >
bq.  >     Doc the return

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1671
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1671>
bq.  >
bq.  >     White space

fixed.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1675
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1675>
bq.  >
bq.  >     When is this called?  Post construction?  Should it be part of constructor?  What happens if its called part way through the writing of a WAL?  Will we start compressing a WAL in the middle?

It's called when a log writer is created. We would start compressing a log in the middle if we happened to call it at that time, but that shouldn't happen.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 270
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78624#file78624line270>
bq.  >
bq.  >     I don't follow what's going on here.  What happens when len >= 0?  Why is it < 0?  What does that mean?  What's v2 of HLogKey?  What if keyContext is not null?

HLogKey has two different formats. If len < 0, that means we're reading the old version of the HLog.

keyContext is the compression context that holds the dictionaries used in compression. If it isn't null, that means compression is enabled.

If len > 0, we're on version 1. We can't compress version 1, but the code for reading version 1 is still in there, for transitioning from earlier HLogs. Compression should never be enabled if we're reading in version 1 HLogs, because there shouldn't be any version 1 HLogs.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 119
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line119>
bq.  >
bq.  >     Class comment on what this is about?

Just a tuple class for holding the various dictionaries used in compression.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 141
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line141>
bq.  >
bq.  >     Why do I clear this?  Why not just throw it away?  Does clearing make it so I can recycle this instance?

Correct. We clear it so we can recycle this instance instead of having to create a new dictionary. Not sure if this makes a huge difference in terms of performance.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 29
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line29>
bq.  >
bq.  >     Why would I ever let go of terms in the dictionary?  Should you explain why in class comment?

We let go of terms in the dictionary because we have only a finite amount of space, and only a finite range of references into the dictionary.

If we're using a 2 byte key, that limits our reference space to 65536. We could end up using vints for entries into the dictionary, but this could end up with it growing pretty huge.
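The eviction trade-off described above can be sketched as a small dictionary that hands out fixed-width slot numbers and reuses the least recently used slot once it is full. This is not the LRUDictionary from the patch: `BoundedLruDictionary`, `findOrAdd`, and `NOT_IN_DICTIONARY` are hypothetical names, and the sketch leans on `java.util.LinkedHashMap` in access order rather than a hand-rolled linked list. It also reserves -1 as a miss marker, so a signed short leaves at most Short.MAX_VALUE usable slots; the 65536 figure applies to an unsigned 2-byte index.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded dictionary; illustrates LRU slot reuse only. */
final class BoundedLruDictionary {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  private final byte[][] slots;
  // Access-ordered map: iteration starts at the least recently used term.
  private final LinkedHashMap<ByteBuffer, Short> index =
      new LinkedHashMap<>(16, 0.75f, true);

  BoundedLruDictionary(int capacity) {
    if (capacity < 1 || capacity > Short.MAX_VALUE) {
      throw new IllegalArgumentException("capacity must fit in a signed short");
    }
    this.capacity = capacity;
    this.slots = new byte[capacity][];
  }

  /**
   * @return the slot of the term if it is already known, otherwise
   *         NOT_IN_DICTIONARY after adding it (evicting the LRU term if full)
   */
  short findOrAdd(byte[] term) {
    ByteBuffer key = ByteBuffer.wrap(term.clone());
    Short known = index.get(key); // get() also marks the term as recently used
    if (known != null) {
      return known;
    }
    short target;
    if (index.size() < capacity) {
      // Slots are only freed by eviction and reused at once, so size == next free slot.
      target = (short) index.size();
    } else {
      Map.Entry<ByteBuffer, Short> eldest = index.entrySet().iterator().next();
      target = eldest.getValue();
      index.remove(eldest.getKey());
    }
    slots[target] = key.array();
    index.put(key, target);
    return NOT_IN_DICTIONARY;
  }

  /** @return the term stored in the given slot, or null if the slot is unused */
  byte[] get(short slot) {
    return slots[slot];
  }

  /** Empties the dictionary so the same instance can be recycled. */
  void clear() {
    index.clear();
    Arrays.fill(slots, null);
  }
}
```

A writer would emit the slot number when `findOrAdd` hits and the raw bytes when it misses; a reader that applies the same additions in the same order ends up with the same slot assignments.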


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 64
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line64>
bq.  >
bq.  >     Should this be static?  Does it need reference to outer class?

It doesn't need to reference the outer class. Made static.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 168
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line168>
bq.  >
bq.  >     Class comment?  Should this be static?

made static.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 176
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78627#file78627line176>
bq.  >
bq.  >     Why am I reading whether compression is on or off by looking at config?  Why am I not looking into head of the WAL file and figure its compressed and then decompressing?  Otherwise, if config is disabled but I'm fed a compressed file, do I just burp?  See the white space added here.

We just burp if compression is on and we get fed an uncompressed file. This should be easy to change though - on the read side.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 28
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78629#file78629line28>
bq.  >
bq.  >     Should be just called Dictionary. Its in the wal package.  No need of the redundant prefix?

Sure. But we have WALActionsListener and a bunch of other things starting with WAL. I figured we can just have that as well.

Renamed to Dictionary.


bq.  On 2012-02-22 05:11:37, Michael Stack wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 38
bq.  > <https://reviews.apache.org/r/2740/diff/19/?file=78634#file78634line38>
bq.  >
bq.  >     This will run all the tests in TestWALReplay?  Nice.

Yup. That's exactly what it does.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5265
-----------------------------------------------------------


On 2012-02-22 03:46:12, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-22 03:46:12)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185378#comment-13185378 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-13 01:37:35.790343)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

removed debug printf.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228972#comment-13228972 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518291/4608v25.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223890#comment-13223890 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517341/4608-v19.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228965#comment-13228965 ] 

stack commented on HBASE-4608:
------------------------------

bq. It implies that WAL_VERSION is the same as COMPRESSION_VERSION.

Yes.  That's right.  The current global version is the version that introduces WAL compression.

bq. As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION

You are conflating wal version and compression type.  They are not the same thing.

If we introduce a new compression type only, and if all else is equal -- same API, etc. -- then we don't need to up the global version.  We are just adding a new compression type.  Either we support it or we don't.  If we don't we'll throw unsupported compression type (the dictionary compression type is currently called DICTIONARY_COMPRESSION_TYPE).
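The separation argued for here, one global WAL version plus an independent compression type, might look like the following on the read side. This is a sketch under assumptions: `WalHeaderCheck`, `SUPPORTED_WAL_VERSION`, the numeric version value, and the string value of the constant are hypothetical; only the name DICTIONARY_COMPRESSION_TYPE comes from the comment above.

```java
/** Hypothetical reader-side check; not code from the patch. */
final class WalHeaderCheck {
  // Assumed value for illustration; the real version numbering is not given here.
  static final int SUPPORTED_WAL_VERSION = 2;
  // Constant name from the discussion; its value here is an assumption.
  static final String DICTIONARY_COMPRESSION_TYPE = "dictionary";

  /**
   * @param fileVersion global WAL version read from the file
   * @param compressionType compression type read from the file, or null if none
   * @return true if entries must be decompressed with the dictionary scheme
   */
  static boolean isCompressed(int fileVersion, String compressionType) {
    // The global version guards format changes that affect every reader.
    if (fileVersion > SUPPORTED_WAL_VERSION) {
      throw new IllegalStateException("WAL version " + fileVersion + " is not supported");
    }
    if (compressionType == null) {
      return false;
    }
    // A new compression type alone needs no version bump: it is either known or rejected.
    if (DICTIONARY_COMPRESSION_TYPE.equals(compressionType)) {
      return true;
    }
    throw new UnsupportedOperationException(
        "Unsupported compression type: " + compressionType);
  }
}
```

Under this layout a later prefix-style scheme would add one more recognized type string and leave the version comparison untouched.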
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177852#comment-13177852 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

@Li:
Do you want to submit the latest patch to Hadoop QA?

Thanks
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145566#comment-13145566 ] 

Todd Lipcon commented on HBASE-4608:
------------------------------------

I haven't looked at the patch yet, but it would be great if you could build a tool to go along with this for testing that compresses/decompresses logs. EG:

bin/hbase org.apache.hadoop.hbase.....HLogTool -compress /path/to/hlog /path/to/hlog.compressed
bin/hbase org.apache.hadoop.hbase.....HLogTool -uncompress /path/to/hlog.compressed /path/to/hlog
.. or something like that.

Then real users could see what kind of compression ratio they could expect (and it serves as a decent test that compress/uncompress yields the original file)
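The two checks suggested here, reporting a compression ratio and confirming that compress followed by uncompress reproduces the input, could be sketched as below. Assumptions are loud: `RoundTripCheck` is a hypothetical helper, it works on local files through `java.nio.file` rather than on HDFS paths, and it compares raw bytes, which is stricter than comparing log entries one by one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for checking the output of a compress/uncompress tool. */
final class RoundTripCheck {

  /** @return compressed size divided by original size (smaller is better) */
  static double compressionRatio(Path original, Path compressed) {
    try {
      return (double) Files.size(compressed) / Files.size(original);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** @return true if both files contain exactly the same bytes */
  static boolean sameContent(Path original, Path roundTripped) {
    try {
      // Reads both files fully; acceptable for a test tool, not for very large logs.
      return Arrays.equals(Files.readAllBytes(original), Files.readAllBytes(roundTripped));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```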
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229080#comment-13229080 ] 

Li Pi commented on HBASE-4608:
------------------------------

@Stack nvm, just read upwards. That's in line with the other results from Todd and me.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183024#comment-13183024 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-10 02:34:06.162265)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226519#comment-13226519 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

{code}
+  public static int hashBytes(byte[] bytes, int offset, int length) {
{code}
The above method allows the computation to start at a specified offset, whereas the existing hashCode() has no such parameter.
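To make the offset point concrete, here is a minimal sketch of a range-based hash. It is not the code from the patch: the class name ByteRangeHash and the choice of polynomial (the one java.util.Arrays.hashCode(byte[]) uses) are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named only for this sketch.
final class ByteRangeHash {
  /**
   * Hashes bytes[offset, offset + length) with the same polynomial that
   * Arrays.hashCode(byte[]) applies to a whole array.
   */
  static int hashBytes(byte[] bytes, int offset, int length) {
    if (offset < 0 || length < 0 || length > bytes.length - offset) {
      throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
    }
    int hash = 1;
    for (int i = offset; i < offset + length; i++) {
      hash = 31 * hash + bytes[i];
    }
    return hash;
  }

  public static void main(String[] args) {
    byte[] buffer = "rowkey:cf:qualifier".getBytes(StandardCharsets.UTF_8);
    // Hash only the "cf" slice (offset 7, length 2) without copying it out first.
    int ranged = hashBytes(buffer, 7, 2);
    int copied = Arrays.hashCode(Arrays.copyOfRange(buffer, 7, 9));
    System.out.println(ranged == copied);
  }
}
```

The benefit is that a field inside a larger buffer can be hashed in place, which avoids an allocation per lookup.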

The remark about storing the compression flag as a sequence file attribute is really good.
Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a convenient way of doing that.

For HLogKey, can we designate a version of -2 to represent a compressed HLogKey? If the HLogKey isn't compressed, we write -1.
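A rough sketch of how such a marker could be written and checked. The class KeyVersionMarker and its constant names are hypothetical, and a fixed 4-byte int is used for simplicity; the actual HLogKey serialization may encode the version differently.

```java
import java.nio.ByteBuffer;

// Hypothetical class; only the values -1 and -2 come from the suggestion above.
final class KeyVersionMarker {
  static final int VERSION_UNCOMPRESSED = -1;
  static final int VERSION_COMPRESSED = -2;

  /** Serializes just the version marker that would lead a key. */
  static byte[] writeHeader(boolean compressed) {
    return ByteBuffer.allocate(Integer.BYTES)
        .putInt(compressed ? VERSION_COMPRESSED : VERSION_UNCOMPRESSED)
        .array();
  }

  /** Reads the marker back and rejects anything that is not a known version. */
  static boolean isCompressed(byte[] header) {
    int version = ByteBuffer.wrap(header).getInt();
    switch (version) {
      case VERSION_COMPRESSED:
        return true;
      case VERSION_UNCOMPRESSED:
        return false;
      default:
        throw new IllegalArgumentException("Unknown key version: " + version);
    }
  }
}
```

With this shape the reader decides per key whether to take the dictionary path, and an unknown marker fails loudly instead of being misread.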
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228122#comment-13228122 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

We're looking at several metadata fields for version:
1. WAL_VERSION for HLog file
2. compression type for HLog file
3. compression major (minor) version
4. HLogKey version (covered in latest patch)

It would create some confusion w.r.t. the different combinations of the above four.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228202#comment-13228202 ] 

stack commented on HBASE-4608:
------------------------------

bq. PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.

Ted, you misunderstood.  The above was a suggested name for a new compression type, a version two of prefix compression.

Your bringing hfile compression versioning in here is an unnecessary complication, IMO.  Compression will not have the variety here it does over in hfile (IMO).

bq. I think compression type versioning would allow us to perform migration with ease in the future.

Not needed.  We will have compression types and WAL file global versioning.  That should be sufficient for describing future evolutions, IMO.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517341/4608-v19.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1124//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192033#comment-13192033 ] 

Li Pi commented on HBASE-4608:
------------------------------

@Lars

Unless we know exactly when the dictionary is flushed, we can't rebuild the original HLog, can we?
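To illustrate why the reader has to track the writer's dictionary state, here is a minimal sketch of dictionary compression with the two sides kept in lockstep. The class LockstepDictionary and its Token type are hypothetical and are not the patch's format. No flush is modelled; any flush on the write side would have to be visible in the stream so that the reader could reset at the same point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class, not part of the patch.
final class LockstepDictionary {
  /** One encoded token: a literal on first sighting, otherwise a dictionary index. */
  static final class Token {
    final String literal; // non-null on a dictionary miss
    final int index;      // meaningful only when literal is null

    Token(String literal, int index) {
      this.literal = literal;
      this.index = index;
    }
  }

  static List<Token> compress(List<String> names) {
    Map<String, Integer> dict = new HashMap<>();
    List<Token> out = new ArrayList<>();
    for (String name : names) {
      Integer idx = dict.get(name);
      if (idx == null) {
        dict.put(name, dict.size());   // miss: remember it and emit the literal
        out.add(new Token(name, -1));
      } else {
        out.add(new Token(null, idx)); // hit: emit only the index
      }
    }
    return out;
  }

  static List<String> decompress(List<Token> tokens) {
    List<String> dict = new ArrayList<>();
    List<String> out = new ArrayList<>();
    for (Token t : tokens) {
      if (t.literal != null) {
        dict.add(t.literal);           // mirror the writer's insert
        out.add(t.literal);
      } else {
        out.add(dict.get(t.index));    // resolve against the rebuilt dictionary
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("usertable", "region-1", "usertable", "cf", "region-1");
    System.out.println(decompress(compress(names)).equals(names));
  }
}
```

The reader never sees the dictionary itself. It rebuilds it from the order of literals, so an unannounced reset on the write side would make later indexes resolve to the wrong entries.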
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212245#comment-13212245 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-14 01:33:09, Liyin Tang wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 230
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line230>
bq.  >
bq.  >     Should the data be added back to the dict in this case?
bq.  >     dict.addEntry(data) ?

This is taken care of during findentry.
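For readers following along, a small sketch of a lookup that inserts on a miss, which is the behaviour described in the reply above. TinyLruDictionary is a hypothetical stand-in built on java.util.LinkedHashMap; it is not the patch's LRUDictionary, and the NOT_IN_DICTIONARY constant and slot-reuse policy are assumptions.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only.
final class TinyLruDictionary {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, Short> slots = new LinkedHashMap<>(16, 0.75f, true);

  TinyLruDictionary(int capacity) {
    if (capacity < 1 || capacity > Short.MAX_VALUE) {
      throw new IllegalArgumentException("capacity out of range: " + capacity);
    }
    this.capacity = capacity;
  }

  /** Returns the slot of data, or NOT_IN_DICTIONARY after inserting it on a miss. */
  short findEntry(String data) {
    Short slot = slots.get(data); // get() also marks the entry as most recently used
    if (slot != null) {
      return slot;
    }
    addEntry(data);
    return NOT_IN_DICTIONARY;
  }

  private void addEntry(String data) {
    short slot;
    if (slots.size() < capacity) {
      slot = (short) slots.size();  // still room: take the next free slot
    } else {
      Iterator<Map.Entry<String, Short>> it = slots.entrySet().iterator();
      slot = it.next().getValue();  // full: reuse the least recently used slot
      it.remove();
    }
    slots.put(data, slot);
  }
}
```

Because the miss path inserts, the caller only has to write the literal when it gets NOT_IN_DICTIONARY back; a separate addEntry call at the call site would be redundant.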


bq.  On 2012-02-14 01:33:09, Liyin Tang wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 192
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line192>
bq.  >
bq.  >     WritableUtils.getVIntSize could help you to decide how many bytes are needed for the entry. So you don't need to pass down sizeBytes in this function.

This is part of the way HBase stores data uncompressed. It doesn't use a vInt.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review5066
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212879#comment-13212879 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515472/4608v14.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -134 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 162 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/999//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/999//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/999//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181846#comment-13181846 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

From https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:50150 : Address already in use
	at org.apache.hadoop.ipc.Server.bind(Server.java:227)
{code}
Strange. Was the above caused by parallel test case execution?
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192627#comment-13192627 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-24 22:29:18.791094)


Review request for hbase, Eli Collins and Todd Lipcon.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227946#comment-13227946 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature.
Say we define WAL_VERSION as v2 which has WAL compression capability. We still need to check compression type metadata before applying dictionary compression.
In this regard adding WAL_VERSION seems to be redundant.
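A sketch of the check being described, where the reader consults both fields before enabling dictionary decompression. The metadata keys, the "DICTIONARY" value, and the class WalMetadataCheck are hypothetical names chosen for this example; they are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; key names and values are invented for illustration.
final class WalMetadataCheck {
  static final String WAL_VERSION_KEY = "WAL_VERSION";
  static final String COMPRESSION_TYPE_KEY = "COMPRESSION_TYPE";
  static final String DICTIONARY = "DICTIONARY";

  /**
   * True only if the file format allows compression (version 2 or later)
   * and the file also declares the dictionary compression type.
   */
  static boolean useDictionary(Map<String, String> metadata) {
    int version = Integer.parseInt(metadata.getOrDefault(WAL_VERSION_KEY, "1"));
    String type = metadata.get(COMPRESSION_TYPE_KEY);
    return version >= 2 && DICTIONARY.equals(type);
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put(WAL_VERSION_KEY, "2");
    // Version 2 alone does not say whether this particular file is compressed.
    System.out.println(useDictionary(metadata));
    metadata.put(COMPRESSION_TYPE_KEY, DICTIONARY);
    System.out.println(useDictionary(metadata));
  }
}
```

In this shape the version only says what the format permits, while the compression type says what this file actually uses, which is why the type has to be checked either way.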
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226615#comment-13226615 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

I repeated manual decompression based on patch v20.
Still got:
{code}
12/03/09 15:58:30 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-3.comp, syncFs=true, hflush=true
Exception in thread "main" java.io.IOException: sea-lab-3.decomp, entryStart=124, pos=1406386, end=98439940, edit=0
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:276)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:232)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:201)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:91)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:58)
Caused by: java.io.IOException: //0 read 36 bytes, should read 22
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:230)
	... 3 more
{code}
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226642#comment-13226642 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Simplified the Compressor tool.
It now reads the compression status from the input HLog.
There is no need to pass -u or -c now.

I tested the new build, using the new syntax, on the HLog used @ 10/Mar/12 00:00.

I uploaded patch v21 onto review board.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185361#comment-13185361 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/
-----------------------------------------------------------

(Updated 2012-01-13 00:58:40.183584)


Review request for hbase, Eli Collins and Todd Lipcon.


Changes
-------

fixed failing test. added a few new ones to detect LRU dictionary failure.


Summary
-------

HLog compression. Has unit tests and a command line tool for compressing/decompressing.


This addresses bug HBase-4608.
    https://issues.apache.org/jira/browse/HBase-4608


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2740/diff


Testing
-------


Thanks,

Li


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v6.txt

no prefix patch
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210601#comment-13210601 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 37
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line37>
bq.  >
bq.  >     I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" -- since this is just utility functions, not actually an instance of a compressed keyvalue.

fixed. legacy name. <3 eclipse.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 207
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line207>
bq.  >
bq.  >     *un*compressed value, right?

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 28
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70701#file70701line28>
bq.  >
bq.  >     Since this is so simple, I'd move it to be a static inner class of KVCompression above

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 152
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line152>
bq.  >
bq.  >     why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 174
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line174>
bq.  >
bq.  >     switch order of "in" and "offset" here.
bq.  >     
bq.  >     Perhaps clearer to name this as "uncompressIntoArray"?

fixed.


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 44
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line44>
bq.  >
bq.  >     I think we can merge this with the other class that just has static methods as well.

Compressor contains static methods for general purpose compression. KeyValueCompression.java contains static methods for compressing the KeyValue type. Should I merge them?


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 185
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line185>
bq.  >
bq.  >     worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary

done
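
For readers following along, the "status" byte trick can be sketched roughly as below. This is only an illustration under assumptions: the class name, the -1 sentinel value, and the 4-byte length prefix are hypothetical and not taken from the patch. The idea is that a non-negative 2-byte dictionary index never has 0xFF as its high-order byte, so the first byte read can serve both as a "not in dictionary" marker and as the top half of the index.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and wire layout are illustrative, not the patch's.
class StatusByteSketch {
  // Assumed sentinel: a first byte of -1 (0xFF) means "literal value follows".
  static final byte NOT_IN_DICTIONARY = -1;

  // Dictionary hit: emit the 2-byte index. Miss: emit sentinel, length, raw bytes.
  static byte[] encode(short dictIdx, byte[] data) {
    if (dictIdx == NOT_IN_DICTIONARY) {
      ByteBuffer buf = ByteBuffer.allocate(1 + 4 + data.length);
      buf.put(NOT_IN_DICTIONARY).putInt(data.length).put(data);
      return buf.array();
    }
    if (dictIdx < 0) {
      throw new IllegalArgumentException("dictionary index must be non-negative");
    }
    // ByteBuffer is big-endian by default, so the high-order byte is written first.
    return ByteBuffer.allocate(2).putShort(dictIdx).array();
  }

  // Reads the status byte. On a hit it is the high-order byte of the index,
  // so one more byte is read and the two are combined.
  static short decodeIndex(ByteBuffer in) {
    byte status = in.get();
    if (status == NOT_IN_DICTIONARY) {
      return NOT_IN_DICTIONARY; // caller should follow up with readLiteral
    }
    return (short) (((status & 0xFF) << 8) | (in.get() & 0xFF));
  }

  // Reads the length-prefixed literal that follows the sentinel.
  static byte[] readLiteral(ByteBuffer in) {
    byte[] data = new byte[in.getInt()];
    in.get(data);
    return data;
  }
}
```

A hit costs two bytes on the wire; a miss costs the sentinel plus the length-prefixed value.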


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>
bq.  >
bq.  >     rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (eg getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.

This is gonna take a while, since I'm currently relying on the default array hashCode. I'll need to use Bytes.hashCode and write a wrapper for insertion into the dictionary.
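
A rough sketch of what such a wrapper could look like, assuming nothing about the actual patch: the `ByteSlice` name and its methods are hypothetical. Java arrays inherit identity-based `hashCode`/`equals` from `Object`, so two `byte[]` with the same contents do not match as map keys; a wrapper over (buf, off, len) with content-based hashing allows lookups directly against a slice of a larger buffer without copying.

```java
import java.util.Arrays;

// Hypothetical wrapper (not from the patch): a view over part of a byte[]
// with content-based equals/hashCode, usable as a HashMap key.
final class ByteSlice {
  private final byte[] buf;
  private final int off;
  private final int len;

  ByteSlice(byte[] buf, int off, int len) {
    if (off < 0 || len < 0 || off + len > buf.length) {
      throw new IndexOutOfBoundsException("bad slice: off=" + off + " len=" + len);
    }
    this.buf = buf;
    this.off = off;
    this.len = len;
  }

  // Copy only when the key must outlive the caller's buffer,
  // e.g. when inserting a new entry into the dictionary.
  ByteSlice copy() {
    return new ByteSlice(Arrays.copyOfRange(buf, off, off + len), 0, len);
  }

  @Override
  public int hashCode() {
    int h = 1;
    for (int i = off; i < off + len; i++) {
      h = 31 * h + buf[i];
    }
    return h;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ByteSlice)) {
      return false;
    }
    ByteSlice other = (ByteSlice) o;
    if (len != other.len) {
      return false;
    }
    for (int i = 0; i < len; i++) {
      if (buf[off + i] != other.buf[other.off + i]) {
        return false;
      }
    }
    return true;
  }
}
```

Lookups would then wrap the KeyValue's backing array in place, e.g. `dict.get(new ByteSlice(buf, off, len))`, and only call `copy()` on a miss before inserting.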


bq.  On 2012-02-01 02:29:54, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>
bq.  >
bq.  >     this function requires that the whole log data fit in RAM - not a great assumption

old one. will do eventually...


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4732
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229771#comment-13229771 ] 

stack commented on HBASE-4608:
------------------------------

Here are some WALs comparing dictionary compression w/ patch v29 vs lzma, and then the dictionary-compressed file itself lzma'd (Todd's request).  LZMA'ing the dictionary-compressed file makes it smaller than the lzma'd original.  lzma'ing the compressed file makes it roughly 1/4 the size of the dictionary-compressed file.  I didn't get a chance to lzo it....

{code}
....
-rw-r--r--   1 stack  staff  64589199 Mar 13 20:24 sv4r21s12%3A60020.1331685637452
-rwxrwxrwx   1 stack  staff  28906432 Mar 14 15:34 sv4r21s12%3A60020.1331685637452.compressed
-rw-r--r--   1 stack  staff   7417213 Mar 14 16:25 sv4r21s12%3A60020.1331685637452.compressed.lzma
-rw-r--r--   1 stack  staff   8511618 Mar 14 16:24 sv4r21s12%3A60020.1331685637452.lzma
-rw-r--r--   1 stack  staff  63755620 Mar 13 20:24 sv4r21s12%3A60020.1331687005652
-rwxrwxrwx   1 stack  staff  28804928 Mar 14 15:34 sv4r21s12%3A60020.1331687005652.compressed
-rw-r--r--   1 stack  staff   6866107 Mar 14 16:28 sv4r21s12%3A60020.1331687005652.compressed.lzma
-rw-r--r--   1 stack  staff   8328771 Mar 14 16:27 sv4r21s12%3A60020.1331687005652.lzma
-rw-r--r--   1 stack  staff  63755688 Mar 13 20:24 sv4r21s12%3A60020.1331688224458
-rwxrwxrwx   1 stack  staff  27701052 Mar 14 15:34 sv4r21s12%3A60020.1331688224458.compressed
-rw-r--r--   1 stack  staff   6614637 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.compressed.lzma
-rw-r--r--   1 stack  staff   8462991 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.lzma
-rw-r--r--   1 stack  staff  64024836 Mar 13 20:24 sv4r21s12%3A60020.1331689518188
-rwxrwxrwx   1 stack  staff  28851435 Mar 14 15:34 sv4r21s12%3A60020.1331689518188.compressed
-rw-r--r--   1 stack  staff   6677112 Mar 14 16:35 sv4r21s12%3A60020.1331689518188.compressed.lzma
-rw-r--r--   1 stack  staff   8158847 Mar 14 16:34 sv4r21s12%3A60020.1331689518188.lzma
-rw-r--r--   1 stack  staff  63757131 Mar 13 20:24 sv4r21s12%3A60020.1331690608900
-rwxrwxrwx   1 stack  staff  28201506 Mar 14 15:34 sv4r21s12%3A60020.1331690608900.compressed
-rw-r--r--   1 stack  staff   6941982 Mar 14 16:38 sv4r21s12%3A60020.1331690608900.compressed.lzma
-rw-r--r--   1 stack  staff   8513895 Mar 14 16:37 sv4r21s12%3A60020.1331690608900.lzma
-rw-r--r--   1 stack  staff  63754114 Mar 13 20:24 sv4r21s12%3A60020.1331691711502
-rwxrwxrwx   1 stack  staff  28318314 Mar 14 15:34 sv4r21s12%3A60020.1331691711502.compressed
-rw-r--r--   1 stack  staff   7392701 Mar 14 16:42 sv4r21s12%3A60020.1331691711502.compressed.lzma
-rw-r--r--   1 stack  staff   9136798 Mar 14 16:41 sv4r21s12%3A60020.1331691711502.lzma
-rw-r--r--   1 stack  staff  63756667 Mar 13 20:24 sv4r21s12%3A60020.1331692886725
-rwxrwxrwx   1 stack  staff  28309792 Mar 14 15:34 sv4r21s12%3A60020.1331692886725.compressed
-rw-r--r--   1 stack  staff   7139965 Mar 14 16:44 sv4r21s12%3A60020.1331692886725.compressed.lzma
-rw-r--r--   1 stack  staff   8968155 Mar 14 16:43 sv4r21s12%3A60020.1331692886725.lzma
-rw-r--r--   1 stack  staff  63755003 Mar 13 20:24 sv4r21s12%3A60020.1331694049033
-rwxrwxrwx   1 stack  staff  28127053 Mar 14 15:35 sv4r21s12%3A60020.1331694049033.compressed
-rw-r--r--   1 stack  staff   6498486 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.compressed.lzma
-rw-r--r--   1 stack  staff   8175618 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.lzma
-rw-r--r--   1 stack  staff  23441144 Mar 13 20:24 sv4r21s12%3A60020.1331695045194
-rwxrwxrwx   1 stack  staff  10561645 Mar 14 15:35 sv4r21s12%3A60020.1331695045194.compressed
-rw-r--r--   1 stack  staff   2922204 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.compressed.lzma
-rw-r--r--   1 stack  staff   3228837 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.lzma
{code}
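
To put numbers on the listing above, here is a small sketch that computes the ratios for the first WAL (sizes copied from the listing; the class and method names are just for illustration). For that file the dictionary-compressed copy comes out at about 45% of the original, lzma on top of it brings it to about 26% of the dictionary-compressed size, and against the original the two lzma variants land at about 11% (dictionary + lzma) versus about 13% (lzma alone).

```java
// Illustrative arithmetic only; sizes are taken from the first file in the listing.
class WalSizeRatios {
  static double ratio(long part, long whole) {
    return (double) part / whole;
  }

  public static void main(String[] args) {
    long original = 64589199L;        // sv4r21s12%3A60020.1331685637452
    long dictCompressed = 28906432L;  // .compressed
    long dictThenLzma = 7417213L;     // .compressed.lzma
    long lzmaOnly = 8511618L;         // .lzma

    System.out.printf("dictionary / original:        %.4f%n", ratio(dictCompressed, original));
    System.out.printf("dict+lzma / dictionary:       %.4f%n", ratio(dictThenLzma, dictCompressed));
    System.out.printf("dict+lzma / original:         %.4f%n", ratio(dictThenLzma, original));
    System.out.printf("lzma only / original:         %.4f%n", ratio(lzmaOnly, original));
  }
}
```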
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228194#comment-13228194 ] 

Li Pi commented on HBASE-4608:
------------------------------

Yo, sorry I can't quite work on this. Finals are finished this week, and once that happens, I'll be able to scram.

There doesn't seem to be that much left - though I said that about 3 months ago. My bad! Feel free to do as you please; there's not much left on this, and I'm happy that work is getting done. I won't be offended at all if somebody else wants to try their hand at finishing this.

My thoughts on it were this. WAL_VERSION is used to indicate compression type. This is pretty good, because enabling compression would immediately tell older versions that the version was wrong, while newer versions with compression disabled could function alongside older versions without support for compression. 
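
A minimal sketch of that version gating, with made-up constants and method names (the real WAL_VERSION values and where the check lives are not shown here and are assumptions):

```java
// Hypothetical sketch of version gating; values and names are illustrative.
class WalVersionSketch {
  static final int VERSION_UNCOMPRESSED = 1; // assumed value
  static final int VERSION_COMPRESSED = 2;   // assumed value

  // The writer stamps the file with the compressed version only when
  // compression is switched on.
  static int writerVersion(boolean compressionEnabled) {
    return compressionEnabled ? VERSION_COMPRESSED : VERSION_UNCOMPRESSED;
  }

  // A reader refuses any file stamped with a version newer than it knows.
  static boolean canRead(int readerMaxVersion, int fileVersion) {
    return fileVersion <= readerMaxVersion;
  }
}
```

With this shape, an old reader rejects a compressed log up front instead of misparsing it, while logs written by a new build with compression disabled stay readable by old builds.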

Also, I had my old benchmarks, and I was getting anywhere from a 20% increase to 40% increase on YCSB loads, depending on the testcase. This seemed pretty impressive to me. Not sure if a bug was introduced. I'll run a few more benchmarks later.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Li Pi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4608:
-------------------------

    Attachment: 4608v8fixed.txt
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228981#comment-13228981 ] 

stack commented on HBASE-4608:
------------------------------

We need a facility in the WAL, like the one we have in HFile, for printing statistics on the load carried. Our frontend is loads of counters.  I've not verified this, but it should be random enough in table naming and region, so it should be giving the compression code a bit of exercise.

I'm game for committing this as a first cut if I can get a +1.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230490#comment-13230490 ] 

Lars Hofhansl commented on HBASE-4608:
--------------------------------------

Yeah! And, yes, time for an RC.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208226#comment-13208226 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>
bq.  >
bq.  >     I would expect different implementations to be instantiated based on the prefix of path.

I figured people would only use this on their local machine. I guess the path can actually point to HDFS. Got any examples of how to do this easily?
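
One possible answer, hedged: in Hadoop, `Path.getFileSystem(Configuration)` picks the FileSystem implementation from the path's scheme, so the tool would not need to branch itself. The stdlib-only sketch below just shows the scheme detection that such dispatch rests on; the class name, the `schemeOf` helper, and the "file" default are hypothetical and not part of the patch.

```java
import java.net.URI;

// Hypothetical helper: illustrates scheme-based dispatch without Hadoop on the classpath.
class PathSchemeSketch {
  // Returns the URI scheme of the path, or "file" when the path has none
  // (assumed default for a plain local path such as /tmp/hlog).
  static String schemeOf(String path) {
    String scheme = URI.create(path).getScheme();
    return scheme == null ? "file" : scheme;
  }
}
```

For example, `schemeOf("hdfs://namenode:8020/hbase/.logs/x")` yields `hdfs`, while `schemeOf("/tmp/hlog")` falls back to `file`.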


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 116
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line116>
bq.  >
bq.  >     Why do we instantiate Configuration again (there is already one @ line 113) ?

Hmm. Good point. Waste of heap, but I wasn't really optimizing the command line tool. Fixed!


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 71
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line71>
bq.  >
bq.  >     Should we verify that length is larger than pos ?

I don't think it makes a difference. 


bq.  On 2012-01-25 06:20:23, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 169
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line169>
bq.  >
bq.  >     Typo, should read 'to start reading from'.

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4585
-----------------------------------------------------------


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227901#comment-13227901 ] 

Ted Yu commented on HBASE-4608:
-------------------------------

bq. try a paragraph of text going in and out
LRUDictionary deals with byte array:
{code}
  public short findEntry(byte[] data, int offset, int length) {
{code}
In this regard, piping text into the dictionary is functionally the same as piping the byte[] form of an integer.
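
To illustrate the point with a toy stand-in (not the real LRUDictionary: no eviction, and `addEntry` plus the -1 miss value are assumptions made for this sketch), a dictionary keyed on byte content goes through exactly the same code path for text and for an encoded integer:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy dictionary for illustration only; it mirrors the findEntry signature
// quoted above but is not the patch's implementation.
class ToyByteDictionary {
  static final short NOT_IN_DICTIONARY = -1; // assumed miss value

  // ByteBuffer compares and hashes by remaining content, so it works as a key.
  private final Map<ByteBuffer, Short> index = new HashMap<>();

  public short findEntry(byte[] data, int offset, int length) {
    Short idx = index.get(ByteBuffer.wrap(data, offset, length));
    return idx == null ? NOT_IN_DICTIONARY : idx;
  }

  // Stores a private copy so later changes to the caller's array do not
  // corrupt the key. Returns the index assigned to the new entry.
  public short addEntry(byte[] data, int offset, int length) {
    byte[] copy = Arrays.copyOfRange(data, offset, offset + length);
    short idx = (short) index.size();
    index.put(ByteBuffer.wrap(copy), idx);
    return idx;
  }
}
```

Whether the bytes came from `"hello".getBytes(...)` or from a 4-byte big-endian int makes no difference to `findEntry`; it only ever sees (data, offset, length).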
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-4608:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508998/4608v6.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -149 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 78 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/645//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/645//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/645//console

This message is automatically generated.)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229678#comment-13229678 ] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>
bq.  >
bq.  >     Introducing enum is a good idea.
bq.  >     I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.
bq.  
bq.  Michael Stack wrote:
bq.      HLogKey does not need to know about 'type' of compression.

Adding comments around the versions to give some context on why the enums are named this way.


bq.  On 2012-03-14 11:46:10, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189
bq.  > <https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189>
bq.  >
bq.  >     Hiding LRUDictionary.class is desirable.
bq.  >     Shall we pass this.getMetadata() to the CompressionContext ctor, where the compression type is selected?
bq.  
bq.  Michael Stack wrote:
bq.      Out of scope.

Yeah, adding a factory to choose between different compression context types when we have only one compression type available is out of scope for this issue.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4328/#review5929
-----------------------------------------------------------


On 2012-03-14 07:34:58, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4328/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-14 07:34:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  See issue
bq.  
bq.  
bq.  This addresses bug hbase-4608.
bq.      https://issues.apache.org/jira/browse/hbase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c 
bq.    src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4328/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Updated] (HBASE-4608) HLog Compression

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4608:
-------------------------

    Status: Open  (was: Patch Available)
    
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221319#comment-13221319 ] 

Hadoop QA commented on HBASE-4608:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516888/4608v16.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -127 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1082//console

This message is automatically generated.
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228954#comment-13228954 ] 

Zhihong Yu commented on HBASE-4608:
-----------------------------------

Please wrap this long line:
{code}
+  public static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
{code}
w.r.t. the following code:
{code}
+  static final int VERSION = COMPRESSION_VERSION;
+  static final Text WAL_VERSION = new Text("" + VERSION);
{code}
It implies that WAL_VERSION is the same as COMPRESSION_VERSION.
As I explained earlier, we will likely have another compression scheme for the WAL in the future, which would introduce e.g. PREFIX_COMPRESSION_VERSION.
Then we face a choice: what value would WAL_VERSION carry?

I propose renaming COMPRESSION_VERSION above to DICTIONARY_COMPRESSION_VERSION and decoupling it from WAL_VERSION.
In the future, a WAL_VERSION of 2 could then carry either dictionary or prefix compression.
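
A rough sketch of what that decoupling could look like. The class name WalVersioning, the describeCompression helper, and all numeric values are assumptions made for illustration; only the constant names come from the proposal above, and PREFIX_COMPRESSION_VERSION does not exist in the patch.

```java
/** Hypothetical sketch of decoupled version constants; values are assumed. */
final class WalVersioning {
    /** Version of the WAL file format itself (assumed value). */
    static final int WAL_VERSION = 2;

    /** Identifiers for the compression scheme, tracked separately from WAL_VERSION. */
    static final int DICTIONARY_COMPRESSION_VERSION = 1;
    static final int PREFIX_COMPRESSION_VERSION = 2; // hypothetical future scheme

    private WalVersioning() {
    }

    /** Maps a compression identifier to a readable name. */
    static String describeCompression(int compressionVersion) {
        switch (compressionVersion) {
            case DICTIONARY_COMPRESSION_VERSION:
                return "dictionary";
            case PREFIX_COMPRESSION_VERSION:
                return "prefix";
            default:
                throw new IllegalArgumentException(
                    "unknown compression version: " + compressionVersion);
        }
    }

    public static void main(String[] args) {
        // The same WAL_VERSION can be paired with either scheme.
        System.out.println("WAL_VERSION " + WAL_VERSION + " with "
            + describeCompression(DICTIONARY_COMPRESSION_VERSION) + " compression");
        System.out.println("WAL_VERSION " + WAL_VERSION + " with "
            + describeCompression(PREFIX_COMPRESSION_VERSION) + " compression");
    }
}
```

With this split, adding a new scheme would not force a bump of WAL_VERSION.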

                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.


        

[jira] [Commented] (HBASE-4608) HLog Compression

Posted by "Li Pi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230500#comment-13230500 ] 

Li Pi commented on HBASE-4608:
------------------------------

Woohoo! It's in!
                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
