Posted to issues@hbase.apache.org by "Michele Catasta (JIRA)" <ji...@apache.org> on 2010/06/02 15:47:39 UTC

[jira] Created: (HBASE-2655) 2-pass compression support

2-pass compression support
--------------------------

                 Key: HBASE-2655
                 URL: https://issues.apache.org/jira/browse/HBASE-2655
             Project: HBase
          Issue Type: New Feature
          Components: io
            Reporter: Michele Catasta
            Priority: Minor
             Fix For: 0.21.0


Quoting from BigTable paper: "Many clients use a two-pass custom compression scheme. The first pass uses Bentley and McIlroy's scheme, which compresses long common strings across a large window. The second pass uses a fast compression algorithm that looks for repetitions in a small 16 KB window of the data. Both compression passes are very fast—they encode at 100-200 MB/s, and decode at 400-1000 MB/s on modern machines."

The goal of this patch is to integrate a similar compression scheme into HBase.
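The two passes quoted above can be sketched in Java, HBase's own language. This is a simplified illustration under stated assumptions, not the code in HBASE-2655.patch. The first pass follows the Bentley-McIlroy idea: fingerprint fixed-size blocks of everything seen so far (32 bytes here, an arbitrary choice) and replace later repeats with (offset, length) copies, however far back they are. The second pass uses java.util.zip's Deflate at BEST_SPEED as a stand-in for the paper's unnamed fast small-window compressor (Deflate's window is 32 KB, not 16 KB). The class name, the token format and the block size are all hypothetical.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public final class TwoPassSketch { // hypothetical sketch, not the code in HBASE-2655.patch
    static final int BLOCK = 32; // fingerprinted block size (assumed value)

    static int fingerprint(byte[] d, int from) {
        int h = 0;
        for (int k = 0; k < BLOCK; k++) h = h * 31 + (d[from + k] & 0xff);
        return h;
    }

    static void writeLiterals(DataOutputStream out, byte[] d, int from, int to) throws IOException {
        if (to <= from) return;
        out.writeByte(0); out.writeInt(to - from); // tag 0: literal run of (length, bytes)
        out.write(d, from, to - from);
    }

    /** Pass 1 (Bentley-McIlroy style): replace repeats of earlier aligned blocks with copies. */
    public static byte[] longRangeEncode(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(data.length);
        Map<Integer, Integer> table = new HashMap<>(); // fingerprint -> block start
        int n = data.length, i = 0, litStart = 0, nextBlock = 0;
        while (i + BLOCK <= n) {
            // Index every aligned block that lies entirely before position i.
            for (; nextBlock + BLOCK <= i; nextBlock += BLOCK)
                table.putIfAbsent(fingerprint(data, nextBlock), nextBlock);
            Integer src = table.get(fingerprint(data, i));
            int len = 0;
            // Fingerprints can collide, so verify byte by byte while extending the match.
            while (src != null && i + len < n && data[src + len] == data[i + len]) len++;
            if (len >= BLOCK) {
                writeLiterals(out, data, litStart, i);
                out.writeByte(1); out.writeInt(src); out.writeInt(len); // tag 1: copy (offset, length)
                i += len;
                litStart = i;
            } else {
                i++;
            }
        }
        writeLiterals(out, data, litStart, n);
        return bos.toByteArray();
    }

    public static byte[] longRangeDecode(byte[] enc) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(enc));
        byte[] out = new byte[in.readInt()];
        int pos = 0;
        while (pos < out.length) {
            if (in.readByte() == 0) {
                int len = in.readInt();
                in.readFully(out, pos, len);
                pos += len;
            } else {
                int src = in.readInt(), len = in.readInt();
                // Forward byte-by-byte copy, so a copy may overlap the bytes it produces.
                for (int k = 0; k < len; k++) out[pos + k] = out[src + k];
                pos += len;
            }
        }
        return out;
    }

    /** Pass 2 stand-in: Deflate at BEST_SPEED (32 KB window, not the paper's 16 KB). */
    static byte[] deflate(byte[] in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        try (DeflaterOutputStream z = new DeflaterOutputStream(bos, d)) {
            z.write(in);
        } finally {
            d.end();
        }
        return bos.toByteArray();
    }

    static byte[] inflate(byte[] in) throws IOException {
        try (InflaterInputStream z = new InflaterInputStream(new ByteArrayInputStream(in))) {
            return z.readAllBytes(); // Java 9+
        }
    }

    public static byte[] compress(byte[] data) throws IOException { return deflate(longRangeEncode(data)); }
    public static byte[] decompress(byte[] c) throws IOException { return longRangeDecode(inflate(c)); }
    public static void main(String[] args) throws IOException {
        // A random 40 KB chunk repeated 3 times: each repeat is beyond Deflate's 32 KB reach.
        byte[] chunk = new byte[40_000];
        new Random(42).nextBytes(chunk);
        byte[] data = new byte[3 * chunk.length];
        for (int r = 0; r < 3; r++) System.arraycopy(chunk, 0, data, r * chunk.length, chunk.length);
        byte[] both = compress(data);
        System.out.println("deflate only: " + deflate(data).length + " bytes, two-pass: " + both.length
                + " bytes, round trip ok: " + Arrays.equals(data, decompress(both)));
    }
}
```

Running main prints the Deflate-only and two-pass sizes for the same input. Deflate's matches cannot reach back more than 32 KB, so it cannot exploit repeats 40,000 bytes apart, while the first pass turns the second and third copies into a single copy token. That division of labour (a cheap long-range pass for big repeats, a fast local pass for the rest) is the point of the scheme. Recomputing each fingerprint costs O(BLOCK) per byte; Bentley and McIlroy use a rolling Karp-Rabin fingerprint instead.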

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2655) 2-pass compression support

Posted by "Michele Catasta (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Catasta updated HBASE-2655:
-----------------------------------

    Attachment: HBASE-2655.diff



[jira] Updated: (HBASE-2655) 2-pass compression support

Posted by "Michele Catasta (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Catasta updated HBASE-2655:
-----------------------------------

    Attachment: HBASE-2655.patch

@stack: I addressed the user-experience problem you were talking about in HBASE-2681, and updated the patch (and this JIRA issue) so that it depends on that code change.



[jira] Commented: (HBASE-2655) 2-pass compression support

Posted by "Michele (@pirroh) Catasta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874616#action_12874616 ] 

Michele (@pirroh) Catasta commented on HBASE-2655:
--------------------------------------------------

Right, {NAME=>'cfamily', COMPRESSION=>'BMZ'} will do the job.

W.r.t. the runtime exception: at the moment, the same thing happens for LZO. I just reproduced the behavior I found in that class *smile*
Falling back to NONE by default might be an option as well, but it would let you create the table anyway, so people who use hbase shell scripts to create tables might experience some regressions. A matter of taste, I'd say!
Anyway, if you agree, I'll create another JIRA to deal with Compression.java and update this patch as well.
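The two options weighed in this thread, raising an exception when the requested codec is not available or falling back to NONE with a warning, can be contrasted in a small Java sketch. Everything in it is hypothetical: the Algorithm and Policy enums and resolve() are illustrative names, not the API of HBase's Compression.java, and the installed set stands in for whatever check decides that a codec's classes and native libraries are present.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch of resolving COMPRESSION => '...' from a column family descriptor. */
public final class CodecResolution {
    private static final Logger LOG = Logger.getLogger(CodecResolution.class.getName());

    public enum Algorithm { NONE, GZ, LZO, BMZ } // illustrative names only

    public enum Policy { FAIL_FAST, FALL_BACK_TO_NONE }

    public static Algorithm resolve(String name, Set<Algorithm> installed, Policy policy) {
        // valueOf throws IllegalArgumentException for names that are not codecs at all.
        Algorithm requested = Algorithm.valueOf(name.toUpperCase(Locale.ROOT));
        if (requested == Algorithm.NONE || installed.contains(requested)) {
            return requested;
        }
        if (policy == Policy.FAIL_FAST) {
            // Fail fast: the caller (e.g. a table-creation script) stops at the mistake.
            throw new IllegalStateException("Compression " + requested + " is not installed");
        }
        // Fall back: keep going uncompressed, but leave a trace in the log.
        LOG.warning("Compression " + requested + " is not installed; falling back to NONE");
        return Algorithm.NONE;
    }

    public static void main(String[] args) {
        Set<Algorithm> installed = EnumSet.of(Algorithm.GZ); // BMZ is not installed here
        System.out.println(resolve("bmz", installed, Policy.FALL_BACK_TO_NONE)); // prints NONE
        try {
            resolve("BMZ", installed, Policy.FAIL_FAST);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The fall-back branch is where Michele's concern applies: the call succeeds, so a script that creates tables keeps going with uncompressed storage and only the log records the difference, whereas the fail-fast branch stops the script at the point of the mistake.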




[jira] Updated: (HBASE-2655) 2-pass compression support

Posted by "Michele Catasta (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Catasta updated HBASE-2655:
-----------------------------------

    Attachment:     (was: HBASE-2655.diff)



[jira] Commented: (HBASE-2655) 2-pass compression support

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874603#action_12874603 ] 

stack commented on HBASE-2655:
------------------------------

@Pirroh Ignore my questions in the previous issue.  I see 'BMZ' here.

So, if BMZ is not available, we'll just keep throwing runtime exceptions?

I wonder if instead we should fall back to the default of no compression, with a warning that BMZ is missing.

Good stuff.



[jira] Updated: (HBASE-2655) 2-pass compression support

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2655:
-------------------------

    Fix Version/s:     (was: 0.90.0)

Moving out of 0.90.



[jira] Commented: (HBASE-2655) 2-pass compression support

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874622#action_12874622 ] 

stack commented on HBASE-2655:
------------------------------

@Michele Understood.  Fellas have complained about the way a broken LZO install manifests itself.  HBase will actually accept writes.  It's only when it goes to flush that it drops the edits, and in a way that is essentially hidden from the client: the exceptions only show up in the regionserver log.  So I'd say make another issue, if you don't mind, but it's not for you to fix unless you are inclined.  It'd be about a better user experience around choosing a compression that is not supported or not properly installed.
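One way to get the better user experience described above is to exercise the codec when the column family is created, so a missing or broken install is reported to the client up front instead of as flush-time exceptions in the regionserver log. The sketch assumes a hypothetical Codec interface and preflight() check, neither of which is an HBase API; MissingNativeCodec simulates an install whose native library is absent, and a java.util.zip-backed codec plays the part of a working one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical pre-flight check run when a column family's compression is chosen. */
public final class CodecPreflight {
    /** Minimal codec abstraction for this sketch (not an HBase interface). */
    public interface Codec {
        String name();
        byte[] compress(byte[] in) throws Exception;
        byte[] decompress(byte[] in) throws Exception;
    }

    /** A working codec backed by java.util.zip, which every JDK ships. */
    public static final class DeflateCodec implements Codec {
        public String name() { return "DEFLATE"; }
        public byte[] compress(byte[] in) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DeflaterOutputStream z = new DeflaterOutputStream(bos)) { z.write(in); }
            return bos.toByteArray();
        }
        public byte[] decompress(byte[] in) throws IOException {
            try (InflaterInputStream z = new InflaterInputStream(new ByteArrayInputStream(in))) {
                return z.readAllBytes(); // Java 9+
            }
        }
    }

    /** Simulates a codec whose native library was never installed on this node. */
    public static final class MissingNativeCodec implements Codec {
        private final String name;
        public MissingNativeCodec(String name) { this.name = name; }
        public String name() { return name; }
        public byte[] compress(byte[] in) { throw new UnsatisfiedLinkError("no native library for " + name); }
        public byte[] decompress(byte[] in) { throw new UnsatisfiedLinkError("no native library for " + name); }
    }

    /** Round-trips a small sample; throws so the client sees the failure at create time. */
    public static void preflight(Codec codec) {
        byte[] sample = "row1/cf:qual/value row1/cf:qual/value".getBytes(StandardCharsets.UTF_8);
        byte[] back;
        try {
            back = codec.decompress(codec.compress(sample));
        } catch (Exception | LinkageError e) {
            throw new IllegalArgumentException("Compression " + codec.name() + " is unusable: " + e, e);
        }
        if (!Arrays.equals(sample, back)) {
            throw new IllegalArgumentException("Compression " + codec.name() + " failed a round trip");
        }
    }

    public static void main(String[] args) {
        preflight(new DeflateCodec()); // passes silently
        try {
            preflight(new MissingNativeCodec("LZO"));
        } catch (IllegalArgumentException e) {
            System.out.println("create table rejected: " + e.getMessage());
        }
    }
}
```

A round trip on a tiny sample is cheap, and catching LinkageError as well as Exception matters here: a missing native library typically surfaces as an UnsatisfiedLinkError, which is an Error rather than an exception.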

Do you know of this page in the wiki, http://wiki.apache.org/hadoop/UsingLzoCompression? You might want to add a note at the end pointing at your new fancy stuff.  You might even change the pointer on the wiki home page to include BMdiff.

Good stuff.
