
Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/03/28 21:32:05 UTC

[jira] [Created] (CASSANDRA-2398) Type specific compression

Type specific compression
-------------------------

                 Key: CASSANDRA-2398
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
    Affects Versions: 0.8
            Reporter: Stu Hood


Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319
Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
{code} 
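
As a rough illustration of how the two signatures above might fit together, here is a minimal sketch. The interface name TypeCompressor, the class PassThroughCompressor, and the on-disk layout (a count followed by length-prefixed values) are assumptions made for this example; the issue does not specify them, and no real compression is performed.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface name; the issue only lists the two method signatures.
interface TypeCompressor {
    void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException;
    void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException;
}

// Placeholder implementation (assumed layout): writes the count, then each
// value prefixed by its length. A real type would substitute its own encoding.
class PassThroughCompressor implements TypeCompressor {
    public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException {
        to.writeInt(count);
        for (int i = 0; i < count; i++) {
            // duplicate() so the caller's buffer position is left untouched
            ByteBuffer value = from.next().duplicate();
            byte[] bytes = new byte[value.remaining()];
            value.get(bytes);
            to.writeInt(bytes.length);
            to.write(bytes);
        }
    }

    public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException {
        int count = from.readInt();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[from.readInt()];
            from.readFully(bytes);
            to.add(ByteBuffer.wrap(bytes));
        }
    }
}
```

A type-specific implementation would keep the same two entry points and change only what is written between them, using the version argument to choose among encodings.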

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2398:
--------------------------------------

    Fix Version/s:     (was: 1.2)
    
> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: 0002-CASSANDRA-2398-LZF-for-AbstractType-base.txt
                0001-CASSANDRA-2398-Add-basic-type-specific-compression-to-.txt

# Switched the version parameter to an SSTable version, since it doesn't seem likely that we'd apply type-specific compression in too many other places
# Added ning-compress LZF to AbstractType.(de)compress: its API isn't very ByteBuffer friendly, but it is conveniently small and sufficiently fast to be a good default
# Implemented AbstractType.skip

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: 0001-CASSANDRA-2398-Add-basic-type-specific-compression-to-.txt, 0002-CASSANDRA-2398-LZF-for-AbstractType-base.txt
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: 0001-CASSANDRA-2398-Add-basic-type-specific-compression-to-.txt)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8 beta 1
>            Reporter: Stu Hood
>              Labels: compression
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Description: 
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319

Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
public void skip(int version, DataInput from) throws IOException
{code} 

  was:
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319

Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
public void skip(int version, DataInput from) throws IOException
{code} 


> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: 0001-Add-basic-type-specific-compression-to-AbstractType-ne.txt)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: 0002-CASSANDRA-2398-LZF-for-AbstractType-base.txt)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8 beta 1
>            Reporter: Stu Hood
>              Labels: compression
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Comment: was deleted

(was: Actually, one restriction we should put on the output blob is that it should start with a length for skipability.

EDIT: rather than requiring a length at the front, we could just add a skip(version, DataInput) method that parallels decompress, since some implementations (fixed-length values) will be able to skip efficiently without storing a length.)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: example.diff
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Commented] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012251#comment-13012251 ] 

Stu Hood commented on CASSANDRA-2398:
-------------------------------------

Actually, one restriction we should put on the output blob is that it should start with a length for skipability.

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: example.diff
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Description: 
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319

Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
{code} 

  was:
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319
Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
{code} 


> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> {code} 


[jira] [Commented] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027198#comment-13027198 ] 

Stu Hood commented on CASSANDRA-2398:
-------------------------------------

Planning to update the signatures to:
{code:java}
public ByteBuffer compress(int version, final List<ByteBuffer> from, ByteBuffer to)
public void decompress(int version, ByteBuffer from, List<ByteBuffer> to)
{code}
This would make skip a native operation and remove a lot of copies and object creation.
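
One way to read the ByteBuffer-based signatures is sketched below. The class name ByteBufferSketch, the layout (a count followed by length-prefixed values), and the choice to return the destination buffer from compress are all assumptions for illustration; the comment does not describe them, and no real compression is applied. The point of the sketch is that decompress can hand back views over the source buffer instead of copying bytes.

```java
import java.nio.ByteBuffer;
import java.util.List;

class ByteBufferSketch {
    // Assumed layout: [int count] then, per value, [int length][bytes].
    // Returns the destination buffer (an assumption about the return value).
    static ByteBuffer compress(int version, final List<ByteBuffer> from, ByteBuffer to) {
        to.putInt(from.size());
        for (ByteBuffer value : from) {
            to.putInt(value.remaining());
            to.put(value.duplicate()); // duplicate() leaves the caller's position alone
        }
        return to;
    }

    static void decompress(int version, ByteBuffer from, List<ByteBuffer> to) {
        int count = from.getInt();
        for (int i = 0; i < count; i++) {
            int length = from.getInt();
            ByteBuffer view = from.duplicate();
            view.limit(view.position() + length);
            to.add(view.slice());                    // shares the backing bytes, no copy
            from.position(from.position() + length); // advancing is a position update
        }
    }
}
```

Under this reading, a caller that already knows the bounds of a compressed blob can skip it by moving the buffer position, without any decoding work.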

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8 beta 1
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: 0001-CASSANDRA-2398-Add-basic-type-specific-compression-to-.txt, 0002-CASSANDRA-2398-LZF-for-AbstractType-base.txt
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Issue Comment Edited] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012251#comment-13012251 ] 

Stu Hood edited comment on CASSANDRA-2398 at 3/29/11 12:31 AM:
---------------------------------------------------------------

Actually, one restriction we should put on the output blob is that it should start with a length for skipability.

EDIT: rather than requiring a length at the front, we could just add a skip(version, DataInput) method that parallels decompress, since some implementations (fixed-length values) will be able to skip efficiently without storing a length.

      was (Author: stuhood):
    Actually, one restriction we should put on the output blob is that it should start with a length for skipability.
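
The two skipping strategies discussed in this comment can be contrasted with a short sketch. The class name SkipSketch and both layouts (a 4-byte length in front of an opaque blob, and a 4-byte count in front of 8-byte values) are hypothetical and chosen only to illustrate the trade-off.

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;

class SkipSketch {
    // Variable-length layout (assumed): a 4-byte length precedes the blob,
    // so skipping means reading the length and jumping past the bytes.
    static void skipLengthPrefixed(DataInput from) throws IOException {
        skipFully(from, from.readInt());
    }

    // Fixed-length layout (assumed): a 4-byte count, then count 8-byte values.
    // No byte length is stored; it follows from the count.
    static void skipFixedLongs(DataInput from) throws IOException {
        skipFully(from, Math.multiplyExact(from.readInt(), 8));
    }

    // DataInput.skipBytes may skip fewer bytes than requested, so loop.
    static void skipFully(DataInput from, int n) throws IOException {
        while (n > 0) {
            int skipped = from.skipBytes(n);
            if (skipped <= 0) throw new EOFException();
            n -= skipped;
        }
    }
}
```

With a dedicated skip method, each type decides for itself whether a stored length is needed, which is the flexibility the edited comment argues for.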
  
> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: example.diff
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Description: 
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319

Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
public void skip(int version, DataInput from) throws IOException
{code} 

  was:
Cassandra has a lot of locations that are ripe for type specific compression. A short list:

Indexes
 * Keys compressed as BytesType, which could default to LZO/LZMA
 * Offsets (delta and varint encoding)
 * Column names added by 2319

Data
 * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc

A basic interface for type specific compression could be as simple as:
{code:java}
public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
{code} 


> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: example.diff
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: 0001-Add-basic-type-specific-compression-to-AbstractType-ne.txt

Implemented delta and varint encoding for LongType, and used that primitive to compress the lengths for the AbstractType implementation. I'd love to find a pure Java LZ* implementation that we're happy with and call this finished.
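
For readers unfamiliar with the technique, delta and varint encoding of longs can be sketched as follows. This is not the attached patch: the class name DeltaVarintSketch, the leading count, and the use of zigzag mapping for negative deltas are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

class DeltaVarintSketch {
    // Writes the count, then each value as a zigzag varint of its difference
    // from the previous value (the first value is a delta from 0).
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, values.length);
        long previous = 0;
        for (long v : values) {
            writeVarint(out, zigzag(v - previous));
            previous = v;
        }
        return out.toByteArray();
    }

    static long[] decode(ByteBuffer in) {
        long[] values = new long[(int) readVarint(in)];
        long previous = 0;
        for (int i = 0; i < values.length; i++) {
            previous += unzigzag(readVarint(in));
            values[i] = previous;
        }
        return values;
    }

    // Zigzag maps small negative numbers to small unsigned ones.
    static long zigzag(long n) { return (n << 1) ^ (n >> 63); }
    static long unzigzag(long n) { return (n >>> 1) ^ -(n & 1); }

    // Seven payload bits per byte; the high bit marks "more bytes follow".
    static void writeVarint(ByteArrayOutputStream out, long n) {
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    static long readVarint(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

For sorted, closely spaced values such as lengths or offsets, the deltas are small and most fit in a single byte: the four values 1000, 1016, 1040, 1100 encode to 6 bytes under this sketch, compared with 32 bytes as raw longs.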

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: 0001-Add-basic-type-specific-compression-to-AbstractType-ne.txt
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: compress-lzf-0.7.0.jar

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> public void skip(int version, DataInput from) throws IOException
> {code} 


[jira] [Issue Comment Edited] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037698#comment-13037698 ] 

Stu Hood edited comment on CASSANDRA-2398 at 5/23/11 2:08 AM:
--------------------------------------------------------------

Modified the compression interface to deal with ByteBuffers, and added support for compression of CounterColumnType. The compression ratios for examples in the unit tests are:
{quote}
# with CounterColumnType specific compression
2.745098 for 4 values (inbytes: 140, outbytes: 51)
4.7719297 for 4 values (inbytes: 272, outbytes: 57)
5.9710145 for 8 values (inbytes: 412, outbytes: 69)
5.415465 for 10000 values (inbytes: 350034, outbytes: 64636)

# with generic LZF compression
2.5 for 4 values (inbytes: 140, outbytes: 56)
4.1846156 for 4 values (inbytes: 272, outbytes: 65)
4.2916665 for 8 values (inbytes: 412, outbytes: 96)
2.3148732 for 10000 values (inbytes: 349944, outbytes: 151172)
{quote}
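
The leading number on each line appears to be inbytes divided by outbytes, which matches every figure quoted above. A minimal sketch of that arithmetic follows; `CompressionRatio` is a hypothetical name and the unit tests that produced the figures may compute it differently.

```java
// Hypothetical helper for illustration only; not taken from the attached patches.
final class CompressionRatio
{
    // Ratio of uncompressed to compressed size; higher means better compression.
    static float ratio(long inBytes, long outBytes)
    {
        if (outBytes <= 0)
            throw new IllegalArgumentException("outBytes must be positive");
        return (float) inBytes / outBytes;
    }
}
```

For the 10000-value case, the type specific output (64636 bytes) is roughly 43% of the size of the generic LZF output (151172 bytes). The two runs report slightly different input sizes (350034 and 349944 bytes), so the comparison is approximate.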

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt
                0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt

Rebased for trunk.

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt
                0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt

Modified the compression interface to deal with ByteBuffers, and added support for compression of CounterColumnType. The compression ratios for examples in the unit tests are:
{quote}
# with CounterColumnType specific compression
2.745098 for 4 values (inbytes: 140, outbytes: 51)
4.7719297 for 4 values (inbytes: 272, outbytes: 57)
5.9710145 for 8 values (inbytes: 412, outbytes: 69)
5.415465 for 10000 values (inbytes: 350034, outbytes: 64636)

# with generic LZF compression
2.5 for 4 values (inbytes: 140, outbytes: 56)
4.1846156 for 4 values (inbytes: 272, outbytes: 65)
4.2916665 for 8 values (inbytes: 412, outbytes: 96)
2.3148732 for 10000 values (inbytes: 349944, outbytes: 151172)
{quote}

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment: example.diff

Attaching an example of noop compression.

The general idea is that AbstractType would implement LZO/LZMA by default, and subclasses could override in the future to add actual type specificity. The versioning in this example is MessagingService.version_, although we should discuss whether that is the right one.
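
As a rough illustration of what a no-op default might look like (this is not the content of example.diff), the sketch below uses the Iterator-based signatures from the issue description as it stood at the time. `NoopCompressor` is a hypothetical stand-in for the default behaviour on AbstractType. The length-prefixed layout, the ignored `version` argument and the `roundTrip` helper are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a default implementation; not Cassandra's AbstractType.
class NoopCompressor
{
    // Writes the values unchanged: an int count, then (int length, raw bytes) per value.
    // 'version' is accepted but ignored in this sketch.
    public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
    {
        to.writeInt(count);
        for (int i = 0; i < count; i++)
        {
            ByteBuffer value = from.next().duplicate(); // leave the caller's position untouched
            byte[] bytes = new byte[value.remaining()];
            value.get(bytes);
            to.writeInt(bytes.length);
            to.write(bytes);
        }
    }

    public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
    {
        int count = from.readInt();
        for (int i = 0; i < count; i++)
        {
            byte[] bytes = new byte[from.readInt()];
            from.readFully(bytes);
            to.add(ByteBuffer.wrap(bytes));
        }
    }

    // Convenience for trying the sketch in memory; not part of the proposed interface.
    static List<ByteBuffer> roundTrip(List<ByteBuffer> values)
    {
        try
        {
            NoopCompressor codec = new NoopCompressor();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            codec.compress(0, values.iterator(), values.size(), new DataOutputStream(bytes));
            List<ByteBuffer> result = new ArrayList<>();
            codec.decompress(0, new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), result);
            return result;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

A type specific subclass would override `compress` and `decompress` together, so that anything written by one can be read back by the other for a given version.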

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression
>         Attachments: example.diff
>
>
> Cassandra has a lot of locations that are ripe for type specific compression. A short list:
> Indexes
>  * Keys compressed as BytesType, which could default to LZO/LZMA
>  * Offsets (delta and varint encoding)
>  * Column names added by 2319
> Data
>  * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc
> A basic interface for type specific compression could be as simple as:
> {code:java}
> public void compress(int version, Iterator<ByteBuffer> from, int count, DataOutput to) throws IOException
> public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
> {code} 


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: example.diff)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Stu Hood
>              Labels: compression


[jira] [Updated] (CASSANDRA-2398) Type specific compression

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-2398:
--------------------------------

    Attachment:     (was: 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt)

> Type specific compression
> -------------------------
>
>                 Key: CASSANDRA-2398
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar
