Posted to dev@kafka.apache.org by "Joe Stein (Created) (JIRA)" <ji...@apache.org> on 2011/11/05 19:04:51 UTC

[jira] [Created] (KAFKA-187) Add Snappy Compression as a Codec

Add Snappy Compression as a Codec
---------------------------------

                 Key: KAFKA-187
                 URL: https://issues.apache.org/jira/browse/KAFKA-187
             Project: Kafka
          Issue Type: Improvement
            Reporter: Joe Stein


My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Jun Rao <ju...@gmail.com>.
Jeffrey,

There is already a wiki on Kafka compression. Feel free to extend it.

Thanks,

Jun

On Sun, Nov 13, 2011 at 10:58 AM, Jeffrey Damick <je...@gmail.com>wrote:

> Yes, I don't disagree with the need or feasibility of gzip and snappy; we
> both agree that a client spec is really what is lacking.  How can I help?  I
> would think even just documenting the protocol on the wiki would be a good
> start (that would have helped me on the Go client).

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Jeffrey Damick <je...@gmail.com>.
Yes, I don't disagree with the need or feasibility of gzip and snappy; we
both agree that a client spec is really what is lacking.  How can I help?  I
would think even just documenting the protocol on the wiki would be a good
start (that would have helped me on the Go client).




On Sat, Nov 12, 2011 at 4:05 PM, Jay Kreps <ja...@gmail.com> wrote:

> Hi Jeffrey,
>
> What you are saying makes sense. I agree that we need to give a client spec
> which is language agnostic. Currently I think we have reasonable support
> for non-java producers--they are easy to write and work just as well as
> java. We do not have good support for non-java consumers because the
> co-ordination algorithm is done client side which makes the consumer
> implementation complex. This is discussed a little here:
> https://issues.apache.org/jira/browse/KAFKA-167
>
> I think with regard to compression we don't want to support gobs of
> compression algorithms, but we do want to give a few basic options. We
> discussed this a lot when we were originally designing Kafka, here was the
> thinking. Compression can be done in a couple ways. It could be internal to
> the message and purely a contract between the producer and consumer or it
> could be something handled only on the broker with messages compressed by
> the broker and decompressed when fetched. Here is what we came up with:
>
>   1. We want end-to-end compression. That is, the compression should be
>   carried through the producer network hop, should be written compressed to
>   disk, and should be fetched without needing decompression.
>   2. We want compression to be explicitly supported in the message/log
>   format to enable "block" compression that compresses batches of messages.
>   The reason for this is that this is much more effective than single-message
>   compression, especially for a stream where all messages share common
>   fields. This is very common for many use cases.
>
> This means that compression does need to be something the client is aware
> of. For the codecs to support, we discussed this as well. We have only a
> single byte for the compression codec, which means we can't support an
> unbounded number of codecs and the support is in Kafka and is not meant to
> be user-pluggable. The reason for this is that we didn't feel that plugging
> in all possible algorithms really added any value. Instead we wanted to
> support a couple of useful CPU vs size trade-offs:
>
>   1. No Compression: This requires the least CPU (maybe) and has the
>   largest data size.
>   2. GZIP: This has pretty good size but is very CPU intensive. This is
>   appropriate for a lot of LinkedIn's uses where data is being transferred
>   between datacenters and production comes from a very large number of
>   producer processes and hence data size is much more important than CPU
>   usage.
>   3. LZO or Snappy are a nice intermediate between these extremes--good
>   but not great compression with low CPU usage. We had thought of doing LZO,
>   but snappy seems to be slightly better.
>
> At this point I don't see much use in adding additional compression types
> since there aren't many more useful spots on the CPU/size tradeoff curve.
>
> Because of the style of implementation each compression type does require
> support from both the producer and the consumers in each language. However
> lacking a compression type in one language is not a big impediment. If a
> given language doesn't support it, users of that client can just not use
> that compression type.
>
> My understanding is that snappy is available as fairly portable C so should
> be reasonable to embed in most common languages.
>
> Does that sound reasonable?
>
> -Jay

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Jay Kreps <ja...@gmail.com>.
Hi Jeffrey,

What you are saying makes sense. I agree that we need to give a client spec
which is language agnostic. Currently I think we have reasonable support
for non-java producers--they are easy to write and work just as well as
java. We do not have good support for non-java consumers because the
co-ordination algorithm is done client side which makes the consumer
implementation complex. This is discussed a little here:
https://issues.apache.org/jira/browse/KAFKA-167

I think with regard to compression we don't want to support gobs of
compression algorithms, but we do want to give a few basic options. We
discussed this a lot when we were originally designing Kafka, here was the
thinking. Compression can be done in a couple ways. It could be internal to
the message and purely a contract between the producer and consumer or it
could be something handled only on the broker with messages compressed by
the broker and decompressed when fetched. Here is what we came up with:

   1. We want end-to-end compression. That is, the compression should be
   carried through the producer network hop, should be written compressed to
   disk, and should be fetched without needing decompression.
   2. We want compression to be explicitly supported in the message/log
   format to enable "block" compression that compresses batches of messages.
   The reason for this is that this is much more effective than single-message
   compression, especially for a stream where all messages share common
   fields. This is very common for many use cases.
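
To illustrate point 2, here is a minimal Python sketch (not Kafka code) that compares compressing each message on its own with compressing a whole batch as one block. The message fields, the batch size of 200, and the 4-byte length prefix are all assumptions made for this example; gzip from the standard library stands in for whichever codec is used.

```python
import gzip
import json


def make_messages(n):
    """Build n small JSON messages that all share the same field names."""
    return [
        json.dumps(
            {"user_id": i, "event": "page_view", "page": "/home", "status": "ok"}
        ).encode("utf-8")
        for i in range(n)
    ]


def size_compressed_individually(messages):
    """Total bytes when every message is gzip-compressed on its own."""
    return sum(len(gzip.compress(m)) for m in messages)


def size_compressed_as_block(messages):
    """Total bytes when the whole batch is gzip-compressed as one block."""
    # Length-prefix each message so the batch could be split apart again.
    batch = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return len(gzip.compress(batch))


messages = make_messages(200)
raw_size = sum(len(m) for m in messages)
single_size = size_compressed_individually(messages)
block_size = size_compressed_as_block(messages)
print(f"raw={raw_size} per-message={single_size} block={block_size}")
```

With messages this small, per-message compression pays the codec's fixed overhead on every message and cannot exploit the repeated field names, while the block version can.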

This means that compression does need to be something the client is aware
of. For the codecs to support, we discussed this as well. We have only a
single byte for the compression codec, which means we can't support an
unbounded number of codecs and the support is in Kafka and is not meant to
be user-pluggable. The reason for this is that we didn't feel that plugging
in all possible algorithms really added any value. Instead we wanted to
support a couple of useful CPU vs size trade-offs:

   1. No Compression: This requires the least CPU (maybe) and has the
   largest data size.
   2. GZIP: This has pretty good size but is very CPU intensive. This is
   appropriate for a lot of LinkedIn's uses where data is being transferred
   between datacenters and production comes from a very large number of
   producer processes and hence data size is much more important than CPU
   usage.
   3. LZO or Snappy are a nice intermediate between these extremes--good
   but not great compression with low CPU usage. We had thought of doing LZO,
   but snappy seems to be slightly better.

At this point I don't see much use in adding additional compression types
since there aren't many more useful spots on the CPU/size tradeoff curve.
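
As a rough sketch of what a fixed, non-pluggable codec set keyed by a single byte could look like, consider the Python below. It is not Kafka's wire format: the framing (one codec byte followed by the payload) is a simplification, and the ids 0 = none and 1 = gzip are assumptions for this example. A client that lacks a codec simply rejects that id.

```python
import gzip

# Assumed codec ids for this sketch only.
NO_COMPRESSION = 0
GZIP = 1

# Fixed table: codec id -> (compress, decompress). Not user-pluggable.
CODECS = {
    NO_COMPRESSION: (lambda data: data, lambda data: data),
    GZIP: (gzip.compress, gzip.decompress),
}


def encode(codec_id, payload):
    """Prefix the (possibly compressed) payload with a one-byte codec id."""
    if not 0 <= codec_id <= 0xFF:
        raise ValueError("codec id must fit in a single byte")
    if codec_id not in CODECS:
        raise ValueError(f"unsupported codec id {codec_id}")
    compress, _ = CODECS[codec_id]
    return bytes([codec_id]) + compress(payload)


def decode(frame):
    """Read the codec byte and decompress the rest of the frame."""
    if not frame:
        raise ValueError("empty frame")
    codec_id = frame[0]
    if codec_id not in CODECS:
        raise ValueError(f"unsupported codec id {codec_id}")
    _, decompress = CODECS[codec_id]
    return decompress(frame[1:])


frame = encode(GZIP, b"hello kafka")
print(decode(frame))
```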

Because of the style of implementation each compression type does require
support from both the producer and the consumers in each language. However
lacking a compression type in one language is not a big impediment. If a
given language doesn't support it, users of that client can just not use
that compression type.

My understanding is that snappy is available as fairly portable C so should
be reasonable to embed in most common languages.

Does that sound reasonable?

-Jay

On Sat, Nov 12, 2011 at 11:24 AM, Jeffrey Damick <je...@gmail.com>wrote:

> Right, but on the other hand if every compression under the sun is allowed,
> then you could end up with a very fractured client community of support.
>
> I guess I'd like to see a client RFC of sorts, but maybe I'm the only one
> that cares about alternative language support... :)

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Jeffrey Damick <je...@gmail.com>.
Right, but on the other hand if every compression under the sun is allowed,
then you could end up with a very fractured client community of support.

I guess I'd like to see a client RFC of sorts, but maybe I'm the only one
that cares about alternative language support... :)



On Fri, Nov 11, 2011 at 2:22 PM, Chris Burroughs
<ch...@gmail.com>wrote:

> I think realistically, if we try to say that we can only include
> compression codecs that every client language supports, our only codec
> will be gzip (or maybe bzip2, but that's ill suited for most use cases).
>

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Chris Burroughs <ch...@gmail.com>.
On 11/11/2011 01:55 PM, Jeffrey Damick wrote:
> So with regard to the
> KAFKA-187<https://issues.apache.org/jira/browse/KAFKA-187> what
> is the stance going to be on supporting new compression methods?  Is it
> expected that all clients 'must' & will support them?  If not, is there a
> set of 'required' compression codecs?  Jun mentioned not wanting every
> language to re-implement a thick client, but where is the line between
> thick and thin?  It seems like there needs to be a clear set of expectations
> for what a client implements, regardless of language or platform, or maybe
> I'm off in the weeds..
> 

I think realistically, if we try to say that we can only include
compression codecs that every client language supports, our only codec
will be gzip (or maybe bzip2, but that's ill suited for most use cases).

Re: [jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by Jeffrey Damick <je...@gmail.com>.
So with regard to the
KAFKA-187<https://issues.apache.org/jira/browse/KAFKA-187> what
is the stance going to be on supporting new compression methods?  Is it
expected that all clients 'must' & will support them?  If not, is there a
set of 'required' compression codecs?  Jun mentioned not wanting every
language to re-implement a thick client, but where is the line between
thick and thin?  It seems like there needs to be a clear set of expectations
for what a client implements, regardless of language or platform, or maybe
I'm off in the weeds..

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145659#comment-13145659 ] 

Jun Rao commented on KAFKA-187:
-------------------------------

It would be good if the snappy jar is only a compile time dependency, but not a runtime dependency. This way, people not using snappy don't have to include the jar. Could we verify this?
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> Refactor CompressionUtil for better code reuse and provide a way on startup to select the default codec instead of always defaulting to gzip.


        

[jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-187:
--------------------------

    Attachment: KAFKA-187_v3.patch

Patch v2 looks good to me. Attaching Patch v3 with minor changes in CompressionCodec to avoid duplicating constants. If nobody objects, I will commit this patch later today.
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: KAFKA-187_v3.patch, kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> Refactor CompressionUtil for better code reuse and provide a way on startup to select the default codec instead of always defaulting to gzip.


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jeffrey Damick (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148278#comment-13148278 ] 

Jeffrey Damick commented on KAFKA-187:
--------------------------------------

Right, I agree it overlaps that thread, but is compression support a thick or thin attribute? Maybe we should move this to the mailing list?
                


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144946#comment-13144946 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

1. There is quite a lot of overlap in the code for the GZIP and Snappy codecs in CompressionUtils. Would you be up for refactoring it so that they use the same code path?
2. One thing to think about is whether Snappy should be a compile and run time dependency in the core kafka project, especially since GZIP will be the default and Snappy will only be used if it is explicitly configured. I wonder if there is any way of defining optional run time dependencies?
3. I think we will have to wait for the unit test to get fixed before accepting this patch. Have you tried running the system and performance tests yet?
                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library


        

[jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Joe Stein (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-187:
----------------------------

    Attachment: kafka-187.patch
    


        

[jira] [Issue Comment Edited] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Joe Stein (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145770#comment-13145770 ] 

Joe Stein edited comment on KAFKA-187 at 11/7/11 8:28 PM:
----------------------------------------------------------

Yes, this patch allows for that.  Here is how to verify.

apply the patch
./sbt update
./sbt package

then remove the jar

rm -f core/lib_managed/scala_2.8.0/compile/snappy-java-1.0.4.1.jar

then launch up

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-producer-shell.sh --props config/producer.properties --topic test
bin/kafka-consumer-shell.sh --topic test --props config/consumer.properties

send messages, things are good

shut down bin/kafka-producer-shell.sh --props config/producer.properties --topic test

then in config/producer.properties change the codec to 1

startup  bin/kafka-producer-shell.sh --props config/producer.properties --topic test

send messages, things are good to go

shutdown  bin/kafka-producer-shell.sh --props config/producer.properties --topic test

then  in config/producer.properties change the codec to 2

startup  bin/kafka-producer-shell.sh --props config/producer.properties --topic test

starts up fine, then try to send a message.....

Exception in thread "main" java.lang.NoClassDefFoundError: org/xerial/snappy/SnappyOutputStream
	at kafka.message.SnappyCompression.<init>(CompressionUtils.scala:61)
	at kafka.message.CompressionFactory$.apply(CompressionUtils.scala:82)
	at kafka.message.CompressionUtils$.compress(CompressionUtils.scala:111)
	at kafka.message.MessageSet$.createByteBuffer(MessageSet.scala:71)
	at kafka.message.ByteBufferMessageSet.<init>(ByteBufferMessageSet.scala:45)
	at kafka.producer.ProducerPool$$anonfun$send$1$$anonfun$3.apply(ProducerPool.scala:108)
	at kafka.producer.ProducerPool$$anonfun$send$1$$anonfun$3.apply(ProducerPool.scala:107)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
	at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:43)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:107)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:102)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:102)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
	at kafka.producer.ProducerPool.send(ProducerPool.scala:102)
	at kafka.producer.Producer.configSend(Producer.scala:167)
	at kafka.producer.Producer.send(Producer.scala:106)
	at kafka.tools.ProducerShell$.main(ProducerShell.scala:68)
	at kafka.tools.ProducerShell.main(ProducerShell.scala)
Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.SnappyOutputStream
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	... 23 more
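
The behaviour verified above (startup succeeds, and the missing library only fails once its codec is actually used) can be sketched in Python as a rough analogue. This is not the patch's Scala code: the LazyCodec class and the send helper are hypothetical, the codec ids follow the steps above (1 and 2, with 0 assumed to mean no compression), and "snappy" is assumed to be the module name of the third-party python-snappy package.

```python
import gzip
import importlib


class LazyCodec:
    """Resolve a module's compress function only on first use."""

    def __init__(self, module_name, function_name="compress"):
        self.module_name = module_name
        self.function_name = function_name
        self._compress = None

    def compress(self, data):
        if self._compress is None:
            # Raises ImportError here, at send time, if the module is absent.
            module = importlib.import_module(self.module_name)
            self._compress = getattr(module, self.function_name)
        return self._compress(data)


# Building this table imports nothing, so startup works without snappy.
CODECS = {
    1: LazyCodec("gzip"),
    2: LazyCodec("snappy"),  # assumption: python-snappy, needed only for codec 2
}


def send(codec_id, payload):
    """Hypothetical producer-side helper: compress according to codec id."""
    if codec_id == 0:
        return payload
    return CODECS[codec_id].compress(payload)


print(gzip.decompress(send(1, b"hello")))
```

A user who never selects codec 2 never triggers the import, which mirrors the compile-time-only dependency asked for earlier in the issue.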

                

                  
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Joe Stein (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-187:
----------------------------

    Description: 
My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.

refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gzipping

  was:My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library

        Summary: Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec  (was: Add Snappy Compression as a Codec)
    
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Joe Stein (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144950#comment-13144950 ] 

Joe Stein commented on KAFKA-187:
---------------------------------

{quote}

1. There is quite a lot of overlap in the code for the GZIP and Snappy codec in CompressionUtils. Wonder if you were up for refactoring it so that they use the same code path ? 

{quote}

Agreed, I am.  Should I open a new ticket for refactoring CompressionUtils, or do it as part of this ticket?

{quote}

2. One thing to think about is whether Snappy should be a compile and run time dependency in the core kafka project. Especially since, GZIP will be default and Snappy will only be used if it is explicitly configured. I wonder if there is any way of defining optional run time dependencies ? 

{quote}

Yeah, we could do something at startup that changes the default behavior to be a specific codec.  Incorporating this into the refactoring of that class should not be a big deal: the default case would check an object instead of implementing gzip (like it does now) and, depending on that object, call either the gzip or the snappy function I create (which will also get used by the case match).  Same JIRA as this?  A new JIRA for the refactoring, with this folded into it?  A third JIRA?

{quote}

3. I think we will have to wait for the unit test to get fixed before accepting this patch. Have you tried running the system and performance tests yet ?

{quote}

Sounds good, I should be able to chip away at that tomorrow or early this week.  I did try running the performance tests but ran into some errors.  It is possible I was doing something wrong, so I want to go through and set it up again before sending an email about that.  Likewise, I will try that in the next few days too.
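
To make the default-codec idea concrete, here is a minimal Scala sketch; it is not the code from the patch. The names Codec, NoCodec, GzipCodec, CompressionSketch and defaultCodec are hypothetical, and Snappy is left out only because it needs the external snappy-java jar (it would be one more case). The point is that the startup default and an explicitly requested codec go through the same match.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Hypothetical names for illustration only; these are not the classes in the patch.
sealed trait Codec
case object NoCodec extends Codec
case object GzipCodec extends Codec

object CompressionSketch {
  // Chosen once at startup instead of hard-coding gzip as the default.
  @volatile var defaultCodec: Codec = GzipCodec

  // Callers that pass no codec get the startup default; both paths share this match.
  def compress(bytes: Array[Byte], codec: Codec = defaultCodec): Array[Byte] = codec match {
    case NoCodec => bytes
    case GzipCodec =>
      val buffer = new ByteArrayOutputStream()
      val gzip = new GZIPOutputStream(buffer)
      try gzip.write(bytes) finally gzip.close()
      buffer.toByteArray
  }
}
```

Setting CompressionSketch.defaultCodec once during startup would change what compress(bytes) does without touching any caller.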

                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jeffrey Damick (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147729#comment-13147729 ] 

Jeffrey Damick commented on KAFKA-187:
--------------------------------------

So, this raises a bigger question: how do clients signal that they can't handle codec 'xyz'?  Is it up to the consumer, or could / should the broker re-encode it?
Granted, this probably isn't an issue for snappy since there appear to be several implementations, but in general is there a need for an 'accept' style header?

My thought is that if you leave it up to the client then you could run into an issue where you have client A and client B that share no compression codec, so you are stuck with uncompressed.
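
As a purely hypothetical illustration of what an 'accept' style negotiation could look like (nothing like this exists in the patch or, as far as this thread shows, in the protocol), the sketch below picks the first codec in the producer's preference order that every consumer accepts. CodecNegotiation, choose and the string codec names are all assumptions.

```scala
object CodecNegotiation {
  // Returns the first codec the producer prefers that every consumer accepts.
  // None means no codec is shared, so the data would have to go uncompressed.
  def choose(producerPrefs: Seq[String], consumerAccepts: Seq[Set[String]]): Option[String] =
    producerPrefs.find(codec => consumerAccepts.forall(_.contains(codec)))
}
```

With preferences Seq("snappy", "gzip") and consumers accepting Set("gzip") and Set("gzip", "snappy"), the result is Some("gzip"); if the consumers share nothing with the producer, the result is None, which is the "stuck with uncompressed" case.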

                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145074#comment-13145074 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

1. I think we might as well do the refactoring as part of this patch. Copy-pasting code doesn't seem like a good idea. 
2. Yes. Same JIRA as this one. We might as well think through all the issues with supporting multiple codecs cleanly now, rather than later.
3. I've updated the perf patch. Do you want to submit a patch for the unit test ?
                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Joe Stein (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145770#comment-13145770 ] 

Joe Stein commented on KAFKA-187:
---------------------------------

Yes, this patch allows for that.  Here is how to verify.

apply the patch
./sbt package

then remove the jar

rm -f core/lib_managed/scala_2.8.0/compile/snappy-java-1.0.4.1.jar

then launch up

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-producer-shell.sh --props config/producer.properties --topic test
bin/kafka-consumer-shell.sh --topic test --props config/consumer.properties

send messages, things are good

shut down bin/kafka-producer-shell.sh --props config/producer.properties --topic test

then in config/producer.properties change the codec to 1 (gzip)

start up bin/kafka-producer-shell.sh --props config/producer.properties --topic test

send messages, things are good to go

shut down bin/kafka-producer-shell.sh --props config/producer.properties --topic test

then in config/producer.properties change the codec to 2 (snappy)

start up bin/kafka-producer-shell.sh --props config/producer.properties --topic test

starts up fine, then try to send a message.....

Exception in thread "main" java.lang.NoClassDefFoundError: org/xerial/snappy/SnappyOutputStream
	at kafka.message.SnappyCompression.<init>(CompressionUtils.scala:61)
	at kafka.message.CompressionFactory$.apply(CompressionUtils.scala:82)
	at kafka.message.CompressionUtils$.compress(CompressionUtils.scala:111)
	at kafka.message.MessageSet$.createByteBuffer(MessageSet.scala:71)
	at kafka.message.ByteBufferMessageSet.<init>(ByteBufferMessageSet.scala:45)
	at kafka.producer.ProducerPool$$anonfun$send$1$$anonfun$3.apply(ProducerPool.scala:108)
	at kafka.producer.ProducerPool$$anonfun$send$1$$anonfun$3.apply(ProducerPool.scala:107)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
	at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:43)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:107)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:102)
	at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:102)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
	at kafka.producer.ProducerPool.send(ProducerPool.scala:102)
	at kafka.producer.Producer.configSend(Producer.scala:167)
	at kafka.producer.Producer.send(Producer.scala:106)
	at kafka.tools.ProducerShell$.main(ProducerShell.scala:68)
	at kafka.tools.ProducerShell.main(ProducerShell.scala)
Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.SnappyOutputStream
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	... 23 more
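
The trace shows the missing jar only surfacing on the first send. A hedged sketch of how a producer could probe for the optional class up front and report a clearer error; OptionalDependencyCheck and isClassAvailable are hypothetical names, and the patch does not necessarily do this.

```scala
object OptionalDependencyCheck {
  // True when the class can be located on the current classpath.
  // initialize = false: only find the class, do not run its static initializers.
  def isClassAvailable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError => false // e.g. NoClassDefFoundError
    }
}
```

A startup check such as OptionalDependencyCheck.isClassAvailable("org.xerial.snappy.SnappyOutputStream") would let the snappy jar stay optional while still failing early when codec 2 is configured without it.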

                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145987#comment-13145987 ] 

Jun Rao commented on KAFKA-187:
-------------------------------

Thanks, Joe. That's what I was looking for.
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145083#comment-13145083 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

>> provide a way on startup to select what the default codec is instead of the default always gziping
 
Today, we have a config named "compression.codec" that picks the compression codec. It is currently a numeric value: 0 for no compression and 1 for GZIP. Now that we are supporting multiple codecs, it makes sense for this to be a string value, which can be one of "none", "gzip" or "snappy".  The default is still "none".

As part of 2., I was raising a different question. In the general case, should snappy always be a core dependency of Kafka or not ? I don't know the right answer here. Maybe we need to think more.
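
A minimal sketch of the string-valued config suggested above, assuming the numeric ids 0 and 1 mentioned here plus 2 for snappy (the value used in the verification steps earlier in this thread). CodecConfigSketch and codecId are hypothetical names; only the property key "compression.codec" comes from the discussion.

```scala
import java.util.Properties

object CodecConfigSketch {
  // Assumed mapping from config string to numeric codec id.
  private val idsByName = Map("none" -> 0, "gzip" -> 1, "snappy" -> 2)

  // Reads "compression.codec", falling back to "none" when the key is absent.
  def codecId(props: Properties): Int = {
    val name = props.getProperty("compression.codec", "none").trim.toLowerCase
    idsByName.getOrElse(name,
      throw new IllegalArgumentException("unknown compression codec: " + name))
  }
}
```

Rejecting unknown names at config time keeps a typo from silently falling back to uncompressed messages.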
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144770#comment-13144770 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

Leaving a comment here, as Joe didn't appear to be in the JIRA list. Assigning this JIRA to Joe.
                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library


        

[jira] [Resolved] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-187.
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.8
         Assignee: Joe Stein

Thanks, Joe. Just committed this.
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>            Assignee: Joe Stein
>             Fix For: 0.8
>
>         Attachments: KAFKA-187_v3.patch, kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147866#comment-13147866 ] 

Jun Rao commented on KAFKA-187:
-------------------------------

This is not a problem for java/scala clients. But it could be a problem for non-java clients. I think this is tied to some of the discussions that we had on non-java language support (see "different language binding support" thread in http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201109.mbox/thread). Ideally, we'd rather each language not re-implement a thick client.
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150115#comment-13150115 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

+1 on the latest patch. I like the changes.
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: KAFKA-187_v3.patch, kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149908#comment-13149908 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

What is the difference between Joe's v2 patch and the v3 patch ? Can you please describe the changes when uploading a new patch ?
                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: KAFKA-187_v3.patch, kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150105#comment-13150105 ] 

Jun Rao commented on KAFKA-187:
-------------------------------

The following are the changes that I made. 

Index: core/src/main/scala/kafka/message/CompressionCodec.scala
===================================================================
--- core/src/main/scala/kafka/message/CompressionCodec.scala	(revision 1200967)
+++ core/src/main/scala/kafka/message/CompressionCodec.scala	(working copy)
@@ -20,8 +20,9 @@
 object CompressionCodec {
   def getCompressionCodec(codec: Int): CompressionCodec = {
     codec match {
-      case 0 => NoCompressionCodec
-      case 1 => GZIPCompressionCodec
+      case NoCompressionCodec.codec => NoCompressionCodec
+      case GZIPCompressionCodec.codec => GZIPCompressionCodec
+      case SnappyCompressionCodec.codec => SnappyCompressionCodec
       case _ => throw new kafka.common.UnknownCodecException("%d is an unknown compression codec".format(codec))
     }
   }
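
For readers without the source tree, here is a self-contained sketch of how a match like the one in this diff can compile. The object definitions are a reconstruction, not the committed file: the ids 0, 1 and 2 are assumed, and IllegalArgumentException stands in for kafka.common.UnknownCodecException so the example has no Kafka dependency. Because each codec is declared as a val inside an object, NoCompressionCodec.codec is a stable identifier and can be used directly as a pattern.

```scala
// Reconstruction for illustration; the ids are assumptions, not taken from the patch.
sealed trait CompressionCodec { def codec: Int }
case object NoCompressionCodec extends CompressionCodec { val codec = 0 }
case object GZIPCompressionCodec extends CompressionCodec { val codec = 1 }
case object SnappyCompressionCodec extends CompressionCodec { val codec = 2 }

object CompressionCodec {
  def getCompressionCodec(codec: Int): CompressionCodec = codec match {
    // Stable identifier patterns: each case compares against the object's val.
    case NoCompressionCodec.codec => NoCompressionCodec
    case GZIPCompressionCodec.codec => GZIPCompressionCodec
    case SnappyCompressionCodec.codec => SnappyCompressionCodec
    // Stand-in for kafka.common.UnknownCodecException.
    case _ => throw new IllegalArgumentException("%d is an unknown compression codec".format(codec))
  }
}
```

Matching on the objects' own ids keeps the numeric values defined in one place, so adding a codec means adding one object and one case.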

                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: KAFKA-187_v3.patch, kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> refactor CompressionUtil for better code reuse and provide a way on startup to select what the default codec is instead of the default always gziping


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144943#comment-13144943 ] 

Neha Narkhede commented on KAFKA-187:
-------------------------------------

Thanks for the patch Joe! A couple of comments
                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library


        

[jira] [Updated] (KAFKA-187) Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec

Posted by "Joe Stein (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-187:
----------------------------

    Attachment: kafka-187.refactored.patch

Refactored CompressionUtil, added Snappy compression, and added a test case for its use. I have not yet run the perf test to compare gzip vs. Snappy.

                
> Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch, kafka-187.refactored.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library.
> Refactor CompressionUtil for better code reuse, and provide a way on startup to select the default codec instead of always defaulting to gzip.


        

[jira] [Commented] (KAFKA-187) Add Snappy Compression as a Codec

Posted by "Joe Stein (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145077#comment-13145077 ] 

Joe Stein commented on KAFKA-187:
---------------------------------

1) Sounds good to me. I opened another JIRA, but I will comment there that it is dead and just do the work here; not a problem.
2) Sounds good.
3) Done: KAFKA-192

The changes were simple enough for me to just redo them after the refactoring.
                
> Add Snappy Compression as a Codec
> ---------------------------------
>
>                 Key: KAFKA-187
>                 URL: https://issues.apache.org/jira/browse/KAFKA-187
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joe Stein
>         Attachments: kafka-187.patch
>
>
> My thoughts are a new trait CompressionDependencies for KafkaProject.scala, adding snappy as the first library
