Posted to users@kafka.apache.org by Jörg Wagner <jo...@1und1.de> on 2015/09/01 09:33:58 UTC

Compression and MirrorMaker

Hello!

I was looking into compression for some WAN mirroring and did some 
tcpdumping.

Our producer is using snappy, and I think I can see that in the network
traffic: while the outer part of each message is readable, the content
looks compressed, with mostly nonprintable characters. However, when the
MirrorMaker consumes the messages, the complete message contents are
readable in the dump. To clarify:

producer --(snappy)--> broker --(uncompressed, WAN)--> MirrorMaker(consumer)

From what I read about Kafka's (end-to-end) compression, that shouldn't
even be possible. As far as I can see, compression can only be configured
at the producer.
How can I be sure that the communication via WAN in particular is compressed?
Does anybody have experience with mirroring via WAN and can give some hints
on the MM configuration as well as OS network tuning for Kafka?

Also, the option shallow.iterator.enable does not seem to be documented.
Regarding the above, could that help (although I don't suppose so,
because the messages would already have to leave the broker compressed)?

Cheers
Jörg

Re: Compression and MirrorMaker

Posted by Lance Laursen <ll...@rubiconproject.com>.

Hi Jörg,

You are correct. The producer is wrapping up a batch of messages into one,
compressing that one message and slapping a magic byte flag and compression
type "snappy" on it, and then sending that single compressed message to
your brokers. They hang out on your brokers in compressed format.

On the other end, consumers (including MirrorMaker) will automatically
decompress compressed message sets once they receive them, and spit the
contents out as individual messages. Re: your diagram, the messages exist
compressed on the broker, leave the broker compressed, get to their
destination consumer process and only then are they decompressed.
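
As a rough illustration of the wrap-and-unwrap idea described above, here is a
minimal Python sketch. It is not Kafka's actual wire format: the length-prefix
framing is invented for the example, the function names wrap_batch and
unwrap_batch are hypothetical, and zlib from the standard library stands in for
snappy so that the snippet runs without extra packages.

```python
import struct
import zlib


def wrap_batch(messages):
    """Pack a batch of byte payloads into one compressed wrapper message.

    Assumption: a 4-byte big-endian length prefix per message. This framing
    is made up for the sketch and is not Kafka's message set format.
    """
    inner = b"".join(struct.pack(">I", len(m)) + m for m in messages)
    # zlib is a stand-in codec here; the thread's producer uses snappy.
    return zlib.compress(inner)


def unwrap_batch(wrapper):
    """Decompress a wrapper and return the individual payloads in order."""
    inner = zlib.decompress(wrapper)
    messages = []
    offset = 0
    while offset < len(inner):
        (size,) = struct.unpack_from(">I", inner, offset)
        offset += 4
        messages.append(inner[offset:offset + size])
        offset += size
    return messages


# Producer side: many small messages become one compressed blob.
batch = [b"event-%d: user logged in" % i for i in range(100)]
wire = wrap_batch(batch)

# Consumer side: the blob is only expanded after it has been received.
received = unwrap_batch(wire)
```

In this sketch, wire is what would sit on the broker and cross the network,
while received is what the consuming process sees after decompression.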

In MirrorMaker's case, it will automatically decompress consumed messages
before (possibly re-compressing and) producing to the destination cluster.
This process is transparent, so that may be where your confusion lies.

Doing a "tcpdump -X src source.broker.i.p" at your MirrorMaker destination
box should show some mangled message contents and SNAPPY in the ASCII dump
section of some of the output. Regarding tuning, general TCP tuning across
a WAN applies. If you're running anything more than a 1 Gbit WAN link with
30 ms or higher RTT, do a quick Google search on "bandwidth delay product"
and follow example TCP tuning guides. If you're not running any links that
are out of the ordinary, you likely won't have to do anything.
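
For a sense of scale, the bandwidth delay product is simply link bandwidth
multiplied by round-trip time. A small Python sketch of that arithmetic follows,
using the 1 Gbit and 30 ms figures mentioned above; the function name is made up
for the example, and what to do with the result (for instance, comparing it with
your OS socket buffer limits) depends on your system.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_ms):
    """Return the number of bytes that can be in flight on the link.

    bandwidth_bits_per_s: link speed in bits per second (integer)
    rtt_ms: round-trip time in milliseconds (integer)
    """
    # bits/s * ms = millibits; divide by 1000 for bits and by 8 for bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000


# Example figures from the thread: a 1 Gbit/s link with 30 ms RTT.
bdp = bandwidth_delay_product_bytes(1_000_000_000, 30)
print(bdp, "bytes in flight to keep the link full")
```

For these figures the result is 3,750,000 bytes, so a single TCP connection
needs a window of roughly 3.75 MB to keep such a link busy.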
