Posted to users@kafka.apache.org by Wesley Chow <we...@chartbeat.com> on 2017/09/20 21:58:29 UTC

is a topic compressed?

I have a producer configured to snappy compress data sent to a cluster. Is
there some way to verify that the data indeed is being compressed? If I
peek at the .log files on the broker, I can read some plain text amongst
binary. Similarly, tcpdump shows plain text readable data. I do not know if
this is evidence that compression is not working, but is there a better way
to verify that compression is turned on?

Wes

Re: is a topic compressed?

Posted by Wesley Chow <we...@chartbeat.com>.
Excellent, the DumpLogSegments tool did the trick!

Wes

Re: is a topic compressed?

Posted by Manikumar <ma...@gmail.com>.
You can use the DumpLogSegments tool to inspect the messages in the log files.
It prints the compression type of each message.
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-DumpLogSegment
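
The tool is usually run as
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files <segment file>.
As a rough cross-check without it, the codec can also be read from the header
of the first message or record batch in a segment file. The following is a
minimal Python sketch, not part of Kafka. It assumes the on-disk layout from
the Kafka message format documentation: the magic byte sits at offset 16, the
attributes field is one byte at offset 17 for magic 0/1 and two bytes at
offset 21 for magic 2, and its low 3 bits hold the codec. The function names
are hypothetical.

```python
import struct

# Codec ids stored in the low 3 bits of the attributes field.
CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4", 4: "zstd"}


def first_batch_codec(data: bytes) -> str:
    """Return the codec named in the first message/batch header of a segment."""
    if len(data) < 18:
        raise ValueError("need at least 18 bytes of a segment file")
    # offset (8 bytes) + size (4) + crc or leader epoch (4) precede the magic byte.
    magic = data[16]
    if magic in (0, 1):
        # Old message format: one attributes byte right after the magic byte.
        attributes = data[17]
    elif magic == 2:
        # Record batch format: magic, 4-byte CRC, then 2-byte attributes.
        if len(data) < 23:
            raise ValueError("need at least 23 bytes for a v2 record batch")
        (attributes,) = struct.unpack_from(">h", data, 21)
    else:
        raise ValueError(f"unknown magic byte {magic}")
    return CODECS.get(attributes & 0x07, "unknown")


def segment_codec(path: str) -> str:
    """Read just enough of a .log segment to inspect its first header."""
    with open(path, "rb") as f:
        return first_batch_codec(f.read(23))
```

Calling segment_codec() on a copy of a segment file (the path is whatever your
broker's log directory contains) reports only the first batch. The value
reflects what the broker stored on disk, so DumpLogSegments remains the more
thorough check.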

Re: is a topic compressed?

Posted by Vincent Dautremont <vi...@olamobile.com.INVALID>.
Hi,
Snappy leaves much of the data as plain text. Look at the example on this
page, where only "pedia" is encoded/tokenized in the sentence:
https://en.wikipedia.org/wiki/Snappy_(compression)

> Wikipedia is a free, web-based, collaborative, multilingual encyclopedia
> project.

So seeing readable text does not mean compression is off; your data is
probably compressed with snappy.

Another test would be to change the compression setting to another codec (or
remove compression) and compare the tcpdump output with the one you already
have.


Vincent.
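
To illustrate why text stays readable under this kind of compression, here is
a toy LZ77-style tokenizer in Python. It is only a sketch of the general idea
(literal runs plus back-references to earlier bytes) and is not Snappy's
actual format or algorithm; the function names and the min_match threshold are
made up for the example.

```python
def toy_lz_compress(data: bytes, min_match: int = 4) -> list:
    """Greedy tokenizer: emits ('lit', bytes) or ('copy', offset, length)."""
    tokens = []
    literal = bytearray()
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of all earlier positions (quadratic, demo only).
        for j in range(i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            # Flush pending literal bytes, then emit a back-reference.
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal.clear()
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            # No worthwhile match: the byte is stored as-is, still readable.
            literal.append(data[i])
            i += 1
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def toy_lz_decompress(tokens: list) -> bytes:
    """Rebuild the original bytes from literal and copy tokens."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte-wise copy allows overlapping matches
    return bytes(out)


if __name__ == "__main__":
    text = (b"Wikipedia is a free, web-based, collaborative, "
            b"multilingual encyclopedia project.")
    for token in toy_lz_compress(text):
        print(token)
```

Run on the sentence quoted above, most of the input is emitted as literal
tokens, which is why plain text remains visible in a .log file or a tcpdump
capture even when compression is on.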
