Posted to user@cassandra.apache.org by Dathan Pattishall <da...@gmail.com> on 2010/09/08 21:08:14 UTC

Using this I think I found/verified an interesting problem with Cassandra

This link describes Ganglia/Cassandra graphing:

http://mysqldba.blogspot.com/2010/09/cassandra-and-ganglia.html

I ran into a problem illustrated here.

http://www.flickr.com/photos/dathan/4971255111/

This screenshot shows a huge spike of transport exceptions between
12:15 and 1:30. Why? Let's see.

http://www.flickr.com/photos/dathan/4971869002/

This graph shows that pending reads jump because the message
deserialization pool (mutex) blocks, or maybe it's vice versa. But why?
Let's see.

This graph shows that wait_io on the box skyrocketed.

http://www.flickr.com/photos/dathan/4971290101/

But why?

Could it be because of this?

http://www.flickr.com/photos/dathan/4971869054/

This graph shows a massive amount of data growth on this server, which
then shrinks again. Why? How can I tune it so that data growth doesn't
explode like this?



Some background information:

These servers are Dell 2950 dual quad-core boxes with 48GB of RAM on a
RAID-10 ext3 filesystem backed by 8 disks on a PERC 6 controller with a
battery-backed cache (BBC). Each server receives roughly 300-400 requests
per second, fronted by an F5 load balancer (soon to be HAProxy) using
least connections, which does a client stat check to verify the server
is up from a client's point of view.

There is only one simple keyspace. A Super Column is defined but not
used. The cluster uses the RandomPartitioner, with no row caching and
with mmap enabled.

Re: Using this I think I found/verified an interesting problem with Cassandra

Posted by Dathan Pattishall <da...@gmail.com>.

Ah, thanks Jonathan, this is yet again a great explanation to get me
started. I'll do some digging. Thanks a lot!

On Wed, Sep 8, 2010 at 12:30 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> it looks to me like you are describing "compaction causes a lot of
> i/o."  see http://wiki.apache.org/cassandra/MemtableSSTable#Compaction
>
> things you can do about the extra i/o:
>  - increase memtable sizes (if you haven't done this yet you should,
> by 10x or so)
>  - reduce compaction priority (see
> http://www.riptano.com/blog/cassandra-annotated-changelog-063)
>  - enable the dynamic snitch (see
> http://www.riptano.com/blog/whats-new-cassandra-065) so other nodes
> can route around one that is slow during a compaction
>
> you can't make it stop using extra space during the compaction itself,
> that is part of the design (and part of the price you pay for not
> doing random i/o at insert time).
>
> On Wed, Sep 8, 2010 at 2:08 PM, Dathan Pattishall <da...@gmail.com>
> wrote:
> > This link describe ganglia / cassandra graphing.
> >
> > http://mysqldba.blogspot.com/2010/09/cassandra-and-ganglia.html
> >
> > I ran into a problem illustrated here.
> >
> > http://www.flickr.com/photos/dathan/4971255111/
> >
> > This screen shot shows a huge spike of transport exceptions between the
> > hours of 12:15 - to 1:30. Why? Lets see.
> >
> > http://www.flickr.com/photos/dathan/4971869002/
> >
> > This link shows that the pending reads jump because the message
> > deserialization pool (mutex) blocks or maybe its viceversa. But Why? Lets
> > see.
> >
> > This link shows that wait_io on the box sky rocketed.
> >
> > http://www.flickr.com/photos/dathan/4971290101/
> >
> > but why?
> >
> > Could it be because
> >
> > http://www.flickr.com/photos/dathan/4971869054/
> >
> > This graph shows a massive amount of data growth for this server, then it
> > reduces but why? How can I tune it so that a growth of data doesn't
> explode
> > like this?
> >
> >
> >
> > Some background information:
> >
> > These servers are DELL 2950 dual quad core boxes with 48GB of Ram on a
> > RAID-10 EXT3 FS backed by 8 disks on a PERC-6 Controller with BBC. Each
> > server rougly recieves 300-400 requests per second fronted by a F5
> > Loadbalancer (soon to be HA-Proxy) on least connections, doing a client
> stat
> > check to verify the server is up from a client point of view.
> >
> > There is only one simple key space. A Super Column is defined but not
> used
> > and uses a RandomPartitioner with NO RowCaching and mmap enabled.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Using this I think I found/verified an interesting problem with Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.

It looks to me like you are describing "compaction causes a lot of
I/O." See http://wiki.apache.org/cassandra/MemtableSSTable#Compaction

Things you can do about the extra I/O:
 - increase memtable sizes (if you haven't done this yet, you should,
   by 10x or so)
 - reduce compaction priority (see
   http://www.riptano.com/blog/cassandra-annotated-changelog-063)
 - enable the dynamic snitch (see
   http://www.riptano.com/blog/whats-new-cassandra-065) so other nodes
   can route around one that is slow during a compaction
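As a rough, back-of-the-envelope illustration of why bigger memtables
help: if a memtable is flushed to a new SSTable each time it reaches a
size threshold (in 0.6-era storage-conf.xml that is
MemtableThroughputInMB; the operation-count and time-based flush
thresholds are ignored here), the number of new SSTables per hour, and so
the amount of merging compaction has to do, falls in proportion to the
memtable size. The write rate below is hypothetical.

```python
def flushes_per_hour(write_mb_per_sec: float, memtable_mb: float) -> float:
    """Approximate memtable flushes (new SSTables) per hour.

    Assumes, for illustration only, that a flush happens solely when the
    memtable reaches memtable_mb; other flush triggers are ignored.
    """
    return write_mb_per_sec * 3600 / memtable_mb


# Hypothetical workload: 1 MB/s of writes into a single column family.
for size_mb in (64, 640):
    rate = flushes_per_hour(1.0, size_mb)
    print(f"{size_mb:4d} MB memtable -> {rate:.1f} new SSTables/hour")
```

A 10x larger memtable means roughly 10x fewer, larger SSTables to
compact; the trade-off is more heap held per column family.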

You can't make it stop using extra disk space during the compaction
itself; that is part of the design (and part of the price you pay for
not doing random I/O at insert time).
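The extra space comes from the order of operations: the merged SSTable is
written out in full before the input SSTables it replaces are removed, so
for a while both exist on disk. A minimal sketch of that accounting for
the SSTables being compacted, using hypothetical sizes and a hypothetical
output_ratio (the fraction of input bytes that survive the merge):

```python
from typing import List, Tuple


def compaction_disk_profile(
    sstable_mb: List[float], output_ratio: float = 1.0
) -> Tuple[float, float, float]:
    """Return (before, peak, after) disk usage in MB for merging the given
    SSTables into one.

    output_ratio is a hypothetical parameter: 1.0 means no overwritten or
    deleted data is purged, so the output is as large as the inputs.
    """
    before = sum(sstable_mb)
    output = before * output_ratio
    # The inputs stay on disk until the merged output is completely written.
    peak = before + output
    # Once the inputs are removed, only the merged SSTable remains.
    after = output
    return before, peak, after


before, peak, after = compaction_disk_profile([100.0] * 4, output_ratio=0.75)
print(f"before={before} MB peak={peak} MB after={after} MB")
# before=400.0 MB peak=700.0 MB after=300.0 MB
```

In the worst case (nothing purged), the peak is twice the size of the data
being compacted, which is consistent with usage climbing during a
compaction and then dropping once it finishes.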

On Wed, Sep 8, 2010 at 2:08 PM, Dathan Pattishall <da...@gmail.com> wrote:
> [quoted text trimmed]

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com