Posted to user@cassandra.apache.org by Or Yanay <or...@peer39.com> on 2011/03/31 12:20:03 UTC

Requests stuck on production cluster

Hi all,

Reads on my production cluster got stuck.
The ring gives:

Address         Status State   Load            Owns    Token
                                                       146231632500721020374621781629360107476
10.39.21.7      Up     Normal  118.86 GB       18.15%  6968792681466807915334918525105891681
10.39.21.2      Up     Normal  170.37 GB       33.20%  63458945745812644657648926377562798568
10.39.21.4      Up     Normal  129.49 GB       2.09%   67020233994527804731783987345291668992
10.39.21.3      Up     Normal  118.57 GB       31.26%  120208618942813734646032022699594259441
10.39.21.6      Up     Normal  171.03 GB       15.29%  146231632500721020374621781629360107476

The 2% bit struck me as odd, so I ran tpstats on 10.39.21.4 and got:
Pool Name                   Active   Pending   Completed
ReadStage                        0         0      143370
RequestResponseStage             8   1231283      414467
MutationStage                    0         0     1772203
ReadRepair                       0         0        7678
GossipStage                      0         0      204797
AntiEntropyStage                 0         0           0
MigrationStage                   0         0           0
MemtablePostFlusher              0         0          48
StreamStage                      0         0           0
FlushWriter                      0         0          48
FILEUTILS-DELETE-POOL            0         0          46
MiscStage                        0         0           0
FlushSorter                      0         0           0
InternalResponseStage            0         0           0
HintedHandoff                    0         0           6
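
For anyone who wants to check the tpstats output above mechanically, here is a minimal Python sketch that flags pools with a large Pending count. The function name backlogged_pools and the max_pending threshold of 1000 are assumptions made for illustration, not Cassandra defaults, and the sample text is only the first three rows from this message.

```python
# Sample rows copied from the tpstats output in this message.
TPSTATS = """\
Pool Name                   Active   Pending   Completed
ReadStage                        0         0      143370
RequestResponseStage             8   1231283      414467
MutationStage                    0         0     1772203
"""


def backlogged_pools(text, max_pending=1000):
    """Return {pool_name: pending} for pools whose Pending exceeds max_pending.

    max_pending is an arbitrary illustrative threshold, not a Cassandra setting.
    """
    flagged = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly four fields: name, active, pending, completed.
        if len(parts) != 4:
            continue
        name, _active, pending, _completed = parts
        try:
            pending_count = int(pending)
        except ValueError:
            continue  # not a data row
        if pending_count > max_pending:
            flagged[name] = pending_count
    return flagged


if __name__ == "__main__":
    for pool, pending in backlogged_pools(TPSTATS).items():
        print(f"{pool}: {pending} pending")
```

On the rows above, only RequestResponseStage is flagged.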


So... something went terribly wrong.
Can anyone suggest what I should do next to fix this?

Thanks.
-Orr

Re: Requests stuck on production cluster

Posted by Jonathan Ellis <jb...@gmail.com>.
What's going on in the logs?  CPU?  i/o?

On Thu, Mar 31, 2011 at 4:20 AM, Or Yanay <or...@peer39.com> wrote:

> Hi all,
>
> My production cluster reads got stuck.
> The ring gives:
>
> Address         Status State   Load            Owns    Token
>                                                        146231632500721020374621781629360107476
> 10.39.21.7      Up     Normal  118.86 GB       18.15%  6968792681466807915334918525105891681
> 10.39.21.2      Up     Normal  170.37 GB       33.20%  63458945745812644657648926377562798568
> 10.39.21.4      Up     Normal  129.49 GB       2.09%   67020233994527804731783987345291668992
> 10.39.21.3      Up     Normal  118.57 GB       31.26%  120208618942813734646032022699594259441
> 10.39.21.6      Up     Normal  171.03 GB       15.29%  146231632500721020374621781629360107476
>
> The 2% bit struck me as odd, so I ran tpstats on 10.39.21.4 and got:
>
> Pool Name                   Active   Pending   Completed
> ReadStage                        0         0      143370
> *RequestResponseStage*         *8* *1231283*    *414467*
> MutationStage                    0         0     1772203
> ReadRepair                       0         0        7678
> GossipStage                      0         0      204797
> AntiEntropyStage                 0         0           0
> MigrationStage                   0         0           0
> MemtablePostFlusher              0         0          48
> StreamStage                      0         0           0
> FlushWriter                      0         0          48
> FILEUTILS-DELETE-POOL            0         0          46
> MiscStage                        0         0           0
> FlushSorter                      0         0           0
> InternalResponseStage            0         0           0
> HintedHandoff                    0         0           6
>
> So… something got terribly wrong.
> Can anyone suggest what should do next to fix this?
>
> Thanks.
> -Orr



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

RE: Requests stuck on production cluster

Posted by Or Yanay <or...@peer39.com>.
I am using Cassandra 0.7.0 and Random Partitioner.
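
Since the partitioner matters for reading the Owns column, here is a minimal Python sketch of how those percentages follow from the tokens. It assumes RandomPartitioner's token space is 0 to 2**127 and that each node owns the range from the previous token (exclusive) up to its own token. The names ownership and balanced_tokens are made up for this example; balanced_tokens uses the commonly cited i * 2**127 / N spacing and is only a reference point, not a recommendation for this cluster.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space

# Tokens copied from the ring output in this thread.
TOKENS = {
    "10.39.21.7": 6968792681466807915334918525105891681,
    "10.39.21.2": 63458945745812644657648926377562798568,
    "10.39.21.4": 67020233994527804731783987345291668992,
    "10.39.21.3": 120208618942813734646032022699594259441,
    "10.39.21.6": 146231632500721020374621781629360107476,
}


def ownership(tokens):
    """Return {node: percent of ring owned}, given {node: token}."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    owns = {}
    for i, (node, token) in enumerate(ordered):
        # Index -1 wraps around, so the lowest token's range starts
        # at the highest token on the ring.
        previous = ordered[i - 1][1]
        owns[node] = ((token - previous) % RING_SIZE) / RING_SIZE * 100
    return owns


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes (illustrative only)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, percent in ownership(TOKENS).items():
        print(f"{node}: {percent:.2f}%")
    print(balanced_tokens(len(TOKENS)))
```

With the tokens above, 10.39.21.4 comes out at roughly 2.09% because its token sits very close to the token of 10.39.21.2, which matches the ring output.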
