Posted to user@cassandra.apache.org by Yan Chunlu <sp...@gmail.com> on 2011/07/21 04:00:37 UTC

with proof Re: cassandra goes infinite loop and data lost.....

This time it is another node: the node went down during repair and came
back, but never came UP. I changed the log level to "DEBUG" and found that it
prints the following message infinitely:

DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918




On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> That says "I'm collecting data to answer requests."
>
> I don't see anything here that indicates an infinite loop.
>
> I do see that it's saying "N of 2147483647" which looks like you're
> doing slices with a much larger limit than is advisable (good way to
> OOM the way you already did).
>
> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu <sp...@gmail.com> wrote:
> > I gave Cassandra an 8GB heap and somehow it ran out of memory and
> > crashed. After I started it again, it ran into the following infinite
> > loop; the last line
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > goes on forever.
> > I have 3 nodes and RF=2, so I am losing data. Does that mean I am screwed
> > and can't get it back?
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: auje:false:13@1305641597957075
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
> >
> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu <sp...@gmail.com> wrote:
> >>
> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> >
> >
> > --
> > 闫春路
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
闫春路

Re: with proof Re: cassandra goes infinite loop and data lost.....

Posted by Jonathan Ellis <jb...@gmail.com>.

You should be able to tell from earlier in the log if this is from a
request, from hinted handoff replay, or something else.
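A minimal sketch of one way to do that, assuming the log is plain text with one entry per line in the layout seen in this thread (`LEVEL [thread] date time File.java (line N) message`; the mail client wrapped those lines, a real log file would not). It finds the first `SliceQueryFilter.java` entry and returns the last few earlier entries from the same thread, which is where the caller would most likely be visible. The function `context_before_flood` and its `keep` parameter are hypothetical, not part of Cassandra.

```python
import re
from typing import List, Optional

# Assumed layout of one unwrapped log line, as seen in this thread:
#   DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123) collecting ...
# log4j may left-pad the level (" INFO"), hence the leading \s*.
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<source>\S+)\s+\(line \d+\)\s*(?P<msg>.*)$"
)


def context_before_flood(lines: List[str], noisy_source: str = "SliceQueryFilter.java",
                         keep: int = 5) -> Optional[List[str]]:
    """Return up to `keep` entries logged by the same thread just before the
    first `noisy_source` entry, or None if `noisy_source` never appears."""
    parsed = [(line, LOG_LINE.match(line)) for line in lines]
    for i, (_, m) in enumerate(parsed):
        if m and m.group("source") == noisy_source:
            thread = m.group("thread")
            # Earlier entries written by the same thread (e.g. "main").
            earlier = [line for line, pm in parsed[:i]
                       if pm and pm.group("thread") == thread]
            return earlier[-keep:] if keep > 0 else []
    return None
```

For example, `context_before_flood(open("system.log").read().splitlines())`; the file name is an assumption, use whichever file log4j is configured to write.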

On Wed, Jul 20, 2011 at 10:42 PM, Yan Chunlu <sp...@gmail.com> wrote:
> Thanks for the reply.
> Now the problem is how I can get rid of the "N of 2147483647" messages; they
> seem never to end, and the node never comes UP.
> Last time this happened I ran "nodetool cleanup", and it turned out some data
> was lost (not sure whether the cleanup caused it).



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: with proof Re: cassandra goes infinite loop and data lost.....

Posted by Yan Chunlu <sp...@gmail.com>.

Thanks for the reply.

Now the problem is how I can get rid of the "N of 2147483647" messages; they
seem never to end, and the node never comes UP.
Last time this happened I ran "nodetool cleanup", and it turned out some data
was lost (not sure whether the cleanup caused it).

On Thu, Jul 21, 2011 at 11:37 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Personally I would do a repair first if you need to do one, just so you are
> confident everything is where it should be.
>
> Then do the move as described in the wiki.
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com


-- 
闫春路

Re: with proof Re: cassandra goes infinite loop and data lost.....

Posted by aaron morton <aa...@thelastpickle.com>.

Personally I would do a repair first if you need to do one, just so you are confident everything is where it should be.

Then do the move as described in the wiki. 
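One possible way to plan that order, as a sketch: it assumes RandomPartitioner (token space 0 to 2**127), uses evenly spaced target tokens i * 2**127 / N, and keeps one chosen node's token fixed so only the others have to move. The anchoring choice and the helpers `balanced_tokens` and `plan` are hypothetical; only the `nodetool -h HOST repair` and `nodetool -h HOST move TOKEN` commands come from Cassandra, and the wiki page remains the authoritative procedure.

```python
from typing import Dict, List

RING = 2 ** 127  # RandomPartitioner token space (assumption)


def balanced_tokens(n: int, anchor: int = 0) -> List[int]:
    """Evenly spaced tokens i * 2**127 / n, shifted so the first one equals `anchor`."""
    return [(anchor + i * RING // n) % RING for i in range(n)]


def plan(nodes: Dict[str, int], anchor_host: str) -> List[str]:
    """Repair every node first, then move the nodes whose token must change.
    `anchor_host` keeps its current token."""
    ring_order = sorted(nodes, key=nodes.get)       # hosts by current token
    k = ring_order.index(anchor_host)
    ring_order = ring_order[k:] + ring_order[:k]    # walk the ring starting at the anchor
    targets = balanced_tokens(len(ring_order), nodes[anchor_host])
    commands = [f"nodetool -h {host} repair" for host in ring_order]
    for host, token in zip(ring_order, targets):
        if nodes[host] != token:
            commands.append(f"nodetool -h {host} move {token}")
    return commands


if __name__ == "__main__":
    ring = {
        "10.28.53.2": 52773518586096316348543097376923124102,
        "10.28.53.3": 70597222385644499881390884416714081360,
        "10.28.53.4": 84944475733633104818662955375549269696,
    }
    for cmd in plan(ring, "10.28.53.4"):
        print(cmd)
```

Run the commands one at a time and let each finish before starting the next. Assigning targets in current ring order keeps the nodes' relative order on the ring unchanged; keeping one token fixed just saves one `move`.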

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 15:14, Yan Chunlu wrote:

> Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0 and thought it was not doing anything.
> 
> My nodes were very unbalanced and I intended to rebalance them with "nodetool move" after a "nodetool repair". Does that make the slices much larger?
> 
> Address         Status State   Load            Owns    Token
>                                                        84944475733633104818662955375549269696
> 10.28.53.2      Down   Normal  71.41 GB        81.09%  52773518586096316348543097376923124102
> 10.28.53.3      Up     Normal  14.72 GB        10.48%  70597222385644499881390884416714081360
> 10.28.53.4      Up     Normal  13.5 GB         8.43%   84944475733633104818662955375549269696
> 
> 
> Should I do "nodetool move" according to http://wiki.apache.org/cassandra/Operations#Load_balancing before doing repair?
> 
> Thank you for your help!
> 
> -- 
> 闫春路

Re: with proof Re: cassandra goes infinite loop and data lost.....

Posted by Yan Chunlu <sp...@gmail.com>.

Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0 and
thought it was not doing anything.

My nodes were very unbalanced and I intended to rebalance them with "nodetool
move" after a "nodetool repair". Does that make the slices much larger?

Address         Status State   Load            Owns    Token
                                                       84944475733633104818662955375549269696
10.28.53.2      Down   Normal  71.41 GB        81.09%  52773518586096316348543097376923124102
10.28.53.3      Up     Normal  14.72 GB        10.48%  70597222385644499881390884416714081360
10.28.53.4      Up     Normal  13.5 GB         8.43%   84944475733633104818662955375549269696
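As a sketch of where the Owns column comes from, assuming RandomPartitioner (tokens in 0 to 2**127): each node owns the range from its predecessor's token (exclusive) up to its own token, wrapping around the ring. The helper `ownership` is hypothetical; fed the three tokens above it reproduces 81.09%, 10.48% and 8.43%, so 10.28.53.2 covers about 81% of the token range.

```python
from typing import Dict

RING = 2 ** 127  # RandomPartitioner token space (assumption)


def ownership(tokens: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the ring each host owns: the range (previous token, own token]."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    result = {}
    for i, (host, token) in enumerate(ordered):
        prev = ordered[i - 1][1]                 # for i == 0 this wraps to the highest token
        width = (token - prev) % RING or RING    # a single node owns the whole ring
        result[host] = 100 * width / RING
    return result


if __name__ == "__main__":
    ring = {
        "10.28.53.2": 52773518586096316348543097376923124102,
        "10.28.53.3": 70597222385644499881390884416714081360,
        "10.28.53.4": 84944475733633104818662955375549269696,
    }
    for host, pct in ownership(ring).items():
        print(f"{host}  {pct:.2f}%")
```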


Should I do "nodetool move" according to
http://wiki.apache.org/cassandra/Operations#Load_balancing before doing
repair?

Thank you for your help!



On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> This is not an infinite loop, you can see the column objects being
> iterated over are different.
>
> Like I said last time, "I do see that it's saying 'N of 2147483647', which
> looks like you're doing slices with a much larger limit than is advisable."



-- 
闫春路

Re: with proof Re: cassandra goes infinite loop and data lost.....

Posted by Jonathan Ellis <jb...@gmail.com>.

This is not an infinite loop, you can see the column objects being
iterated over are different.
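A small sketch that makes this check mechanical, assuming the message layout `collecting N of LIMIT: name:deleted:valuesize@timestamp`, which is how the fields in these lines appear to read (the field meanings are an assumption, not taken from the Cassandra source). It parses the entries and reports whether any (column name, timestamp) pair repeats; `parse` and `looks_like_a_loop` are hypothetical helpers.

```python
import re
from typing import List, NamedTuple

# Assumed reading of the message part of each SliceQueryFilter line, e.g.
#   collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
# as: collecting <n> of <limit>: <column name>:<deleted?>:<value size>@<timestamp>
ENTRY = re.compile(
    r"collecting (?P<n>\d+) of (?P<limit>\d+): "
    r"(?P<name>[^:\s]+):(?P<deleted>true|false):(?P<size>\d+)@(?P<ts>\d+)"
)


class Collected(NamedTuple):
    n: int
    limit: int
    name: str
    deleted: bool
    size: int
    timestamp: int


def parse(text: str) -> List[Collected]:
    """Extract every 'collecting ...' entry, whether or not the header shares its line."""
    return [
        Collected(int(m["n"]), int(m["limit"]), m["name"], m["deleted"] == "true",
                  int(m["size"]), int(m["ts"]))
        for m in ENTRY.finditer(text)
    ]


def looks_like_a_loop(entries: List[Collected]) -> bool:
    """Crude check: True if the same (column name, timestamp) pair shows up twice."""
    seen = set()
    for e in entries:
        key = (e.name, e.timestamp)
        if key in seen:
            return True
        seen.add(key)
    return False
```

On the July 20 excerpt every column is named `76616c7565`, the hex encoding of `value`, yet the timestamps and value sizes differ, so no entry repeats. The limit 2147483647 is 2**31 - 1 (Java's `Integer.MAX_VALUE`).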

Like I said last time, "I do see that it's saying 'N of 2147483647', which
looks like you're doing slices with a much larger limit than is advisable."
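The usual alternative to one huge slice is to page through a wide row with a bounded count, restarting each page from the last column seen. A minimal sketch, assuming string column names: `get_slice` below is a hypothetical in-memory stand-in for the server call (in the Thrift API the limit is the `count` field of `SliceRange`), and the page size of 100 is only an example.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

Column = Tuple[str, bytes]  # (column name, value)


def get_slice(row: Dict[str, bytes], start: str, count: int) -> List[Column]:
    """Hypothetical stand-in for a server-side slice: up to `count` columns whose
    names are >= `start`, in name order. Sorting per call is fine for a sketch."""
    names = sorted(row)
    i = bisect.bisect_left(names, start)
    return [(n, row[n]) for n in names[i:i + count]]


def iter_columns(row: Dict[str, bytes], page_size: int = 100) -> Iterator[Column]:
    """Walk a wide row page by page instead of one slice with count=2147483647."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2 because the start column repeats")
    start, skip_first = "", False  # empty start = beginning of the row
    while True:
        page = get_slice(row, start, page_size)
        # The start column is inclusive, so every page after the first
        # begins with the column already yielded on the previous page.
        yield from (page[1:] if skip_first else page)
        if len(page) < page_size:  # short page: the row is exhausted
            return
        start, skip_first = page[-1][0], True
```

Memory use is bounded by `page_size` rather than by the width of the row; the minimum of 2 guarantees each full page advances `start`.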

On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu <sp...@gmail.com> wrote:
> This time it is another node: the node went down during repair and came
> back, but never came UP. I changed the log level to "DEBUG" and found that it
> prints the following message infinitely:
> DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
> DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
> DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
> DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
> DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
> DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
> DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
> DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
> DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
> DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918



-- 
Jonathan Ellis