Posted to user@cassandra.apache.org by Yan Chunlu <sp...@gmail.com> on 2011/07/21 04:00:37 UTC
with proof Re: cassandra goes infinite loop and data lost.....
This time it is another node. The node went down during repair and came
back, but it never reaches the UP state. I changed the log level to "DEBUG"
and found that it prints the following message endlessly:
DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> That says "I'm collecting data to answer requests."
>
> I don't see anything here that indicates an infinite loop.
>
> I do see that it's saying "N of 2147483647" which looks like you're
> doing slices with a much larger limit than is advisable (good way to
> OOM the way you already did).
>
> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu <sp...@gmail.com> wrote:
> > I gave Cassandra an 8 GB heap and it somehow ran out of memory and crashed.
> > After I started it again, it just runs into the following infinite loop. The last line:
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > goes on forever.
> > I have 3 nodes and RF=2, so I am losing data. Does that mean I am screwed and can't get it back?
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: auje:false:13@1305641597957075
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
> >
> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu <sp...@gmail.com>
> wrote:
> >>
> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> >
> >
> > --
> > 闫春路
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
--
闫春路
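For reference, the 2147483647 in these log lines is the largest 32-bit signed integer (2^31 - 1), so the slice is running with effectively no column limit. A minimal sketch of the alternative, reading a wide row in bounded pages, follows. It is plain Python over an in-memory sorted list and not a Cassandra client call: `slice_columns`, `iter_row_in_pages`, and the page size of 100 are hypothetical names and values chosen only for illustration, and column names are assumed to be unique within a row.

```python
from bisect import bisect_left

INT_MAX = 2**31 - 1  # 2147483647, the limit shown in the log lines


def slice_columns(row, start, count):
    """Return up to `count` (name, value) pairs whose name is >= start.

    `row` is a list of (name, value) pairs sorted by name, standing in
    for one wide row. This is a stand-in for a slice query, not a real API.
    """
    names = [name for name, _ in row]
    lo = bisect_left(names, start)
    return row[lo:lo + count]


def iter_row_in_pages(row, page_size=100):
    """Yield every column of `row`, fetching a bounded number per slice."""
    start = ""
    first_page = True
    while True:
        # The start column is inclusive, so every page after the first
        # repeats the last column already seen. Ask for one extra and drop it.
        want = page_size if first_page else page_size + 1
        page = slice_columns(row, start, want)
        if not first_page:
            page = page[1:]
        if not page:
            return
        for column in page:
            yield column
        start = page[-1][0]
        first_page = False


# Usage: 250 columns are read in pages of 100 instead of one unbounded slice.
row = [(f"c{i:04d}", i) for i in range(250)]
columns = list(iter_row_in_pages(row, page_size=100))
print(len(columns))
```

Each call holds at most `page_size + 1` columns at a time, which is the point of keeping the slice limit small.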
Re: with proof Re: cassandra goes infinite loop and data lost.....
Posted by Jonathan Ellis <jb...@gmail.com>.
You should be able to tell from earlier in the log if this is from a
request, from hinted handoff replay, or something else
On Wed, Jul 20, 2011 at 10:42 PM, Yan Chunlu <sp...@gmail.com> wrote:
> Thanks for the reply.
> Now the problem is how to get rid of the "N of 2147483647" messages. They
> never seem to end, and the node never goes UP.
> The last time this happened I ran "nodetool cleanup", and it turned out some
> data was lost (not sure whether cleanup caused it).
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: with proof Re: cassandra goes infinite loop and data lost.....
Posted by Yan Chunlu <sp...@gmail.com>.
Thanks for the reply.
Now the problem is how to get rid of the "N of 2147483647" messages. They
never seem to end, and the node never goes UP.
The last time this happened I ran "nodetool cleanup", and it turned out some
data was lost (not sure whether cleanup caused it).
On Thu, Jul 21, 2011 at 11:37 AM, aaron morton <aa...@thelastpickle.com>wrote:
> Personally I would do a repair first if you need to do one, just so you are
> confident everything is where it should be.
>
> Then do the move as described in the wiki.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
--
闫春路
Re: with proof Re: cassandra goes infinite loop and data lost.....
Posted by aaron morton <aa...@thelastpickle.com>.
Personally I would do a repair first if you need to do one, just so you are confident everything is where it should be.
Then do the move as described in the wiki.
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 21 Jul 2011, at 15:14, Yan Chunlu wrote:
> Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0 and thought it was not doing anything.
>
> My ring was very unbalanced and I intended to rebalance it with "nodetool move" after a "nodetool repair". Does that cause the slices to be so large?
>
> Address Status State Load Owns Token
> 84944475733633104818662955375549269696
> 10.28.53.2 Down Normal 71.41 GB 81.09% 52773518586096316348543097376923124102
> 10.28.53.3 Up Normal 14.72 GB 10.48% 70597222385644499881390884416714081360
> 10.28.53.4 Up Normal 13.5 GB 8.43% 84944475733633104818662955375549269696
>
>
> Should I do "nodetool move" according to http://wiki.apache.org/cassandra/Operations#Load_balancing before doing the repair?
>
> Thank you for your help!
Re: with proof Re: cassandra goes infinite loop and data lost.....
Posted by Yan Chunlu <sp...@gmail.com>.
Sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0
and thought it was not doing anything.
My ring was very unbalanced and I intended to rebalance it with "nodetool
move" after a "nodetool repair". Does that cause the slices to be so large?
Address         Status State   Load       Owns    Token
                                                  84944475733633104818662955375549269696
10.28.53.2      Down   Normal  71.41 GB   81.09%  52773518586096316348543097376923124102
10.28.53.3      Up     Normal  14.72 GB   10.48%  70597222385644499881390884416714081360
10.28.53.4      Up     Normal  13.5 GB    8.43%   84944475733633104818662955375549269696
Should I do "nodetool move" according to
http://wiki.apache.org/cassandra/Operations#Load_balancing before doing the
repair?
Thank you for your help!
On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> This is not an infinite loop, you can see the column objects being
> iterated over are different.
>
> Like I said last time, "I do see that it's saying "N of 2147483647"
> which looks like you're
> doing slices with a much larger limit than is advisable."
--
闫春路
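The Owns column in the ring output above can be reproduced from the tokens alone, which also shows what a balanced set of tokens would look like before running "nodetool move". The sketch below assumes RandomPartitioner (a token space of 0 to 2^127), which the thread does not state but which is consistent with the size of the tokens shown. `ownership` and `balanced_tokens` are hypothetical helper names, not part of nodetool.

```python
RING_SIZE = 2**127  # RandomPartitioner token space (assumed partitioner)


def ownership(tokens):
    """Map each token to the percentage of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around at RING_SIZE.
    """
    ordered = sorted(tokens)
    result = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # i == 0 wraps to the largest token
        # A single-node ring has width 0 here, which means the whole ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        result[token] = 100.0 * width / RING_SIZE
    return result


def balanced_tokens(node_count):
    """Evenly spaced tokens: i * RING_SIZE // node_count for each node i."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Tokens copied from the ring output in this message.
current = [
    52773518586096316348543097376923124102,  # 10.28.53.2
    70597222385644499881390884416714081360,  # 10.28.53.3
    84944475733633104818662955375549269696,  # 10.28.53.4
]

for token, percent in ownership(current).items():
    print(f"{token}: {percent:.2f}%")

print("balanced tokens for 3 nodes:")
for token in balanced_tokens(3):
    print(token)
```

With the three tokens above, the computed shares come out at roughly 81%, 10%, and 8%, matching the Owns column. Evenly spaced tokens would give each node about a third of the ring.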
Re: with proof Re: cassandra goes infinite loop and data lost.....
Posted by Jonathan Ellis <jb...@gmail.com>.
This is not an infinite loop; you can see that the column objects being
iterated over are different.
Like I said last time: "I do see that it's saying "N of 2147483647",
which looks like you're doing slices with a much larger limit than is
advisable."
On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu <sp...@gmail.com> wrote:
> This time it is another node. The node went down during repair and came
> back, but it never reaches the UP state. I changed the log level to "DEBUG"
> and found that it prints the following message endlessly:
> DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
> DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
> DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
> DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
> DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
> DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
> DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
> DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
> DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
> DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
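The point that the columns being iterated over are different can be checked mechanically instead of by eye. The sketch below parses the text after "collecting" and counts distinct entries. The field layout `<name>:<deleted>:<size>@<timestamp>` is an assumption read off the log lines in this thread, and `parse_entries` is a hypothetical helper. The sample uses three of the lines posted above.

```python
import re

# Assumed layout: "collecting <n> of <limit>: <name>:<deleted>:<size>@<timestamp>"
ENTRY = re.compile(r"collecting (\d+) of (\d+): ([^:\s]+):(true|false):(\d+)@(\d+)")


def parse_entries(log_text):
    """Return (collected, limit, name, size, timestamp) tuples found in log_text."""
    entries = []
    for match in ENTRY.finditer(log_text):
        collected, limit, name, _deleted, size, timestamp = match.groups()
        entries.append((int(collected), int(limit), name, int(size), int(timestamp)))
    return entries


sample = """
collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
"""

entries = parse_entries(sample)
distinct = {(name, size, ts) for _, _, name, size, ts in entries}
print(len(entries), len(distinct))  # prints "3 3": every entry is different

# The repeated column name is hex-encoded ASCII.
print(bytes.fromhex("76616c7565").decode("ascii"))  # prints "value"
```

If the process really were stuck on one column, the number of distinct entries would stay fixed while the total kept growing. Here the sizes and timestamps differ on every line even though the column name ("value") is the same, which suggests many rows that each have a single column of that name.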