Posted to user@cassandra.apache.org by Dan Di Spaltro <da...@gmail.com> on 2010/04/01 18:26:18 UTC

Stalled Bootstrapping Process

So we are adding another node to the cluster with the latest 0.6 branch
(RC1).  It seems to be hung in some limbo state.

Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4
machines, with RF=3.   One machine had more load than the others, and sure
enough bootstrapping selected that node.   That is the red machine.  The
light blue machine is the new machine.

I have attached a graph to illustrate when the bootstrap process started.

In jconsole the streamingservice status was "performing anticompaction..."
for over 18-24 hrs.  It is currently in "nothing is happening".   It did
have 1 active STREAM-STAGE task, but the machine had to be rebooted for
something unrelated to Cassandra. Now the light blue machine appears to be
getting data, but it's growing at virtually the same rate as the other
machines, which makes me think it is part of the cluster and not actually
streaming data from the machine it's supposed to.
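
The growth-rate comparison described above can be sketched in Python. This is
only an illustration: it assumes you have disk-usage samples per node as
(hours, gigabytes) pairs, and the function names, the threshold factor, and
all the numbers are made up rather than taken from the attached graph.

```python
def growth_rate(samples):
    """Average growth in GB per hour between the first and last sample."""
    (t0, g0), (t1, g1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("need samples taken at two different times")
    return (g1 - g0) / (t1 - t0)


def looks_like_streaming(new_node, existing_nodes, factor=3.0):
    """Heuristic: a node that is receiving streamed data should grow
    noticeably faster than nodes that only take normal writes.
    `factor` is an arbitrary threshold chosen for this sketch."""
    baseline = max(growth_rate(s) for s in existing_nodes)
    return growth_rate(new_node) > factor * baseline


# Hypothetical samples: (hours since start, GB on disk).
existing = [
    [(0, 50.0), (6, 50.6)],
    [(0, 55.0), (6, 55.5)],
]
blue_flat = [(0, 0.1), (6, 0.7)]        # grows like every other node
blue_streaming = [(0, 0.1), (6, 12.1)]  # grows much faster than the rest

print(looks_like_streaming(blue_flat, existing))
print(looks_like_streaming(blue_streaming, existing))
```

If the new node only matches the baseline, as in the first case, it is
probably just taking its share of new writes and not receiving streamed data.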

Any other ideas on how to debug?


-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Chris Goffinet <go...@digg.com>.
+1

On Fri, Apr 2, 2010 at 3:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Ah, right.  That's confusing for everyone.  I think the best solution
> there is to just get
> http://issues.apache.org/jira/browse/CASSANDRA-579 done so it can
> start streaming immediately.
>
> On Fri, Apr 2, 2010 at 5:45 PM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> > It did once it was actually done anti-compacting.  The biggest
> > question-mark (for us) was, what was happening during the
> > anti-compaction phase.
> >
> > On Fri, Apr 2, 2010 at 3:39 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >> Great, glad it worked.
> >>
> >> Sounds like we do have a bug though if the destination node never
> >> showed anything in Streaming mbean. :(
> >>
> >> On Fri, Apr 2, 2010 at 5:11 PM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> >>> To close the loop on this, the node finished bootstrapping.  The
> >>> source node rebooting definitely halted the process.
> >>>
> >>> Visibility-wise, watching the anti-compactions is the best way to tell
> >>> how much progress is being made on the bootstrapping process.  The
> >>> CompactionManager mbean gives you insight into the progress of each
> >>> anti-compaction as well.
> >>>
> >>> Thanks for the help,
> >>>
> >>> On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>>> I would turn debug logging on globally on the new node, that will
> >>>> answer more questions than just the streaming package.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Dan Di Spaltro
> >>>
> >>
> >
> >
> >
> > --
> > Dan Di Spaltro
> >
>



-- 
Chris Goffinet

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
I agree.  That would have other good side-effects, like minimizing
shooting yourself in the foot, for new folks.

On Fri, Apr 2, 2010 at 3:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Ah, right.  That's confusing for everyone.  I think the best solution
> there is to just get
> http://issues.apache.org/jira/browse/CASSANDRA-579 done so it can
> start streaming immediately.
>
> On Fri, Apr 2, 2010 at 5:45 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>> It did once it was actually done anti-compacting.  The biggest
>> question-mark (for us) was, what was happening during the
>> anti-compaction phase.
>>
>> On Fri, Apr 2, 2010 at 3:39 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Great, glad it worked.
>>>
>>> Sounds like we do have a bug though if the destination node never
>>> showed anything in Streaming mbean. :(
>>>
>>> On Fri, Apr 2, 2010 at 5:11 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>>>> To close the loop on this, the node finished bootstrapping.  The
>>>> source node rebooting definitely halted the process.
>>>>
>>>> Visibility-wise, watching the anti-compactions is the best way to tell
>>>> how much progress is being made on the bootstrapping process.  The
>>>> CompactionManager mbean gives you insight into the progress of each
>>>> anti-compaction as well.
>>>>
>>>> Thanks for the help,
>>>>
>>>> On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> I would turn debug logging on globally on the new node, that will
>>>>> answer more questions than just the streaming package.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Dan Di Spaltro
>>>>
>>>
>>
>>
>>
>> --
>> Dan Di Spaltro
>>
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
Ah, right.  That's confusing for everyone.  I think the best solution
there is to just get
http://issues.apache.org/jira/browse/CASSANDRA-579 done so it can
start streaming immediately.

On Fri, Apr 2, 2010 at 5:45 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> It did once it was actually done anti-compacting.  The biggest
> question-mark (for us) was, what was happening during the
> anti-compaction phase.
>
> On Fri, Apr 2, 2010 at 3:39 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Great, glad it worked.
>>
>> Sounds like we do have a bug though if the destination node never
>> showed anything in Streaming mbean. :(
>>
>> On Fri, Apr 2, 2010 at 5:11 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>>> To close the loop on this, the node finished bootstrapping.  The
>>> source node rebooting definitely halted the process.
>>>
>>> Visibility-wise, watching the anti-compactions is the best way to tell
>>> how much progress is being made on the bootstrapping process.  The
>>> CompactionManager mbean gives you insight into the progress of each
>>> anti-compaction as well.
>>>
>>> Thanks for the help,
>>>
>>> On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> I would turn debug logging on globally on the new node, that will
>>>> answer more questions than just the streaming package.
>>>>
>>>
>>>
>>>
>>> --
>>> Dan Di Spaltro
>>>
>>
>
>
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
It did once it was actually done anti-compacting.  The biggest
question-mark (for us) was, what was happening during the
anti-compaction phase.

On Fri, Apr 2, 2010 at 3:39 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Great, glad it worked.
>
> Sounds like we do have a bug though if the destination node never
> showed anything in Streaming mbean. :(
>
> On Fri, Apr 2, 2010 at 5:11 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>> To close the loop on this, the node finished bootstrapping.  The
>> source node rebooting definitely halted the process.
>>
>> Visibility-wise, watching the anti-compactions is the best way to tell
>> how much progress is being made on the bootstrapping process.  The
>> CompactionManager mbean gives you insight into the progress of each
>> anti-compaction as well.
>>
>> Thanks for the help,
>>
>> On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> I would turn debug logging on globally on the new node, that will
>>> answer more questions than just the streaming package.
>>>
>>
>>
>>
>> --
>> Dan Di Spaltro
>>
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
Great, glad it worked.

Sounds like we do have a bug though if the destination node never
showed anything in Streaming mbean. :(

On Fri, Apr 2, 2010 at 5:11 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> To close the loop on this, the node finished bootstrapping.  The
> source node rebooting definitely halted the process.
>
> Visibility-wise, watching the anti-compactions is the best way to tell
> how much progress is being made on the bootstrapping process.  The
> CompactionManager mbean gives you insight into the progress of each
> anti-compaction as well.
>
> Thanks for the help,
>
> On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> I would turn debug logging on globally on the new node, that will
>> answer more questions than just the streaming package.
>>
>
>
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
To close the loop on this, the node finished bootstrapping.  The
source node rebooting definitely halted the process.

Visibility-wise, watching the anti-compactions is the best way to tell
how much progress is being made on the bootstrapping process.  The
CompactionManager mbean gives you insight into the progress of each
anti-compaction as well.
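
As a rough illustration of tracking progress this way, here is a small Python
sketch that turns two readings of "bytes processed so far" and one "total
bytes" figure into a percentage and an estimated time remaining. The thread
does not name the mbean attributes, so the inputs are plain numbers you would
copy out of jconsole by hand; the function names and values are hypothetical.

```python
def percent_done(bytes_done, bytes_total):
    """Progress of one anti-compaction as a percentage."""
    if bytes_total <= 0:
        raise ValueError("bytes_total must be positive")
    return 100.0 * bytes_done / bytes_total


def eta_seconds(sample_a, sample_b, bytes_total):
    """Estimate seconds remaining from two (seconds, bytes_done) readings.
    Returns None when no forward progress was made between the readings."""
    (t_a, done_a), (t_b, done_b) = sample_a, sample_b
    if t_b <= t_a:
        raise ValueError("second reading must be later than the first")
    rate = (done_b - done_a) / (t_b - t_a)  # bytes per second
    if rate <= 0:
        return None
    return (bytes_total - done_b) / rate


# Made-up readings taken ten minutes apart.
total = 40_000_000_000
first = (0, 10_000_000_000)
second = (600, 12_000_000_000)

print(percent_done(second[1], total))
print(eta_seconds(first, second, total))
```

A result of None between two readings is the case worth investigating, since
it means the anti-compaction made no progress over that interval.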

Thanks for the help,

On Thu, Apr 1, 2010 at 4:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> I would turn debug logging on globally on the new node, that will
> answer more questions than just the streaming package.
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
I would turn debug logging on globally on the new node; that will
answer more questions than just the streaming package.

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
Seems to be doing more stuff now.

I've attached an updated screenshot.

On Thu, Apr 1, 2010 at 1:16 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Right.
>
> On Thu, Apr 1, 2010 at 3:15 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>> So it looks like its still performing anti-compaction.  The
>> compactionmanager is the best way to track this?
>>
>> On Thu, Apr 1, 2010 at 12:31 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>>> Sorry I meant the red one restarted about a day ago.  The graph shows
>>> the dip in disk space.  But it no where near returned to the previous
>>> amount of disk usage.  I was referring to how the red one didn't
>>> reclaim all its space (I figure about 60gb actually belong on that
>>> machine) Is that normal (its currently taking up about 100gb)?
>>>
>>> 2 minutes ago, I restarted the blue one.
>>>
>>> Now the streamservice task is performing anti-compaction on the red one.
>>>
>>> On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>
>>>> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>>>> > But I didn't restart the red one.
>>>>
>>>> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
>>>> >> >> > <da...@gmail.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> Red one.
>>>> >> >> >>
>>>> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
>>>> >> >> >> wrote:
>>>> >> >> >>>
>>>> >> >> >>> which node rebooted, the red one, or the blue one?
>>>>
>>>> I'm confused.
>>>
>>> --
>>> Dan Di Spaltro
>>>
>>
>>
>>
>> --
>> Dan Di Spaltro
>>
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
Right.

On Thu, Apr 1, 2010 at 3:15 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> So it looks like its still performing anti-compaction.  The
> compactionmanager is the best way to track this?
>
> On Thu, Apr 1, 2010 at 12:31 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>> Sorry I meant the red one restarted about a day ago.  The graph shows
>> the dip in disk space.  But it no where near returned to the previous
>> amount of disk usage.  I was referring to how the red one didn't
>> reclaim all its space (I figure about 60gb actually belong on that
>> machine) Is that normal (its currently taking up about 100gb)?
>>
>> 2 minutes ago, I restarted the blue one.
>>
>> Now the streamservice task is performing anti-compaction on the red one.
>>
>> On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>
>>> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>>> > But I didn't restart the red one.
>>>
>>> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
>>> >> >> > <da...@gmail.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> Red one.
>>> >> >> >>
>>> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >>>
>>> >> >> >>> which node rebooted, the red one, or the blue one?
>>>
>>> I'm confused.
>>
>> --
>> Dan Di Spaltro
>>
>
>
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
So it looks like it's still performing anti-compaction.  Is the
CompactionManager the best way to track this?

On Thu, Apr 1, 2010 at 12:31 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> Sorry I meant the red one restarted about a day ago.  The graph shows
> the dip in disk space.  But it no where near returned to the previous
> amount of disk usage.  I was referring to how the red one didn't
> reclaim all its space (I figure about 60gb actually belong on that
> machine) Is that normal (its currently taking up about 100gb)?
>
> 2 minutes ago, I restarted the blue one.
>
> Now the streamservice task is performing anti-compaction on the red one.
>
> On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro <da...@gmail.com> wrote:
>> > But I didn't restart the red one.
>>
>> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
>> >> >> > <da...@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Red one.
>> >> >> >>
>> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> which node rebooted, the red one, or the blue one?
>>
>> I'm confused.
>
> --
> Dan Di Spaltro
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
Sorry, I meant the red one restarted about a day ago.  The graph shows
the dip in disk space.  But it has nowhere near returned to the previous
amount of disk usage.  I was referring to how the red one didn't
reclaim all its space (I figure about 60 GB actually belongs on that
machine).  Is that normal (it's currently taking up about 100 GB)?

2 minutes ago, I restarted the blue one.

Now the streamservice task is performing anti-compaction on the red one.

On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> > But I didn't restart the red one.
>
> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
> >> >> > <da...@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Red one.
> >> >> >>
> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> which node rebooted, the red one, or the blue one?
>
> I'm confused.

--
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> But I didn't restart the red one.

>> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
>> >> > <da...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Red one.
>> >> >>
>> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> which node rebooted, the red one, or the blue one?

I'm confused.

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
But I didn't restart the red one.

On Thu, Apr 1, 2010 at 12:18 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> There shouldn't be anything to clean up.  (The temporary streaming
> files it anticompacted are automatically removed on restart)
>
> On Thu, Apr 1, 2010 at 2:17 PM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> > Okay, so should I run any more commands like cleanup before?
> >
> > On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>
> >> Bootstrap source restarting will always fail bootstrap.  You'll need
> >> to restart the blue one too now, I'm afraid.
> >>
> >> On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro <dan.dispaltro@gmail.com
> >
> >> wrote:
> >> > Before the Red one rebooted it had 1 active STREAM-STAGE.  Now it has
> 0
> >> > in
> >> > STREAM-STAGE.
> >> >
> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
> >> > <da...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Red one.
> >> >> Gary - both say nothing is happening with no destinations or sources.
> >> >>
> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> which node rebooted, the red one, or the blue one?
> >> >>>
> >> >>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro
> >> >>> <da...@gmail.com>
> >> >>> wrote:
> >> >>> > So we are adding another node to the cluster with the latest 0.6
> >> >>> > branch
> >> >>> > (RC1).  It seems to be hung in some limbo state.
> >> >>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly
> >> >>> > across 4
> >> >>> > machines, with RF=3.   One machine had more load than the others,
> >> >>> > and
> >> >>> > sure
> >> >>> > enough bootstrapping selected that node.   That is the red
> machine.
> >> >>> >  The
> >> >>> > light blue machine is the new machine.
> >> >>> > I have attached a graph to illustrate when the bootstrap process
> >> >>> > started.
> >> >>> > In jconsole the streamingservice status was "performing
> >> >>> > anticompaction..."
> >> >>> > for over 18-24 hrs.  It is currently in "nothing is happening".
> It
> >> >>> > did
> >> >>> > have 1 active STREAM-STAGE task, but the machine had to be
> rebooted
> >> >>> > for
> >> >>> > something unrelated to cassandra. Now the light blue machine
> appears
> >> >>> > to
> >> >>> > be
> >> >>> > getting data, but its growing at virtually the same rate as the
> >> >>> > other
> >> >>> > machines which makes me think it is part of the cluster and not
> >> >>> > actually
> >> >>> > streaming data from the machine its supposed to.
> >> >>> > Any other ideas on how to debug?
> >> >>> >
> >> >>> > --
> >> >>> > Dan Di Spaltro
> >> >>> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Dan Di Spaltro
> >> >
> >> >
> >> >
> >> > --
> >> > Dan Di Spaltro
> >> >
> >
> >
> >
> > --
> > Dan Di Spaltro
> >
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
There shouldn't be anything to clean up.  (The temporary streaming
files it anticompacted are automatically removed on restart)

On Thu, Apr 1, 2010 at 2:17 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> Okay, so should I run any more commands like cleanup before?
>
> On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> Bootstrap source restarting will always fail bootstrap.  You'll need
>> to restart the blue one too now, I'm afraid.
>>
>> On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro <da...@gmail.com>
>> wrote:
>> > Before the Red one rebooted it had 1 active STREAM-STAGE.  Now it has 0
>> > in
>> > STREAM-STAGE.
>> >
>> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro
>> > <da...@gmail.com>
>> > wrote:
>> >>
>> >> Red one.
>> >> Gary - both say nothing is happening with no destinations or sources.
>> >>
>> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
>> >> wrote:
>> >>>
>> >>> which node rebooted, the red one, or the blue one?
>> >>>
>> >>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro
>> >>> <da...@gmail.com>
>> >>> wrote:
>> >>> > So we are adding another node to the cluster with the latest 0.6
>> >>> > branch
>> >>> > (RC1).  It seems to be hung in some limbo state.
>> >>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly
>> >>> > across 4
>> >>> > machines, with RF=3.   One machine had more load than the others,
>> >>> > and
>> >>> > sure
>> >>> > enough bootstrapping selected that node.   That is the red machine.
>> >>> >  The
>> >>> > light blue machine is the new machine.
>> >>> > I have attached a graph to illustrate when the bootstrap process
>> >>> > started.
>> >>> > In jconsole the streamingservice status was "performing
>> >>> > anticompaction..."
>> >>> > for over 18-24 hrs.  It is currently in "nothing is happening".   It
>> >>> > did
>> >>> > have 1 active STREAM-STAGE task, but the machine had to be rebooted
>> >>> > for
>> >>> > something unrelated to cassandra. Now the light blue machine appears
>> >>> > to
>> >>> > be
>> >>> > getting data, but its growing at virtually the same rate as the
>> >>> > other
>> >>> > machines which makes me think it is part of the cluster and not
>> >>> > actually
>> >>> > streaming data from the machine its supposed to.
>> >>> > Any other ideas on how to debug?
>> >>> >
>> >>> > --
>> >>> > Dan Di Spaltro
>> >>> >
>> >>
>> >>
>> >>
>> >> --
>> >> Dan Di Spaltro
>> >
>> >
>> >
>> > --
>> > Dan Di Spaltro
>> >
>
>
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
Okay, so should I run any more commands like cleanup before?

On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Bootstrap source restarting will always fail bootstrap.  You'll need
> to restart the blue one too now, I'm afraid.
>
> On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> > Before the Red one rebooted it had 1 active STREAM-STAGE.  Now it has 0
> in
> > STREAM-STAGE.
> >
> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro <dan.dispaltro@gmail.com
> >
> > wrote:
> >>
> >> Red one.
> >> Gary - both say nothing is happening with no destinations or sources.
> >>
> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>>
> >>> which node rebooted, the red one, or the blue one?
> >>>
> >>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <
> dan.dispaltro@gmail.com>
> >>> wrote:
> >>> > So we are adding another node to the cluster with the latest 0.6
> branch
> >>> > (RC1).  It seems to be hung in some limbo state.
> >>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly
> >>> > across 4
> >>> > machines, with RF=3.   One machine had more load than the others, and
> >>> > sure
> >>> > enough bootstrapping selected that node.   That is the red machine.
> >>> >  The
> >>> > light blue machine is the new machine.
> >>> > I have attached a graph to illustrate when the bootstrap process
> >>> > started.
> >>> > In jconsole the streamingservice status was "performing
> >>> > anticompaction..."
> >>> > for over 18-24 hrs.  It is currently in "nothing is happening".   It
> >>> > did
> >>> > have 1 active STREAM-STAGE task, but the machine had to be rebooted
> for
> >>> > something unrelated to cassandra. Now the light blue machine appears
> to
> >>> > be
> >>> > getting data, but its growing at virtually the same rate as the other
> >>> > machines which makes me think it is part of the cluster and not
> >>> > actually
> >>> > streaming data from the machine its supposed to.
> >>> > Any other ideas on how to debug?
> >>> >
> >>> > --
> >>> > Dan Di Spaltro
> >>> >
> >>
> >>
> >>
> >> --
> >> Dan Di Spaltro
> >
> >
> >
> > --
> > Dan Di Spaltro
> >
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
Bootstrap source restarting will always fail bootstrap.  You'll need
to restart the blue one too now, I'm afraid.

On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> Before the Red one rebooted it had 1 active STREAM-STAGE.  Now it has 0 in
> STREAM-STAGE.
>
> On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro <da...@gmail.com>
> wrote:
>>
>> Red one.
>> Gary - both say nothing is happening with no destinations or sources.
>>
>> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>
>>> which node rebooted, the red one, or the blue one?
>>>
>>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <da...@gmail.com>
>>> wrote:
>>> > So we are adding another node to the cluster with the latest 0.6 branch
>>> > (RC1).  It seems to be hung in some limbo state.
>>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly
>>> > across 4
>>> > machines, with RF=3.   One machine had more load than the others, and
>>> > sure
>>> > enough bootstrapping selected that node.   That is the red machine.
>>> >  The
>>> > light blue machine is the new machine.
>>> > I have attached a graph to illustrate when the bootstrap process
>>> > started.
>>> > In jconsole the streamingservice status was "performing
>>> > anticompaction..."
>>> > for over 18-24 hrs.  It is currently in "nothing is happening".   It
>>> > did
>>> > have 1 active STREAM-STAGE task, but the machine had to be rebooted for
>>> > something unrelated to cassandra. Now the light blue machine appears to
>>> > be
>>> > getting data, but its growing at virtually the same rate as the other
>>> > machines which makes me think it is part of the cluster and not
>>> > actually
>>> > streaming data from the machine its supposed to.
>>> > Any other ideas on how to debug?
>>> >
>>> > --
>>> > Dan Di Spaltro
>>> >
>>
>>
>>
>> --
>> Dan Di Spaltro
>
>
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
Before the Red one rebooted it had 1 active STREAM-STAGE.  Now it has 0 in
STREAM-STAGE.

On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro <da...@gmail.com> wrote:

> Red one.
>
> Gary - both say nothing is happening with no destinations or sources.
>
>
> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> which node rebooted, the red one, or the blue one?
>>
>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <da...@gmail.com>
>> wrote:
>> > So we are adding another node to the cluster with the latest 0.6 branch
>> > (RC1).  It seems to be hung in some limbo state.
>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly across
>> 4
>> > machines, with RF=3.   One machine had more load than the others, and
>> sure
>> > enough bootstrapping selected that node.   That is the red machine.  The
>> > light blue machine is the new machine.
>> > I have attached a graph to illustrate when the bootstrap process
>> started.
>> > In jconsole the streamingservice status was "performing
>> anticompaction..."
>> > for over 18-24 hrs.  It is currently in "nothing is happening".   It did
>> > have 1 active STREAM-STAGE task, but the machine had to be rebooted for
>> > something unrelated to cassandra. Now the light blue machine appears to
>> be
>> > getting data, but its growing at virtually the same rate as the other
>> > machines which makes me think it is part of the cluster and not actually
>> > streaming data from the machine its supposed to.
>> > Any other ideas on how to debug?
>> >
>> > --
>> > Dan Di Spaltro
>> >
>>
>
>
>
> --
> Dan Di Spaltro
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
Red one.

Gary - both say nothing is happening with no destinations or sources.

On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> which node rebooted, the red one, or the blue one?
>
> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> > So we are adding another node to the cluster with the latest 0.6 branch
> > (RC1).  It seems to be hung in some limbo state.
> > Before bootstrapping our cluster had 50-60GB spread fairly evenly across
> 4
> > machines, with RF=3.   One machine had more load than the others, and
> sure
> > enough bootstrapping selected that node.   That is the red machine.  The
> > light blue machine is the new machine.
> > I have attached a graph to illustrate when the bootstrap process started.
> > In jconsole the streamingservice status was "performing
> anticompaction..."
> > for over 18-24 hrs.  It is currently in "nothing is happening".   It did
> > have 1 active STREAM-STAGE task, but the machine had to be rebooted for
> > something unrelated to cassandra. Now the light blue machine appears to
> be
> > getting data, but its growing at virtually the same rate as the other
> > machines which makes me think it is part of the cluster and not actually
> > streaming data from the machine its supposed to.
> > Any other ideas on how to debug?
> >
> > --
> > Dan Di Spaltro
> >
>



-- 
Dan Di Spaltro

Re: Stalled Bootstrapping Process

Posted by Jonathan Ellis <jb...@gmail.com>.
Which node rebooted, the red one or the blue one?

On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <da...@gmail.com> wrote:
> So we are adding another node to the cluster with the latest 0.6 branch
> (RC1).  It seems to be hung in some limbo state.
> Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4
> machines, with RF=3.   One machine had more load than the others, and sure
> enough bootstrapping selected that node.   That is the red machine.  The
> light blue machine is the new machine.
> I have attached a graph to illustrate when the bootstrap process started.
> In jconsole the streamingservice status was "performing anticompaction..."
> for over 18-24 hrs.  It is currently in "nothing is happening".   It did
> have 1 active STREAM-STAGE task, but the machine had to be rebooted for
> something unrelated to cassandra. Now the light blue machine appears to be
> getting data, but its growing at virtually the same rate as the other
> machines which makes me think it is part of the cluster and not actually
> streaming data from the machine its supposed to.
> Any other ideas on how to debug?
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Gary Dusbabek <gd...@gmail.com>.
Does the JMX StreamingService list any incoming/outgoing files/hosts
on the sending/receiving nodes?

Gary.

On Thu, Apr 1, 2010 at 10:26, Dan Di Spaltro <da...@gmail.com> wrote:
> So we are adding another node to the cluster with the latest 0.6 branch
> (RC1).  It seems to be hung in some limbo state.
> Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4
> machines, with RF=3.   One machine had more load than the others, and sure
> enough bootstrapping selected that node.   That is the red machine.  The
> light blue machine is the new machine.
> I have attached a graph to illustrate when the bootstrap process started.
> In jconsole the streamingservice status was "performing anticompaction..."
> for over 18-24 hrs.  It is currently in "nothing is happening".   It did
> have 1 active STREAM-STAGE task, but the machine had to be rebooted for
> something unrelated to cassandra. Now the light blue machine appears to be
> getting data, but its growing at virtually the same rate as the other
> machines which makes me think it is part of the cluster and not actually
> streaming data from the machine its supposed to.
> Any other ideas on how to debug?
>
> --
> Dan Di Spaltro
>

Re: Stalled Bootstrapping Process

Posted by Dan Di Spaltro <da...@gmail.com>.
The light-blue machine is in Operation Mode: Bootstrap

On Thu, Apr 1, 2010 at 9:26 AM, Dan Di Spaltro <da...@gmail.com> wrote:

> So we are adding another node to the cluster with the latest 0.6 branch
> (RC1).  It seems to be hung in some limbo state.
>
> Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4
> machines, with RF=3.   One machine had more load than the others, and sure
> enough bootstrapping selected that node.   That is the red machine.  The
> light blue machine is the new machine.
>
> I have attached a graph to illustrate when the bootstrap process started.
>
> In jconsole the streamingservice status was "performing anticompaction..."
> for over 18-24 hrs.  It is currently in "nothing is happening".   It did
> have 1 active STREAM-STAGE task, but the machine had to be rebooted for
> something unrelated to cassandra. Now the light blue machine appears to be
> getting data, but its growing at virtually the same rate as the other
> machines which makes me think it is part of the cluster and not actually
> streaming data from the machine its supposed to.
>
> Any other ideas on how to debug?
>
>
> --
> Dan Di Spaltro
>



-- 
Dan Di Spaltro