Posted to common-user@hadoop.apache.org by Ivan Tretyakov <it...@griddynamics.com> on 2013/03/25 15:43:35 UTC

DataNode heartbeat average time peaks

Hi!

We see DataNode heartbeat average time peaks of 20-70 seconds in Ganglia
while the SecondaryNameNode performs checkpointing.
Please see the attached screenshots.

I would like to clarify whether this is OK or not, and what consequences
and risks it could bring.

-- 
Best Regards
Ivan Tretyakov

Re: DataNode heartbeat average time peaks

Posted by Harsh J <ha...@cloudera.com>.
Worst case, if it puts too much strain on the NN: randomly failing clients,
and missed DN heartbeats leading to DataNodes being unnecessarily marked dead.
It's worth using a release that includes HDFS-1457, or patching your NN/SNN
with it; that helps smooth out those graphs.
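
For a sense of scale on the "dead node" risk: stock Hadoop 1.x and 2.x NameNodes declare a DataNode dead only after going 2 * recheck interval + 10 * heartbeat interval without a heartbeat. The recheck interval defaults to 5 minutes (heartbeat.recheck.interval in 1.x, dfs.namenode.heartbeat.recheck-interval in 2.x) and dfs.heartbeat.interval defaults to 3 seconds, which gives 630 seconds. The minimal sketch below (the helper names are hypothetical) compares that window with the 20-70 second peaks. Treat it as a rough bound only: a slow heartbeat RPC is not the same thing as a missed heartbeat, and clusters that have tuned these intervals down have less headroom.

```python
DEFAULT_RECHECK_MS = 5 * 60 * 1000  # stock recheck interval: 5 minutes, in ms
DEFAULT_HEARTBEAT_S = 3             # stock dfs.heartbeat.interval, in seconds


def heartbeat_expire_ms(recheck_interval_ms: int = DEFAULT_RECHECK_MS,
                        heartbeat_interval_s: int = DEFAULT_HEARTBEAT_S) -> int:
    """Time without a heartbeat after which the NameNode marks a DataNode dead.

    Uses the formula 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000


def heartbeat_headroom_ms(observed_peak_s: float, **intervals) -> float:
    """Slack left between an observed heartbeat delay and the expiry window.

    Extra keyword arguments are forwarded to heartbeat_expire_ms.
    """
    return heartbeat_expire_ms(**intervals) - observed_peak_s * 1000


if __name__ == "__main__":
    print(f"expiry window: {heartbeat_expire_ms() / 1000:.0f} s")
    for peak_s in (20, 70):  # the peak range reported in this thread
        print(f"peak {peak_s} s -> headroom {heartbeat_headroom_ms(peak_s) / 1000:.0f} s")
```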

On Mon, Mar 25, 2013 at 9:59 PM, Ivan Tretyakov
<it...@griddynamics.com> wrote:
> Thanks Harsh!
>
> My image size is about 3.1 GB.
> Yes, I think the feature from HDFS-1457 is what I need, but unfortunately
> it is not available in the version of Hadoop we use.
>
> What kind of risks do these peaks pose?
>
>
> On Mon, Mar 25, 2013 at 7:31 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> What's your fsimage size? If it's too large, you would want to throttle
>> the checkpoint transfer bandwidth so it does not affect the load on the NN.
>> This is available via the JIRA HDFS-1457.
>>
>>
>> On Mon, Mar 25, 2013 at 8:13 PM, Ivan Tretyakov
>> <it...@griddynamics.com> wrote:
>> > Hi!
>> >
>> > We see DataNode heartbeat average time peaks of 20-70 seconds in Ganglia
>> > while the SecondaryNameNode performs checkpointing.
>> > Please see the attached screenshots.
>> >
>> > I would like to clarify whether this is OK or not, and what
>> > consequences and risks it could bring.
>> >
>> > --
>> > Best Regards
>> > Ivan Tretyakov
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Best Regards
> Ivan Tretyakov
>



-- 
Harsh J

Re: DataNode heartbeat average time peaks

Posted by Ivan Tretyakov <it...@griddynamics.com>.
Thanks Harsh!

My image size is about 3.1 GB.
Yes, I think the feature from HDFS-1457 is what I need, but unfortunately
it is not available in the version of Hadoop we use.

What kind of risks do these peaks pose?


On Mon, Mar 25, 2013 at 7:31 PM, Harsh J <ha...@cloudera.com> wrote:

> What's your fsimage size? If it's too large, you would want to throttle
> the checkpoint transfer bandwidth so it does not affect the load on the NN.
> This is available via the JIRA HDFS-1457.
>
>
> On Mon, Mar 25, 2013 at 8:13 PM, Ivan Tretyakov
> <it...@griddynamics.com> wrote:
> > Hi!
> >
> > We see DataNode heartbeat average time peaks of 20-70 seconds in Ganglia
> > while the SecondaryNameNode performs checkpointing.
> > Please see the attached screenshots.
> >
> > I would like to clarify whether this is OK or not, and what
> > consequences and risks it could bring.
> >
> > --
> > Best Regards
> > Ivan Tretyakov
>
>
>
> --
> Harsh J
>



-- 
Best Regards
Ivan Tretyakov

Re: DataNode heartbeat average time peaks

Posted by Harsh J <ha...@cloudera.com>.
What's your fsimage size? If it's too large, you would want to throttle
the checkpoint transfer bandwidth so it does not affect the load on the NN.
This is available via the JIRA HDFS-1457.
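
A rough way to reason about a throttle value: the checkpoint transfer time is simply the image size divided by the allowed rate. The sketch below assumes the HDFS-1457 setting is dfs.image.transfer.bandwidthPerSec, in bytes per second with 0 meaning "no throttling", as hdfs-default.xml describes it in releases that include the change. Check the defaults file for your own version. It also reads the 3.1 GB image size from this thread as GiB. The function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes per gibibyte
MIB = 1024 ** 2  # bytes per mebibyte


def transfer_seconds(image_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Approximate time to ship an fsimage when the transfer is throttled.

    A rate of 0 would mean "unthrottled", where the time depends on the
    network rather than on the setting, so non-positive rates are rejected.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive for this estimate")
    return image_bytes / bandwidth_bytes_per_s


def min_bandwidth(image_bytes: int, max_seconds: float) -> float:
    """Lowest rate in bytes/s that still finishes the transfer within max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return image_bytes / max_seconds


if __name__ == "__main__":
    image = int(3.1 * GIB)  # image size reported in this thread, read as GiB
    for rate_mib in (10, 20, 50):
        secs = transfer_seconds(image, rate_mib * MIB)
        print(f"{rate_mib} MiB/s -> about {secs:.0f} s per transfer")
    # Rate needed to finish within a 5-minute budget, expressed in MiB/s.
    print(f"5-minute budget needs about {min_bandwidth(image, 300) / MIB:.1f} MiB/s")
```

A lower rate keeps the NN more responsive during the checkpoint at the cost of a longer transfer. The rate still has to be high enough that a normal image transfer completes.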


On Mon, Mar 25, 2013 at 8:13 PM, Ivan Tretyakov
<it...@griddynamics.com> wrote:
> Hi!
>
> We see DataNode heartbeat average time peaks of 20-70 seconds in Ganglia
> while the SecondaryNameNode performs checkpointing.
> Please see the attached screenshots.
>
> I would like to clarify whether this is OK or not, and what consequences
> and risks it could bring.
>
> --
> Best Regards
> Ivan Tretyakov



--
Harsh J