Posted to user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/06/21 11:29:33 UTC

Hang when add/remove a datanode into/from a 2 datanode cluster

Hi,

I encountered an issue where the decommission operation hangs. Steps to
reproduce:
1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
'dfs.replication' to 2 in hdfs-site.xml
2. Add node dn3 into the cluster as a new datanode, without changing the
'dfs.replication' value in hdfs-site.xml (it stays 2)
note: step 2 passed
3. Decommission dn3 from the cluster

Expected result: dn3 is decommissioned successfully

Actual result: the decommission process hangs and the status remains
'Waiting DataNode status: Decommissioned'

However, if the initial cluster includes >= 3 datanodes, this issue is not
encountered when adding/removing another datanode.

Also, after step 2, I noticed that some blocks' expected replica count is 3,
even though the 'dfs.replication' value in hdfs-site.xml has always been 2!

Could anyone please help triage this?

Thanks in advance!

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Yes, you are correct: using the fsck tool I found some files in my cluster
that expected more replicas than the value defined in dfs.replication. Once I
set the expected replication of these files to a proper number, the
decommissioning process went smoothly and the datanode was finally
decommissioned.
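
The fsck-based check described here can be sketched as a small shell filter.
The layout below is a mock of what 'hadoop fsck / -files -blocks' prints (an
assumption: each block line carries a 'repl=N' field, preceded by its file's
path line); on a real cluster you would pipe the actual command instead of
the heredoc.

```shell
#!/bin/sh
# Sketch: find files with blocks whose replication exceeds the intended
# factor (here 2) by filtering fsck output for 'repl=' fields.
TARGET_REPL=2

# mock_fsck stands in for: hadoop fsck / -files -blocks
mock_fsck() {
cat <<'EOF'
/user/sam/file1 1048576 bytes, 1 block(s):  OK
0. blk_1001 len=1048576 repl=3
/user/sam/file2 2097152 bytes, 1 block(s):  OK
0. blk_1002 len=2097152 repl=2
EOF
}

# Remember the most recent file-path line; print it when a block's
# replication is above the target.
mock_fsck | awk -v t="$TARGET_REPL" '
  /^\// { file = $1 }                      # a file path line
  match($0, /repl=[0-9]+/) {               # a block line with repl=N
    r = substr($0, RSTART + 5, RLENGTH - 5) + 0
    if (r > t) print file " repl=" r
  }'
```

Files flagged this way can then be brought back down with the command
mentioned later in this thread, e.g. 'hadoop dfs -setrep -R 2 /'.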

However, many users may not know about this and might be confused when the
cluster stays in the decommissioning phase indefinitely, so I think we could
make some improvements that let users run a precheck more easily before
decommissioning a datanode, to help them find all files which might lack
replicas after a datanode is decommissioned. For example, we could tell the
user that the expected replication of file1 and file26 is 6, but after
decommissioning a datanode the cluster will have only 5 datanodes and can no
longer satisfy file1 and file26. That way, the user can decide whether to
continue the decommission or to reduce the expected replication of those
files. As for the implementation, I think we could add a decommission-precheck
script or a new option to the fsck tool.
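
A minimal sketch of such a precheck (a hypothetical script, not an existing
Hadoop tool): compare each file's replication factor against the datanodes
that would remain after the decommission. It assumes the 'hadoop fs -lsr'
column layout, where field 2 is the replication factor; a mock listing stands
in for the real command.

```shell
#!/bin/sh
# Hypothetical decommission precheck: warn about files whose replication
# factor would exceed the number of datanodes left after decommissioning.
LIVE_NODES=6
DECOMMISSIONING=1
REMAINING=$((LIVE_NODES - DECOMMISSIONING))

# mock_lsr stands in for: hadoop fs -lsr /
mock_lsr() {
cat <<'EOF'
-rw-r--r--   6 sam supergroup  1048576 2013-06-21 11:29 /user/sam/file1
-rw-r--r--   2 sam supergroup  2097152 2013-06-21 11:30 /user/sam/file2
-rw-r--r--   6 sam supergroup  4194304 2013-06-21 11:31 /user/sam/file26
drwxr-xr-x   - sam supergroup        0 2013-06-21 11:28 /user/sam
EOF
}

mock_lsr | awk -v n="$REMAINING" '
  $2 ~ /^[0-9]+$/ && $2 + 0 > n {     # skip directories (repl shown as "-")
    print $NF " needs repl=" $2 " but only " n " datanodes will remain"
  }'
```

With the sample listing above this flags file1 and file26, matching the
scenario described in the mail (replication 6 against 5 remaining nodes).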

Any comments?


2013/8/1 Harsh J <ha...@cloudera.com>

> As I said before, it is a per-file property and the config can be
> bypassed by clients that do not read the configs, place a manual API
> override, etc..
>
> If you want to really define a hard maximum and catch such clients,
> try setting dfs.replication.max to 2 at your NameNode.
>
> On Thu, Aug 1, 2013 at 8:07 AM, sam liu <sa...@gmail.com> wrote:
> > But, please mention that the value of 'dfs.replication' of the cluster is
> > always 2, even when the datanode number is 3. And I am pretty sure I did
> not
> > manually create any files with rep=3. So, why were some files of hdfs
> > created with repl=3, but not repl=2?
> >
> >
> > 2013/8/1 Harsh J <ha...@cloudera.com>
> >>
> >> The step (a) points to your problem and solution both. You have files
> >> being created with repl=3 on a 2 DN cluster which will prevent
> >> decommission. This is not a bug.
> >>
> >> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com>
> wrote:
> >> > I opened a jira for tracking this issue:
> >> > https://issues.apache.org/jira/browse/HDFS-5046
> >> >
> >> >
> >> > 2013/7/2 sam liu <sa...@gmail.com>
> >> >>
> >> >> Yes, the default replication factor is 3. However, in my case, it's
> >> >> strange: during decommission hangs, I found some block's expected
> >> >> replicas
> >> >> is 3, but the 'dfs.replication' value in hdfs-site.xml of every
> cluster
> >> >> node
> >> >> is always 2 from the beginning of cluster setup. Below is my steps:
> >> >>
> >> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2.
> And,
> >> >> in
> >> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> >> 2. Add node dn3 into the cluster as a new datanode, and did not
> change
> >> >> the
> >> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> >> note: step 2 passed
> >> >> 3. Decommission dn3 from the cluster
> >> >> Expected result: dn3 could be decommissioned successfully
> >> >> Actual result:
> >> >> a). decommission progress hangs and the status always be 'Waiting
> >> >> DataNode
> >> >> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2
> /',
> >> >> the
> >> >> decommission continues and will be completed finally.
> >> >> b). However, if the initial cluster includes >= 3 datanodes, this
> issue
> >> >> won't be encountered when add/remove another datanode. For example,
> if
> >> >> I
> >> >> setup a cluster with 3 datanodes, and then I can successfully add the
> >> >> 4th
> >> >> datanode into it, and then also can successfully remove the 4th
> >> >> datanode
> >> >> from the cluster.
> >> >>
> >> >> I suspect it's a bug and plan to open a jira to Hadoop HDFS for this.
> Any
> >> >> comments?
> >> >>
> >> >> Thanks!
> >> >>
> >> >>
> >> >> 2013/6/21 Harsh J <ha...@cloudera.com>
> >> >>>
> >> >>> The dfs.replication is a per-file parameter. If you have a client
> that
> >> >>> does not use the supplied configs, then its default replication is 3
> >> >>> and all files it will create (as part of the app or via a job
> config)
> >> >>> will be with replication factor 3.
> >> >>>
> >> >>> You can do an -lsr to find all files and filter which ones have been
> >> >>> created with a factor of 3 (versus expected config of 2).
> >> >>>
> >> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com>
> >> >>> wrote:
> >> >>> > Hi George,
> >> >>> >
> >> >>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2.
> >> >>> > But
> >> >>> > still
> >> >>> > encounter this issue.
> >> >>> >
> >> >>> > Thanks!
> >> >>> >
> >> >>> >
> >> >>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >> >>> >>
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> I think i have faced this before, the problem is that you have
> the
> >> >>> >> rep
> >> >>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve
> >> >>> >> the
> >> >>> >> factor
> >> >>> >> (replicas are not created on the same node). If you set the
> >> >>> >> replication
> >> >>> >> factor=2 i think you will not have this issue. So in general you
> >> >>> >> must
> >> >>> >> make
> >> >>> >> sure that the rep factor is <= to the available datanodes.
> >> >>> >>
> >> >>> >> BR,
> >> >>> >> George
> >> >>> >>
> >> >>> >>
> >> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> I encountered an issue which hangs the decommission operation.
> Its
> >> >>> >> steps:
> >> >>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2.
> >> >>> >> And,
> >> >>> >> in
> >> >>> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> >>> >> 2. Add node dn3 into the cluster as a new datanode, and did not
> >> >>> >> change
> >> >>> >> the
> >> >>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> >>> >> note: step 2 passed
> >> >>> >> 3. Decommission dn3 from the cluster
> >> >>> >>
> >> >>> >> Expected result: dn3 could be decommissioned successfully
> >> >>> >>
> >> >>> >> Actual result: decommission progress hangs and the status always
> be
> >> >>> >> 'Waiting DataNode status: Decommissioned'
> >> >>> >>
> >> >>> >> However, if the initial cluster includes >= 3 datanodes, this
> issue
> >> >>> >> won't
> >> >>> >> be encountered when add/remove another datanode.
> >> >>> >>
> >> >>> >> Also, after step 2, I noticed that some block's expected replicas
> >> >>> >> is
> >> >>> >> 3,
> >> >>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >> >>> >>
> >> >>> >> Could anyone pls help provide some triages?
> >> >>> >>
> >> >>> >> Thanks in advance!
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >> ---------------------------
> >> >>> >>
> >> >>> >> George Kousiouris, PhD
> >> >>> >> Electrical and Computer Engineer
> >> >>> >> Division of Communications,
> >> >>> >> Electronics and Information Engineering
> >> >>> >> School of Electrical and Computer Engineering
> >> >>> >> Tel: +30 210 772 2546
> >> >>> >> Mobile: +30 6939354121
> >> >>> >> Fax: +30 210 772 2569
> >> >>> >> Email: gkousiou@mail.ntua.gr
> >> >>> >> Site: http://users.ntua.gr/gkousiou/
> >> >>> >>
> >> >>> >> National Technical University of Athens
> >> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Harsh J
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
As I said before, it is a per-file property and the config can be
bypassed by clients that do not read the configs, place a manual API
override, etc..

If you want to really define a hard maximum and catch such clients,
try setting dfs.replication.max to 2 at your NameNode.
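
In hdfs-site.xml on the NameNode, that suggestion would look like the
fragment below (the description text is mine; the property itself is the
hard per-file cap Harsh refers to):

```xml
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
  <description>Hard upper bound on per-file replication; create/setrep
  requests above this value are rejected by the NameNode, even from
  clients that do not read the cluster configs.</description>
</property>
```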

On Thu, Aug 1, 2013 at 8:07 AM, sam liu <sa...@gmail.com> wrote:
> But, please mention that the value of 'dfs.replication' of the cluster is
> always 2, even when the datanode number is 3. And I am pretty sure I did not
> manually create any files with rep=3. So, why were some files of hdfs
> created with repl=3, but not repl=2?
>
>
> 2013/8/1 Harsh J <ha...@cloudera.com>
>>
>> The step (a) points to your problem and solution both. You have files
>> being created with repl=3 on a 2 DN cluster which will prevent
>> decommission. This is not a bug.
>>
>> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
>> > I opened a jira for tracking this issue:
>> > https://issues.apache.org/jira/browse/HDFS-5046
>> >
>> >
>> > 2013/7/2 sam liu <sa...@gmail.com>
>> >>
>> >> Yes, the default replication factor is 3. However, in my case, it's
>> >> strange: during decommission hangs, I found some block's expected
>> >> replicas
>> >> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
>> >> node
>> >> is always 2 from the beginning of cluster setup. Below are my steps:
>> >>
>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>> >> in
>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>> >> the
>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> >> note: step 2 passed
>> >> 3. Decommission dn3 from the cluster
>> >> Expected result: dn3 could be decommissioned successfully
>> >> Actual result:
>> >> a). decommission progress hangs and the status always be 'Waiting
>> >> DataNode
>> >> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /',
>> >> the
>> >> decommission continues and will be completed finally.
>> >> b). However, if the initial cluster includes >= 3 datanodes, this issue
>> >> won't be encountered when add/remove another datanode. For example, if
>> >> I
>> >> setup a cluster with 3 datanodes, and then I can successfully add the
>> >> 4th
>> >> datanode into it, and then also can successfully remove the 4th
>> >> datanode
>> >> from the cluster.
>> >>
>> >> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
>> >> comments?
>> >>
>> >> Thanks!
>> >>
>> >>
>> >> 2013/6/21 Harsh J <ha...@cloudera.com>
>> >>>
>> >>> The dfs.replication is a per-file parameter. If you have a client that
>> >>> does not use the supplied configs, then its default replication is 3
>> >>> and all files it will create (as part of the app or via a job config)
>> >>> will be with replication factor 3.
>> >>>
>> >>> You can do an -lsr to find all files and filter which ones have been
>> >>> created with a factor of 3 (versus expected config of 2).
>> >>>
>> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com>
>> >>> wrote:
>> >>> > Hi George,
>> >>> >
>> >>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2.
>> >>> > But
>> >>> > still
>> >>> > encounter this issue.
>> >>> >
>> >>> > Thanks!
>> >>> >
>> >>> >
>> >>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>> >>> >>
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> I think i have faced this before, the problem is that you have the
>> >>> >> rep
>> >>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve
>> >>> >> the
>> >>> >> factor
>> >>> >> (replicas are not created on the same node). If you set the
>> >>> >> replication
>> >>> >> factor=2 i think you will not have this issue. So in general you
>> >>> >> must
>> >>> >> make
>> >>> >> sure that the rep factor is <= to the available datanodes.
>> >>> >>
>> >>> >> BR,
>> >>> >> George
>> >>> >>
>> >>> >>
>> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> I encountered an issue which hangs the decommission operation. Its
>> >>> >> steps:
>> >>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2.
>> >>> >> And,
>> >>> >> in
>> >>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>> >>> >> 2. Add node dn3 into the cluster as a new datanode, and did not
>> >>> >> change
>> >>> >> the
>> >>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> >>> >> note: step 2 passed
>> >>> >> 3. Decommission dn3 from the cluster
>> >>> >>
>> >>> >> Expected result: dn3 could be decommissioned successfully
>> >>> >>
>> >>> >> Actual result: decommission progress hangs and the status always be
>> >>> >> 'Waiting DataNode status: Decommissioned'
>> >>> >>
>> >>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>> >>> >> won't
>> >>> >> be encountered when add/remove another datanode.
>> >>> >>
>> >>> >> Also, after step 2, I noticed that some block's expected replicas
>> >>> >> is
>> >>> >> 3,
>> >>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>> >>> >>
>> >>> >> Could anyone pls help provide some triages?
>> >>> >>
>> >>> >> Thanks in advance!
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> ---------------------------
>> >>> >>
>> >>> >> George Kousiouris, PhD
>> >>> >> Electrical and Computer Engineer
>> >>> >> Division of Communications,
>> >>> >> Electronics and Information Engineering
>> >>> >> School of Electrical and Computer Engineering
>> >>> >> Tel: +30 210 772 2546
>> >>> >> Mobile: +30 6939354121
>> >>> >> Fax: +30 210 772 2569
>> >>> >> Email: gkousiou@mail.ntua.gr
>> >>> >> Site: http://users.ntua.gr/gkousiou/
>> >>> >>
>> >>> >> National Technical University of Athens
>> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Harsh J
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
But please note that the cluster's 'dfs.replication' value is always 2,
even when the datanode count is 3. And I am pretty sure I did not
manually create any files with repl=3. So why were some HDFS files
created with repl=3 rather than repl=2?
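One way to confirm which files these are is the -lsr filter suggested earlier in the thread. A hedged sketch: the listing below is fabricated sample data standing in for real 'hadoop fs -lsr /' output, whose second column is the per-file replication factor in Hadoop 1.x (directories show '-' there); the paths and sizes are made up.

```shell
# Simulated 'hadoop fs -lsr /' listing (Hadoop 1.x format):
# perms, replication, owner, group, size, date, time, path.
lsr_output='-rw-r--r--   3 hdfs supergroup   1048576 2013-07-31 10:01 /data/file1
-rw-r--r--   2 hdfs supergroup    524288 2013-07-31 10:02 /data/file2
drwxr-xr-x   - hdfs supergroup         0 2013-07-31 10:00 /data'

# Print the path (field 8) of every file whose replication factor is 3.
# Directories are skipped automatically because their second field is '-'.
printf '%s\n' "$lsr_output" | awk '$2 == "3" {print $8}'
```

Against a live cluster you would pipe the real listing instead, e.g. `hadoop fs -lsr / | awk '$2 == "3" {print $8}'`, and then fix the offenders with `hadoop dfs -setrep 2 <path>` as described above.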


2013/8/1 Harsh J <ha...@cloudera.com>

> The step (a) points to your problem and solution both. You have files
> being created with repl=3 on a 2 DN cluster which will prevent
> decommission. This is not a bug.
>
> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> > I opened a jira for tracking this issue:
> > https://issues.apache.org/jira/browse/HDFS-5046
> >
> >
> > 2013/7/2 sam liu <sa...@gmail.com>
> >>
> >> Yes, the default replication factor is 3. However, in my case, it's
> >> strange: during decommission hangs, I found some block's expected
> replicas
> >> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
> node
> >> is always 2 from the beginning of cluster setup. Below are my steps:
> >>
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >> Expected result: dn3 could be decommissioned successfully
> >> Actual result:
> >> a). decommission progress hangs and the status always be 'Waiting
> DataNode
> >> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /',
> the
> >> decommission continues and will be completed finally.
> >> b). However, if the initial cluster includes >= 3 datanodes, this issue
> >> won't be encountered when add/remove another datanode. For example, if I
> >> setup a cluster with 3 datanodes, and then I can successfully add the
> 4th
> >> datanode into it, and then also can successfully remove the 4th datanode
> >> from the cluster.
> >>
> >> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
> >> comments?
> >>
> >> Thanks!
> >>
> >>
> >> 2013/6/21 Harsh J <ha...@cloudera.com>
> >>>
> >>> The dfs.replication is a per-file parameter. If you have a client that
> >>> does not use the supplied configs, then its default replication is 3
> >>> and all files it will create (as part of the app or via a job config)
> >>> will be with replication factor 3.
> >>>
> >>> You can do an -lsr to find all files and filter which ones have been
> >>> created with a factor of 3 (versus expected config of 2).
> >>>
> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com>
> wrote:
> >>> > Hi George,
> >>> >
> >>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2.
> But
> >>> > still
> >>> > encounter this issue.
> >>> >
> >>> > Thanks!
> >>> >
> >>> >
> >>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>> >>
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I think i have faced this before, the problem is that you have the
> rep
> >>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> >>> >> factor
> >>> >> (replicas are not created on the same node). If you set the
> >>> >> replication
> >>> >> factor=2 i think you will not have this issue. So in general you
> must
> >>> >> make
> >>> >> sure that the rep factor is <= to the available datanodes.
> >>> >>
> >>> >> BR,
> >>> >> George
> >>> >>
> >>> >>
> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I encountered an issue which hangs the decommission operation. Its
> >>> >> steps:
> >>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2.
> And,
> >>> >> in
> >>> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >>> >> 2. Add node dn3 into the cluster as a new datanode, and did not
> change
> >>> >> the
> >>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >>> >> note: step 2 passed
> >>> >> 3. Decommission dn3 from the cluster
> >>> >>
> >>> >> Expected result: dn3 could be decommissioned successfully
> >>> >>
> >>> >> Actual result: decommission progress hangs and the status always be
> >>> >> 'Waiting DataNode status: Decommissioned'
> >>> >>
> >>> >> However, if the initial cluster includes >= 3 datanodes, this issue
> >>> >> won't
> >>> >> be encountered when add/remove another datanode.
> >>> >>
> >>> >> Also, after step 2, I noticed that some block's expected replicas is
> >>> >> 3,
> >>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>> >>
> >>> >> Could anyone pls help provide some triages?
> >>> >>
> >>> >> Thanks in advance!
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> ---------------------------
> >>> >>
> >>> >> George Kousiouris, PhD
> >>> >> Electrical and Computer Engineer
> >>> >> Division of Communications,
> >>> >> Electronics and Information Engineering
> >>> >> School of Electrical and Computer Engineering
> >>> >> Tel: +30 210 772 2546
> >>> >> Mobile: +30 6939354121
> >>> >> Fax: +30 210 772 2569
> >>> >> Email: gkousiou@mail.ntua.gr
> >>> >> Site: http://users.ntua.gr/gkousiou/
> >>> >>
> >>> >> National Technical University of Athens
> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>
> >>
> >
>
>
>
> --
> Harsh J
>

> >>> >> Division of Communications,
> >>> >> Electronics and Information Engineering
> >>> >> School of Electrical and Computer Engineering
> >>> >> Tel: +30 210 772 2546
> >>> >> Mobile: +30 6939354121
> >>> >> Fax: +30 210 772 2569
> >>> >> Email: gkousiou@mail.ntua.gr
> >>> >> Site: http://users.ntua.gr/gkousiou/
> >>> >>
> >>> >> National Technical University of Athens
> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
But please note that the value of 'dfs.replication' on the cluster was
always 2, even when the datanode count was 3. And I am pretty sure I did
not manually create any files with repl=3. So why were some HDFS files
created with repl=3 rather than repl=2?
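Harsh's explanation in the quoted reply below can be sketched with a toy model (illustrative Python only, not actual HDFS code): the replication factor is fixed per file at create time from the *client's* configuration, so any client that never loads the cluster's hdfs-site.xml falls back to Hadoop's hardcoded default of 3, regardless of what the datanodes' configs say.

```python
# Toy model of per-file replication in HDFS (not real Hadoop code).
# The NameNode records whatever factor the creating client requests;
# a client missing the cluster config uses the built-in default of 3.

HARDCODED_DEFAULT_REPLICATION = 3  # Hadoop's built-in default


class Client:
    def __init__(self, conf=None):
        self.conf = conf or {}

    def create(self, namenode, path):
        # Replication is resolved on the client, once, at create time.
        repl = int(self.conf.get("dfs.replication",
                                 HARDCODED_DEFAULT_REPLICATION))
        namenode[path] = repl  # NameNode stores the per-file factor


namenode = {}
good = Client({"dfs.replication": "2"})  # client that loads hdfs-site.xml
bad = Client()                           # e.g. a job client missing the config dir
good.create(namenode, "/data/a")
bad.create(namenode, "/data/b")
print(namenode)  # -> {'/data/a': 2, '/data/b': 3}
```

This is why changing hdfs-site.xml on the cluster nodes alone does not prevent repl=3 files: the file keeps the factor its creating client asked for.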


2013/8/1 Harsh J <ha...@cloudera.com>

> The step (a) points to your problem and solution both. You have files
> being created with repl=3 on a 2 DN cluster which will prevent
> decommission. This is not a bug.
>
> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> > I opened a jira for tracking this issue:
> > https://issues.apache.org/jira/browse/HDFS-5046
> >
> >
> > 2013/7/2 sam liu <sa...@gmail.com>
> >>
> >> Yes, the default replication factor is 3. However, in my case, it's
> >> strange: during decommission hangs, I found some block's expected
> replicas
> >> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
> node
> >> is always 2 from the beginning of cluster setup. Below is my steps:
> >>
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >> Expected result: dn3 could be decommissioned successfully
> >> Actual result:
> >> a). decommission progress hangs and the status always be 'Waiting
> DataNode
> >> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /',
> the
> >> decommission continues and will be completed finally.
> >> b). However, if the initial cluster includes >= 3 datanodes, this issue
> >> won't be encountered when add/remove another datanode. For example, if I
> >> setup a cluster with 3 datanodes, and then I can successfully add the
> 4th
> >> datanode into it, and then also can successfully remove the 4th datanode
> >> from the cluster.
> >>
> >> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
> >> comments?
> >>
> >> Thanks!
> >>
> >>
> >> 2013/6/21 Harsh J <ha...@cloudera.com>
> >>>
> >>> The dfs.replication is a per-file parameter. If you have a client that
> >>> does not use the supplied configs, then its default replication is 3
> >>> and all files it will create (as part of the app or via a job config)
> >>> will be with replication factor 3.
> >>>
> >>> You can do an -lsr to find all files and filter which ones have been
> >>> created with a factor of 3 (versus expected config of 2).
> >>>
> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com>
> wrote:
> >>> > Hi George,
> >>> >
> >>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2.
> But
> >>> > still
> >>> > encounter this issue.
> >>> >
> >>> > Thanks!
> >>> >
> >>> >
> >>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>> >>
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I think i have faced this before, the problem is that you have the
> rep
> >>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> >>> >> factor
> >>> >> (replicas are not created on the same node). If you set the
> >>> >> replication
> >>> >> factor=2 i think you will not have this issue. So in general you
> must
> >>> >> make
> >>> >> sure that the rep factor is <= to the available datanodes.
> >>> >>
> >>> >> BR,
> >>> >> George
> >>> >>
> >>> >>
> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I encountered an issue which hangs the decommission operatoin. Its
> >>> >> steps:
> >>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2.
> And,
> >>> >> in
> >>> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >>> >> 2. Add node dn3 into the cluster as a new datanode, and did not
> change
> >>> >> the
> >>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >>> >> note: step 2 passed
> >>> >> 3. Decommission dn3 from the cluster
> >>> >>
> >>> >> Expected result: dn3 could be decommissioned successfully
> >>> >>
> >>> >> Actual result: decommission progress hangs and the status always be
> >>> >> 'Waiting DataNode status: Decommissioned'
> >>> >>
> >>> >> However, if the initial cluster includes >= 3 datanodes, this issue
> >>> >> won't
> >>> >> be encountered when add/remove another datanode.
> >>> >>
> >>> >> Also, after step 2, I noticed that some block's expected replicas is
> >>> >> 3,
> >>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>> >>
> >>> >> Could anyone pls help provide some triages?
> >>> >>
> >>> >> Thanks in advance!
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> ---------------------------
> >>> >>
> >>> >> George Kousiouris, PhD
> >>> >> Electrical and Computer Engineer
> >>> >> Division of Communications,
> >>> >> Electronics and Information Engineering
> >>> >> School of Electrical and Computer Engineering
> >>> >> Tel: +30 210 772 2546
> >>> >> Mobile: +30 6939354121
> >>> >> Fax: +30 210 772 2569
> >>> >> Email: gkousiou@mail.ntua.gr
> >>> >> Site: http://users.ntua.gr/gkousiou/
> >>> >>
> >>> >> National Technical University of Athens
> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
Step (a) points to both your problem and its solution. You have files
created with repl=3 on a 2-DN cluster, which will prevent
decommissioning. This is not a bug.
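The `-lsr` filtering suggested earlier in the thread can be done with a small script. This is a hypothetical sketch that assumes the Hadoop 1.x listing format (permissions, replication, owner, group, size, date, time, path, with '-' in the replication column for directories); verify the column layout against your own `hadoop fs -lsr` output.

```python
# Sketch: flag files in `hadoop fs -lsr` output whose replication
# factor exceeds the number of live datanodes, i.e. the files that
# would block decommissioning.

def over_replicated(lsr_lines, live_datanodes):
    flagged = []
    for line in lsr_lines:
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[1] == "-":
            continue  # skip directories and malformed lines
        if int(fields[1]) > live_datanodes:
            flagged.append((fields[7], int(fields[1])))
    return flagged


# Sample lines in the assumed Hadoop 1.x `-lsr` format:
sample = [
    "drwxr-xr-x   - hdfs supergroup          0 2013-06-21 11:29 /data",
    "-rw-r--r--   3 hdfs supergroup       1024 2013-06-21 11:30 /data/part-00000",
    "-rw-r--r--   2 hdfs supergroup       2048 2013-06-21 11:31 /data/part-00001",
]
print(over_replicated(sample, live_datanodes=2))
# -> [('/data/part-00000', 3)]
```

Files flagged this way can then be lowered with `hadoop dfs -setrep 2 <path>` (or recursively with `-setrep -R 2 /`, as done earlier in the thread), after which the decommission can complete.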

On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> I opened a jira for tracking this issue:
> https://issues.apache.org/jira/browse/HDFS-5046
>
>
> 2013/7/2 sam liu <sa...@gmail.com>
>>
>> Yes, the default replication factor is 3. However, in my case, it's
>> strange: during decommission hangs, I found some block's expected replicas
>> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster node
>> is always 2 from the beginning of cluster setup. Below is my steps:
>>
>> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
>> hdfs-site.xml, set the 'dfs.replication' to 2
>> 2. Add node dn3 into the cluster as a new datanode, and did not change the
>> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
>> Expected result: dn3 could be decommissioned successfully
>> Actual result:
>> a). decommission progress hangs and the status always be 'Waiting DataNode
>> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
>> decommission continues and will be completed finally.
>> b). However, if the initial cluster includes >= 3 datanodes, this issue
>> won't be encountered when add/remove another datanode. For example, if I
>> setup a cluster with 3 datanodes, and then I can successfully add the 4th
>> datanode into it, and then also can successfully remove the 4th datanode
>> from the cluster.
>>
>> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
>> comments?
>>
>> Thanks!
>>
>>
>> 2013/6/21 Harsh J <ha...@cloudera.com>
>>>
>>> The dfs.replication is a per-file parameter. If you have a client that
>>> does not use the supplied configs, then its default replication is 3
>>> and all files it will create (as part of the app or via a job config)
>>> will be with replication factor 3.
>>>
>>> You can do an -lsr to find all files and filter which ones have been
>>> created with a factor of 3 (versus expected config of 2).
>>>
>>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi George,
>>> >
>>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
>>> > still
>>> > encounter this issue.
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> I think i have faced this before, the problem is that you have the rep
>>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
>>> >> factor
>>> >> (replicas are not created on the same node). If you set the
>>> >> replication
>>> >> factor=2 i think you will not have this issue. So in general you must
>>> >> make
>>> >> sure that the rep factor is <= to the available datanodes.
>>> >>
>>> >> BR,
>>> >> George
>>> >>
>>> >>
>>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I encountered an issue which hangs the decommission operatoin. Its
>>> >> steps:
>>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>>> >> in
>>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>>> >> the
>>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>>> >> note: step 2 passed
>>> >> 3. Decommission dn3 from the cluster
>>> >>
>>> >> Expected result: dn3 could be decommissioned successfully
>>> >>
>>> >> Actual result: decommission progress hangs and the status always be
>>> >> 'Waiting DataNode status: Decommissioned'
>>> >>
>>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>>> >> won't
>>> >> be encountered when add/remove another datanode.
>>> >>
>>> >> Also, after step 2, I noticed that some block's expected replicas is
>>> >> 3,
>>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>>> >>
>>> >> Could anyone pls help provide some triages?
>>> >>
>>> >> Thanks in advance!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> ---------------------------
>>> >>
>>> >> George Kousiouris, PhD
>>> >> Electrical and Computer Engineer
>>> >> Division of Communications,
>>> >> Electronics and Information Engineering
>>> >> School of Electrical and Computer Engineering
>>> >> Tel: +30 210 772 2546
>>> >> Mobile: +30 6939354121
>>> >> Fax: +30 210 772 2569
>>> >> Email: gkousiou@mail.ntua.gr
>>> >> Site: http://users.ntua.gr/gkousiou/
>>> >>
>>> >> National Technical University of Athens
>>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
The step (a) points to your problem and solution both. You have files
being created with repl=3 on a 2 DN cluster which will prevent
decommission. This is not a bug.

On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> I opened a jira for tracking this issue:
> https://issues.apache.org/jira/browse/HDFS-5046
>
>
> 2013/7/2 sam liu <sa...@gmail.com>
>>
>> Yes, the default replication factor is 3. However, in my case, it's
>> strange: during decommission hangs, I found some block's expected replicas
>> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster node
>> is always 2 from the beginning of cluster setup. Below is my steps:
>>
>> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
>> hdfs-site.xml, set the 'dfs.replication' to 2
>> 2. Add node dn3 into the cluster as a new datanode, and did not change the
>> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
>> Expected result: dn3 could be decommissioned successfully
>> Actual result:
>> a). decommission progress hangs and the status always be 'Waiting DataNode
>> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
>> decommission continues and will be completed finally.
>> b). However, if the initial cluster includes >= 3 datanodes, this issue
>> won't be encountered when add/remove another datanode. For example, if I
>> setup a cluster with 3 datanodes, and then I can successfully add the 4th
>> datanode into it, and then also can successfully remove the 4th datanode
>> from the cluster.
>>
>> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
>> comments?
>>
>> Thanks!
>>
>>
>> 2013/6/21 Harsh J <ha...@cloudera.com>
>>>
>>> The dfs.replication is a per-file parameter. If you have a client that
>>> does not use the supplied configs, then its default replication is 3
>>> and all files it will create (as part of the app or via a job config)
>>> will be with replication factor 3.
>>>
>>> You can do an -lsr to find all files and filter which ones have been
>>> created with a factor of 3 (versus expected config of 2).
>>>
>>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi George,
>>> >
>>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
>>> > still
>>> > encounter this issue.
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> I think i have faced this before, the problem is that you have the rep
>>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
>>> >> factor
>>> >> (replicas are not created on the same node). If you set the
>>> >> replication
>>> >> factor=2 i think you will not have this issue. So in general you must
>>> >> make
>>> >> sure that the rep factor is <= to the available datanodes.
>>> >>
>>> >> BR,
>>> >> George
>>> >>
>>> >>
>>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I encountered an issue which hangs the decommission operatoin. Its
>>> >> steps:
>>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>>> >> in
>>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>>> >> the
>>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>>> >> note: step 2 passed
>>> >> 3. Decommission dn3 from the cluster
>>> >>
>>> >> Expected result: dn3 could be decommissioned successfully
>>> >>
>>> >> Actual result: decommission progress hangs and the status always be
>>> >> 'Waiting DataNode status: Decommissioned'
>>> >>
>>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>>> >> won't
>>> >> be encountered when add/remove another datanode.
>>> >>
>>> >> Also, after step 2, I noticed that some block's expected replicas is
>>> >> 3,
>>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>>> >>
>>> >> Could anyone pls help provide some triages?
>>> >>
>>> >> Thanks in advance!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> ---------------------------
>>> >>
>>> >> George Kousiouris, PhD
>>> >> Electrical and Computer Engineer
>>> >> Division of Communications,
>>> >> Electronics and Information Engineering
>>> >> School of Electrical and Computer Engineering
>>> >> Tel: +30 210 772 2546
>>> >> Mobile: +30 6939354121
>>> >> Fax: +30 210 772 2569
>>> >> Email: gkousiou@mail.ntua.gr
>>> >> Site: http://users.ntua.gr/gkousiou/
>>> >>
>>> >> National Technical University of Athens
>>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
The step (a) points to your problem and solution both. You have files
being created with repl=3 on a 2 DN cluster which will prevent
decommission. This is not a bug.

On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> I opened a jira for tracking this issue:
> https://issues.apache.org/jira/browse/HDFS-5046
>
>
> 2013/7/2 sam liu <sa...@gmail.com>
>>
>> Yes, the default replication factor is 3. However, in my case, it's
>> strange: during decommission hangs, I found some block's expected replicas
>> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster node
>> is always 2 from the beginning of cluster setup. Below is my steps:
>>
>> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
>> hdfs-site.xml, set the 'dfs.replication' to 2
>> 2. Add node dn3 into the cluster as a new datanode, and did not change the
>> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
>> Expected result: dn3 could be decommissioned successfully
>> Actual result:
>> a). decommission progress hangs and the status always be 'Waiting DataNode
>> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
>> decommission continues and will be completed finally.
>> b). However, if the initial cluster includes >= 3 datanodes, this issue
>> won't be encountered when add/remove another datanode. For example, if I
>> setup a cluster with 3 datanodes, and then I can successfully add the 4th
>> datanode into it, and then also can successfully remove the 4th datanode
>> from the cluster.
>>
>> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
>> comments?
>>
>> Thanks!
>>
>>
>> 2013/6/21 Harsh J <ha...@cloudera.com>
>>>
>>> The dfs.replication is a per-file parameter. If you have a client that
>>> does not use the supplied configs, then its default replication is 3
>>> and all files it will create (as part of the app or via a job config)
>>> will be with replication factor 3.
>>>
>>> You can do an -lsr to find all files and filter which ones have been
>>> created with a factor of 3 (versus expected config of 2).
>>>
>>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi George,
>>> >
>>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
>>> > still
>>> > encounter this issue.
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> I think i have faced this before, the problem is that you have the rep
>>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
>>> >> factor
>>> >> (replicas are not created on the same node). If you set the
>>> >> replication
>>> >> factor=2 i think you will not have this issue. So in general you must
>>> >> make
>>> >> sure that the rep factor is <= to the available datanodes.
>>> >>
>>> >> BR,
>>> >> George
>>> >>
>>> >>
>>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I encountered an issue which hangs the decommission operatoin. Its
>>> >> steps:
>>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>>> >> in
>>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>>> >> the
>>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>>> >> note: step 2 passed
>>> >> 3. Decommission dn3 from the cluster
>>> >>
>>> >> Expected result: dn3 could be decommissioned successfully
>>> >>
>>> >> Actual result: decommission progress hangs and the status always be
>>> >> 'Waiting DataNode status: Decommissioned'
>>> >>
>>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>>> >> won't
>>> >> be encountered when add/remove another datanode.
>>> >>
>>> >> Also, after step 2, I noticed that some block's expected replicas is
>>> >> 3,
>>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>>> >>
>>> >> Could anyone pls help provide some triages?
>>> >>
>>> >> Thanks in advance!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> ---------------------------
>>> >>
>>> >> George Kousiouris, PhD
>>> >> Electrical and Computer Engineer
>>> >> Division of Communications,
>>> >> Electronics and Information Engineering
>>> >> School of Electrical and Computer Engineering
>>> >> Tel: +30 210 772 2546
>>> >> Mobile: +30 6939354121
>>> >> Fax: +30 210 772 2569
>>> >> Email: gkousiou@mail.ntua.gr
>>> >> Site: http://users.ntua.gr/gkousiou/
>>> >>
>>> >> National Technical University of Athens
>>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
The step (a) points to your problem and solution both. You have files
being created with repl=3 on a 2 DN cluster which will prevent
decommission. This is not a bug.

On Wed, Jul 31, 2013 at 12:09 PM, sam liu <sa...@gmail.com> wrote:
> I opened a jira for tracking this issue:
> https://issues.apache.org/jira/browse/HDFS-5046
>
>
> 2013/7/2 sam liu <sa...@gmail.com>
>>
>> Yes, the default replication factor is 3. However, in my case, it's
>> strange: during decommission hangs, I found some block's expected replicas
>> is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster node
>> is always 2 from the beginning of cluster setup. Below is my steps:
>>
>> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
>> hdfs-site.xml, set the 'dfs.replication' to 2
>> 2. Add node dn3 into the cluster as a new datanode, and did not change the
>> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
>> Expected result: dn3 could be decommissioned successfully
>> Actual result:
>> a). decommission progress hangs and the status always be 'Waiting DataNode
>> status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
>> decommission continues and will be completed finally.
>> b). However, if the initial cluster includes >= 3 datanodes, this issue
>> won't be encountered when add/remove another datanode. For example, if I
>> setup a cluster with 3 datanodes, and then I can successfully add the 4th
>> datanode into it, and then also can successfully remove the 4th datanode
>> from the cluster.
>>
>> I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
>> comments?
>>
>> Thanks!
>>
>>
>> 2013/6/21 Harsh J <ha...@cloudera.com>
>>>
>>> The dfs.replication is a per-file parameter. If you have a client that
>>> does not use the supplied configs, then its default replication is 3
>>> and all files it will create (as part of the app or via a job config)
>>> will be with replication factor 3.
>>>
>>> You can do an -lsr to find all files and filter which ones have been
>>> created with a factor of 3 (versus expected config of 2).
>>>
>>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi George,
>>> >
>>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
>>> > still
>>> > encounter this issue.
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> I think i have faced this before, the problem is that you have the rep
>>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
>>> >> factor
>>> >> (replicas are not created on the same node). If you set the
>>> >> replication
>>> >> factor=2 i think you will not have this issue. So in general you must
>>> >> make
>>> >> sure that the rep factor is <= to the available datanodes.
>>> >>
>>> >> BR,
>>> >> George
>>> >>
>>> >>
>>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I encountered an issue which hangs the decommission operatoin. Its
>>> >> steps:
>>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>>> >> in
>>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>>> >> the
>>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>>> >> note: step 2 passed
>>> >> 3. Decommission dn3 from the cluster
>>> >>
>>> >> Expected result: dn3 could be decommissioned successfully
>>> >>
>>> >> Actual result: decommission progress hangs and the status always be
>>> >> 'Waiting DataNode status: Decommissioned'
>>> >>
>>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>>> >> won't
>>> >> be encountered when add/remove another datanode.
>>> >>
>>> >> Also, after step 2, I noticed that some block's expected replicas is
>>> >> 3,
>>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>>> >>
>>> >> Could anyone please help triage this?
>>> >>
>>> >> Thanks in advance!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> ---------------------------
>>> >>
>>> >> George Kousiouris, PhD
>>> >> Electrical and Computer Engineer
>>> >> Division of Communications,
>>> >> Electronics and Information Engineering
>>> >> School of Electrical and Computer Engineering
>>> >> Tel: +30 210 772 2546
>>> >> Mobile: +30 6939354121
>>> >> Fax: +30 210 772 2569
>>> >> Email: gkousiou@mail.ntua.gr
>>> >> Site: http://users.ntua.gr/gkousiou/
>>> >>
>>> >> National Technical University of Athens
>>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J
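
Harsh's suggestion above (list everything with -lsr and filter for files created with a higher factor) can be sketched as a small shell pipeline. The listing below is a made-up sample of `hadoop fs -lsr` output; the second column is the per-file replication factor ('-' for directories):

```shell
# Hypothetical sample of `hadoop fs -lsr /` output; column 2 is the
# per-file replication factor ('-' for directories).
lsr='-rw-r--r--   3 sam supergroup  1048576 2013-06-21 12:00 /user/sam/job.jar
drwxr-xr-x   - sam supergroup        0 2013-06-21 12:00 /user/sam/out
-rw-r--r--   2 sam supergroup   524288 2013-06-21 12:01 /user/sam/part-00000
-rw-r--r--   3 sam supergroup     2048 2013-06-21 12:02 /user/sam/conf.xml'

# Print the replication factor and path of every file whose factor is
# above the configured value of 2; these are the decommission blockers.
echo "$lsr" | awk '$2 ~ /^[0-9]+$/ && $2 > 2 {print $2, $NF}'
```

On a real cluster you would pipe `hadoop fs -lsr /` directly into the same awk filter instead of the sample variable.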

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
I opened a jira for tracking this issue:
https://issues.apache.org/jira/browse/HDFS-5046


2013/7/2 sam liu <sa...@gmail.com>

> Yes, the default replication factor is 3. However, in my case it's
> strange: while the decommission hangs, I found that some blocks' expected
> replica count is 3, even though the 'dfs.replication' value in
> hdfs-site.xml of every cluster node has been 2 since the beginning of the
> cluster setup. Below are my steps:
>
> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
> hdfs-site.xml, set the 'dfs.replication' to 2
> 2. Add node dn3 into the cluster as a new datanode, and did not change the
> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> note: step 2 passed
>  3. Decommission dn3 from the cluster
> Expected result: dn3 could be decommissioned successfully
> Actual result:
> a). The decommission progress hangs and the status stays at 'Waiting
> DataNode status: Decommissioned'. But if I execute 'hadoop dfs -setrep -R 2 /',
> the decommission continues and finally completes.
> b). However, if the initial cluster includes >= 3 datanodes, this issue
> won't be encountered when add/remove another datanode. For example, if I
> setup a cluster with 3 datanodes, and then I can successfully add the 4th
> datanode into it, and then also can successfully remove the 4th datanode
> from the cluster.
>
> I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for this.
> Any comments?
>
> Thanks!
>
>
> 2013/6/21 Harsh J <ha...@cloudera.com>
>
>> The dfs.replication is a per-file parameter. If you have a client that
>> does not use the supplied configs, then its default replication is 3
>> and all files it will create (as part of the app or via a job config)
>> will be with replication factor 3.
>>
>> You can do an -lsr to find all files and filter which ones have been
>> created with a factor of 3 (versus expected config of 2).
>>
>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
>> > Hi George,
>> >
>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
>> still
>> > encounter this issue.
>> >
>> > Thanks!
>> >
>> >
>> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I think i have faced this before, the problem is that you have the rep
>> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
>> factor
>> >> (replicas are not created on the same node). If you set the replication
>> >> factor=2 i think you will not have this issue. So in general you must
>> make
>> >> sure that the rep factor is <= to the available datanodes.
>> >>
>> >> BR,
>> >> George
>> >>
>> >>
>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>> >>
>> >> Hi,
>> >>
>> >> I encountered an issue which hangs the decommission operation. Its
>> steps:
>> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
>> in
>> >> hdfs-site.xml, set the 'dfs.replication' to 2
>> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
>> the
>> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> >> note: step 2 passed
>> >> 3. Decommission dn3 from the cluster
>> >>
>> >> Expected result: dn3 could be decommissioned successfully
>> >>
>> >> Actual result: decommission progress hangs and the status always be
>> >> 'Waiting DataNode status: Decommissioned'
>> >>
>> >> However, if the initial cluster includes >= 3 datanodes, this issue
>> won't
>> >> be encountered when add/remove another datanode.
>> >>
>> >> Also, after step 2, I noticed that some block's expected replicas is 3,
>> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>> >>
>> >> Could anyone please help triage this?
>> >>
>> >> Thanks in advance!
>> >>
>> >>
>> >>
>> >> --
>> >> ---------------------------
>> >>
>> >> George Kousiouris, PhD
>> >> Electrical and Computer Engineer
>> >> Division of Communications,
>> >> Electronics and Information Engineering
>> >> School of Electrical and Computer Engineering
>> >> Tel: +30 210 772 2546
>> >> Mobile: +30 6939354121
>> >> Fax: +30 210 772 2569
>> >> Email: gkousiou@mail.ntua.gr
>> >> Site: http://users.ntua.gr/gkousiou/
>> >>
>> >> National Technical University of Athens
>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
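
The workaround in the thread points at a simple precheck: before decommissioning, compare each file's replication factor against the number of datanodes that will remain, since HDFS never places two replicas of a block on the same node. A minimal sketch in Python, with hypothetical file names and counts:

```python
def blocking_files(file_replication, remaining_datanodes):
    """Return the files whose replication factor exceeds the number of
    datanodes left after decommissioning; such files keep the node stuck
    in the decommission-in-progress state."""
    return sorted(path for path, factor in file_replication.items()
                  if factor > remaining_datanodes)

# Hypothetical cluster state: 3 datanodes, one about to be decommissioned.
files = {"/user/sam/job.jar": 3, "/user/sam/data.txt": 2, "/user/sam/conf.xml": 3}
print(blocking_files(files, remaining_datanodes=2))
# -> ['/user/sam/conf.xml', '/user/sam/job.jar']
```

Feeding this the per-file factors reported by fsck or -lsr would tell the user up front which files need a setrep before the decommission can finish.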

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Yes, the default replication factor is 3. However, in my case it's
strange: while the decommission hangs, I found that some blocks' expected
replica count is 3, even though the 'dfs.replication' value in
hdfs-site.xml of every cluster node has been 2 since the beginning of the
cluster setup. Below are my steps:
1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
hdfs-site.xml, set the 'dfs.replication' to 2
2. Add node dn3 into the cluster as a new datanode, and did not change the '
dfs.replication' value in hdfs-site.xml and keep it as 2
note: step 2 passed
3. Decommission dn3 from the cluster
Expected result: dn3 could be decommissioned successfully
Actual result:
a). The decommission progress hangs and the status stays at 'Waiting
DataNode status: Decommissioned'. But if I execute 'hadoop dfs -setrep -R 2 /',
the decommission continues and finally completes.
b). However, if the initial cluster includes >= 3 datanodes, this issue
won't be encountered when add/remove another datanode. For example, if I
setup a cluster with 3 datanodes, and then I can successfully add the 4th
datanode into it, and then also can successfully remove the 4th datanode
from the cluster.

I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for this.
Any comments?

Thanks!


2013/6/21 Harsh J <ha...@cloudera.com>

> The dfs.replication is a per-file parameter. If you have a client that
> does not use the supplied configs, then its default replication is 3
> and all files it will create (as part of the app or via a job config)
> will be with replication factor 3.
>
> You can do an -lsr to find all files and filter which ones have been
> created with a factor of 3 (versus expected config of 2).
>
> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
> > Hi George,
> >
> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
> still
> > encounter this issue.
> >
> > Thanks!
> >
> >
> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>
> >>
> >> Hi,
> >>
> >> I think i have faced this before, the problem is that you have the rep
> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> factor
> >> (replicas are not created on the same node). If you set the replication
> >> factor=2 i think you will not have this issue. So in general you must
> make
> >> sure that the rep factor is <= to the available datanodes.
> >>
> >> BR,
> >> George
> >>
> >>
> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>
> >> Hi,
> >>
> >> I encountered an issue which hangs the decommission operation. Its
> steps:
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >>
> >> Expected result: dn3 could be decommissioned successfully
> >>
> >> Actual result: decommission progress hangs and the status always be
> >> 'Waiting DataNode status: Decommissioned'
> >>
> >> However, if the initial cluster includes >= 3 datanodes, this issue
> won't
> >> be encountered when add/remove another datanode.
> >>
> >> Also, after step 2, I noticed that some block's expected replicas is 3,
> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>
> >> Could anyone please help triage this?
> >>
> >> Thanks in advance!
> >>
> >>
> >>
> >> --
> >> ---------------------------
> >>
> >> George Kousiouris, PhD
> >> Electrical and Computer Engineer
> >> Division of Communications,
> >> Electronics and Information Engineering
> >> School of Electrical and Computer Engineering
> >> Tel: +30 210 772 2546
> >> Mobile: +30 6939354121
> >> Fax: +30 210 772 2569
> >> Email: gkousiou@mail.ntua.gr
> >> Site: http://users.ntua.gr/gkousiou/
> >>
> >> National Technical University of Athens
> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Yes, the default replication factor is 3. However, in my case, it's
strange: during decommission hangs, I found some block's expected replicas
is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
node is always 2 from the beginning of cluster setup. Below is my steps:
1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
hdfs-site.xml, set the 'dfs.replication' to 2
2. Add node dn3 into the cluster as a new datanode, and did not change the '
dfs.replication' value in hdfs-site.xml and keep it as 2
note: step 2 passed
3. Decommission dn3 from the cluster
Expected result: dn3 could be decommissioned successfully
Actual result:
a). decommission progress hangs and the status always be 'Waiting DataNode
status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
decommission continues and will be completed finally.
b). However, if the initial cluster includes >= 3 datanodes, this issue
won't be encountered when add/remove another datanode. For example, if I
setup a cluster with 3 datanodes, and then I can successfully add the 4th
datanode into it, and then also can successfully remove the 4th datanode
from the cluster.

I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
comments?

Thanks!


2013/6/21 Harsh J <ha...@cloudera.com>

> The dfs.replication is a per-file parameter. If you have a client that
> does not use the supplied configs, then its default replication is 3
> and all files it will create (as part of the app or via a job config)
> will be with replication factor 3.
>
> You can do an -lsr to find all files and filter which ones have been
> created with a factor of 3 (versus expected config of 2).
>
> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
> > Hi George,
> >
> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
> still
> > encounter this issue.
> >
> > Thanks!
> >
> >
> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>
> >>
> >> Hi,
> >>
> >> I think i have faced this before, the problem is that you have the rep
> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> factor
> >> (replicas are not created on the same node). If you set the replication
> >> factor=2 i think you will not have this issue. So in general you must
> make
> >> sure that the rep factor is <= to the available datanodes.
> >>
> >> BR,
> >> George
> >>
> >>
> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>
> >> Hi,
> >>
> >> I encountered an issue which hangs the decommission operatoin. Its
> steps:
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >>
> >> Expected result: dn3 could be decommissioned successfully
> >>
> >> Actual result: decommission progress hangs and the status always be
> >> 'Waiting DataNode status: Decommissioned'
> >>
> >> However, if the initial cluster includes >= 3 datanodes, this issue
> won't
> >> be encountered when add/remove another datanode.
> >>
> >> Also, after step 2, I noticed that some block's expected replicas is 3,
> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>
> >> Could anyone pls help provide some triages?
> >>
> >> Thanks in advance!
> >>
> >>
> >>
> >> --
> >> ---------------------------
> >>
> >> George Kousiouris, PhD
> >> Electrical and Computer Engineer
> >> Division of Communications,
> >> Electronics and Information Engineering
> >> School of Electrical and Computer Engineering
> >> Tel: +30 210 772 2546
> >> Mobile: +30 6939354121
> >> Fax: +30 210 772 2569
> >> Email: gkousiou@mail.ntua.gr
> >> Site: http://users.ntua.gr/gkousiou/
> >>
> >> National Technical University of Athens
> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Yes, the default replication factor is 3. However, in my case, it's
strange: during decommission hangs, I found some block's expected replicas
is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
node is always 2 from the beginning of cluster setup. Below is my steps:
1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
hdfs-site.xml, set the 'dfs.replication' to 2
2. Add node dn3 into the cluster as a new datanode, and did not change the '
dfs.replication' value in hdfs-site.xml and keep it as 2
note: step 2 passed
3. Decommission dn3 from the cluster
Expected result: dn3 could be decommissioned successfully
Actual result:
a). decommission progress hangs and the status always be 'Waiting DataNode
status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
decommission continues and will be completed finally.
b). However, if the initial cluster includes >= 3 datanodes, this issue
won't be encountered when add/remove another datanode. For example, if I
setup a cluster with 3 datanodes, and then I can successfully add the 4th
datanode into it, and then also can successfully remove the 4th datanode
from the cluster.

I doubt it's a bug and plan to open a jira to Hadoop HDFS for this. Any
comments?

Thanks!


2013/6/21 Harsh J <ha...@cloudera.com>

> The dfs.replication is a per-file parameter. If you have a client that
> does not use the supplied configs, then its default replication is 3
> and all files it will create (as part of the app or via a job config)
> will be with replication factor 3.
>
> You can do an -lsr to find all files and filter which ones have been
> created with a factor of 3 (versus expected config of 2).
>
> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
> > Hi George,
> >
> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
> still
> > encounter this issue.
> >
> > Thanks!
> >
> >
> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>
> >>
> >> Hi,
> >>
> >> I think i have faced this before, the problem is that you have the rep
> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> factor
> >> (replicas are not created on the same node). If you set the replication
> >> factor=2 i think you will not have this issue. So in general you must
> make
> >> sure that the rep factor is <= to the available datanodes.
> >>
> >> BR,
> >> George
> >>
> >>
> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>
> >> Hi,
> >>
> >> I encountered an issue which hangs the decommission operatoin. Its
> steps:
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >>
> >> Expected result: dn3 could be decommissioned successfully
> >>
> >> Actual result: decommission progress hangs and the status always be
> >> 'Waiting DataNode status: Decommissioned'
> >>
> >> However, if the initial cluster includes >= 3 datanodes, this issue
> won't
> >> be encountered when add/remove another datanode.
> >>
> >> Also, after step 2, I noticed that some block's expected replicas is 3,
> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>
> >> Could anyone pls help provide some triages?
> >>
> >> Thanks in advance!
> >>
> >>
> >>
> >> --
> >> ---------------------------
> >>
> >> George Kousiouris, PhD
> >> Electrical and Computer Engineer
> >> Division of Communications,
> >> Electronics and Information Engineering
> >> School of Electrical and Computer Engineering
> >> Tel: +30 210 772 2546
> >> Mobile: +30 6939354121
> >> Fax: +30 210 772 2569
> >> Email: gkousiou@mail.ntua.gr
> >> Site: http://users.ntua.gr/gkousiou/
> >>
> >> National Technical University of Athens
> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Yes, the default replication factor is 3. However, in my case, it's
strange: during decommission hangs, I found some block's expected replicas
is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster
node is always 2 from the beginning of cluster setup. Below is my steps:
1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
hdfs-site.xml, set the 'dfs.replication' to 2
2. Add node dn3 into the cluster as a new datanode, and did not change the '
dfs.replication' value in hdfs-site.xml and keep it as 2
note: step 2 passed
3. Decommission dn3 from the cluster
Expected result: dn3 could be decommissioned successfully
Actual result:
a). decommission progress hangs and the status always be 'Waiting DataNode
status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the
decommission continues and will be completed finally.
b). However, if the initial cluster includes >= 3 datanodes, this issue is
not encountered when adding/removing another datanode. For example, if I
set up a cluster with 3 datanodes, I can successfully add a 4th datanode to
it, and then also successfully remove that 4th datanode from the cluster.

I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for this.
Any comments?

Thanks!


2013/6/21 Harsh J <ha...@cloudera.com>

> The dfs.replication is a per-file parameter. If you have a client that
> does not use the supplied configs, then its default replication is 3
> and all files it will create (as part of the app or via a job config)
> will be with replication factor 3.
>
> You can do an -lsr to find all files and filter which ones have been
> created with a factor of 3 (versus expected config of 2).
>
> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
> > Hi George,
> >
> > Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But
> still
> > encounter this issue.
> >
> > Thanks!
> >
> >
> > 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
> >>
> >>
> >> Hi,
> >>
> >> I think i have faced this before, the problem is that you have the rep
> >> factor=3 so it seems to hang because it needs 3 nodes to achieve the
> factor
> >> (replicas are not created on the same node). If you set the replication
> >> factor=2 i think you will not have this issue. So in general you must
> make
> >> sure that the rep factor is <= to the available datanodes.
> >>
> >> BR,
> >> George
> >>
> >>
> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>
> >> Hi,
> >>
> >> I encountered an issue which hangs the decommission operation. Its
> steps:
> >> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in
> >> hdfs-site.xml, set the 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, and did not change
> the
> >> 'dfs.replication' value in hdfs-site.xml and keep it as 2
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >>
> >> Expected result: dn3 could be decommissioned successfully
> >>
> >> Actual result: decommission progress hangs and the status always be
> >> 'Waiting DataNode status: Decommissioned'
> >>
> >> However, if the initial cluster includes >= 3 datanodes, this issue
> won't
> >> be encountered when add/remove another datanode.
> >>
> >> Also, after step 2, I noticed that some block's expected replicas is 3,
> >> but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>
> >> Could anyone pls help provide some triages?
> >>
> >> Thanks in advance!
> >>
> >>
> >>
> >> --
> >> ---------------------------
> >>
> >> George Kousiouris, PhD
> >> Electrical and Computer Engineer
> >> Division of Communications,
> >> Electronics and Information Engineering
> >> School of Electrical and Computer Engineering
> >> Tel: +30 210 772 2546
> >> Mobile: +30 6939354121
> >> Fax: +30 210 772 2569
> >> Email: gkousiou@mail.ntua.gr
> >> Site: http://users.ntua.gr/gkousiou/
> >>
> >> National Technical University of Athens
> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by Harsh J <ha...@cloudera.com>.
The dfs.replication is a per-file parameter. If you have a client that
does not use the supplied configs, then its default replication is 3
and all files it will create (as part of the app or via a job config)
will be with replication factor 3.

You can do an -lsr to find all files and filter which ones have been
created with a factor of 3 (versus expected config of 2).
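A minimal sketch of that filtering, assuming the Hadoop 1.x 'hadoop fs -lsr'
listing layout where column 2 of a file line is the replication factor (the
sample listing and paths below are made up for illustration; on a real
cluster you would pipe 'hadoop fs -lsr /' in instead):

```shell
#!/bin/sh
# Print "<replication> <path>" for files whose replication factor exceeds
# a threshold, given 'hadoop fs -lsr'-style output on stdin. File lines
# start with '-'; directory lines start with 'd' and show '-' in column 2,
# so both checks skip them.
find_over_replicated() {
  threshold="$1"
  awk -v t="$threshold" '$1 ~ /^-/ && $2 != "-" && $2 + 0 > t { print $2, $NF }'
}

# Illustrative sample listing (real use: hadoop fs -lsr / | find_over_replicated 2)
sample='-rw-r--r--   3 hdfs supergroup   1048576 2013-06-21 11:29 /user/sam/file1
drwxr-xr-x   - hdfs supergroup         0 2013-06-21 11:29 /user/sam/dir1
-rw-r--r--   2 hdfs supergroup      2048 2013-06-21 11:29 /user/sam/file2'

echo "$sample" | find_over_replicated 2
```

Each path printed could then be fixed with 'hadoop dfs -setrep 2 <path>'
before decommissioning.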

On Fri, Jun 21, 2013 at 3:13 PM, sam liu <sa...@gmail.com> wrote:
> Hi George,
>
> Actually, in my hdfs-site.xml, I always set 'dfs.replication'to 2. But still
> encounter this issue.
>
> Thanks!
>
>
> 2013/6/21 George Kousiouris <gk...@mail.ntua.gr>
>>
>>
>> Hi,
>>
>> I think i have faced this before, the problem is that you have the rep
>> factor=3 so it seems to hang because it needs 3 nodes to achieve the factor
>> (replicas are not created on the same node). If you set the replication
>> factor=2 i think you will not have this issue. So in general you must make
>> sure that the rep factor is <= to the available datanodes.
>>
>> BR,
>> George
>>
>>
>> On 6/21/2013 12:29 PM, sam liu wrote:
>>
>> Hi,
>>
>> I encountered an issue which hangs the decommission operation. Its steps:
>> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in
>> hdfs-site.xml, set the 'dfs.replication' to 2
>> 2. Add node dn3 into the cluster as a new datanode, and did not change the
>> 'dfs.replication' value in hdfs-site.xml and keep it as 2
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
>>
>> Expected result: dn3 could be decommissioned successfully
>>
>> Actual result: decommission progress hangs and the status always be
>> 'Waiting DataNode status: Decommissioned'
>>
>> However, if the initial cluster includes >= 3 datanodes, this issue won't
>> be encountered when add/remove another datanode.
>>
>> Also, after step 2, I noticed that some block's expected replicas is 3,
>> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>>
>> Could anyone pls help provide some triages?
>>
>> Thanks in advance!
>>
>>
>>
>> --
>> ---------------------------
>>
>> George Kousiouris, PhD
>> Electrical and Computer Engineer
>> Division of Communications,
>> Electronics and Information Engineering
>> School of Electrical and Computer Engineering
>> Tel: +30 210 772 2546
>> Mobile: +30 6939354121
>> Fax: +30 210 772 2569
>> Email: gkousiou@mail.ntua.gr
>> Site: http://users.ntua.gr/gkousiou/
>>
>> National Technical University of Athens
>> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>
>



-- 
Harsh J

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by sam liu <sa...@gmail.com>.
Hi George,

Actually, in my hdfs-site.xml, I have always set 'dfs.replication' to 2. But
I still encounter this issue.

Thanks!


2013/6/21 George Kousiouris <gk...@mail.ntua.gr>

>
> Hi,
>
> I think i have faced this before, the problem is that you have the rep
> factor=3 so it seems to hang because it needs 3 nodes to achieve the factor
> (replicas are not created on the same node). If you set the replication
> factor=2 i think you will not have this issue. So in general you must make
> sure that the rep factor is <= to the available datanodes.
>
> BR,
> George
>
>
> On 6/21/2013 12:29 PM, sam liu wrote:
>
>  Hi,
>
> I encountered an issue which hangs the decommission operation. Its steps:
>  1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And,
> in hdfs-site.xml, set the 'dfs.replication' to 2
>  2. Add node dn3 into the cluster as a new datanode, and did not change
> the 'dfs.replication' value in hdfs-site.xml and keep it as 2
>  note: step 2 passed
>  3. Decommission dn3 from the cluster
>
>  Expected result: dn3 could be decommissioned successfully
>
>  Actual result: decommission progress hangs and the status always be
> 'Waiting DataNode status: Decommissioned'
>
>  However, if the initial cluster includes >= 3 datanodes, this issue
> won't be encountered when add/remove another datanode.
>
>  Also, after step 2, I noticed that some block's expected replicas is 3,
> but the 'dfs.replication' value in hdfs-site.xml is always 2!
>
>  Could anyone pls help provide some triages?
>
>  Thanks in advance!
>
>
>
> --
> ---------------------------
>
> George Kousiouris, PhD
> Electrical and Computer Engineer
> Division of Communications,
> Electronics and Information Engineering
> School of Electrical and Computer Engineering
> Tel: +30 210 772 2546
> Mobile: +30 6939354121
> Fax: +30 210 772 2569
> Email: gkousiou@mail.ntua.gr
> Site: http://users.ntua.gr/gkousiou/
>
> National Technical University of Athens
> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>
>

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

Posted by George Kousiouris <gk...@mail.ntua.gr>.
Hi,

I think I have faced this before. The problem is that you have the rep
factor=3, so it seems to hang because it needs 3 nodes to achieve the
factor (replicas are not created on the same node). If you set the
replication factor=2, I think you will not have this issue. So in general
you must make sure that the rep factor is <= the number of available
datanodes.

BR,
George

On 6/21/2013 12:29 PM, sam liu wrote:
> Hi,
>
> I encountered an issue which hangs the decommission operation. Its steps:
> 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, 
> in hdfs-site.xml, set the 'dfs.replication' to 2
> 2. Add node dn3 into the cluster as a new datanode, and did not change 
> the 'dfs.replication' value in hdfs-site.xml and keep it as 2
> note: step 2 passed
> 3. Decommission dn3 from the cluster
>
> Expected result: dn3 could be decommissioned successfully
>
> Actual result: decommission progress hangs and the status always be 
> 'Waiting DataNode status: Decommissioned'
>
> However, if the initial cluster includes >= 3 datanodes, this issue 
> won't be encountered when add/remove another datanode.
>
> Also, after step 2, I noticed that some block's expected replicas is 
> 3, but the 'dfs.replication' value in hdfs-site.xml is always 2!
>
> Could anyone pls help provide some triages?
>
> Thanks in advance!


-- 
---------------------------

George Kousiouris, PhD
Electrical and Computer Engineer
Division of Communications,
Electronics and Information Engineering
School of Electrical and Computer Engineering
Tel: +30 210 772 2546
Mobile: +30 6939354121
Fax: +30 210 772 2569
Email: gkousiou@mail.ntua.gr
Site: http://users.ntua.gr/gkousiou/

National Technical University of Athens
9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
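The rule of thumb above (replication factor <= available datanodes) can be
checked before starting a decommission. A minimal sketch; the numbers are
assumed for illustration, and in practice the maximum per-file replication
factor would come from 'hadoop fsck / -files' or a recursive listing, and
the live-node count from 'hadoop dfsadmin -report':

```shell
#!/bin/sh
# Pre-decommission sanity check: after removing nodes, the remaining
# datanodes must still be able to hold every file's replicas (HDFS never
# places two replicas of a block on the same datanode).
can_decommission() {
  max_repl="$1"    # highest per-file replication factor in the cluster
  live_nodes="$2"  # datanodes currently live
  removing="$3"    # datanodes about to be decommissioned
  [ "$max_repl" -le $((live_nodes - removing)) ]
}

# The scenario from this thread: a file with replication 3, 3 live
# datanodes, removing 1 -- only 2 nodes would remain, so the
# decommission would hang until the replication factor is lowered.
if can_decommission 3 3 1; then
  echo "safe to decommission"
else
  echo "would hang: lower the replication factor first (hadoop dfs -setrep)"
fi
```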

