Posted to mapreduce-user@hadoop.apache.org by Patai Sangbutsarakum <si...@gmail.com> on 2012/10/15 21:01:32 UTC

final the dfs.replication and fsck

Hi Hadoopers,

I have
<property>
    <name>dfs.replication</name>
    <value>2</value>
    <final>true</final>
  </property>

set in hdfs-site.xml on the staging cluster. While the staging cluster
is running code that will later be deployed to production, that code
tries to set dfs.replication to 3, 10, 50, or other values besides 2;
those are the numbers the developers thought would fit the production
environment.

Even though I have marked dfs.replication as final on the staging
cluster, every time I run fsck there it still reports under-replicated
blocks. I thought the final keyword would prevent job configs from
overriding the value, but fsck suggests otherwise.

I am on cdh3u4.

Please suggest.
Patai
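
The <final> merge semantics at issue here can be modeled in a few lines of Python. This is only an illustration of the documented behavior, not Hadoop's actual Configuration code; the class and method names are invented:

```python
# Illustrative model of Hadoop's <final> merge rule: once a loaded
# resource marks a property final, later-loaded resources cannot
# override it.
class ConfigModel:
    def __init__(self):
        self.values = {}
        self.finals = set()

    def load_resource(self, props):
        """props maps name -> (value, is_final), mimicking one *-site.xml."""
        for name, (value, is_final) in props.items():
            if name in self.finals:
                continue  # a later resource cannot override a final property
            self.values[name] = value
            if is_final:
                self.finals.add(name)

conf = ConfigModel()
# Cluster-side hdfs-site.xml: dfs.replication=2, marked final.
conf.load_resource({"dfs.replication": ("2", True)})
# A job config tries to override it; the final mark wins.
conf.load_resource({"dfs.replication": ("10", False)})
print(conf.values["dfs.replication"])  # -> 2
```

The catch, as the replies in this thread explain, is that this merge rule only governs configuration-file loading; it says nothing about API calls that take a replication factor as an explicit argument.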

Re: final the dfs.replication and fsck

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Thank you so much for confirming that.

On Mon, Oct 15, 2012 at 9:25 PM, Harsh J <ha...@cloudera.com> wrote:
> Patai,
>
> My bad - that was on my mind but I missed noting it down on my earlier
> reply. Yes you'd have to control that as well. 2 should be fine for
> smaller clusters.
>
> On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>> Just want to share and check whether this makes sense.
>>
>> Jobs failed to run after I restarted the namenode, and the cluster
>> stopped complaining about under-replication.
>>
>> This is what I found in the log file:
>>
>> Requested replication 10 exceeds maximum 2
>> java.io.IOException: file
>> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
>> Requested replication 10 exceeds maximum 2
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>>         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>>
>>
>> So I scanned through the XML config files, guessed that I should change
>> <name>mapred.submit.replication</name> from 10 to 2, and restarted again.
>>
>> That's when jobs can start running again.
>> Hopefully that change makes sense.
>>
>>
>> Thanks
>> Patai
>>
>> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
>> <si...@gmail.com> wrote:
>>> Thanks Harsh, dfs.replication.max does do the magic!!
>>>
>>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>>>> Thank you, Harsh.  I did not know about dfs.replication.max.
>>>>
>>>>
>>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>>
>>>>> Hey Chris,
>>>>>
>>>>> The dfs.replication param is an exception to the <final> config
>>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>>> value they want the replication to be. This bypasses the
>>>>> configuration, and the configuration (being per-file) is also
>>>>> client-side.
>>>>>
>>>>> The right way for an administrator to enforce a "max" replication
>>>>> value at a create/setRep level, would be to set
>>>>> the dfs.replication.max to a desired value at the NameNode and restart
>>>>> it.
>>>>>
>>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>>> <cn...@hortonworks.com> wrote:
>>>>> > Hello Patai,
>>>>> >
>>>>> > Has your configuration file change been copied to all nodes in the
>>>>> > cluster?
>>>>> >
>>>>> > Are there applications connecting from outside of the cluster?  If so,
>>>>> > then
>>>>> > those clients could have separate configuration files or code setting
>>>>> > dfs.replication (and other configuration properties).  These would not
>>>>> > be
>>>>> > limited by final declarations in the cluster's configuration files.
>>>>> > <final>true</final> controls configuration file resource loading, but it
>>>>> > does not necessarily block different nodes or different applications
>>>>> > from
>>>>> > running with completely different configurations.
>>>>> >
>>>>> > Hope this helps,
>>>>> > --Chris
>>>>> >
>>>>> >
>>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>>> > <si...@gmail.com> wrote:
>>>>> >>
>>>>> >> Hi Hadoopers,
>>>>> >>
>>>>> >> I have
>>>>> >> <property>
>>>>> >>     <name>dfs.replication</name>
>>>>> >>     <value>2</value>
>>>>> >>     <final>true</final>
>>>>> >>   </property>
>>>>> >>
>>>>> >> set in hdfs-site.xml on the staging cluster. While the staging
>>>>> >> cluster is running code that will later be deployed to production,
>>>>> >> that code tries to set dfs.replication to 3, 10, 50, or other values
>>>>> >> besides 2; those are the numbers the developers thought would fit the
>>>>> >> production environment.
>>>>> >>
>>>>> >> Even though I have marked dfs.replication as final on the staging
>>>>> >> cluster, every time I run fsck there it still reports
>>>>> >> under-replicated blocks. I thought the final keyword would prevent
>>>>> >> job configs from overriding the value, but fsck suggests otherwise.
>>>>> >>
>>>>> >> I am on cdh3u4.
>>>>> >>
>>>>> >> Please suggest.
>>>>> >> Patai
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>
>>>>
>
>
>
> --
> Harsh J
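
To summarize the thread's resolution in one place: the enforcement ends up living in two properties, one on the HDFS side and one on the MapReduce side. A sketch of the relevant fragments, with the values discussed above (adjust for your own cluster):

```xml
<!-- hdfs-site.xml on the NameNode: hard server-side cap on any
     requested replication (create/setReplication calls included) -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>

<!-- mapred-site.xml: replication used for submitted job files
     (job.jar and friends); must not exceed dfs.replication.max -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```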


Re: final the dfs.replication and fsck

Posted by Harsh J <ha...@cloudera.com>.
Patai,

My bad - that was on my mind but I missed noting it down on my earlier
reply. Yes you'd have to control that as well. 2 should be fine for
smaller clusters.

On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Just want to share and check whether this makes sense.
>
> Jobs failed to run after I restarted the namenode, and the cluster
> stopped complaining about under-replication.
>
> This is what I found in the log file:
>
> Requested replication 10 exceeds maximum 2
> java.io.IOException: file
> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
> Requested replication 10 exceeds maximum 2
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>
>
> So I scanned through the XML config files, guessed that I should change
> <name>mapred.submit.replication</name> from 10 to 2, and restarted again.
>
> That's when jobs can start running again.
> Hopefully that change makes sense.
>
>
> Thanks
> Patai
>
> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>> Thanks Harsh, dfs.replication.max does do the magic!!
>>
>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> Thank you, Harsh.  I did not know about dfs.replication.max.
>>>
>>>
>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Hey Chris,
>>>>
>>>> The dfs.replication param is an exception to the <final> config
>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>> value they want the replication to be. This bypasses the
>>>> configuration, and the configuration (being per-file) is also
>>>> client-side.
>>>>
>>>> The right way for an administrator to enforce a "max" replication
>>>> value at a create/setRep level, would be to set
>>>> the dfs.replication.max to a desired value at the NameNode and restart
>>>> it.
>>>>
>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>> <cn...@hortonworks.com> wrote:
>>>> > Hello Patai,
>>>> >
>>>> > Has your configuration file change been copied to all nodes in the
>>>> > cluster?
>>>> >
>>>> > Are there applications connecting from outside of the cluster?  If so,
>>>> > then
>>>> > those clients could have separate configuration files or code setting
>>>> > dfs.replication (and other configuration properties).  These would not
>>>> > be
>>>> > limited by final declarations in the cluster's configuration files.
>>>> > <final>true</final> controls configuration file resource loading, but it
>>>> > does not necessarily block different nodes or different applications
>>>> > from
>>>> > running with completely different configurations.
>>>> >
>>>> > Hope this helps,
>>>> > --Chris
>>>> >
>>>> >
>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>> > <si...@gmail.com> wrote:
>>>> >>
>>>> >> Hi Hadoopers,
>>>> >>
>>>> >> I have
>>>> >> <property>
>>>> >>     <name>dfs.replication</name>
>>>> >>     <value>2</value>
>>>> >>     <final>true</final>
>>>> >>   </property>
>>>> >>
>>>> >> set in hdfs-site.xml on the staging cluster. While the staging
>>>> >> cluster is running code that will later be deployed to production,
>>>> >> that code tries to set dfs.replication to 3, 10, 50, or other values
>>>> >> besides 2; those are the numbers the developers thought would fit the
>>>> >> production environment.
>>>> >>
>>>> >> Even though I have marked dfs.replication as final on the staging
>>>> >> cluster, every time I run fsck there it still reports
>>>> >> under-replicated blocks. I thought the final keyword would prevent
>>>> >> job configs from overriding the value, but fsck suggests otherwise.
>>>> >>
>>>> >> I am on cdh3u4.
>>>> >>
>>>> >> Please suggest.
>>>> >> Patai
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>



-- 
Harsh J
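
The failure mode in the stack trace quoted above (FSNamesystem.verifyReplication rejecting a request over dfs.replication.max) can be sketched with a small Python model. This is an illustration of the server-side check, not Hadoop's real code; names other than the logged message are invented:

```python
# Sketch of the NameNode-side check from the stack trace above: an
# explicit replication argument bypasses client config entirely, but
# dfs.replication.max is enforced on the server for every request.
class ReplicationError(IOError):
    pass

def verify_replication(requested: int, max_replication: int) -> int:
    """Mimics FSNamesystem.verifyReplication: reject requests above the max."""
    if requested > max_replication:
        raise ReplicationError(
            f"Requested replication {requested} exceeds maximum {max_replication}")
    return requested

# With dfs.replication.max=2 on the staging NameNode, a job submission
# asking for replication 10 (mapred.submit.replication) fails, as in the log:
try:
    verify_replication(10, 2)
except ReplicationError as e:
    print(e)  # Requested replication 10 exceeds maximum 2

verify_replication(2, 2)  # a request within the limit succeeds
```

This is why lowering mapred.submit.replication to 2, as done above, lets job submission succeed again: the job.jar upload then requests a replication the NameNode will accept.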

>>>> >> 2; the number that developer thought that will fit in production
>>>> >> environment.
>>>> >>
>>>> >> Even though I final the property dfs.replication in staging cluster
>>>> >> already. every time i run fsck on the staging cluster i still see it
>>>> >> said under replication.
>>>> >> I thought final keyword will not honor value in job config, but it
>>>> >> doesn't seem so when i run fsck.
>>>> >>
>>>> >> I am on cdh3u4.
>>>> >>
>>>> >> please suggest.
>>>> >> Patai
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>



-- 
Harsh J
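Harsh's point about `<final>` versus the FileSystem API can be illustrated with a small sketch. This is a simplified model of Hadoop's configuration merge, not the actual `Configuration` class: resources marked final ignore later overrides, yet an explicit replication factor passed through the FileSystem API never goes through this merge at all, which is why `dfs.replication.max` is the real enforcement point.

```python
# Simplified model of Hadoop's <final> configuration semantics.
# Illustrative sketch only -- property names come from the thread,
# but this is not Hadoop's actual org.apache.hadoop.conf.Configuration.

class Configuration:
    def __init__(self):
        self._props = {}
        self._final = set()

    def add_resource(self, props):
        """Load one config resource (e.g. hdfs-site.xml, then job config).
        Later resources override earlier ones, except final properties."""
        for name, (value, is_final) in props.items():
            if name in self._final:
                continue  # final properties ignore later overrides
            self._props[name] = value
            if is_final:
                self._final.add(name)

    def get(self, name, default=None):
        return self._props.get(name, default)

# Cluster-side hdfs-site.xml: dfs.replication = 2, marked final.
conf = Configuration()
conf.add_resource({"dfs.replication": ("2", True)})

# A job config trying dfs.replication = 10 is ignored by the merge...
conf.add_resource({"dfs.replication": ("10", False)})
print(conf.get("dfs.replication"))  # prints 2
```

...but a client calling `FileSystem.create()` or `setReplication()` with an explicit short value bypasses this merge entirely, so only the NameNode-side `dfs.replication.max` limit can stop it.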

Re: final the dfs.replication and fsck

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Just want to share & check whether this makes sense.

Jobs failed to run after I restarted the namenode, even though the
cluster stopped complaining about under-replication.

This is what I found in the log file:

Requested replication 10 exceeds maximum 2
java.io.IOException: file
/tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
Requested replication 10 exceeds maximum 2
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143


So I scanned through the XML config files, took a guess and changed
<name>mapred.submit.replication</name> from 10 to 2, and restarted again.

That's when jobs started running again.
Hopefully that change makes sense.


Thanks
Patai

On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Thanks Harsh, dfs.replication.max does do the magic!!
>
> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
>> Thank you, Harsh.  I did not know about dfs.replication.max.
>>
>>
>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hey Chris,
>>>
>>> The dfs.replication param is an exception to the <final> config
>>> feature. If one uses the FileSystem API, one can pass in any short
>>> value they want the replication to be. This bypasses the
>>> configuration, and the configuration (being per-file) is also client
>>> sided.
>>>
>>> The right way for an administrator to enforce a "max" replication
>>> value at a create/setRep level, would be to set
>>> the dfs.replication.max to a desired value at the NameNode and restart
>>> it.
>>>
>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>> <cn...@hortonworks.com> wrote:
>>> > Hello Patai,
>>> >
>>> > Has your configuration file change been copied to all nodes in the
>>> > cluster?
>>> >
>>> > Are there applications connecting from outside of the cluster?  If so,
>>> > then
>>> > those clients could have separate configuration files or code setting
>>> > dfs.replication (and other configuration properties).  These would not
>>> > be
>>> > limited by final declarations in the cluster's configuration files.
>>> > <final>true</final> controls configuration file resource loading, but it
>>> > does not necessarily block different nodes or different applications
>>> > from
>>> > running with completely different configurations.
>>> >
>>> > Hope this helps,
>>> > --Chris
>>> >
>>> >
>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>> > <si...@gmail.com> wrote:
>>> >>
>>> >> Hi Hadoopers,
>>> >>
>>> >> I have
>>> >> <property>
>>> >>     <name>dfs.replication</name>
>>> >>     <value>2</value>
>>> >>     <final>true</final>
>>> >>   </property>
>>> >>
>>> >> set in hdfs-site.xml in staging environment cluster. while the staging
>>> >> cluster is running the code that will later be deployed in production,
>>> >> those code is trying to have dfs.replication of 3, 10, 50, other than
>>> >> 2; the number that developer thought that will fit in production
>>> >> environment.
>>> >>
>>> >> Even though I final the property dfs.replication in staging cluster
>>> >> already. every time i run fsck on the staging cluster i still see it
>>> >> said under replication.
>>> >> I thought final keyword will not honor value in job config, but it
>>> >> doesn't seem so when i run fsck.
>>> >>
>>> >> I am on cdh3u4.
>>> >>
>>> >> please suggest.
>>> >> Patai
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
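The stack trace above implies a server-side check along these lines. This is a hedged sketch of what `FSNamesystem.verifyReplication` does, not the actual Hadoop code: the NameNode rejects any requested replication outside the configured bounds, regardless of the client's settings, which is why the submit-time replication of 10 failed once the maximum was 2.

```python
# Hedged sketch of the NameNode-side replication check seen in the
# stack trace. Values mirror Patai's staging cluster; the function
# name echoes FSNamesystem.verifyReplication but this is not its code.

DFS_REPLICATION_MIN = 1
DFS_REPLICATION_MAX = 2  # dfs.replication.max on the staging cluster

def verify_replication(path, requested):
    """Reject replication requests outside [min, max], as the
    NameNode does for create() and setReplication() calls."""
    if requested > DFS_REPLICATION_MAX:
        raise IOError(f"file {path}. Requested replication "
                      f"{requested} exceeds maximum {DFS_REPLICATION_MAX}")
    if requested < DFS_REPLICATION_MIN:
        raise IOError(f"file {path}. Requested replication "
                      f"{requested} is below minimum {DFS_REPLICATION_MIN}")
    return requested

# mapred.submit.replication = 10 trips the check...
try:
    verify_replication("/tmp/staging/job.jar", 10)
except IOError as e:
    print(e)  # ... Requested replication 10 exceeds maximum 2

# ...while 2 (after Patai's fix) is accepted.
verify_replication("/tmp/staging/job.jar", 2)
```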

Re: final the dfs.replication and fsck

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Thanks Harsh, dfs.replication.max does do the magic!!

On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> Thank you, Harsh.  I did not know about dfs.replication.max.
>
>
> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hey Chris,
>>
>> The dfs.replication param is an exception to the <final> config
>> feature. If one uses the FileSystem API, one can pass in any short
>> value they want the replication to be. This bypasses the
>> configuration, and the configuration (being per-file) is also client
>> sided.
>>
>> The right way for an administrator to enforce a "max" replication
>> value at a create/setRep level, would be to set
>> the dfs.replication.max to a desired value at the NameNode and restart
>> it.
>>
>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>> <cn...@hortonworks.com> wrote:
>> > Hello Patai,
>> >
>> > Has your configuration file change been copied to all nodes in the
>> > cluster?
>> >
>> > Are there applications connecting from outside of the cluster?  If so,
>> > then
>> > those clients could have separate configuration files or code setting
>> > dfs.replication (and other configuration properties).  These would not
>> > be
>> > limited by final declarations in the cluster's configuration files.
>> > <final>true</final> controls configuration file resource loading, but it
>> > does not necessarily block different nodes or different applications
>> > from
>> > running with completely different configurations.
>> >
>> > Hope this helps,
>> > --Chris
>> >
>> >
>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>> > <si...@gmail.com> wrote:
>> >>
>> >> Hi Hadoopers,
>> >>
>> >> I have
>> >> <property>
>> >>     <name>dfs.replication</name>
>> >>     <value>2</value>
>> >>     <final>true</final>
>> >>   </property>
>> >>
>> >> set in hdfs-site.xml in staging environment cluster. while the staging
>> >> cluster is running the code that will later be deployed in production,
>> >> those code is trying to have dfs.replication of 3, 10, 50, other than
>> >> 2; the number that developer thought that will fit in production
>> >> environment.
>> >>
>> >> Even though I final the property dfs.replication in staging cluster
>> >> already. every time i run fsck on the staging cluster i still see it
>> >> said under replication.
>> >> I thought final keyword will not honor value in job config, but it
>> >> doesn't seem so when i run fsck.
>> >>
>> >> I am on cdh3u4.
>> >>
>> >> please suggest.
>> >> Patai
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>
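Putting the thread's two fixes together, the staging cluster's config would look roughly like the fragment below (a sketch; adjust values to your cluster, and note that changing `dfs.replication.max` requires a NameNode restart):

```xml
<!-- hdfs-site.xml on the staging NameNode: hard cap on replication,
     enforced server-side regardless of client configs -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>

<!-- mapred-site.xml: keep job-jar replication within that cap,
     otherwise job submission fails as shown in the stack trace -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```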

>> > <si...@gmail.com> wrote:
>> >>
>> >> Hi Hadoopers,
>> >>
>> >> I have
>> >> <property>
>> >>     <name>dfs.replication</name>
>> >>     <value>2</value>
>> >>     <final>true</final>
>> >>   </property>
>> >>
>> >> set in hdfs-site.xml in staging environment cluster. while the staging
>> >> cluster is running the code that will later be deployed in production,
>> >> those code is trying to have dfs.replication of 3, 10, 50, other than
>> >> 2; the number that developer thought that will fit in production
>> >> environment.
>> >>
>> >> Even though I final the property dfs.replication in staging cluster
>> >> already. every time i run fsck on the staging cluster i still see it
>> >> said under replication.
>> >> I thought final keyword will not honor value in job config, but it
>> >> doesn't seem so when i run fsck.
>> >>
>> >> I am on cdh3u4.
>> >>
>> >> please suggest.
>> >> Patai
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>

Re: final the dfs.replication and fsck

Posted by Chris Nauroth <cn...@hortonworks.com>.
Thank you, Harsh.  I did not know about dfs.replication.max.

On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <ha...@cloudera.com> wrote:

> Hey Chris,
>
> The dfs.replication param is an exception to the <final> config
> feature. If one uses the FileSystem API, one can pass in any short
> value they want the replication to be. This bypasses the
> configuration, and the configuration (being per-file) is also client
> sided.
>
> The right way for an administrator to enforce a "max" replication
> value at a create/setRep level, would be to set
> the dfs.replication.max to a desired value at the NameNode and restart
> it.
>
> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
> <cn...@hortonworks.com> wrote:
> > Hello Patai,
> >
> > Has your configuration file change been copied to all nodes in the
> cluster?
> >
> > Are there applications connecting from outside of the cluster?  If so,
> then
> > those clients could have separate configuration files or code setting
> > dfs.replication (and other configuration properties).  These would not be
> > limited by final declarations in the cluster's configuration files.
> > <final>true</final> controls configuration file resource loading, but it
> > does not necessarily block different nodes or different applications from
> > running with completely different configurations.
> >
> > Hope this helps,
> > --Chris
> >
> >
> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
> > <si...@gmail.com> wrote:
> >>
> >> Hi Hadoopers,
> >>
> >> I have
> >> <property>
> >>     <name>dfs.replication</name>
> >>     <value>2</value>
> >>     <final>true</final>
> >>   </property>
> >>
> >> set in hdfs-site.xml in staging environment cluster. while the staging
> >> cluster is running the code that will later be deployed in production,
> >> those code is trying to have dfs.replication of 3, 10, 50, other than
> >> 2; the number that developer thought that will fit in production
> >> environment.
> >>
> >> Even though I final the property dfs.replication in staging cluster
> >> already. every time i run fsck on the staging cluster i still see it
> >> said under replication.
> >> I thought final keyword will not honor value in job config, but it
> >> doesn't seem so when i run fsck.
> >>
> >> I am on cdh3u4.
> >>
> >> please suggest.
> >> Patai
> >
> >
>
>
>
> --
> Harsh J
>

Re: final the dfs.replication and fsck

Posted by Harsh J <ha...@cloudera.com>.
Hey Chris,

The dfs.replication param is an exception to the <final> config
feature. Through the FileSystem API, a client can pass any short
value as the desired replication factor for a file. This bypasses the
configuration entirely; the setting (being per-file) is client-side
anyway.

The right way for an administrator to enforce a "max" replication
value at a create/setRep level, would be to set
the dfs.replication.max to a desired value at the NameNode and restart
it.
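
For illustration, the NameNode-side check that dfs.replication.max enables can be sketched roughly as follows. This is a simplified Python model of the behavior, not Hadoop's actual FSNamesystem code; the default of 512 and the function/exception names are illustrative:

```python
# Simplified model of the NameNode-side replication check that
# dfs.replication.max enables -- illustrative only, not Hadoop's code.

class ReplicationError(IOError):
    pass

def verify_replication(requested, max_rep=512, min_rep=1):
    """Reject create/setrep requests outside [min_rep, max_rep],
    mirroring errors like 'Requested replication 10 exceeds maximum 2'."""
    if requested > max_rep:
        raise ReplicationError(
            "Requested replication %d exceeds maximum %d" % (requested, max_rep))
    if requested < min_rep:
        raise ReplicationError(
            "Requested replication %d is less than minimum %d" % (requested, min_rep))
    return requested
```

With the maximum left at a large default, a client asking for 10 replicas is accepted; with the maximum set to 2, the same request is rejected at create/setrep time rather than silently left under-replicated.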

On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
<cn...@hortonworks.com> wrote:
> Hello Patai,
>
> Has your configuration file change been copied to all nodes in the cluster?
>
> Are there applications connecting from outside of the cluster?  If so, then
> those clients could have separate configuration files or code setting
> dfs.replication (and other configuration properties).  These would not be
> limited by final declarations in the cluster's configuration files.
> <final>true</final> controls configuration file resource loading, but it
> does not necessarily block different nodes or different applications from
> running with completely different configurations.
>
> Hope this helps,
> --Chris
>
>
> On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>>
>> Hi Hadoopers,
>>
>> I have
>> <property>
>>     <name>dfs.replication</name>
>>     <value>2</value>
>>     <final>true</final>
>>   </property>
>>
>> set in hdfs-site.xml in staging environment cluster. while the staging
>> cluster is running the code that will later be deployed in production,
>> those code is trying to have dfs.replication of 3, 10, 50, other than
>> 2; the number that developer thought that will fit in production
>> environment.
>>
>> Even though I final the property dfs.replication in staging cluster
>> already. every time i run fsck on the staging cluster i still see it
>> said under replication.
>> I thought final keyword will not honor value in job config, but it
>> doesn't seem so when i run fsck.
>>
>> I am on cdh3u4.
>>
>> please suggest.
>> Patai
>
>



-- 
Harsh J

Re: final the dfs.replication and fsck

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Patai,

Has your configuration file change been copied to all nodes in the cluster?

Are there applications connecting from outside of the cluster?  If so, then
those clients could have separate configuration files or code setting
dfs.replication (and other configuration properties).  These would not be
limited by final declarations in the cluster's configuration files.
<final>true</final> controls configuration file resource loading, but it
does not necessarily block different nodes or different applications from
running with completely different configurations.

Hope this helps,
--Chris

On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum <
silvianhadoop@gmail.com> wrote:

> Hi Hadoopers,
>
> I have
> <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>     <final>true</final>
>   </property>
>
> set in hdfs-site.xml in staging environment cluster. while the staging
> cluster is running the code that will later be deployed in production,
> those code is trying to have dfs.replication of 3, 10, 50, other than
> 2; the number that developer thought that will fit in production
> environment.
>
> Even though I final the property dfs.replication in staging cluster
> already. every time i run fsck on the staging cluster i still see it
> said under replication.
> I thought final keyword will not honor value in job config, but it
> doesn't seem so when i run fsck.
>
> I am on cdh3u4.
>
> please suggest.
> Patai
>

Re: final the dfs.replication and fsck

Posted by Harsh J <ha...@cloudera.com>.
Hi Patai,

Set the dfs.replication.max parameter to 2 to achieve what you want.
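
Concretely, that would be a fragment like the following in the NameNode's hdfs-site.xml, mirroring the dfs.replication block quoted below (per the rest of the thread, the NameNode must be restarted for it to take effect):

```xml
<property>
    <name>dfs.replication.max</name>
    <value>2</value>
</property>
```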

On Tue, Oct 16, 2012 at 12:31 AM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Hi Hadoopers,
>
> I have
> <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>     <final>true</final>
>   </property>
>
> set in hdfs-site.xml in staging environment cluster. while the staging
> cluster is running the code that will later be deployed in production,
> those code is trying to have dfs.replication of 3, 10, 50, other than
> 2; the number that developer thought that will fit in production
> environment.
>
> Even though I final the property dfs.replication in staging cluster
> already. every time i run fsck on the staging cluster i still see it
> said under replication.
> I thought final keyword will not honor value in job config, but it
> doesn't seem so when i run fsck.
>
> I am on cdh3u4.
>
> please suggest.
> Patai



-- 
Harsh J
