Posted to hdfs-user@hadoop.apache.org by Demai Ni <ni...@gmail.com> on 2014/09/08 20:47:24 UTC

conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

hi, experts,

I am trying to get the local filesystem directory of a data node. My cluster
is running CDH5.x (Hadoop 2.3) with the default configuration, so the datanode
directory is file:///dfs/dn. I didn't specify the value in hdfs-site.xml.

My code is something like:

Configuration conf = new Configuration();

// tested both with and without the following two lines
conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));

// I also tried get("dfs.datanode.data.dir"), which also returns NULL
String dnDir = conf.get("dfs.data.dir");  // returns NULL

It looks like get() only reads the configuration files instead of
retrieving the info from the live cluster?

Many thanks for your help in advance.

Demai
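
For this specific lookup, a minimal sketch of a client-side workaround, assuming
the Hadoop 2.x HDFS jars are on the classpath; the fallback value below is only
the CDH default quoted above, not something read from the live cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DataDirProbe {
    public static void main(String[] args) {
        // Unlike a bare new Configuration(), HdfsConfiguration also registers
        // hdfs-default.xml and hdfs-site.xml from the classpath as resources.
        Configuration conf = new HdfsConfiguration();

        // Optional: point at a specific hdfs-site.xml (example path from above).
        conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));

        // Ask with an explicit default so an unset key does not come back null.
        // "file:///dfs/dn" is only the CDH default mentioned earlier (assumption).
        String dnDir = conf.get("dfs.datanode.data.dir", "file:///dfs/dn");
        System.out.println("dfs.datanode.data.dir = " + dnDir);
    }
}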

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
Bhooshan,

Many thanks. I appreciate the help. I will also try out the Cloudera mailing
list/community.

Demai

On Mon, Sep 8, 2014 at 4:58 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> conf = new Configuration()
>
> will create a new Configuration object and only add the properties from
> core-default.xml and core-site.xml in the conf object.
>
> This is basically a new configuration object, not the same that the
> daemons in the hadoop cluster use.
>
>
>
> I think what you are trying to ask is if you can get the Configuration
> object that a daemon in your live cluster (e.g. datanode) is using. I am
> not sure if the datanode or any other daemon on a hadoop cluster exposes
> such an API.
>
> I would in fact be tempted to get this information from the configuration
> management daemon instead - in your case cloudera manager. But I am not
> sure if CM exposes that API either. You could probably find out on the
> Cloudera mailing list.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, Bhooshan,
>>
>> thanks for your kind response.  I run the code on one of the data node of
>> my cluster, with only one hadoop daemon running. I believe my java client
>> code connect to the cluster correctly as I am able to retrieve fileStatus,
>> and list files under a particular hdfs path, and similar things...
>> However, you are right that the daemon process use the hdfs-site.xml under
>> another folder for cloudera :
>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>
>> about " retrieving the info from a live cluster", I would like to get the
>> information beyond the configuration files(that is beyond the .xml files).
>> Since I am able to use :
>> conf = new Configuration()
>> to connect to hdfs and did other operations, shouldn't I be able to
>> retrieve the configuration variables?
>>
>> Thanks
>>
>> Demai
>>
>>
>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> When you read a property from the conf object, it will only have a value
>>> if the conf object contains that property.
>>>
>>> In your case, you created the conf object as new Configuration() -- adds
>>> core-default and core-site.xml.
>>>
>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>>> locations. If none of these files have defined dfs.data.dir, then you will
>>> get NULL. This is expected behavior.
>>>
>>> What do you mean by retrieving the info from a live cluster? Even for
>>> processes like datanode, namenode etc, the source of truth for these
>>> properties is hdfs-site.xml. It is loaded from a specific location when you
>>> start these services.
>>>
>>> Question: Where are you running the above code? Is it on a node which
>>> has other hadoop daemons as well?
>>>
>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>>> properties are defined. Since this is a CDH cluster, you would probably be
>>> best served by asking on the CDH mailing list as to where the right path to
>>> these files is.
>>>
>>>
>>> HTH,
>>> Bhooshan
>>>
>>>
>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>>>
>>>> hi, experts,
>>>>
>>>> I am trying to get the local filesystem directory of data node. My
>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>>>> datanode is under file:///dfs/dn. I didn't specify the value in
>>>> hdfs-site.xml.
>>>>
>>>> My code is something like:
>>>>
>>>> conf = new Configuration()
>>>>
>>>> // test both with and without the following two lines
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>
>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>>>
>>>> It looks like the get only look at the configuration file instead of
>>>> retrieving the info from the live cluster?
>>>>
>>>> Many thanks for your help in advance.
>>>>
>>>> Demai
>>>>
>>>
>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
I am interested in job-related configuration properties.

I have a mix of EC2 instance types - m1.small and m1.medium.

I am not clear which properties are server-side and which are
client-side in mapred-site.xml and yarn-site.xml.

I have edited the configuration on the resource manager node (an m1.medium
EC2 instance) and set yarn.app.mapreduce.am.resource.mb=256 (default is 1536),
mapreduce.map.memory.mb=256 (default is 1 GB),
mapreduce.reduce.memory.mb=256 (default is 1 GB),
mapreduce.map.speculative=false (default is true),
mapreduce.job.reduce.slowstart.completedmaps=0.8 (default is 0.05),
and some more..

When I look at the job's conf.xml under the HDFS directory
/tmp/hadoop-yarn/staging/<user>/.staging/<job id>/<job id>_conf.xml
I see that some values are picked up and some are not.
mapreduce.map.memory.mb and mapreduce.reduce.memory.mb show the modified values,
but yarn.app.mapreduce.am.resource.mb still has the default value of 1536,
mapreduce.map.speculative still has the default value of true, and
mapreduce.job.reduce.slowstart.completedmaps still has the default value of 0.05.

To force these new values, I am passing the properties from the
client with the command
hadoop jar <jar name> <main class> \
-D mapreduce.job.reduce.slowstart.completedmaps=0.80 \
-D mapreduce.map.speculative=false \

There is no good document that distinguishes client-side properties
from server-side properties.
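
One thing worth checking on the client side: hadoop jar simply invokes the main
class, so the -D key=value pairs above only reach the job's Configuration if the
driver parses them as generic options, e.g. via ToolRunner. A rough sketch of
such a driver (the class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver; the point is the Tool/ToolRunner plumbing.
public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D key=value pairs parsed by ToolRunner.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "my-job");
        job.setJarByClass(MyJobDriver.class);
        // ... set mapper/reducer and input/output paths from args ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -conf, -files, ...) before
        // handing the remaining args to run().
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}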

TIA
Susheel Kumar

On 9/9/14, java8964 <ja...@hotmail.com> wrote:
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
> Remember, the cluster is made of set of computers, and in hadoop, there are
> hdfs xml, mapred xml and even yarn xml.
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
> Yong
>
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't
> set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com>
> wrote:
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
>> Hi Demai,
>>
>> conf = new Configuration()
>>
>> will create a new Configuration object and only add the properties from
>> core-default.xml and core-site.xml in the conf object.
>>
>> This is basically a new configuration object, not the same that the daemons
>> in the hadoop cluster use.
>>
>> I think what you are trying to ask is if you can get the Configuration
>> object that a daemon in your live cluster (e.g. datanode) is using. I am
>> not sure if the datanode or any other daemon on a hadoop cluster exposes
>> such an API.
>>
>> I would in fact be tempted to get this information from the configuration
>> management daemon instead - in your case cloudera manager. But I am not
>> sure if CM exposes that API either. You could probably find out on the
>> Cloudera mailing list.
>>
>> HTH,
>> Bhooshan
>>
>> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>>
>>> hi, Bhooshan,
>>>
>>> thanks for your kind response.  I run the code on one of the data node of
>>> my cluster, with only one hadoop daemon running. I believe my java client
>>> code connect to the cluster correctly as I am able to retrieve fileStatus,
>>> and list files under a particular hdfs path, and similar things...
>>> However, you are right that the daemon process use the hdfs-site.xml under
>>> another folder for cloudera :
>>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>>
>>> about " retrieving the info from a live cluster", I would like to get the
>>> information beyond the configuration files(that is beyond the .xml files).
>>> Since I am able to use :
>>> conf = new Configuration()
>>> to connect to hdfs and did other operations, shouldn't I be able to
>>> retrieve the configuration variables?
>>>
>>> Thanks
>>>
>>> Demai
>>>
>>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
>>> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> When you read a property from the conf object, it will only have a value
>>>> if the conf object contains that property.
>>>>
>>>> In your case, you created the conf object as new Configuration() -- adds
>>>> core-default and core-site.xml.
>>>>
>>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>>>> locations. If none of these files have defined dfs.data.dir, then you will
>>>> get NULL. This is expected behavior.
>>>>
>>>> What do you mean by retrieving the info from a live cluster? Even for
>>>> processes like datanode, namenode etc, the source of truth for these
>>>> properties is hdfs-site.xml. It is loaded from a specific location when you
>>>> start these services.
>>>>
>>>> Question: Where are you running the above code? Is it on a node which has
>>>> other hadoop daemons as well?
>>>>
>>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>>>> properties are defined. Since this is a CDH cluster, you would probably be
>>>> best served by asking on the CDH mailing list as to where the right path
>>>> to these files is.
>>>>
>>>> HTH,
>>>> Bhooshan
>>>>
>>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>>>>
>>>>> hi, experts,
>>>>>
>>>>> I am trying to get the local filesystem directory of data node. My
>>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
>>>>> the datanode is under file:///dfs/dn. I didn't specify the value in
>>>>> hdfs-site.xml.
>>>>>
>>>>> My code is something like:
>>>>>
>>>>> conf = new Configuration()
>>>>>
>>>>> // test both with and without the following two lines
>>>>> conf.addResource (new
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>>> conf.addResource (new
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>>
>>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>>>>
>>>>> It looks like the get only look at the configuration file instead of
>>>>> retrieving the info from the live cluster?
>>>>>
>>>>> Many thanks for your help in advance.
>>>>>
>>>>> Demai
>>>>>
>>>>
>>>>
>>>> --
>>>> Bhooshan
>>>>
>>>
>>>
>>
>> --
>> Bhooshan
>>
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Yup, I agree. That makes a lot of sense. There may be cluster-wide
configuration settings in core-default and core-site xmls and they will
always be available in new Configuration objects. However, nothing prevents
users from having different values of these settings on different nodes,
and so, there's no guarantee that configuration objects created using new
Configuration() will be uniform throughout.

Demai, it may be useful to know more about your use case, since even if you
have the 'right' hdfs-site.xml, it will be 'current' only for the node
where you are running. So your application could potentially behave
differently on different nodes of a hadoop cluster.
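
If the goal is only to see what one particular datanode is actually running
with, a rough sketch is to read that node's own hdfs-site.xml directly. This
assumes the code runs on that datanode and can read the Cloudera Manager
process directory you mentioned earlier; the numbered directory name below is
from that mail and will differ per deployment and restart:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class LocalDataNodeConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Path taken from earlier in this thread; the "90-hdfs-DATANODE" part
        // is specific to one CM deployment and changes across restarts.
        conf.addResource(new Path(
            "/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml"));

        // dfs.data.dir is the deprecated name; dfs.datanode.data.dir is current.
        String dnDirs = conf.get("dfs.datanode.data.dir");
        System.out.println("dfs.datanode.data.dir = " + dnDirs);
    }
}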


On Tue, Sep 9, 2014 at 12:03 PM, java8964 <ja...@hotmail.com> wrote:

> Even the "dfs.data.dir" could be containing different values on different
> data nodes. So it doesn't make sense for a remote client to ask that value
> from the cluster, as which value from which data node should be given back?
>
> Some data nodes could have 4 disks to be used for "dfs.data.dir", some
> data nodes could have more.
>
> If you really think about it, it could be only block size needs to be one
> value across the whole cluster.
>
> The configuration values of the only CURRENT node makes sense for the
> applications running in that node, maybe what's why you can get a
> configuration object reference from the JobContext.
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 11:34:03 -0700
>
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Yong,
>
> good point about each node of the cluster could have different values in
> the .xml files, and probably true if the nodes have different role or
> hardware settings. so some of the configuration (like memory, heap) may not
> make sense to client at all.
>
> are some of the settings the same across the cluster? The one I am
> interested in at this moment is the folder(for local filesystem) for data
> node dir. I am thinking about doing some local read, so it will the very
> first step if I know where to read the data.
>
> Demai
>
> On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:
>
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>
>


-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Yup, I agree. That makes a lot of sense. There may be cluster-wide
configuration settings in core-default and core-site xmls and they will
always be available in new Configuration objects. However, nothing prevents
users from having different values of these settings on different nodes,
and so, there's no guarantee that configuration objects created using new
Configuration() will be uniform throughout.

Demai, it may be useful to know more about your usecase, since even if you
have the 'right' hdfs-site.xml, it will be 'current' only for the node
where you are running. So your application could potentially behave
differently on different nodes of a hadoop cluster.


On Tue, Sep 9, 2014 at 12:03 PM, java8964 <ja...@hotmail.com> wrote:

> Even the "dfs.data.dir" could be containing different values on different
> data nodes. So it doesn't make sense for a remote client to ask that value
> from the cluster, as which value from which data node should be given back?
>
> Some data nodes could have 4 disks to be used for "dfs.data.dir", some
> data nodes could have more.
>
> If you really think about it, it could be only block size needs to be one
> value across the whole cluster.
>
> The configuration values of the only CURRENT node makes sense for the
> applications running in that node, maybe what's why you can get a
> configuration object reference from the JobContext.
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 11:34:03 -0700
>
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Yong,
>
> good point about each node of the cluster could have different values in
> the .xml files, and probably true if the nodes have different role or
> hardware settings. so some of the configuration (like memory, heap) may not
> make sense to client at all.
>
> are some of the settings the same across the cluster? The one I am
> interested in at this moment is the folder(for local filesystem) for data
> node dir. I am thinking about doing some local read, so it will the very
> first step if I know where to read the data.
>
> Demai
>
> On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:
>
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>
>


-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Yup, I agree. That makes a lot of sense. There may be cluster-wide
configuration settings in core-default and core-site xmls and they will
always be available in new Configuration objects. However, nothing prevents
users from having different values of these settings on different nodes,
and so, there's no guarantee that configuration objects created using new
Configuration() will be uniform throughout.

Demai, it may be useful to know more about your usecase, since even if you
have the 'right' hdfs-site.xml, it will be 'current' only for the node
where you are running. So your application could potentially behave
differently on different nodes of a hadoop cluster.


On Tue, Sep 9, 2014 at 12:03 PM, java8964 <ja...@hotmail.com> wrote:

> Even the "dfs.data.dir" could be containing different values on different
> data nodes. So it doesn't make sense for a remote client to ask that value
> from the cluster, as which value from which data node should be given back?
>
> Some data nodes could have 4 disks to be used for "dfs.data.dir", some
> data nodes could have more.
>
> If you really think about it, it could be only block size needs to be one
> value across the whole cluster.
>
> The configuration values of the only CURRENT node makes sense for the
> applications running in that node, maybe what's why you can get a
> configuration object reference from the JobContext.
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 11:34:03 -0700
>
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Yong,
>
> good point about each node of the cluster could have different values in
> the .xml files, and probably true if the nodes have different role or
> hardware settings. so some of the configuration (like memory, heap) may not
> make sense to client at all.
>
> are some of the settings the same across the cluster? The one I am
> interested in at this moment is the folder(for local filesystem) for data
> node dir. I am thinking about doing some local read, so it will the very
> first step if I know where to read the data.
>
> Demai
>
> On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:
>
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>
>


-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Yup, I agree. That makes a lot of sense. There may be cluster-wide
configuration settings in core-default and core-site xmls and they will
always be available in new Configuration objects. However, nothing prevents
users from having different values of these settings on different nodes,
and so, there's no guarantee that configuration objects created using new
Configuration() will be uniform throughout.

Demai, it may be useful to know more about your usecase, since even if you
have the 'right' hdfs-site.xml, it will be 'current' only for the node
where you are running. So your application could potentially behave
differently on different nodes of a hadoop cluster.


On Tue, Sep 9, 2014 at 12:03 PM, java8964 <ja...@hotmail.com> wrote:

> Even the "dfs.data.dir" could be containing different values on different
> data nodes. So it doesn't make sense for a remote client to ask that value
> from the cluster, as which value from which data node should be given back?
>
> Some data nodes could have 4 disks to be used for "dfs.data.dir", some
> data nodes could have more.
>
> If you really think about it, it could be only block size needs to be one
> value across the whole cluster.
>
> The configuration values of the only CURRENT node makes sense for the
> applications running in that node, maybe what's why you can get a
> configuration object reference from the JobContext.
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 11:34:03 -0700
>
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Yong,
>
> good point about each node of the cluster could have different values in
> the .xml files, and probably true if the nodes have different role or
> hardware settings. so some of the configuration (like memory, heap) may not
> make sense to client at all.
>
> are some of the settings the same across the cluster? The one I am
> interested in at this moment is the folder(for local filesystem) for data
> node dir. I am thinking about doing some local read, so it will the very
> first step if I know where to read the data.
>
> Demai
>
> On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:
>
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>
>


-- 
Bhooshan

RE: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by java8964 <ja...@hotmail.com>.
Even the "dfs.data.dir" could be containing different values on different data nodes. So it doesn't make sense for a remote client to ask that value from the cluster, as which value from which data node should be given back?
Some data nodes could have 4 disks to be used for "dfs.data.dir", some data nodes could have more.
If you really think about it, it could be only block size needs to be one value across the whole cluster.
The configuration values of the only CURRENT node makes sense for the applications running in that node, maybe what's why you can get a configuration object reference from the JobContext.
Yong
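
To illustrate that last point, a rough sketch using the new MapReduce API (the
mapper class is made up; whether a given HDFS key is present depends on what
ended up in the job's configuration on that node):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper, only to show where the Configuration comes from.
public class ConfAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Context extends JobContext, so the task sees the Configuration that
        // was assembled for this job as it runs on this node.
        Configuration conf = context.getConfiguration();
        String dataDirs = conf.get("dfs.datanode.data.dir");  // may still be null
        System.err.println("dfs.datanode.data.dir as seen by this task: " + dataDirs);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... normal map logic ...
    }
}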

Date: Tue, 9 Sep 2014 11:34:03 -0700
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: nidmgg@gmail.com
To: user@hadoop.apache.org

Yong, 

good point about each node of the cluster could have different values in the .xml files, and probably true if the nodes have different role or hardware settings. so some of the configuration (like memory, heap) may not make sense to client at all. 

are some of the settings the same across the cluster? The one I am interested in at this moment is the folder(for local filesystem) for data node dir. I am thinking about doing some local read, so it will the very first step if I know where to read the data. 

Demai

On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:



The configuration in fact depends on the xml file. Not sure what kind of cluster configuration variables/values you are looking for.
Remember, the cluster is made of set of computers, and in hadoop, there are hdfs xml, mapred xml and even yarn xml.
Mapred.xml and yarn.xml are job related. Without concrete job, there is no detail configuration can be given.
About the HDFS configuration, there are a set of computers in the cluster. In theory, there is nothing wrong that each computer will have different configuration settings. Every computer could have different cpu cores, memory, disk counts, mount names etc. When you ask configuration variables/values, which one should be returned?
Yong

Date: Tue, 9 Sep 2014 10:01:14 -0700
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: nidmgg@gmail.com
To: user@hadoop.apache.org

Susheel actually brought up a good point. 

once the client code connects to the cluster, is there way to get the real cluster configuration variables/values instead of relying on the .xml files on client side? 

Demai

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
One doubt on building Configuration object.



I have a Hadoop remote client and Hadoop cluster.

When a client submitted a MR job, the Configuration object is built

from Hadoop cluster node xml files, basically the resource manager

node core-site.xml and mapred-site.xml and yarn-site.xml.

Am I correct?



TIA

Susheel Kumar



On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:

> Hi Demai,

>

> conf = new Configuration()

>

> will create a new Configuration object and only add the properties from

> core-default.xml and core-site.xml in the conf object.

>

> This is basically a new configuration object, not the same that the daemons

> in the hadoop cluster use.

>

>

>

> I think what you are trying to ask is if you can get the Configuration

> object that a daemon in your live cluster (e.g. datanode) is using. I am

> not sure if the datanode or any other daemon on a hadoop cluster exposes

> such an API.

>

> I would in fact be tempted to get this information from the configuration

> management daemon instead - in your case cloudera manager. But I am not

> sure if CM exposes that API either. You could probably find out on the

> Cloudera mailing list.

>

>

> HTH,

> Bhooshan

>

>

> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:

>

>> hi, Bhooshan,

>>

>> thanks for your kind response.  I run the code on one of the data node of

>> my cluster, with only one hadoop daemon running. I believe my java client

>> code connect to the cluster correctly as I am able to retrieve

>> fileStatus,

>> and list files under a particular hdfs path, and similar things...

>> However, you are right that the daemon process use the hdfs-site.xml

>> under

>> another folder for cloudera :

>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

>>

>> about " retrieving the info from a live cluster", I would like to get the

>> information beyond the configuration files(that is beyond the .xml

>> files).

>> Since I am able to use :

>> conf = new Configuration()

>> to connect to hdfs and did other operations, shouldn't I be able to

>> retrieve the configuration variables?

>>

>> Thanks

>>

>> Demai

>>

>>

>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>

>> wrote:

>>

>>> Hi Demai,

>>>

>>> When you read a property from the conf object, it will only have a value

>>> if the conf object contains that property.

>>>

>>> In your case, you created the conf object as new Configuration() -- adds

>>> core-default and core-site.xml.

>>>

>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific

>>> locations. If none of these files have defined dfs.data.dir, then you

>>> will

>>> get NULL. This is expected behavior.

>>>

>>> What do you mean by retrieving the info from a live cluster? Even for

>>> processes like datanode, namenode etc, the source of truth for these

>>> properties is hdfs-site.xml. It is loaded from a specific location when

>>> you

>>> start these services.

>>>

>>> Question: Where are you running the above code? Is it on a node which

>>> has

>>> other hadoop daemons as well?

>>>

>>> My guess is that the path you are referring to (/etc/hadoop/conf.

>>> cloudera.hdfs/core-site.xml) is not the right path where these config

>>> properties are defined. Since this is a CDH cluster, you would probably

>>> be

>>> best served by asking on the CDH mailing list as to where the right path

>>> to

>>> these files is.

>>>

>>>

>>> HTH,

>>> Bhooshan

>>>

>>>

>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

>>>

>>>> hi, experts,

>>>>

>>>> I am trying to get the local filesystem directory of data node. My

>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So

>>>> the

>>>> datanode is under file:///dfs/dn. I didn't specify the value in

>>>> hdfs-site.xml.

>>>>

>>>> My code is something like:

>>>>

>>>> conf = new Configuration()

>>>>

>>>> // test both with and without the following two lines

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));

>>>>

>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL

>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL

>>>>

>>>> It looks like the get only look at the configuration file instead of

>>>> retrieving the info from the live cluster?

>>>>

>>>> Many thanks for your help in advance.

>>>>

>>>> Demai

>>>>

>>>

>>>

>>>

>>> --

>>> Bhooshan

>>>

>>

>>

>

>

> --

> Bhooshan

>


 		 	   		  

 		 	   		  

RE: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by java8964 <ja...@hotmail.com>.
Even the "dfs.data.dir" could be containing different values on different data nodes. So it doesn't make sense for a remote client to ask that value from the cluster, as which value from which data node should be given back?
Some data nodes could have 4 disks to be used for "dfs.data.dir", some data nodes could have more.
If you really think about it, it could be only block size needs to be one value across the whole cluster.
The configuration values of the only CURRENT node makes sense for the applications running in that node, maybe what's why you can get a configuration object reference from the JobContext.
Yong

Date: Tue, 9 Sep 2014 11:34:03 -0700
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: nidmgg@gmail.com
To: user@hadoop.apache.org

Yong, 

good point about each node of the cluster could have different values in the .xml files, and probably true if the nodes have different role or hardware settings. so some of the configuration (like memory, heap) may not make sense to client at all. 

are some of the settings the same across the cluster? The one I am interested in at this moment is the folder(for local filesystem) for data node dir. I am thinking about doing some local read, so it will the very first step if I know where to read the data. 

Demai

On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:



The configuration in fact depends on the xml file. Not sure what kind of cluster configuration variables/values you are looking for.
Remember, the cluster is made of set of computers, and in hadoop, there are hdfs xml, mapred xml and even yarn xml.
Mapred.xml and yarn.xml are job related. Without concrete job, there is no detail configuration can be given.
About the HDFS configuration, there are a set of computers in the cluster. In theory, there is nothing wrong that each computer will have different configuration settings. Every computer could have different cpu cores, memory, disk counts, mount names etc. When you ask configuration variables/values, which one should be returned?
Yong

Date: Tue, 9 Sep 2014 10:01:14 -0700
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: nidmgg@gmail.com
To: user@hadoop.apache.org

Susheel actually brought up a good point. 

once the client code connects to the cluster, is there way to get the real cluster configuration variables/values instead of relying on the .xml files on client side? 

Demai

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
One doubt on building Configuration object.



I have a Hadoop remote client and Hadoop cluster.

When a client submitted a MR job, the Configuration object is built

from Hadoop cluster node xml files, basically the resource manager

node core-site.xml and mapred-site.xml and yarn-site.xml.

Am I correct?



TIA

Susheel Kumar



On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:

> Hi Demai,

>

> conf = new Configuration()

>

> will create a new Configuration object and only add the properties from

> core-default.xml and core-site.xml in the conf object.

>

> This is basically a new configuration object, not the same that the daemons

> in the hadoop cluster use.

>

>

>

> I think what you are trying to ask is if you can get the Configuration

> object that a daemon in your live cluster (e.g. datanode) is using. I am

> not sure if the datanode or any other daemon on a hadoop cluster exposes

> such an API.

>

> I would in fact be tempted to get this information from the configuration

> management daemon instead - in your case cloudera manager. But I am not

> sure if CM exposes that API either. You could probably find out on the

> Cloudera mailing list.

>

>

> HTH,

> Bhooshan

>

>

> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:

>

>> hi, Bhooshan,

>>

>> thanks for your kind response.  I run the code on one of the data node of

>> my cluster, with only one hadoop daemon running. I believe my java client

>> code connect to the cluster correctly as I am able to retrieve

>> fileStatus,

>> and list files under a particular hdfs path, and similar things...

>> However, you are right that the daemon process use the hdfs-site.xml

>> under

>> another folder for cloudera :

>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

>>

>> about " retrieving the info from a live cluster", I would like to get the

>> information beyond the configuration files(that is beyond the .xml

>> files).

>> Since I am able to use :

>> conf = new Configuration()

>> to connect to hdfs and did other operations, shouldn't I be able to

>> retrieve the configuration variables?

>>

>> Thanks

>>

>> Demai

>>

>>

>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>

>> wrote:

>>

>>> Hi Demai,

>>>

>>> When you read a property from the conf object, it will only have a value

>>> if the conf object contains that property.

>>>

>>> In your case, you created the conf object as new Configuration() -- adds

>>> core-default and core-site.xml.

>>>

>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific

>>> locations. If none of these files have defined dfs.data.dir, then you

>>> will

>>> get NULL. This is expected behavior.

>>>

>>> What do you mean by retrieving the info from a live cluster? Even for

>>> processes like datanode, namenode etc, the source of truth for these

>>> properties is hdfs-site.xml. It is loaded from a specific location when

>>> you

>>> start these services.

>>>

>>> Question: Where are you running the above code? Is it on a node which

>>> has

>>> other hadoop daemons as well?

>>>

>>> My guess is that the path you are referring to (/etc/hadoop/conf.

>>> cloudera.hdfs/core-site.xml) is not the right path where these config

>>> properties are defined. Since this is a CDH cluster, you would probably

>>> be

>>> best served by asking on the CDH mailing list as to where the right path

>>> to

>>> these files is.

>>>

>>>

>>> HTH,

>>> Bhooshan

>>>

>>>

>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

>>>

>>>> hi, experts,

>>>>

>>>> I am trying to get the local filesystem directory of data node. My

>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So

>>>> the

>>>> datanode is under file:///dfs/dn. I didn't specify the value in

>>>> hdfs-site.xml.

>>>>

>>>> My code is something like:

>>>>

>>>> conf = new Configuration()

>>>>

>>>> // test both with and without the following two lines

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));

>>>>

>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL

>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL

>>>>

>>>> It looks like the get only look at the configuration file instead of

>>>> retrieving the info from the live cluster?

>>>>

>>>> Many thanks for your help in advance.

>>>>

>>>> Demai

>>>>

>>>

>>>

>>>

>>> --

>>> Bhooshan

>>>

>>

>>

>

>

> --

> Bhooshan

>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
Yong,

good point that each node of the cluster could have different values in
its .xml files, which is probably true if the nodes have different roles or
hardware settings. So some of the configuration (like memory and heap) may
not make sense to a client at all.

Are some of the settings the same across the cluster? The one I am
interested in at the moment is the local-filesystem folder of the data
node dir. I am thinking about doing some local reads, so knowing where to
read the data is the very first step.
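
What I plan to try on my side is just a sketch along these lines (the Cloudera Manager path is the one from my earlier mail and will differ per cluster, and this still only reads xml files, not the live daemon):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DataDirProbe {
  public static void main(String[] args) {
    // HdfsConfiguration also pulls in hdfs-default.xml and hdfs-site.xml from the
    // classpath, unlike a plain Configuration which only loads the core files.
    Configuration conf = new HdfsConfiguration();
    // Point at the hdfs-site.xml the datanode process actually uses
    // (the CM-managed path mentioned earlier in this thread; adjust for your cluster).
    conf.addResource(new Path(
        "/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml"));
    // Comma-separated list of local directories, e.g. file:///dfs/dn
    System.out.println(conf.get("dfs.datanode.data.dir"));
  }
}

I may also look at the /conf page that the daemon web UIs expose (something like http://<datanode-host>:50075/conf, if I read the docs right), since that is supposed to dump the configuration the running process is actually using.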

Demai

On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:

> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
I am interested in job related configuration properties.

I have a mix of EC2 instance types - m1.small and m1.medium.

I am not clear which properties in mapred-site.xml and yarn-site.xml are
server-side and which are client-side.

I have edited the resource manager node (an m1.medium EC2 instance) and
set yarn.app.mapreduce.am.resource.mb=256 (default is 1536),
mapreduce.map.memory.mb=256 (default is 1 GB),
mapreduce.reduce.memory.mb=256 (default is 1 GB),
mapreduce.map.speculative=false (default is true),
mapreduce.job.reduce.slowstart.completedmaps=0.8 (default is 0.05),
and some more.

When I look at the job's conf xml under the HDFS directory
/tmp/hadoop-yarn/staging/<user>/.staging/<job id>/<job id>_conf.xml,
I see that some values are picked up and some are not.
mapreduce.map.memory.mb and mapreduce.reduce.memory.mb show the modified values,
but yarn.app.mapreduce.am.resource.mb still shows the default value of 1536,
mapreduce.map.speculative still shows the default value of true,
and mapreduce.job.reduce.slowstart.completedmaps still shows the default value of 0.05.

To force these new values I am passing the properties from the client
with a command like:
hadoop jar <jar name> <main class> \
-D mapreduce.job.reduce.slowstart.completedmaps=0.80 \
-D mapreduce.map.speculative=false \

There is no good document that clearly distinguishes client-side
properties from server-side properties.
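
One thing I still need to double-check on my side, in case it matters here: as far as I understand, the -D options passed to hadoop jar are only applied if the main class hands its arguments to GenericOptionsParser, for example by implementing Tool and going through ToolRunner. A minimal sketch of such a driver (class and job names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -D key=value options from the command line,
    // because ToolRunner passed them through GenericOptionsParser before calling run().
    Job job = Job.getInstance(getConf(), "my-job");
    job.setJarByClass(MyJobDriver.class);
    // ... set mapper/reducer classes and input/output paths here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
  }
}

And if I understand the submission path right, yarn.app.mapreduce.am.resource.mb is read from the job configuration built on the submitting client, so editing only the resource manager node's mapred-site.xml may not be enough for that one.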

TIA
Susheel Kumar

On 9/9/14, java8964 <ja...@hotmail.com> wrote:
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
> Remember, the cluster is made of set of computers, and in hadoop, there are
> hdfs xml, mapred xml and even yarn xml.
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
> Yong
>
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't
> set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com>
> wrote:
> One doubt on building Configuration object.
>
>
>
> I have a Hadoop remote client and Hadoop cluster.
>
> When a client submitted a MR job, the Configuration object is built
>
> from Hadoop cluster node xml files, basically the resource manager
>
> node core-site.xml and mapred-site.xml and yarn-site.xml.
>
> Am I correct?
>
>
>
> TIA
>
> Susheel Kumar
>
>
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
>
>> Hi Demai,
>
>>
>
>> conf = new Configuration()
>
>>
>
>> will create a new Configuration object and only add the properties from
>
>> core-default.xml and core-site.xml in the conf object.
>
>>
>
>> This is basically a new configuration object, not the same that the
>> daemons
>
>> in the hadoop cluster use.
>
>>
>
>>
>
>>
>
>> I think what you are trying to ask is if you can get the Configuration
>
>> object that a daemon in your live cluster (e.g. datanode) is using. I am
>
>> not sure if the datanode or any other daemon on a hadoop cluster exposes
>
>> such an API.
>
>>
>
>> I would in fact be tempted to get this information from the configuration
>
>> management daemon instead - in your case cloudera manager. But I am not
>
>> sure if CM exposes that API either. You could probably find out on the
>
>> Cloudera mailing list.
>
>>
>
>>
>
>> HTH,
>
>> Bhooshan
>
>>
>
>>
>
>> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>
>>
>
>>> hi, Bhooshan,
>
>>>
>
>>> thanks for your kind response.  I run the code on one of the data node
>>> of
>
>>> my cluster, with only one hadoop daemon running. I believe my java
>>> client
>
>>> code connect to the cluster correctly as I am able to retrieve
>
>>> fileStatus,
>
>>> and list files under a particular hdfs path, and similar things...
>
>>> However, you are right that the daemon process use the hdfs-site.xml
>
>>> under
>
>>> another folder for cloudera :
>
>>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>
>>>
>
>>> about " retrieving the info from a live cluster", I would like to get
>>> the
>
>>> information beyond the configuration files(that is beyond the .xml
>
>>> files).
>
>>> Since I am able to use :
>
>>> conf = new Configuration()
>
>>> to connect to hdfs and did other operations, shouldn't I be able to
>
>>> retrieve the configuration variables?
>
>>>
>
>>> Thanks
>
>>>
>
>>> Demai
>
>>>
>
>>>
>
>>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal
>>> <bh...@gmail.com>
>
>>> wrote:
>
>>>
>
>>>> Hi Demai,
>
>>>>
>
>>>> When you read a property from the conf object, it will only have a
>>>> value
>
>>>> if the conf object contains that property.
>
>>>>
>
>>>> In your case, you created the conf object as new Configuration() --
>>>> adds
>
>>>> core-default and core-site.xml.
>
>>>>
>
>>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
>>>> specific
>
>>>> locations. If none of these files have defined dfs.data.dir, then you
>
>>>> will
>
>>>> get NULL. This is expected behavior.
>
>>>>
>
>>>> What do you mean by retrieving the info from a live cluster? Even for
>
>>>> processes like datanode, namenode etc, the source of truth for these
>
>>>> properties is hdfs-site.xml. It is loaded from a specific location when
>
>>>> you
>
>>>> start these services.
>
>>>>
>
>>>> Question: Where are you running the above code? Is it on a node which
>
>>>> has
>
>>>> other hadoop daemons as well?
>
>>>>
>
>>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>
>>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>
>>>> properties are defined. Since this is a CDH cluster, you would probably
>
>>>> be
>
>>>> best served by asking on the CDH mailing list as to where the right
>>>> path
>
>>>> to
>
>>>> these files is.
>
>>>>
>
>>>>
>
>>>> HTH,
>
>>>> Bhooshan
>
>>>>
>
>>>>
>
>>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>>>>
>
>>>>> hi, experts,
>
>>>>>
>
>>>>> I am trying to get the local filesystem directory of data node. My
>
>>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
>
>>>>> the
>
>>>>> datanode is under file:///dfs/dn. I didn't specify the value in
>
>>>>> hdfs-site.xml.
>
>>>>>
>
>>>>> My code is something like:
>
>>>>>
>
>>>>> conf = new Configuration()
>
>>>>>
>
>>>>> // test both with and without the following two lines
>
>>>>> conf.addResource (new
>
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>
>>>>> conf.addResource (new
>
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
>>>>>
>
>>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>
>>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
>>>>>
>
>>>>> It looks like the get only look at the configuration file instead of
>
>>>>> retrieving the info from the live cluster?
>
>>>>>
>
>>>>> Many thanks for your help in advance.
>
>>>>>
>
>>>>> Demai
>
>>>>>
>
>>>>
>
>>>>
>
>>>>
>
>>>> --
>
>>>> Bhooshan
>
>>>>
>
>>>
>
>>>
>
>>
>
>>
>
>> --
>
>> Bhooshan
>
>>
>
>
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
Yong,

good point about each node of the cluster could have different values in
the .xml files, and probably true if the nodes have different role or
hardware settings. so some of the configuration (like memory, heap) may not
make sense to client at all.

are some of the settings the same across the cluster? The one I am
interested in at this moment is the folder(for local filesystem) for data
node dir. I am thinking about doing some local read, so it will the very
first step if I know where to read the data.

Demai

On Tue, Sep 9, 2014 at 11:13 AM, java8964 <ja...@hotmail.com> wrote:

> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
>
> Remember, the cluster is made of set of computers, and in hadoop, there
> are hdfs xml, mapred xml and even yarn xml.
>
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
>
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
>
> Yong
>
> ------------------------------
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <
> skgadalay@gmail.com> wrote:
>
> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
>
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
I am interested in job-related configuration properties.

I have a mix of EC2 instance types: m1.small and m1.medium.

I am not clear which properties in mapred-site.xml and yarn-site.xml are
server-side and which are client-side.

I have edited mapred-site.xml and yarn-site.xml on the resource manager node
(an m1.medium EC2 instance) and set
yarn.app.mapreduce.am.resource.mb=256 (default is 1536),
mapreduce.map.memory.mb=256 (default is 1024),
mapreduce.reduce.memory.mb=256 (default is 1024),
mapreduce.map.speculative=false (default is true),
mapreduce.job.reduce.slowstart.completedmaps=0.8 (default is 0.05),
and some more.

When I look at the job's conf.xml under the HDFS directory
/tmp/hadoop-yarn/staging/<user>/.staging/<job id>/<job id>_conf.xml,
I see that some of the new values are picked up and some are not:
mapreduce.map.memory.mb and mapreduce.reduce.memory.mb show the modified values,
but yarn.app.mapreduce.am.resource.mb still shows the default of 1536,
mapreduce.map.speculative still shows the default of true, and
mapreduce.job.reduce.slowstart.completedmaps still shows the default of 0.05.
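
(A quick way to see which resource actually supplied each value on the client
side is Configuration.getPropertySources(); a small sketch, with illustrative
resource names and keys:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

public class ConfSourceCheck {
    public static void main(String[] args) {
        // Client-side view only: core-default/core-site plus whatever
        // *-site.xml files are visible on this classpath.
        Configuration conf = new Configuration();
        conf.addResource("mapred-site.xml");
        conf.addResource("yarn-site.xml");

        String[] keys = {
            "mapreduce.map.memory.mb",
            "yarn.app.mapreduce.am.resource.mb",
            "mapreduce.job.reduce.slowstart.completedmaps"
        };
        for (String key : keys) {
            String[] sources = conf.getPropertySources(key);
            // Values may be null if nothing on the client classpath sets them.
            System.out.println(key + " = " + conf.get(key) + "  (source: "
                    + (sources == null ? "unset/default" : Arrays.toString(sources)) + ")");
        }
    }
}

If a key reports no source from a site file, nothing on the client classpath set
it, which would explain why the job's conf.xml still shows the stock value.)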

To force the new values I am passing the properties from the client with
the command
hadoop jar <jar name> <main class> \
  -D mapreduce.job.reduce.slowstart.completedmaps=0.80 \
  -D mapreduce.map.speculative=false \
  ...

There does not seem to be a good document that explains which properties are
client-side and which are server-side.
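
For the -D overrides to take effect, as far as I can tell, the driver class has
to run through ToolRunner and its GenericOptionsParser; otherwise the -D
arguments are silently dropped and the job keeps whatever the client-side xml
files say. A minimal driver sketch, with an illustrative class name and job
setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides, merged on top of the
        // *-site.xml files found on the client classpath.
        Job job = Job.getInstance(getConf(), "my job");
        job.setJarByClass(MyDriver.class);
        // ... set mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}

Run as: hadoop jar myjob.jar MyDriver -D mapreduce.map.speculative=false <input> <output>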

TIA
Susheel Kumar

On 9/9/14, java8964 <ja...@hotmail.com> wrote:
> The configuration in fact depends on the xml file. Not sure what kind of
> cluster configuration variables/values you are looking for.
> Remember, the cluster is made of set of computers, and in hadoop, there are
> hdfs xml, mapred xml and even yarn xml.
> Mapred.xml and yarn.xml are job related. Without concrete job, there is no
> detail configuration can be given.
> About the HDFS configuration, there are a set of computers in the cluster.
> In theory, there is nothing wrong that each computer will have different
> configuration settings. Every computer could have different cpu cores,
> memory, disk counts, mount names etc. When you ask configuration
> variables/values, which one should be returned?
> Yong
>
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't
> set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Susheel actually brought up a good point.
>
> once the client code connects to the cluster, is there way to get the real
> cluster configuration variables/values instead of relying on the .xml files
> on client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com>
> wrote:
> One doubt on building Configuration object.
>
>
>
> I have a Hadoop remote client and Hadoop cluster.
>
> When a client submitted a MR job, the Configuration object is built
>
> from Hadoop cluster node xml files, basically the resource manager
>
> node core-site.xml and mapred-site.xml and yarn-site.xml.
>
> Am I correct?
>
>
>
> TIA
>
> Susheel Kumar
>
>
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
>
>> Hi Demai,
>
>>
>
>> conf = new Configuration()
>
>>
>
>> will create a new Configuration object and only add the properties from
>
>> core-default.xml and core-site.xml in the conf object.
>
>>
>
>> This is basically a new configuration object, not the same that the
>> daemons
>
>> in the hadoop cluster use.
>
>>
>
>>
>
>>
>
>> I think what you are trying to ask is if you can get the Configuration
>
>> object that a daemon in your live cluster (e.g. datanode) is using. I am
>
>> not sure if the datanode or any other daemon on a hadoop cluster exposes
>
>> such an API.
>
>>
>
>> I would in fact be tempted to get this information from the configuration
>
>> management daemon instead - in your case cloudera manager. But I am not
>
>> sure if CM exposes that API either. You could probably find out on the
>
>> Cloudera mailing list.
>
>>
>
>>
>
>> HTH,
>
>> Bhooshan
>
>>
>
>>
>
>> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>
>>
>
>>> hi, Bhooshan,
>
>>>
>
>>> thanks for your kind response.  I run the code on one of the data node
>>> of
>
>>> my cluster, with only one hadoop daemon running. I believe my java
>>> client
>
>>> code connect to the cluster correctly as I am able to retrieve
>
>>> fileStatus,
>
>>> and list files under a particular hdfs path, and similar things...
>
>>> However, you are right that the daemon process use the hdfs-site.xml
>
>>> under
>
>>> another folder for cloudera :
>
>>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>
>>>
>
>>> about " retrieving the info from a live cluster", I would like to get
>>> the
>
>>> information beyond the configuration files(that is beyond the .xml
>
>>> files).
>
>>> Since I am able to use :
>
>>> conf = new Configuration()
>
>>> to connect to hdfs and did other operations, shouldn't I be able to
>
>>> retrieve the configuration variables?
>
>>>
>
>>> Thanks
>
>>>
>
>>> Demai
>
>>>
>
>>>
>
>>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal
>>> <bh...@gmail.com>
>
>>> wrote:
>
>>>
>
>>>> Hi Demai,
>
>>>>
>
>>>> When you read a property from the conf object, it will only have a
>>>> value
>
>>>> if the conf object contains that property.
>
>>>>
>
>>>> In your case, you created the conf object as new Configuration() --
>>>> adds
>
>>>> core-default and core-site.xml.
>
>>>>
>
>>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
>>>> specific
>
>>>> locations. If none of these files have defined dfs.data.dir, then you
>
>>>> will
>
>>>> get NULL. This is expected behavior.
>
>>>>
>
>>>> What do you mean by retrieving the info from a live cluster? Even for
>
>>>> processes like datanode, namenode etc, the source of truth for these
>
>>>> properties is hdfs-site.xml. It is loaded from a specific location when
>
>>>> you
>
>>>> start these services.
>
>>>>
>
>>>> Question: Where are you running the above code? Is it on a node which
>
>>>> has
>
>>>> other hadoop daemons as well?
>
>>>>
>
>>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>
>>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>
>>>> properties are defined. Since this is a CDH cluster, you would probably
>
>>>> be
>
>>>> best served by asking on the CDH mailing list as to where the right
>>>> path
>
>>>> to
>
>>>> these files is.
>
>>>>
>
>>>>
>
>>>> HTH,
>
>>>> Bhooshan
>
>>>>
>
>>>>
>
>>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>>>>
>
>>>>> hi, experts,
>
>>>>>
>
>>>>> I am trying to get the local filesystem directory of data node. My
>
>>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
>
>>>>> the
>
>>>>> datanode is under file:///dfs/dn. I didn't specify the value in
>
>>>>> hdfs-site.xml.
>
>>>>>
>
>>>>> My code is something like:
>
>>>>>
>
>>>>> conf = new Configuration()
>
>>>>>
>
>>>>> // test both with and without the following two lines
>
>>>>> conf.addResource (new
>
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>
>>>>> conf.addResource (new
>
>>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
>>>>>
>
>>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>
>>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
>>>>>
>
>>>>> It looks like the get only look at the configuration file instead of
>
>>>>> retrieving the info from the live cluster?
>
>>>>>
>
>>>>> Many thanks for your help in advance.
>
>>>>>
>
>>>>> Demai
>
>>>>>
>
>>>>
>
>>>>
>
>>>>
>
>>>> --
>
>>>> Bhooshan
>
>>>>
>
>>>
>
>>>
>
>>
>
>>
>
>> --
>
>> Bhooshan
>
>>
>
>
>

RE: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by java8964 <ja...@hotmail.com>.
The configuration really does come from the xml files. I am not sure what kind of cluster configuration variables/values you are looking for.
Remember, the cluster is made of a set of computers, and in Hadoop there are hdfs, mapred and even yarn xml files.
The mapred and yarn xml files are job related; without a concrete job, no detailed configuration can be given.
As for the HDFS configuration, there is a set of computers in the cluster, and in theory there is nothing wrong with each computer having different configuration settings. Every computer could have different cpu cores, memory, disk counts, mount names, etc. When you ask for the configuration variables/values, which one should be returned?
Yong

Date: Tue, 9 Sep 2014 10:01:14 -0700
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: nidmgg@gmail.com
To: user@hadoop.apache.org

Susheel actually brought up a good point. 

once the client code connects to the cluster, is there way to get the real cluster configuration variables/values instead of relying on the .xml files on client side? 

Demai

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
One doubt on building Configuration object.



I have a Hadoop remote client and Hadoop cluster.

When a client submitted a MR job, the Configuration object is built

from Hadoop cluster node xml files, basically the resource manager

node core-site.xml and mapred-site.xml and yarn-site.xml.

Am I correct?



TIA

Susheel Kumar



On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:

> Hi Demai,

>

> conf = new Configuration()

>

> will create a new Configuration object and only add the properties from

> core-default.xml and core-site.xml in the conf object.

>

> This is basically a new configuration object, not the same that the daemons

> in the hadoop cluster use.

>

>

>

> I think what you are trying to ask is if you can get the Configuration

> object that a daemon in your live cluster (e.g. datanode) is using. I am

> not sure if the datanode or any other daemon on a hadoop cluster exposes

> such an API.

>

> I would in fact be tempted to get this information from the configuration

> management daemon instead - in your case cloudera manager. But I am not

> sure if CM exposes that API either. You could probably find out on the

> Cloudera mailing list.

>

>

> HTH,

> Bhooshan

>

>

> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:

>

>> hi, Bhooshan,

>>

>> thanks for your kind response.  I run the code on one of the data node of

>> my cluster, with only one hadoop daemon running. I believe my java client

>> code connect to the cluster correctly as I am able to retrieve

>> fileStatus,

>> and list files under a particular hdfs path, and similar things...

>> However, you are right that the daemon process use the hdfs-site.xml

>> under

>> another folder for cloudera :

>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

>>

>> about " retrieving the info from a live cluster", I would like to get the

>> information beyond the configuration files(that is beyond the .xml

>> files).

>> Since I am able to use :

>> conf = new Configuration()

>> to connect to hdfs and did other operations, shouldn't I be able to

>> retrieve the configuration variables?

>>

>> Thanks

>>

>> Demai

>>

>>

>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>

>> wrote:

>>

>>> Hi Demai,

>>>

>>> When you read a property from the conf object, it will only have a value

>>> if the conf object contains that property.

>>>

>>> In your case, you created the conf object as new Configuration() -- adds

>>> core-default and core-site.xml.

>>>

>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific

>>> locations. If none of these files have defined dfs.data.dir, then you

>>> will

>>> get NULL. This is expected behavior.

>>>

>>> What do you mean by retrieving the info from a live cluster? Even for

>>> processes like datanode, namenode etc, the source of truth for these

>>> properties is hdfs-site.xml. It is loaded from a specific location when

>>> you

>>> start these services.

>>>

>>> Question: Where are you running the above code? Is it on a node which

>>> has

>>> other hadoop daemons as well?

>>>

>>> My guess is that the path you are referring to (/etc/hadoop/conf.

>>> cloudera.hdfs/core-site.xml) is not the right path where these config

>>> properties are defined. Since this is a CDH cluster, you would probably

>>> be

>>> best served by asking on the CDH mailing list as to where the right path

>>> to

>>> these files is.

>>>

>>>

>>> HTH,

>>> Bhooshan

>>>

>>>

>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

>>>

>>>> hi, experts,

>>>>

>>>> I am trying to get the local filesystem directory of data node. My

>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So

>>>> the

>>>> datanode is under file:///dfs/dn. I didn't specify the value in

>>>> hdfs-site.xml.

>>>>

>>>> My code is something like:

>>>>

>>>> conf = new Configuration()

>>>>

>>>> // test both with and without the following two lines

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));

>>>> conf.addResource (new

>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));

>>>>

>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL

>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL

>>>>

>>>> It looks like the get only look at the configuration file instead of

>>>> retrieving the info from the live cluster?

>>>>

>>>> Many thanks for your help in advance.

>>>>

>>>> Demai

>>>>

>>>

>>>

>>>

>>> --

>>> Bhooshan

>>>

>>

>>

>

>

> --

> Bhooshan

>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
Susheel actually brought up a good point.

once the client code connects to the cluster, is there a way to get the real
cluster configuration variables/values instead of relying on the .xml files
on the client side?
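
The closest thing I know of is the /conf servlet that each Hadoop daemon's
embedded web server exposes, which dumps that daemon's effective configuration.
A rough sketch; the host name is a placeholder, and 50075 / 50070 are only the
stock Hadoop 2.x HTTP ports for the datanode / namenode, so a managed cluster
may use different ones:

import java.net.URL;
import org.apache.hadoop.conf.Configuration;

public class LiveConfProbe {
    public static void main(String[] args) throws Exception {
        // Ask the running datanode for its own configuration dump.
        URL confUrl = new URL("http://datanode-host.example.com:50075/conf?format=xml");

        // The servlet output uses the same XML format as the *-site.xml files,
        // so Configuration can parse it directly.
        Configuration live = new Configuration(false);  // false = skip client-side defaults
        live.addResource(confUrl.openStream());

        System.out.println("live dfs.datanode.data.dir = "
                + live.get("dfs.datanode.data.dir"));
    }
}

That reads what the running daemon actually loaded, so it sidesteps the question
of which xml file on which node is authoritative.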

Demai

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:

> One doubt on building Configuration object.
>
> I have a Hadoop remote client and Hadoop cluster.
> When a client submitted a MR job, the Configuration object is built
> from Hadoop cluster node xml files, basically the resource manager
> node core-site.xml and mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml in the conf object.
> >
> > This is basically a new configuration object, not the same that the
> daemons
> > in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response.  I run the code on one of the data node
> of
> >> my cluster, with only one hadoop daemon running. I believe my java
> client
> >> code connect to the cluster correctly as I am able to retrieve
> >> fileStatus,
> >> and list files under a particular hdfs path, and similar things...
> >> However, you are right that the daemon process use the hdfs-site.xml
> >> under
> >> another folder for cloudera :
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about " retrieving the info from a live cluster", I would like to get
> the
> >> information beyond the configuration files(that is beyond the .xml
> >> files).
> >> Since I am able to use :
> >> conf = new Configuration()
> >> to connect to hdfs and did other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <
> bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration() --
> adds
> >>> core-default and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> specific
> >>> locations. If none of these files have defined dfs.data.dir, then you
> >>> will
> >>> get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you
> >>> start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has
> >>> other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to (/etc/hadoop/conf.
> >>> cloudera.hdfs/core-site.xml) is not the right path where these config
> >>> properties are defined. Since this is a CDH cluster, you would probably
> >>> be
> >>> best served by asking on the CDH mailing list as to where the right
> path
> >>> to
> >>> these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the
> >>>> datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> conf = new Configuration()
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource (new
> >>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
> >>>>
> >>>> It looks like the get only look at the configuration file instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>


Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
One doubt about building the Configuration object.

I have a remote Hadoop client and a Hadoop cluster.
When the client submits an MR job, the Configuration object is built
from the Hadoop cluster nodes' xml files, basically the ResourceManager
node's core-site.xml, mapred-site.xml, and yarn-site.xml.
Am I correct?

TIA
Susheel Kumar
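
For reference, a minimal client-side sketch of the behavior being asked
about (the /etc/hadoop/conf paths below are illustrative, not taken from any
particular cluster): the standard client populates its Configuration from the
*-site.xml copies on its own classpath, typically HADOOP_CONF_DIR, and that
object is what gets shipped with the job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ClientConfSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-default.xml and core-site.xml from the client's own
        // classpath (normally HADOOP_CONF_DIR), not from the ResourceManager node.
        Configuration conf = new Configuration();

        // mapred-site.xml and yarn-site.xml are likewise read from local copies
        // of the cluster configuration; these paths are illustrative assumptions.
        conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));

        Job job = Job.getInstance(conf, "conf-sketch");
        // Whatever this conf contains is what gets serialized into job.xml at submit time.
        System.out.println("yarn.resourcemanager.address = "
                + job.getConfiguration().get("yarn.resourcemanager.address", "<not set>"));
    }
}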

On 9/9/14, Bhooshan Mogal <bh...@gmail.com> wrote:
> Hi Demai,
>
> conf = new Configuration()
>
> will create a new Configuration object and only add the properties from
> core-default.xml and core-site.xml in the conf object.
>
> This is basically a new configuration object, not the same that the daemons
> in the hadoop cluster use.
>
>
>
> I think what you are trying to ask is if you can get the Configuration
> object that a daemon in your live cluster (e.g. datanode) is using. I am
> not sure if the datanode or any other daemon on a hadoop cluster exposes
> such an API.
>
> I would in fact be tempted to get this information from the configuration
> management daemon instead - in your case cloudera manager. But I am not
> sure if CM exposes that API either. You could probably find out on the
> Cloudera mailing list.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, Bhooshan,
>>
>> thanks for your kind response.  I run the code on one of the data node of
>> my cluster, with only one hadoop daemon running. I believe my java client
>> code connect to the cluster correctly as I am able to retrieve
>> fileStatus,
>> and list files under a particular hdfs path, and similar things...
>> However, you are right that the daemon process use the hdfs-site.xml
>> under
>> another folder for cloudera :
>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>
>> about " retrieving the info from a live cluster", I would like to get the
>> information beyond the configuration files(that is beyond the .xml
>> files).
>> Since I am able to use :
>> conf = new Configuration()
>> to connect to hdfs and did other operations, shouldn't I be able to
>> retrieve the configuration variables?
>>
>> Thanks
>>
>> Demai
>>
>>
>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> When you read a property from the conf object, it will only have a value
>>> if the conf object contains that property.
>>>
>>> In your case, you created the conf object as new Configuration() -- adds
>>> core-default and core-site.xml.
>>>
>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>>> locations. If none of these files have defined dfs.data.dir, then you
>>> will
>>> get NULL. This is expected behavior.
>>>
>>> What do you mean by retrieving the info from a live cluster? Even for
>>> processes like datanode, namenode etc, the source of truth for these
>>> properties is hdfs-site.xml. It is loaded from a specific location when
>>> you
>>> start these services.
>>>
>>> Question: Where are you running the above code? Is it on a node which
>>> has
>>> other hadoop daemons as well?
>>>
>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>>> properties are defined. Since this is a CDH cluster, you would probably
>>> be
>>> best served by asking on the CDH mailing list as to where the right path
>>> to
>>> these files is.
>>>
>>>
>>> HTH,
>>> Bhooshan
>>>
>>>
>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>>>
>>>> hi, experts,
>>>>
>>>> I am trying to get the local filesystem directory of data node. My
>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
>>>> the
>>>> datanode is under file:///dfs/dn. I didn't specify the value in
>>>> hdfs-site.xml.
>>>>
>>>> My code is something like:
>>>>
>>>> conf = new Configuration()
>>>>
>>>> // test both with and without the following two lines
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>
>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>>>
>>>> It looks like the get only look at the configuration file instead of
>>>> retrieving the info from the live cluster?
>>>>
>>>> Many thanks for your help in advance.
>>>>
>>>> Demai
>>>>
>>>
>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
Bhooshan,

Many thanks. I appreciate the help. I will also try out the Cloudera mailing
list/community.

Demai

On Mon, Sep 8, 2014 at 4:58 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> conf = new Configuration()
>
> will create a new Configuration object and only add the properties from
> core-default.xml and core-site.xml in the conf object.
>
> This is basically a new configuration object, not the same that the
> daemons in the hadoop cluster use.
>
>
>
> I think what you are trying to ask is if you can get the Configuration
> object that a daemon in your live cluster (e.g. datanode) is using. I am
> not sure if the datanode or any other daemon on a hadoop cluster exposes
> such an API.
>
> I would in fact be tempted to get this information from the configuration
> management daemon instead - in your case cloudera manager. But I am not
> sure if CM exposes that API either. You could probably find out on the
> Cloudera mailing list.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, Bhooshan,
>>
>> thanks for your kind response.  I run the code on one of the data node of
>> my cluster, with only one hadoop daemon running. I believe my java client
>> code connect to the cluster correctly as I am able to retrieve fileStatus,
>> and list files under a particular hdfs path, and similar things...
>> However, you are right that the daemon process use the hdfs-site.xml under
>> another folder for cloudera :
>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>
>> about " retrieving the info from a live cluster", I would like to get the
>> information beyond the configuration files(that is beyond the .xml files).
>> Since I am able to use :
>> conf = new Configuration()
>> to connect to hdfs and did other operations, shouldn't I be able to
>> retrieve the configuration variables?
>>
>> Thanks
>>
>> Demai
>>
>>
>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> When you read a property from the conf object, it will only have a value
>>> if the conf object contains that property.
>>>
>>> In your case, you created the conf object as new Configuration() -- adds
>>> core-default and core-site.xml.
>>>
>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>>> locations. If none of these files have defined dfs.data.dir, then you will
>>> get NULL. This is expected behavior.
>>>
>>> What do you mean by retrieving the info from a live cluster? Even for
>>> processes like datanode, namenode etc, the source of truth for these
>>> properties is hdfs-site.xml. It is loaded from a specific location when you
>>> start these services.
>>>
>>> Question: Where are you running the above code? Is it on a node which
>>> has other hadoop daemons as well?
>>>
>>> My guess is that the path you are referring to (/etc/hadoop/conf.
>>> cloudera.hdfs/core-site.xml) is not the right path where these config
>>> properties are defined. Since this is a CDH cluster, you would probably be
>>> best served by asking on the CDH mailing list as to where the right path to
>>> these files is.
>>>
>>>
>>> HTH,
>>> Bhooshan
>>>
>>>
>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>>>
>>>> hi, experts,
>>>>
>>>> I am trying to get the local filesystem directory of data node. My
>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>>>> datanode is under file:///dfs/dn. I didn't specify the value in
>>>> hdfs-site.xml.
>>>>
>>>> My code is something like:
>>>>
>>>> conf = new Configuration()
>>>>
>>>> // test both with and without the following two lines
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>> conf.addResource (new
>>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>
>>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>>>
>>>> It looks like the get only look at the configuration file instead of
>>>> retrieving the info from the live cluster?
>>>>
>>>> Many thanks for your help in advance.
>>>>
>>>> Demai
>>>>
>>>
>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Hi Demai,

conf = new Configuration()

will create a new Configuration object and only add the properties from
core-default.xml and core-site.xml to the conf object.

This is basically a new configuration object, not the same one that the
daemons in the Hadoop cluster use.

I think what you are trying to ask is whether you can get the Configuration
object that a daemon in your live cluster (e.g. the datanode) is using. I am
not sure if the datanode or any other daemon on a Hadoop cluster exposes
such an API.

I would in fact be tempted to get this information from the configuration
management daemon instead - in your case Cloudera Manager. But I am not
sure if CM exposes that API either. You could probably find out on the
Cloudera mailing list.


HTH,
Bhooshan
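
A minimal sketch of the behavior described above (the /etc/hadoop/conf path
and the fallback string are assumptions for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class DataDirSketch {
    public static void main(String[] args) {
        // Loads only core-default.xml and core-site.xml.
        Configuration conf = new Configuration();

        // dfs.* properties become visible only if an hdfs-site.xml that defines
        // them is added explicitly (or sits on the classpath). Illustrative path:
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // get(key) returns null when no loaded file defines the key;
        // get(key, default) makes the "not set anywhere" case explicit.
        String dnDir = conf.get("dfs.datanode.data.dir",
                conf.get("dfs.data.dir", "<not set in any loaded file>"));
        System.out.println("datanode data dir = " + dnDir);
    }
}

If hdfs-site.xml is already on the client's classpath, constructing an
org.apache.hadoop.hdfs.HdfsConfiguration instead of the base Configuration
should pick it up as well.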


On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <ni...@gmail.com> wrote:

> hi, Bhooshan,
>
> thanks for your kind response.  I run the code on one of the data node of
> my cluster, with only one hadoop daemon running. I believe my java client
> code connect to the cluster correctly as I am able to retrieve fileStatus,
> and list files under a particular hdfs path, and similar things...
> However, you are right that the daemon process use the hdfs-site.xml under
> another folder for cloudera :
> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>
> about " retrieving the info from a live cluster", I would like to get the
> information beyond the configuration files(that is beyond the .xml files).
> Since I am able to use :
> conf = new Configuration()
> to connect to hdfs and did other operations, shouldn't I be able to
> retrieve the configuration variables?
>
> Thanks
>
> Demai
>
>
> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
> wrote:
>
>> Hi Demai,
>>
>> When you read a property from the conf object, it will only have a value
>> if the conf object contains that property.
>>
>> In your case, you created the conf object as new Configuration() -- adds
>> core-default and core-site.xml.
>>
>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>> locations. If none of these files have defined dfs.data.dir, then you will
>> get NULL. This is expected behavior.
>>
>> What do you mean by retrieving the info from a live cluster? Even for
>> processes like datanode, namenode etc, the source of truth for these
>> properties is hdfs-site.xml. It is loaded from a specific location when you
>> start these services.
>>
>> Question: Where are you running the above code? Is it on a node which has
>> other hadoop daemons as well?
>>
>> My guess is that the path you are referring to (/etc/hadoop/conf.
>> cloudera.hdfs/core-site.xml) is not the right path where these config
>> properties are defined. Since this is a CDH cluster, you would probably be
>> best served by asking on the CDH mailing list as to where the right path to
>> these files is.
>>
>>
>> HTH,
>> Bhooshan
>>
>>
>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>>
>>> hi, experts,
>>>
>>> I am trying to get the local filesystem directory of data node. My
>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>>> datanode is under file:///dfs/dn. I didn't specify the value in
>>> hdfs-site.xml.
>>>
>>> My code is something like:
>>>
>>> conf = new Configuration()
>>>
>>> // test both with and without the following two lines
>>> conf.addResource (new
>>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>> conf.addResource (new
>>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>
>>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>>
>>> It looks like the get only look at the configuration file instead of
>>> retrieving the info from the live cluster?
>>>
>>> Many thanks for your help in advance.
>>>
>>> Demai
>>>
>>
>>
>>
>> --
>> Bhooshan
>>
>
>


-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
hi, Bhooshan,

thanks for your kind response. I ran the code on one of the data nodes of
my cluster, with only one Hadoop daemon running. I believe my Java client
code connects to the cluster correctly, as I am able to retrieve fileStatus,
list files under a particular HDFS path, and do similar things... However,
you are right that the daemon process uses the hdfs-site.xml under another
folder for Cloudera:
/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

About "retrieving the info from a live cluster": I would like to get the
information beyond the configuration files (that is, beyond the .xml files).
Since I am able to use
conf = new Configuration()
to connect to HDFS and do other operations, shouldn't I be able to
retrieve the configuration variables?

Thanks

Demai
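
A small sketch tied to the path mentioned above: point the client at the
hdfs-site.xml the running datanode was actually started with (the path is
copied from the reply above; the numbered process directory will presumably
differ between restarts):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class DaemonConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load the datanode's own hdfs-site.xml (path taken from the reply above).
        conf.addResource(new Path(
                "/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml"));
        System.out.println("dfs.datanode.data.dir = "
                + conf.get("dfs.datanode.data.dir"));
    }
}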


On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> When you read a property from the conf object, it will only have a value
> if the conf object contains that property.
>
> In your case, you created the conf object as new Configuration() -- adds
> core-default and core-site.xml.
>
> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
> locations. If none of these files have defined dfs.data.dir, then you will
> get NULL. This is expected behavior.
>
> What do you mean by retrieving the info from a live cluster? Even for
> processes like datanode, namenode etc, the source of truth for these
> properties is hdfs-site.xml. It is loaded from a specific location when you
> start these services.
>
> Question: Where are you running the above code? Is it on a node which has
> other hadoop daemons as well?
>
> My guess is that the path you are referring to (/etc/hadoop/conf.
> cloudera.hdfs/core-site.xml) is not the right path where these config
> properties are defined. Since this is a CDH cluster, you would probably be
> best served by asking on the CDH mailing list as to where the right path to
> these files is.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, experts,
>>
>> I am trying to get the local filesystem directory of data node. My
>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>> datanode is under file:///dfs/dn. I didn't specify the value in
>> hdfs-site.xml.
>>
>> My code is something like:
>>
>> conf = new Configuration()
>>
>> // test both with and without the following two lines
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>
>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>
>> It looks like the get only look at the configuration file instead of
>> retrieving the info from the live cluster?
>>
>> Many thanks for your help in advance.
>>
>> Demai
>>
>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
hi, Bhooshan,

thanks for your kind response.  I run the code on one of the data node of
my cluster, with only one hadoop daemon running. I believe my java client
code connect to the cluster correctly as I am able to retrieve fileStatus,
and list files under a particular hdfs path, and similar things... However,
you are right that the daemon process use the hdfs-site.xml under another
folder for cloudera :
/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

about " retrieving the info from a live cluster", I would like to get the
information beyond the configuration files(that is beyond the .xml files).
Since I am able to use :
conf = new Configuration()
to connect to hdfs and did other operations, shouldn't I be able to
retrieve the configuration variables?

Thanks

Demai


On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> When you read a property from the conf object, it will only have a value
> if the conf object contains that property.
>
> In your case, you created the conf object as new Configuration() -- adds
> core-default and core-site.xml.
>
> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
> locations. If none of these files have defined dfs.data.dir, then you will
> get NULL. This is expected behavior.
>
> What do you mean by retrieving the info from a live cluster? Even for
> processes like datanode, namenode etc, the source of truth for these
> properties is hdfs-site.xml. It is loaded from a specific location when you
> start these services.
>
> Question: Where are you running the above code? Is it on a node which has
> other hadoop daemons as well?
>
> My guess is that the path you are referring to (/etc/hadoop/conf.
> cloudera.hdfs/core-site.xml) is not the right path where these config
> properties are defined. Since this is a CDH cluster, you would probably be
> best served by asking on the CDH mailing list as to where the right path to
> these files is.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, experts,
>>
>> I am trying to get the local filesystem directory of data node. My
>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>> datanode is under file:///dfs/dn. I didn't specify the value in
>> hdfs-site.xml.
>>
>> My code is something like:
>>
>> conf = new Configuration()
>>
>> // test both with and without the following two lines
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>
>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>
>> It looks like the get only look at the configuration file instead of
>> retrieving the info from the live cluster?
>>
>> Many thanks for your help in advance.
>>
>> Demai
>>
>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
hi, Bhooshan,

thanks for your kind response.  I run the code on one of the data node of
my cluster, with only one hadoop daemon running. I believe my java client
code connect to the cluster correctly as I am able to retrieve fileStatus,
and list files under a particular hdfs path, and similar things... However,
you are right that the daemon process use the hdfs-site.xml under another
folder for cloudera :
/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

about " retrieving the info from a live cluster", I would like to get the
information beyond the configuration files(that is beyond the .xml files).
Since I am able to use :
conf = new Configuration()
to connect to hdfs and did other operations, shouldn't I be able to
retrieve the configuration variables?

Thanks

Demai


On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> When you read a property from the conf object, it will only have a value
> if the conf object contains that property.
>
> In your case, you created the conf object as new Configuration() -- adds
> core-default and core-site.xml.
>
> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
> locations. If none of these files have defined dfs.data.dir, then you will
> get NULL. This is expected behavior.
>
> What do you mean by retrieving the info from a live cluster? Even for
> processes like datanode, namenode etc, the source of truth for these
> properties is hdfs-site.xml. It is loaded from a specific location when you
> start these services.
>
> Question: Where are you running the above code? Is it on a node which has
> other hadoop daemons as well?
>
> My guess is that the path you are referring to (/etc/hadoop/conf.
> cloudera.hdfs/core-site.xml) is not the right path where these config
> properties are defined. Since this is a CDH cluster, you would probably be
> best served by asking on the CDH mailing list as to where the right path to
> these files is.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, experts,
>>
>> I am trying to get the local filesystem directory of data node. My
>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>> datanode is under file:///dfs/dn. I didn't specify the value in
>> hdfs-site.xml.
>>
>> My code is something like:
>>
>> conf = new Configuration()
>>
>> // test both with and without the following two lines
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>
>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>
>> It looks like the get only look at the configuration file instead of
>> retrieving the info from the live cluster?
>>
>> Many thanks for your help in advance.
>>
>> Demai
>>
>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Demai Ni <ni...@gmail.com>.
hi, Bhooshan,

thanks for your kind response.  I run the code on one of the data node of
my cluster, with only one hadoop daemon running. I believe my java client
code connect to the cluster correctly as I am able to retrieve fileStatus,
and list files under a particular hdfs path, and similar things... However,
you are right that the daemon process use the hdfs-site.xml under another
folder for cloudera :
/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.

about " retrieving the info from a live cluster", I would like to get the
information beyond the configuration files(that is beyond the .xml files).
Since I am able to use :
conf = new Configuration()
to connect to hdfs and did other operations, shouldn't I be able to
retrieve the configuration variables?

Thanks

Demai


On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi Demai,
>
> When you read a property from the conf object, it will only have a value
> if the conf object contains that property.
>
> In your case, you created the conf object as new Configuration() -- adds
> core-default and core-site.xml.
>
> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
> locations. If none of these files have defined dfs.data.dir, then you will
> get NULL. This is expected behavior.
>
> What do you mean by retrieving the info from a live cluster? Even for
> processes like datanode, namenode etc, the source of truth for these
> properties is hdfs-site.xml. It is loaded from a specific location when you
> start these services.
>
> Question: Where are you running the above code? Is it on a node which has
> other hadoop daemons as well?
>
> My guess is that the path you are referring to (/etc/hadoop/conf.
> cloudera.hdfs/core-site.xml) is not the right path where these config
> properties are defined. Since this is a CDH cluster, you would probably be
> best served by asking on the CDH mailing list as to where the right path to
> these files is.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, experts,
>>
>> I am trying to get the local filesystem directory of data node. My
>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So the
>> datanode is under file:///dfs/dn. I didn't specify the value in
>> hdfs-site.xml.
>>
>> My code is something like:
>>
>> conf = new Configuration()
>>
>> // test both with and without the following two lines
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>> conf.addResource (new
>> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>
>> // I also tried get("dfs.datanode.data.dir"), which also return NULL
>> String dnDir = conf.get("dfs.data.dir");  // return NULL
>>
>> It looks like the get only look at the configuration file instead of
>> retrieving the info from the live cluster?
>>
>> Many thanks for your help in advance.
>>
>> Demai
>>
>
>
>
> --
> Bhooshan
>

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Hi Demai,

When you read a property from the conf object, it will only have a value if
the conf object contains that property.

In your case, you created the conf object as new Configuration() -- adds
core-default and core-site.xml.

Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
locations. If none of these files have defined dfs.data.dir, then you will
get NULL. This is expected behavior.
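
For example, here is a rough sketch of what I mean (untested; the class
name is made up, and it assumes the hadoop-hdfs jars and your client
config directory are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DataDirProbe {
  public static void main(String[] args) {
    // HdfsConfiguration also registers hdfs-default.xml and hdfs-site.xml
    // as default resources, so dfs.datanode.data.dir resolves to its
    // default (file://${hadoop.tmp.dir}/dfs/data) even when hdfs-site.xml
    // never sets it explicitly.
    Configuration conf = new HdfsConfiguration();

    // Passing a fallback value is another way to avoid the null:
    String dnDir = conf.get("dfs.datanode.data.dir", "file:///dfs/dn");
    System.out.println("datanode data dir = " + dnDir);
  }
}

Keep in mind this still only reflects whatever config files the client can
see on its classpath, not what the running datanode was actually started
with.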

What do you mean by retrieving the info from a live cluster? Even for
processes like the datanode and namenode, the source of truth for these
properties is hdfs-site.xml. It is loaded from a specific location when you
start these services.

Question: Where are you running the above code? Is it on a node which has
other hadoop daemons as well?

My guess is that the path you are referring to
(/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where
these config properties are defined. Since this is a CDH cluster, you would
probably be best served by asking on the CDH mailing list where the right
path to these files is.
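
One quick way to see which files your conf object actually loaded, and
where a given key came from, is something like the sketch below (just a
debugging aid, nothing CDH-specific; both calls should exist in Hadoop 2.x,
but double-check against your version):

// toString() lists the loaded resources, e.g.
// "Configuration: core-default.xml, core-site.xml, ..."
System.out.println(conf);

// getPropertySources() reports which resource supplied the key;
// it returns null when the key is not set anywhere.
String[] sources = conf.getPropertySources("dfs.datanode.data.dir");
System.out.println(sources == null
    ? "dfs.datanode.data.dir is not set"
    : java.util.Arrays.toString(sources));

If the printed list does not include the hdfs-site.xml you expect, that
tells you the path you passed to addResource() is not the one the daemons
use.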


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, experts,
>
> I am trying to get the local filesystem directory of data node. My cluster
> is using CDH5.x (hadoop 2.3) and the default configuration. So the datanode
> is under file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
>
> My code is something like:
>
> conf = new Configuration()
>
> // test both with and without the following two lines
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
> It looks like get() only looks at the configuration files instead of
> retrieving the info from the live cluster?
>
> Many thanks for your help in advance.
>
> Demai
>



-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Hi Demai,

When you read a property from the conf object, it will only have a value if
the conf object contains that property.

In your case, you created the conf object as new Configuration() -- adds
core-default and core-site.xml.

Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
locations. If none of these files have defined dfs.data.dir, then you will
get NULL. This is expected behavior.

What do you mean by retrieving the info from a live cluster? Even for
processes like datanode, namenode etc, the source of truth for these
properties is hdfs-site.xml. It is loaded from a specific location when you
start these services.

Question: Where are you running the above code? Is it on a node which has
other hadoop daemons as well?

My guess is that the path you are referring to (/etc/hadoop/conf.
cloudera.hdfs/core-site.xml) is not the right path where these config
properties are defined. Since this is a CDH cluster, you would probably be
best served by asking on the CDH mailing list as to where the right path to
these files is.


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, experts,
>
> I am trying to get the local filesystem directory of data node. My cluster
> is using CDH5.x (hadoop 2.3) and the default configuration. So the datanode
> is under file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
>
> My code is something like:
>
> conf = new Configuration()
>
> // test both with and without the following two lines
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
> It looks like the get only look at the configuration file instead of
> retrieving the info from the live cluster?
>
> Many thanks for your help in advance.
>
> Demai
>



-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Hi Demai,

When you read a property from the conf object, it will only have a value if
the conf object contains that property.

In your case, you created the conf object as new Configuration() -- adds
core-default and core-site.xml.

Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
locations. If none of these files have defined dfs.data.dir, then you will
get NULL. This is expected behavior.

What do you mean by retrieving the info from a live cluster? Even for
processes like datanode, namenode etc, the source of truth for these
properties is hdfs-site.xml. It is loaded from a specific location when you
start these services.

Question: Where are you running the above code? Is it on a node which has
other hadoop daemons as well?

My guess is that the path you are referring to (/etc/hadoop/conf.
cloudera.hdfs/core-site.xml) is not the right path where these config
properties are defined. Since this is a CDH cluster, you would probably be
best served by asking on the CDH mailing list as to where the right path to
these files is.


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, experts,
>
> I am trying to get the local filesystem directory of data node. My cluster
> is using CDH5.x (hadoop 2.3) and the default configuration. So the datanode
> is under file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
>
> My code is something like:
>
> conf = new Configuration()
>
> // test both with and without the following two lines
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
> It looks like the get only look at the configuration file instead of
> retrieving the info from the live cluster?
>
> Many thanks for your help in advance.
>
> Demai
>



-- 
Bhooshan

Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

Posted by Bhooshan Mogal <bh...@gmail.com>.
Hi Demai,

When you read a property from the conf object, it will only have a value if
the conf object contains that property.

In your case, you created the conf object as new Configuration() -- adds
core-default and core-site.xml.

Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
locations. If none of these files have defined dfs.data.dir, then you will
get NULL. This is expected behavior.

What do you mean by retrieving the info from a live cluster? Even for
processes like datanode, namenode etc, the source of truth for these
properties is hdfs-site.xml. It is loaded from a specific location when you
start these services.

Question: Where are you running the above code? Is it on a node which has
other hadoop daemons as well?

My guess is that the path you are referring to (/etc/hadoop/conf.
cloudera.hdfs/core-site.xml) is not the right path where these config
properties are defined. Since this is a CDH cluster, you would probably be
best served by asking on the CDH mailing list as to where the right path to
these files is.


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, experts,
>
> I am trying to get the local filesystem directory of data node. My cluster
> is using CDH5.x (hadoop 2.3) and the default configuration. So the datanode
> is under file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
>
> My code is something like:
>
> conf = new Configuration()
>
> // test both with and without the following two lines
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> conf.addResource (new
> Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>
> // I also tried get("dfs.datanode.data.dir"), which also return NULL
> String dnDir = conf.get("dfs.data.dir");  // return NULL
>
> It looks like the get only look at the configuration file instead of
> retrieving the info from the live cluster?
>
> Many thanks for your help in advance.
>
> Demai
>



-- 
Bhooshan