Posted to hdfs-dev@hadoop.apache.org by lohit <lo...@gmail.com> on 2013/11/04 23:02:42 UTC

Question regarding access to different hadoop 2.0 cluster

Hello Devs,

With Hadoop 1.0, when there was a single namespace, one could access any HDFS
cluster while using any other Hadoop config, something like this:

hadoop --config /path/to/hadoop-cluster1 fs -ls hdfs://hadoop-cluster2:8020/

Since the NameNode host and port were passed directly as part of the URI, a
client with a matching HDFS version could talk to different clusters without
needing access to each cluster's specific configuration.

With Hadoop 2.0 in HA mode, we specify only a logical name for the namenode
and rely on hdfs-site.xml to resolve that logical name to the two underlying
namenode hosts.

So you cannot do something like

hadoop --config /path/to/hadoop-cluster1 fs -ls hdfs://hadoop-cluster2-logicalname/

since /path/to/hadoop-cluster1/hdfs-site.xml does not have information about
hadoop-cluster2-logicalname's namenodes.


One option is to add hadoop-cluster2-logicalname's namenodes to
/path/to/hadoop-cluster1/hdfs-site.xml, along the lines of the sketch below,
but with many clusters this becomes a problem.
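
Roughly, that option looks like this. The property names are the standard
Hadoop 2 HA client keys; the host names and the cluster1 nameservice entry
are hypothetical:

<!-- hypothetical entries in /path/to/hadoop-cluster1/hdfs-site.xml that let
     its clients resolve the remote logical name hadoop-cluster2-logicalname -->
<property>
  <name>dfs.nameservices</name>
  <value>hadoop-cluster1-logicalname,hadoop-cluster2-logicalname</value>
</property>
<property>
  <name>dfs.ha.namenodes.hadoop-cluster2-logicalname</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster2-logicalname.nn1</name>
  <value>nn1.cluster2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster2-logicalname.nn2</name>
  <value>nn2.cluster2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-cluster2-logicalname</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Every new cluster means another block like this pushed to every cluster that
wants to talk to it.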
Is there any other cleaner approach to solving this?

-- 
Have a Nice Day!
Lohit

Re: Question regarding access to different hadoop 2.0 cluster

Posted by Todd Lipcon <to...@cloudera.com>.
We've discussed a few times adding a FailoverProxyProvider which would use
DNS records for this. For example, you'd add an SRV record (or multiple A
records) for the logical name, pointing to the physical hosts backing the
cluster. I think it would help reduce client-side configuration pretty
neatly, though it has the disadvantage that your DNS admins need to get in
the loop.
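
As a sketch of what those records could look like (the _hdfs._tcp service
label, zone names, and addresses are all hypothetical, since no such proxy
provider exists yet):

; SRV form: priority weight port target
_hdfs._tcp.hadoop-cluster2-logicalname.example.com. 3600 IN SRV 0 0 8020 nn1.cluster2.example.com.
_hdfs._tcp.hadoop-cluster2-logicalname.example.com. 3600 IN SRV 0 0 8020 nn2.cluster2.example.com.

; or, with multiple A records on the logical name itself:
hadoop-cluster2-logicalname.example.com. 3600 IN A 10.0.0.11
hadoop-cluster2-logicalname.example.com. 3600 IN A 10.0.0.12

The provider would then resolve the logical name through DNS at connect or
failover time instead of reading namenode addresses out of hdfs-site.xml.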

-Todd


On Wed, Nov 6, 2013 at 7:36 AM, Bobby Evans <ev...@yahoo-inc.com> wrote:



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Question regarding access to different hadoop 2.0 cluster

Posted by Bobby Evans <ev...@yahoo-inc.com>.
Suresh,

You are correct, I did not explain myself very well. Suppose one of the
namenodes has a hardware failure. To avoid updating the configs for every
single service that talks to HDFS, you have to make sure the replacement box
appears to the network to be exactly the same as the original. As you
mentioned, this is not impossible.

The more common case where this is problematic is upgrading clusters from
non-HA to HA, or adding new HA clusters, because there is no existing
IP address/config to be copied. Every time this happens, all existing
services must have new configs pushed to be able to talk to the
new/updated HDFS. This includes Gateways, the RM, Compute Nodes, Oozie
Servers, etc.

Again, this is not that big of a deal for a small setup, but for a large
setup it can be painful.

--Bobby

On 11/5/13 4:57 PM, "Suresh Srinivas" <su...@hortonworks.com> wrote:



Re: Question regarding access to different hadoop 2.0 cluster

Posted by Suresh Srinivas <su...@hortonworks.com>.
On Tue, Nov 5, 2013 at 6:57 AM, Bobby Evans <ev...@yahoo-inc.com> wrote:

> But that does present a problem if you have to change the DNS address of
> one of the HA namenodes.


Not sure what you mean by this. Do you mean the hostname of one of the
namenodes changes? If so, why is this not a problem for a single-namenode
deployment? How do applications addressing a namenode in a different cluster
handle the change?


> It forces you to update the config on all other
> clusters that want to talk to it.  If you only have a few clusters that is
> probably not a big deal, but it can be problematic if you have many
> different clusters that talk to each other.
>
> --Bobby
>


-- 
http://hortonworks.com/download/


Re: Question regarding access to different hadoop 2.0 cluster

Posted by Bobby Evans <ev...@yahoo-inc.com>.
But that does present a problem if you have to change the DNS address of one
of the HA namenodes. It forces you to update the config on all other clusters
that want to talk to it. If you only have a few clusters, that is probably
not a big deal, but it can be problematic if you have many different clusters
that talk to each other.

--Bobby

On 11/4/13 4:15 PM, "lohit" <lo...@gmail.com> wrote:



Re: Question regarding access to different hadoop 2.0 cluster

Posted by lohit <lo...@gmail.com>.
Thanks Suresh!


2013/11/4 Suresh Srinivas <su...@hortonworks.com>




-- 
Have a Nice Day!
Lohit

Re: Question regarding access to different hadoop 2.0 cluster

Posted by Suresh Srinivas <su...@hortonworks.com>.
Lohit,

The option you have enumerated at the end is the current way to set up a
multi-cluster environment. That is, all the client-side configurations will
include the following:
- Logical service names (either for federation or HA)
- The corresponding physical namenode addresses

For simpler management, one could use XInclude to pull in a shared XML
document that defines all the namespaces and namenodes.
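
A minimal sketch of that, with a hypothetical shared file name (Hadoop's
configuration loader understands standard XInclude):

<?xml version="1.0"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- cluster-local properties go here -->

  <!-- pull in one shared document that defines every cluster's logical
       service names and namenode addresses; the file name is hypothetical -->
  <xi:include href="all-clusters-namenodes.xml"/>
</configuration>

Then only the shared document needs updating when a cluster is added or its
namenodes move.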

Regards,
Suresh


On Mon, Nov 4, 2013 at 2:02 PM, lohit <lo...@gmail.com> wrote:




-- 
http://hortonworks.com/download/
