Posted to user@hadoop.apache.org by Vamsi Krishna <va...@gmail.com> on 2016/06/08 17:35:34 UTC

Namenode automatic failover - how to handle WebHDFS URL?

Hi,

How should the WebHDFS URL be handled in case of NameNode automatic failover
in an HA HDFS cluster?


*HDFS CLI:*

HDFS URI: hdfs://<HOST>:<RPC_PORT>/<PATH>

When working with the HDFS CLI, replacing '<HOST>:<RPC_PORT>' in the HDFS URI
with the 'dfs.nameservices' value (from hdfs-site.xml) gives me the same
result as using '<HOST>:<RPC_PORT>'.

By using the 'dfs.nameservices' value in the HDFS URI, I do not need to
change my HDFS CLI commands when a NameNode automatic failover occurs.

*Example:*

hdfs dfs -ls hdfs://<HOST>:<RPC_PORT>/<PATH>

hdfs dfs -ls hdfs://<dfs.nameservices>/<PATH>
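For reference, the logical nameservice that the CLI resolves comes from an HA block in hdfs-site.xml roughly like the following (a sketch only; the nameservice name "mycluster" and the hosts are made-up placeholders):

```xml
<!-- hdfs-site.xml: HA nameservice definition (hypothetical values) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode-1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode-2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With these properties in place, hdfs://mycluster/<PATH> resolves to whichever NameNode is currently active.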


*WebHDFS:*

WebHDFS URL: http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...

Is there a way to construct the WebHDFS URL so that we don't have to change
the host in the URL when a NameNode automatic failover occurs (e.g., a
failover from namenode-1 to namenode-2)?

http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS

*Scenario:*
I have a web application which uses WebHDFS HTTP requests to read data files
from the Hadoop cluster.
I would like to know if there is a way to make the web application work
without any downtime in case of NameNode automatic failover (failover from
namenode-1 to namenode-2).

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri

Re: Namenode automatic failover - how to handle WebHDFS URL?

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Vamsi,

A general-purpose HTTP client like curl won't have knowledge of the HA failover mechanism, so unfortunately it won't be possible to craft the URL in a certain way so that it can failover automatically.
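To illustrate, a plain HTTP client would have to implement that failover by hand, trying each NameNode in turn. A minimal Python sketch of the idea (the NameNode hosts and the Hadoop 2 default HTTP port 50070 are made-up placeholders):

```python
import urllib.error
import urllib.request

# Hypothetical NameNode web addresses; in an HA pair, only one is active.
NAMENODES = ["namenode-1.example.com:50070", "namenode-2.example.com:50070"]

def webhdfs_url(host, path, op):
    """Build a WebHDFS v1 URL for the given NameNode host, HDFS path, and operation."""
    return "http://%s/webhdfs/v1%s?op=%s" % (host, path, op)

def fetch_with_failover(path, op="LISTSTATUS"):
    """Try each NameNode in turn; if one is standby or unreachable,
    fall through to the next until a request succeeds."""
    last_error = None
    for host in NAMENODES:
        try:
            with urllib.request.urlopen(webhdfs_url(host, path, op)) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as err:
            last_error = err  # standby rejection or connection failure; try next
    raise RuntimeError("no active NameNode responded") from last_error
```

This is only the client-side workaround; the HA-aware classes described below handle the same logic inside Hadoop itself.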

However, Hadoop ships with the WebHdfsFileSystem class, which does have awareness of HA failover.  If your web application is coded in Java, or has a reasonable way to bridge over to Java, then you could take advantage of that class.  This class is used when running Hadoop shell commands that reference a URI containing the webhdfs: scheme.  For example:

hdfs dfs -ls webhdfs://127.0.0.1:50070/

You could also get an instance of WebHdfsFileSystem by calling FileSystem#get with a Configuration object that sets fs.defaultFS to a webhdfs: URI, or call the overload of FileSystem#get that accepts an explicit URI argument.
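For example, pointing fs.defaultFS at the logical nameservice makes FileSystem#get hand back an HA-aware WebHdfsFileSystem. A sketch of the core-site.xml fragment, assuming a hypothetical nameservice "mycluster" whose HA properties (dfs.nameservices, per-NameNode http-address entries, and the failover proxy provider) are already defined in hdfs-site.xml:

```xml
<!-- core-site.xml: default filesystem as an HA WebHDFS URI (hypothetical nameservice) -->
<property>
  <name>fs.defaultFS</name>
  <value>webhdfs://mycluster</value>
</property>
```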

--Chris Nauroth

Re: Namenode automatic failover - how to handle WebHDFS URL?

Posted by Gagan Brahmi <ga...@gmail.com>.
Hi Vamsi,

WebHDFS itself is not HA-aware; however, there is a WebHdfsFileSystem class,
made HA-aware through https://issues.apache.org/jira/browse/HDFS-5122, which
you can use in your code.

Alternatively, you have the option of using either HttpFS or Knox.

HttpFS works with an HA-enabled HDFS cluster. However, it has several
limitations, the biggest of which can be performance: HttpFS is installed as
an additional service, and all data is streamed through that single node,
which can become a performance bottleneck. WebHDFS, on the other hand,
streams data directly from each DataNode.

The other option is to use the Knox gateway (if already installed) and
configure WebHDFS access through it. Knox provides basic failover and retry
functionality for REST API calls made to WebHDFS when HDFS HA has been
configured and enabled.

This does mean you have to install and configure the Knox gateway service if
it is not already installed.


Regards,
Gagan Brahmi
