Posted to common-user@hadoop.apache.org by Andy Sautins <an...@returnpath.net> on 2009/09/24 21:26:31 UTC

hadoop fsck through proxy...

   I looked in JIRA but didn't see this reported, so I thought I'd see what this list thinks.  We've been using SOCKS proxying to access a Hadoop cluster, generally using the setup described in the Cloudera blog posting (http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ ).  This works great: setting hadoop.rpc.socket.factory.class.default to org.apache.hadoop.net.SocksSocketFactory makes hadoop dfs activity ( -ls, -rmr, -cat, etc. ) work fine.  The one command that doesn't work is fsck.  Note the following command and error:

hadoop fsck /
Exception in thread "main" java.net.NoRouteToHostException: No route to host
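
   For reference, the relevant pieces of our configuration look roughly like the following in hadoop-site.xml ( localhost:1080 is just a placeholder for our actual gateway tunnel endpoint ):

<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <name>hadoop.socks.server</name>
  <value>localhost:1080</value>
</property>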

   So looking at org.apache.hadoop.hdfs.tools.DFSck.java, the connection is created using URLConnection, so it makes sense why it wouldn't work: URLConnection doesn't appear to use the configured socket factory.
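
   Paraphrased, what DFSck appears to do is something along these lines ( a sketch, not the exact source; "namenode:50070" stands in for the namenode's web address ):

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class FsckSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode:50070/fsck?path=/");
    // Plain JDK connection: hadoop.rpc.socket.factory.class.default never
    // comes into play, so the request goes direct to the namenode...
    URLConnection connection = url.openConnection();
    // ...and from behind the proxy this is where NoRouteToHostException hits
    InputStream in = connection.getInputStream();
    in.close();
  }
}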

   So to me this seems like an issue.  Can someone please confirm?  If it is, I'll add a JIRA.  Happy to take a crack at making a change as well ( if one should be made ), though the easiest way to change it is unclear to me.  I haven't run across code in the codebase that uses hadoop.rpc.socket.factory.class.default for HTTP connections.
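
   One thought ( just a sketch; in a real patch the proxy host/port would presumably come from the configuration rather than being hardcoded ): the JDK's URL.openConnection has an overload that takes a java.net.Proxy, so DFSck could perhaps do something like:

import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

public class FsckViaSocksSketch {
  public static void main(String[] args) throws Exception {
    // localhost:1080 is a placeholder for the SOCKS tunnel endpoint
    Proxy socks = new Proxy(Proxy.Type.SOCKS,
        new InetSocketAddress("localhost", 1080));
    URL url = new URL("http://namenode:50070/fsck?path=/");
    // The Proxy-taking overload routes this connection through SOCKS
    URLConnection connection = url.openConnection(socks);
    connection.getInputStream().close();
  }
}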

   Any thoughts would be appreciated.

   Thanks

   Andy


Re: hadoop fsck through proxy...

Posted by Ted Dunning <te...@gmail.com>.
Indeed.  But that is a different discussion.  I can only address the
quotidian aspects of the issue, not the philosophical.

On Thu, Sep 24, 2009 at 1:23 PM, Andy Sautins
<an...@returnpath.net> wrote:

>  Still seems a little inconsistent for that to be one of the few commands
> that don't work through a proxy with the hadoop command line.




-- 
Ted Dunning, CTO
DeepDyve

RE: hadoop fsck through proxy...

Posted by Andy Sautins <an...@returnpath.net>.
  Thanks Ted.  Right, if I set up my browser to use the SOCKS proxy and then access the namenode fsck URL ( e.g., http://namenode/fsck ) I get the same information.
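
  For what it's worth, the same thing works from the command line with a SOCKS-capable HTTP client, e.g. curl ( assuming the default namenode web port of 50070 and a local SOCKS tunnel on 1080 ):

curl --socks5 localhost:1080 "http://namenode:50070/fsck?path=/"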

  Still seems a little inconsistent for that to be one of the few commands that don't work through a proxy with the hadoop command line.

  Thanks

  Andy 

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Thursday, September 24, 2009 2:16 PM
To: common-user@hadoop.apache.org
Subject: Re: hadoop fsck through proxy...

An easy work-around is to hit the fsck url on the namenode.  You get the
same output.

On Thu, Sep 24, 2009 at 12:26 PM, Andy Sautins
<an...@returnpath.net> wrote:

>   I looked in JIRA but didn't see this reported, so I thought I'd see what
> this list thinks.  We've been using SOCKS proxying to access a Hadoop
> cluster, generally using the setup described in the Cloudera blog posting (
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ ).  This works great: setting hadoop.rpc.socket.factory.class.default to
> org.apache.hadoop.net.SocksSocketFactory makes hadoop dfs activity
> ( -ls, -rmr, -cat, etc. ) work fine.  The one command
> that doesn't work is fsck.  Note the following command and error:
>
> hadoop fsck /
> Exception in thread "main" java.net.NoRouteToHostException: No route to
> host
>
>   So looking at org.apache.hadoop.hdfs.tools.DFSck.java, the connection is
> created using URLConnection, so it makes sense why it wouldn't work:
> URLConnection doesn't appear to use the configured socket factory.
>
>   So to me this seems like an issue.  Can someone please confirm?  If it is,
> I'll add a JIRA.  Happy to take a crack at making a change as well ( if one
> should be made ), though the easiest way to change it is unclear to me.  I
> haven't run across code in the codebase that uses
> hadoop.rpc.socket.factory.class.default for HTTP connections.
>
>   Any thoughts would be appreciated.
>
>   Thanks
>
>   Andy
>
>


-- 
Ted Dunning, CTO
DeepDyve

Re: hadoop fsck through proxy...

Posted by Ted Dunning <te...@gmail.com>.
An easy work-around is to hit the fsck url on the namenode.  You get the
same output.

On Thu, Sep 24, 2009 at 12:26 PM, Andy Sautins
<an...@returnpath.net> wrote:

>   I looked in JIRA but didn't see this reported, so I thought I'd see what
> this list thinks.  We've been using SOCKS proxying to access a Hadoop
> cluster, generally using the setup described in the Cloudera blog posting (
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ ).  This works great: setting hadoop.rpc.socket.factory.class.default to
> org.apache.hadoop.net.SocksSocketFactory makes hadoop dfs activity
> ( -ls, -rmr, -cat, etc. ) work fine.  The one command
> that doesn't work is fsck.  Note the following command and error:
>
> hadoop fsck /
> Exception in thread "main" java.net.NoRouteToHostException: No route to
> host
>
>   So looking at org.apache.hadoop.hdfs.tools.DFSck.java, the connection is
> created using URLConnection, so it makes sense why it wouldn't work:
> URLConnection doesn't appear to use the configured socket factory.
>
>   So to me this seems like an issue.  Can someone please confirm?  If it is,
> I'll add a JIRA.  Happy to take a crack at making a change as well ( if one
> should be made ), though the easiest way to change it is unclear to me.  I
> haven't run across code in the codebase that uses
> hadoop.rpc.socket.factory.class.default for HTTP connections.
>
>   Any thoughts would be appreciated.
>
>   Thanks
>
>   Andy
>
>


-- 
Ted Dunning, CTO
DeepDyve