Posted to hdfs-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2014/06/04 20:46:01 UTC

Gathering connection information

We've found that many of the Hadoop samples assume they are being run from a cluster node, so that the connection information can be gleaned directly from a configuration object.  However, we always run our client from a remote computer, and our users must manually specify the NN/RM addresses and ports.  We've found this varies maddeningly between distros, and especially on hosted virtual implementations.  Getting the wrong port results in various inscrutable errors with red-herring messages about security.  Is there a prescribed way to get the correct connection information more easily, such as from a web API (where at least we'd only need one address and port)?
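[Editor's note: for readers unfamiliar with the setup being described, a remote client typically has to supply the two endpoints explicitly. The sketch below shows the usual pattern with the Hadoop Java client; the hostnames and ports are placeholders, and the correct values (especially the RM port) vary by distro, which is exactly the pain point raised here.]

```java
// Sketch: a remote client setting the NN/RM endpoints explicitly rather
// than relying on cluster-local configuration files.
// Hostnames/ports are illustrative only; requires hadoop-client on the
// classpath and a reachable cluster to actually run.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode RPC endpoint (8020 on many distros, 9000 on others).
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        // ResourceManager RPC endpoint (8032 by default, but distros differ).
        conf.set("yarn.resourcemanager.address",
                 "resourcemanager.example.com:8032");

        // A wrong port here typically surfaces as a misleading
        // security/handshake error rather than "wrong port".
        FileSystem fs = FileSystem.get(conf);
        System.out.println("HDFS root: " + fs.getUri());
    }
}
```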

john


RE: Gathering connection information

Posted by John Lilley <jo...@redpoint.net>.
Thanks, that’s interesting information.  Use of an Edge Node sounds like a useful convention.  We are software vendors, and we want to connect to any Hadoop cluster regardless of configuration.  How does the Edge Node support connections to HDFS from the client?  Doesn’t the HDFS FileSystem require direct connections to each DataNode?  Does such an Edge Node proxy all of those connections automatically, or does our software need to be made aware of this convention somehow?

Thanks,
John


From: Rishi Yadav [mailto:rishi@infoobjects.com]
Sent: Saturday, June 07, 2014 8:20 AM
To: user@hadoop.apache.org
Subject: Re: Gathering connection information

Typically users ssh into an edge node, which is co-located with the cluster. This also minimizes latency between client and cluster.


—
Sent from Mailbox


On Sat, Jun 7, 2014 at 7:12 AM, Peyman Mohajerian <mo...@gmail.com> wrote:
In my experience you build a node called an Edge Node, which has all the libraries and configuration settings in XML needed to connect to the cluster; it just doesn't have any of the Hadoop daemons running.

On Wed, Jun 4, 2014 at 2:46 PM, John Lilley <jo...@redpoint.net> wrote:
We’ve found that many of the Hadoop samples assume they are being run from a cluster node, so that the connection information can be gleaned directly from a configuration object.  However, we always run our client from a remote computer, and our users must manually specify the NN/RM addresses and ports.  We’ve found this varies maddeningly between distros, and especially on hosted virtual implementations.  Getting the wrong port results in various inscrutable errors with red-herring messages about security.  Is there a prescribed way to get the correct connection information more easily, such as from a web API (where at least we’d only need one address and port)?

john




Re: Gathering connection information

Posted by Rishi Yadav <ri...@infoobjects.com>.
Typically users ssh into an edge node, which is co-located with the cluster. This also minimizes latency between client and cluster.




—
Sent from Mailbox

On Sat, Jun 7, 2014 at 7:12 AM, Peyman Mohajerian <mo...@gmail.com>
wrote:

> In my experience you build a node called an Edge Node, which has all the
> libraries and configuration settings in XML needed to connect to the
> cluster; it just doesn't have any of the Hadoop daemons running.
> On Wed, Jun 4, 2014 at 2:46 PM, John Lilley <jo...@redpoint.net>
> wrote:
>>  We’ve found that many of the Hadoop samples assume they are being run
>> from a cluster node, so that the connection information can be gleaned
>> directly from a configuration object.  However, we always run our client
>> from a remote computer, and our users must manually specify the NN/RM
>> addresses and ports.  We’ve found this varies maddeningly between distros,
>> and especially on hosted virtual implementations.  Getting the wrong port
>> results in various inscrutable errors with red-herring messages about
>> security.  Is there a prescribed way to get the correct connection
>> information more easily, such as from a web API (where at least we’d only
>> need one address and port)?
>>
>>
>>
>> john
>>
>>
>>

Re: Gathering connection information

Posted by Peyman Mohajerian <mo...@gmail.com>.
In my experience you build a node called an Edge Node, which has all the
libraries and configuration settings in XML needed to connect to the
cluster; it just doesn't have any of the Hadoop daemons running.
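[Editor's note: a minimal sketch of what this looks like in client code, assuming the edge node's site XML files have been copied somewhere local. The file paths are hypothetical; `Configuration.addResource` and `FileSystem.get` are standard Hadoop client APIs.]

```java
// Sketch: loading cluster connection settings from site XML files
// (as found on an edge node) instead of hard-coding addresses.
// Paths are illustrative; requires hadoop-client on the classpath and
// a reachable cluster to actually run.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EdgeNodeStyleClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The same XML the edge node holds; no Hadoop daemons needed locally.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));

        // fs.defaultFS and the RM address now come from the XML files,
        // so the client no longer guesses distro-specific ports.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}
```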


On Wed, Jun 4, 2014 at 2:46 PM, John Lilley <jo...@redpoint.net>
wrote:

>  We’ve found that many of the Hadoop samples assume they are being run
> from a cluster node, so that the connection information can be gleaned
> directly from a configuration object.  However, we always run our client
> from a remote computer, and our users must manually specify the NN/RM
> addresses and ports.  We’ve found this varies maddeningly between distros,
> and especially on hosted virtual implementations.  Getting the wrong port
> results in various inscrutable errors with red-herring messages about
> security.  Is there a prescribed way to get the correct connection
> information more easily, such as from a web API (where at least we’d only
> need one address and port)?
>
>
>
> john
>
>
>
