Posted to general@hadoop.apache.org by Evert Lammerts <Ev...@sara.nl> on 2010/11/25 20:16:10 UTC

interfaces to HDFS icw Kerberos

Hi list,

We're considering providing our users with FTP and WebDAV interfaces (using
software provided here: http://www.hadoop.iponweb.net/). Both support user
accounts, so we'll be able to handle authentication. We're evaluating
Cloudera's Hue, which we have coupled to our LDAP service for
authentication, as an interface to MapReduce.

These solutions are not the most elegant in terms of authentication. We'd
much prefer to use Kerberos as provided by Y!. But if we do, how will we
enable users to get data from the outside world onto HDFS? How do others
provide secure but easy interfaces to HDFS?

Kind regards,
Evert Lammerts


RE: interfaces to HDFS icw Kerberos

Posted by "Gibbon, Robert, VF-Group" <Ro...@vodafone.com>.
Hi Evert,

We use the WebDAV tool integrated with HUE for authenticated ad-hoc
read/write access, but for full-throttle inbound data our implementation
uses Flume.

That said, the WebDAV solution is horizontally scalable - it is a
stateless web app - so a software or hardware load balancer could be your
friend here to get the throughput up.

FTP is a session-based protocol, and I have not looked in detail at the
HDFS-FTP implementation, so it might not be as easy to scale sideways.
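For illustration, a minimal load-balancer sketch along those lines. The gateway hostnames and the port are hypothetical - this assumes two stateless WebDAV gateways sitting in front of HDFS:

```
# haproxy.cfg - sketch only; adjust hostnames and ports to your setup
frontend webdav_in
    bind *:8080
    default_backend webdav_gateways

backend webdav_gateways
    balance roundrobin          # the app is stateless, so plain round-robin is fine
    server gw1 webdav-gw1.example.org:8080 check
    server gw2 webdav-gw2.example.org:8080 check
```

Because no session state lives on the gateways, you can add `server` lines as throughput demands grow.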

HTH
Robert

-----Original Message-----
From: Evert Lammerts [mailto:Evert.Lammerts@sara.nl] 
Sent: Samstag, 27. November 2010 12:29
To: Vinithra Varadharajan; general@hadoop.apache.org
Cc: hdfs-user@hadoop.apache.org; Hue-Users
Subject: RE: interfaces to HDFS icw Kerberos

Hi Vinithra, others,

We are using CDH3b3 (which works amazingly well!!). And it's nice to see
Y!'s Kerberos solution coupled to HDFS. But I wouldn't use Hue to upload
a set of files adding up to hundreds of GBs or several TBs.
Browser-based applications are not suitable for that, in my experience.
Do you have different experiences with Hue? (To be fair, we haven't
tested its performance yet.)

We are setting up a cluster that will be shared by people from a number
of different institutes, all working on different cases with different
data. Their work and data should be protected, including from each other.
At the same time they need to be able to transfer their data onto HDFS
(with high enough throughput) from their local clusters / machines. Is
there a standard that others are using that works for shared clusters?
How are Y! people getting their data onto HDFS?

Right now we are using SFTP. We handle authentication in a bit of a
'hacky' way, but it works: we've coupled our LDAP server to Hue through an
Auth*Handler, which also allows for executing a script that updates
authentication tokens for our FTP. So far the throughput is far from high
enough, though - 1.5 MB/s - with data going over the line unencrypted.
Unless we can get that up significantly, while providing the option to
encrypt the data on the wire, this will probably not be a long-term solution.

If anybody can share experiences on transparently and securely getting
data onto HDFS from external locations, that would be much appreciated!

Cheers,
Evert

________________________________________
From: Vinithra Varadharajan [vinithra@cloudera.com]
Sent: Friday, November 26, 2010 10:12 PM
To: general@hadoop.apache.org; Evert Lammerts
Cc: hdfs-user@hadoop.apache.org; Hue-Users
Subject: Re: interfaces to HDFS icw Kerberos

+ Hue-user mailing list

Hi Evert,

Which version of Hue and CDH are you using? CDH3b3 includes Yahoo's
security patches, which provide Kerberos authentication. In CDH3b3, we
have made changes to Hue's filebrowser application, which provides an
interface to upload data into HDFS, so that it works with Hadoop's
authentication. Is this similar to what you're looking for?

Thanks,
Vinithra

On Thu, Nov 25, 2010 at 11:16 AM, Evert Lammerts
<Ev...@sara.nl> wrote:
Hi list,

We're considering to provide our users with FTP and WebDAV interfaces
(with software provided here: http://www.hadoop.iponweb.net/). These
both support user accounts, so we'll be able to deal with
authentication. We're evaluating Cloudera's Hue, which we have coupled
to our LDAP service for authentication, as an interface to MapReduce.

These solutions are not the most beautiful in terms of authentication.
We'd much prefer to use Kerberos as provided by Y!. But if we do so, how
will we enable users to get data from the outside world onto HDFS? How
do others provide secure but easy interfaces to HDFS?

Kind regards,
Evert Lammerts



RE: interfaces to HDFS icw Kerberos

Posted by Evert Lammerts <Ev...@sara.nl>.
Hi Vinithra, others,

We are using CDH3b3 (which works amazingly well!!). And it's nice to see Y!'s Kerberos solution coupled to HDFS. But I wouldn't use Hue to upload a set of files adding up to hundreds of GBs or several TBs. Browser-based applications are not suitable for that, in my experience. Do you have different experiences with Hue? (To be fair, we haven't tested its performance yet.)

We are setting up a cluster that will be shared by people from a number of different institutes, all working on different cases with different data. Their work and data should be protected, including from each other. At the same time they need to be able to transfer their data onto HDFS (with high enough throughput) from their local clusters / machines. Is there a standard that others are using that works for shared clusters? How are Y! people getting their data onto HDFS?

Right now we are using SFTP. We handle authentication in a bit of a 'hacky' way, but it works: we've coupled our LDAP server to Hue through an Auth*Handler, which also allows for executing a script that updates authentication tokens for our FTP. So far the throughput is far from high enough, though - 1.5 MB/s - with data going over the line unencrypted. Unless we can get that up significantly, while providing the option to encrypt the data on the wire, this will probably not be a long-term solution.
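To put that 1.5 MB/s in perspective, a quick back-of-the-envelope calculation (plain Python, decimal units; the 1 TB figure is just an example matching the data sizes mentioned above):

```python
# Rough transfer-time estimate at the observed SFTP throughput.
throughput_mb_s = 1.5                      # observed rate, MB/s
terabyte_mb = 1_000_000                    # 1 TB expressed in MB (decimal units)

seconds = terabyte_mb / throughput_mb_s    # time to move 1 TB at that rate
days = seconds / 86_400                    # 86,400 seconds per day

print(f"1 TB at {throughput_mb_s} MB/s takes about {days:.1f} days")
# → about 7.7 days
```

At roughly a week per terabyte, the link rather than the cluster becomes the bottleneck, which is why the throughput has to come up before this can be a long-term solution.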

If anybody can share experiences on transparently and securely getting data onto HDFS from external locations, that would be much appreciated!

Cheers,
Evert

________________________________________
From: Vinithra Varadharajan [vinithra@cloudera.com]
Sent: Friday, November 26, 2010 10:12 PM
To: general@hadoop.apache.org; Evert Lammerts
Cc: hdfs-user@hadoop.apache.org; Hue-Users
Subject: Re: interfaces to HDFS icw Kerberos

+ Hue-user mailing list

Hi Evert,

Which version of Hue and CDH are you using? CDH3b3 includes Yahoo's security patches, which provide Kerberos authentication. In CDH3b3, we have made changes to Hue's filebrowser application, which provides an interface to upload data into HDFS, so that it works with Hadoop's authentication. Is this similar to what you're looking for?

Thanks,
Vinithra

On Thu, Nov 25, 2010 at 11:16 AM, Evert Lammerts <Ev...@sara.nl> wrote:
Hi list,

We're considering to provide our users with FTP and WebDAV interfaces (with
software provided here: http://www.hadoop.iponweb.net/). These both support
user accounts, so we'll be able to deal with authentication. We're
evaluating Cloudera's Hue, which we have coupled to our LDAP service for
authentication, as an interface to MapReduce.

These solutions are not the most beautiful in terms of authentication. We'd
much prefer to use Kerberos as provided by Y!. But if we do so, how will we
enable users to get data from the outside world onto HDFS? How do others
provide secure but easy interfaces to HDFS?

Kind regards,
Evert Lammerts



Re: interfaces to HDFS icw Kerberos

Posted by Vinithra Varadharajan <vi...@cloudera.com>.
+ Hue-user mailing list

Hi Evert,

Which version of Hue and CDH are you using? CDH3b3 includes Yahoo's security
patches, which provide Kerberos authentication. In CDH3b3, we have made
changes to Hue's filebrowser application, which provides an interface to
upload data into HDFS, so that it works with Hadoop's authentication. Is
this similar to what you're looking for?

Thanks,
Vinithra

On Thu, Nov 25, 2010 at 11:16 AM, Evert Lammerts <Ev...@sara.nl> wrote:

> Hi list,
>
> We're considering to provide our users with FTP and WebDAV interfaces (with
> software provided here: http://www.hadoop.iponweb.net/). These both
> support
> user accounts, so we'll be able to deal with authentication. We're
> evaluating Cloudera's Hue, which we have coupled to our LDAP service for
> authentication, as an interface to MapReduce.
>
> These solutions are not the most beautiful in terms of authentication. We'd
> much prefer to use Kerberos as provided by Y!. But if we do so, how will we
> enable users to get data from the outside world onto HDFS? How do others
> provide secure but easy interfaces to HDFS?
>
> Kind regards,
> Evert Lammerts
>
>

Re: interfaces to HDFS icw Kerberos

Posted by Venkatesh S <sv...@yahoo-inc.com>.
We use HDFS Proxy, which is part of contrib. The new version, yet to be
released, has both R+W (read and write) and supports SPNEGO (Kerberos over
HTTP). It's been working well for us.
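For readers unfamiliar with SPNEGO access to such a proxy: once a client holds a valid Kerberos ticket, an HTTP client like curl can authenticate against it. A hypothetical sketch - the principal, proxy hostname, and URL path below are made up, and the exact URL layout depends on the proxy version:

```shell
# Obtain a Kerberos ticket first (principal is an example)
kinit evert@EXAMPLE.ORG

# Fetch a file through the proxy using SPNEGO (--negotiate);
# "-u :" tells curl to take credentials from the ticket cache
curl --negotiate -u : -L \
    "https://hdfsproxy.example.org/data/user/evert/input.txt" \
    -o input.txt
```

The appeal of this approach is that clients need nothing Hadoop-specific: any tool that speaks HTTP with SPNEGO can read from (and, with the new version, write to) HDFS.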

Thanks,
Venkatesh


On 11/26/10 12:46 AM, "Evert Lammerts" <Ev...@sara.nl> wrote:

> Hi list,
> 
> We're considering to provide our users with FTP and WebDAV interfaces (with
> software provided here: http://www.hadoop.iponweb.net/). These both support
> user accounts, so we'll be able to deal with authentication. We're
> evaluating Cloudera's Hue, which we have coupled to our LDAP service for
> authentication, as an interface to MapReduce.
> 
> These solutions are not the most beautiful in terms of authentication. We'd
> much prefer to use Kerberos as provided by Y!. But if we do so, how will we
> enable users to get data from the outside world onto HDFS? How do others
> provide secure but easy interfaces to HDFS?
> 
> Kind regards,
> Evert Lammerts
>