Posted to dev@knox.apache.org by Benjamin Ross <br...@Lattice-Engines.com> on 2016/07/06 14:43:57 UTC

RE: Question regarding WebHDFS security

Hi Apache Knox devs,
I've been in contact with Larry McCay to figure out a reasonable solution for my use case.  I haven't had a chance to play around with Knox yet, but in theory it will solve my problem: a three-phased upgrade should hopefully work.  My specific use case is described below.  Any ideas are welcome.  Thanks in advance.

Consider that we have:
1. A Hadoop cluster running without Kerberos
2. A number of services contacting that hadoop cluster and retrieving data from it using WebHDFS.

Now what happens when we enable Kerberos on the cluster?  We still need to allow those services to contact the cluster without credentials until we can upgrade them.  Otherwise we'll have downtime.  So what can we do?

As a possible solution, is there any way to allow unprotected access from just those machines until we can upgrade them?

Thanks,
Ben
As a proposed solution for a zero-downtime upgrade, it looks like a potential path forward is something like:

  1.  Stand up Apache Knox
  2.  Modify the WebHDFS traffic to point to the proxy and provide credentials (that are ignored for now; see the sketch below)
  3.  Kerberize the Hadoop cluster and modify the proxy so that it provides Kerberos credentials

Thanks in advance,
Ben
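
For illustration only, here is a minimal sketch of what step 2 of the plan above could look like on the client side, assuming Python with the requests library; the hostnames, ports, topology name, and credentials are placeholders, not values from this thread. The point is that only the base URL and the auth tuple change between phases, so the calling code stays the same once the cluster is Kerberized in step 3.

    import requests

    # Placeholder endpoints; real hosts, ports, and the Knox topology name
    # are deployment-specific assumptions.
    DIRECT_BASE = "http://namenode.example.com:50070/webhdfs/v1"            # phase 1: direct WebHDFS
    KNOX_BASE = "https://knox.example.com:8443/gateway/default/webhdfs/v1"  # phases 2-3: via the Knox proxy

    def list_status(base_url, path, auth=None):
        """LISTSTATUS against WebHDFS, either directly or through the Knox gateway."""
        resp = requests.get(
            f"{base_url}{path}",
            params={"op": "LISTSTATUS"},
            auth=auth,       # None on the unsecured cluster, (user, password) once Knox fronts it
            verify=False,    # assumes a self-signed gateway certificate; tighten for production
        )
        resp.raise_for_status()
        return resp.json()["FileStatuses"]["FileStatus"]

    # Phase 1: unsecured cluster, no credentials
    # list_status(DIRECT_BASE, "/user/yarn")
    # Phase 2 onward: same call shape, pointed at Knox with credentials
    # list_status(KNOX_BASE, "/user/yarn", auth=("guest", "guest-password"))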


________________________________
From: Larry McCay [lmccay@hortonworks.com]
Sent: Wednesday, July 06, 2016 7:27 AM
To: Benjamin Ross
Cc: David Morel; user@hadoop.apache.org
Subject: Re: Question regarding WebHDFS security

Hi Ben -

It doesn’t really work exactly that way, but it will likely be able to handle your use case.
I suggest that you bring the conversation over to dev@ for Knox.

We can delve into the details of your use case and your options there.

thanks,

—larry

On Jul 5, 2016, at 10:58 PM, Benjamin Ross <br...@Lattice-Engines.com> wrote:

Thanks Larry.  I'll need to look into the details quite a bit further, but I take it that I can define some mapping such that requests for particular file paths will trigger particular credentials to be used (until everything's upgraded)?  Currently all requests come in using permissive auth with the username yarn.  Once we enable Kerberos, I'd ideally like that to translate into using one set of Kerberos credentials if the path is /foo and another set of credentials if the path is /bar.  This will only be temporary until things are fully upgraded.

Appreciate the help.
Ben


________________________________
From: Larry McCay [lmccay@hortonworks.com]
Sent: Tuesday, July 05, 2016 4:23 PM
To: Benjamin Ross
Cc: David Morel; user@hadoop.apache.org
Subject: Re: Question regarding WebHDFS security

For consuming REST APIs like WebHDFS, where Kerberos is inconvenient or impossible, you may want to consider using a trusted proxy like Apache Knox.
It will authenticate as knox to the backend services and act on behalf of your custom services.
It will also allow you to authenticate to Knox from the services using a number of different mechanisms.

http://knox.apache.org/

On Jul 5, 2016, at 2:43 PM, Benjamin Ross <br...@Lattice-Engines.com> wrote:

Hey David,
Thanks.  Yep - that's the easy part.  Let me clarify.

Consider that we have:
1. A Hadoop cluster running without Kerberos
2. A number of services contacting that hadoop cluster and retrieving data from it using WebHDFS.

Clearly the services don't need to log in to WebHDFS using credentials because the cluster isn't kerberized just yet.

Now what happens when we enable Kerberos on the cluster?  We still need to allow those services to contact the cluster without credentials until we can upgrade them.  Otherwise we'll have downtime.  So what can we do?

As a possible solution, is there any way to allow unprotected access from just those machines until we can upgrade them?

Thanks,
Ben





________________________________
From: David Morel [dmorel@amakuru.net]
Sent: Tuesday, July 05, 2016 2:33 PM
To: Benjamin Ross
Cc: user@hadoop.apache.org
Subject: Re: Question regarding WebHDFS security


On 5 Jul 2016 at 7:42 PM, "Benjamin Ross" <br...@lattice-engines.com> wrote:
>
> All,
> We're planning the rollout of kerberizing our Hadoop cluster.  The issue is that we have several single-tenant services that rely on contacting the HDFS cluster over WebHDFS without credentials.  So, the concern is that once we kerberize the cluster, we will no longer be able to access it without credentials from these single-tenant systems, which results in a painful upgrade dependency.
>
> Any suggestions for dealing with this problem in a simple way?
>
> If not, any suggestion for a better forum to ask this question?
>
> Thanks in advance,
> Ben

It's usually not super-hard to wrap your HTTP calls with a module that handles Kerberos, depending on what language you use. For instance, https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.

David
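
In Python, a comparable approach to the Perl module above is to wrap plain HTTP WebHDFS calls with Kerberos (SPNEGO) authentication. A minimal sketch, assuming the requests and requests-kerberos packages, a placeholder NameNode address, and a valid ticket already obtained via kinit:

    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # Placeholder NameNode address; host and port are assumptions.
    WEBHDFS = "http://namenode.example.com:50070/webhdfs/v1"

    def open_file(path):
        """Read a file over WebHDFS using SPNEGO (Kerberos) authentication."""
        resp = requests.get(
            f"{WEBHDFS}{path}",
            params={"op": "OPEN"},
            auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
            allow_redirects=True,  # WebHDFS redirects OPEN to a DataNode
        )
        resp.raise_for_status()
        return resp.content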




Re: Question regarding WebHDFS security

Posted by larry mccay <lm...@apache.org>.
Great advice from Kris and Balaji.

Depending on your clients, I don't know that I would require them to use
Kerberos.

The combination of a secure cluster and a trusted proxy like Knox provides a
really flexible and client-friendly integration. This is especially true
for clients that don't expect Kerberos, such as REST clients.

Please feel free to share more of your use case and available credentials
and we can point you toward appropriate configurations.

Re: Question regarding WebHDFS security

Posted by Balaji Ganesan <ba...@gmail.com>.
Agree with Kris. Better to start with Knox proxy auth, and then enable
Kerberos within the cluster. Enabling Kerberos would still involve
restarting services such as HDFS and Hive. I am not sure there is
a way to enable Kerberos without downtime.


Re: Question regarding WebHDFS security

Posted by Kristopher Kane <kr...@gmail.com>.
Hi Ben, rather. :-)


Re: Question regarding WebHDFS security

Posted by Kristopher Kane <kr...@gmail.com>.
Hi David,

Looks like you are on the right track, but you may have a hard time turning
off Knox auth while the cluster is without Kerberos - at least I have never
done this.  It might be best to assume Knox authentication from the start;
then you don't have to worry about it once the cluster is Kerberized. ->
This is the approach I would go with.

Depending on your WebHDFS usage you might consider multiple Knox instances
behind a load balancer of your choice - like Apache httpd or HAProxy.
Remember that your WebHDFS usage was a data transfer direct from the
DataNodes, and Knox will become a funnel for that same data transfer, which
might require a fan-out of Knox instances for load distribution.

Kris
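
As a purely illustrative sketch of the fan-out point above (in practice a real load balancer such as Apache httpd or HAProxy in front of the Knox instances, as suggested, is the better tool), a client could also rotate requests across several gateways itself. The hostnames and topology name below are placeholders, assuming Python with the requests library:

    import itertools
    import requests

    # Placeholder Knox gateway endpoints fronting WebHDFS.
    KNOX_GATEWAYS = itertools.cycle([
        "https://knox1.example.com:8443/gateway/default/webhdfs/v1",
        "https://knox2.example.com:8443/gateway/default/webhdfs/v1",
    ])

    def list_status(path, auth):
        """Naive client-side round-robin over Knox instances for a WebHDFS LISTSTATUS call."""
        base = next(KNOX_GATEWAYS)
        resp = requests.get(f"{base}{path}", params={"op": "LISTSTATUS"},
                            auth=auth, verify=False)
        resp.raise_for_status()
        return resp.json()["FileStatuses"]["FileStatus"]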



