Posted to user@hive.apache.org by Julien Phalip <jp...@gmail.com> on 2019/09/20 16:37:01 UTC

Delegation tokens for HDFS

Hi,

My understanding is that the most common (perhaps the only?) way to let
users run Hive queries on datasets stored in HDFS is to configure Hive as
a proxy user in the NameNode's configuration.
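
For reference, proxy-user privileges are typically granted through the
hadoop.proxyuser.* properties in core-site.xml on the NameNode; the host
and group values below are placeholders:

    <property>
      <name>hadoop.proxyuser.hive.hosts</name>
      <value>hs2-host.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hive.groups</name>
      <value>*</value>
    </property>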

I'm wondering if, instead of using proxy user privileges, a Hive client
could be configured to first collect HDFS delegation tokens for the user
and then pass those tokens to the Hive server. That way, the Hive server
would use the tokens to authenticate with HDFS on behalf of the user.
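
To illustrate, here is a minimal sketch of how a client could gather HDFS
delegation tokens with the public Hadoop API. The "hive" renewer and the
output path are assumptions on my part, and how to actually hand the
tokens to the Hive server is exactly the part I'm unsure about:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.Credentials;

    public class CollectHdfsTokens {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes the user already holds a Kerberos TGT (e.g. via kinit).
        FileSystem fs = FileSystem.get(conf);
        Credentials creds = new Credentials();
        // Ask the NameNode to issue delegation tokens for the current user.
        fs.addDelegationTokens("hive", creds);
        // Serialize the tokens so they could be shipped to a server.
        creds.writeTokenStorageFile(new Path("file:///tmp/hdfs.tokens"), conf);
      }
    }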

Spark offers something similar with the
spark.yarn.access.hadoopFileSystems property
<https://spark.apache.org/docs/latest/running-on-yarn.html#kerberos>.
Is there, by chance, a way to achieve the same thing for Hive when using a
client like Beeline?
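
For comparison, the Spark invocation looks roughly like this (the
principal, keytab and host names are placeholders):

    spark-submit \
      --principal alice@EXAMPLE.COM \
      --keytab /path/to/alice.keytab \
      --conf spark.yarn.access.hadoopFileSystems=hdfs://nn1.example.com:8020 \
      app.jar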

Thank you,

Julien

Re: Delegation tokens for HDFS

Posted by Julien Phalip <jp...@gmail.com>.
Hi, thanks for your reply.

Regarding your statement:

> If you aren't using Hive Server 2, the user acquires tokens before the
> query gets submitted to YARN.

So is it right to say that Beeline doesn't support this pattern, i.e.
collecting HDFS delegation tokens before submitting the job? Do you know
which other clients or services do support it?

Also, do you know whether HDFS delegation tokens can be obtained for Hive
when using Tez as the execution engine rather than MapReduce on YARN?

Thank you,

Julien

On Sat, Sep 21, 2019 at 12:52 AM Owen O'Malley <ow...@gmail.com>
wrote:


Re: Delegation tokens for HDFS

Posted by Owen O'Malley <ow...@gmail.com>.
If you are using Hive Server 2 through JDBC:

   - The most common way is to make the data accessible only to the
   'hive' user. Since users don't have direct access to the underlying
   HDFS files, Hive can enforce column- and row-level permissions.
   - The other option is to use doAs and run as the end user (see the
   sketch below). That requires giving the 'hive' user proxy privileges.
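
In case it helps, doAs boils down to the standard UserGroupInformation
pattern. The following is only a sketch (not HiveServer2's actual code),
with 'alice' as a placeholder user:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class DoAsSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The service's own login, e.g. the 'hive' Kerberos principal.
        UserGroupInformation hive = UserGroupInformation.getLoginUser();
        // Impersonate the end user. The NameNode permits this only if the
        // hadoop.proxyuser.hive.* properties grant 'hive' proxy privileges.
        UserGroupInformation alice =
            UserGroupInformation.createProxyUser("alice", hive);
        // Filesystem calls made inside doAs execute as 'alice'.
        FileSystem fs = alice.doAs(
            (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
        System.out.println(fs.getHomeDirectory());
      }
    }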

If you aren't using Hive Server 2, the user acquires tokens before the
query gets submitted to YARN.
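
On the command line, that acquisition can be done with something like the
following (the renewer and token file path are just examples):

    # Fetch an HDFS delegation token into a local file.
    hdfs fetchdt --renewer yarn /tmp/hdfs.dt
    # Hadoop clients pick the token up from this environment variable.
    export HADOOP_TOKEN_FILE_LOCATION=/tmp/hdfs.dt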

There are trade-offs with each of these models.

.. Owen

On Fri, Sep 20, 2019 at 9:37 AM Julien Phalip <jp...@gmail.com> wrote:
