Posted to common-issues@hadoop.apache.org by "Aaron T. Myers (JIRA)" <ji...@apache.org> on 2010/10/05 01:28:33 UTC

[jira] Created: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Add support for reading multiple hadoop delegation token files
--------------------------------------------------------------

                 Key: HADOOP-6988
                 URL: https://issues.apache.org/jira/browse/HADOOP-6988
             Project: Hadoop Common
          Issue Type: Improvement
          Components: security
    Affects Versions: 0.22.0
            Reporter: Aaron T. Myers
            Assignee: Aaron T. Myers


It would be nice if there were a way to specify multiple delegation token files via the HADOOP_TOKEN_FILE_LOCATION environment variable and the "mapreduce.job.credentials.binary" configuration value. I suggest a colon-separated list of paths, each of which is read as a separate delegation token file.
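A minimal sketch of the list parsing this proposal implies, assuming plain string splitting is all that is needed. The class name TokenFileLocations and its split method are hypothetical and are not part of Hadoop; the separator is a parameter, so the same code works whichever delimiter is chosen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: splits a token-file location value into paths. */
final class TokenFileLocations {
    private TokenFileLocations() {}

    /**
     * Splits a raw value such as "/tmp/nn1.dt:/tmp/nn2.dt" on the given
     * separator, trimming whitespace and skipping empty entries.
     * A null value yields an empty list.
     */
    static List<String> split(String raw, char separator) {
        List<String> paths = new ArrayList<>();
        if (raw == null) {
            return paths;
        }
        int start = 0;
        for (int i = 0; i <= raw.length(); i++) {
            // Treat the end of the string as a final separator.
            if (i == raw.length() || raw.charAt(i) == separator) {
                String entry = raw.substring(start, i).trim();
                if (!entry.isEmpty()) {
                    paths.add(entry);
                }
                start = i + 1;
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // The variable may be unset, in which case nothing is printed.
        String raw = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
        for (String path : split(raw, ':')) {
            System.out.println("would read token file: " + path);
        }
    }
}
```

Each resulting path would then be read as a separate delegation token file; the reading itself is left out here because it depends on Hadoop's own classes.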

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918379#action_12918379 ] 

Alejandro Abdelnur commented on HADOOP-6988:
--------------------------------------------

PLEASE, DO NOT REMOVE THIS VARIABLE:

Oozie relies on this variable to be able to dispatch mapreduce/streaming/pipes/pig/hive/sqoop actions. Oozie uses a launcher MR job to start all those action types. This launcher MR job uses the variable to propagate the delegation token to the code that launches the corresponding action.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918053#action_12918053 ] 

Aaron T. Myers commented on HADOOP-6988:
----------------------------------------

Thanks for the comments, Allen.

The environment variable already exists in hadoop trunk; at present it can specify only a single path to a token file. I'd like to be able to specify multiple token files via this variable. This is really only for convenience, as it's entirely possible to stuff multiple delegation token objects into a single credentials object, which is then serialized to a file. I considered creating a tool which would be capable of merging multiple delegation token files into one, but this seemed like a cleaner solution.
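To illustrate the merge alternative mentioned above, here is a rough sketch that models a credentials object as a map from a token alias to opaque token bytes. TokenStoreSketch is a hypothetical stand-in, not Hadoop's Credentials class, and serialization to a file is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a credentials object: token alias -> opaque token bytes. */
final class TokenStoreSketch {
    private final Map<String, byte[]> tokens = new LinkedHashMap<>();

    /** Stores a defensive copy of the token under the given alias. */
    void addToken(String alias, byte[] token) {
        tokens.put(alias, token.clone());
    }

    /** Copies every token from other into this store; a repeated alias is overwritten. */
    void addAll(TokenStoreSketch other) {
        for (Map.Entry<String, byte[]> e : other.tokens.entrySet()) {
            tokens.put(e.getKey(), e.getValue().clone());
        }
    }

    int size() {
        return tokens.size();
    }

    boolean contains(String alias) {
        return tokens.containsKey(alias);
    }

    /** Merges any number of stores into a fresh one, as a merge tool might. */
    static TokenStoreSketch merge(TokenStoreSketch... stores) {
        TokenStoreSketch merged = new TokenStoreSketch();
        for (TokenStoreSketch store : stores) {
            merged.addAll(store);
        }
        return merged;
    }

    public static void main(String[] args) {
        // The aliases and token bytes below are made-up placeholders.
        TokenStoreSketch nn1 = new TokenStoreSketch();
        nn1.addToken("hdfs://nn1:8020", new byte[] {1, 2, 3});
        TokenStoreSketch nn2 = new TokenStoreSketch();
        nn2.addToken("hdfs://nn2:8020", new byte[] {4, 5, 6});
        System.out.println("merged token count: " + merge(nn1, nn2).size());
    }
}
```

Either approach ends with all tokens in one object; the difference is whether the merge happens in a separate step before launch or while the listed files are being read.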



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919078#action_12919078 ] 

Owen O'Malley commented on HADOOP-6988:
---------------------------------------

The intent of the environment variable is *not* for job submission. I really don't see any value in making it multi-valued.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917891#action_12917891 ] 

Allen Wittenauer commented on HADOOP-6988:
------------------------------------------

This seems like a great way to inject tokens into a process that it shouldn't have.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918419#action_12918419 ] 

Allen Wittenauer commented on HADOOP-6988:
------------------------------------------

I'm sorry, but it is completely short-sighted to have a single-use env var like this.  If we need to modify the task's environment for something else, are we going to introduce another environment variable?  How many are too many?





[jira] Updated: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-6988:
-----------------------------------

    Attachment: hadoop-6988.1.txt

Same patch, this time with -p0.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918145#action_12918145 ] 

Owen O'Malley commented on HADOOP-6988:
---------------------------------------

The environment variable should *not* be multi-valued. It is used to communicate the job's token store to sub-processes of the task. Since a task can't be in more than one job, there isn't any need.

What is the use case for having multiple token files? The rest of the lists use commas, so this should be the same. Wouldn't it be easier to write a tool that allows you to combine multiple token files together into a single one?



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919181#action_12919181 ] 

Devaraj Das commented on HADOOP-6988:
-------------------------------------

Although it is true that HADOOP_TOKEN_FILE_LOCATION can be used to make normal hdfs commands work, the intent for having this was to support security for Map/Reduce tasks and hadoop streaming apps that internally invoke command-line hdfs operations (as Owen had pointed out earlier). If you want to pass multiple tokens during job submission, the preferred approach would be to write the tokens into a file (using the Credentials class's utilities), and then point mapreduce.job.credentials.binary to that file.
Thinking about it, wouldn't the option of defining mapreduce.job.hdfs-servers in the job configuration work for you? The JobClient will automatically get delegation tokens from those namenodes, and all tasks of the job can use those tokens.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918648#action_12918648 ] 

Eli Collins commented on HADOOP-6988:
-------------------------------------

Being able to specify both MR and HDFS delegation token files upfront when submitting a job seems reasonable, and it avoids requiring all clients to kinit or use a tool to merge token files.  Env variable philosophy aside (Allen, want to open a jira with an alternative?), does anyone object to modifying this existing variable so more than one file can be specified?




[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919082#action_12919082 ] 

Aaron T. Myers commented on HADOOP-6988:
----------------------------------------

Even if it's not intended for job submission, it's documented as being the preferred way to pass token files to bin/hadoop for the purpose of running HDFS commands.

From the HDFS user guide:

{quote}
The HDFS fetchdt command is not a Hadoop shell command. It can be run as 'bin/hadoop fetchdt DTfile'. After you got the token you can run an HDFS command without having Kerberos tickets, by pointing HADOOP_TOKEN_FILE_LOCATION environmental variable to the delegation token file.
{quote}

Thus, this change could be useful for any HDFS command which is capable of communicating with multiple distinct NNs.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918073#action_12918073 ] 

Allen Wittenauer commented on HADOOP-6988:
------------------------------------------

Grr.  I really wish we'd stop creating pet environment variables.  This is ridiculous.

Can we remove this env var as part of this JIRA?  Which takes precedence, the env var or the mapreduce.job.credentials.binary jobconf setting?  What is the interaction?  If the answer is "we have to look at the code" then we've failed.

It makes much more sense to have mapreduce.job.credentials.binary support a comma-delimited set, to be consistent with the rest of the job conf.  (Never mind that colon is the traditional directory delimiter on OS X.)



[jira] Updated: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-6988:
-----------------------------------

    Attachment: hadoop-6988.0.txt

Adding support for HADOOP_TOKEN_FILE_LOCATION being interpreted as a comma-separated list of paths to delegation token files.



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918368#action_12918368 ] 

Aaron T. Myers commented on HADOOP-6988:
----------------------------------------

Allen: I agree with Owen - it doesn't make sense to remove this environment variable. It is primarily used internally, though its use is documented in hdfs_user_guide.xml and mapred_tutorial.xml. The interaction between this and the conf var is that all tokens specified via either method are added to a single credentials object - nothing's going to take precedence.

Owen: the motivation for this change is not with regard to passing delegation tokens from job to tasks, but rather with submitting jobs in the first place, which is another use for HADOOP_TOKEN_FILE_LOCATION. I'd like to be able to specify multiple tokens (gotten from fetchdt, or via other means) which a job could use to, for example, authenticate to multiple NNs and JTs. I considered creating a tool which would be capable of merging multiple delegation token files into one, but this seemed like a cleaner solution.

Good point on making this comma-separated. I'll definitely do that.
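A small sketch of the "nothing takes precedence" behaviour described above, assuming both sources hold comma-separated paths and that the result is simply their union. TokenSourcesSketch is hypothetical and not Hadoop code, and a JVM system property stands in for the jobconf value purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration, not Hadoop code. */
final class TokenSourcesSketch {
    private TokenSourcesSketch() {}

    /**
     * Returns the union of the comma-separated paths from both sources.
     * Neither source overrides the other; a path listed twice appears once.
     */
    static Set<String> collect(String envValue, String confValue) {
        Set<String> paths = new LinkedHashSet<>();
        for (String raw : Arrays.asList(envValue, confValue)) {
            if (raw == null) {
                continue; // this source is not set
            }
            for (String entry : raw.split(",")) {
                String path = entry.trim();
                if (!path.isEmpty()) {
                    paths.add(path);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String fromEnv = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
        // Assumption: a system property is used here only as a stand-in for the job configuration.
        String fromConf = System.getProperty("mapreduce.job.credentials.binary");
        for (String path : collect(fromEnv, fromConf)) {
            System.out.println("token file to load: " + path);
        }
    }
}
```

Because the result is a set of files whose tokens all end up in one credentials object, the question of which source wins does not arise in this model.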



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918042#action_12918042 ] 

Allen Wittenauer commented on HADOOP-6988:
------------------------------------------

At a minimum, -1 on the environment variable.

Shouldn't HADOOP_CLIENT_OPTS be sufficient for passing extra -D params?  We have an abundance of environment variables that users can't handle as it is.  



[jira] Commented: (HADOOP-6988) Add support for reading multiple hadoop delegation token files

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919344#action_12919344 ] 

Aaron T. Myers commented on HADOOP-6988:
----------------------------------------

Thanks for the thoughtful comments, Devaraj.

As I said earlier, this is really only for convenience, as it's entirely possible to stuff multiple delegation token objects into a single credentials object, which is then serialized to a file. I considered creating a tool which would be capable of merging multiple delegation token files into one, but this seemed like a cleaner solution. Rather than having every script/job/program that wants to pass multiple independently-fetched delegation token files first invoke some command to merge them, just specify them all via the method that already exists.

The problem with specifying mapreduce.job.hdfs-servers for my particular use case is that delegation tokens can't be fetched if the application which is submitting the job is only authenticated via a delegation token in the first place. That said, I see this issue as being largely orthogonal to the core question of whether or not it is reasonable to want to specify multiple delegation token files via the system that already exists.
