Posted to mapreduce-issues@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2009/09/22 23:30:18 UTC

[jira] Created: (MAPREDUCE-1026) Shuffle should be secure

Shuffle should be secure
------------------------

                 Key: MAPREDUCE-1026
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: security
            Reporter: Owen O'Malley
            Assignee: Devaraj Das


Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.
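
A minimal sketch of what a "job-specific secret" could be: fresh random bytes drawn from `java.security.SecureRandom` once per job. The class and method names are hypothetical (not Hadoop's API), and the 160-bit size is an assumption chosen to match HMAC-SHA1 output.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper (not Hadoop's API): one random secret per job. */
final class JobSecretSketch {
  // 20 bytes = 160 bits; the size is an assumption, matching HMAC-SHA1 output.
  static final int SECRET_BYTES = 20;
  private static final SecureRandom RNG = new SecureRandom();

  private JobSecretSketch() {}

  /** Draws a fresh secret; call once per submitted job. */
  static byte[] newJobSecret() {
    byte[] secret = new byte[SECRET_BYTES];
    RNG.nextBytes(secret);
    return secret;
  }

  /** URL-safe Base64 without padding, e.g. for writing to a token file. */
  static String encode(byte[] secret) {
    return Base64.getUrlEncoder().withoutPadding().encodeToString(secret);
  }

  public static void main(String[] args) {
    byte[] jobA = newJobSecret();
    byte[] jobB = newJobSecret();
    System.out.println(encode(jobA).length());     // 27 (20 bytes -> 27 chars, no padding)
    System.out.println(Arrays.equals(jobA, jobB)); // false (with overwhelming probability)
  }
}
```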

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Open  (was: Patch Available)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789083#action_12789083 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

I don't think so. In local mode, the shuffle shouldn't be invoked at all...

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758485#action_12758485 ] 

Allen Wittenauer commented on MAPREDUCE-1026:
---------------------------------------------

> 10 characters from [a-zA-Z0-9]

This seems like a fairly small key space, one that could be broken by running Hadoop on a small cluster. :)  Why not just use 128- or 256-bit keys (e.g. MD5 or SHA digests)?
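
The comparison can be made concrete: a uniformly random string of n symbols from an alphabet of size k carries n·log2(k) bits. A small sketch (hypothetical class name, not part of Hadoop) that compares the proposed 10-character [a-zA-Z0-9] key with a 128-bit key:

```java
/** Hypothetical helper for comparing key spaces; not part of Hadoop. */
final class KeySpace {
  private KeySpace() {}

  /** Entropy in bits of a uniformly random string of `length` symbols drawn from `alphabetSize` symbols. */
  static double bits(int alphabetSize, int length) {
    return length * (Math.log(alphabetSize) / Math.log(2));
  }

  /** Shortest string length whose entropy reaches at least `targetBits`. */
  static int lengthFor(int alphabetSize, int targetBits) {
    return (int) Math.ceil(targetBits / (Math.log(alphabetSize) / Math.log(2)));
  }

  public static void main(String[] args) {
    // 10 characters from [a-zA-Z0-9] (62 symbols).
    System.out.printf("62^10 ~ 2^%.1f%n", bits(62, 10)); // 62^10 ~ 2^59.5
    // Characters from the same alphabet needed to match a 128-bit random key.
    System.out.println(lengthFor(62, 128));               // 22
  }
}
```

In other words, the 10-character proposal sits at roughly 60 bits; reaching 128 bits with the same alphabet would take 22 characters.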

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759337#action_12759337 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Summarizing some offline discussions:
1. The 1.5 extra round trips to the TaskTracker required by HTTP Digest authentication could be a significant cost when the map outputs are small.
2. Instead of that, can we do the following:
   2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the URL. This doesn't cost us anything.
   2.2. Map tasks encrypt the final spill file on the map side when it is written to disk (and reducers decrypt it). This could be done using a key different from the shuffle key used in 2.1.
The idea is that at some point we should encrypt map outputs anyway, to get maximum security for the intermediate outputs. We can do that on the wire via HTTPS, or by encrypting the files. The latter should be much less costly than the former. The point of having both 2.1 and 2.2 is to make the transfer very secure without the overhead of extra round trips for (digest) authentication.

Thoughts?
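
Idea 2.2 could look roughly like the following sketch, assuming AES in CTR mode from the standard `javax.crypto` API and a per-job spill key separate from the shuffle key. All names are hypothetical. CTR gives confidentiality only: it does not detect tampering, so a real design would also need an integrity check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical sketch: map side encrypts its spill, reduce side decrypts it. */
final class SpillCipherSketch {
  private static final SecureRandom RNG = new SecureRandom();

  private SpillCipherSketch() {}

  /** Per-job spill key, distinct from the shuffle (URL) key. */
  static SecretKey newSpillKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    return gen.generateKey();
  }

  /** Map side: writes a random 16-byte IV, then the ciphertext. Closes `out`. */
  static void encrypt(SecretKey key, InputStream plain, OutputStream out)
      throws IOException, GeneralSecurityException {
    byte[] iv = new byte[16];
    RNG.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    out.write(iv);
    try (CipherOutputStream cos = new CipherOutputStream(out, cipher)) {
      copy(plain, cos);
    }
  }

  /** Reduce side: reads the IV, then decrypts the remaining bytes. */
  static byte[] decrypt(SecretKey key, InputStream in)
      throws IOException, GeneralSecurityException {
    DataInputStream din = new DataInputStream(in);
    byte[] iv = new byte[16];
    din.readFully(iv);
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (CipherInputStream cis = new CipherInputStream(din, cipher)) {
      copy(cis, plain);
    }
    return plain.toByteArray();
  }

  private static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }

  public static void main(String[] args) throws Exception {
    SecretKey key = newSpillKey();
    byte[] spill = "key1\tvalue1\nkey2\tvalue2\n".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream onDisk = new ByteArrayOutputStream();
    encrypt(key, new ByteArrayInputStream(spill), onDisk);
    byte[] back = decrypt(key, new ByteArrayInputStream(onDisk.toByteArray()));
    System.out.println(Arrays.equals(spill, back)); // true
  }
}
```

CTR mode adds no padding, so the encrypted file is exactly 16 bytes (the IV) longer than the plain spill, and it can be decrypted as a stream while the reducer reads it.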

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-13.patch

Added the port number to the hashed URL.
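
A rough sketch of hashing the request URL with the job secret, assuming HMAC-SHA1 via `javax.crypto.Mac`. The exact fields the patch signs are not shown in this thread; the "port + path + query" layout, the class name and the sample URL are all hypothetical. Including the port means a hash computed for one TaskTracker port does not verify against another.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: sign the identifying part of a map-output URL. */
final class UrlHashSketch {
  private UrlHashSketch() {}

  /** Assumed message layout: port, path and query of the request. */
  static String msgToHash(URI uri) {
    return uri.getPort() + uri.getRawPath() + "?" + uri.getRawQuery();
  }

  /** HMAC-SHA1 of the message, keyed by the job secret, Base64-encoded. */
  static String hash(String msg, byte[] secret) throws GeneralSecurityException {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret, "HmacSHA1"));
    return Base64.getEncoder().encodeToString(mac.doFinal(msg.getBytes(StandardCharsets.UTF_8)));
  }

  /** Server side: recompute the hash and compare in constant time. */
  static boolean verify(String received, String msg, byte[] secret)
      throws GeneralSecurityException {
    byte[] expected = hash(msg, secret).getBytes(StandardCharsets.US_ASCII);
    return MessageDigest.isEqual(expected, received.getBytes(StandardCharsets.US_ASCII));
  }

  public static void main(String[] args) throws Exception {
    byte[] secret = "job-secret-for-illustration".getBytes(StandardCharsets.UTF_8);
    // Hypothetical URLs; only the port differs.
    URI url = URI.create("http://tt1:50060/mapOutput?job=job_1&map=attempt_1&reduce=0");
    URI otherPort = URI.create("http://tt1:50061/mapOutput?job=job_1&map=attempt_1&reduce=0");
    String h = hash(msgToHash(url), secret);
    System.out.println(verify(h, msgToHash(url), secret));       // true
    System.out.println(verify(h, msgToHash(otherPort), secret)); // false
  }
}
```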



> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779817#action_12779817 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425415/MAPREDUCE-1026-13.patch
  against trunk revision 881673.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/console

This message is automatically generated.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779761#action_12779761 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425383/MAPREDUCE-1026-12.patch
  against trunk revision 881673.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/250/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/250/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/250/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/250/console

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758493#action_12758493 ] 

Owen O'Malley commented on MAPREDUCE-1026:
------------------------------------------

I just wanted to get a proposal out there. 66^10 is very big. It is roughly 2^60.
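
Owen's estimate, worked out (for comparison, the [a-zA-Z0-9] alphabet quoted earlier has 62 symbols; both land near 2^60):

```latex
66^{10} = 2^{10\log_2 66} \approx 2^{10 \times 6.044} = 2^{60.4} \approx 1.57 \times 10^{18}
\qquad
62^{10} = 2^{10\log_2 62} \approx 2^{10 \times 5.954} = 2^{59.5} \approx 8.39 \times 10^{17}
```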

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758487#action_12758487 ] 

Owen O'Malley commented on MAPREDUCE-1026:
------------------------------------------

Avro RPC won't have bulk data or authentication for a while, I suspect.

But the answer is yes: once there is authentication on the RPC, we can use that. In particular, the RPC will be able to use token/secret keys for authentication, which would be appropriate in this context. (Clearly a key exchange involving the KDC would never be performant enough for the shuffle.)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026.patch

first draft

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773396#action_12773396 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Looked at the patch in brief. Some first level comments:
1) Remove the method setJobTokenFile from JobConf. This is really a TT-Task configuration.
2) It probably makes sense to have the task read the configuration from the localized file directly. Since the token will be used (later on, in a separate jira) to bootstrap even the task<->TT mutual authentication, it is better to check permissions on the localized file before trusting the key. The other option is to have the task read it from HDFS.
3) What happens if the shuffle fails due to authentication problems? Maybe that needs to be handled specially w.r.t. things like fetch failure notifications, and the reduce task killing itself after a number of retries.
4) The JobTracker should create the job-token file when running initTasks for the job in question.
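
Point 2 (check permissions before trusting the key) could be sketched with `java.nio.file` POSIX permissions as below. The class name is hypothetical, the "rw-------" policy is an assumption, and a real check would probably also verify the file's owner; this requires a POSIX file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical sketch: refuse a localized token file readable by others. */
final class TokenFileCheck {
  // Assumed policy: at most owner read/write.
  static final Set<PosixFilePermission> OWNER_ONLY = PosixFilePermissions.fromString("rw-------");

  private TokenFileCheck() {}

  /** Returns the file's contents only if no group/other/execute bits are set. */
  static byte[] readTrusted(Path tokenFile) throws IOException {
    Set<PosixFilePermission> perms = Files.getPosixFilePermissions(tokenFile);
    if (!OWNER_ONLY.containsAll(perms)) {
      throw new SecurityException("job-token file " + tokenFile + " is too permissive: "
          + PosixFilePermissions.toString(perms));
    }
    return Files.readAllBytes(tokenFile);
  }

  public static void main(String[] args) throws IOException {
    Path f = Files.createTempFile("jobToken", null);
    Files.write(f, new byte[] {1, 2, 3});
    Files.setPosixFilePermissions(f, OWNER_ONLY);
    System.out.println(readTrusted(f).length); // 3
    Files.setPosixFilePermissions(f, PosixFilePermissions.fromString("rw-r--r--"));
    try {
      readTrusted(f);
    } catch (SecurityException e) {
      System.out.println("rejected");          // world-readable file is refused
    }
    Files.delete(f);
  }
}
```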

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-2.patch

added test

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758477#action_12758477 ] 

Jeff Hammerbacher commented on MAPREDUCE-1026:
----------------------------------------------

Hey Owen (and probably Doug),

While we're here: how would this strategy change if map output was transferred to the reducers using Avro's RPC? Is there authentication in the handshake, and encryption (ssl?) for the data?

Just trying to educate myself for The Future (tm).

Thanks,
Jeff

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775250#action_12775250 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424422/MAPREDUCE-1026-2.patch
  against trunk revision 834284.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/console

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781512#action_12781512 ] 

Boris Shkolnik commented on MAPREDUCE-1026:
-------------------------------------------

Created MAPREDUCE-1236 for the LOG.isDebugEnabled() issue.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-1.patch

bq. 1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens
done
bq. 2) The call to localizeJobTokenFile should be done before the call to taskController.initializeJob(context) in the TaskTracker.localizeJob method. Could the localizeJobTokenFile be called within TaskTracker.localizeJobFiles?
bq. 3) Minor: for the request/response HTTP headers, make the first character upper case
done
bq. 4) HMacUtil could override the equals method and put in logic for comparing two HMacUtil objects, instead of defining verifyHash.
We are not really comparing HMacUtil objects; they are just utilities. So I think verifyHash() is more logical.
bq. 5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be Writable (as opposed to having to define load/store methods) 
Comp is used in the TreeMap constructor as the comparator.

Also added synchronization around the map of StoreKeys updates in TaskTracker.
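
Review item 1 plus the added synchronization amount to a small thread-safe registry on the TaskTracker side. A minimal sketch, assuming jobs are keyed by their job-ID string; the class name, method names and job ID are hypothetical, not the patch's StoreKeys code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: TaskTracker-side map from job ID to that job's secret. */
final class JobTokenRegistry {
  private final Map<String, byte[]> tokens = new HashMap<>();

  /** Called when a job is localized on this TaskTracker; stores a defensive copy. */
  synchronized void addJob(String jobId, byte[] secret) {
    tokens.put(jobId, secret.clone());
  }

  /** Called when the job is cleaned up, so stale secrets are not kept around. */
  synchronized void removeJob(String jobId) {
    tokens.remove(jobId);
  }

  /** Copy of the job's secret, or null if the job is unknown (reject the request). */
  synchronized byte[] secretFor(String jobId) {
    byte[] s = tokens.get(jobId);
    return s == null ? null : s.clone();
  }

  public static void main(String[] args) {
    JobTokenRegistry reg = new JobTokenRegistry();
    reg.addJob("job_0001", new byte[] {1, 2, 3});
    System.out.println(reg.secretFor("job_0001").length);   // 3
    reg.removeJob("job_0001");
    System.out.println(reg.secretFor("job_0001") == null);  // true
  }
}
```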

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760395#action_12760395 ] 

Kan Zhang commented on MAPREDUCE-1026:
--------------------------------------

> The one way to do that is to have the TT send their passwords in the response to the map output request.
How is the TT password generated? The same way as the reduce task password? They can't be the same password, since otherwise the TT could simply read the password from the reducer's request and send it back in the response. HTTP Digest authentication makes it possible to use the same password for mutual authentication.
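
One way to reuse a single shared secret for both directions without an extra round trip is a keyed reply: the reducer sends H = HMAC(secret, request) and the TaskTracker answers with R = HMAC(secret, H). Echoing H back fails, because R cannot be computed without the secret. The sketch below is hypothetical (names and request string are illustrative, HMAC-SHA1 is an assumed choice) and does not address replay of an old (H, R) pair.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: server proves knowledge of the job secret in its reply. */
final class MutualHashSketch {
  private MutualHashSketch() {}

  static String hmac(byte[] secret, String msg) throws GeneralSecurityException {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret, "HmacSHA1"));
    return Base64.getEncoder().encodeToString(mac.doFinal(msg.getBytes(StandardCharsets.UTF_8)));
  }

  /** TaskTracker side: derive the reply from the reducer's request hash. */
  static String replyFor(byte[] secret, String requestHash) throws GeneralSecurityException {
    return hmac(secret, requestHash);
  }

  /** Reducer side: trust the response only if the reply hash checks out. */
  static boolean serverIsGenuine(byte[] secret, String requestHash, String reply)
      throws GeneralSecurityException {
    byte[] expected = replyFor(secret, requestHash).getBytes(StandardCharsets.US_ASCII);
    return MessageDigest.isEqual(expected, reply.getBytes(StandardCharsets.US_ASCII));
  }

  public static void main(String[] args) throws Exception {
    byte[] secret = "shared-job-secret".getBytes(StandardCharsets.UTF_8);
    String h = hmac(secret, "/mapOutput?map=m_1&reduce=0"); // hypothetical request string
    System.out.println(serverIsGenuine(secret, h, replyFor(secret, h))); // true
    System.out.println(serverIsGenuine(secret, h, h));                   // false: an echo is rejected
  }
}
```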

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Open  (was: Patch Available)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758610#action_12758610 ] 

Owen O'Malley commented on MAPREDUCE-1026:
------------------------------------------

1. Of course

2. I'm pretty agnostic about what the authentication mechanism is, other than not wanting an extra round trip. I don't see any way of doing a hash without an extra round trip on the connection open. On the other hand, sending a password doesn't reveal anything that isn't already known. If the attacker can sniff the network, they already know the secret.

3. If there is a better key length, we can use it. 66^10 is big enough to be safe. 

4. Of course

5. The key is per job, of course, but there is no advantage to having the JobTracker pick it. Either way, it will be framework code that picks it. Putting it in the job conf is easy and secure (once MAPREDUCE-181 goes in). Given that the key will be at the JobTracker and all of the TaskTrackers, I don't see the submitting node as a problem.
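For scale, the arithmetic behind the 66^10 figure in point 3: a 10-character key drawn uniformly from a 66-symbol alphabet has 10 * log2(66), roughly 60.4 bits of entropy. The sketch below assumes (this is not stated in the thread) that the 66 symbols are the RFC 3986 "unreserved" URI characters, which would let the key travel in a URL without escaping; the class and method names are illustrative.

```java
import java.security.SecureRandom;

public class KeySpaceSketch {
  // Assumption: the RFC 3986 unreserved set (A-Z, a-z, 0-9, '-', '.', '_', '~') = 66 symbols.
  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

  /** Bits of entropy of a uniformly random key: length * log2(alphabetSize). */
  static double entropyBits(int alphabetSize, int length) {
    return length * (Math.log(alphabetSize) / Math.log(2));
  }

  /** Draws each character independently and uniformly from ALPHABET. */
  static String randomKey(int length, SecureRandom rng) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.printf("alphabet size = %d%n", ALPHABET.length()); // alphabet size = 66
    System.out.printf("66^10 ~ 2^%.1f%n", entropyBits(66, 10));   // 66^10 ~ 2^60.4
    System.out.println(randomKey(10, new SecureRandom()));        // random each run
  }
}
```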

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780854#action_12780854 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

I missed some LOG.debug statements that create string objects unnecessarily. We should guard them with 'if (LOG.isDebugEnabled())' in a separate jira.
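The guard pattern, sketched with the JDK's java.util.logging so it runs without extra jars: `isLoggable(Level.FINE)` stands in here for the commons-logging `LOG.isDebugEnabled()` check. The class name and the `describe` helper are hypothetical; the counter only exists to show that the message string is never built while debug logging is off.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedDebugLogging {
  private static final Logger LOG = Logger.getLogger(GuardedDebugLogging.class.getName());

  // Counts how many times the (potentially expensive) debug message was built.
  static int messagesBuilt = 0;

  static void setDebugEnabled(boolean on) {
    LOG.setLevel(on ? Level.FINE : Level.INFO);
  }

  static String describe(String mapId, long bytes) {
    messagesBuilt++;
    return "fetched " + mapId + " (" + bytes + " bytes)";
  }

  static void onFetch(String mapId, long bytes) {
    // Guard: the concatenation in describe() only runs when debug (FINE) is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(mapId, bytes));
    }
  }

  public static void main(String[] args) {
    setDebugEnabled(false);
    onFetch("m_000000", 1024);
    System.out.println(messagesBuilt); // 0
    setDebugEnabled(true);
    onFetch("m_000000", 1024);
    System.out.println(messagesBuilt); // 1
  }
}
```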

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789044#action_12789044 ] 

Aaron Kimball commented on MAPREDUCE-1026:
------------------------------------------

I am finding a NullPointerException in Shuffle when I run things with the LocalJobRunner:

{code}
09/12/10 16:08:58 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:108)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:358)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:299)
{code}

{{reduceTask.getJobTokens()}} is returning null; I can't see anywhere in LocalJobRunner where the JobTokens object is being initialized. I think this patch is to blame?

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-9.patch

review notes implemented.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-1026:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.22.0
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Boris!

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784923#action_12784923 ] 

Hudson commented on MAPREDUCE-1026:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #162 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/162/])
    

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780850#action_12780850 ] 

Hudson commented on MAPREDUCE-1026:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #126 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/126/])
    . Does mutual authentication of the shuffle transfers using a shared JobTracker generated key. Contributed by Boris Shkolnik.


> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-3.patch

fixed warnings

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026.patch

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773596#action_12773596 ] 

Kan Zhang commented on MAPREDUCE-1026:
--------------------------------------

@Devaraj
> Since the token will be used (later on in a separate jira) to bootstrap even the task<->TT mutual authentication
Are you talking about Task<->TT heartbeats over RPC? For this connection, I suggest we use a separate key (in the format of a Delegation token) that is generated by the TT and given to the Task just before it is launched. This way the key is known only to the local task, which helps prevent Tasks running on other machines from accidentally connecting to this TT. In terms of implementation, the TT can do this the same way the NN does, e.g., instantiate a DelegationTokenHandler for generating Delegation tokens and couple it with RPC (no need to persist the MasterKey, though).
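A minimal JDK-only sketch of the general idea: the TT holds a master key only in memory and derives each task's token password as an HMAC of the token identifier, so it can re-derive and check a password without storing per-task state. HMAC-SHA1, using the task attempt ID as the identifier, and all class and method names are assumptions for illustration; this is not the DelegationTokenHandler API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

public class TaskTokenIssuerSketch {
  private static final String ALGORITHM = "HmacSHA1";
  // Lives only in this TT's memory and is never persisted; tokens it issued
  // stop verifying once the TT restarts and generates a new master key.
  private final SecretKey masterKey = newMasterKey();

  private static SecretKey newMasterKey() {
    try {
      return KeyGenerator.getInstance(ALGORITHM).generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Token password = HMAC(masterKey, identifier); handed to the task at launch. */
  byte[] passwordFor(String taskAttemptId) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(masterKey);
      return mac.doFinal(taskAttemptId.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Re-derives the expected password and compares in constant time. */
  boolean verify(String taskAttemptId, byte[] presented) {
    return MessageDigest.isEqual(passwordFor(taskAttemptId), presented);
  }

  public static void main(String[] args) {
    TaskTokenIssuerSketch tt = new TaskTokenIssuerSketch();
    byte[] pw = tt.passwordFor("attempt_0001_r_000000_0");
    System.out.println(tt.verify("attempt_0001_r_000000_0", pw)); // true
    System.out.println(tt.verify("attempt_0001_r_000001_0", pw)); // false: different task
    System.out.println(new TaskTokenIssuerSketch().verify("attempt_0001_r_000000_0", pw)); // false: other TT
  }
}
```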

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760388#action_12760388 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

BTW, we also think that TTs should authenticate themselves to the reduce tasks (to protect the reduces against malicious TTs that might serve up wrong map outputs). One way to do that is to have the TTs send their passwords in the response to the map output request.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776260#action_12776260 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424539/MAPREDUCE-1026-3.patch
  against trunk revision 834284.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings).

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/console

This message is automatically generated.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-1026:
-----------------------------------

    Attachment: 1026-bp20-bugfix.patch

This fixes a bug in the original Y20 backport. Not for commit here.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: 1026-bp20-bugfix.patch, MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch, MR-1026-0_20.2.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780843#action_12780843 ] 

Konstantin Boudnik commented on MAPREDUCE-1026:
-----------------------------------------------

Technically, a JIRA has to be reviewed before it can be committed.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773599#action_12773599 ] 

Kan Zhang commented on MAPREDUCE-1026:
--------------------------------------

> This way the key is known only to the local task
Also, no need to persist this key as part of the job. This key is just a runtime artifact of the Task and TT.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779327#action_12779327 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425287/MAPREDUCE-1026-9.patch
  against trunk revision 881536.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/142/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/142/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/142/console

This message is automatically generated.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-12.patch

ivy.xml update for contribs 

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-7.patch

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-15.patch

moved secureShuffleUtils and JobTokens into o.a.h.mapreduce.security package

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773701#action_12773701 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Looked at the patch some more. A few more comments:
1) The TaskTracker needs to maintain a mapping from JobIDs to job tokens.
2) The call to localizeJobTokenFile should be done before the call to taskController.initializeJob(context) in the TaskTracker.localizeJob method. Could localizeJobTokenFile be called within TaskTracker.localizeJobFiles?
3) Minor: for the request/response HTTP headers, make the first character uppercase.
4) HMacUtil could override the equals method and put in logic for comparing two HMacUtil objects, instead of defining verifyHash.
5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be Writable (as opposed to having to define load/store methods).
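Comment 1 could look roughly like the following on the TT side: a thread-safe map from job ID to that job's secret, filled when the job is localized and cleared when the job is purged. This is a hypothetical sketch: plain Strings stand in for JobID, a byte[] stands in for the job token, and the class and method names are not from the patch.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobTokenRegistrySketch {
  // Job ID -> that job's shuffle secret. ConcurrentHashMap because job
  // localization and the HTTP threads serving map outputs use it concurrently.
  private final Map<String, byte[]> secretsByJob = new ConcurrentHashMap<>();

  /** Register the secret when the job is localized on this TaskTracker. */
  void addJob(String jobId, byte[] secret) {
    secretsByJob.put(jobId, secret.clone());
  }

  /** Secret for an incoming map-output request; null means the request should be rejected. */
  byte[] secretFor(String jobId) {
    byte[] secret = secretsByJob.get(jobId);
    return secret == null ? null : secret.clone();
  }

  /** Drop the secret when the job is purged, so stale secrets do not accumulate. */
  void removeJob(String jobId) {
    secretsByJob.remove(jobId);
  }

  public static void main(String[] args) {
    JobTokenRegistrySketch registry = new JobTokenRegistrySketch();
    registry.addJob("job_0001", new byte[] {1, 2, 3});
    System.out.println(Arrays.toString(registry.secretFor("job_0001"))); // [1, 2, 3]
    registry.removeJob("job_0001");
    System.out.println(registry.secretFor("job_0001")); // null
  }
}
```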

For the case where a reduce task fails because the TaskTracker(s) are not authentic, we probably need to be careful. Two things might happen. The JobTracker might get enough notifications from other reduces in the system and decide to re-execute the map. The other situation is what is bothering me: the reduce task would kill itself after a certain threshold number of attempts. This would be bad. IIRC, it is not predictable which one would happen first.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Jitendra Nath Pandey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated MAPREDUCE-1026:
--------------------------------------------

    Attachment: MR-1026-0_20.2.patch

Patch for Hadoop-20 added.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch, MR-1026-0_20.2.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Open  (was: Patch Available)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758545#action_12758545 ] 

Kan Zhang commented on MAPREDUCE-1026:
--------------------------------------

I had a rough idea for this when I opened HADOOP-4991. Briefly:
1. The output of Map tasks of a job should be accessed only by Reduce tasks of the same job.
2. Since this access is currently done over HTTP, I suggest we use the HTTP DIGEST authentication mechanism as defined in RFC 2617. This is better than HTTP BASIC authentication since, with HTTP DIGEST, the secret key is never sent to the server in the clear, and it allows for mutual authentication.
3. We should use whatever key length is recommended by the standard and the JCE implementation.
4. The key is per-job and should be chosen by the JobTracker at job submission and persisted in the job conf in such a way that only tasks of that job and the TT/JT can access it. I favor having the JT choose the key over having the JobClient choose it, for two reasons:
- The key is considered an internal detail of the M/R framework and should be transparent to anyone outside the M/R cluster, including the JobClient.
- You don't need to worry about the key being accidentally disclosed at the client side before or after it is submitted to the JT.
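
To make item 2 concrete, here is a minimal Python sketch of the RFC 2617 digest computation for qop="auth" with MD5. It is illustrative only: treating the job ID as the username and the per-job key as the password is an assumption, not something settled in this thread. The server recomputes the digest from its own copy of the secret, so the secret never crosses the wire, and the rspauth value (same formula with an empty method) is what lets the client authenticate the server in return.

```python
import hashlib
import hmac


def md5_hex(data: str) -> str:
    """H(data) in RFC 2617 notation: lower-case hex MD5 of the string."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, secret: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str,
                    qop: str = "auth") -> str:
    """request-digest for qop="auth" with the default MD5 algorithm."""
    ha1 = md5_hex(f"{username}:{realm}:{secret}")  # H(A1), A1 = user:realm:secret
    ha2 = md5_hex(f"{method}:{uri}")               # H(A2), A2 = method:digest-uri
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def rspauth(username: str, realm: str, secret: str, uri: str,
            nonce: str, nc: str, cnonce: str, qop: str = "auth") -> str:
    """Server's proof for mutual auth: same formula, but A2 = ":" digest-uri."""
    return digest_response(username, realm, secret, "", uri, nonce, nc, cnonce, qop)


def server_check(secret: str, username: str, realm: str, method: str, uri: str,
                 nonce: str, nc: str, cnonce: str, qop: str,
                 client_response: str) -> bool:
    """Recompute the digest from the server's copy of the secret and compare."""
    expected = digest_response(username, realm, secret, method, uri,
                               nonce, nc, cnonce, qop)
    return hmac.compare_digest(expected, client_response)
```

Feeding in the values from the RFC's own worked example (user "Mufasa", realm "testrealm@host.com", password "Circle Of Life") should reproduce its published response, 6629fae49393a05397450978507c4ef1. A real deployment would also need nonce freshness checks, which this sketch omits.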

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759627#action_12759627 ] 

Owen O'Malley commented on MAPREDUCE-1026:
------------------------------------------

To clarify, in this jira you intend to:

1. Use a job specific random key, which is included in the URL of the fetch.
2. Allow jobs to request encryption of the map output using a second job specific random key.  I assume the configuration boolean would be something like mapred.job.shuffle.encrypt. If the outputs are encrypted, I assume that we checksum the unencrypted data and include the checksum in the encryption.

Once you have done that, there isn't any motivation to pay for https. 

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Open  (was: Patch Available)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Patch Available  (was: Open)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780285#action_12780285 ] 

Hadoop QA commented on MAPREDUCE-1026:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425504/MAPREDUCE-1026-14.patch
  against trunk revision 881673.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/console

This message is automatically generated.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Attachment: MAPREDUCE-1026-14.patch

Addressed a minor Findbugs nit.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated MAPREDUCE-1026:
---------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: MAPREDUCE-563

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773709#action_12773709 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

My worry about the reduce task killing itself can be ignored. As Boris and I discussed offline, that is the right thing to happen.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759634#action_12759634 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

bq. 1. Use a job specific random key, which is included in the URL of the fetch.
Yes.
bq. 2. Allow jobs to request encryption of the map output using a second job specific random key. I assume the configuration boolean would be something like mapred.job.shuffle.encrypt.
Yes.

bq. If the outputs are encrypted, I assume that we checksum the unencrypted data and include the checksum in the encryption.
I am not sure this is required. The encrypted bytes would be checksummed automatically as we write them to disk. Do we need to build the extra logic of checksumming the unencrypted bytes? That might be a big deal when we have multiple map output spills that we finally merge at the end and spill to disk. I propose we just live with the (automatic) checksum of the encrypted bytes.
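
A rough Python sketch of the two checksum placements being discussed: Owen's checksum-then-encrypt (variant A) versus checksumming the encrypted bytes as they are written (variant B). The cipher here is a toy SHA-256 keystream XOR standing in for a real cipher such as AES; it is an assumption for illustration and must not be used for real data. A CRC in either position only detects accidental corruption; neither variant authenticates the data against deliberate tampering.

```python
import hashlib
import zlib


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stand-in for a real cipher: XOR with SHA-256(key | nonce | counter) blocks.
    Encryption and decryption are the same operation. Not secure; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def crc(data: bytes) -> bytes:
    """4-byte big-endian CRC32, standing in for the framework's checksum."""
    return zlib.crc32(data).to_bytes(4, "big")


# Variant A (Owen): checksum the plaintext, then encrypt plaintext + checksum together.
def seal_a(key: bytes, nonce: bytes, plain: bytes) -> bytes:
    return toy_xor_cipher(key, nonce, plain + crc(plain))


def open_a(key: bytes, nonce: bytes, blob: bytes) -> bytes:
    framed = toy_xor_cipher(key, nonce, blob)
    plain, check = framed[:-4], framed[-4:]
    if crc(plain) != check:
        raise ValueError("checksum mismatch after decryption")
    return plain


# Variant B (Devaraj): encrypt, then checksum the encrypted bytes as they hit disk.
def seal_b(key: bytes, nonce: bytes, plain: bytes) -> bytes:
    enc = toy_xor_cipher(key, nonce, plain)
    return enc + crc(enc)


def open_b(key: bytes, nonce: bytes, blob: bytes) -> bytes:
    enc, check = blob[:-4], blob[-4:]
    if crc(enc) != check:
        raise ValueError("checksum mismatch before decryption")
    return toy_xor_cipher(key, nonce, enc)
```

In variant B a corrupted spill is caught before any decryption work is done, which is roughly what the automatic on-disk checksums would already provide; variant A needs the extra framing logic at every spill and merge.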

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758452#action_12758452 ] 

Owen O'Malley commented on MAPREDUCE-1026:
------------------------------------------

The JobClient should create a random key of 10 characters from [a-zA-Z0-9] and put it in the job conf as secret.mapred.job.shuffle.key. I'd propose that we put all secret keys in a sub-tree of the config key space (secret.*) so that the web UI can hide them. The reducer can include the key in the URL, and the TaskTracker can check that it is correct.
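
A minimal Python sketch of this proposal, assuming a plain dict stands in for the job conf. The property name secret.mapred.job.shuffle.key and the secret.* convention come from the comment above; the /mapOutput URL layout and all helper names are hypothetical. A 10-character key over 62 symbols gives about 59.5 bits of entropy.

```python
import hmac
import secrets
import string
from urllib.parse import parse_qs, urlencode, urlparse

ALPHABET = string.ascii_letters + string.digits  # [a-zA-Z0-9]
SHUFFLE_KEY = "secret.mapred.job.shuffle.key"


def new_shuffle_key(length: int = 10) -> str:
    """Random key drawn from [a-zA-Z0-9] with a CSPRNG (JobClient side)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def redact_for_web_ui(conf: dict) -> dict:
    """Mask every property under the secret.* sub-tree before display."""
    return {k: ("******" if k.startswith("secret.") else v) for k, v in conf.items()}


def fetch_url(tt_host: str, job_id: str, map_id: str, reduce: int, key: str) -> str:
    """Reducer side. Hypothetical URL layout; only 'key in the URL' is from the proposal."""
    query = urlencode({"job": job_id, "map": map_id, "reduce": reduce, "key": key})
    return f"http://{tt_host}/mapOutput?{query}"


def tt_accepts(url: str, conf: dict) -> bool:
    """TaskTracker side: compare the key in the URL with the job's configured key."""
    supplied = parse_qs(urlparse(url).query).get("key", [""])[0]
    return hmac.compare_digest(supplied, conf[SHUFFLE_KEY])
```

Since the key travels in the URL over plain HTTP, anyone who can observe a fetch can read and replay it; the scheme only keeps out clients that never see the traffic.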

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
--------------------------------------

    Status: Open  (was: Patch Available)

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769011#action_12769011 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Actually, it probably makes sense to write the job token file during job initialization. The alternative is to do it in the submitJob RPC method, but that would mean the RPC handler is blocked during the HDFS access.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760397#action_12760397 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Yes, the thought is to have a different key that the client generates during job submission.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773611#action_12773611 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Kan, the RPC port on the TaskTracker is supposed to be bound only to localhost, so others outside the node in question shouldn't be able to do RPC.
But let's keep that discussion for a separate jira.

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>         Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-1026:
-----------------------------------

    Assignee: Boris Shkolnik  (was: Devaraj Das)

Summarizing:
1) The JobTracker generates the job token and persists it to HDFS in the jobId directory.
2) The TaskTracker, as part of localization, reads the token file and localizes it in the secure location on the local disk.
3) The ReduceTask reads that file, computes an HMAC-SHA1 of the URL using the token as the key, and sends it to the TT as part of the map output request.
4) The TT hosting the map output reads the same key and validates the HMAC. If the validation succeeds, the TT computes an HMAC-SHA1 of the HMAC-SHA1 it just received and sends it as an HTTP header in the map output response.
5) The reduce task in turn validates that. If the validation succeeds, it accepts the map output bytes.
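
The five steps can be sketched in Python as follows. The token length, the Base64 encoding of the hashes, and HMACing the encoded request hash in step 4 are assumptions for illustration; the summary does not specify them, and the function names are hypothetical. hmac.compare_digest is used so the comparisons run in constant time.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def new_job_token(num_bytes: int = 64) -> bytes:
    """Step 1 (JobTracker): random job token. 64 bytes is an assumed length."""
    return secrets.token_bytes(num_bytes)


def hmac_sha1_b64(key: bytes, message: bytes) -> str:
    """Base64-encoded HMAC-SHA1 of message under key."""
    return base64.b64encode(hmac.new(key, message, hashlib.sha1).digest()).decode("ascii")


def reducer_request_hash(token: bytes, url: str) -> str:
    """Step 3 (ReduceTask): HMAC-SHA1 of the fetch URL, keyed by the job token."""
    return hmac_sha1_b64(token, url.encode("utf-8"))


def tt_handle(token: bytes, url: str, url_hash: str) -> Optional[str]:
    """Step 4 (serving TT): verify the request hash, then return the reply-header
    value, an HMAC-SHA1 of the hash it received. None means the request is rejected."""
    expected = reducer_request_hash(token, url)
    if not hmac.compare_digest(expected, url_hash):
        return None  # caller does not hold this job's token
    return hmac_sha1_b64(token, url_hash.encode("ascii"))


def reducer_accepts(token: bytes, url_hash: str, reply_hash: Optional[str]) -> bool:
    """Step 5 (ReduceTask): accept the map output only if the reply hash checks out."""
    if reply_hash is None:
        return False
    expected = hmac_sha1_b64(token, url_hash.encode("ascii"))
    return hmac.compare_digest(expected, reply_hash)
```

Because the URL itself is the HMAC input, a request whose job, map, or reduce parameters were altered after hashing is rejected, and a TaskTracker that does not hold the token cannot produce a reply hash the reducer will accept.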

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Boris Shkolnik
>
> Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.
