You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Kan Zhang (JIRA)" <ji...@apache.org> on 2008/11/03 03:56:44 UTC

[jira] Created: (HADOOP-4575) An independent HTTPS proxy for HDFS

An independent HTTPS proxy for HDFS
-----------------------------------

                 Key: HADOOP-4575
                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Kan Zhang
            Assignee: Kan Zhang
             Fix For: 0.20.0


Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-3-core.patch
                4575-3-contrib.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment:     (was: 4575-1-core.patch)

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4575:
------------------------------------

    Release Note: Introduced independent HSFTP proxy server for authenticated access to clusters.

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-3.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-2.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4575:
------------------------------------

    Component/s: security

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: security
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-1-contrib.patch
                4575-1.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644843#action_12644843 ] 

Chris Douglas commented on HADOOP-4575:
---------------------------------------

On core changes:
* There is a race condition in UserGroupInformationManager::getUgiForUser; the critical section protects the get and put, but simultaneous requests for an absent ugi will likely create duplicate entries and insert the last acquirer of the latter lock. o.a.h.mapred.IndexCache uses a standard idiom that applies here. Alternatively, simply making the method synchronized would perform adequately and avoid considerable complication, particularly given the cleanup semantics for this cache.
* A cleanup thread is overkill. It should be sufficient to verify the timestamp of the item out of the cache and discard it if it's too old (refreshing the ugi in the cache). If the size of the cache is a concern, setting a threshold triggering cleanup on insertion is a sound strategy. The ugi+timestamp is almost certainly less than 100 bytes of memory, and closer to half that in most cases. If the limit is only checked on insertion and entries aren't pushed out of the cache, there's no risk for the degenerate case where the cache is scanned for each access. Further, only valid users are inserted, so the DoS threat is low.
* Since there are no references to UserGroupInformationManager in core, perhaps it can live with the proxy in contrib (including the changes to hadoop-default.xml)
* I don't understand the changes to FileDataServlet. It introduces a race condition in the jspHelper field initialization, which should be fixed. Is this getting around a static initialization order problem? If so, a proper pattern is preferred.
* UnixUserGroupInformation::getUgiForUser is a dangerous method:
{code}

  public static UnixUserGroupInformation getUgiForUser(String userName)
      throws IOException {
    String[] cmd = new String[] { "bash", "-c", "id -Gn " + userName };
    String[] groups = Shell.execCommand(cmd).split("\\s+");
    return new UnixUserGroupInformation(userName, groups);
  }
{code}
Though the proxy does a fair amount of sanity checking and input sanitizing, future callers may not be so careful (e.g. userName = {{"foo ; rm \-rf /"}}). Further, the login functionality is more general and a public static method on UnixUGI will impede future design. As a separate issue, replacing Shell::getGroupsCommand with "id -Gn" and the StringTokenizer with String::split are sound improvements. For this issue, though I can see why this functionality is associated with UnixUGI, looking up groups for arbitrary users belongs in the proxy code and is not required in core. The method should also do some of its own sanity checks.
* HftpFileSystem.ran should be {{final}}

On contrib:
* Why does ProxyHttpServer not extend or own an HttpServer? There is considerable overlap in their interfaces and implementations.
* I don't understand why the proxy would return a status of [402|http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.3] when the client doesn't provide a cert. Is there a canonical response for this case?
* Given that authentication is bypassed for testing:
{noformat}
+    } else { // http request, set ugi for servlets, only for testing purposes
+      String ugi = rqst.getParameter("ugi");
+      rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi
+          .split(",")));
+    }
{noformat}
if a non-ssl listener is attached, this should probably log at WARN level on init at least.
* Calling System.exit from createHdfsProxy is unnecessarily forceful; returning null for success and throwing on failure is clearer. Similarly, the switch stmt is less readable than humble if stmts.

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Status: Patch Available  (was: Open)

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645999#action_12645999 ] 

Hudson commented on HADOOP-4575:
--------------------------------

Integrated in Hadoop-trunk #655 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/655/])
    . Add a proxy service for relaying HsftpFileSystem requests.
Includes client authentication via user certificates and config-based
access control. (Kan Zhang via cdouglas)


> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Status: Patch Available  (was: Open)

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-2-core.patch
                4575-2-contrib.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645669#action_12645669 ] 

Hadoop QA commented on HADOOP-4575:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393458/4575-3.patch
  against trunk revision 711734.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 10 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/console

This message is automatically generated.

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: lib.tgz

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645468#action_12645468 ] 

Chris Douglas commented on HADOOP-4575:
---------------------------------------

The core changes look fine. The contrib changes are OK for now, but in the future:
* If there were a limit on even the number of elements in the cache, checking that condition is less expensive than iterating over all the users in the cache for each insertion.
* Where there is not a compatibility requirement for previous versions, the code duplication in ProxyHttpServer should be reduced. In particular, 0.20 should have a version with the "right" abstractions.
* No information would be an improvement over a reserved HTTP error code. Something generic, like 400, perhaps?

Compiling against trunk, I got no errors when applying only the patch. Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ?

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-1-core.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Status: Open  (was: Patch Available)

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4575:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Kan

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, 4575-3.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645553#action_12645553 ] 

Kan Zhang commented on HADOOP-4575:
-----------------------------------

> If there were a limit on even the number of elements in the cache...
> Something generic, like 400, perhaps?
I attached a new patch (4575-3) that addressed the above concerns.

> Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ?
Yes.

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, 4575-3-contrib.patch, 4575-3-core.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645371#action_12645371 ] 

Kan Zhang commented on HADOOP-4575:
-----------------------------------

> There is a race condition in UserGroupInformationManager::getUgiForUser
I was aware of it and chose to punt on it since it doesn't affect correctness (the worst can happen is a few extra ugi objects get created and immediately garbage collected) and I was reluctant to make it a synchronized method since it may launch a shell command. Now that you view it as a problem, I've changed the code to use synchronized methods and no longer synchronize on the ugiCache object. This problem should go away.

> It introduces a race condition in the jspHelper field initialization
This is essentially the same issue as the above and I again chose to punt on it. As we discussed, the proper fix should be making jspHelper a singleton. I'll leave that to a separate JIRA and simply noted it in the source code.

> Why does ProxyHttpServer not extend or own an HttpServer?
For easier porting to 17, 18 and 19.

> I don't understand why the proxy would return a status of 402
Just a return code I picked that doesn't already have a defined semantics.

> Calling System.exit from createHdfsProxy is unnecessarily forceful...
I still prefer the code in the patch.

Your other comments are incorporated into the new patch. Thanks!

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, 4575-2-contrib.patch, 4575-2-core.patch, 4575-2.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HADOOP-4575:
------------------------------

    Attachment: 4575-1-core.patch

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644890#action_12644890 ] 

Hadoop QA commented on HADOOP-4575:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393231/4575-1-core.patch
  against trunk revision 709609.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/console

This message is automatically generated.

> An independent HTTPS proxy for HDFS
> -----------------------------------
>
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, lib.tgz
>
>
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.