Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2024/01/02 12:01:00 UTC

[jira] [Moved] (HADOOP-19021) in hadoop-azure, use jdk11 HttpClient instead of legacy java.net.HttpURLConnection, for supporting http2 and connection keep alive

     [ https://issues.apache.org/jira/browse/HADOOP-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran moved MAPREDUCE-7467 to HADOOP-19021:
----------------------------------------------------

                  Key: HADOOP-19021  (was: MAPREDUCE-7467)
    Affects Version/s: 3.4.0
                           (was: 3.3.5)
                           (was: 3.3.3)
                           (was: 3.3.4)
                           (was: 3.3.6)
           Issue Type: Improvement  (was: Wish)
              Project: Hadoop Common  (was: Hadoop Map/Reduce)

> in hadoop-azure, use jdk11 HttpClient instead of legacy java.net.HttpURLConnection, for supporting http2 and connection keep alive
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19021
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19021
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Arnaud Nauwynck
>            Priority: Critical
>
> As described in the Jira title: "in hadoop-azure, use jdk11 HttpClient instead of legacy java.net.HttpURLConnection, for supporting http2 and connection keep alive"
> A few remarks:
> 1/ The official Azure SDK supports either OkHttp or Netty for the HTTP transport.
> 2/ The current hadoop-azure uses the class java.net.HttpURLConnection, which is slow.
>   It does not use HTTP/2, does not optimize the SSL handshake very well, and does not keep TCP connections alive for reuse.
> 3/ Since version 11, the JDK provides a new class, java.net.http.HttpClient, which should be a better replacement.
> 4/ It might be possible to introduce a configuration property (defaulting to the legacy class) and an abstract factory that creates connections via either HttpURLConnection or any other pluggable implementation (JDK 11 HttpClient, OkHttp, Netty, ...).
> 5/ The official Azure SDK is maintained by Microsoft, so it should track bug fixes and improvements better than a custom Hadoop implementation.
> [https://learn.microsoft.com/en-us/java/api/overview/azure/storage-file-datalake-readme?view=azure-java-stable]
> 6/ When code uses both the official Azure SDK and Hadoop (in Spark), it is jarring to have two different HTTP implementations within the same JVM...
> 7/ The official Azure SDK has more features than what the legacy Hadoop class FileSystem allows. In particular, a file can be appended (= uploaded) by multiple threads (uploading fragments at different offsets), then flushed once every fragment has been sent.
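Editorial note: point 3 above refers to the JDK 11 java.net.http.HttpClient. A minimal sketch of how it is configured for HTTP/2 and connection reuse follows; the endpoint URL and timeout values are illustrative placeholders, not taken from the issue or from hadoop-azure.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class HttpClientSketch {
    public static void main(String[] args) {
        // Unlike java.net.HttpURLConnection, java.net.http.HttpClient
        // pools connections and keeps them alive across requests.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2) // negotiate HTTP/2, falls back to HTTP/1.1
                .connectTimeout(Duration.ofSeconds(30))
                .build();

        // Request construction only; the URI is a placeholder account/path.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myaccount.dfs.core.windows.net/container/path"))
                .timeout(Duration.ofSeconds(90))
                .GET()
                .build();

        System.out.println(client.version()); // HTTP_2
        System.out.println(request.method()); // GET
    }
}
```

One client instance is intended to be shared and reused; per-request state lives in HttpRequest, which is what makes connection keep-alive and HTTP/2 multiplexing possible.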
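Editorial note: the abstract factory suggested in point 4 could be shaped as below. This is only a sketch of the idea; every name in it (HttpImpl, HttpOperationFactory, the configuration handling) is hypothetical and does not exist in hadoop-azure.

```java
import java.io.IOException;
import java.net.URL;

public class FactorySketch {

    /** Which HTTP transport to use; the legacy class stays the default. */
    enum HttpImpl { LEGACY_URL_CONNECTION, JDK11_HTTP_CLIENT }

    /** Minimal operation abstraction that every transport can implement. */
    interface HttpOperation {
        int execute() throws IOException; // returns the HTTP status code
    }

    /** Abstract factory: one implementation per transport. */
    interface HttpOperationFactory {
        HttpOperation create(URL url, String method) throws IOException;
    }

    /**
     * Resolve the transport from a (hypothetical) configuration property
     * value, defaulting to the legacy class when the property is unset.
     */
    static HttpImpl fromConfig(String value) {
        if (value == null || value.isEmpty()) {
            return HttpImpl.LEGACY_URL_CONNECTION;
        }
        return HttpImpl.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(fromConfig(null));                // LEGACY_URL_CONNECTION
        System.out.println(fromConfig("JDK11_HTTP_CLIENT")); // JDK11_HTTP_CLIENT
    }
}
```

Keeping the legacy implementation as the default would let the new transport be rolled out behind a switch, which matches the issue's suggestion of a property "with default to use legacy class".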



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org