Posted to mapreduce-dev@hadoop.apache.org by Andrew Wang <an...@cloudera.com> on 2017/08/25 23:33:39 UTC

2017-08-25 Hadoop 3 release status update

Hi all,

I've written up a status report for the current state of Hadoop 3 on the
wiki. I've also pasted it below for your convenience.

Cheers,
Andrew

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-08-25

Another month flew by without an update. This is a big one.

Red flags:

   - 11 blockers still on the dashboard, with some filed recently. Need to
   burn these down.
   - There are many branch merge proposals flying around for features that
   were not originally being tracked for beta1 and GA. Introducing new code
   always comes with risk, so I'm working with the different contributors
   involved to discuss target versions, confirm readiness, and define quality
   bars for merge.

Miscellaneous blockers:

   - HADOOP-14284 <https://issues.apache.org/jira/browse/HADOOP-14284> (Shade
   Guava everywhere): We have agreement to shade the yarn client JAR. Shading
   hadoop-hdfs is still being discussed.
   - HADOOP-13363 <https://issues.apache.org/jira/browse/HADOOP-13363> (Upgrade
   to protobuf 3): Waiting on the Guava shading first.
   - YARN-7076 <https://issues.apache.org/jira/browse/YARN-7076>: New
   blocker, we need an assignee.
   - YARN-7094 <https://issues.apache.org/jira/browse/YARN-7094> (Document
   that server-side graceful decom is currently not recommended): Robert has a
   patch up, needs review. This is a stopgap for the old blocker YARN-5464.
   - YARN-5536 <https://issues.apache.org/jira/browse/YARN-5536> (Multiple
   format support (JSON, etc.) for exclude node file in NM graceful
   decommission with timeout): Robert has a proposal that needs to be pushed
   on.

beta1 features:

   - Erasure coding
      - There are three must-dos. Two have patches, one might not be a
      must-do.
      - I pinged the pluggable policy JIRA to see if metadata and API
      compatibility is complete.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
      - Sean has HDFS rolling upgrade scripts up, waiting on Ray to add
      some YARN/MR coverage too.
      - Need to do a final runthrough of the JACC reports for YARN and HDFS.
   - Classpath isolation (HADOOP-11656)
      - We're down to the wire on this, I pinged Sean for an update.
   - Compat guide (HADOOP-13714
   <https://issues.apache.org/jira/browse/HADOOP-13714>)
      - I pinged the JIRA on this too, no updated patch since May.

Features under discussion:

I talked with a number of lead contributors about these features, which were
previously not on my radar.

3.0.0-beta1:

   - YARN native services (Jian He)
      - I was convinced that this is very separate from the core. I'll get
      someone from Cloudera to run it through our integration tests to
      verify it doesn't break anything downstream, then happy to merge.
   - TSv2 alpha 2 (Vrushali C)
      - Despite being called "alpha 2", this is more like "beta" in terms of
      readiness. Twitter is planning to roll it out to production. Seems quite
      done.
      - I double checked with Haibo, and he successfully ran it through our
      internal integration testing.

3.0.0 GA:

   - Resource profiles (Wangda Tan)
      - Alpha feature, APIs are not stable yet. Has some compatible PB
      changes, will verify rolling upgrade from branch-2. Touches some
      core parts of YARN.
      - Decided that it's too close to beta1 for this, we're going to test
      it a lot and make sure it's ready for 3.0.0 GA.
   - HDFS router-based federation (Chris Douglas)
      - This is like YARN federation, very separate and doesn't add new APIs,
      run in production at MSFT.
      - If it passes Cloudera internal integration testing, I'm fine
      putting this in for GA.

3.1.0:

   - Storage Policy Satisfier (Uma Gangumalla)
      - We're resolving some design discussions on JIRA. Plan is to do some
      MVP work on the API to get this into 3.1, and if we're happy with the
      second phase, consider for 3.0 GA.
   - HDFS tiered storage (Chris Douglas)
      - This touches some core stuff, and the write path is still being worked
      on. Still somewhat useful with just the read path. Targeting 3.1.0
      gives enough time to wrap this up.