Posted to hdfs-issues@hadoop.apache.org by "Jing Zhao (Jira)" <ji...@apache.org> on 2022/12/19 22:58:00 UTC

[jira] [Created] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data

Jing Zhao created HDFS-16875:
--------------------------------

             Summary: Erasure Coding: data access proxy to allow old clients to read EC data
                 Key: HDFS-16875
                 URL: https://issues.apache.org/jira/browse/HDFS-16875
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ec, erasure-coding
            Reporter: Jing Zhao
            Assignee: Jing Zhao


Erasure Coding is supported only in Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data stack to Hadoop 3 may require a large migration effort and carry reliability risks, given the incompatibilities between the two major releases and the possibility of undiscovered issues in the newer release. We therefore need a solution that adopts Erasure Coding for cost efficiency with minimal migration effort and risk, while still allowing old HDFS clients (Hadoop 2.x) to access EC data transparently.

Internally, we have developed an EC access proxy that translates EC data for old clients. We have also extended the NameNode RPC so that the NameNode can recognize whether an HDFS client supports EC and redirect old clients to the proxy. With the proxy, we have set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving the other production clusters and all upper-layer applications untouched.
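
As a rough illustration only, the redirect decision described above could be sketched as follows. This is not the actual implementation: the class EcReadRouting, the ClientInfo type, the Target enum, and the route method are all hypothetical names, and the sketch assumes the extended RPC header boils down to a single "supports EC" flag per client.

```java
import java.util.Objects;

/** Hypothetical sketch of the redirect decision; none of these names come from HDFS. */
public class EcReadRouting {

    /** Where a read request is served from in this sketch. */
    public enum Target { DATANODES, EC_PROXY }

    /** Assumed stand-in for the capability information carried in the extended RPC header. */
    public static final class ClientInfo {
        final boolean supportsEc;

        public ClientInfo(boolean supportsEc) {
            this.supportsEc = supportsEc;
        }
    }

    /**
     * Sends a client to the proxy only when the file is erasure coded and the
     * client cannot decode EC data itself; every other case reads directly.
     */
    public static Target route(ClientInfo client, boolean fileIsErasureCoded) {
        Objects.requireNonNull(client, "client");
        if (fileIsErasureCoded && !client.supportsEc) {
            return Target.EC_PROXY;
        }
        return Target.DATANODES;
    }

    public static void main(String[] args) {
        System.out.println(route(new ClientInfo(false), true));  // EC_PROXY
        System.out.println(route(new ClientInfo(true), true));   // DATANODES
        System.out.println(route(new ClientInfo(false), false)); // DATANODES
    }
}
```

In this sketch, EC-capable clients and reads of non-EC files never involve the proxy, which matches the goal of leaving existing clusters and applications untouched.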

Because some of the changes touch fundamental components of HDFS (e.g., the client-NN RPC header), we do not aim to merge the change into trunk. We will use this ticket to share the design and implementation details (including the code) and to collect feedback. We may later open source the implementation in a separate GitHub repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
