Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2023/06/09 10:47:00 UTC

[jira] [Commented] (HADOOP-18762) Incorporate Qiniu Cloud Kodo File System Implementation

    [ https://issues.apache.org/jira/browse/HADOOP-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730918#comment-17730918 ] 

Steve Loughran commented on HADOOP-18762:
-----------------------------------------

Hello and welcome

I'd like to start with a fairly ruthless question here: why does this need to be in the Hadoop source tree?

Google GCS, for example, is self-contained. That lets them release on their own schedule and support older versions of Hadoop; it has also allowed them to move to being Java 11+ only. If you are targeting Spark, and you can get an independent release added as a dependency of the spark-hadoop-cloud module, Spark can pick it up in a new release (*).
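
As a rough sketch of how that out-of-tree wiring works: a connector JAR on the classpath plus a configuration binding is all Hadoop needs to resolve the scheme. The kodo scheme and class name below are hypothetical; everything else is the standard Hadoop API.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ExternalConnectorSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bind the URI scheme to whatever implementation ships in the connector JAR
        // ("com.qiniu.kodo.fs.KodoFileSystem" is a hypothetical class name here;
        // connectors can also self-register via the ServiceLoader mechanism).
        conf.set("fs.kodo.impl", "com.qiniu.kodo.fs.KodoFileSystem");
        FileSystem fs = FileSystem.get(new URI("kodo://bucket/"), conf);
        // From here on it is the ordinary FileSystem API.
        for (FileStatus status : fs.listStatus(new Path("kodo://bucket/"))) {
          System.out.println(status.getPath());
        }
      }
    }

That is the whole point of the out-of-tree model: classpath plus configuration, no changes to the Hadoop source tree itself.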

Putting it into the Hadoop source tree is going to force us either to take on the maintenance or, more likely, to neglect it. A real trouble spot here is CVEs: we do have to update our dependencies to handle them, and without integration test coverage we are likely to break things. This is actually why I have been cutting back on old modules (openstack) and why I am staring at some other historical code (hadoop-pipes...) wondering how best to retire it. All code we delete is guaranteed to be free of CVEs.

Certainly I am not likely to test it before any release, nor will I field any bug reports.

We've made the contract tests public for anyone to play with, and anything you can do to improve them would be good. I can certainly give the code a review too. But right now I am pretty reluctant to add another hard-to-test store.
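
For anyone picking this up: a store usually binds into those contract tests with a small contract class plus per-operation test subclasses. A minimal sketch, where the Kodo class names and the contract/kodo.xml resource are hypothetical but the Hadoop base classes are the real public ones:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.contract.AbstractBondedFSContract;
    import org.apache.hadoop.fs.contract.AbstractContractOpenTest;
    import org.apache.hadoop.fs.contract.AbstractFSContract;

    /** Hypothetical contract binding for a kodo:// store. */
    class KodoContract extends AbstractBondedFSContract {
      // XML resource declaring which filesystem semantics the store supports.
      public static final String CONTRACT_XML = "contract/kodo.xml";

      KodoContract(Configuration conf) {
        super(conf);
        addConfResource(CONTRACT_XML);
      }

      @Override
      public String getScheme() {
        return "kodo";
      }
    }

    /** One of the per-operation suites; the others follow the same pattern. */
    public class ITestKodoContractOpen extends AbstractContractOpenTest {
      @Override
      protected AbstractFSContract createContract(Configuration conf) {
        return new KodoContract(conf);
      }
    }

Each of those suites then runs the shared assertions against the live store, which is what gives reviewers some confidence when dependencies get bumped.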

(*) Regarding Spark: I have some downstream qualification tests there which you can fork to help validate the integration: https://github.com/hortonworks-spark/cloud-integration



> Incorporate Qiniu Cloud Kodo File System Implementation
> -------------------------------------------------------
>
>                 Key: HADOOP-18762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18762
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.3.9
>            Reporter: Zhiqiang Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Qiniu-Kodo-Integrated.pdf
>
>
> Qiniu Kodo is an unstructured data storage management platform developed in-house by Qiniu Cloud Storage, supporting both central and edge storage. The platform has been proven by a large number of users over many years and is widely used across massive-data-management scenarios by cloud service users in China. However, the Apache Hadoop project currently lacks a solution that lets Hadoop/Spark access Kodo directly.
> The purpose of this project is to integrate Kodo into Hadoop/Spark so that users can work with Kodo through the standard Hadoop/Spark APIs without additional learning cost.
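
To make that "no additional learning cost" point concrete: once a kodo:// scheme resolves to a FileSystem implementation, everyday reads and writes are just the standard Hadoop API. The scheme and bucket name below are placeholders, not part of any shipped connector.

    import java.net.URI;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class KodoUsageSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new URI("kodo://example-bucket/"), new Configuration());
        Path p = new Path("kodo://example-bucket/demo.txt");
        try (FSDataOutputStream out = fs.create(p, true)) {   // write through the FS API
          out.write("hello kodo".getBytes(StandardCharsets.UTF_8));
        }
        try (FSDataInputStream in = fs.open(p)) {             // read it back the same way
          IOUtils.copyBytes(in, System.out, 4096, false);
        }
      }
    }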


