Posted to common-dev@hadoop.apache.org by Xuanwo <xu...@apache.org> on 2023/12/20 10:16:42 UTC

Re: [DISCUSS][HDFS] Add rust binding for libhdfs

I'm fine with starting work under a new repo, and I'm willing to help maintain it. The repo could be named hadoop-libhdfs-rust or just libhdfs-rust.

I'm a PPMC member of other ASF projects, so I know how to do releases and how to make sure the licenses fit the requirements. I'm willing to become the RM until we find more committers for this sub-project.

I'm currently looking for committers willing to help me review PRs and validate my releases. Is there anyone interested in sponsoring me?

On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> > What is libdirent? How is it relevant in this context? 
> 
> Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not provide this header, which causes issues when building libhdfs on Windows platforms. To solve this problem, hdfs-sys uses libdirent, an MSVC port of the dirent.h API for Windows.
> 
> Fortunately, hdfs has already done similar work in [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate to use hdfs's own implementation instead.
> 
> > How tightly coupled is it to a specific Hadoop version?
> 
> Thanks to hdfs's stable API, there is no breakage between different hadoop versions (only additions). So the version matrix will look like:
> 
> - libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3
> ...
> - libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3
> ...
> - libhdfs-rust (feature flag: v3_3) can access hadoop v3.3
> 
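(Illustration only, not code from hdfs-sys: a minimal Rust sketch of the matrix above, assuming a binding built with feature flag vX_Y can reach any cluster from vX.Y up to v3.3. The names parse_flag, can_access and NEWEST are hypothetical.)

```rust
/// Hadoop version as (major, minor). Tuples compare lexicographically,
/// so (2, 10) sorts after (2, 2).
type Version = (u32, u32);

/// Newest Hadoop release named in the matrix (assumption: v3.3 is the upper bound).
const NEWEST: Version = (3, 3);

/// Hypothetical helper: parse a feature-flag name such as "v2_10" into (2, 10).
fn parse_flag(flag: &str) -> Option<Version> {
    let rest = flag.strip_prefix('v')?;
    let (major, minor) = rest.split_once('_')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// A binding built with `flag` can access a cluster whose version is at
/// least the flag's version and no newer than NEWEST.
fn can_access(flag: &str, cluster: Version) -> bool {
    match parse_flag(flag) {
        Some(min) => min <= cluster && cluster <= NEWEST,
        None => false,
    }
}

fn main() {
    for flag in ["v2_2", "v2_10", "v3_3"] {
        for cluster in [(2, 2), (2, 10), (3, 3)] {
            println!("{} -> hadoop {}.{}: {}", flag, cluster.0, cluster.1, can_access(flag, cluster));
        }
    }
}
```

Comparing versions as integer pairs rather than strings matters here: as strings, "2.10" would sort before "2.2".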
> > The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
> 
> Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool developed by the Rust Team to automatically generate Rust FFI bindings for C (and some C++) libraries. The other parts handle building and linking, similar to a Makefile, such as finding libjvm and libhdfs.
> 
> In general, the task that libhdfs-rust performs is simple: it provides an API to Rust and links it with libhdfs.so, which I believe is easy to test.
> 
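(Illustration only: a rough sketch of what the "finding libjvm" part of a Cargo build script could look like. This is not the actual hdfs-sys build code. The helper names and the candidate directory layouts under JAVA_HOME are assumptions; the two `cargo:` directives are the standard ones a build script prints to tell Cargo where and what to link.)

```rust
use std::path::{Path, PathBuf};

/// Candidate directories that may contain libjvm under a JDK root.
/// These layouts are assumptions for illustration; real JDKs vary by
/// version and platform.
fn jvm_candidates(java_home: &Path) -> Vec<PathBuf> {
    vec![
        java_home.join("lib").join("server"),
        java_home.join("jre").join("lib").join("amd64").join("server"),
    ]
}

/// Lines a Cargo build script would print to link dynamically against
/// library `lib` found in directory `dir`.
fn link_directives(dir: &Path, lib: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}", dir.display()),
        format!("cargo:rustc-link-lib=dylib={}", lib),
    ]
}

fn main() {
    // Fall back to a placeholder path when JAVA_HOME is unset (assumption).
    let java_home = std::env::var("JAVA_HOME")
        .unwrap_or_else(|_| "/usr/lib/jvm/default".to_string());
    let java_home = PathBuf::from(java_home);

    // Use the first candidate directory that exists on this machine.
    for dir in jvm_candidates(&java_home) {
        if dir.exists() {
            for line in link_directives(&dir, "jvm") {
                println!("{}", line);
            }
            break;
        }
    }
}
```

The same two directives, with "hdfs" in place of "jvm", would cover libhdfs once its directory is known.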
> [libdirent]: https://github.com/tronkko/dirent
> [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> 
> 
> On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
>> Inline
>> 
>> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com> wrote:
>>> Forwarding from dev@hadoop to relevant ML
>>> 
>>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>>> 
>>> -Ayush
>>> 
>>> On 2023/07/15 09:18:42 Xuanwo wrote:
>>> > Hello, everyone.
>>> >
>>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust. I'd like to know whether it would be a good idea to accept hdfs-sys as part of the hadoop project.
>>> >
>>> > Users of hdfs-sys for now:
>>> >
>>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and efficiently retrieve data from various storage services in a unified way.
>>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. (via OpenDAL)
>>> > - [RisingWave]: The distributed streaming database: SQL stream processing with Postgres-like experience. (via OpenDAL)
>>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
>>> >
>>> > License information for hdfs-sys:
>>> >
>>> > - hdfs-sys itself is licensed under Apache-2.0
>>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0; they are all dual-licensed under Apache-2.0 and MIT.
>>> >
>>> > Work needed if accepted:
>>> >
>>> > - Replace libdirent with the same dirent API implemented in HDFS project.
>>> > - Remove all bundled hdfs C code.
>> What is libdirent? How is it relevant in this context? 
>> 
>> How tightly coupled is it to a specific Hadoop version? I am wondering if it's possible to host it in a separate Hadoop repo, if it's accepted. The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
>>> >
>>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>>> > [Databend]: https://github.com/datafuselabs/databend
>>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>>> >
>>> > Xuanwo
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscribe@hadoop.apache.org
>>> > For additional commands, e-mail: dev-help@hadoop.apache.org
>>> >
>>> >
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>>> For additional commands, e-mail: common-dev-help@hadoop.apache.org
> 
> Xuanwo
> 

Xuanwo

Re: [DISCUSS][HDFS] Add rust binding for libhdfs

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
I hear Owen O'Malley has been learning Rust, and as he left Cloudera a year
ago, he'll be missing GitHub and JIRA....

On Thu, 21 Dec 2023 at 15:00, Ayush Saxena <ay...@gmail.com> wrote:

> It looks pretty challenging to me. Most of the committers aren't
> technically equipped to review this code, so getting the initial code
> reviewed & merged itself would be a challenge, as none of us can
> actually review the code.
>
> Looking at the repo, it has only 1 or 2 major contributors, which
> itself is a red flag, the bus factor is pretty low, if we don't find
> volunteers in future, we would be stuck with some dead code, which
> most of us don't know how to fix or maintain. If there is any CVE
> reported from this code post release, that would be a challenge for us
> to fix
>
> Quoting:
> > the Rust community has developed around 10 different HDFS client projects.
> > However, almost all of them are no longer maintained.
>
> If they couldn't do, how we will be able to do that? and this isn't a
> very good statistic to quote :-)
>
>
> Well, I don't have objections on having this as a separate repo in
> Hadoop, if others are fine with it, I can try to help whatever is in
> my capacity, but I still have doubts on how easy would it be to push
> code or get votes on release of this project, which most of the people
> doesn't have knowledge & developing a community and stuff seems like a
> incubator thing to me.
>
> -Ayush
>
> On Thu, 21 Dec 2023 at 19:01, Xuanwo <xu...@apache.org> wrote:
> >
> > Thanks Xiaoqiao He!
> >
> > Let me provide more context about this project.
> >
> > libhdfs-rust aims to provide native HDFS client support for Rust, a
> rapidly growing systems
> > programming language commonly used in modern infrastructure such as
> databases. With
> > libhdfs-rust, Rust developers can more easily integrate with HDFS.
> libhdfs-rust is analogous
> > to both libhdfs (C API) and libhdfspp <
> https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
> (C++ API). Its current codebase builds upon libhdfs, but
> > there are plans to rewrite it entirely in pure rust. Consequently,
> libhdfs-rust will interface
> > directly with the HDFS Java client via JNI, making it fully parallel to
> both libhdfs and libhdfs-cpp.
> >
> > There are three possible ways for us to take:
> >
> > We have three options to consider:
> >
> > A: Integrate libhdfs-rust into the Hadoop repository, placing it under
> >     'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
> > B: Accept libhdfs-rust as a subproject and establish a new repository
> >     named 'hadoop-hdfs-rust-client' (or another suitable name).
> > C: Maintain libhdfs-rust as an independent project outside of Hadoop.
> >
> > I personally prefer Option B since:
> >
> > For Option A
> >
> > The release process for Hadoop is already quite complex. We should avoid
> placing additional
> > burdens on the Release Managers, especially when it involves integrating
> a new language.
> >
> > And it's impossible to wait for libhdfs-rust mature and stable enough to
> catch up the release train.
> >
> > For Option C
> >
> > libhdfs-rust is exactly the same with libhdfs & libhdfspp <
> https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
> but for rust. Building a community for
> > libhdfs-rust outside of Hadoop is challenging. In fact, numerous
> attempts have been made: the Rust
> > community has developed around 10 different HDFS client projects.
> However, almost all of them
> > are no longer maintained.
> >
> > In conclusion, I believe that Option B is the best choice for us: we can
> develop a rust project in hadoop
> > community, attract more rust users, and recruit additional committers
> from the rust community.
> >
> >
> > On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> > > Thanks Xuanwo for your work. I believe it is valuable to enlarge
> hadoop ecosystem.
> > >
> > > I am also concerned that it will involve more hard work to release and
> version match,
> > > especially for one who is not familiar with C or Rust.
> > > Moreover, I am not aware the difference between `accept hdfs-sys as
> part of hadoop
> > > project` and `one separate project`.
> > >
> > > I think one smooth solution is reference hadoop-thirdparty[1] which is
> one hadoop
> > > sub-project but split to separate repo and release line etc, if it is
> accepted.
> > >
> > > cc @Ayush Saxena <ma...@gmail.com> @Wei-Chiu Chuang <mailto:
> weichiu@apache.org> @Iñigo Goiri <ma...@gmail.com> @Shilun Fan
> <ma...@foxmail.com> and other folks, what
> > > do you think? Thanks.
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > > [1] https://github.com/apache/hadoop-thirdparty
> > >
> > > On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xu...@apache.org> wrote:
> > >> I'm fine to start work under a new repo, and I'm willing to help
> maintain this repo. The repo could name after hadoop-libhdfs-rust or just
> libhdfs-rust.
> > >>
> > >> I'm PPMC member of other ASF projects so I know how to do release and
> how to make sure the license fit the requirements. I'm willing the become
> the RM until we find more committers for this sub-project.
> > >>
> > >> I'm currently looking for committers willing to help me review PRs
> and validate my releases. Is there anyone interested in sponsoring me?
> > >>
> > >> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> > >> > > What is libdirent? How is it relevant in this context?
> > >> >
> > >> > Since version 3.3, libhdfs depends on the dirent.h API. However,
> MSVC does not provide this header which causes issues when building libhdfs
> on Windows platforms. To solve this problem, hdfs-sys uses libdirent - a
> MSVC port of the dirent.h API for Windows.
> > >> >
> > >> > Fortunately, hdfs has already done similar work in
> [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can
> migrate to use hdfs's own implementation instead.
> > >> >
> > >> > > How tightly coupled is it to a specific Hadoop version?
> > >> >
> > >> > Thanks to hdfs's stable API, there is no breakage between different
> hadoop version (only addition). So the version matrix will be like:
> > >> >
> > >> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
> > >> > ...
> > >> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
> > >> > ...
> > >> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
> > >> >
> > >> > > The concern I have as a release manager is that it makes my life
> harder to ensure the quality of a language binding that I am not familiar
> with.
> > >> >
> > >> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a
> tool developed by the Rust Team to automatically generate Rust FFI bindings
> for C (and some C++) libraries. Other parts are related to building and
> linking, similar to Makefile, such as finding libjvm and libhdfs.
> > >> >
> > >> > In general, the task that libhdfs-rust performs is simple: it
> provides an API to Rust and links it with libhdfs.so, which I believe is
> easy to test.
> > >> >
> > >> > [libdirect]: https://github.com/tronkko/dirent
> > >> > [native/libhdfspp/lib/x-platform]:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> > >> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> > >> >
> > >> >
> > >> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> > >> >> Inline
> > >> >>
> > >> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com>
> wrote:
> > >> >>> Forwarding from dev@hadoop to relevant ML
> > >> >>>
> > >> >>> Original mail:
> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> > >> >>>
> > >> >>> -Ayush
> > >> >>>
> > >> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> > >> >>> > Hello, everyone.
> > >> >>> >
> > >> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C
> API for Rust. I want to know is it a good idea of accepting hdfs-sys as a
> part of hadoop project?
> > >> >>> >
> > >> >>> > Users of hdfs-sys for now:
> > >> >>> >
> > >> >>> > - [OpenDAL]: An Apache Incubator project that allows users to
> easily and efficiently retrieve data from various storage services in a
> unified way.
> > >> >>> > - [Databend]: A modern cloud data warehouse focusing on
> reducing cost and complexity for your massive-scale analytics needs. (via
> OpenDAL)
> > >> >>> > - [RisingWave]: The distributed streaming database: SQL stream
> processing with Postgres-like experience. (via OpenDAL)
> > >> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native
> Lakehouse framework
> > >> >>> >
> > >> >>> > Licenses information of hdfs-sys:
> > >> >>> >
> > >> >>> > - hdfs-sys itself licensed under Apache-2.0
> > >> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73,
> glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they
> are all dual licensed under Apache-2.0 and MIT.
> > >> >>> >
> > >> >>> > Works need to do if accept:
> > >> >>> >
> > >> >>> > - Replace libdirent with the same dirent API implemented in
> HDFS project.
> > >> >>> > - Remove all bundled hdfs C code.
> > >> >> What is libdirent? How is it relevant in this context?
> > >> >>
> > >> >> How tightly coupled is it to a specific Hadoop version? I am
> wondering if it's possible to host it in a separate Hadoop repo, if it's
> accepted. The concern I have as a release manager is that it makes my life
> harder to ensure the quality of a language binding that I am not familiar
> with.
> > >> >>> >
> > >> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> > >> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> > >> >>> > [Databend]: https://github.com/datafuselabs/databend
> > >> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> > >> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> > >> >>> >
> > >> >>> > Xuanwo
> > >> >>> >
> > >> >>> >
> ---------------------------------------------------------------------
> > >> >>> > To unsubscribe, e-mail: dev-unsubscribe@hadoop.apache.org
> > >> >>> > For additional commands, e-mail: dev-help@hadoop.apache.org
> > >> >>> >
> > >> >>> >
> > >> >>>
> > >> >>>
> ---------------------------------------------------------------------
> > >> >>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> > >> >>> For additional commands, e-mail:
> common-dev-help@hadoop.apache.org
> > >> >
> > >> > Xuanwo
> > >> >
> > >>
> > >> Xuanwo
> >
> > Xuanwo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS][HDFS] Add rust binding for libhdfs

Posted by Ayush Saxena <ay...@gmail.com>.
It looks pretty challenging to me. Most of the committers aren't
technically equipped to review this code, so getting the initial code
reviewed & merged itself would be a challenge, as none of us can
actually review the code.

Looking at the repo, it has only 1 or 2 major contributors, which
itself is a red flag: the bus factor is pretty low. If we don't find
volunteers in the future, we would be stuck with some dead code that
most of us don't know how to fix or maintain. If any CVE is reported
against this code after a release, that would be a challenge for us
to fix.

Quoting:
> the Rust community has developed around 10 different HDFS client projects.
> However, almost all of them are no longer maintained.

If they couldn't do it, how will we be able to? And this isn't a
very good statistic to quote :-)


Well, I don't have objections to having this as a separate repo in
Hadoop. If others are fine with it, I can try to help with whatever is
in my capacity. But I still have doubts about how easy it would be to
push code or get votes on releases of this project, which most people
here don't have knowledge of. Developing a community and so on seems
like an Incubator thing to me.

-Ayush

On Thu, 21 Dec 2023 at 19:01, Xuanwo <xu...@apache.org> wrote:
>
> Thanks Xiaoqiao He!
>
> Let me provide more context about this project.
>
> libhdfs-rust aims to provide native HDFS client support for Rust, a rapidly growing systems
> programming language commonly used in modern infrastructure such as databases. With
> libhdfs-rust, Rust developers can more easily integrate with HDFS. libhdfs-rust is analogous
> to both libhdfs (C API) and libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> (C++ API). Its current codebase builds upon libhdfs, but
> there are plans to rewrite it entirely in pure Rust. Consequently, libhdfs-rust will interface
> directly with the HDFS Java client via JNI, making it fully parallel to both libhdfs and libhdfspp.
>
> We have three options to consider:
>
> A: Integrate libhdfs-rust into the Hadoop repository, placing it under
>     'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
> B: Accept libhdfs-rust as a subproject and establish a new repository
>     named 'hadoop-hdfs-rust-client' (or another suitable name).
> C: Maintain libhdfs-rust as an independent project outside of Hadoop.
>
> I personally prefer Option B since:
>
> For Option A
>
> The release process for Hadoop is already quite complex. We should avoid placing additional
> burdens on the Release Managers, especially when it involves integrating a new language.
>
> And it's impossible to wait for libhdfs-rust to become mature and stable enough to catch up with the release train.
>
> For Option C
>
> libhdfs-rust is exactly the same as libhdfs & libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> but for Rust. Building a community for
> libhdfs-rust outside of Hadoop is challenging. In fact, numerous attempts have been made: the Rust
> community has developed around 10 different HDFS client projects. However, almost all of them
> are no longer maintained.
>
> In conclusion, I believe that Option B is the best choice for us: we can develop a Rust project in the hadoop
> community, attract more Rust users, and recruit additional committers from the Rust community.
>
>
> On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> > Thanks Xuanwo for your work. I believe it is valuable for enlarging the hadoop ecosystem.
> >
> > I am also concerned that it will involve more hard work to release and match versions,
> > especially for anyone who is not familiar with C or Rust.
> > Moreover, I am not aware of the difference between `accept hdfs-sys as part of hadoop
> > project` and `one separate project`.
> >
> > I think one smooth solution is to follow hadoop-thirdparty[1], which is a hadoop
> > sub-project but split into a separate repo and release line etc., if it is accepted.
> >
> > cc @Ayush Saxena <ma...@gmail.com> @Wei-Chiu Chuang <ma...@apache.org> @Iñigo Goiri <ma...@gmail.com> @Shilun Fan <ma...@foxmail.com> and other folks, what
> > do you think? Thanks.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > [1] https://github.com/apache/hadoop-thirdparty
> >
> > On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xu...@apache.org> wrote:
> >> I'm fine with starting work under a new repo, and I'm willing to help maintain it. The repo could be named hadoop-libhdfs-rust or just libhdfs-rust.
> >>
> >> I'm a PPMC member of other ASF projects, so I know how to do releases and how to make sure the licenses fit the requirements. I'm willing to become the RM until we find more committers for this sub-project.
> >>
> >> I'm currently looking for committers willing to help me review PRs and validate my releases. Is there anyone interested in sponsoring me?
> >>
> >> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> >> > > What is libdirent? How is it relevant in this context?
> >> >
> >> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not provide this header which causes issues when building libhdfs on Windows platforms. To solve this problem, hdfs-sys uses libdirent - a MSVC port of the dirent.h API for Windows.
> >> >
> >> > Fortunately, hdfs has already done similar work in [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate to use hdfs's own implementation instead.
> >> >
> >> > > How tightly coupled is it to a specific Hadoop version?
> >> >
> >> > Thanks to hdfs's stable API, there is no breakage between different hadoop version (only addition). So the version matrix will be like:
> >> >
> >> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
> >> >
> >> > > The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
> >> >
> >> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool developed by the Rust Team to automatically generate Rust FFI bindings for C (and some C++) libraries. Other parts are related to building and linking, similar to Makefile, such as finding libjvm and libhdfs.
> >> >
> >> > In general, the task that libhdfs-rust performs is simple: it provides an API to Rust and links it with libhdfs.so, which I believe is easy to test.
> >> >
> >> > [libdirent]: https://github.com/tronkko/dirent
> >> > [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> >> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> >> >
> >> >
> >> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> >> >> Inline
> >> >>
> >> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com> wrote:
> >> >>> Forwarding from dev@hadoop to relevant ML
> >> >>>
> >> >>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> >> >>>
> >> >>> -Ayush
> >> >>>
> >> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> >> >>> > Hello, everyone.
> >> >>> >
> >> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust. I want to know is it a good idea of accepting hdfs-sys as a part of hadoop project?
> >> >>> >
> >> >>> > Users of hdfs-sys for now:
> >> >>> >
> >> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and efficiently retrieve data from various storage services in a unified way.
> >> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. (via OpenDAL)
> >> >>> > - [RisingWave]: The distributed streaming database: SQL stream processing with Postgres-like experience. (via OpenDAL)
> >> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
> >> >>> >
> >> >>> > Licenses information of hdfs-sys:
> >> >>> >
> >> >>> > - hdfs-sys itself licensed under Apache-2.0
> >> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual licensed under Apache-2.0 and MIT.
> >> >>> >
> >> >>> > Works need to do if accept:
> >> >>> >
> >> >>> > - Replace libdirent with the same dirent API implemented in HDFS project.
> >> >>> > - Remove all bundled hdfs C code.
> >> >> What is libdirent? How is it relevant in this context?
> >> >>
> >> >> How tightly coupled is it to a specific Hadoop version? I am wondering if it's possible to host it in a separate Hadoop repo, if it's accepted. The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
> >> >>> >
> >> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> >> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> >> >>> > [Databend]: https://github.com/datafuselabs/databend
> >> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> >> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >> >>> >
> >> >>> > Xuanwo
> >> >>> >
> >> >>> > ---------------------------------------------------------------------
> >> >>> > To unsubscribe, e-mail: dev-unsubscribe@hadoop.apache.org
> >> >>> > For additional commands, e-mail: dev-help@hadoop.apache.org
> >> >>> >
> >> >>> >
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> >>> For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> > Xuanwo
> >> >
> >>
> >> Xuanwo
>
> Xuanwo

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: [DISCUSS][HDFS] Add rust binding for libhdfs

Posted by Ayush Saxena <ay...@gmail.com>.
It looks pretty challenging to me. Most of the committers aren't
technically equipped to review this code, so even getting the initial
code reviewed & merged would be a challenge.

Looking at the repo, it has only 1 or 2 major contributors, which is
itself a red flag: the bus factor is pretty low. If we don't find
volunteers in the future, we would be stuck with dead code that most
of us don't know how to fix or maintain. If a CVE is reported against
this code after a release, it would be a challenge for us to fix.

Quoting:
> the Rust community has developed around 10 different HDFS client projects.
> However, almost all of them are no longer maintained.

If they couldn't do it, how will we be able to? And this isn't a
very good statistic to quote :-)


Well, I don't have objections to having this as a separate repo in
Hadoop. If others are fine with it, I can try to help in whatever
capacity I can. But I still have doubts about how easy it would be to
push code or get votes on a release of this project, which most people
here have no knowledge of. Developing a community and so on seems like
an Incubator thing to me.

-Ayush

On Thu, 21 Dec 2023 at 19:01, Xuanwo <xu...@apache.org> wrote:
>
> Thanks Xiaoqiao He!
>
> Let me provide more context about this project.
>
> libhdfs-rust aims to provide native HDFS client support for Rust, a rapidly growing systems
> programming language commonly used in modern infrastructure such as databases. With
> libhdfs-rust, Rust developers can more easily integrate with HDFS. libhdfs-rust is analogous
> to both libhdfs (C API) and libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> (C++ API). Its current codebase builds upon libhdfs, but
> there are plans to rewrite it entirely in pure rust. Consequently, libhdfs-rust will interface
> directly with the HDFS Java client via JNI, making it fully parallel to both libhdfs and libhdfs-cpp.
>
> There are three possible ways for us to take:
>
> We have three options to consider:
>
> A: Integrate libhdfs-rust into the Hadoop repository, placing it under
>     'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
> B: Accept libhdfs-rust as a subproject and establish a new repository
>     named 'hadoop-hdfs-rust-client' (or another suitable name).
> C: Maintain libhdfs-rust as an independent project outside of Hadoop.
>
> I personally prefer Option B since:
>
> For Option A
>
> The release process for Hadoop is already quite complex. We should avoid placing additional
> burdens on the Release Managers, especially when it involves integrating a new language.
>
> And it's impossible to wait for libhdfs-rust mature and stable enough to catch up the release train.
>
> For Option C
>
> libhdfs-rust is exactly the same with libhdfs & libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> but for rust. Building a community for
> libhdfs-rust outside of Hadoop is challenging. In fact, numerous attempts have been made: the Rust
> community has developed around 10 different HDFS client projects. However, almost all of them
> are no longer maintained.
>
> In conclusion, I believe that Option B is the best choice for us: we can develop a rust project in hadoop
> community, attract more rust users, and recruit additional committers from the rust community.
>
>
> On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> > Thanks Xuanwo for your work. I believe it is valuable to enlarge hadoop ecosystem.
> >
> > I am also concerned that it will involve more hard work to release and version match,
> > especially for one who is not familiar with C or Rust.
> > Moreover, I am not aware the difference between `accept hdfs-sys as part of hadoop
> > project` and `one separate project`.
> >
> > I think one smooth solution is reference hadoop-thirdparty[1] which is one hadoop
> > sub-project but split to separate repo and release line etc, if it is accepted.
> >
> > cc @Ayush Saxena <ma...@gmail.com> @Wei-Chiu Chuang <ma...@apache.org> @Iñigo Goiri <ma...@gmail.com> @Shilun Fan <ma...@foxmail.com> and other folks, what
> > do you think? Thanks.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > [1] https://github.com/apache/hadoop-thirdparty
> >
> > On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xu...@apache.org> wrote:
> >> I'm fine to start work under a new repo, and I'm willing to help maintain this repo. The repo could name after hadoop-libhdfs-rust or just libhdfs-rust.
> >>
> >> I'm PPMC member of other ASF projects so I know how to do release and how to make sure the license fit the requirements. I'm willing the become the RM until we find more committers for this sub-project.
> >>
> >> I'm currently looking for committers willing to help me review PRs and validate my releases. Is there anyone interested in sponsoring me?
> >>
> >> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> >> > > What is libdirent? How is it relevant in this context?
> >> >
> >> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not provide this header which causes issues when building libhdfs on Windows platforms. To solve this problem, hdfs-sys uses libdirent - a MSVC port of the dirent.h API for Windows.
> >> >
> >> > Fortunately, hdfs has already done similar work in [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate to use hdfs's own implementation instead.
> >> >
> >> > > How tightly coupled is it to a specific Hadoop version?
> >> >
> >> > Thanks to hdfs's stable API, there is no breakage between different hadoop version (only addition). So the version matrix will be like:
> >> >
> >> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
> >> > ...
> >> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
> >> >
> >> > > The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
> >> >
> >> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool developed by the Rust Team to automatically generate Rust FFI bindings for C (and some C++) libraries. Other parts are related to building and linking, similar to Makefile, such as finding libjvm and libhdfs.
> >> >
> >> > In general, the task that libhdfs-rust performs is simple: it provides an API to Rust and links it with libhdfs.so, which I believe is easy to test.
> >> >
> >> > [libdirent]: https://github.com/tronkko/dirent
> >> > [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> >> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> >> >
> >> >
> >> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> >> >> Inline
> >> >>
> >> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com> wrote:
> >> >>> Forwarding from dev@hadoop to relevant ML
> >> >>>
> >> >>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> >> >>>
> >> >>> -Ayush
> >> >>>
> >> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> >> >>> > Hello, everyone.
> >> >>> >
> >> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust. I want to know is it a good idea of accepting hdfs-sys as a part of hadoop project?
> >> >>> >
> >> >>> > Users of hdfs-sys for now:
> >> >>> >
> >> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and efficiently retrieve data from various storage services in a unified way.
> >> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. (via OpenDAL)
> >> >>> > - [RisingWave]: The distributed streaming database: SQL stream processing with Postgres-like experience. (via OpenDAL)
> >> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
> >> >>> >
> >> >>> > Licenses information of hdfs-sys:
> >> >>> >
> >> >>> > - hdfs-sys itself licensed under Apache-2.0
> >> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual licensed under Apache-2.0 and MIT.
> >> >>> >
> >> >>> > Works need to do if accept:
> >> >>> >
> >> >>> > - Replace libdirent with the same dirent API implemented in HDFS project.
> >> >>> > - Remove all bundled hdfs C code.
> >> >> What is libdirent? How is it relevant in this context?
> >> >>
> >> >> How tightly coupled is it to a specific Hadoop version? I am wondering if it's possible to host it in a separate Hadoop repo, if it's accepted. The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
> >> >>> >
> >> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> >> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> >> >>> > [Databend]: https://github.com/datafuselabs/databend
> >> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> >> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >> >>> >
> >> >>> > Xuanwo
> >> >>> >
> >> >>> > ---------------------------------------------------------------------
> >> >>> > To unsubscribe, e-mail: dev-unsubscribe@hadoop.apache.org
> >> >>> > For additional commands, e-mail: dev-help@hadoop.apache.org
> >> >>> >
> >> >>> >
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> >>> For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> > Xuanwo
> >> >
> >>
> >> Xuanwo
>
> Xuanwo

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Re: [DISCUSS][HDFS] Add rust binding for libhdfs

Posted by Xuanwo <xu...@apache.org>.
Thanks Xiaoqiao He!

Let me provide more context about this project.

libhdfs-rust aims to provide native HDFS client support for Rust, a rapidly growing systems
programming language commonly used in modern infrastructure such as databases. With
libhdfs-rust, Rust developers can more easily integrate with HDFS. libhdfs-rust is analogous
to both libhdfs (C API) and libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp> (C++ API). Its current codebase builds upon libhdfs, but
there are plans to rewrite it entirely in pure Rust. Once rewritten, libhdfs-rust will interface
directly with the HDFS Java client via JNI, making it fully parallel to both libhdfs and libhdfspp.

We have three options to consider:

A: Integrate libhdfs-rust into the Hadoop repository, placing it under 
    'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
B: Accept libhdfs-rust as a subproject and establish a new repository 
    named 'hadoop-hdfs-rust-client' (or another suitable name).
C: Maintain libhdfs-rust as an independent project outside of Hadoop.

I personally prefer Option B, for the following reasons.

For Option A

The release process for Hadoop is already quite complex. We should avoid placing additional 
burdens on the Release Managers, especially when it involves integrating a new language.

And it's impossible to wait for libhdfs-rust to become mature and stable enough to catch up with the release train.

For Option C

libhdfs-rust plays exactly the same role as libhdfs & libhdfspp <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>, but for Rust. Building a community for
libhdfs-rust outside of Hadoop is challenging. In fact, numerous attempts have been made: the Rust
community has developed around 10 different HDFS client projects. However, almost all of them
are no longer maintained.

In conclusion, I believe that Option B is the best choice for us: we can develop a Rust project within the Hadoop
community, attract more Rust users, and recruit additional committers from the Rust community.


On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> Thanks Xuanwo for your work. I believe it is valuable to enlarge hadoop ecosystem.
> 
> I am also concerned that it will involve more hard work to release and version match,
> especially for one who is not familiar with C or Rust. 
> Moreover, I am not aware the difference between `accept hdfs-sys as part of hadoop
> project` and `one separate project`.
> 
> I think one smooth solution is reference hadoop-thirdparty[1] which is one hadoop
> sub-project but split to separate repo and release line etc, if it is accepted.
> 
> cc @Ayush Saxena <ma...@gmail.com> @Wei-Chiu Chuang <ma...@apache.org> @Iñigo Goiri <ma...@gmail.com> @Shilun Fan <ma...@foxmail.com> and other folks, what
> do you think? Thanks.
> 
> Best Regards,
> - He Xiaoqiao
> 
> [1] https://github.com/apache/hadoop-thirdparty
> 
> On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xu...@apache.org> wrote:
>> I'm fine to start work under a new repo, and I'm willing to help maintain this repo. The repo could name after hadoop-libhdfs-rust or just libhdfs-rust. 
>> 
>> I'm PPMC member of other ASF projects so I know how to do release and how to make sure the license fit the requirements. I'm willing the become the RM until we find more committers for this sub-project.
>> 
>> I'm currently looking for committers willing to help me review PRs and validate my releases. Is there anyone interested in sponsoring me?
>> 
>> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
>> > > What is libdirent? How is it relevant in this context? 
>> > 
>> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not provide this header which causes issues when building libhdfs on Windows platforms. To solve this problem, hdfs-sys uses libdirent - a MSVC port of the dirent.h API for Windows.
>> > 
>> > Fortunately, hdfs has already done similar work in [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate to use hdfs's own implementation instead.
>> > 
>> > > How tightly coupled is it to a specific Hadoop version?
>> > 
>> > Thanks to hdfs's stable API, there is no breakage between different hadoop version (only addition). So the version matrix will be like:
>> > 
>> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
>> > ...
>> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
>> > 
>> > > The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
>> > 
>> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool developed by the Rust Team to automatically generate Rust FFI bindings for C (and some C++) libraries. Other parts are related to building and linking, similar to Makefile, such as finding libjvm and libhdfs.
>> > 
>> > In general, the task that libhdfs-rust performs is simple: it provides an API to Rust and links it with libhdfs.so, which I believe is easy to test.
>> > 
>> > [libdirent]: https://github.com/tronkko/dirent
>> > [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
>> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
>> > 
>> > 
>> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
>> >> Inline
>> >> 
>> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com> wrote:
>> >>> Forwarding from dev@hadoop to relevant ML
>> >>> 
>> >>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>> >>> 
>> >>> -Ayush
>> >>> 
>> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
>> >>> > Hello, everyone.
>> >>> >
>> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust. I want to know is it a good idea of accepting hdfs-sys as a part of hadoop project?
>> >>> >
>> >>> > Users of hdfs-sys for now:
>> >>> >
>> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and efficiently retrieve data from various storage services in a unified way.
>> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. (via OpenDAL)
>> >>> > - [RisingWave]: The distributed streaming database: SQL stream processing with Postgres-like experience. (via OpenDAL)
>> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
>> >>> >
>> >>> > Licenses information of hdfs-sys:
>> >>> >
>> >>> > - hdfs-sys itself licensed under Apache-2.0
>> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual licensed under Apache-2.0 and MIT. 
>> >>> >
>> >>> > Works need to do if accept:
>> >>> >
>> >>> > - Replace libdirent with the same dirent API implemented in HDFS project.
>> >>> > - Remove all bundled hdfs C code.
>> >> What is libdirent? How is it relevant in this context? 
>> >> 
>> >> How tightly coupled is it to a specific Hadoop version? I am wondering if it's possible to host it in a separate Hadoop repo, if it's accepted. The concern I have as a release manager is that it makes my life harder to ensure the quality of a language binding that I am not familiar with.
>> >>> >
>> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>> >>> > [Databend]: https://github.com/datafuselabs/databend
>> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>> >>> >
>> >>> > Xuanwo
>> >>> >
>> >>> > ---------------------------------------------------------------------
>> >>> > To unsubscribe, e-mail: dev-unsubscribe@hadoop.apache.org
>> >>> > For additional commands, e-mail: dev-help@hadoop.apache.org
>> >>> >
>> >>> >
>> >>> 
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>> >>> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>> > 
>> > Xuanwo
>> > 
>> 
>> Xuanwo

Xuanwo

Re: [DISCUSS][HDFS] Add rust binding for libhdfs

Posted by Xiaoqiao He <he...@apache.org>.
Thanks Xuanwo for your work. I believe it is valuable for growing the Hadoop
ecosystem.

I am also concerned that it will involve more hard work for releases and
version matching, especially for those who are not familiar with C or Rust.
Moreover, I am not clear on the difference between `accept hdfs-sys as part of
hadoop project` and `one separate project`.

I think one smooth solution, if it is accepted, is to follow
hadoop-thirdparty[1], which is a Hadoop sub-project but lives in a separate
repo with its own release line.

cc @Ayush Saxena <ay...@gmail.com> @Wei-Chiu Chuang <we...@apache.org>
@Iñigo Goiri <el...@gmail.com> @Shilun Fan <sl...@foxmail.com> and other
folks, what do you think? Thanks.

Best Regards,
- He Xiaoqiao

[1] https://github.com/apache/hadoop-thirdparty

On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xu...@apache.org> wrote:

> I'm fine to start work under a new repo, and I'm willing to help maintain
> this repo. The repo could be named hadoop-libhdfs-rust or just
> libhdfs-rust.
>
> I'm a PPMC member of other ASF projects, so I know how to do a release and
> how to make sure the license fits the requirements. I'm willing to become the
> RM until we find more committers for this sub-project.
>
> I'm currently looking for committers willing to help me review PRs and
> validate my releases. Is there anyone interested in sponsoring me?
>
> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> > > What is libdirent? How is it relevant in this context?
> >
> > Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC
> > does not provide this header which causes issues when building libhdfs on
> > Windows platforms. To solve this problem, hdfs-sys uses libdirent - a MSVC
> > port of the dirent.h API for Windows.
> >
> > Fortunately, hdfs has already done similar work in
> > [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can
> > migrate to use hdfs's own implementation instead.
> >
> > > How tightly coupled is it to a specific Hadoop version?
> >
> > Thanks to hdfs's stable API, there is no breakage between different
> > hadoop versions (only additions). So the version matrix will be like:
> >
> > - libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3
> > ...
> > - libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3
> > ...
> > - libhdfs-rust (feature flag: v3_3) can access hadoop v3.3
> >
> > > The concern I have as a release manager is that it makes my life
> > > harder to ensure the quality of a language binding that I am not familiar
> > > with.
> >
> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool
> > developed by the Rust Team to automatically generate Rust FFI bindings for
> > C (and some C++) libraries. Other parts are related to building and
> > linking, similar to Makefile, such as finding libjvm and libhdfs.
> >
> > In general, the task that libhdfs-rust performs is simple: it provides
> > an API to Rust and links it with libhdfs.so, which I believe is easy to
> > test.
> >
> > [libdirent]: https://github.com/tronkko/dirent
> > [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> >
> >
> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> >> Inline
> >>
> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ay...@gmail.com> wrote:
> >>> Forwarding from dev@hadoop to relevant ML
> >>>
> >>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> >>>
> >>> -Ayush
> >>>
> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> >>> > Hello, everyone.
> >>> >
> >>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for
> >>> > Rust. I want to know is it a good idea of accepting hdfs-sys as a part of
> >>> > hadoop project?
> >>> >
> >>> > Users of hdfs-sys for now:
> >>> >
> >>> > - [OpenDAL]: An Apache Incubator project that allows users to easily
> >>> >   and efficiently retrieve data from various storage services in a unified
> >>> >   way.
> >>> > - [Databend]: A modern cloud data warehouse focusing on reducing
> >>> >   cost and complexity for your massive-scale analytics needs. (via OpenDAL)
> >>> > - [RisingWave]: The distributed streaming database: SQL stream
> >>> >   processing with Postgres-like experience. (via OpenDAL)
> >>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse
> >>> >   framework
> >>> >
> >>> > Licenses information of hdfs-sys:
> >>> >
> >>> > - hdfs-sys itself licensed under Apache-2.0
> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1,
> >>> >   hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual
> >>> >   licensed under Apache-2.0 and MIT.
> >>> >
> >>> > Works need to do if accept:
> >>> >
> >>> > - Replace libdirent with the same dirent API implemented in HDFS
> >>> >   project.
> >>> > - Remove all bundled hdfs C code.
> >> What is libdirent? How is it relevant in this context?
> >>
> >> How tightly coupled is it to a specific Hadoop version? I am wondering
> >> if it's possible to host it in a separate Hadoop repo, if it's accepted.
> >> The concern I have as a release manager is that it makes my life harder to
> >> ensure the quality of a language binding that I am not familiar with.
> >>> >
> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> >>> > [Databend]: https://github.com/datafuselabs/databend
> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >>> >
> >>> > Xuanwo
> >
> > Xuanwo
> >
>
> Xuanwo
>
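
The version matrix quoted above can be read as a single rule: a binding built
with feature flag vX_Y should work against any Hadoop release at or above X.Y,
provided the C API really is additive-only as the thread describes. Below is a
minimal sketch of that rule in Rust. The names HadoopVersion, from_feature_flag
and can_access are hypothetical and are not part of hdfs-sys or libhdfs; only
the flag spelling (v2_2, v2_10, v3_3) is taken from the mail.

```rust
// Hypothetical sketch: none of these names come from hdfs-sys or libhdfs.

/// A Hadoop release line such as 2.10 or 3.3.
/// The derived ordering compares `major` first, then `minor`, so 2.10 sorts
/// after 2.2 (numeric comparison, not string comparison).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HadoopVersion {
    pub major: u32,
    pub minor: u32,
}

impl HadoopVersion {
    /// Parses a feature-flag name such as "v2_10" into a version.
    /// Returns `None` if the flag does not look like `v<major>_<minor>`.
    pub fn from_feature_flag(flag: &str) -> Option<Self> {
        let rest = flag.strip_prefix('v')?;
        let (major, minor) = rest.split_once('_')?;
        Some(Self {
            major: major.parse().ok()?,
            minor: minor.parse().ok()?,
        })
    }
}

/// Assumption: the C API only gains functions over time, so bindings built
/// against the API level `flag` work on any cluster at that version or newer.
pub fn can_access(flag: HadoopVersion, cluster: HadoopVersion) -> bool {
    cluster >= flag
}
```

For example, can_access applied to the parsed flag "v2_10" and a 3.3 cluster
yields true, while the parsed flag "v3_3" against a 2.10 cluster yields false,
since the newer binding may call functions the older library lacks.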
