You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by David Dali Susanibar Arce <da...@gmail.com> on 2022/05/25 12:02:39 UTC

[DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Hi All,

This email's purpose is a request for comments to migrate Arrow Java to JPMS
Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
JSE 9+ (1).

Current status:

- Arrow Java use JSE1.8 specification

- Arrow Java works with JSE1.8/9/11/17

- This is possible because Java offers “legacy mode”

Proposal:

Migrate to JPMS Java Platform Module System. This Draft PR
<https://github.com/apache/arrow/pull/13072>(2) contains an initial port of
the modules: Format / Memory Core / Memory Netty / Memory Unsafe / Vector
for evaluation.

Main Reason to migrate:

- JPMS offer Strong encapsulation, Well-defined interfaces
<https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
<https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)

- JPMS offer reliable configuration and security to hide platform internals.

- JPMS offers a partial solution to solve problems about read (80%) /write
(20%) code.

- JPMS offer optimization for readability about read/write ratio (90/10)
thru module-info.java.

- Consistency logs, JPMS implement consistency logs to really use that to
solve the current problem.

- Be able to customize JRE needed with only modules needed (not
java.desktop for example and others) thru JLink.

- Modules have also been implemented by other languages such as Javascript
(ES2015), C++(C++20), Net (Nuget/NetCore)..

- Consider taking a look at this discussion about pros/cons
<https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
(5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.

Effort:

- First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
offer support for both jar components JSE1.8 and JSE9+ included.

- Go bottom up for JPMS.

- Packages need to be unique (i.e. org.apache.arrow.memory /
io.netty.buffer). Review Draft PR with initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies are not able to be used as a module this will be a blocker for
our modules (we could patch that but this is an extra effort).

Killers:

- FIXME! I need your support to identify killer reasons to be able to push
this implementation.

Please let us know if Arrow Java to JPMS Java Platform Module System is
needed and should be implemented.

Please use this file for any comments
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5):
https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

-- 
David

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by David Li <li...@apache.org>.
That seems fine. Is there somewhere more canonical we should be putting the libraries? For instance I poked netty-tcnative and it puts the libraries under META-INF/native.

On Wed, Dec 13, 2023, at 17:52, James Duong wrote:
> Another issue popped up while doing modules with native libraries.
>
> We currently put native libraries in the arch folder in the root of 
> each built JAR. JPMS is interpreting this as a package name (eg x86_64).
>
> This is breaking when a user tries to use multiple modules with native 
> libs (or in the case of dataset, have a module with a native lib depend 
> on another module with a native lib). JPMS sees the arch package in two 
> modules and treats this as a package collision.
>
> I’m thinking we prefix the arch directory with another directory 
> indicating which module the binary came from (eg arrow-c-data/x86_64). 
> This will probably require code changes, CMake changes, and test 
> changes.
>
>
> From: Jean-Baptiste Onofré <jb...@nanthrax.net>
> Date: Wednesday, December 13, 2023 at 11:45 AM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Agree with David on that.
>
> Regards
> JB
>
> On Wed, Dec 13, 2023 at 8:17 PM David Li <li...@apache.org> wrote:
>>
>> I think we will just have to live with requiring different flags for different modules. If we can detect the lack of the flag at runtime somehow and present an appropriate error message (the way we do for the memory module) that would be best.
>>
>> On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
>> > For the work on modularization, I’ve merged the flight-grpc and
>> > flight-core modules. I haven’t moved any other packages.
>> >
>> > I also encountered this problem with flight-core. Flight-core derives
>> > from io.netty.buffer (specifically CompositeByteBuf), but we cannot use
>> > Netty buffer with JPMS without requiring users to monkey-patch it with
>> > patch-module. I have flight-core using Netty buffer on the classpath,
>> > but deriving from CompositeByteBuf requires adding
>> > --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command
>> > line since Netty’s being treated as part of the unnamed module in this
>> > context.
>> >
>> > This isn’t terribly onerous from the user’s perspective (especially
>> > compared to monkey-patching), but we start getting into the business of
>> > having different command-line arguments for different modules the user
>> > might be using.
>> >
>> > We could tell users to always add
>> > –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t
>> > using Flight, they’ll get a warning about an unrecognized module. Any
>> > thoughts about the best way to present these command-line changes?
>> >
>> >
>> > From: Jean-Baptiste Onofré <jb...@nanthrax.net>
>> > Date: Wednesday, December 13, 2023 at 6:58 AM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > Hi,
>> >
>> > We have a "split packages" between flight-core and flight-grpc. I
>> > don't think anyone is using only flight-core to create a new
>> > "transport".
>> > I think it's acceptable to combine flight-core and flight-grpc, and
>> > maybe reshape a bit (imho some classes from flight-grpc should be
>> > actually in flight-core).
>> > Users can always reshade (even if it's painful).
>> >
>> > Regards
>> > JB
>> >
>> > On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
>> >>
>> >> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
>> >>
>> >> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
>> >> > An update on this work.
>> >> >
>> >> > I’ve been able to update the following to JPMS modules now:
>> >> >
>> >> >   *   arrow-memory-core
>> >> >   *   arrow-memory-unsafe
>> >> >   *   arrow-format
>> >> >   *   arrow-vector
>> >> >   *   arrow-compression
>> >> >   *   arrow-tools
>> >> >   *   arrow-avro
>> >> >   *   arrow-jdbc
>> >> >   *   arrow-algorithm
>> >> >
>> >> > I don’t think the following should be modularized (but would appreciate
>> >> > feedback as well):
>> >> >
>> >> >   *   arrow-performance (test-only module)
>> >> >   *   flight-integration-tests (test-only module)
>> >> >   *   flight-jdbc-core
>> >> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
>> >> > want to modularize their JDBC driver:
>> >> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
>> >> > the ServiceLoader changes in JPMS as well).
>> >> >
>> >> > I’m starting the flight-related modules now. The first issue I’ve
>> >> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
>> >> > combine flight-core and flight-grpc because of this (currently we only
>> >> > have Flight using grpc in Java).
>> >> >
>> >> > After Flight I’d take a look at the modules that have native code:
>> >> >
>> >> >   *   arrow-c-data
>> >> >   *   arrow-orc
>> >> >   *   arrow-dataset
>> >> >   *   arrow-gandiva
>> >> >
>> >> >
>> >> > From: James Duong <Ja...@improving.com.INVALID>
>> >> > Date: Tuesday, December 5, 2023 at 1:39 PM
>> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> > Module System
>> >> > I expect that we’d still have to change CLI arguments in JDK17.
>> >> >
>> >> > The need for changing the CLI arguments is due to memory-core now being
>> >> > a named module while requiring access to Unsafe and using reflection
>> >> > for accessing internals of java.nio.Buffer.
>> >> >
>> >> > We’re using JDK8 code for doing both currently. It might be worth
>> >> > looking into if there’s a JDK9+ way of doing this without needing
>> >> > reflection and compiling memory-core as a multi-release JAR.
>> >> >
>> >> > I don’t expect more CLI argument changes unless we use reflection on
>> >> > NIO classes in other modules.
>> >> >
>> >> >
>> >> > From: Adam Lippai <ad...@rigo.sk>
>> >> > Date: Tuesday, December 5, 2023 at 9:28 AM
>> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> > Module System
>> >> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
>> >> > will be
>> >> > released in a few months (June?).
>> >> >
>> >> > Best regards,
>> >> > Adam Lippai
>> >> >
>> >> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
>> >> >
>> >> >> Thanks James for delving into this mess.
>> >> >>
>> >> >> It looks like this change is unavoidable if we want to modularize? I think
>> >> >> this is OK. Will the CLI argument change as we continue modularizing, or is
>> >> >> this the only change that will be needed?
>> >> >>
>> >> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I did some work to separate the below PR into smaller PRs.
>> >> >> >
>> >> >> >
>> >> >> >   *   Updating the versions of dependencies and maven plugins is done
>> >> >> > and merged into master.
>> >> >> >   *   I separated out the work modularizing arrow-vector,
>> >> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
>> >> >> >
>> >> >> > Modularizing arrow-memory-core requires a smaller change to user
>> >> >> > command-line arguments. Instead of:
>> >> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
>> >> >> >
>> >> >> > The user needs to add:
>> >> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
>> >> >> >
>> >> >> > I initially tried to modularize arrow-vector separately from
>> >> >> > arrow-memory-core but found that any meaningful operation in
>> >> >> > arrow-vector would trigger an illegal access in memory-core if it
>> >> >> > wasn’t modularized.
>> >> >> >
>> >> >> > I was able to run the tests for arrow-compression and arrow-tools
>> >> >> > successfully after modularizing memory-core, memory-unsafe-, and
>> >> >> > arrow-vector. Note that I had more success by making memory-core and
>> >> >> > memory-unsafe automatic modules.
>> >> >> >
>> >> >> > I think we should make a decision here on if we want to bite the bullet
>> >> >> > and introduce a breaking user-facing change around command-line
>> >> >> > options. The other option is to wait for JDK 21 to modularize. That’s
>> >> >> > farther down the line and requires refactoring much of the memory
>> >> >> > module code and implementing a module using the foreign memory
>> >> >> > interface.
>> >> >> >
>> >> >> > From: James Duong <Ja...@improving.com.INVALID>
>> >> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
>> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> >> > Module System
>> >> >> > Hi,
>> >> >> >
>> >> >> > I’ve made some major progress on this work in this PR:
>> >> >> > https://github.com/apache/arrow/pull/38876
>> >> >> >
>> >> >> >
>> >> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
>> >> >> > is working correctly.
>> >> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
>> >> >> > arrow-memory-unsafe, and arrow-vector have been modularized
>> >> >> > successfully.
>> >> >> >      *   Tests pass locally for all of these modules.
>> >> >> >      *   They fail in CI. This is likely from me not updating a profile
>> >> >> > somewhere.
>> >> >> >
>> >> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
>> >> >> > refactored fairly significantly and split into two modules: a
>> >> >> > public-facing JPMS module and a separate module which adds to Netty’s
>> >> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
>> >> >> > because we are using named modules now, users need to add more
>> >> >> > arguments to their Java command line to use arrow. If one were to use
>> >> >> > arrow-memory-netty they would need to add the following:
>> >> >> >
>> >> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
>> >> >> >
>> >> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>> >> >>
>> >> >> >
>> >> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
>> >> >> >
>> >> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
>> >> >> > what version, the command the user needs to supply changes, so this
>> >> >> > seems like it’d be really inconvenient.
>> >> >> >
>> >> >> > Do we want to proceed with modularizing existing memory modules? Both
>> >> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
>> >> >> > available?
>> >> >> >
>> >> >> > The module-info.java files are written fairly naively. I haven’t
>> >> >> > inspected thoroughly to determine what packages users will need.
>> >> >> >
>> >> >> > We can continue modularizing more components in a separate PR. Ideally
>> >> >> > all the user breakage (class movement, new command-line argument
>> >> >> > requirements) happens within one major Arrow version.
>> >> >> >
>> >> >> > From: James Duong <Ja...@improving.com.INVALID>
>> >> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
>> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> >> > Module System
>> >> >> > I’m following up on this topic.
>> >> >> >
>> >> >> > David has a PR from last year that’s done much of the heavy lifting for
>> >> >> > refactoring the codebase to be package-friendly.
>> >> >> > https://github.com/apache/arrow/pull/13072
>> >> >> >
>> >> >> > What’s changed since and what’s left:
>> >> >> >
>> >> >> >   *   New components have been added (Flight SQL for example) that will
>> >> >> > need to be updated for modules.
>> >> >> >   *   There wasn’t a clear solution on how to do this without breaking
>> >> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
>> >> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
>> >> >> >      *   There is a Gradle plugin that can compile module-info.java
>> >> >> > files purely syntactically that we can adapt to maven. It has
>> >> >> > limitations (the one I see is that it can’t iterate through
>> >> >> > classloaders to handle annotations), but using this might be a good
>> >> >> > stopgap until we JDK 8 support is deprecated.
>> >> >> >   *   Some plugins need to be updated:
>> >> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
>> >> >> > files.
>> >> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
>> >> >> > existing checkstyle rules file can’t be loaded with newer versions. We
>> >> >> > can exclude module-info.java for now and have a separate Issue for
>> >> >> > updating checkstyle itself and the rules file.
>> >> >> >   *   grpc-java could not be modularized when the PR above was written.
>> >> >> >      *   Grpc 1.57 now can be modularized
>> >> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
>> >> >> >
>> >> >> > From: David Dali Susanibar Arce <da...@gmail.com>
>> >> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
>> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
>> >> >> System
>> >> >> > Hi All,
>> >> >> >
>> >> >> > This email's purpose is a request for comments to migrate Arrow Java to
>> >> >> JPMS
>> >> >> > Java Platform Module System <
>> >> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e%3e>
>> >> >> > JSE 9+ (1).
>> >> >> >
>> >> >> > Current status:
>> >> >> >
>> >> >> > - Arrow Java use JSE1.8 specification
>> >> >> >
>> >> >> > - Arrow Java works with JSE1.8/9/11/17
>> >> >> >
>> >> >> > - This is possible because Java offers “legacy mode”
>> >> >> >
>> >> >> > Proposal:
>> >> >> >
>> >> >> > Migrate to JPMS Java Platform Module System. This Draft PR
>> >> >> > <https://github.com/apache/arrow/pull/13072>(2<
>> >> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e%3e>
>> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
>> >> >> <
>> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e%3e>
>> >> >>
>> >> >> > contains an initial port of
>> >> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
>> >> >> > Vector
>> >> >> > for evaluation.
>> >> >> >
>> >> >> > Main Reason to migrate:
>> >> >> >
>> >> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
>> >> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
>> >> >> dependencies.
>> >> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
>> >> >> >
>> >> >> > - JPMS offer reliable configuration and security to hide platform
>> >> >> internals.
>> >> >> >
>> >> >> > - JPMS offers a partial solution to solve problems about read (80%)
>> >> >> /write
>> >> >> > (20%) code.
>> >> >> >
>> >> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
>> >> >> > thru module-info.java.
>> >> >> >
>> >> >> > - Consistency logs, JPMS implement consistency logs to really use that to
>> >> >> > solve the current problem.
>> >> >> >
>> >> >> > - Be able to customize JRE needed with only modules needed (not
>> >> >> > java.desktop for example and others) thru JLink.
>> >> >> >
>> >> >> > - Modules have also been implemented by other languages such as
>> >> >> Javascript
>> >> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
>> >> >> >
>> >> >> > - Consider taking a look at this discussion about pros/cons
>> >> >> > <
>> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >> >> >
>> >> >> > (5).
>> >> >> >
>> >> >> > - Eventual migration to JPMS is a practical necessity as more projects
>> >> >> > migrate.
>> >> >> >
>> >> >> > Effort:
>> >> >> >
>> >> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
>> >> >> to
>> >> >> > offer support for both jar components JSE1.8 and JSE9+ included.
>> >> >> >
>> >> >> > - Go bottom up for JPMS.
>> >> >> >
>> >> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
>> >> >> > io.netty.buffer). Review Draft PR with initial proposal.
>> >> >> >
>> >> >> > - Dependencies also need to be modularized. If some of our current
>> >> >> > dependencies are not able to be used as a module this will be a blocker
>> >> >> for
>> >> >> > our modules (we could patch that but this is an extra effort).
>> >> >> >
>> >> >> > Killers:
>> >> >> >
>> >> >> > - FIXME! I need your support to identify killer reasons to be able to
>> >> >> push
>> >> >> > this implementation.
>> >> >> >
>> >> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
>> >> >> > needed and should be implemented.
>> >> >> >
>> >> >> > Please use this file for any comments
>> >> >> >
>> >> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
>> >> >> >
>> >> >> > Resources used:
>> >> >> >
>> >> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
>> >> >> >
>> >> >> > (2): https://github.com/apache/arrow/pull/13072
>> >> >> >
>> >> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
>> >> >> >
>> >> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
>> >> >> >
>> >> >> > (5):
>> >> >> >
>> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >> >> >
>> >> >> > Best regards,
>> >> >> >
>> >> >> > --
>> >> >> > David
>> >> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
Another issue popped up while doing modules with native libraries.

We currently put native libraries in the arch folder in the root of each built JAR. JPMS is interpreting this as a package name (eg x86_64).

This is breaking when a user tries to use multiple modules with native libs (or in the case of dataset, have a module with a native lib depend on another module with a native lib). JPMS sees the arch package in two modules and treats this as a package collision.

I’m thinking we prefix the arch directory with another directory indicating which module the binary came from (eg arrow-c-data/x86_64). This will probably require code changes, CMake changes, and test changes.


From: Jean-Baptiste Onofré <jb...@nanthrax.net>
Date: Wednesday, December 13, 2023 at 11:45 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Agree with David on that.

Regards
JB

On Wed, Dec 13, 2023 at 8:17 PM David Li <li...@apache.org> wrote:
>
> I think we will just have to live with requiring different flags for different modules. If we can detect the lack of the flag at runtime somehow and present an appropriate error message (the way we do for the memory module) that would be best.
>
> On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
> > For the work on modularization, I’ve merged the flight-grpc and
> > flight-core modules. I haven’t moved any other packages.
> >
> > I also encountered this problem with flight-core. Flight-core derives
> > from io.netty.buffer (specifically CompositeByteBuf), but we cannot use
> > Netty buffer with JPMS without requiring users to monkey-patch it with
> > patch-module. I have flight-core using Netty buffer on the classpath,
> > but deriving from CompositeByteBuf requires adding
> > --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command
> > line since Netty’s being treated as part of the unnamed module in this
> > context.
> >
> > This isn’t terribly onerous from the user’s perspective (especially
> > compared to monkey-patching), but we start getting into the business of
> > having different command-line arguments for different modules the user
> > might be using.
> >
> > We could tell users to always add
> > –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t
> > using Flight, they’ll get a warning about an unrecognized module. Any
> > thoughts about the best way to present these command-line changes?
> >
> >
> > From: Jean-Baptiste Onofré <jb...@nanthrax.net>
> > Date: Wednesday, December 13, 2023 at 6:58 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > We have a "split packages" between flight-core and flight-grpc. I
> > don't think anyone is using only flight-core to create a new
> > "transport".
> > I think it's acceptable to combine flight-core and flight-grpc, and
> > maybe reshape a bit (imho some classes from flight-grpc should be
> > actually in flight-core).
> > Users can always reshade (even if it's painful).
> >
> > Regards
> > JB
> >
> > On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
> >>
> >> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
> >>
> >> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> >> > An update on this work.
> >> >
> >> > I’ve been able to update the following to JPMS modules now:
> >> >
> >> >   *   arrow-memory-core
> >> >   *   arrow-memory-unsafe
> >> >   *   arrow-format
> >> >   *   arrow-vector
> >> >   *   arrow-compression
> >> >   *   arrow-tools
> >> >   *   arrow-avro
> >> >   *   arrow-jdbc
> >> >   *   arrow-algorithm
> >> >
> >> > I don’t think the following should be modularized (but would appreciate
> >> > feedback as well):
> >> >
> >> >   *   arrow-performance (test-only module)
> >> >   *   flight-integration-tests (test-only module)
> >> >   *   flight-jdbc-core
> >> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
> >> > want to modularize their JDBC driver:
> >> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
> >> > the ServiceLoader changes in JPMS as well).
> >> >
> >> > I’m starting the flight-related modules now. The first issue I’ve
> >> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
> >> > combine flight-core and flight-grpc because of this (currently we only
> >> > have Flight using grpc in Java).
> >> >
> >> > After Flight I’d take a look at the modules that have native code:
> >> >
> >> >   *   arrow-c-data
> >> >   *   arrow-orc
> >> >   *   arrow-dataset
> >> >   *   arrow-gandiva
> >> >
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, December 5, 2023 at 1:39 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I expect that we’d still have to change CLI arguments in JDK17.
> >> >
> >> > The need for changing the CLI arguments is due to memory-core now being
> >> > a named module while requiring access to Unsafe and using reflection
> >> > for accessing internals of java.nio.Buffer.
> >> >
> >> > We’re using JDK8 code for doing both currently. It might be worth
> >> > looking into if there’s a JDK9+ way of doing this without needing
> >> > reflection and compiling memory-core as a multi-release JAR.
> >> >
> >> > I don’t expect more CLI argument changes unless we use reflection on
> >> > NIO classes in other modules.
> >> >
> >> >
> >> > From: Adam Lippai <ad...@rigo.sk>
> >> > Date: Tuesday, December 5, 2023 at 9:28 AM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
> >> > will be
> >> > released in a few months (June?).
> >> >
> >> > Best regards,
> >> > Adam Lippai
> >> >
> >> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
> >> >
> >> >> Thanks James for delving into this mess.
> >> >>
> >> >> It looks like this change is unavoidable if we want to modularize? I think
> >> >> this is OK. Will the CLI argument change as we continue modularizing, or is
> >> >> this the only change that will be needed?
> >> >>
> >> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I did some work to separate the below PR into smaller PRs.
> >> >> >
> >> >> >
> >> >> >   *   Updating the versions of dependencies and maven plugins is done
> >> >> > and merged into master.
> >> >> >   *   I separated out the work modularizing arrow-vector,
> >> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >> >> >
> >> >> > Modularizing arrow-memory-core requires a smaller change to user
> >> >> > command-line arguments. Instead of:
> >> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >> >> >
> >> >> > The user needs to add:
> >> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >> >> >
> >> >> > I initially tried to modularize arrow-vector separately from
> >> >> > arrow-memory-core but found that any meaningful operation in
> >> >> > arrow-vector would trigger an illegal access in memory-core if it
> >> >> > wasn’t modularized.
> >> >> >
> >> >> > I was able to run the tests for arrow-compression and arrow-tools
> >> >> > successfully after modularizing memory-core, memory-unsafe-, and
> >> >> > arrow-vector. Note that I had more success by making memory-core and
> >> >> > memory-unsafe automatic modules.
> >> >> >
> >> >> > I think we should make a decision here on if we want to bite the bullet
> >> >> > and introduce a breaking user-facing change around command-line
> >> >> > options. The other option is to wait for JDK 21 to modularize. That’s
> >> >> > farther down the line and requires refactoring much of the memory
> >> >> > module code and implementing a module using the foreign memory
> >> >> > interface.
> >> >> >
> >> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> >> > Module System
> >> >> > Hi,
> >> >> >
> >> >> > I’ve made some major progress on this work in this PR:
> >> >> > https://github.com/apache/arrow/pull/38876
> >> >> >
> >> >> >
> >> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
> >> >> > is working correctly.
> >> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> >> >> > arrow-memory-unsafe, and arrow-vector have been modularized
> >> >> > successfully.
> >> >> >      *   Tests pass locally for all of these modules.
> >> >> >      *   They fail in CI. This is likely from me not updating a profile
> >> >> > somewhere.
> >> >> >
> >> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
> >> >> > refactored fairly significantly and split into two modules: a
> >> >> > public-facing JPMS module and a separate module which adds to Netty’s
> >> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
> >> >> > because we are using named modules now, users need to add more
> >> >> > arguments to their Java command line to use arrow. If one were to use
> >> >> > arrow-memory-netty they would need to add the following:
> >> >> >
> >> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >> >> >
> >> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
> >> >>
> >> >> >
> >> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >> >> >
> >> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
> >> >> > what version, the command the user needs to supply changes, so this
> >> >> > seems like it’d be really inconvenient.
> >> >> >
> >> >> > Do we want to proceed with modularizing existing memory modules? Both
> >> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
> >> >> > available?
> >> >> >
> >> >> > The module-info.java files are written fairly naively. I haven’t
> >> >> > inspected thoroughly to determine what packages users will need.
> >> >> >
> >> >> > We can continue modularizing more components in a separate PR. Ideally
> >> >> > all the user breakage (class movement, new command-line argument
> >> >> > requirements) happens within one major Arrow version.
> >> >> >
> >> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> >> > Module System
> >> >> > I’m following up on this topic.
> >> >> >
> >> >> > David has a PR from last year that’s done much of the heavy lifting for
> >> >> > refactoring the codebase to be package-friendly.
> >> >> > https://github.com/apache/arrow/pull/13072
> >> >> >
> >> >> > What’s changed since and what’s left:
> >> >> >
> >> >> >   *   New components have been added (Flight SQL for example) that will
> >> >> > need to be updated for modules.
> >> >> >   *   There wasn’t a clear solution on how to do this without breaking
> >> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> >> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >> >> >      *   There is a Gradle plugin that can compile module-info.java
> >> >> > files purely syntactically that we can adapt to maven. It has
> >> >> > limitations (the one I see is that it can’t iterate through
> >> >> > classloaders to handle annotations), but using this might be a good
> >> >> > stopgap until we JDK 8 support is deprecated.
> >> >> >   *   Some plugins need to be updated:
> >> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> >> >> > files.
> >> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> >> >> > existing checkstyle rules file can’t be loaded with newer versions. We
> >> >> > can exclude module-info.java for now and have a separate Issue for
> >> >> > updating checkstyle itself and the rules file.
> >> >> >   *   grpc-java could not be modularized when the PR above was written.
> >> >> >      *   Grpc 1.57 now can be modularized
> >> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >> >> >
> >> >> > From: David Dali Susanibar Arce <da...@gmail.com>
> >> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> >> >> System
> >> >> > Hi All,
> >> >> >
> >> >> > This email's purpose is a request for comments to migrate Arrow Java to
> >> >> JPMS
> >> >> > Java Platform Module System <
> >> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e%3e>
> >> >> > JSE 9+ (1).
> >> >> >
> >> >> > Current status:
> >> >> >
> >> >> > - Arrow Java use JSE1.8 specification
> >> >> >
> >> >> > - Arrow Java works with JSE1.8/9/11/17
> >> >> >
> >> >> > - This is possible because Java offers “legacy mode”
> >> >> >
> >> >> > Proposal:
> >> >> >
> >> >> > Migrate to JPMS Java Platform Module System. This Draft PR
> >> >> > <https://github.com/apache/arrow/pull/13072>(2<
> >> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e%3e>
> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> >> >> <
> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e%3e>
> >> >>
> >> >> > contains an initial port of
> >> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> >> >> > Vector
> >> >> > for evaluation.
> >> >> >
> >> >> > Main Reason to migrate:
> >> >> >
> >> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
> >> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> >> >> dependencies.
> >> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >> >> >
> >> >> > - JPMS offer reliable configuration and security to hide platform
> >> >> internals.
> >> >> >
> >> >> > - JPMS offers a partial solution to solve problems about read (80%)
> >> >> /write
> >> >> > (20%) code.
> >> >> >
> >> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
> >> >> > thru module-info.java.
> >> >> >
> >> >> > - Consistency logs, JPMS implement consistency logs to really use that to
> >> >> > solve the current problem.
> >> >> >
> >> >> > - Be able to customize JRE needed with only modules needed (not
> >> >> > java.desktop for example and others) thru JLink.
> >> >> >
> >> >> > - Modules have also been implemented by other languages such as
> >> >> Javascript
> >> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >> >> >
> >> >> > - Consider taking a look at this discussion about pros/cons
> >> >> > <
> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >> >
> >> >> > (5).
> >> >> >
> >> >> > - Eventual migration to JPMS is a practical necessity as more projects
> >> >> > migrate.
> >> >> >
> >> >> > Effort:
> >> >> >
> >> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> >> >> to
> >> >> > offer support for both jar components JSE1.8 and JSE9+ included.
> >> >> >
> >> >> > - Go bottom up for JPMS.
> >> >> >
> >> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> >> >> > io.netty.buffer). Review Draft PR with initial proposal.
> >> >> >
> >> >> > - Dependencies also need to be modularized. If some of our current
> >> >> > dependencies are not able to be used as a module this will be a blocker
> >> >> for
> >> >> > our modules (we could patch that but this is an extra effort).
> >> >> >
> >> >> > Killers:
> >> >> >
> >> >> > - FIXME! I need your support to identify killer reasons to be able to
> >> >> push
> >> >> > this implementation.
> >> >> >
> >> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> >> >> > needed and should be implemented.
> >> >> >
> >> >> > Please use this file for any comments
> >> >> >
> >> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >> >> >
> >> >> > Resources used:
> >> >> >
> >> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >> >> >
> >> >> > (2): https://github.com/apache/arrow/pull/13072
> >> >> >
> >> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >> >> >
> >> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >> >> >
> >> >> > (5):
> >> >> >
> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >> >
> >> >> > Best regards,
> >> >> >
> >> >> > --
> >> >> > David
> >> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Agree with David on that.

Regards
JB

On Wed, Dec 13, 2023 at 8:17 PM David Li <li...@apache.org> wrote:
>
> I think we will just have to live with requiring different flags for different modules. If we can detect the lack of the flag at runtime somehow and present an appropriate error message (the way we do for the memory module) that would be best.
>
> On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
> > For the work on modularization, I’ve merged the flight-grpc and
> > flight-core modules. I haven’t moved any other packages.
> >
> > I also encountered this problem with flight-core. Flight-core derives
> > from io.netty.buffer (specifically CompositeByteBuf), but we cannot use
> > Netty buffer with JPMS without requiring users to monkey-patch it with
> > patch-module. I have flight-core using Netty buffer on the classpath,
> > but deriving from CompositeByteBuf requires adding
> > --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command
> > line since Netty’s being treated as part of the unnamed module in this
> > context.
> >
> > This isn’t terribly onerous from the user’s perspective (especially
> > compared to monkey-patching), but we start getting into the business of
> > having different command-line arguments for different modules the user
> > might be using.
> >
> > We could tell users to always add
> > –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t
> > using Flight, they’ll get a warning about an unrecognized module. Any
> > thoughts about the best way to present these command-line changes?
> >
> >
> > From: Jean-Baptiste Onofré <jb...@nanthrax.net>
> > Date: Wednesday, December 13, 2023 at 6:58 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > We have a "split packages" between flight-core and flight-grpc. I
> > don't think anyone is using only flight-core to create a new
> > "transport".
> > I think it's acceptable to combine flight-core and flight-grpc, and
> > maybe reshape a bit (imho some classes from flight-grpc should be
> > actually in flight-core).
> > Users can always reshade (even if it's painful).
> >
> > Regards
> > JB
> >
> > On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
> >>
> >> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
> >>
> >> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> >> > An update on this work.
> >> >
> >> > I’ve been able to update the following to JPMS modules now:
> >> >
> >> >   *   arrow-memory-core
> >> >   *   arrow-memory-unsafe
> >> >   *   arrow-format
> >> >   *   arrow-vector
> >> >   *   arrow-compression
> >> >   *   arrow-tools
> >> >   *   arrow-avro
> >> >   *   arrow-jdbc
> >> >   *   arrow-algorithm
> >> >
> >> > I don’t think the following should be modularized (but would appreciate
> >> > feedback as well):
> >> >
> >> >   *   arrow-performance (test-only module)
> >> >   *   flight-integration-tests (test-only module)
> >> >   *   flight-jdbc-core
> >> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
> >> > want to modularize their JDBC driver:
> >> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
> >> > the ServiceLoader changes in JPMS as well).
> >> >
> >> > I’m starting the flight-related modules now. The first issue I’ve
> >> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
> >> > combine flight-core and flight-grpc because of this (currently we only
> >> > have Flight using grpc in Java).
> >> >
> >> > After Flight I’d take a look at the modules that have native code:
> >> >
> >> >   *   arrow-c-data
> >> >   *   arrow-orc
> >> >   *   arrow-dataset
> >> >   *   arrow-gandiva
> >> >
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, December 5, 2023 at 1:39 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I expect that we’d still have to change CLI arguments in JDK17.
> >> >
> >> > The need for changing the CLI arguments is due to memory-core now being
> >> > a named module while requiring access to Unsafe and using reflection
> >> > for accessing internals of java.nio.Buffer.
> >> >
> >> > We’re using JDK8 code for doing both currently. It might be worth
> >> > looking into if there’s a JDK9+ way of doing this without needing
> >> > reflection and compiling memory-core as a multi-release JAR.
> >> >
> >> > I don’t expect more CLI argument changes unless we use reflection on
> >> > NIO classes in other modules.
> >> >
> >> >
> >> > From: Adam Lippai <ad...@rigo.sk>
> >> > Date: Tuesday, December 5, 2023 at 9:28 AM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
> >> > will be
> >> > released in a few months (June?).
> >> >
> >> > Best regards,
> >> > Adam Lippai
> >> >
> >> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
> >> >
> >> >> Thanks James for delving into this mess.
> >> >>
> >> >> It looks like this change is unavoidable if we want to modularize? I think
> >> >> this is OK. Will the CLI argument change as we continue modularizing, or is
> >> >> this the only change that will be needed?
> >> >>
> >> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I did some work to separate the below PR into smaller PRs.
> >> >> >
> >> >> >
> >> >> >   *   Updating the versions of dependencies and maven plugins is done
> >> >> > and merged into master.
> >> >> >   *   I separated out the work modularizing arrow-vector,
> >> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >> >> >
> >> >> > Modularizing arrow-memory-core requires a smaller change to user
> >> >> > command-line arguments. Instead of:
> >> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >> >> >
> >> >> > The user needs to add:
> >> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >> >> >
> >> >> > I initially tried to modularize arrow-vector separately from
> >> >> > arrow-memory-core but found that any meaningful operation in
> >> >> > arrow-vector would trigger an illegal access in memory-core if it
> >> >> > wasn’t modularized.
> >> >> >
> >> >> > I was able to run the tests for arrow-compression and arrow-tools
> >> >> > successfully after modularizing memory-core, memory-unsafe-, and
> >> >> > arrow-vector. Note that I had more success by making memory-core and
> >> >> > memory-unsafe automatic modules.
> >> >> >
> >> >> > I think we should make a decision here on if we want to bite the bullet
> >> >> > and introduce a breaking user-facing change around command-line
> >> >> > options. The other option is to wait for JDK 21 to modularize. That’s
> >> >> > farther down the line and requires refactoring much of the memory
> >> >> > module code and implementing a module using the foreign memory
> >> >> > interface.
> >> >> >
> >> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> >> > Module System
> >> >> > Hi,
> >> >> >
> >> >> > I’ve made some major progress on this work in this PR:
> >> >> > https://github.com/apache/arrow/pull/38876
> >> >> >
> >> >> >
> >> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
> >> >> > is working correctly.
> >> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> >> >> > arrow-memory-unsafe, and arrow-vector have been modularized
> >> >> > successfully.
> >> >> >      *   Tests pass locally for all of these modules.
> >> >> >      *   They fail in CI. This is likely from me not updating a profile
> >> >> > somewhere.
> >> >> >
> >> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
> >> >> > refactored fairly significantly and split into two modules: a
> >> >> > public-facing JPMS module and a separate module which adds to Netty’s
> >> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
> >> >> > because we are using named modules now, users need to add more
> >> >> > arguments to their Java command line to use arrow. If one were to use
> >> >> > arrow-memory-netty they would need to add the following:
> >> >> >
> >> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >> >> >
> >> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
> >> >>
> >> >> >
> >> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >> >> >
> >> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
> >> >> > what version, the command the user needs to supply changes, so this
> >> >> > seems like it’d be really inconvenient.
> >> >> >
> >> >> > Do we want to proceed with modularizing existing memory modules? Both
> >> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
> >> >> > available?
> >> >> >
> >> >> > The module-info.java files are written fairly naively. I haven’t
> >> >> > inspected thoroughly to determine what packages users will need.
> >> >> >
> >> >> > We can continue modularizing more components in a separate PR. Ideally
> >> >> > all the user breakage (class movement, new command-line argument
> >> >> > requirements) happens within one major Arrow version.
> >> >> >
> >> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> >> > Module System
> >> >> > I’m following up on this topic.
> >> >> >
> >> >> > David has a PR from last year that’s done much of the heavy lifting for
> >> >> > refactoring the codebase to be package-friendly.
> >> >> > https://github.com/apache/arrow/pull/13072
> >> >> >
> >> >> > What’s changed since and what’s left:
> >> >> >
> >> >> >   *   New components have been added (Flight SQL for example) that will
> >> >> > need to be updated for modules.
> >> >> >   *   There wasn’t a clear solution on how to do this without breaking
> >> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> >> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >> >> >      *   There is a Gradle plugin that can compile module-info.java
> >> >> > files purely syntactically that we can adapt to maven. It has
> >> >> > limitations (the one I see is that it can’t iterate through
> >> >> > classloaders to handle annotations), but using this might be a good
> >> >> > stopgap until we JDK 8 support is deprecated.
> >> >> >   *   Some plugins need to be updated:
> >> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> >> >> > files.
> >> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> >> >> > existing checkstyle rules file can’t be loaded with newer versions. We
> >> >> > can exclude module-info.java for now and have a separate Issue for
> >> >> > updating checkstyle itself and the rules file.
> >> >> >   *   grpc-java could not be modularized when the PR above was written.
> >> >> >      *   Grpc 1.57 now can be modularized
> >> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >> >> >
> >> >> > From: David Dali Susanibar Arce <da...@gmail.com>
> >> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
> >> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> >> >> System
> >> >> > Hi All,
> >> >> >
> >> >> > This email's purpose is a request for comments to migrate Arrow Java to
> >> >> JPMS
> >> >> > Java Platform Module System <
> >> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e>
> >> >> > JSE 9+ (1).
> >> >> >
> >> >> > Current status:
> >> >> >
> >> >> > - Arrow Java use JSE1.8 specification
> >> >> >
> >> >> > - Arrow Java works with JSE1.8/9/11/17
> >> >> >
> >> >> > - This is possible because Java offers “legacy mode”
> >> >> >
> >> >> > Proposal:
> >> >> >
> >> >> > Migrate to JPMS Java Platform Module System. This Draft PR
> >> >> > <https://github.com/apache/arrow/pull/13072>(2<
> >> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e>
> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> >> >> <
> >> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e>
> >> >>
> >> >> > contains an initial port of
> >> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> >> >> > Vector
> >> >> > for evaluation.
> >> >> >
> >> >> > Main Reason to migrate:
> >> >> >
> >> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
> >> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> >> >> dependencies.
> >> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >> >> >
> >> >> > - JPMS offer reliable configuration and security to hide platform
> >> >> internals.
> >> >> >
> >> >> > - JPMS offers a partial solution to solve problems about read (80%)
> >> >> /write
> >> >> > (20%) code.
> >> >> >
> >> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
> >> >> > thru module-info.java.
> >> >> >
> >> >> > - Consistency logs, JPMS implement consistency logs to really use that to
> >> >> > solve the current problem.
> >> >> >
> >> >> > - Be able to customize JRE needed with only modules needed (not
> >> >> > java.desktop for example and others) thru JLink.
> >> >> >
> >> >> > - Modules have also been implemented by other languages such as
> >> >> Javascript
> >> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >> >> >
> >> >> > - Consider taking a look at this discussion about pros/cons
> >> >> > <
> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >> >
> >> >> > (5).
> >> >> >
> >> >> > - Eventual migration to JPMS is a practical necessity as more projects
> >> >> > migrate.
> >> >> >
> >> >> > Effort:
> >> >> >
> >> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> >> >> to
> >> >> > offer support for both jar components JSE1.8 and JSE9+ included.
> >> >> >
> >> >> > - Go bottom up for JPMS.
> >> >> >
> >> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> >> >> > io.netty.buffer). Review Draft PR with initial proposal.
> >> >> >
> >> >> > - Dependencies also need to be modularized. If some of our current
> >> >> > dependencies are not able to be used as a module this will be a blocker
> >> >> for
> >> >> > our modules (we could patch that but this is an extra effort).
> >> >> >
> >> >> > Killers:
> >> >> >
> >> >> > - FIXME! I need your support to identify killer reasons to be able to
> >> >> push
> >> >> > this implementation.
> >> >> >
> >> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> >> >> > needed and should be implemented.
> >> >> >
> >> >> > Please use this file for any comments
> >> >> >
> >> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >> >> >
> >> >> > Resources used:
> >> >> >
> >> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >> >> >
> >> >> > (2): https://github.com/apache/arrow/pull/13072
> >> >> >
> >> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >> >> >
> >> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >> >> >
> >> >> > (5):
> >> >> >
> >> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >> >
> >> >> > Best regards,
> >> >> >
> >> >> > --
> >> >> > David
> >> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by David Li <li...@apache.org>.
I think we will just have to live with requiring different flags for different modules. If we can detect the lack of the flag at runtime somehow and present an appropriate error message (the way we do for the memory module) that would be best.

On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
> For the work on modularization, I’ve merged the flight-grpc and 
> flight-core modules. I haven’t moved any other packages.
>
> I also encountered this problem with flight-core. Flight-core derives 
> from io.netty.buffer (specifically CompositeByteBuf), but we cannot use 
> Netty buffer with JPMS without requiring users to monkey-patch it with 
> patch-module. I have flight-core using Netty buffer on the classpath, 
> but deriving from CompositeByteBuf requires adding 
> --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command 
> line since Netty’s being treated as part of the unnamed module in this 
> context.
>
> This isn’t terribly onerous from the user’s perspective (especially 
> compared to monkey-patching), but we start getting into the business of 
> having different command-line arguments for different modules the user 
> might be using.
>
> We could tell users to always add 
> –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t 
> using Flight, they’ll get a warning about an unrecognized module. Any 
> thoughts about the best way to present these command-line changes?
>
>
> From: Jean-Baptiste Onofré <jb...@nanthrax.net>
> Date: Wednesday, December 13, 2023 at 6:58 AM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Hi,
>
> We have a "split packages" between flight-core and flight-grpc. I
> don't think anyone is using only flight-core to create a new
> "transport".
> I think it's acceptable to combine flight-core and flight-grpc, and
> maybe reshape a bit (imho some classes from flight-grpc should be
> actually in flight-core).
> Users can always reshade (even if it's painful).
>
> Regards
> JB
>
> On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
>>
>> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
>>
>> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
>> > An update on this work.
>> >
>> > I’ve been able to update the following to JPMS modules now:
>> >
>> >   *   arrow-memory-core
>> >   *   arrow-memory-unsafe
>> >   *   arrow-format
>> >   *   arrow-vector
>> >   *   arrow-compression
>> >   *   arrow-tools
>> >   *   arrow-avro
>> >   *   arrow-jdbc
>> >   *   arrow-algorithm
>> >
>> > I don’t think the following should be modularized (but would appreciate
>> > feedback as well):
>> >
>> >   *   arrow-performance (test-only module)
>> >   *   flight-integration-tests (test-only module)
>> >   *   flight-jdbc-core
>> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
>> > want to modularize their JDBC driver:
>> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
>> > the ServiceLoader changes in JPMS as well).
>> >
>> > I’m starting the flight-related modules now. The first issue I’ve
>> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
>> > combine flight-core and flight-grpc because of this (currently we only
>> > have Flight using grpc in Java).
>> >
>> > After Flight I’d take a look at the modules that have native code:
>> >
>> >   *   arrow-c-data
>> >   *   arrow-orc
>> >   *   arrow-dataset
>> >   *   arrow-gandiva
>> >
>> >
>> > From: James Duong <Ja...@improving.com.INVALID>
>> > Date: Tuesday, December 5, 2023 at 1:39 PM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > I expect that we’d still have to change CLI arguments in JDK17.
>> >
>> > The need for changing the CLI arguments is due to memory-core now being
>> > a named module while requiring access to Unsafe and using reflection
>> > for accessing internals of java.nio.Buffer.
>> >
>> > We’re using JDK8 code for doing both currently. It might be worth
>> > looking into if there’s a JDK9+ way of doing this without needing
>> > reflection and compiling memory-core as a multi-release JAR.
>> >
>> > I don’t expect more CLI argument changes unless we use reflection on
>> > NIO classes in other modules.
>> >
>> >
>> > From: Adam Lippai <ad...@rigo.sk>
>> > Date: Tuesday, December 5, 2023 at 9:28 AM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
>> > will be
>> > released in a few months (June?).
>> >
>> > Best regards,
>> > Adam Lippai
>> >
>> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
>> >
>> >> Thanks James for delving into this mess.
>> >>
>> >> It looks like this change is unavoidable if we want to modularize? I think
>> >> this is OK. Will the CLI argument change as we continue modularizing, or is
>> >> this the only change that will be needed?
>> >>
>> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
>> >> > Hello,
>> >> >
>> >> > I did some work to separate the below PR into smaller PRs.
>> >> >
>> >> >
>> >> >   *   Updating the versions of dependencies and maven plugins is done
>> >> > and merged into master.
>> >> >   *   I separated out the work modularizing arrow-vector,
>> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
>> >> >
>> >> > Modularizing arrow-memory-core requires a smaller change to user
>> >> > command-line arguments. Instead of:
>> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
>> >> >
>> >> > The user needs to add:
>> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
>> >> >
>> >> > I initially tried to modularize arrow-vector separately from
>> >> > arrow-memory-core but found that any meaningful operation in
>> >> > arrow-vector would trigger an illegal access in memory-core if it
>> >> > wasn’t modularized.
>> >> >
>> >> > I was able to run the tests for arrow-compression and arrow-tools
>> >> > successfully after modularizing memory-core, memory-unsafe-, and
>> >> > arrow-vector. Note that I had more success by making memory-core and
>> >> > memory-unsafe automatic modules.
>> >> >
>> >> > I think we should make a decision here on if we want to bite the bullet
>> >> > and introduce a breaking user-facing change around command-line
>> >> > options. The other option is to wait for JDK 21 to modularize. That’s
>> >> > farther down the line and requires refactoring much of the memory
>> >> > module code and implementing a module using the foreign memory
>> >> > interface.
>> >> >
>> >> > From: James Duong <Ja...@improving.com.INVALID>
>> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
>> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> > Module System
>> >> > Hi,
>> >> >
>> >> > I’ve made some major progress on this work in this PR:
>> >> > https://github.com/apache/arrow/pull/38876
>> >> >
>> >> >
>> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
>> >> > is working correctly.
>> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
>> >> > arrow-memory-unsafe, and arrow-vector have been modularized
>> >> > successfully.
>> >> >      *   Tests pass locally for all of these modules.
>> >> >      *   They fail in CI. This is likely from me not updating a profile
>> >> > somewhere.
>> >> >
>> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
>> >> > refactored fairly significantly and split into two modules: a
>> >> > public-facing JPMS module and a separate module which adds to Netty’s
>> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
>> >> > because we are using named modules now, users need to add more
>> >> > arguments to their Java command line to use arrow. If one were to use
>> >> > arrow-memory-netty they would need to add the following:
>> >> >
>> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
>> >> >
>> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>> >>
>> >> >
>> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
>> >> >
>> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
>> >> > what version, the command the user needs to supply changes, so this
>> >> > seems like it’d be really inconvenient.
>> >> >
>> >> > Do we want to proceed with modularizing existing memory modules? Both
>> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
>> >> > available?
>> >> >
>> >> > The module-info.java files are written fairly naively. I haven’t
>> >> > inspected thoroughly to determine what packages users will need.
>> >> >
>> >> > We can continue modularizing more components in a separate PR. Ideally
>> >> > all the user breakage (class movement, new command-line argument
>> >> > requirements) happens within one major Arrow version.
>> >> >
>> >> > From: James Duong <Ja...@improving.com.INVALID>
>> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
>> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> >> > Module System
>> >> > I’m following up on this topic.
>> >> >
>> >> > David has a PR from last year that’s done much of the heavy lifting for
>> >> > refactoring the codebase to be package-friendly.
>> >> > https://github.com/apache/arrow/pull/13072
>> >> >
>> >> > What’s changed since and what’s left:
>> >> >
>> >> >   *   New components have been added (Flight SQL for example) that will
>> >> > need to be updated for modules.
>> >> >   *   There wasn’t a clear solution on how to do this without breaking
>> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
>> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
>> >> >      *   There is a Gradle plugin that can compile module-info.java
>> >> > files purely syntactically that we can adapt to maven. It has
>> >> > limitations (the one I see is that it can’t iterate through
>> >> > classloaders to handle annotations), but using this might be a good
>> >> > stopgap until we JDK 8 support is deprecated.
>> >> >   *   Some plugins need to be updated:
>> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
>> >> > files.
>> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
>> >> > existing checkstyle rules file can’t be loaded with newer versions. We
>> >> > can exclude module-info.java for now and have a separate Issue for
>> >> > updating checkstyle itself and the rules file.
>> >> >   *   grpc-java could not be modularized when the PR above was written.
>> >> >      *   Grpc 1.57 now can be modularized
>> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
>> >> >
>> >> > From: David Dali Susanibar Arce <da...@gmail.com>
>> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
>> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
>> >> System
>> >> > Hi All,
>> >> >
>> >> > This email's purpose is a request for comments to migrate Arrow Java to
>> >> JPMS
>> >> > Java Platform Module System <
>> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e>
>> >> > JSE 9+ (1).
>> >> >
>> >> > Current status:
>> >> >
>> >> > - Arrow Java use JSE1.8 specification
>> >> >
>> >> > - Arrow Java works with JSE1.8/9/11/17
>> >> >
>> >> > - This is possible because Java offers “legacy mode”
>> >> >
>> >> > Proposal:
>> >> >
>> >> > Migrate to JPMS Java Platform Module System. This Draft PR
>> >> > <https://github.com/apache/arrow/pull/13072>(2<
>> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e>
>> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
>> >> <
>> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e>
>> >>
>> >> > contains an initial port of
>> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
>> >> > Vector
>> >> > for evaluation.
>> >> >
>> >> > Main Reason to migrate:
>> >> >
>> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
>> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
>> >> dependencies.
>> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
>> >> >
>> >> > - JPMS offer reliable configuration and security to hide platform
>> >> internals.
>> >> >
>> >> > - JPMS offers a partial solution to solve problems about read (80%)
>> >> /write
>> >> > (20%) code.
>> >> >
>> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
>> >> > thru module-info.java.
>> >> >
>> >> > - Consistency logs, JPMS implement consistency logs to really use that to
>> >> > solve the current problem.
>> >> >
>> >> > - Be able to customize JRE needed with only modules needed (not
>> >> > java.desktop for example and others) thru JLink.
>> >> >
>> >> > - Modules have also been implemented by other languages such as
>> >> Javascript
>> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
>> >> >
>> >> > - Consider taking a look at this discussion about pros/cons
>> >> > <
>> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >> >
>> >> > (5).
>> >> >
>> >> > - Eventual migration to JPMS is a practical necessity as more projects
>> >> > migrate.
>> >> >
>> >> > Effort:
>> >> >
>> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
>> >> to
>> >> > offer support for both jar components JSE1.8 and JSE9+ included.
>> >> >
>> >> > - Go bottom up for JPMS.
>> >> >
>> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
>> >> > io.netty.buffer). Review Draft PR with initial proposal.
>> >> >
>> >> > - Dependencies also need to be modularized. If some of our current
>> >> > dependencies are not able to be used as a module this will be a blocker
>> >> for
>> >> > our modules (we could patch that but this is an extra effort).
>> >> >
>> >> > Killers:
>> >> >
>> >> > - FIXME! I need your support to identify killer reasons to be able to
>> >> push
>> >> > this implementation.
>> >> >
>> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
>> >> > needed and should be implemented.
>> >> >
>> >> > Please use this file for any comments
>> >> >
>> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
>> >> >
>> >> > Resources used:
>> >> >
>> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
>> >> >
>> >> > (2): https://github.com/apache/arrow/pull/13072
>> >> >
>> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
>> >> >
>> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
>> >> >
>> >> > (5):
>> >> >
>> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >> >
>> >> > Best regards,
>> >> >
>> >> > --
>> >> > David
>> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
For the work on modularization, I’ve merged the flight-grpc and flight-core modules. I haven’t moved any other packages.

I also encountered this problem with flight-core. Flight-core derives from io.netty.buffer (specifically CompositeByteBuf), but we cannot use Netty buffer with JPMS without requiring users to monkey-patch it with patch-module. I have flight-core using Netty buffer on the classpath, but deriving from CompositeByteBuf requires adding --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command line since Netty’s being treated as part of the unnamed module in this context.

This isn’t terribly onerous from the user’s perspective (especially compared to monkey-patching), but we start getting into the business of having different command-line arguments for different modules the user might be using.

We could tell users to always add –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t using Flight, they’ll get a warning about an unrecognized module. Any thoughts about the best way to present these command-line changes?


From: Jean-Baptiste Onofré <jb...@nanthrax.net>
Date: Wednesday, December 13, 2023 at 6:58 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi,

We have a "split packages" between flight-core and flight-grpc. I
don't think anyone is using only flight-core to create a new
"transport".
I think it's acceptable to combine flight-core and flight-grpc, and
maybe reshape a bit (imho some classes from flight-grpc should be
actually in flight-core).
Users can always reshade (even if it's painful).

Regards
JB

On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
>
> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
>
> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> > An update on this work.
> >
> > I’ve been able to update the following to JPMS modules now:
> >
> >   *   arrow-memory-core
> >   *   arrow-memory-unsafe
> >   *   arrow-format
> >   *   arrow-vector
> >   *   arrow-compression
> >   *   arrow-tools
> >   *   arrow-avro
> >   *   arrow-jdbc
> >   *   arrow-algorithm
> >
> > I don’t think the following should be modularized (but would appreciate
> > feedback as well):
> >
> >   *   arrow-performance (test-only module)
> >   *   flight-integration-tests (test-only module)
> >   *   flight-jdbc-core
> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
> > want to modularize their JDBC driver:
> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
> > the ServiceLoader changes in JPMS as well).
> >
> > I’m starting the flight-related modules now. The first issue I’ve
> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
> > combine flight-core and flight-grpc because of this (currently we only
> > have Flight using grpc in Java).
> >
> > After Flight I’d take a look at the modules that have native code:
> >
> >   *   arrow-c-data
> >   *   arrow-orc
> >   *   arrow-dataset
> >   *   arrow-gandiva
> >
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, December 5, 2023 at 1:39 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I expect that we’d still have to change CLI arguments in JDK17.
> >
> > The need for changing the CLI arguments is due to memory-core now being
> > a named module while requiring access to Unsafe and using reflection
> > for accessing internals of java.nio.Buffer.
> >
> > We’re using JDK8 code for doing both currently. It might be worth
> > looking into if there’s a JDK9+ way of doing this without needing
> > reflection and compiling memory-core as a multi-release JAR.
> >
> > I don’t expect more CLI argument changes unless we use reflection on
> > NIO classes in other modules.
> >
> >
> > From: Adam Lippai <ad...@rigo.sk>
> > Date: Tuesday, December 5, 2023 at 9:28 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
> > will be
> > released in a few months (June?).
> >
> > Best regards,
> > Adam Lippai
> >
> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
> >
> >> Thanks James for delving into this mess.
> >>
> >> It looks like this change is unavoidable if we want to modularize? I think
> >> this is OK. Will the CLI argument change as we continue modularizing, or is
> >> this the only change that will be needed?
> >>
> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> >> > Hello,
> >> >
> >> > I did some work to separate the below PR into smaller PRs.
> >> >
> >> >
> >> >   *   Updating the versions of dependencies and maven plugins is done
> >> > and merged into master.
> >> >   *   I separated out the work modularizing arrow-vector,
> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >> >
> >> > Modularizing arrow-memory-core requires a smaller change to user
> >> > command-line arguments. Instead of:
> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >> >
> >> > The user needs to add:
> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >> >
> >> > I initially tried to modularize arrow-vector separately from
> >> > arrow-memory-core but found that any meaningful operation in
> >> > arrow-vector would trigger an illegal access in memory-core if it
> >> > wasn’t modularized.
> >> >
> >> > I was able to run the tests for arrow-compression and arrow-tools
> >> > successfully after modularizing memory-core, memory-unsafe-, and
> >> > arrow-vector. Note that I had more success by making memory-core and
> >> > memory-unsafe automatic modules.
> >> >
> >> > I think we should make a decision here on if we want to bite the bullet
> >> > and introduce a breaking user-facing change around command-line
> >> > options. The other option is to wait for JDK 21 to modularize. That’s
> >> > farther down the line and requires refactoring much of the memory
> >> > module code and implementing a module using the foreign memory
> >> > interface.
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > Hi,
> >> >
> >> > I’ve made some major progress on this work in this PR:
> >> > https://github.com/apache/arrow/pull/38876
> >> >
> >> >
> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
> >> > is working correctly.
> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> >> > arrow-memory-unsafe, and arrow-vector have been modularized
> >> > successfully.
> >> >      *   Tests pass locally for all of these modules.
> >> >      *   They fail in CI. This is likely from me not updating a profile
> >> > somewhere.
> >> >
> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
> >> > refactored fairly significantly and split into two modules: a
> >> > public-facing JPMS module and a separate module which adds to Netty’s
> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
> >> > because we are using named modules now, users need to add more
> >> > arguments to their Java command line to use arrow. If one were to use
> >> > arrow-memory-netty they would need to add the following:
> >> >
> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >> >
> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
> >>
> >> >
> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >> >
> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
> >> > what version, the command the user needs to supply changes, so this
> >> > seems like it’d be really inconvenient.
> >> >
> >> > Do we want to proceed with modularizing existing memory modules? Both
> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
> >> > available?
> >> >
> >> > The module-info.java files are written fairly naively. I haven’t
> >> > inspected thoroughly to determine what packages users will need.
> >> >
> >> > We can continue modularizing more components in a separate PR. Ideally
> >> > all the user breakage (class movement, new command-line argument
> >> > requirements) happens within one major Arrow version.
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I’m following up on this topic.
> >> >
> >> > David has a PR from last year that’s done much of the heavy lifting for
> >> > refactoring the codebase to be package-friendly.
> >> > https://github.com/apache/arrow/pull/13072
> >> >
> >> > What’s changed since and what’s left:
> >> >
> >> >   *   New components have been added (Flight SQL for example) that will
> >> > need to be updated for modules.
> >> >   *   There wasn’t a clear solution on how to do this without breaking
> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >> >      *   There is a Gradle plugin that can compile module-info.java
> >> > files purely syntactically that we can adapt to maven. It has
> >> > limitations (the one I see is that it can’t iterate through
> >> > classloaders to handle annotations), but using this might be a good
> >> > stopgap until we JDK 8 support is deprecated.
> >> >   *   Some plugins need to be updated:
> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> >> > files.
> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> >> > existing checkstyle rules file can’t be loaded with newer versions. We
> >> > can exclude module-info.java for now and have a separate Issue for
> >> > updating checkstyle itself and the rules file.
> >> >   *   grpc-java could not be modularized when the PR above was written.
> >> >      *   Grpc 1.57 now can be modularized
> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >> >
> >> > From: David Dali Susanibar Arce <da...@gmail.com>
> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> >> System
> >> > Hi All,
> >> >
> >> > This email's purpose is a request for comments to migrate Arrow Java to
> >> JPMS
> >> > Java Platform Module System <
> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e%3e>
> >> > JSE 9+ (1).
> >> >
> >> > Current status:
> >> >
> >> > - Arrow Java use JSE1.8 specification
> >> >
> >> > - Arrow Java works with JSE1.8/9/11/17
> >> >
> >> > - This is possible because Java offers “legacy mode”
> >> >
> >> > Proposal:
> >> >
> >> > Migrate to JPMS Java Platform Module System. This Draft PR
> >> > <https://github.com/apache/arrow/pull/13072>(2<
> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e%3e>
> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> >> <
> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e%3e>
> >>
> >> > contains an initial port of
> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> >> > Vector
> >> > for evaluation.
> >> >
> >> > Main Reason to migrate:
> >> >
> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> >> dependencies.
> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >> >
> >> > - JPMS offer reliable configuration and security to hide platform
> >> internals.
> >> >
> >> > - JPMS offers a partial solution to solve problems about read (80%)
> >> /write
> >> > (20%) code.
> >> >
> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
> >> > thru module-info.java.
> >> >
> >> > - Consistency logs, JPMS implement consistency logs to really use that to
> >> > solve the current problem.
> >> >
> >> > - Be able to customize JRE needed with only modules needed (not
> >> > java.desktop for example and others) thru JLink.
> >> >
> >> > - Modules have also been implemented by other languages such as
> >> Javascript
> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >> >
> >> > - Consider taking a look at this discussion about pros/cons
> >> > <
> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >
> >> > (5).
> >> >
> >> > - Eventual migration to JPMS is a practical necessity as more projects
> >> > migrate.
> >> >
> >> > Effort:
> >> >
> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> >> to
> >> > offer support for both jar components JSE1.8 and JSE9+ included.
> >> >
> >> > - Go bottom up for JPMS.
> >> >
> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> >> > io.netty.buffer). Review Draft PR with initial proposal.
> >> >
> >> > - Dependencies also need to be modularized. If some of our current
> >> > dependencies are not able to be used as a module this will be a blocker
> >> for
> >> > our modules (we could patch that but this is an extra effort).
> >> >
> >> > Killers:
> >> >
> >> > - FIXME! I need your support to identify killer reasons to be able to
> >> push
> >> > this implementation.
> >> >
> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> >> > needed and should be implemented.
> >> >
> >> > Please use this file for any comments
> >> >
> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >> >
> >> > Resources used:
> >> >
> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >> >
> >> > (2): https://github.com/apache/arrow/pull/13072
> >> >
> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >> >
> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >> >
> >> > (5):
> >> >
> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >
> >> > Best regards,
> >> >
> >> > --
> >> > David
> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

We have a "split packages" between flight-core and flight-grpc. I
don't think anyone is using only flight-core to create a new
"transport".
I think it's acceptable to combine flight-core and flight-grpc, and
maybe reshape a bit (imho some classes from flight-grpc should be
actually in flight-core).
Users can always reshade (even if it's painful).

Regards
JB

On Tue, Dec 12, 2023 at 8:52 PM David Li <li...@apache.org> wrote:
>
> I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.
>
> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> > An update on this work.
> >
> > I’ve been able to update the following to JPMS modules now:
> >
> >   *   arrow-memory-core
> >   *   arrow-memory-unsafe
> >   *   arrow-format
> >   *   arrow-vector
> >   *   arrow-compression
> >   *   arrow-tools
> >   *   arrow-avro
> >   *   arrow-jdbc
> >   *   arrow-algorithm
> >
> > I don’t think the following should be modularized (but would appreciate
> > feedback as well):
> >
> >   *   arrow-performance (test-only module)
> >   *   flight-integration-tests (test-only module)
> >   *   flight-jdbc-core
> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
> > want to modularize their JDBC driver:
> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
> > the ServiceLoader changes in JPMS as well).
> >
> > I’m starting the flight-related modules now. The first issue I’ve
> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
> > combine flight-core and flight-grpc because of this (currently we only
> > have Flight using grpc in Java).
> >
> > After Flight I’d take a look at the modules that have native code:
> >
> >   *   arrow-c-data
> >   *   arrow-orc
> >   *   arrow-dataset
> >   *   arrow-gandiva
> >
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, December 5, 2023 at 1:39 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I expect that we’d still have to change CLI arguments in JDK17.
> >
> > The need for changing the CLI arguments is due to memory-core now being
> > a named module while requiring access to Unsafe and using reflection
> > for accessing internals of java.nio.Buffer.
> >
> > We’re using JDK8 code for doing both currently. It might be worth
> > looking into if there’s a JDK9+ way of doing this without needing
> > reflection and compiling memory-core as a multi-release JAR.
> >
> > I don’t expect more CLI argument changes unless we use reflection on
> > NIO classes in other modules.
> >
> >
> > From: Adam Lippai <ad...@rigo.sk>
> > Date: Tuesday, December 5, 2023 at 9:28 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I believe Spark 4.0 was mentioned before. It’ll require Java 17 and
> > will be
> > released in a few months (June?).
> >
> > Best regards,
> > Adam Lippai
> >
> > On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
> >
> >> Thanks James for delving into this mess.
> >>
> >> It looks like this change is unavoidable if we want to modularize? I think
> >> this is OK. Will the CLI argument change as we continue modularizing, or is
> >> this the only change that will be needed?
> >>
> >> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> >> > Hello,
> >> >
> >> > I did some work to separate the below PR into smaller PRs.
> >> >
> >> >
> >> >   *   Updating the versions of dependencies and maven plugins is done
> >> > and merged into master.
> >> >   *   I separated out the work modularizing arrow-vector,
> >> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >> >
> >> > Modularizing arrow-memory-core requires a smaller change to user
> >> > command-line arguments. Instead of:
> >> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >> >
> >> > The user needs to add:
> >> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >> >
> >> > I initially tried to modularize arrow-vector separately from
> >> > arrow-memory-core but found that any meaningful operation in
> >> > arrow-vector would trigger an illegal access in memory-core if it
> >> > wasn’t modularized.
> >> >
> >> > I was able to run the tests for arrow-compression and arrow-tools
> >> > successfully after modularizing memory-core, memory-unsafe-, and
> >> > arrow-vector. Note that I had more success by making memory-core and
> >> > memory-unsafe automatic modules.
> >> >
> >> > I think we should make a decision here on if we want to bite the bullet
> >> > and introduce a breaking user-facing change around command-line
> >> > options. The other option is to wait for JDK 21 to modularize. That’s
> >> > farther down the line and requires refactoring much of the memory
> >> > module code and implementing a module using the foreign memory
> >> > interface.
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, November 28, 2023 at 6:48 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > Hi,
> >> >
> >> > I’ve made some major progress on this work in this PR:
> >> > https://github.com/apache/arrow/pull/38876
> >> >
> >> >
> >> >   *   The maven plugin for compiling module-info.java files using JDK 8
> >> > is working correctly.
> >> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> >> > arrow-memory-unsafe, and arrow-vector have been modularized
> >> > successfully.
> >> >      *   Tests pass locally for all of these modules.
> >> >      *   They fail in CI. This is likely from me not updating a profile
> >> > somewhere.
> >> >
> >> > Similar to David’s PR from below, arrow-memory and modules needed to be
> >> > refactored fairly significantly and split into two modules: a
> >> > public-facing JPMS module and a separate module which adds to Netty’s
> >> > packages (memory-netty-buffer-patch). What’s more problematic is that
> >> > because we are using named modules now, users need to add more
> >> > arguments to their Java command line to use arrow. If one were to use
> >> > arrow-memory-netty they would need to add the following:
> >> >
> >> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >> >
> >> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
> >>
> >> >
> >> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >> >
> >> > Depending on where the memory-netty-buffer-patch JAR is located, and
> >> > what version, the command the user needs to supply changes, so this
> >> > seems like it’d be really inconvenient.
> >> >
> >> > Do we want to proceed with modularizing existing memory modules? Both
> >> > netty and unsafe? Or wait until the new memory module from Java 21 is
> >> > available?
> >> >
> >> > The module-info.java files are written fairly naively. I haven’t
> >> > inspected thoroughly to determine what packages users will need.
> >> >
> >> > We can continue modularizing more components in a separate PR. Ideally
> >> > all the user breakage (class movement, new command-line argument
> >> > requirements) happens within one major Arrow version.
> >> >
> >> > From: James Duong <Ja...@improving.com.INVALID>
> >> > Date: Tuesday, November 21, 2023 at 1:16 PM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> >> > Module System
> >> > I’m following up on this topic.
> >> >
> >> > David has a PR from last year that’s done much of the heavy lifting for
> >> > refactoring the codebase to be package-friendly.
> >> > https://github.com/apache/arrow/pull/13072
> >> >
> >> > What’s changed since and what’s left:
> >> >
> >> >   *   New components have been added (Flight SQL for example) that will
> >> > need to be updated for modules.
> >> >   *   There wasn’t a clear solution on how to do this without breaking
> >> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> >> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >> >      *   There is a Gradle plugin that can compile module-info.java
> >> > files purely syntactically that we can adapt to maven. It has
> >> > limitations (the one I see is that it can’t iterate through
> >> > classloaders to handle annotations), but using this might be a good
> >> > stopgap until we JDK 8 support is deprecated.
> >> >   *   Some plugins need to be updated:
> >> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> >> > files.
> >> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> >> > existing checkstyle rules file can’t be loaded with newer versions. We
> >> > can exclude module-info.java for now and have a separate Issue for
> >> > updating checkstyle itself and the rules file.
> >> >   *   grpc-java could not be modularized when the PR above was written.
> >> >      *   Grpc 1.57 now can be modularized
> >> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >> >
> >> > From: David Dali Susanibar Arce <da...@gmail.com>
> >> > Date: Wednesday, May 25, 2022 at 5:02 AM
> >> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> >> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> >> System
> >> > Hi All,
> >> >
> >> > This email's purpose is a request for comments to migrate Arrow Java to
> >> JPMS
> >> > Java Platform Module System <
> >> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e>
> >> > JSE 9+ (1).
> >> >
> >> > Current status:
> >> >
> >> > - Arrow Java use JSE1.8 specification
> >> >
> >> > - Arrow Java works with JSE1.8/9/11/17
> >> >
> >> > - This is possible because Java offers “legacy mode”
> >> >
> >> > Proposal:
> >> >
> >> > Migrate to JPMS Java Platform Module System. This Draft PR
> >> > <https://github.com/apache/arrow/pull/13072>(2<
> >> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e>
> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> >> <
> >> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e>
> >>
> >> > contains an initial port of
> >> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> >> > Vector
> >> > for evaluation.
> >> >
> >> > Main Reason to migrate:
> >> >
> >> > - JPMS offer Strong encapsulation, Well-defined interfaces
> >> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> >> dependencies.
> >> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >> >
> >> > - JPMS offer reliable configuration and security to hide platform
> >> internals.
> >> >
> >> > - JPMS offers a partial solution to solve problems about read (80%)
> >> /write
> >> > (20%) code.
> >> >
> >> > - JPMS offer optimization for readability about read/write ratio (90/10)
> >> > thru module-info.java.
> >> >
> >> > - Consistency logs, JPMS implement consistency logs to really use that to
> >> > solve the current problem.
> >> >
> >> > - Be able to customize JRE needed with only modules needed (not
> >> > java.desktop for example and others) thru JLink.
> >> >
> >> > - Modules have also been implemented by other languages such as
> >> Javascript
> >> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >> >
> >> > - Consider taking a look at this discussion about pros/cons
> >> > <
> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >
> >> > (5).
> >> >
> >> > - Eventual migration to JPMS is a practical necessity as more projects
> >> > migrate.
> >> >
> >> > Effort:
> >> >
> >> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> >> to
> >> > offer support for both jar components JSE1.8 and JSE9+ included.
> >> >
> >> > - Go bottom up for JPMS.
> >> >
> >> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> >> > io.netty.buffer). Review Draft PR with initial proposal.
> >> >
> >> > - Dependencies also need to be modularized. If some of our current
> >> > dependencies are not able to be used as a module this will be a blocker
> >> for
> >> > our modules (we could patch that but this is an extra effort).
> >> >
> >> > Killers:
> >> >
> >> > - FIXME! I need your support to identify killer reasons to be able to
> >> push
> >> > this implementation.
> >> >
> >> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> >> > needed and should be implemented.
> >> >
> >> > Please use this file for any comments
> >> >
> >> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >> >
> >> > Resources used:
> >> >
> >> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >> >
> >> > (2): https://github.com/apache/arrow/pull/13072
> >> >
> >> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >> >
> >> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >> >
> >> > (5):
> >> >
> >> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >> >
> >> > Best regards,
> >> >
> >> > --
> >> > David
> >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by David Li <li...@apache.org>.
I would be OK with combining flight-core and flight-grpc. It's not clear to me anyways that the current split would actually be helpful if we do ever want to support Flight without gRPC.

On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> An update on this work.
>
> I’ve been able to update the following to JPMS modules now:
>
>   *   arrow-memory-core
>   *   arrow-memory-unsafe
>   *   arrow-format
>   *   arrow-vector
>   *   arrow-compression
>   *   arrow-tools
>   *   arrow-avro
>   *   arrow-jdbc
>   *   arrow-algorithm
>
> I don’t think the following should be modularized (but would appreciate 
> feedback as well):
>
>   *   arrow-performance (test-only module)
>   *   flight-integration-tests (test-only module)
>   *   flight-jdbc-core
>   *   flight-jdbc-driver (Need to think about this. MySQL suggests they 
> want to modularize their JDBC driver: 
> https://bugs.mysql.com/bug.php?id=97683 . How does this interact with 
> the ServiceLoader changes in JPMS as well).
>
> I’m starting the flight-related modules now. The first issue I’ve 
> noticed is that flight-core and flight-grpc reuse packages. I’d like to 
> combine flight-core and flight-grpc because of this (currently we only 
> have Flight using grpc in Java).
>
> After Flight I’d take a look at the modules that have native code:
>
>   *   arrow-c-data
>   *   arrow-orc
>   *   arrow-dataset
>   *   arrow-gandiva
>
>
> From: James Duong <Ja...@improving.com.INVALID>
> Date: Tuesday, December 5, 2023 at 1:39 PM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I expect that we’d still have to change CLI arguments in JDK17.
>
> The need for changing the CLI arguments is due to memory-core now being 
> a named module while requiring access to Unsafe and using reflection 
> for accessing internals of java.nio.Buffer.
>
> We’re using JDK8 code for doing both currently. It might be worth 
> looking into if there’s a JDK9+ way of doing this without needing 
> reflection and compiling memory-core as a multi-release JAR.
>
> I don’t expect more CLI argument changes unless we use reflection on 
> NIO classes in other modules.
>
>
> From: Adam Lippai <ad...@rigo.sk>
> Date: Tuesday, December 5, 2023 at 9:28 AM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I believe Spark 4.0 was mentioned before. It’ll require Java 17 and 
> will be
> released in a few months (June?).
>
> Best regards,
> Adam Lippai
>
> On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:
>
>> Thanks James for delving into this mess.
>>
>> It looks like this change is unavoidable if we want to modularize? I think
>> this is OK. Will the CLI argument change as we continue modularizing, or is
>> this the only change that will be needed?
>>
>> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
>> > Hello,
>> >
>> > I did some work to separate the below PR into smaller PRs.
>> >
>> >
>> >   *   Updating the versions of dependencies and maven plugins is done
>> > and merged into master.
>> >   *   I separated out the work modularizing arrow-vector,
>> > arrow-memory-core/unsafe, and arrow-memory-netty.
>> >
>> > Modularizing arrow-memory-core requires a smaller change to user
>> > command-line arguments. Instead of:
>> > --add-opens=java.base/java.nio=ALL-UNNAMED
>> >
>> > The user needs to add:
>> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
>> >
>> > I initially tried to modularize arrow-vector separately from
>> > arrow-memory-core but found that any meaningful operation in
>> > arrow-vector would trigger an illegal access in memory-core if it
>> > wasn’t modularized.
>> >
>> > I was able to run the tests for arrow-compression and arrow-tools
>> > successfully after modularizing memory-core, memory-unsafe-, and
>> > arrow-vector. Note that I had more success by making memory-core and
>> > memory-unsafe automatic modules.
>> >
>> > I think we should make a decision here on if we want to bite the bullet
>> > and introduce a breaking user-facing change around command-line
>> > options. The other option is to wait for JDK 21 to modularize. That’s
>> > farther down the line and requires refactoring much of the memory
>> > module code and implementing a module using the foreign memory
>> > interface.
>> >
>> > From: James Duong <Ja...@improving.com.INVALID>
>> > Date: Tuesday, November 28, 2023 at 6:48 PM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > Hi,
>> >
>> > I’ve made some major progress on this work in this PR:
>> > https://github.com/apache/arrow/pull/38876
>> >
>> >
>> >   *   The maven plugin for compiling module-info.java files using JDK 8
>> > is working correctly.
>> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
>> > arrow-memory-unsafe, and arrow-vector have been modularized
>> > successfully.
>> >      *   Tests pass locally for all of these modules.
>> >      *   They fail in CI. This is likely from me not updating a profile
>> > somewhere.
>> >
>> > Similar to David’s PR from below, arrow-memory and modules needed to be
>> > refactored fairly significantly and split into two modules: a
>> > public-facing JPMS module and a separate module which adds to Netty’s
>> > packages (memory-netty-buffer-patch). What’s more problematic is that
>> > because we are using named modules now, users need to add more
>> > arguments to their Java command line to use arrow. If one were to use
>> > arrow-memory-netty they would need to add the following:
>> >
>> > --add-opens java.base/jdk.internal.misc=io.netty.common
>> >
>> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>>
>> >
>> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
>> >
>> > Depending on where the memory-netty-buffer-patch JAR is located, and
>> > what version, the command the user needs to supply changes, so this
>> > seems like it’d be really inconvenient.
>> >
>> > Do we want to proceed with modularizing existing memory modules? Both
>> > netty and unsafe? Or wait until the new memory module from Java 21 is
>> > available?
>> >
>> > The module-info.java files are written fairly naively. I haven’t
>> > inspected thoroughly to determine what packages users will need.
>> >
>> > We can continue modularizing more components in a separate PR. Ideally
>> > all the user breakage (class movement, new command-line argument
>> > requirements) happens within one major Arrow version.
>> >
>> > From: James Duong <Ja...@improving.com.INVALID>
>> > Date: Tuesday, November 21, 2023 at 1:16 PM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > I’m following up on this topic.
>> >
>> > David has a PR from last year that’s done much of the heavy lifting for
>> > refactoring the codebase to be package-friendly.
>> > https://github.com/apache/arrow/pull/13072
>> >
>> > What’s changed since and what’s left:
>> >
>> >   *   New components have been added (Flight SQL for example) that will
>> > need to be updated for modules.
>> >   *   There wasn’t a clear solution on how to do this without breaking
>> > JDK 8 support. Compiling module-info.java files require using JDK9, but
>> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
>> >      *   There is a Gradle plugin that can compile module-info.java
>> > files purely syntactically that we can adapt to maven. It has
>> > limitations (the one I see is that it can’t iterate through
>> > classloaders to handle annotations), but using this might be a good
>> > stopgap until we JDK 8 support is deprecated.
>> >   *   Some plugins need to be updated:
>> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
>> > files.
>> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
>> > existing checkstyle rules file can’t be loaded with newer versions. We
>> > can exclude module-info.java for now and have a separate Issue for
>> > updating checkstyle itself and the rules file.
>> >   *   grpc-java could not be modularized when the PR above was written.
>> >      *   Grpc 1.57 now can be modularized
>> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
>> >
>> > From: David Dali Susanibar Arce <da...@gmail.com>
>> > Date: Wednesday, May 25, 2022 at 5:02 AM
>> > To: dev@arrow.apache.org <de...@arrow.apache.org>
>> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
>> System
>> > Hi All,
>> >
>> > This email's purpose is a request for comments to migrate Arrow Java to
>> JPMS
>> > Java Platform Module System <
>> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e>
>> > JSE 9+ (1).
>> >
>> > Current status:
>> >
>> > - Arrow Java use JSE1.8 specification
>> >
>> > - Arrow Java works with JSE1.8/9/11/17
>> >
>> > - This is possible because Java offers “legacy mode”
>> >
>> > Proposal:
>> >
>> > Migrate to JPMS Java Platform Module System. This Draft PR
>> > <https://github.com/apache/arrow/pull/13072>(2<
>> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e>
>> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
>> <
>> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e>
>>
>> > contains an initial port of
>> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
>> > Vector
>> > for evaluation.
>> >
>> > Main Reason to migrate:
>> >
>> > - JPMS offer Strong encapsulation, Well-defined interfaces
>> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
>> dependencies.
>> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
>> >
>> > - JPMS offer reliable configuration and security to hide platform
>> internals.
>> >
>> > - JPMS offers a partial solution to solve problems about read (80%)
>> /write
>> > (20%) code.
>> >
>> > - JPMS offer optimization for readability about read/write ratio (90/10)
>> > thru module-info.java.
>> >
>> > - Consistency logs, JPMS implement consistency logs to really use that to
>> > solve the current problem.
>> >
>> > - Be able to customize JRE needed with only modules needed (not
>> > java.desktop for example and others) thru JLink.
>> >
>> > - Modules have also been implemented by other languages such as
>> Javascript
>> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
>> >
>> > - Consider taking a look at this discussion about pros/cons
>> > <
>> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >
>> > (5).
>> >
>> > - Eventual migration to JPMS is a practical necessity as more projects
>> > migrate.
>> >
>> > Effort:
>> >
>> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
>> to
>> > offer support for both jar components JSE1.8 and JSE9+ included.
>> >
>> > - Go bottom up for JPMS.
>> >
>> > - Packages need to be unique (i.e. org.apache.arrow.memory /
>> > io.netty.buffer). Review Draft PR with initial proposal.
>> >
>> > - Dependencies also need to be modularized. If some of our current
>> > dependencies are not able to be used as a module this will be a blocker
>> for
>> > our modules (we could patch that but this is an extra effort).
>> >
>> > Killers:
>> >
>> > - FIXME! I need your support to identify killer reasons to be able to
>> push
>> > this implementation.
>> >
>> > Please let us know if Arrow Java to JPMS Java Platform Module System is
>> > needed and should be implemented.
>> >
>> > Please use this file for any comments
>> >
>> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
>> >
>> > Resources used:
>> >
>> > (1): https://openjdk.java.net/projects/jigsaw/spec/
>> >
>> > (2): https://github.com/apache/arrow/pull/13072
>> >
>> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
>> >
>> > (4): https://github.com/nipafx/demo-jigsaw-reflection
>> >
>> > (5):
>> >
>> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>> >
>> > Best regards,
>> >
>> > --
>> > David
>>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
An update on this work.

I’ve been able to update the following to JPMS modules now:

  *   arrow-memory-core
  *   arrow-memory-unsafe
  *   arrow-format
  *   arrow-vector
  *   arrow-compression
  *   arrow-tools
  *   arrow-avro
  *   arrow-jdbc
  *   arrow-algorithm

I don’t think the following should be modularized (but would appreciate feedback as well):

  *   arrow-performance (test-only module)
  *   flight-integration-tests (test-only module)
  *   flight-jdbc-core
  *   flight-jdbc-driver (Need to think about this. MySQL suggests they want to modularize their JDBC driver: https://bugs.mysql.com/bug.php?id=97683 . How does this interact with the ServiceLoader changes in JPMS as well).

I’m starting the flight-related modules now. The first issue I’ve noticed is that flight-core and flight-grpc reuse packages. I’d like to combine flight-core and flight-grpc because of this (currently we only have Flight using grpc in Java).

After Flight I’d take a look at the modules that have native code:

  *   arrow-c-data
  *   arrow-orc
  *   arrow-dataset
  *   arrow-gandiva


From: James Duong <Ja...@improving.com.INVALID>
Date: Tuesday, December 5, 2023 at 1:39 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
I expect that we’d still have to change CLI arguments in JDK17.

The need for changing the CLI arguments is due to memory-core now being a named module while requiring access to Unsafe and using reflection for accessing internals of java.nio.Buffer.

We’re using JDK8 code for doing both currently. It might be worth looking into if there’s a JDK9+ way of doing this without needing reflection and compiling memory-core as a multi-release JAR.

I don’t expect more CLI argument changes unless we use reflection on NIO classes in other modules.


From: Adam Lippai <ad...@rigo.sk>
Date: Tuesday, December 5, 2023 at 9:28 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
I believe Spark 4.0 was mentioned before. It’ll require Java 17 and will be
released in a few months (June?).

Best regards,
Adam Lippai

On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:

> Thanks James for delving into this mess.
>
> It looks like this change is unavoidable if we want to modularize? I think
> this is OK. Will the CLI argument change as we continue modularizing, or is
> this the only change that will be needed?
>
> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> > Hello,
> >
> > I did some work to separate the below PR into smaller PRs.
> >
> >
> >   *   Updating the versions of dependencies and maven plugins is done
> > and merged into master.
> >   *   I separated out the work modularizing arrow-vector,
> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >
> > Modularizing arrow-memory-core requires a smaller change to user
> > command-line arguments. Instead of:
> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >
> > The user needs to add:
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >
> > I initially tried to modularize arrow-vector separately from
> > arrow-memory-core but found that any meaningful operation in
> > arrow-vector would trigger an illegal access in memory-core if it
> > wasn’t modularized.
> >
> > I was able to run the tests for arrow-compression and arrow-tools
> > successfully after modularizing memory-core, memory-unsafe-, and
> > arrow-vector. Note that I had more success by making memory-core and
> > memory-unsafe automatic modules.
> >
> > I think we should make a decision here on if we want to bite the bullet
> > and introduce a breaking user-facing change around command-line
> > options. The other option is to wait for JDK 21 to modularize. That’s
> > farther down the line and requires refactoring much of the memory
> > module code and implementing a module using the foreign memory
> > interface.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 28, 2023 at 6:48 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > I’ve made some major progress on this work in this PR:
> > https://github.com/apache/arrow/pull/38876
> >
> >
> >   *   The maven plugin for compiling module-info.java files using JDK 8
> > is working correctly.
> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> > arrow-memory-unsafe, and arrow-vector have been modularized
> > successfully.
> >      *   Tests pass locally for all of these modules.
> >      *   They fail in CI. This is likely from me not updating a profile
> > somewhere.
> >
> > Similar to David’s PR from below, arrow-memory and modules needed to be
> > refactored fairly significantly and split into two modules: a
> > public-facing JPMS module and a separate module which adds to Netty’s
> > packages (memory-netty-buffer-patch). What’s more problematic is that
> > because we are using named modules now, users need to add more
> > arguments to their Java command line to use arrow. If one were to use
> > arrow-memory-netty they would need to add the following:
> >
> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>
> >
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >
> > Depending on where the memory-netty-buffer-patch JAR is located, and
> > what version, the command the user needs to supply changes, so this
> > seems like it’d be really inconvenient.
> >
> > Do we want to proceed with modularizing existing memory modules? Both
> > netty and unsafe? Or wait until the new memory module from Java 21 is
> > available?
> >
> > The module-info.java files are written fairly naively. I haven’t
> > inspected thoroughly to determine what packages users will need.
> >
> > We can continue modularizing more components in a separate PR. Ideally
> > all the user breakage (class movement, new command-line argument
> > requirements) happens within one major Arrow version.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 21, 2023 at 1:16 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I’m following up on this topic.
> >
> > David has a PR from last year that’s done much of the heavy lifting for
> > refactoring the codebase to be package-friendly.
> > https://github.com/apache/arrow/pull/13072
> >
> > What’s changed since and what’s left:
> >
> >   *   New components have been added (Flight SQL for example) that will
> > need to be updated for modules.
> >   *   There wasn’t a clear solution on how to do this without breaking
> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >      *   There is a Gradle plugin that can compile module-info.java
> > files purely syntactically that we can adapt to maven. It has
> > limitations (the one I see is that it can’t iterate through
> > classloaders to handle annotations), but using this might be a good
> > stopgap until we JDK 8 support is deprecated.
> >   *   Some plugins need to be updated:
> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> > files.
> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> > existing checkstyle rules file can’t be loaded with newer versions. We
> > can exclude module-info.java for now and have a separate Issue for
> > updating checkstyle itself and the rules file.
> >   *   grpc-java could not be modularized when the PR above was written.
> >      *   Grpc 1.57 now can be modularized
> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >
> > From: David Dali Susanibar Arce <da...@gmail.com>
> > Date: Wednesday, May 25, 2022 at 5:02 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> System
> > Hi All,
> >
> > This email's purpose is a request for comments to migrate Arrow Java to
> JPMS
> > Java Platform Module System <
> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e>
> > JSE 9+ (1).
> >
> > Current status:
> >
> > - Arrow Java use JSE1.8 specification
> >
> > - Arrow Java works with JSE1.8/9/11/17
> >
> > - This is possible because Java offers “legacy mode”
> >
> > Proposal:
> >
> > Migrate to JPMS Java Platform Module System. This Draft PR
> > <https://github.com/apache/arrow/pull/13072>(2<
> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e>
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> <
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e>
>
> > contains an initial port of
> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> > Vector
> > for evaluation.
> >
> > Main Reason to migrate:
> >
> > - JPMS offer Strong encapsulation, Well-defined interfaces
> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> dependencies.
> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >
> > - JPMS offer reliable configuration and security to hide platform
> internals.
> >
> > - JPMS offers a partial solution to solve problems about read (80%)
> /write
> > (20%) code.
> >
> > - JPMS offer optimization for readability about read/write ratio (90/10)
> > thru module-info.java.
> >
> > - Consistency logs, JPMS implement consistency logs to really use that to
> > solve the current problem.
> >
> > - Be able to customize JRE needed with only modules needed (not
> > java.desktop for example and others) thru JLink.
> >
> > - Modules have also been implemented by other languages such as
> Javascript
> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >
> > - Consider taking a look at this discussion about pros/cons
> > <
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > (5).
> >
> > - Eventual migration to JPMS is a practical necessity as more projects
> > migrate.
> >
> > Effort:
> >
> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> to
> > offer support for both jar components JSE1.8 and JSE9+ included.
> >
> > - Go bottom up for JPMS.
> >
> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> > io.netty.buffer). Review Draft PR with initial proposal.
> >
> > - Dependencies also need to be modularized. If some of our current
> > dependencies are not able to be used as a module this will be a blocker
> for
> > our modules (we could patch that but this is an extra effort).
> >
> > Killers:
> >
> > - FIXME! I need your support to identify killer reasons to be able to
> push
> > this implementation.
> >
> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> > needed and should be implemented.
> >
> > Please use this file for any comments
> >
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >
> > Resources used:
> >
> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >
> > (2): https://github.com/apache/arrow/pull/13072
> >
> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >
> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >
> > (5):
> >
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > Best regards,
> >
> > --
> > David
>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
I expect that we’d still have to change CLI arguments in JDK17.

The need for changing the CLI arguments is due to memory-core now being a named module while requiring access to Unsafe and using reflection for accessing internals of java.nio.Buffer.

We’re using JDK8 code for doing both currently. It might be worth looking into if there’s a JDK9+ way of doing this without needing reflection and compiling memory-core as a multi-release JAR.

I don’t expect more CLI argument changes unless we use reflection on NIO classes in other modules.


From: Adam Lippai <ad...@rigo.sk>
Date: Tuesday, December 5, 2023 at 9:28 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
I believe Spark 4.0 was mentioned before. It’ll require Java 17 and will be
released in a few months (June?).

Best regards,
Adam Lippai

On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:

> Thanks James for delving into this mess.
>
> It looks like this change is unavoidable if we want to modularize? I think
> this is OK. Will the CLI argument change as we continue modularizing, or is
> this the only change that will be needed?
>
> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> > Hello,
> >
> > I did some work to separate the below PR into smaller PRs.
> >
> >
> >   *   Updating the versions of dependencies and maven plugins is done
> > and merged into master.
> >   *   I separated out the work modularizing arrow-vector,
> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >
> > Modularizing arrow-memory-core requires a smaller change to user
> > command-line arguments. Instead of:
> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >
> > The user needs to add:
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >
> > I initially tried to modularize arrow-vector separately from
> > arrow-memory-core but found that any meaningful operation in
> > arrow-vector would trigger an illegal access in memory-core if it
> > wasn’t modularized.
> >
> > I was able to run the tests for arrow-compression and arrow-tools
> > successfully after modularizing memory-core, memory-unsafe-, and
> > arrow-vector. Note that I had more success by making memory-core and
> > memory-unsafe automatic modules.
> >
> > I think we should make a decision here on if we want to bite the bullet
> > and introduce a breaking user-facing change around command-line
> > options. The other option is to wait for JDK 21 to modularize. That’s
> > farther down the line and requires refactoring much of the memory
> > module code and implementing a module using the foreign memory
> > interface.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 28, 2023 at 6:48 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > I’ve made some major progress on this work in this PR:
> > https://github.com/apache/arrow/pull/38876
> >
> >
> >   *   The maven plugin for compiling module-info.java files using JDK 8
> > is working correctly.
> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> > arrow-memory-unsafe, and arrow-vector have been modularized
> > successfully.
> >      *   Tests pass locally for all of these modules.
> >      *   They fail in CI. This is likely from me not updating a profile
> > somewhere.
> >
> > Similar to David’s PR from below, arrow-memory and modules needed to be
> > refactored fairly significantly and split into two modules: a
> > public-facing JPMS module and a separate module which adds to Netty’s
> > packages (memory-netty-buffer-patch). What’s more problematic is that
> > because we are using named modules now, users need to add more
> > arguments to their Java command line to use arrow. If one were to use
> > arrow-memory-netty they would need to add the following:
> >
> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>
> >
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >
> > Depending on where the memory-netty-buffer-patch JAR is located, and
> > what version, the command the user needs to supply changes, so this
> > seems like it’d be really inconvenient.
> >
> > Do we want to proceed with modularizing existing memory modules? Both
> > netty and unsafe? Or wait until the new memory module from Java 21 is
> > available?
> >
> > The module-info.java files are written fairly naively. I haven’t
> > inspected thoroughly to determine what packages users will need.
> >
> > We can continue modularizing more components in a separate PR. Ideally
> > all the user breakage (class movement, new command-line argument
> > requirements) happens within one major Arrow version.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 21, 2023 at 1:16 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I’m following up on this topic.
> >
> > David has a PR from last year that’s done much of the heavy lifting for
> > refactoring the codebase to be package-friendly.
> > https://github.com/apache/arrow/pull/13072
> >
> > What’s changed since and what’s left:
> >
> >   *   New components have been added (Flight SQL for example) that will
> > need to be updated for modules.
> >   *   There wasn’t a clear solution on how to do this without breaking
> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >      *   There is a Gradle plugin that can compile module-info.java
> > files purely syntactically that we can adapt to maven. It has
> > limitations (the one I see is that it can’t iterate through
> > classloaders to handle annotations), but using this might be a good
> > stopgap until we JDK 8 support is deprecated.
> >   *   Some plugins need to be updated:
> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> > files.
> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> > existing checkstyle rules file can’t be loaded with newer versions. We
> > can exclude module-info.java for now and have a separate Issue for
> > updating checkstyle itself and the rules file.
> >   *   grpc-java could not be modularized when the PR above was written.
> >      *   Grpc 1.57 now can be modularized
> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >
> > From: David Dali Susanibar Arce <da...@gmail.com>
> > Date: Wednesday, May 25, 2022 at 5:02 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> System
> > Hi All,
> >
> > This email's purpose is a request for comments to migrate Arrow Java to
> JPMS
> > Java Platform Module System <
> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e>
> > JSE 9+ (1).
> >
> > Current status:
> >
> > - Arrow Java use JSE1.8 specification
> >
> > - Arrow Java works with JSE1.8/9/11/17
> >
> > - This is possible because Java offers “legacy mode”
> >
> > Proposal:
> >
> > Migrate to JPMS Java Platform Module System. This Draft PR
> > <https://github.com/apache/arrow/pull/13072>(2<
> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c>
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> <
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)>
>
> > contains an initial port of
> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> > Vector
> > for evaluation.
> >
> > Main Reason to migrate:
> >
> > - JPMS offer Strong encapsulation, Well-defined interfaces
> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> dependencies.
> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >
> > - JPMS offer reliable configuration and security to hide platform
> internals.
> >
> > - JPMS offers a partial solution to solve problems about read (80%)
> /write
> > (20%) code.
> >
> > - JPMS offer optimization for readability about read/write ratio (90/10)
> > thru module-info.java.
> >
> > - Consistency logs, JPMS implement consistency logs to really use that to
> > solve the current problem.
> >
> > - Be able to customize JRE needed with only modules needed (not
> > java.desktop for example and others) thru JLink.
> >
> > - Modules have also been implemented by other languages such as
> Javascript
> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >
> > - Consider taking a look at this discussion about pros/cons
> > <
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > (5).
> >
> > - Eventual migration to JPMS is a practical necessity as more projects
> > migrate.
> >
> > Effort:
> >
> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> to
> > offer support for both jar components JSE1.8 and JSE9+ included.
> >
> > - Go bottom up for JPMS.
> >
> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> > io.netty.buffer). Review Draft PR with initial proposal.
> >
> > - Dependencies also need to be modularized. If some of our current
> > dependencies are not able to be used as a module this will be a blocker
> for
> > our modules (we could patch that but this is an extra effort).
> >
> > Killers:
> >
> > - FIXME! I need your support to identify killer reasons to be able to
> push
> > this implementation.
> >
> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> > needed and should be implemented.
> >
> > Please use this file for any comments
> >
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >
> > Resources used:
> >
> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >
> > (2): https://github.com/apache/arrow/pull/13072
> >
> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >
> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >
> > (5):
> >
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > Best regards,
> >
> > --
> > David
>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by Adam Lippai <ad...@rigo.sk>.
I believe Spark 4.0 was mentioned before. It’ll require Java 17 and will be
released in a few months (June?).

Best regards,
Adam Lippai

On Tue, Dec 5, 2023 at 12:05 David Li <li...@apache.org> wrote:

> Thanks James for delving into this mess.
>
> It looks like this change is unavoidable if we want to modularize? I think
> this is OK. Will the CLI argument change as we continue modularizing, or is
> this the only change that will be needed?
>
> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> > Hello,
> >
> > I did some work to separate the below PR into smaller PRs.
> >
> >
> >   *   Updating the versions of dependencies and maven plugins is done
> > and merged into master.
> >   *   I separated out the work modularizing arrow-vector,
> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >
> > Modularizing arrow-memory-core requires a smaller change to user
> > command-line arguments. Instead of:
> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >
> > The user needs to add:
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >
> > I initially tried to modularize arrow-vector separately from
> > arrow-memory-core but found that any meaningful operation in
> > arrow-vector would trigger an illegal access in memory-core if it
> > wasn’t modularized.
> >
> > I was able to run the tests for arrow-compression and arrow-tools
> > successfully after modularizing memory-core, memory-unsafe-, and
> > arrow-vector. Note that I had more success by making memory-core and
> > memory-unsafe automatic modules.
> >
> > I think we should make a decision here on if we want to bite the bullet
> > and introduce a breaking user-facing change around command-line
> > options. The other option is to wait for JDK 21 to modularize. That’s
> > farther down the line and requires refactoring much of the memory
> > module code and implementing a module using the foreign memory
> > interface.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 28, 2023 at 6:48 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > I’ve made some major progress on this work in this PR:
> > https://github.com/apache/arrow/pull/38876
> >
> >
> >   *   The maven plugin for compiling module-info.java files using JDK 8
> > is working correctly.
> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> > arrow-memory-unsafe, and arrow-vector have been modularized
> > successfully.
> >      *   Tests pass locally for all of these modules.
> >      *   They fail in CI. This is likely from me not updating a profile
> > somewhere.
> >
> > Similar to David’s PR from below, arrow-memory and modules needed to be
> > refactored fairly significantly and split into two modules: a
> > public-facing JPMS module and a separate module which adds to Netty’s
> > packages (memory-netty-buffer-patch). What’s more problematic is that
> > because we are using named modules now, users need to add more
> > arguments to their Java command line to use arrow. If one were to use
> > arrow-memory-netty they would need to add the following:
> >
> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>
> >
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >
> > Depending on where the memory-netty-buffer-patch JAR is located, and
> > what version, the command the user needs to supply changes, so this
> > seems like it’d be really inconvenient.
> >
> > Do we want to proceed with modularizing existing memory modules? Both
> > netty and unsafe? Or wait until the new memory module from Java 21 is
> > available?
> >
> > The module-info.java files are written fairly naively. I haven’t
> > inspected thoroughly to determine what packages users will need.
> >
> > We can continue modularizing more components in a separate PR. Ideally
> > all the user breakage (class movement, new command-line argument
> > requirements) happens within one major Arrow version.
> >
> > From: James Duong <Ja...@improving.com.INVALID>
> > Date: Tuesday, November 21, 2023 at 1:16 PM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I’m following up on this topic.
> >
> > David has a PR from last year that’s done much of the heavy lifting for
> > refactoring the codebase to be package-friendly.
> > https://github.com/apache/arrow/pull/13072
> >
> > What’s changed since and what’s left:
> >
> >   *   New components have been added (Flight SQL for example) that will
> > need to be updated for modules.
> >   *   There wasn’t a clear solution on how to do this without breaking
> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >      *   There is a Gradle plugin that can compile module-info.java
> > files purely syntactically that we can adapt to maven. It has
> > limitations (the one I see is that it can’t iterate through
> > classloaders to handle annotations), but using this might be a good
> > stopgap until we JDK 8 support is deprecated.
> >   *   Some plugins need to be updated:
> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> > files.
> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> > existing checkstyle rules file can’t be loaded with newer versions. We
> > can exclude module-info.java for now and have a separate Issue for
> > updating checkstyle itself and the rules file.
> >   *   grpc-java could not be modularized when the PR above was written.
> >      *   Grpc 1.57 now can be modularized
> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >
> > From: David Dali Susanibar Arce <da...@gmail.com>
> > Date: Wednesday, May 25, 2022 at 5:02 AM
> > To: dev@arrow.apache.org <de...@arrow.apache.org>
> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> System
> > Hi All,
> >
> > This email's purpose is a request for comments to migrate Arrow Java to
> JPMS
> > Java Platform Module System <
> https://openjdk.java.net/projects/jigsaw/spec/>
> > JSE 9+ (1).
> >
> > Current status:
> >
> > - Arrow Java use JSE1.8 specification
> >
> > - Arrow Java works with JSE1.8/9/11/17
> >
> > - This is possible because Java offers “legacy mode”
> >
> > Proposal:
> >
> > Migrate to JPMS Java Platform Module System. This Draft PR
> > <https://github.com/apache/arrow/pull/13072>(2<
> https://github.com/apache/arrow/pull/13072%3e(2<
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> <
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)
>
> > contains an initial port of
> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> > Vector
> > for evaluation.
> >
> > Main Reason to migrate:
> >
> > - JPMS offer Strong encapsulation, Well-defined interfaces
> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> dependencies.
> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >
> > - JPMS offer reliable configuration and security to hide platform
> internals.
> >
> > - JPMS offers a partial solution to solve problems about read (80%)
> /write
> > (20%) code.
> >
> > - JPMS offer optimization for readability about read/write ratio (90/10)
> > thru module-info.java.
> >
> > - Consistency logs, JPMS implement consistency logs to really use that to
> > solve the current problem.
> >
> > - Be able to customize JRE needed with only modules needed (not
> > java.desktop for example and others) thru JLink.
> >
> > - Modules have also been implemented by other languages such as
> Javascript
> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >
> > - Consider taking a look at this discussion about pros/cons
> > <
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > (5).
> >
> > - Eventual migration to JPMS is a practical necessity as more projects
> > migrate.
> >
> > Effort:
> >
> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> to
> > offer support for both jar components JSE1.8 and JSE9+ included.
> >
> > - Go bottom up for JPMS.
> >
> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> > io.netty.buffer). Review Draft PR with initial proposal.
> >
> > - Dependencies also need to be modularized. If some of our current
> > dependencies are not able to be used as a module this will be a blocker
> for
> > our modules (we could patch that but this is an extra effort).
> >
> > Killers:
> >
> > - FIXME! I need your support to identify killer reasons to be able to
> push
> > this implementation.
> >
> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> > needed and should be implemented.
> >
> > Please use this file for any comments
> >
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >
> > Resources used:
> >
> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >
> > (2): https://github.com/apache/arrow/pull/13072
> >
> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >
> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >
> > (5):
> >
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > Best regards,
> >
> > --
> > David
>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by David Li <li...@apache.org>.
Thanks James for delving into this mess.

It looks like this change is unavoidable if we want to modularize? I think this is OK. Will the CLI argument change as we continue modularizing, or is this the only change that will be needed?

On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> Hello,
>
> I did some work to separate the below PR into smaller PRs.
>
>
>   *   Updating the versions of dependencies and maven plugins is done 
> and merged into master.
>   *   I separated out the work modularizing arrow-vector, 
> arrow-memory-core/unsafe, and arrow-memory-netty.
>
> Modularizing arrow-memory-core requires a smaller change to user 
> command-line arguments. Instead of:
> --add-opens=java.base/java.nio=ALL-UNNAMED
>
> The user needs to add:
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
>
> I initially tried to modularize arrow-vector separately from 
> arrow-memory-core but found that any meaningful operation in 
> arrow-vector would trigger an illegal access in memory-core if it 
> wasn’t modularized.
>
> I was able to run the tests for arrow-compression and arrow-tools 
> successfully after modularizing memory-core, memory-unsafe-, and 
> arrow-vector. Note that I had more success by making memory-core and 
> memory-unsafe automatic modules.
>
> I think we should make a decision here on if we want to bite the bullet 
> and introduce a breaking user-facing change around command-line 
> options. The other option is to wait for JDK 21 to modularize. That’s 
> farther down the line and requires refactoring much of the memory 
> module code and implementing a module using the foreign memory 
> interface.
>
> From: James Duong <Ja...@improving.com.INVALID>
> Date: Tuesday, November 28, 2023 at 6:48 PM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Hi,
>
> I’ve made some major progress on this work in this PR:
> https://github.com/apache/arrow/pull/38876
>
>
>   *   The maven plugin for compiling module-info.java files using JDK 8 
> is working correctly.
>   *   arrow-format, arrow-memory-core, arrow-memory-netty, 
> arrow-memory-unsafe, and arrow-vector have been modularized 
> successfully.
>      *   Tests pass locally for all of these modules.
>      *   They fail in CI. This is likely from me not updating a profile 
> somewhere.
>
> Similar to David’s PR from below, arrow-memory and modules needed to be 
> refactored fairly significantly and split into two modules: a 
> public-facing JPMS module and a separate module which adds to Netty’s 
> packages (memory-netty-buffer-patch). What’s more problematic is that 
> because we are using named modules now, users need to add more 
> arguments to their Java command line to use arrow. If one were to use 
> arrow-memory-netty they would need to add the following:
>
> --add-opens java.base/jdk.internal.misc=io.netty.common 
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar 
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
>
> Depending on where the memory-netty-buffer-patch JAR is located, and 
> what version, the command the user needs to supply changes, so this 
> seems like it’d be really inconvenient.
>
> Do we want to proceed with modularizing existing memory modules? Both 
> netty and unsafe? Or wait until the new memory module from Java 21 is 
> available?
>
> The module-info.java files are written fairly naively. I haven’t 
> inspected thoroughly to determine what packages users will need.
>
> We can continue modularizing more components in a separate PR. Ideally 
> all the user breakage (class movement, new command-line argument 
> requirements) happens within one major Arrow version.
>
> From: James Duong <Ja...@improving.com.INVALID>
> Date: Tuesday, November 21, 2023 at 1:16 PM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I’m following up on this topic.
>
> David has a PR from last year that’s done much of the heavy lifting for 
> refactoring the codebase to be package-friendly.
> https://github.com/apache/arrow/pull/13072
>
> What’s changed since and what’s left:
>
>   *   New components have been added (Flight SQL for example) that will 
> need to be updated for modules.
>   *   There wasn’t a clear solution on how to do this without breaking 
> JDK 8 support. Compiling module-info.java files require using JDK9, but 
> using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
>      *   There is a Gradle plugin that can compile module-info.java 
> files purely syntactically that we can adapt to maven. It has 
> limitations (the one I see is that it can’t iterate through 
> classloaders to handle annotations), but using this might be a good 
> stopgap until we JDK 8 support is deprecated.
>   *   Some plugins need to be updated:
>      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class 
> files.
>      *   checkstyle 3.1.0 can’t parse module-info.java files. Our 
> existing checkstyle rules file can’t be loaded with newer versions. We 
> can exclude module-info.java for now and have a separate Issue for 
> updating checkstyle itself and the rules file.
>   *   grpc-java could not be modularized when the PR above was written.
>      *   Grpc 1.57 now can be modularized 
> (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
>
> From: David Dali Susanibar Arce <da...@gmail.com>
> Date: Wednesday, May 25, 2022 at 5:02 AM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
> Hi All,
>
> This email's purpose is a request for comments to migrate Arrow Java to JPMS
> Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
> JSE 9+ (1).
>
> Current status:
>
> - Arrow Java use JSE1.8 specification
>
> - Arrow Java works with JSE1.8/9/11/17
>
> - This is possible because Java offers “legacy mode”
>
> Proposal:
>
> Migrate to JPMS Java Platform Module System. This Draft PR
> <https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>) 
> contains an initial port of
> the modules: Format / Memory Core / Memory Netty / Memory Unsafe / 
> Vector
> for evaluation.
>
> Main Reason to migrate:
>
> - JPMS offer Strong encapsulation, Well-defined interfaces
> <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
> <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
>
> - JPMS offer reliable configuration and security to hide platform internals.
>
> - JPMS offers a partial solution to solve problems about read (80%) /write
> (20%) code.
>
> - JPMS offer optimization for readability about read/write ratio (90/10)
> thru module-info.java.
>
> - Consistency logs, JPMS implement consistency logs to really use that to
> solve the current problem.
>
> - Be able to customize JRE needed with only modules needed (not
> java.desktop for example and others) thru JLink.
>
> - Modules have also been implemented by other languages such as Javascript
> (ES2015), C++(C++20), Net (Nuget/NetCore)..
>
> - Consider taking a look at this discussion about pros/cons
> <https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
> (5).
>
> - Eventual migration to JPMS is a practical necessity as more projects
> migrate.
>
> Effort:
>
> - First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
> offer support for both jar components JSE1.8 and JSE9+ included.
>
> - Go bottom up for JPMS.
>
> - Packages need to be unique (i.e. org.apache.arrow.memory /
> io.netty.buffer). Review Draft PR with initial proposal.
>
> - Dependencies also need to be modularized. If some of our current
> dependencies are not able to be used as a module this will be a blocker for
> our modules (we could patch that but this is an extra effort).
>
> Killers:
>
> - FIXME! I need your support to identify killer reasons to be able to push
> this implementation.
>
> Please let us know if Arrow Java to JPMS Java Platform Module System is
> needed and should be implemented.
>
> Please use this file for any comments
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
>
> Resources used:
>
> (1): https://openjdk.java.net/projects/jigsaw/spec/
>
> (2): https://github.com/apache/arrow/pull/13072
>
> (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
>
> (4): https://github.com/nipafx/demo-jigsaw-reflection
>
> (5):
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>
> Best regards,
>
> --
> David

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
Hello,

I did some work to separate the below PR into smaller PRs.


  *   Updating the versions of dependencies and maven plugins is done and merged into master.
  *   I separated out the work modularizing arrow-vector, arrow-memory-core/unsafe, and arrow-memory-netty.

Modularizing arrow-memory-core requires a smaller change to user command-line arguments. Instead of:
--add-opens=java.base/java.nio=ALL-UNNAMED

The user needs to add:
--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED

I initially tried to modularize arrow-vector separately from arrow-memory-core but found that any meaningful operation in arrow-vector would trigger an illegal access in memory-core if it wasn’t modularized.

I was able to run the tests for arrow-compression and arrow-tools successfully after modularizing memory-core, memory-unsafe-, and arrow-vector. Note that I had more success by making memory-core and memory-unsafe automatic modules.

I think we should make a decision here on if we want to bite the bullet and introduce a breaking user-facing change around command-line options. The other option is to wait for JDK 21 to modularize. That’s farther down the line and requires refactoring much of the memory module code and implementing a module using the foreign memory interface.

From: James Duong <Ja...@improving.com.INVALID>
Date: Tuesday, November 28, 2023 at 6:48 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi,

I’ve made some major progress on this work in this PR:
https://github.com/apache/arrow/pull/38876


  *   The maven plugin for compiling module-info.java files using JDK 8 is working correctly.
  *   arrow-format, arrow-memory-core, arrow-memory-netty, arrow-memory-unsafe, and arrow-vector have been modularized successfully.
     *   Tests pass locally for all of these modules.
     *   They fail in CI. This is likely from me not updating a profile somewhere.

Similar to David’s PR from below, arrow-memory and modules needed to be refactored fairly significantly and split into two modules: a public-facing JPMS module and a separate module which adds to Netty’s packages (memory-netty-buffer-patch). What’s more problematic is that because we are using named modules now, users need to add more arguments to their Java command line to use arrow. If one were to use arrow-memory-netty they would need to add the following:

--add-opens java.base/jdk.internal.misc=io.netty.common --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED

Depending on where the memory-netty-buffer-patch JAR is located, and what version, the command the user needs to supply changes, so this seems like it’d be really inconvenient.

Do we want to proceed with modularizing existing memory modules? Both netty and unsafe? Or wait until the new memory module from Java 21 is available?

The module-info.java files are written fairly naively. I haven’t inspected thoroughly to determine what packages users will need.

We can continue modularizing more components in a separate PR. Ideally all the user breakage (class movement, new command-line argument requirements) happens within one major Arrow version.

From: James Duong <Ja...@improving.com.INVALID>
Date: Tuesday, November 21, 2023 at 1:16 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
I’m following up on this topic.

David has a PR from last year that’s done much of the heavy lifting for refactoring the codebase to be package-friendly.
https://github.com/apache/arrow/pull/13072

What’s changed since and what’s left:

  *   New components have been added (Flight SQL for example) that will need to be updated for modules.
  *   There wasn’t a clear solution on how to do this without breaking JDK 8 support. Compiling module-info.java files require using JDK9, but using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
     *   There is a Gradle plugin that can compile module-info.java files purely syntactically that we can adapt to maven. It has limitations (the one I see is that it can’t iterate through classloaders to handle annotations), but using this might be a good stopgap until we JDK 8 support is deprecated.
  *   Some plugins need to be updated:
     *   maven-dependency-plugin 3.0.1 can’t parse module-info.class files.
     *   checkstyle 3.1.0 can’t parse module-info.java files. Our existing checkstyle rules file can’t be loaded with newer versions. We can exclude module-info.java for now and have a separate Issue for updating checkstyle itself and the rules file.
  *   grpc-java could not be modularized when the PR above was written.
     *   Grpc 1.57 now can be modularized (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)

From: David Dali Susanibar Arce <da...@gmail.com>
Date: Wednesday, May 25, 2022 at 5:02 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi All,

This email's purpose is a request for comments to migrate Arrow Java to JPMS
Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
JSE 9+ (1).

Current status:

- Arrow Java use JSE1.8 specification

- Arrow Java works with JSE1.8/9/11/17

- This is possible because Java offers “legacy mode”

Proposal:

Migrate to JPMS Java Platform Module System. This Draft PR
<https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>) contains an initial port of
the modules: Format / Memory Core / Memory Netty / Memory Unsafe / Vector
for evaluation.

Main Reason to migrate:

- JPMS offer Strong encapsulation, Well-defined interfaces
<https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
<https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)

- JPMS offer reliable configuration and security to hide platform internals.

- JPMS offers a partial solution to solve problems about read (80%) /write
(20%) code.

- JPMS offer optimization for readability about read/write ratio (90/10)
thru module-info.java.

- Consistency logs, JPMS implement consistency logs to really use that to
solve the current problem.

- Be able to customize JRE needed with only modules needed (not
java.desktop for example and others) thru JLink.

- Modules have also been implemented by other languages such as Javascript
(ES2015), C++(C++20), Net (Nuget/NetCore)..

- Consider taking a look at this discussion about pros/cons
<https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
(5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.

Effort:

- First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
offer support for both jar components JSE1.8 and JSE9+ included.

- Go bottom up for JPMS.

- Packages need to be unique (i.e. org.apache.arrow.memory /
io.netty.buffer). Review Draft PR with initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies are not able to be used as a module this will be a blocker for
our modules (we could patch that but this is an extra effort).

Killers:

- FIXME! I need your support to identify killer reasons to be able to push
this implementation.

Please let us know if Arrow Java to JPMS Java Platform Module System is
needed and should be implemented.

Please use this file for any comments
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5):
https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

--
David

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
Hi,

I’ve made some major progress on this work in this PR:
https://github.com/apache/arrow/pull/38876


  *   The maven plugin for compiling module-info.java files using JDK 8 is working correctly.
  *   arrow-format, arrow-memory-core, arrow-memory-netty, arrow-memory-unsafe, and arrow-vector have been modularized successfully.
     *   Tests pass locally for all of these modules.
     *   They fail in CI. This is likely from me not updating a profile somewhere.

Similar to David’s PR from below, arrow-memory and modules needed to be refactored fairly significantly and split into two modules: a public-facing JPMS module and a separate module which adds to Netty’s packages (memory-netty-buffer-patch). What’s more problematic is that because we are using named modules now, users need to add more arguments to their Java command line to use arrow. If one were to use arrow-memory-netty they would need to add the following:

--add-opens java.base/jdk.internal.misc=io.netty.common --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED

Depending on where the memory-netty-buffer-patch JAR is located, and what version, the command the user needs to supply changes, so this seems like it’d be really inconvenient.

Do we want to proceed with modularizing existing memory modules? Both netty and unsafe? Or wait until the new memory module from Java 21 is available?

The module-info.java files are written fairly naively. I haven’t inspected thoroughly to determine what packages users will need.

We can continue modularizing more components in a separate PR. Ideally all the user breakage (class movement, new command-line argument requirements) happens within one major Arrow version.

From: James Duong <Ja...@improving.com.INVALID>
Date: Tuesday, November 21, 2023 at 1:16 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
I’m following up on this topic.

David has a PR from last year that’s done much of the heavy lifting for refactoring the codebase to be package-friendly.
https://github.com/apache/arrow/pull/13072

What’s changed since and what’s left:

  *   New components have been added (Flight SQL for example) that will need to be updated for modules.
  *   There wasn’t a clear solution on how to do this without breaking JDK 8 support. Compiling module-info.java files require using JDK9, but using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
     *   There is a Gradle plugin that can compile module-info.java files purely syntactically that we can adapt to maven. It has limitations (the one I see is that it can’t iterate through classloaders to handle annotations), but using this might be a good stopgap until we JDK 8 support is deprecated.
  *   Some plugins need to be updated:
     *   maven-dependency-plugin 3.0.1 can’t parse module-info.class files.
     *   checkstyle 3.1.0 can’t parse module-info.java files. Our existing checkstyle rules file can’t be loaded with newer versions. We can exclude module-info.java for now and have a separate Issue for updating checkstyle itself and the rules file.
  *   grpc-java could not be modularized when the PR above was written.
     *   Grpc 1.57 now can be modularized (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)

From: David Dali Susanibar Arce <da...@gmail.com>
Date: Wednesday, May 25, 2022 at 5:02 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi All,

This email's purpose is a request for comments to migrate Arrow Java to JPMS
Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
JSE 9+ (1).

Current status:

- Arrow Java use JSE1.8 specification

- Arrow Java works with JSE1.8/9/11/17

- This is possible because Java offers “legacy mode”

Proposal:

Migrate to JPMS Java Platform Module System. This Draft PR
<https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>) contains an initial port of
the modules: Format / Memory Core / Memory Netty / Memory Unsafe / Vector
for evaluation.

Main Reason to migrate:

- JPMS offer Strong encapsulation, Well-defined interfaces
<https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
<https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)

- JPMS offer reliable configuration and security to hide platform internals.

- JPMS offers a partial solution to solve problems about read (80%) /write
(20%) code.

- JPMS offer optimization for readability about read/write ratio (90/10)
thru module-info.java.

- Consistency logs, JPMS implement consistency logs to really use that to
solve the current problem.

- Be able to customize JRE needed with only modules needed (not
java.desktop for example and others) thru JLink.

- Modules have also been implemented by other languages such as Javascript
(ES2015), C++(C++20), Net (Nuget/NetCore)..

- Consider taking a look at this discussion about pros/cons
<https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
(5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.

Effort:

- First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
offer support for both jar components JSE1.8 and JSE9+ included.

- Go bottom up for JPMS.

- Packages need to be unique (i.e. org.apache.arrow.memory /
io.netty.buffer). Review Draft PR with initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies are not able to be used as a module this will be a blocker for
our modules (we could patch that but this is an extra effort).

Killers:

- FIXME! I need your support to identify killer reasons to be able to push
this implementation.

Please let us know if Arrow Java to JPMS Java Platform Module System is
needed and should be implemented.

Please use this file for any comments
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5):
https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

--
David

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

Posted by James Duong <Ja...@improving.com.INVALID>.
I’m following up on this topic.

David has a PR from last year that’s done much of the heavy lifting for refactoring the codebase to be package-friendly.
https://github.com/apache/arrow/pull/13072

What’s changed since and what’s left:

  *   New components have been added (Flight SQL for example) that will need to be updated for modules.
  *   There wasn’t a clear solution on how to do this without breaking JDK 8 support. Compiling module-info.java files require using JDK9, but using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
     *   There is a Gradle plugin that can compile module-info.java files purely syntactically that we can adapt to maven. It has limitations (the one I see is that it can’t iterate through classloaders to handle annotations), but using this might be a good stopgap until we JDK 8 support is deprecated.
  *   Some plugins need to be updated:
     *   maven-dependency-plugin 3.0.1 can’t parse module-info.class files.
     *   checkstyle 3.1.0 can’t parse module-info.java files. Our existing checkstyle rules file can’t be loaded with newer versions. We can exclude module-info.java for now and have a separate Issue for updating checkstyle itself and the rules file.
  *   grpc-java could not be modularized when the PR above was written.
     *   Grpc 1.57 now can be modularized (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)

From: David Dali Susanibar Arce <da...@gmail.com>
Date: Wednesday, May 25, 2022 at 5:02 AM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi All,

This email's purpose is a request for comments to migrate Arrow Java to JPMS
Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
JSE 9+ (1).

Current status:

- Arrow Java use JSE1.8 specification

- Arrow Java works with JSE1.8/9/11/17

- This is possible because Java offers “legacy mode”

Proposal:

Migrate to JPMS Java Platform Module System. This Draft PR
<https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2>) contains an initial port of
the modules: Format / Memory Core / Memory Netty / Memory Unsafe / Vector
for evaluation.

Main Reason to migrate:

- JPMS offer Strong encapsulation, Well-defined interfaces
<https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
<https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)

- JPMS offer reliable configuration and security to hide platform internals.

- JPMS offers a partial solution to solve problems about read (80%) /write
(20%) code.

- JPMS offer optimization for readability about read/write ratio (90/10)
thru module-info.java.

- Consistency logs, JPMS implement consistency logs to really use that to
solve the current problem.

- Be able to customize JRE needed with only modules needed (not
java.desktop for example and others) thru JLink.

- Modules have also been implemented by other languages such as Javascript
(ES2015), C++(C++20), Net (Nuget/NetCore)..

- Consider taking a look at this discussion about pros/cons
<https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
(5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.

Effort:

- First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
offer support for both jar components JSE1.8 and JSE9+ included.

- Go bottom up for JPMS.

- Packages need to be unique (i.e. org.apache.arrow.memory /
io.netty.buffer). Review Draft PR with initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies are not able to be used as a module this will be a blocker for
our modules (we could patch that but this is an extra effort).

Killers:

- FIXME! I need your support to identify killer reasons to be able to push
this implementation.

Please let us know if Arrow Java to JPMS Java Platform Module System is
needed and should be implemented.

Please use this file for any comments
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5):
https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

--
David