Posted to issues@lucene.apache.org by "rmuir (via GitHub)" <gi...@apache.org> on 2023/05/17 00:30:13 UTC

[GitHub] [lucene] rmuir opened a new issue, #12302: vector API integration, plan B

rmuir opened a new issue, #12302:
URL: https://github.com/apache/lucene/issues/12302

   ### Description
   
   For years we have explored using the vector api to actually take advantage of SIMD units on the hardware. A couple of approaches have been attempted so far:
   1. try to coerce the hotspot superword autovectorization into giving better results: I think @jpountz may know details of this, and it is currently limited to postings list decode, attempting to use 64-bit long. It is better than nothing, but not good enough, especially for stuff like vector encoding and other bottlenecks.
   2. vectorize code and hope that the openjdk feature will "graduate" soon. This is not happening. Java is dying and becoming the next COBOL, in my opinion, as a result of ignoring the problem and delaying until "perfection". For evidence, simply look at vector api being in "6th incubation" with really no api changes happening, just waiting on other features (some wet dream project valhalla or whatever) which will prolly never land before we retire.
   3. hackedy-häcks: this is stuff to bypass the problem, such as the prototype I wrote in frustration *two years ago* here: https://github.com/apache/lucene/pull/18 . It is dirty and insecure but it demonstrates we can potentially make stuff easier on the user and take advantage of the hardware.
   
   Currently, unless the user is extremely technical, they can't make use of the vector support their hardware has, which is terribly sad. If they have immense resources/funding/etc, they can fork lucene and patch the source code, and maintain a fork, hooking in incubator openjdk stuff, but that's too hard on users.
   
   I think we have to draw a line in the sand: basically we cannot rely upon openjdk to be managed as a performant project, their decisions make no sense, so we have to play a little less nicely and apply some hacks! Otherwise, give up and switch to a different programming language with better perf!
   
   So I'd suggest to look at the work @uschindler has done with mmap and the preview apis, and let's carve out a path where we use the vector api *IFF* the user opts in via the command-line.
   
   Proposal (depends entirely upon user's jdk version and supported flags): 
   1. user does nothing and runs lucene without special flags: they get a warning message logged to the console (once!) telling them they need to add some stuff to the command line for best performance, something such as "vector falling back to scalar implementation: please add --add-modules .x.y.z...."
   2. user supplies that command-line argument and lucene is faster and uses the correct incubating vector api associated with their jdk version.
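
A minimal sketch of the two cases above, under stated assumptions: the class and method names are hypothetical, and the module name `jdk.incubator.vector` is taken from the JDK's Vector API JEPs rather than from this issue. It only shows the opt-in check and the warn-once behaviour, not any real Lucene code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: use a vectorized path only if the user opted in on the command line. */
final class VectorOptIn {
  // Assumption of this sketch: the incubating Vector API lives in this JDK module.
  static final String VECTOR_MODULE = "jdk.incubator.vector";

  private static final AtomicBoolean WARNED = new AtomicBoolean();

  /** True if the incubator module was resolved into the boot layer, e.g. via --add-modules. */
  static boolean vectorModuleEnabled() {
    return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
  }

  /** Prints the fallback hint at most once per JVM; returns true only for the call that printed. */
  static boolean warnOnce() {
    if (WARNED.compareAndSet(false, true)) {
      System.err.println(
          "WARNING: vector falling back to scalar implementation; add --add-modules "
              + VECTOR_MODULE
              + " to the command line for best performance");
      return true;
    }
    return false;
  }

  /** Returns "vector" when the user opted in, otherwise warns once and returns "scalar". */
  static String selectImplementation() {
    if (vectorModuleEnabled()) {
      return "vector";
    }
    warnOnce();
    return "scalar";
  }

  public static void main(String[] args) {
    System.out.println("selected: " + selectImplementation());
  }
}
```

Running this with and without `--add-modules jdk.incubator.vector` should flip the selected path without any code change, which is the point of the proposal.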
   
   Actually, I think the system @uschindler developed is the correct design for this; the only trick is that the incubating api is more difficult than the preview api. So we need more build system support: it could require more stuff to be downloaded, or the build to be slower. But I think it's the right decision? 
   
   We don't want to have base64-encoded hackiness that is hard to maintain; at the same time, we need to give users the option to opt in to actually making use of their hardware. I think we should suffer the complexity to make this easy on them. It fucking sucks that openjdk makes this almost impossible, but we need to do it for our users. That's what being a library is all about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553609151

   > I'm not a commiter. Should I fork your fork and do PRs on it? And is [this the branch](https://github.com/ChrisHegarty/lucene/tree/panama_vector) we should base my work on?
   
   It is the correct branch. I would just send @ChrisHegarty a pull request if you have contributions. I am a committer but might do the same thing myself if I have some time to hack on it soon. Pull requests are nice since we can review each other's changes.




[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554052266

   I don't understand what your intention is. The general setup is there. No infrastructure work is needed anymore. The decision about which variants of the code live in parallel is unrelated to this PR.
   
   Please keep in mind that not all code in Lucene can use those APIs at the moment, because vector-specific APIs can't be part of the Lucene public API. So the pattern for other places would be similar to this one: implement all algorithms with a simple API in the provider and call it from VectorUtil. More complex integrations like postings lists may need more thought.
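
The provider pattern described above could be sketched roughly as follows. All names here (`VectorOps`, `ScalarVectorOps`, `VectorFacade`) are hypothetical stand-ins, not the classes on the branch; the sketch only illustrates that the public entry point exposes arrays and primitives while the implementation behind it can be swapped.

```java
/** Hypothetical provider interface: only primitives and arrays, so no Vector API types leak out. */
interface VectorOps {
  float dotProduct(float[] a, float[] b);
}

/** Scalar fallback that works on any JDK. */
final class ScalarVectorOps implements VectorOps {
  @Override
  public float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}

/** Facade in the spirit of VectorUtil: callers see the same API whichever provider is active. */
final class VectorFacade {
  // A real setup would pick a JDK-specific provider here; this sketch always uses the scalar one.
  private static final VectorOps OPS = new ScalarVectorOps();

  private VectorFacade() {}

  static float dotProduct(float[] a, float[] b) {
    return OPS.dotProduct(a, b);
  }
}
```

Because callers only ever touch the facade, a vectorized provider for a given JDK release can be added or dropped without changing any public signature.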
   
   The amount of work for keeping support for various Java versions depends on the number of implementations in the provider and how complex the glue code to Lucene is. The current vector dot product is so simple that you could duplicate the code to support various versions.
   
   But for now let's start with Java 20. The vector code in Java 17 is outdated and has several performance traps. Memory mapping using segments was added to Lucene in 19. So let's not go back before that point in time.
   
   As Robert said: in contrast to mmap, which is a preview API and FULLY supported by Lucene, the incubator code requires users to add command-line flags and the APIs are still in flux, so I would only keep rather recent versions.
   
   We do not need to make any guarantee about which versions are supported. If a specific version falls out of our support matrix, the code will fall back to the implementation that's already there. You can be sure that we will make no official guarantees. It may change with every Lucene release. It will not break for anybody, just get slower, so no support guarantees are needed. We are an open source project, not a commercial project!
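
The fallback behaviour described above can be illustrated with a small sketch. The class name, the returned labels, and the contents of the support set are hypothetical; the single entry 20 only mirrors the suggestion in this thread to start with Java 20.

```java
import java.util.Set;

/** Hypothetical sketch of a support matrix with a silent scalar fallback. */
final class SupportMatrix {
  // Assumption: only Java 20 has a vectorized implementation; this may change with every release.
  static final Set<Integer> SUPPORTED_FEATURE_VERSIONS = Set.of(20);

  /** Picks the vectorized variant only for supported JDKs with the incubator module enabled. */
  static String implementationFor(int featureVersion, boolean incubatorEnabled) {
    if (incubatorEnabled && SUPPORTED_FEATURE_VERSIONS.contains(featureVersion)) {
      return "vectorized-java" + featureVersion;
    }
    // Anything outside the matrix keeps working, just slower.
    return "scalar";
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    boolean enabled = ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    System.out.println("Java " + feature + " -> " + implementationFor(feature, enabled));
  }
}
```

Dropping a version from the set changes performance only, never correctness, which is why no support guarantee is needed.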




[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552872873

   yes, as nothing has been done yet :) I personally plan to start a branch if nobody beats me to it, but i'm stuck in a conference and without bandwidth this week.
   
   You can get an idea of what it takes to add support for every JDK release here using Uwe's mmap setup: https://github.com/apache/lucene/pull/12294
   
   It is for a preview api, but i think most of the logic applies. maybe the only difference here being at "read time" where we may have to load the class differently and use methodhandle guard (like the crappy prototype i posted in the description). But we can maintain .java files and not use base64! :)
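
As a rough illustration of loading the class at "read time" with a method handle, here is a sketch under stated assumptions: `GuardedLoader` and `loadOrFallback` are hypothetical names, and the real prototype may guard things quite differently. The idea is that an implementation class compiled against the incubator API is only looked up by name, so a missing class or module leads to the fallback instead of a crash.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

/** Hypothetical sketch: instantiate an implementation by name, or fall back if it cannot be loaded. */
final class GuardedLoader {
  private GuardedLoader() {}

  static Object loadOrFallback(String className, Supplier<Object> fallback) {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      // Resolved by name only, so this class has no compile-time link to the implementation.
      Class<?> cls = lookup.findClass(className);
      MethodHandle ctor = lookup.findConstructor(cls, MethodType.methodType(void.class));
      return ctor.invoke();
    } catch (ReflectiveOperationException | LinkageError e) {
      // Class missing, not accessible, or its dependencies (e.g. an incubator module) absent.
      return fallback.get();
    } catch (Throwable t) {
      throw new IllegalStateException("constructor of " + className + " failed", t);
    }
  }

  public static void main(String[] args) {
    Object impl = loadOrFallback("no.such.VectorImpl", () -> "scalar fallback");
    System.out.println(impl);
  }
}
```

The implementation class itself can then stay an ordinary `.java` file compiled separately, with no base64-encoded bytecode involved.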
   
   There are also some relevant special jenkins build jobs and stuff and yeah, we should be testing against EA releases if possible to stay current, just like Uwe does for mmap.
   
   




[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551026488

   > Of course we can also include the implementation classes into the main JAR file.
   
   To be clear this is what I propose, not making things pluggable. That's not related and not what I want. I want it to just work, similar to mmap. 
   
   Also we could use the approach for the postings and possibly docvalues decode. 




[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551365070

   This does not look too complex: https://github.com/apache/lucene/blob/main/gradle/java/memorysegment-mrjar.gradle




[GitHub] [lucene] nknize commented on issue #12302: vector API integration, plan B

Posted by "nknize (via GitHub)" <gi...@apache.org>.
nknize commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551636108

   > 2\. Java is dying and becoming the next COBOL
   
   😆
   
   
   > let's carve out a path where we use the vector api IFF the user opts in via the command-line.
   
   YES!!! This is timely and the [heart of my comment on the proposal to increase KNN dimensionality ](https://github.com/apache/lucene/issues/11507#issuecomment-1548612414). 
   
   > Actually the system @uschindler developed I think is the correct design for this, .... we need more build system support, it could require more stuff to be downloaded or build to be slower. But I think its the right decision?
   
   💯 agree @uschindler low friction to enable preview is exactly the mechanism I was thinking about in my post. I'm not a gradle expert (ping @markrmiller ) but I think gradle's java toolchain allows for exactly this scenario. We did it for enabling preview features on OpenSearch, as did Elasticsearch so why can't we use this path in the upstream lucene project? Thank you for raising this proposal @rmuir 
   
   > ...and lucene is faster and uses correct incubating vector api 
   
   One quick question, though. What does this look like in practice? Do we create separate classes in `sandbox` to keep this isolated? Are you suggesting introducing a new experimental `vector` module w/ vector encapsulated logic? Or are you suggesting sprinkling `if (runtimeVersion >= 19 && runtimeVersion <= 21) {` logic around the existing vector implementations (e.g., `VectorUtil` and KNNVector stuff)? Or a mix of the above? Or is that what we're here to brainstorm?
   




[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551698369

   > > 2. Java is dying and becoming the next COBOL
   > 
   > 😆
   > 
   > > let's carve out a path where we use the vector api IFF the user opts in via the command-line.
   > 
   > YES!!! This is timely and the [heart of my comment on the proposal to increase KNN dimensionality ](https://github.com/apache/lucene/issues/11507#issuecomment-1548612414) (_is it time we explore a new optional Lucene Vector module that supports cutting edge JDK features through gradle tooling for optimizing the vector use case?_).
   > 
   > > Actually the system @uschindler developed I think is the correct design for this, .... we need more build system support, it could require more stuff to be downloaded or build to be slower. But I think its the right decision?
   > 
   > 💯 agree @uschindler low friction to enable preview is exactly the mechanism I was thinking about in my post. I'm not a gradle expert (ping @markrmiller ) but I think gradle's java toolchain allows for this type of scenario? We did it for enabling preview features on OpenSearch, as did Elasticsearch so why can't we use this path in the upstream lucene project? Thank you for raising this proposal @rmuir
   > 
   > > ...and lucene is faster and uses correct incubating vector api
   > 
   > One quick question, though. What does this look like in practice? Do we create separate classes in `sandbox` to keep this isolated? Are you suggesting introducing a new experimental `vector` module w/ vector encapsulated logic? Or are you suggesting sprinkling `if (runtimeVersion >= 19 && runtimeVersion <= 21) {` logic around the existing vector implementations (e.g., `VectorUtil` and KNNVector stuff)? Or a mix of the above? Or is that what we're here to brainstorm?
   
   The compilation problem is already solved. No need for toolchains. We tried that already; it's a mess, especially because it does not support EA releases. Read the code referred to above for memory mapping. It works identically for vectors and the incubator module. We compile against stubs.




[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551363152

   > I realize if we do this, it will make the gradle uglier and less elegant. I think there's no way around it, we are re-implementing mr-jar basically. we just have to try to contain the complexity as much as we can... but I really feel this is just something we need to do on behalf of our users.
   
   Actually, when I did this I made it very self-contained. It is just a loop over Java versions and a modified "jar" task copying gradle outputs to a different prefix dir in the JAR and setting the manifest bit.
   
   When we add vectors into the game we may need a bit more separation with build dirs / source sets. We currently have "main19", "main20", "main21", but maybe the names should in future include the type (panama, vector), so we can compile separately.
   
   I am ready to help you, just share when you have a branch.




[GitHub] [lucene] dweiss commented on issue #12302: vector API integration, plan B

Posted by "dweiss (via GitHub)" <gi...@apache.org>.
dweiss commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551497887

   I like this too, especially the part that it basically auto-wires based on whether you enable the corresponding jvm feature on cmd line.




[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1550830310

   Hi @rmuir,
   I agree with your proposals. #12219 looks like a first start: making the implementations "pluggable" allows users to have a custom implementation as a separate jar file / module. This could also be a solution to add an implementation jar for each Java version as a separate Lucene module. For compiling them, we can use exactly the same approach as for MMapDirectory. Of course we can also include the implementation classes into the main JAR file.
   
   About the compilation:
   - yes, we can extract APIJAR stubs for the vector API. I would use separate APIJAR stubs for this, especially as these are different Java modules. The reason is how the compilation with module patching works in my code.
   - actually compiling might be simpler with the APIJAR stubs for incubator modules: they live in a private module hidden by default in the JDK, so the signatures are not visible unless you explicitly add the module to the compiler. So here we can use a very simple approach: just compile the vector classes against the incubator classes without any module patching, and add the APIJAR as a normal dependency. As the package names are unique, there's no problem! This is the main reason why the APIJAR files need to be separate from the foreign ones.
   - We can use the same approach as for MMAP at runtime: use the MR-JAR mechanism to separate the classes and hide them from stupid IDEs (actually the current approach would not need a MR-JAR at all, it is just used to "hide" the Java 19, 20, 21 classes from stupid IDEs). Of course the user has to pass `--add-modules` explicitly. I'd not add the module to module-info, as this would cause a warning for ALL end users.
   - at a later stage when the vector API goes into preview phase, we can merge all this with the current MMAP code.
   




[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553314511

   BTW, and not even remotely suggested - I'm not trying to take over this work, just sketch out a few concrete things to help get it started. Whatever collaboration mechanism is typically used when developing such changes for Lucene should also be used here. How should we best work together on this? Where will the code/branch live?




[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

Posted by "contrebande-labs (via GitHub)" <gi...@apache.org>.
contrebande-labs commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553824444

   From what I can see, the classes/files modified so far are:
   
   * [VectorUtil](https://github.com/ChrisHegarty/lucene/blob/panama_vector/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java)
   * [JDKVectorUtilProvider](https://github.com/ChrisHegarty/lucene/blob/panama_vector/lucene/core/src/java20/org/apache/lucene/util/JDKVectorUtilProvider.java)
   * [panama-foreign.gradle](https://github.com/ChrisHegarty/lucene/blob/panama_vector/gradle/generation/panama-foreign.gradle) and [ExtractForeignAPI](https://github.com/ChrisHegarty/lucene/blob/panama_vector/gradle/generation/panama-foreign/ExtractForeignAPI.java)
   
   Would it make sense for me to port and maintain these changes to either/or [Java 17](https://openjdk.org/jeps/414), [Java 18](https://openjdk.org/jeps/417), [Java 19](https://openjdk.org/jeps/426) .. and [Java 21](https://openjdk.org/jeps/448) based on @ChrisHegarty's original [Java 20](https://openjdk.org/jeps/438) code? Also, should I start a panama-specific CI stack on my Github account with unit tests and benchmarks (I could write these too). Whatever you guys think is the priority...




[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554223493

   ++ to all of what @uschindler and @rmuir said relating to which JDK versions we add support for. Specifically, let's start with JDK 20 **only**. After which we can prepare for, and test with, JDK 21 EA.




[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

Posted by "contrebande-labs (via GitHub)" <gi...@apache.org>.
contrebande-labs commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1554466920

   Guys, relax. My intention is to _**HELP**_. I was trying to find something that can be done in parallel to what @ChrisHegarty is doing so I don't step on his toes. If you want help, let me know. And be specific about it.




[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553402537

   Ok, cool. Committers please commit directly. Consider the branch in my personal fork as our shared place for collaboration. Let me know if you encounter any issues. 




[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

Posted by "contrebande-labs (via GitHub)" <gi...@apache.org>.
contrebande-labs commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552245148

   Is there room for a helping hand here? Tests, benchmarks, continuous integration, documentation, anything?




[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553605796

   I don't think there is any official plan or anything here. Sorry I don't really have any answers.
   
   I don't think there is any rule on which jdk versions are supported. For mmap, the ones supported are the ones that @uschindler supports; he is shouldering all of the maintenance work.
   
   Personally I think, for this one as a start, it is an incubating module and so it is acceptable to only support the very latest jdk release... If someone wants to use the bleeding edge then they must use the bleeding edge. Let's start minimal and try to reduce our maintenance costs.




[GitHub] [lucene] msokolov commented on issue #12302: vector API integration, plan B

Posted by "msokolov (via GitHub)" <gi...@apache.org>.
msokolov commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551254446

   +1 to this - I would be grateful to you all if you are able to get this working. Don't really want to have to die with a sword in my hand to get to the promised land of vector API.




[GitHub] [lucene] ChrisHegarty closed issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty closed issue #12302: vector API integration, plan B
URL: https://github.com/apache/lucene/issues/12302




[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551298163

   I am at a conference this week but I will try to make a prototype, building upon the work Uwe has done, for example using the apijar technique to prevent having to juggle N jvms on every build/developer machine. This means we can maintain `.java` files, hopefully semi-normally, not base64'd bytecode.
   
   I realize if we do this, it will make the gradle uglier and less elegant. I think theres no way around it, we are re-implementing mr-jar basically. we just have to try to contain the complexity as much as we can... but I really feel this is just something we need to do on behalf of our users.


[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551473467

   Correction. 
   
   The incubating Vector API must be loaded as part of the boot layer - it must be added by the command line --add-modules flag **OR _required_ by the implementing module**. Loading into a separate loader is not possible because of the qualified exports from java.base.
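   
   For illustration, here is a minimal sketch of how code could check at runtime whether the incubating module actually ended up in the boot layer (for example because the JVM was started with `--add-modules jdk.incubator.vector`). The class and method names are hypothetical and not part of Lucene; only `ModuleLayer.boot()` and `findModule` are standard JDK APIs.

```java
// Hypothetical helper (not Lucene code): reports whether a named module
// was resolved into the boot layer of the running JVM.
final class BootLayerCheck {
  static final String VECTOR_MODULE = "jdk.incubator.vector";

  private BootLayerCheck() {}

  static boolean isInBootLayer(String moduleName) {
    // findModule returns an empty Optional when the module was not resolved at startup.
    return ModuleLayer.boot().findModule(moduleName).isPresent();
  }

  public static void main(String[] args) {
    System.out.println(VECTOR_MODULE + " in boot layer: " + isInBootLayer(VECTOR_MODULE));
  }
}
```

   A check like this could serve as the guard that decides whether a Vector API based implementation may be loaded at all.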


[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553342721

   Thank you for getting it started: it is great. Let's just iterate forward on your branch?
   
   Personally I've never had an issue collaborating on a fork like this (such as [ChrisHegarty:panama_vector](https://github.com/ChrisHegarty/lucene/tree/panama_vector)). In the worst case anyone can send PRs to you to make changes. I think if you are a committer to Lucene you can probably push directly to the branch without a PR.
   
   You can always create a `panama_vector` branch under github.com/apache/lucene if you prefer that.


[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553375573

   All Lucene committers can commit to your branch. This is enabled by default, and you had to agree to it when creating the PR.


[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553009202

   Next I might refactor `VectorUtil` to be an interface, and have a different implementation for JDK 20, using a guard mechanism similar to the one used for mmap.
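   
   A rough sketch of what such an interface plus guard could look like. Everything here is an assumption for illustration: the names `VectorUtilProvider`, `DefaultVectorUtilProvider`, `VectorUtilProviders` and `Jdk20VectorUtilProvider` are hypothetical, and this is not the actual mmap guard code. The idea is only that the version-specific class is loaded reflectively and a portable implementation is used otherwise.

```java
// Sketch only: all names below are hypothetical, not the actual Lucene classes.
interface VectorUtilProvider {
  float dotProduct(float[] a, float[] b);
}

// Portable fallback that works on every supported JDK.
final class DefaultVectorUtilProvider implements VectorUtilProvider {
  @Override
  public float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}

final class VectorUtilProviders {
  // Assumed name of an implementation compiled against the JDK 20 Vector API.
  static final String JDK20_IMPL = "Jdk20VectorUtilProvider";

  private VectorUtilProviders() {}

  static VectorUtilProvider lookup() {
    // Guard: only try the incubator-based class on the exact JDK feature release
    // it was compiled for, and only if the module is in the boot layer.
    boolean jdk20 = Runtime.version().feature() == 20;
    boolean vectorModule =
        ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    if (jdk20 && vectorModule) {
      try {
        return (VectorUtilProvider)
            Class.forName(JDK20_IMPL).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException | LinkageError e) {
        // Implementation missing or not loadable: fall through to the default.
      }
    }
    return new DefaultVectorUtilProvider();
  }

  public static void main(String[] args) {
    VectorUtilProvider provider = lookup();
    System.out.println(provider.getClass().getSimpleName());
    System.out.println(
        provider.dotProduct(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f}));
  }
}
```

   Because callers only see the interface, the static reference to incubating classes stays confined to the one reflectively loaded implementation.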


[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1550523577

   It goes without saying, but I think there are a couple of obvious hotspots where we'd want to integrate. I realize containing the complexity may be difficult:
   
   1. the `VectorUtil` methods (dotProduct etc). These are the simplest and most self-contained, and the obvious way to prove the approach.
   2. the postings list FOR/PFOR decode.
   3. possibly the docvalues decode.
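   
   As a point of reference for item 1, a scalar dot product can be written with several independent accumulators, which is roughly the shape a SIMD version takes (each accumulator plays the role of a lane). This is an illustrative sketch with a hypothetical class name, not the Lucene `VectorUtil` source. Note that it sums in a different order than a plain loop, so results can differ in the last bits for general float inputs.

```java
// Illustrative sketch (hypothetical class, not Lucene's VectorUtil).
final class DotProductSketch {
  private DotProductSketch() {}

  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    // Four independent partial sums: no iteration depends on the previous one
    // within a group, which mirrors how SIMD lanes accumulate in parallel.
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upper = a.length & ~3; // largest multiple of 4 that is <= length
    for (; i < upper; i += 4) {
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
      acc2 += a[i + 2] * b[i + 2];
      acc3 += a[i + 3] * b[i + 3];
    }
    // Reduce the partial sums, then handle the tail elements one by one.
    float sum = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f, 4f, 5f, 6f, 7f};
    float[] b = {7f, 6f, 5f, 4f, 3f, 2f, 1f};
    System.out.println(dotProduct(a, b));
  }
}
```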
   
   


[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551468645

   I agree with the high-level idea - use the Incubating Vector API internally in Lucene, in a way that is opt-in and also constrained by module/SPI/loading techniques (so as to reduce the static usage to only a small defined set of code). We would very much like to use this in Elasticsearch.
   
   > I am ready to help you, just share when you have a branch.
   
   Same.
   
   Related, but not directly proposed. Unfortunately (or not), the incubating Vector API must be loaded as part of the boot layer - it must be added by the command line `--add-modules` flag. Loading into a separate loader is not possible because of the qualified exports from `java.base`.


[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553232796

   I updated https://github.com/apache/lucene/pull/12311 with a basic `dotProduct(float[], float[])`. Maybe this is a reasonable place to start.


[GitHub] [lucene] contrebande-labs commented on issue #12302: vector API integration, plan B

Posted by "contrebande-labs (via GitHub)" <gi...@apache.org>.
contrebande-labs commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553571328

   > @ChrisHegarty [said](https://github.com/apache/lucene/issues/12302#issuecomment-1553402537) Ok, cool. Committers please commit directly. Consider the branch in my personal fork as our shared place for collaboration. Let me know if you encounter any issues.
   
   I'm not a committer. Should I fork your fork and do PRs on it? And is [this the branch](https://github.com/ChrisHegarty/lucene/tree/panama_vector) I should base my work on? I also think that PRs are good for collaboration. But if someone can maintain a small, workable bounty list somewhere, I think that'd be great! Maybe we can do a voice call to gather and break down all the ideas? Or you can submit your ideas here and I can try to maintain the bounty list myself in a GitHub Kanban/Project on my own fork...
   
   > @rmuir [said](https://github.com/apache/lucene/issues/12302#issuecomment-1552872873): You can get an idea of what it takes to add support for every JDK release here using Uwe's mmap setup
   
   So, it seems like official support for preview/incubating/experimental/JVM-sensitive features will reach back two LTS releases? Therefore, when Java 21 is released, you will maintain experimental extensions available since Java 17. Is this correct? If not, when are you planning to drop support for 19 and 20? And what will be the policy in the future?
   
   
   


[GitHub] [lucene] kwatters commented on issue #12302: vector API integration, plan B

Posted by "kwatters (via GitHub)" <gi...@apache.org>.
kwatters commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1556050926

   For what it's worth, I did an alternative impl of vector utils using nd4j to compute cosine similarity... And shockingly, it was not any faster.  I am very much in favor of having interfaces here so we can experiment with native implementations...


[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551695383

   > Correction.
   > 
   > The incubating Vector API must be loaded as part of the boot layer - it must be added by the command line --add-modules flag **OR explicitly _required_ by the consuming module**. Loading into a separate loader is not possible because of the qualified exports from java.base.
   
   My idea was just about compiling. As said before, you have to enable it via the command line.


[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

Posted by "rmuir (via GitHub)" <gi...@apache.org>.
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1553609725

   Sorry for the chaotic typos/answers, on a phone.


[GitHub] [lucene] ChrisHegarty commented on issue #12302: vector API integration, plan B

Posted by "ChrisHegarty (via GitHub)" <gi...@apache.org>.
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552998982

   To better understand the existing "non-final JDK API stub" mechanism, I quickly put together the small set of changes that we need to get started, which generates the Vector API stubs: https://github.com/apache/lucene/pull/12311. We can use this, or not, but it's just a start. We'll need the same for JDK 21 EA, but that is trivial, and it is needed for Foreign too.
   
   @uschindler The mechanism looks good. As you said, it's compile-time only. If we felt the need, we could choose to move the implementations (in org.apache.lucene.store) to a non-exported package, but that is not strictly needed - I'm just pondering if any runtime restrictions would be beneficial.
   
   

