You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/05/19 09:45:10 UTC

[GitHub] [pinot] gortiz opened a new issue, #10783: Extensible PinotDataBuffer SPI

gortiz opened a new issue, #10783:
URL: https://github.com/apache/pinot/issues/10783

   This is a PEP-Request. I propose to create a extensible SPI that can be used to change the PinotDataBuffer implementation used at runtime. Treat this issue as a clone of https://github.com/apache/pinot/issues/9162, but more formal.
   
   # What needs to be done?
   
   Right now PinotDataBuffer instances are obtained by calling static methods:
   - PinotDataBuffer allocateDirect(long size, ByteOrder byteOrder, String description)
   - PinotDataBuffer loadFile(File file, long offset, long size, ByteOrder byteOrder, String description)
   - PinotDataBuffer mapFile(File file, boolean readOnly, long offset, long size, ByteOrder byteOrder, String description)
   
   These methods have a static implementation that return a PinotDataBuffer that is backed by a ByteBuffer or by a LArray, depending on whether the requested size is greater than 2GBs or not. ByteBuffers are faster and more reliable than LArray, but they cannot be larger than Integer.MAX_INT (aka 2GB - 1byte).
   
   Here I propose to change that implementation. Instead of having a static implementation, the algorithm used to instantiate the PinotDataBuffer will be delegated to an interface. The specific implementation will be set at Pinot startup time by reading the property `pinot.offheap.buffer.factory`, which will be an optional property whose value is a String. In case it exists, it should the qualified class name of a Class that implements `PinotDataBufferFactory`. There will be also another optional property called `pinot.offheap.prioritize.bytebuffer` which will be a boolean. If it is true, then the ByteBuffer implementation will be used when less than 2GB buffers are requested.
   
   # Why the feature is needed 
   
   The reason to be able to have new PinotDataBuffer implementations is that LArray is not compatible with Java >= 16, as detected in #8529. It also seems that our LArray implementation may have some bugs (see https://github.com/apache/pinot/pull/10774) and it doesn't seem that LArray will be updated (See https://github.com/xerial/larray/issues/75), so we need to move on.
   
   # Initial idea/proposal
   
   Adding a SPI is something we have done several times in the last months/years. Most of the times we use one or more configuration properties to decide which instance we need to use and set that instance in a static attribute. That system works, but it is problematic when we want to change the implementation in an isolated way (for example, when doing tests).
   
   That is why I propose to extend this system with a thread local on top of that. Something like:
   
   ```java
     /**
      * The default {@link PinotBufferFactory} used by all threads that do not define their own factory.
      */
     private static PinotBufferFactory _defaultFactory = createDefaultFactory();
     /**
      * A thread local variable that can be used to customize the {@link PinotBufferFactory} used on tests. This is mostly
      * useful in tests.
      */
     private static final ThreadLocal<PinotBufferFactory> _FACTORY = new ThreadLocal<>();
   
     /**
      * Creates the default factory depending on the JVM version
      */
     public static PinotBufferFactory createDefaultFactory() {...}
   
     /**
      * Changes the default factory
      */
     public static void setDefaultFactory(PinotBufferFactory) {...}
   
     /**
      * Change the {@link PinotBufferFactory} used by the current thread.
      *
      * @see #loadDefaultFactory(PinotConfiguration)
      */
     public static void useFactory(PinotBufferFactory factory) {
       _FACTORY.set(factory);
     }
   
     /**
      * Returns the factory the current thread should use.
      */
     public static PinotBufferFactory getFactory() {
       PinotBufferFactory pinotBufferFactory = _FACTORY.get();
       if (pinotBufferFactory == null) {
         pinotBufferFactory = _defaultFactory;
       }
       return pinotBufferFactory;
     }
   ```
   
   Then the static methods PinotDataBuffer.allocateDirect (and others) should delegate on PinotDataBuffer.getFactory(). By doing that, tests can call `PinotDataBuffer.useFactory()` in order to use, on that thread, the buffer library they want to test. Exploratory draft https://github.com/apache/pinot/pull/10528 implements all of these and modifies tests to use this API.
   
   # Actual buffer implementations
   
   This is an optional part of the PEP. What we request is to have the PinotBufferFactory SPI. This is what we would like to do with it, but it could be part of another PEP if necessary.
   
   Given that we cannot use LArray in modern versions of Java, we have three alternatives :
   - Use other third party libraries
   - Implement our own library on top of Unsafe
   - Use Foreign Memory API
   
   I tried to use Chronicle Bytes in the past with partial success, see https://github.com/apache/pinot/pull/9842. Given that I don't know more third party buffer libraries that use long offsets (Netty and Agrona use ints), I explored the last two alternatives in https://github.com/apache/pinot/pull/10528.
   
   Foreign Memory API is clearly the future, as it provides exactly what we need: A high performance buffer API that is maintained (given that will be included in the JVM) and uses longs as offsets and sizes. But it has two issues:
   - It is not included in Java 17
   - It is still in preview mode and it will continue like that in Java 21 (see [this post where one of the author explains the reasons and the changes since Java 20](https://github.com/minborg/articles/tree/jep442/2023/March/22-jep442-FFM-Third-Preview)).
   
   Therefore we have three options:
   1. Wait until it is stable (possible in Java 22 or 23, which won't be LTS, so maybe we need to wait 2 years until Java 25).
   2. Use it as a preview in Java 21. This would imply that we could either run with Java 11+LArray or Java 21+Foreign, but do not support Java 17. Also, in Java 21 we would need to compile and start with --enable-preview.
   3. Create our own library.
   
   Previously mentioned https://github.com/apache/pinot/pull/10528 does create two buffer implementations: One on top of Unsafe and another on top of Foreign Memory API. The latter is trivial to implement, but it is very difficult to include in the current Apache Pinot CI. The problems I found are:
   - Some Maven plugins don't work in Java 21-ea (at least spotless doesn't work)
   - GitHub Actions cannot be configured to use Java 21-ea (see https://github.com/actions/setup-java/issues/492#issuecomment-1551508335).
   - In general, it is difficult to maintain in the same compilation unit code that has to be compiled with Java 11 and code that has to be compiled with Java 21
   - Presto still requires to use Java 8, which makes it even more difficult.
   
   In case we decide to use my draft in the actual implementation of the PEP, I would suggest to remove the Foreign Memory API implementation from the branch and optionally move it to another github repo.
   
   I've also modified BenchmarkPinotDataBuffer in order to run with the Unsafe based implementation. The modifications are in the draft.
   I've run the benchmark in a M1 Pro and in a Ryzen 9 3900X with Ubuntu 22.10. The exact results of the benchmark can be found [here](https://docs.google.com/spreadsheets/d/1xgxR6_mMwq2zhe8ZkpR5X4VKLfvwyTqT6qFZsMkh49E/edit?usp=sharing) but the following chars should be good enough:
   
   Please note that LArray only runs in Java 11, so there are no data with LArray in Java 17 and 21.
   
   This one shows the cost of executing batch writes (aka call PinotDataBuffer.readFrom) and batch read (aka call PinotDataBuffer.copyTo) with a byte array of 1024 elements:
   ![Batch](https://github.com/apache/pinot/assets/1913993/2f5c22fd-a673-4410-ae5f-039f9147bdce)
   
   It is important to note that the implementation used in both LArray and Unsafe when dealing with batch reads and writes is to create a temporal direct byte buffer that points to the same address of the current buffer, so we can assume that the performance difference is due to the new instance creation.
   
   This one shows the cost of executing non batch writes (calling PinotDataBuffer.putByte()) and reads (calling PinotDataBuffer.getByte) in a loop with 1024 consecutive offsets starting from a random one:
   ![No batch](https://github.com/apache/pinot/assets/1913993/1fe73241-8ead-42f0-b5d8-b4fae426af75)
   
   In my opinion the latest is the most interesting, as it shows the improvements introduced in Java 17 and 21 JIT when dealing with loops.
   
   I've also tried to add the Foreign Memory API implementation in the benchmark, but I found several problems so I gave up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "mayankshriv (via GitHub)" <gi...@apache.org>.
mayankshriv commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1555012258

   +1 from me as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1561048683

   I've removed the Foreign Memory API library from https://github.com/apache/pinot/pull/10528 and fixed a bug in `MultiNodesOfflineClusterIntegrationTest`. Now it seems I consistently have green tests in the branch. I think it is ready to review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1558902526

   I was able to find the reason why some tests were failing in https://github.com/apache/pinot/pull/10528. It was a Maven config problem related on how I was configuring the surefire `argLine` property. 
   
   Now I have green lights in Java 11 and Java 20. Last execution failed in Java 17 and I'm trying to verify if it is a flaky test or something actually related to these changes.
   
   I'm thinking on removing the draft tag from the PR. We should discuss whether to include the Foreign Memory API or not in the PR. I think it is better to just drop that code and add them later once is it stable (or if we want to invest resources into trying with --preview-enabled)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1731135839

   We may want to close this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1554671555

   This is amazing @gortiz . I am +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1555340395

   +1, awesome work!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1556883919

   > Review the SPI and Unsafe based implementation in https://github.com/apache/pinot/pull/10528
   
   I have some issues there. Some tests always get out of memory in GitHub Actions, while my amd pc and my M1 laptop run them without problems, even when running with `mvn -P github-actions`. Given that I cannot reproduce the problem locally, it is being very difficult to fix the problem. Any help there would be helpful.
   
   > Later on we should move to Foreign Memory when stable support is available ?
   
   I guess we should decide what we want to do with that. We can either:
   - Wait until Foreign Memory API is production ready and stay in Java 11 meanwhile.
      - Versions on which we could run Pinot: [11, 15] and [22, ...). 
          - Assuming that Foreign Memory API is released in Java 22.
      - Advantage: we will only use proven and stable apis. 
      - Disadvantages: But it implies that we would need to either run with Java 11 until Java 23 or we would need to use non LTS versions.
   - Support preview of Foreign Memory API: 
      - Versions on which we could run Pinot: [11, 15] and [21, ...).
      - Advantage: We would support Java 21 if running with --enable-preview
      - Disadvantage: 
         - If OpenJDK decides to add more breaking changes in Java 22, we would need to maintain different versions of the code.
         - We would need to run with --preview-enable, which is not intended to be executed in production.
   - Use our own library on top of Unsafe
      - Versions on which we could run Pinot: [11, ...).
      - Advantages:
         - In Java [11, 15] we can choose to use LArray or Unsafe
         - In Java [16, 20] we can only use Unsafe
         - In Java 21 we can use Unsafe or run with --enable-preview and use Foreign Memory API
         - Once Foreign Memory API is released, we could always use Foreign Memory API
     - Disadvantages:
         - We need to maintain the Unsafe library. 
            - Errors there may kill the JVM with segmentation faults. Details may depend on CPU architectures and OS. For example, mmap needs to be page aligned in Linux (otherwise segmentation fault kills the JVM).
            - Unsafe methods do change between major JVM versions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on issue #10783: Extensible PinotDataBuffer SPI

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10783:
URL: https://github.com/apache/pinot/issues/10783#issuecomment-1555401853

   +1. Thanks @gortiz 
   
   So I am guessing the next plan of action is to:
   
   - Review the SPI and Unsafe based implementation in https://github.com/apache/pinot/pull/10528
   
   Also, Foreign Memory is not ruled out right. Later on we should move to Foreign Memory API when stable support is available ?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz closed issue #10783: Extensible PinotDataBuffer SPI

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz closed issue #10783: Extensible PinotDataBuffer SPI
URL: https://github.com/apache/pinot/issues/10783


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org