Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/18 15:11:45 UTC

[GitHub] [arrow] Sebastiaan-Alvarez-Rodriguez commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

Sebastiaan-Alvarez-Rodriguez commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-694925466


   Hi there,
   
   Could you please add a few compilation/installation instructions to your branch?
   I just finished a long debugging session to find out why the JNI runtime did not work.
   
   ### CPP
   Compiled the `<project-root>/cpp/` project using:
   ```bash
   cd <project-root>/cpp
   cmake -DARROW_DATASET=ON -DARROW_JNI=ON -DARROW_PARQUET=ON -DARROW_IPC=ON .
   sudo make install
   ```
   
   Confirmed that the shared libraries are visible using `whereis libarrow_dataset_jni`.
   
   ### Java
   Compiled `<project-root>/java/dataset/` using:
   ```bash
   cd <project-root>/java/dataset
   mvn clean install -Dmaven.test.skip=true # skip tests so the build succeeds
   ```
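   One thing worth checking after the Maven build is whether the produced jar actually contains the native library, since a lookup through `getClassLoader().getResourceAsStream(...)` only sees files on the classpath. The sketch below assumes the loader expects `libarrow_dataset_jni.so` at the root of the classpath; the class `JarResourceCheck` is hypothetical, and the jar path is passed as an argument because its exact file name under `target/` depends on the module version.

   ```java
   import java.io.IOException;
   import java.util.jar.JarFile;

   // Hypothetical diagnostic helper: reports whether a built jar contains
   // the native library as an entry at the root of the classpath.
   class JarResourceCheck {
       /** Returns true if the jar has an entry with exactly this name. */
       static boolean containsEntry(String jarPath, String entryName) throws IOException {
           try (JarFile jar = new JarFile(jarPath)) {
               return jar.getJarEntry(entryName) != null;
           }
       }

       public static void main(String[] args) throws IOException {
           // args[0]: path to the jar produced by `mvn install` (somewhere under target/)
           String lib = System.mapLibraryName("arrow_dataset_jni"); // libarrow_dataset_jni.so on Linux
           System.out.println(lib + " packaged in jar: " + containsEntry(args[0], lib));
       }
   }
   ```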
   
   To check that it works, I successfully ran a sample program:
   ```java
   public class test {
       public static void main(String[] args) {
           // Looks up libarrow_dataset_jni.so (on Linux) in the directories listed in java.library.path
           System.loadLibrary("arrow_dataset_jni");
           System.out.println("Own classLoader works fine");
       }
   }
   ```
   So my sample program can load `libarrow_dataset_jni.so` through `System.loadLibrary` just fine.
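   For reference, by default `System.loadLibrary("arrow_dataset_jni")` maps the base name to a platform-specific file name (the same mapping `System.mapLibraryName` exposes; `libarrow_dataset_jni.so` on Linux) and searches the directories listed in the `java.library.path` system property. A small diagnostic sketch that reports which of those directories contain the library (the class `LibraryPathCheck` is hypothetical):

   ```java
   import java.io.File;
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical diagnostic: which entries of a library search path contain the mapped library file?
   class LibraryPathCheck {
       /** Returns every directory in searchPath that contains the platform file for baseName. */
       static List<String> findOnPath(String baseName, String searchPath) {
           String fileName = System.mapLibraryName(baseName); // e.g. libarrow_dataset_jni.so on Linux
           List<String> hits = new ArrayList<>();
           for (String dir : searchPath.split(File.pathSeparator)) {
               if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                   hits.add(dir);
               }
           }
           return hits;
       }

       public static void main(String[] args) {
           String path = System.getProperty("java.library.path");
           System.out.println("java.library.path = " + path);
           System.out.println("found in: " + findOnPath("arrow_dataset_jni", path));
       }
   }
   ```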
   
   However, the following does not work:
   ```java
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.dataset.file.FileFormat;
   import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
   
   public class test {
       private static FileSystemDatasetFactory getDatasetFactory() {
           RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
           return new FileSystemDatasetFactory(allocator, FileFormat.PARQUET, "/path/to/pq.parquet"); // crashes here
       }
   
       public static void main(String[] args) {
           System.loadLibrary("arrow_dataset_jni");
           System.out.println("Own classLoader works fine");
           FileSystemDatasetFactory test = getDatasetFactory();
           System.out.println("Not so good! I crash before I can print this");
       }
   }
   ```
   
   The relevant part of the stack trace is:
   ```
   Caused by: java.lang.IllegalStateException: error loading native libraries: java.io.FileNotFoundException: libarrow_dataset_jni.so
       at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:91)
       at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:73)
       at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:60)
       at org.apache.arrow.dataset.file.JniWrapper.<init>(JniWrapper.java:34)
   ```
   
   The code in the Dataset Java library that raises this error can be found [here](https://github.com/zhztheplayer/arrow-1/blob/fd163e199a10a7225765b0c30fbf60d8df8d20db/java/dataset/src/main/java/org/apache/arrow/dataset/jni/JniLoader.java#L78-L94).
   
   The problem occurs because `JniWrapper.class.getClassLoader().getResourceAsStream(libraryToLoad)` always returns `null`.
   It only finds the correct file after I add a symlink (`ln -s /usr/local/lib/libarrow_dataset_jni.so libarrow_dataset_jni.so`) to the package directory.
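   This is consistent with how `ClassLoader.getResourceAsStream` works: it resolves the name against the class loader's classpath roots (directories and jars), not against `/usr/local/lib` or `java.library.path`, so the `.so` is only found once it sits in one of those roots. A sketch that reproduces this with a temporary directory standing in for a classpath root (`ResourceLookupDemo` and the dummy file are hypothetical):

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.net.URL;
   import java.net.URLClassLoader;
   import java.nio.file.Files;
   import java.nio.file.Path;

   // Sketch: a class loader resolves resource names against its classpath roots only.
   class ResourceLookupDemo {
       /** True if loader can open name as a classpath resource. */
       static boolean canOpen(ClassLoader loader, String name) throws IOException {
           try (InputStream in = loader.getResourceAsStream(name)) { // null resources are allowed here
               return in != null;
           }
       }

       public static void main(String[] args) throws IOException {
           String lib = "libarrow_dataset_jni.so";
           // Typically false unless the .so is packaged in a jar or directory on the classpath.
           System.out.println("on app classpath: " + canOpen(ResourceLookupDemo.class.getClassLoader(), lib));

           // Mimic the symlink workaround: put a dummy file into a directory used as a classpath root.
           Path root = Files.createTempDirectory("classpath-root");
           Files.write(root.resolve(lib), new byte[] {0});
           try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toFile().toURI().toURL()}, null)) {
               System.out.println("on temp classpath root: " + canOpen(loader, lib)); // true
           }
       }
   }
   ```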
   
   Why do you create a temporary file, copy the contents of the shared library into it, and then call `System.load(tmpfile)`, instead of calling `System.loadLibrary(libraryToLoad)` directly?
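   For context, one common reason for this pattern is that a library bundled inside a jar is not a real file on disk, while `System.load` needs an absolute path to one, so the bytes are first copied out to a temporary file. A minimal sketch of that general pattern, assuming the library is shipped as a classpath resource (this is not the PR's actual `JniLoader` code; `TempFileLoader` and its methods are hypothetical):

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   // Hypothetical sketch of "copy a classpath resource to a temp file, then System.load it".
   class TempFileLoader {
       /** Copies the stream into a fresh temp file (deleted on JVM exit) and returns its path. */
       static Path extractToTempFile(InputStream in, String fileName) throws IOException {
           Path tmp = Files.createTempFile(fileName, null);
           tmp.toFile().deleteOnExit();
           Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING); // the temp file already exists
           return tmp;
       }

       /** Loads a native library that is packaged as a classpath resource. */
       static void loadFromClasspath(String resourceName) throws IOException {
           try (InputStream in = TempFileLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
               if (in == null) {
                   // Roughly the situation in the stack trace above: the resource is not on the classpath.
                   throw new IOException("resource not found on classpath: " + resourceName);
               }
               Path tmp = extractToTempFile(in, resourceName);
               System.load(tmp.toAbsolutePath().toString()); // System.load requires an absolute path
           }
       }
   }
   ```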
   
   It might be a good idea to pick one of the following:
    1. Change the `java.io.FileNotFoundException: libarrow_dataset_jni.so` message to display the full path (so people know where your library looks for the shared object).
    2. Add a small `JNI_dataset_dev_install_note.md` somewhere that explains how to get this library working.
    3. Use `System.loadLibrary(libraryToLoad);` instead.
    4. Use some other way to locate the shared library before copying it to a temporary file.
   
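   Suggestions 1 and 3 could also be combined. A minimal sketch, assuming the classpath resource should still be tried first (`FallbackLoader` is hypothetical): it falls back to `System.loadLibrary` and, if both fail, throws an error listing where it looked. Note that `System.loadLibrary` expects the base name (`arrow_dataset_jni`), not the mapped file name `libarrow_dataset_jni.so`.

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   // Hypothetical loader: classpath resource first, then java.library.path, with a descriptive error.
   class FallbackLoader {
       /** Loads baseName (e.g. "arrow_dataset_jni") or throws with a message naming every location tried. */
       static void load(String baseName) {
           String fileName = System.mapLibraryName(baseName);
           // 1) Classpath resource, copied to a temp file because System.load needs a real file.
           try (InputStream in = FallbackLoader.class.getClassLoader().getResourceAsStream(fileName)) {
               if (in != null) {
                   Path tmp = Files.createTempFile(baseName, null);
                   tmp.toFile().deleteOnExit();
                   Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                   System.load(tmp.toAbsolutePath().toString());
                   return;
               }
           } catch (IOException e) {
               throw new IllegalStateException("failed to extract " + fileName + " from the classpath", e);
           }
           // 2) Fall back to the JVM's normal search of java.library.path.
           try {
               System.loadLibrary(baseName);
           } catch (UnsatisfiedLinkError e) {
               throw new IllegalStateException(
                   "could not load " + fileName + ": not found as a classpath resource"
                       + " (java.class.path: " + System.getProperty("java.class.path") + ")"
                       + " nor in java.library.path (" + System.getProperty("java.library.path") + ")",
                   e);
           }
       }
   }
   ```

   Calling `FallbackLoader.load("arrow_dataset_jni")` would then either load the library or fail with a message that names both the application classpath and `java.library.path`.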
   I believe this would help many people who, like me, are not familiar with `getClassLoader().getResourceAsStream(...)`.
   
   
   Have a nice day,
   Sebastiaan

