Posted to issues@arrow.apache.org by "tklinchik (via GitHub)" <gi...@apache.org> on 2023/04/02 21:17:57 UTC

[GitHub] [arrow] tklinchik opened a new issue, #34850: Pandas created arrow files cause exception in Java using ArrowFileReader API

tklinchik opened a new issue, #34850:
URL: https://github.com/apache/arrow/issues/34850

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Creating a simple Arrow file from a data frame with python3 and pyarrow 11.0.0 causes the following exception on the Java reading side, which uses version 11.0.0 of the Java library:
   
   ```
   [main] INFO org.apache.arrow.memory.BaseAllocator - Debug mode enabled.
   [main] INFO org.apache.arrow.memory.DefaultAllocationManagerOption - allocation manager type not specified, using netty as the default type
   [main] INFO org.apache.arrow.memory.CheckAllocator - Using DefaultAllocationManager at memory-unsafe/10.0.1/d131a028bf7c1a6f56f14bb0212603cd2dae1555/arrow-memory-unsafe-10.0.1.jar!/org/apache/arrow/memory/DefaultAllocationManagerFactory.class
   Exception in thread "main" java.lang.IndexOutOfBoundsException: index: 0, length: 128 (expected: range(0, 16))
   	at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
   	at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:955)
   	at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:451)
   	at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:732)
   	at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:240)
   	at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:86)
   	at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:220)
   	at org.apache.arrow.vector.ipc.ArrowFileReader.loadNextBatch(ArrowFileReader.java:166)
   	at org.apache.arrow.vector.ipc.ArrowFileReader.loadRecordBatch(ArrowFileReader.java:197)
   ```
   
   
   
   ### Component(s)
   
   Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] tklinchik commented on issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "tklinchik (via GitHub)" <gi...@apache.org>.
tklinchik commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1499781744

   Thank you. This is very helpful.
   Quick question though: the Python API, at least as accessed from pandas, figures out the compression on its own on the read side. Is the intent for the Reader to do the same when the file header contains metadata about the compression, or is it always up to the user to track that and use the correct compression on read?




[GitHub] [arrow] tklinchik commented on issue #34850: Pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "tklinchik (via GitHub)" <gi...@apache.org>.
tklinchik commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1493444509

   I figured out what was causing it. Adding an explicit `compression='uncompressed'` on the Python export side results in correct processing. Does `ArrowFileReader` support compression, which is turned on by default on the Python side?




[GitHub] [arrow] tklinchik closed issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "tklinchik (via GitHub)" <gi...@apache.org>.
tklinchik closed issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API
URL: https://github.com/apache/arrow/issues/34850




[GitHub] [arrow] davisusanibar commented on issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1521788450

   Hi @tklinchik. There is currently a separate Java module for compression support; it is up to the user to add it to their dependencies.
   
   A recent change gives you more information when the module has not been added, with a message like this: `Exception in thread "main" java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for LZ4_FRAME`
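
Since the compression module is an optional dependency, one way to fail early is to check for it before opening a compressed file. The following is a minimal sketch that uses only the Java standard library. The class `CompressionModuleCheck` and its methods are hypothetical names made up for illustration, not part of Arrow, and the sketch assumes the factory lives at `org.apache.arrow.compression.CommonsCompressionFactory`, which is where the `arrow-compression` module places it as far as I know.

```java
// Hypothetical pre-flight helper; the class and method names are illustrative, not part of Arrow.
final class CompressionModuleCheck {

    // Assumed fully qualified name of the factory shipped in the arrow-compression module.
    static final String FACTORY_CLASS = "org.apache.arrow.compression.CommonsCompressionFactory";

    // Returns true if the named class can be located on the classpath, without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, CompressionModuleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Builds a short status line suitable for logging at application start-up.
    static String describe(String className) {
        return isClassPresent(className)
            ? className + " is available"
            : className + " is missing; add the arrow-compression dependency to read LZ4/Zstd files";
    }

    public static void main(String[] args) {
        System.out.println(describe(FACTORY_CLASS));
    }
}
```

Running `main` prints one status line, which depends on whether the module is on the classpath. The check only reports availability; the factory still has to be passed to `ArrowFileReader` explicitly.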




[GitHub] [arrow] kou commented on issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1493504972

   Could you provide a program that reproduces this?




[GitHub] [arrow] davisusanibar commented on issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1496273772

   > I figured out what was causing it. Adding an explicit `compression='uncompressed'` on the Python export side results in correct processing. Does `ArrowFileReader` support compression, which is turned on by default on the Python side?
   
   Hi @tklinchik, use `CommonsCompressionFactory` for compressed files (LZ4 and Zstd are currently supported):
   
   ```
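   import java.io.File;
   import java.io.FileInputStream;
   import java.io.IOException;
   import org.apache.arrow.compression.CommonsCompressionFactory; // from the arrow-compression module
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.VectorSchemaRoot;
   import org.apache.arrow.vector.ipc.ArrowFileReader;
   import org.apache.arrow.vector.ipc.message.ArrowBlock;

   File file = new File("lz4.arrow");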
   try (
       BufferAllocator rootAllocator = new RootAllocator();
       FileInputStream fileInputStream = new FileInputStream(file);
       // ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator) // Use CommonsCompressionFactory for compressed files
       ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(),
           rootAllocator, CommonsCompressionFactory.INSTANCE)
   ) {
       System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
       for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
           reader.loadRecordBatch(arrowBlock);
           VectorSchemaRoot vectorSchemaRootRecover = reader.getVectorSchemaRoot();
           System.out.println("Size: --> " + vectorSchemaRootRecover.getRowCount());
           System.out.print(vectorSchemaRootRecover.contentToTSVString());
       }
   } catch (IOException e) {
       e.printStackTrace();
   } 
   ```




[GitHub] [arrow] tklinchik commented on issue #34850: [Java] pandas created arrow files cause exception in Java using ArrowFileReader API

Posted by "tklinchik (via GitHub)" <gi...@apache.org>.
tklinchik commented on issue #34850:
URL: https://github.com/apache/arrow/issues/34850#issuecomment-1521792197

   Got it. Thank you.

