Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 16:31:22 UTC

[GitHub] [arrow-datafusion] joshuarobinson opened a new issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

joshuarobinson opened a new issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785


   **Describe the bug**
   When trying to use `read_avro()` or `register_avro()` on an Avro file with a particular schema, I consistently get a panic:
   `thread 'main' panicked at 'expected struct got None', /home/ir/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37`
   The avro file is correctly decoded when using avro-tools-1.11.0.jar.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   I have simplified the logic to the following test program: https://gist.github.com/joshuarobinson/413536d5affd751eb9d8958a970e8b04
   
   and I'm attaching a [link](https://drive.google.com/file/d/1i1Gpan_PktI-wCSeRmPe54iRrbsHH8vb/view?usp=sharing) to the 6KB Avro file that triggers the problem.
   
   **Expected behavior**
   The contents of the Avro file should print out with basic DataFusion "df.collect.show"-style logic.
   
   **Additional context**
   The problematic avro file is part of Apache Iceberg metadata, fwiw.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Igosuki commented on issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

Posted by GitBox <gi...@apache.org>.
Igosuki commented on issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785#issuecomment-1034144675


   Avro module author here.
   
   The code expects a non-null struct value and instead got None (null): https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/avro_to_arrow/arrow_array_reader.rs#L771
   
   I guess the code should be setting a null value instead. I haven't tested it on the arrow2 branch, which has much better performance for Avro: https://github.com/apache/arrow-datafusion/tree/arrow2
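A minimal Rust sketch of the "set a null value instead" idea. Everything here is hypothetical: `AvroValue` and `StructColumn` are simplified stand-ins, not DataFusion's or avro-rs's actual types, and this is not the patch that was applied. The builder records a null struct slot (validity `false`, every child null) when the nested record is missing or null, and returns an error only for a genuine type mismatch instead of panicking.

```rust
// Hypothetical, simplified stand-in for decoded Avro values
// (not the avro-rs / apache-avro `Value` type).
#[derive(Debug, Clone, PartialEq)]
enum AvroValue {
    Null,
    Long(i64),
    Record(Vec<(String, AvroValue)>),
}

// Hypothetical builder for a struct column whose fields are all nullable longs:
// one Vec per child field plus a validity Vec (true = valid, false = null).
#[derive(Debug)]
struct StructColumn {
    validity: Vec<bool>,
    children: Vec<Vec<Option<i64>>>,
}

impl StructColumn {
    fn new(num_fields: usize) -> Self {
        StructColumn {
            validity: Vec::new(),
            children: vec![Vec::new(); num_fields],
        }
    }

    // Append one row. A missing (None) or explicitly null record becomes a
    // null struct slot instead of a panic; a non-record value is an error.
    fn append(&mut self, value: Option<&AvroValue>) -> Result<(), String> {
        match value {
            Some(AvroValue::Record(fields)) => {
                if fields.len() != self.children.len() {
                    return Err(format!(
                        "expected {} fields, got {}",
                        self.children.len(),
                        fields.len()
                    ));
                }
                // Convert every field first so a bad field leaves the column unchanged.
                let row = fields
                    .iter()
                    .map(|(name, v)| match v {
                        AvroValue::Long(n) => Ok(Some(*n)),
                        AvroValue::Null => Ok(None),
                        other => Err(format!("field {name}: expected long, got {other:?}")),
                    })
                    .collect::<Result<Vec<Option<i64>>, String>>()?;
                self.validity.push(true);
                for (child, v) in self.children.iter_mut().zip(row) {
                    child.push(v);
                }
                Ok(())
            }
            None | Some(AvroValue::Null) => {
                self.validity.push(false);
                for child in self.children.iter_mut() {
                    child.push(None);
                }
                Ok(())
            }
            Some(other) => Err(format!("expected struct, got {other:?}")),
        }
    }
}
```

Keeping the validity vector separate from the child values mirrors how Arrow represents a null struct slot: the children still get a placeholder entry so all columns stay the same length.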
   
   





[GitHub] [arrow-datafusion] Igosuki commented on issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

Posted by GitBox <gi...@apache.org>.
Igosuki commented on issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785#issuecomment-1034216909


   Well, I guess I should patch the module on master until arrow2 becomes the default for avro.





[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785#issuecomment-1034179917


   fwiw I investigated this yesterday (thanks a lot for sharing a stub of the file, @joshuarobinson!). https://github.com/jorgecarleitao/arrow2/pull/826 enables reading this file (arrow2 did not support nested Record; now it does).
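A rough Rust illustration of what "supporting nested Record" involves on the schema side, assuming hypothetical `AvroSchema`, `DataType` and `Field` types rather than arrow2's real API (the actual change is in the arrow2 PR linked in the comment). A record maps to a struct by recursing into its fields, so a record nested inside a record becomes a nested struct, and a `["null", T]` union maps to a nullable `T`.

```rust
// Hypothetical, simplified schema models (not arrow2's or apache-avro's types).
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    Null,
    Long,
    String,
    Record(Vec<(String, AvroSchema)>),
    Union(Vec<AvroSchema>),
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Map an Avro schema to a (DataType, nullable) pair. Records recurse into
// their fields, so a record nested inside a record becomes a nested Struct.
fn to_data_type(schema: &AvroSchema) -> Result<(DataType, bool), String> {
    match schema {
        AvroSchema::Long => Ok((DataType::Int64, false)),
        AvroSchema::String => Ok((DataType::Utf8, false)),
        AvroSchema::Record(fields) => {
            let fields = fields
                .iter()
                .map(|(name, s)| -> Result<Field, String> {
                    let (data_type, nullable) = to_data_type(s)?;
                    Ok(Field { name: name.clone(), data_type, nullable })
                })
                .collect::<Result<Vec<Field>, String>>()?;
            Ok((DataType::Struct(fields), false))
        }
        // A union with "null" plus exactly one other branch is a nullable value.
        AvroSchema::Union(variants) => {
            let non_null: Vec<&AvroSchema> =
                variants.iter().filter(|v| **v != AvroSchema::Null).collect();
            let has_null = non_null.len() < variants.len();
            match non_null.as_slice() {
                [inner] => {
                    let (data_type, nullable) = to_data_type(inner)?;
                    Ok((data_type, nullable || has_null))
                }
                _ => Err("this sketch only handles unions with one non-null branch".to_string()),
            }
        }
        AvroSchema::Null => Err("a bare null type is not handled in this sketch".to_string()),
    }
}
```

Multi-branch unions are rejected here only to keep the sketch short; a real reader would need a strategy for them (for example, Arrow union arrays).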





[GitHub] [arrow-datafusion] joshuarobinson commented on issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

Posted by GitBox <gi...@apache.org>.
joshuarobinson commented on issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785#issuecomment-1046884095


   thanks for the quick responses. I've managed to work around the issue with the avro-rs module for the time being, and I'll look for new releases :)





[GitHub] [arrow-datafusion] Igosuki commented on issue #1785: Panic reading avro file at datafusion-6.0.0/src/avro_to_arrow/arrow_array_reader.rs:771:37

Posted by GitBox <gi...@apache.org>.
Igosuki commented on issue #1785:
URL: https://github.com/apache/arrow-datafusion/issues/1785#issuecomment-1047086840


   @joshuarobinson if you want, you can use the arrow2 branch; it's the one I use to read Avro.

