Posted to issues@arrow.apache.org by "domoritz (via GitHub)" <gi...@apache.org> on 2023/01/21 22:24:12 UTC

[GitHub] [arrow] domoritz opened a new issue, #33823: pyarrow.lib.ArrowInvalid: Not an Arrow file

domoritz opened a new issue, #33823:
URL: https://github.com/apache/arrow/issues/33823

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am trying to load an Arrow file
   
   ```python
   import pyarrow as pa

   with pa.memory_map('flights-200k.arrow', 'r') as source:
       my_arrow = pa.ipc.open_file(source).read_all()
   ```
   
   but get this error 
   
   ```
     File "/opt/homebrew/Caskroom/miniforge/base/envs/ramsch/lib/python3.10/site-packages/pyarrow/ipc.py", line 228, in open_file
       return RecordBatchFileReader(
     File "/opt/homebrew/Caskroom/miniforge/base/envs/ramsch/lib/python3.10/site-packages/pyarrow/ipc.py", line 110, in __init__
       self._open(source, footer_offset=footer_offset,
     File "pyarrow/ipc.pxi", line 862, in pyarrow.lib._RecordBatchFileReader._open
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Not an Arrow file
   ```
   
   The arrow file is https://github.com/uwdata/flights-arrow/blob/master/flights-200k.arrow and loads fine in the arrow js library.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #33823: [C++] Improve error message when reading Streaming file with File reader and vice versa

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1402612162

   Files written in the file format have a magic number at both the start and the end of the data. The error message "Not an Arrow file" is raised when that magic number is wrong. So we already detect this situation; we just need to be proactive about suggesting solutions or alternatives (e.g. "Not an Arrow file, perhaps this is in the streaming format?"), so this should be very doable.
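
The magic-number check described above can be sketched in plain Python. This is only an illustration, not the check the C++ reader performs: the magic string `ARROW1` is taken from the Arrow IPC file format specification rather than from this thread, and `sniff_ipc_format` and its return labels are hypothetical names invented for this sketch.

```python
import io

ARROW_MAGIC = b"ARROW1"  # magic string from the Arrow IPC file format spec


def sniff_ipc_format(stream):
    """Classify a seekable binary stream by its Arrow magic bytes.

    Hypothetical helper for illustration only; it is not a pyarrow API.
    """
    n = len(ARROW_MAGIC)
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    stream.seek(0)
    head = stream.read(n)
    tail = b""
    if size >= 2 * n:
        stream.seek(size - n)
        tail = stream.read(n)
    if head == ARROW_MAGIC and tail == ARROW_MAGIC:
        return "file"  # magic present at both ends
    if head == ARROW_MAGIC:
        return "file-without-footer"  # e.g. the writer was never closed
    return "maybe-stream"  # no leading magic; worth trying the stream reader


# Hand-made byte strings standing in for real files (not valid Arrow data).
looks_like_file = io.BytesIO(b"ARROW1\x00\x00" + b"payload" + b"ARROW1")
looks_like_stream = io.BytesIO(b"\xff\xff\xff\xff" + b"payload")
print(sniff_ipc_format(looks_like_file))    # file
print(sniff_ipc_format(looks_like_stream))  # maybe-stream
```

A result of `maybe-stream` only means the leading magic is absent; it does not prove the data is a valid stream, which is why an improved error message would be phrased as a suggestion.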


[GitHub] [arrow] domoritz commented on issue #33823: [C++] Improve error message when reading Streaming file with File reader and vice versa

Posted by "domoritz (via GitHub)" <gi...@apache.org>.
domoritz commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1403015729

   I am an arrow committer and got totally thrown off by the error message and thought my file was corrupt. So yes, your suggested error message sounds great. 


Re: [I] [C++] Improve error message when reading Streaming file with File reader and vice versa [arrow]

Posted by "pbaner16 (via GitHub)" <gi...@apache.org>.
pbaner16 commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1846437955

   Hello @domoritz, has this issue been fixed? If not, I can contribute!


[GitHub] [arrow] jorisvandenbossche commented on issue #33823: [Python] pyarrow.lib.ArrowInvalid: Not an Arrow file

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1401746348

   Small reproducer without having to download a file:
   
   ```python
   import pyarrow as pa
   
   batch = pa.record_batch([pa.array([1, 2, 3])], ['a'])
   
   # Create an Arrow Stream file
   with pa.ipc.new_stream("test.arrows", batch.schema) as writer:
       writer.write(batch)
   
   # Read as Arrow File
   pa.ipc.open_file("test.arrows")
   # -> ... ArrowInvalid: Not an Arrow file
   ```
   
   I agree it would be nice if we could give a more informative error message and hint to the user that they are reading an Arrow Streaming format file and not an Arrow File format file.
   
   


[GitHub] [arrow] domoritz commented on issue #33823: [C++] Improve error message when reading Streaming file with File reader and vice versa

Posted by "domoritz (via GitHub)" <gi...@apache.org>.
domoritz commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1407117057

   I assigned it to you. Please send a pull request soon. 


[GitHub] [arrow] vibhatha commented on issue #33823: [Python] pyarrow.lib.ArrowInvalid: Not an Arrow file

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1401332456

   I am not sure if this is related. I had a similar experience when I mistakenly wrote files without closing the file writer. In your case, since the file loads properly in JS, this could be an entirely different scenario, but I thought it was worth mentioning here.


[GitHub] [arrow] domoritz commented on issue #33823: pyarrow.lib.ArrowInvalid: Not an Arrow file

Posted by "domoritz (via GitHub)" <gi...@apache.org>.
domoritz commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1399351515

   The issue is that I need to use `open_stream`. The error message should be better.


[GitHub] [arrow] jorisvandenbossche commented on issue #33823: [C++] Improve error message when reading Streaming file with File reader and vice versa

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1401750543

   Similarly, reading a File with a Streaming reader also gives a non-informative error message:
   
   ```python
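   import pyarrow as pa

   batch = pa.record_batch([pa.array([1, 2, 3])], ['a'])
   # Write in the Arrow File format
   with pa.ipc.new_file("test.arrow", batch.schema) as writer:
       writer.write(batch)

   # Read with the Streaming reader
   pa.ipc.open_stream("test.arrow")
   # ... ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 486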
   ```
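
One plausible reading of the number in that message, offered as an assumption rather than a description of the C++ reader: the file format begins with the magic string `ARROW1` (per the Arrow IPC specification), and a reader expecting a stream would treat the first four bytes as a little-endian 32-bit length prefix. A minimal sketch of that arithmetic:

```python
import struct

# First four bytes of the file-format magic string "ARROW1".
prefix = b"ARROW1"[:4]

# Interpret them as a little-endian signed 32-bit integer, the way a
# length prefix would be read (an assumption about the reader's behavior).
(length,) = struct.unpack("<i", prefix)
print(length)  # 1330795073
```

The result matches the "1330795073 metadata bytes" in the error above, which suggests the stream reader was handed the file-format magic where it expected a message length.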


[GitHub] [arrow] jalajk24 commented on issue #33823: [C++] Improve error message when reading Streaming file with File reader and vice versa

Posted by "jalajk24 (via GitHub)" <gi...@apache.org>.
jalajk24 commented on issue #33823:
URL: https://github.com/apache/arrow/issues/33823#issuecomment-1407104943

   @domoritz I would like to contribute to this project. Can you assign this issue to me?

