Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/05 15:08:30 UTC

[GitHub] [arrow] pokemaster7 opened a new issue #9104: Reading Feather File from Custom Offset

pokemaster7 opened a new issue #9104:
URL: https://github.com/apache/arrow/issues/9104


   Is it possible to embed a feather file in another file (with known offset/length) and read the feather portion in a correct and performant way?
   
   Here is a naive idea of what I'm trying to do, though it throws an error for some reason:
   
   ``` python
   import pandas as pd
   import numpy as np

   df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
   pth = "TMP"
   with open(pth, "wb") as fh:
       fh.write(b"\x01")  # custom header, one byte
       df.to_feather(fh)
   with open(pth, "rb") as gh:
       gh.read(1)  # read header
       print(pd.read_feather(gh))  # throws 'pyarrow.lib.ArrowInvalid: Not a Feather V1 or Arrow IPC file'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on issue #9104: Reading Feather File from Custom Offset

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #9104:
URL: https://github.com/apache/arrow/issues/9104#issuecomment-760301436


   See https://arrow.apache.org/docs/python/api/ipc.html#inter-process-communication and especially the `new_stream` and `open_stream` functions.





[GitHub] [arrow] pitrou commented on issue #9104: Reading Feather File from Custom Offset

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #9104:
URL: https://github.com/apache/arrow/issues/9104#issuecomment-760300650


   Feather files are accessed using seeks in the file, so I don't think that will work. You can use an Arrow IPC stream, though, which should have similar performance characteristics.





[GitHub] [arrow] pokemaster7 closed issue #9104: Reading Feather File from Custom Offset

Posted by GitBox <gi...@apache.org>.
pokemaster7 closed issue #9104:
URL: https://github.com/apache/arrow/issues/9104


   





[GitHub] [arrow] pokemaster7 commented on issue #9104: Reading Feather File from Custom Offset

Posted by GitBox <gi...@apache.org>.
pokemaster7 commented on issue #9104:
URL: https://github.com/apache/arrow/issues/9104#issuecomment-761114405


   Thanks @pitrou. I am somewhat disappointed that the file read API doesn't support offsets (and that passing in a partially read file object doesn't do the intuitive thing). But that workaround does work.

