Posted to dev@arrow.apache.org by "Ambalu, Robert" <Ro...@Point72.com> on 2018/05/14 14:37:22 UTC

Appending to streaming file format

Hey, as far as I can tell, appending to a file in the streaming format isn't currently supported. Is that right?
RecordBatchStreamWriter always writes the schema up front, and it doesn't look like a schema is expected mid-file. Assuming I'm doing this append test correctly, this is the error I hit when I try to read the file back into Python:

Traceback (most recent call last):
  File "/home/ra7293/rba_arrow_mmap.py", line 9, in <module>
    table = reader.read_all()
  File "ipc.pxi", line 302, in pyarrow.lib._RecordBatchReader.read_all
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Message not expected type: record batch, was: 1
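As far as I can tell, the `was: 1` is the numeric type of the message the reader actually found. In Arrow C++'s `Message::Type` enum, `NONE` is 0, `SCHEMA` is 1 and `RECORD_BATCH` is 3, so the reader hit the appended session's schema message where it expected a record batch. The toy model below is not pyarrow code: the tuple "messages" and the `write_stream`/`read_all` helpers are hypothetical, and it ignores the real length-prefixed Flatbuffers encoding and the end-of-stream marker. It only encodes the rule described above (one schema up front, then record batches only) to show why appending a second write session breaks the reader.

```python
from typing import List, Tuple

# Toy type codes that follow the ordering of Arrow C++'s Message::Type enum
# (NONE=0, SCHEMA=1, DICTIONARY_BATCH=2, RECORD_BATCH=3).
SCHEMA = 1
RECORD_BATCH = 3

# A toy "message" is just (type code, payload); real Arrow IPC messages are
# length-prefixed Flatbuffers metadata plus a body, which this model ignores.
Message = Tuple[int, object]


def write_stream(schema: List[str], batches: List[list]) -> List[Message]:
    """Hypothetical writer: one schema message up front, then one per batch."""
    messages: List[Message] = [(SCHEMA, schema)]
    messages.extend((RECORD_BATCH, batch) for batch in batches)
    return messages


def read_all(messages: List[Message]) -> Tuple[List[str], List[list]]:
    """Hypothetical reader: one leading schema, then record batches only."""
    if not messages or messages[0][0] != SCHEMA:
        raise IOError("stream must start with a schema message")
    schema = messages[0][1]
    batches = []
    for msg_type, payload in messages[1:]:
        if msg_type != RECORD_BATCH:
            raise IOError(f"Message not expected type: record batch, was: {msg_type}")
        batches.append(payload)
    return schema, batches


# One write session reads back fine.
contents = write_stream(["x"], [[1, 2], [3]])
print(read_all(contents))  # (['x'], [[1, 2], [3]])

# "Appending" a second session puts a second schema message mid-file.
contents += write_stream(["x"], [[4]])
try:
    read_all(contents)
except IOError as exc:
    print(exc)  # Message not expected type: record batch, was: 1
```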

This reader script works fine if I write once and don't append. I can work around this by creating a new file every time I restart instead of appending; I just wanted to confirm I'm not missing something.

Also, FYI, I opened a ticket last week about append being broken with FileOutputStream (unrelated to this email thread):
https://github.com/apache/arrow/issues/2018

Thanks
- Rob





DISCLAIMER: This e-mail message and any attachments are intended solely for the use of the individual or entity to which it is addressed and may contain information that is confidential or legally privileged. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, copying or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately and permanently delete this message and any attachments.




RE: Appending to streaming file format

Posted by "Ambalu, Robert" <Ro...@Point72.com>.
Will do, thx

-----Original Message-----
From: Antoine Pitrou [mailto:antoine@python.org] 
Sent: Monday, May 14, 2018 11:18 AM
To: dev@arrow.apache.org
Subject: Re: Appending to streaming file format


On 14/05/2018 at 17:17, Ambalu, Robert wrote:
> Cool, thanks Antoine. So this fixes appending to a FileOutputStream, but it still seems that appending to an existing stream-format file is not supported. Is that correct?

I'm not sure about that.  I think the best thing is to open an issue (GitHub
or JIRA) so that someone can investigate.

Regards

Antoine.



Re: Appending to streaming file format

Posted by Antoine Pitrou <an...@python.org>.
On 14/05/2018 at 17:17, Ambalu, Robert wrote:
> Cool, thanks Antoine. So this fixes appending to a FileOutputStream, but it still seems that appending to an existing stream-format file is not supported. Is that correct?

I'm not sure about that.  I think the best thing is to open an issue (GitHub
or JIRA) so that someone can investigate.

Regards

Antoine.



RE: Appending to streaming file format

Posted by "Ambalu, Robert" <Ro...@Point72.com>.
Cool, thanks Antoine. So this fixes appending to a FileOutputStream, but it still seems that appending to an existing stream-format file is not supported. Is that correct?

-----Original Message-----
From: Antoine Pitrou [mailto:antoine@python.org] 
Sent: Monday, May 14, 2018 11:07 AM
To: dev@arrow.apache.org
Subject: Re: Appending to streaming file format


On 14/05/2018 at 16:37, Ambalu, Robert wrote:
> 
> Also, FYI, I opened a ticket last week about append being broken with FileOutputStream (unrelated to this email thread):
> https://github.com/apache/arrow/issues/2018

Sorry, I hadn't seen your ticket (if you have found an actual bug, it's
preferable to directly open a JIRA ticket, FWIW).  But that bug was
actually fixed some time ago:
https://github.com/apache/arrow/pull/1978

Regards

Antoine.









Re: Appending to streaming file format

Posted by Antoine Pitrou <an...@python.org>.
On 14/05/2018 at 16:37, Ambalu, Robert wrote:
> 
> Also, FYI, I opened a ticket last week about append being broken with FileOutputStream (unrelated to this email thread):
> https://github.com/apache/arrow/issues/2018

Sorry, I hadn't seen your ticket (if you have found an actual bug, it's
preferable to directly open a JIRA ticket, FWIW).  But that bug was
actually fixed some time ago:
https://github.com/apache/arrow/pull/1978

Regards

Antoine.