You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/05 22:09:59 UTC

[GitHub] [arrow] trxcllnt opened a new pull request #7909: ARROW-9659: [C++] RecordBatchStreamReader when source is CudaBufferReader

trxcllnt opened a new pull request #7909:
URL: https://github.com/apache/arrow/pull/7909


   Related JIRA: [ARROW-9659](https://issues.apache.org/jira/browse/ARROW-9659)
   
   Prior to 1.0.0, the `RecordBatchStreamReader` was capable of reading source CudaBuffers wrapped in a `CudaBufferReader`. In 1.0.0, the Array validation routines call into Buffer::data(), which throws an error if the source isn't in host memory.
   
   This PR guards the call-sites I was able to find, but I may have missed others. I considered skipping Array validation if the buffers aren't on the host, but the other Array validation checks are still safe and useful to perform.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-669549486


   https://issues.apache.org/jira/browse/ARROW-9659


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-670027426


   Thanks for spotting this. I think we should find a proper way of dealing with validation of non-CPU arrays (for example by viewing the data on the CPU, using `Buffer::ViewOrCopy`).
   
   Another possibility would be to add an option in `IpcReadOptions` to disable validation (and therefore make IPC reading unsafe).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-672024248


   I rebased and added some changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7909:
URL: https://github.com/apache/arrow/pull/7909


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-670138897


   @pitrou I'm not a huge fan of the changes in this PR, but they seemed like the least invasive option considering the alternatives:
   
   * Avoided using `ViewOrCopy` because that could lead to many small/expensive device-to-host reads
   * Tried to avoid disabling validation, as Schema and Array/Buffer length/offset/size checks are still beneficial
   
   Additionally, setting the raw pointers to null in the two `Array::SetData` implementations is the easiest way to ensure no host routines attempt to read the underlying data (otherwise, we'd have to go through and update all the reads to use `ViewOrCopy`). That'd probably be a nice thing to do from a usability perspective, but more work than I have time to do before 1.0.1.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-672932842


   Green Travis-CI build: https://travis-ci.org/github/pitrou/arrow/builds/717245892


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-672026104


   This PR is definitely a bit ad hoc, but it's required to fix the observed regression. I think we need to find a strategy later on for dealing with manipulations of GPU-located arrays. I hope that NVidia can invest some design and discussion time in that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-669551982


   cc: @bkietz 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org