Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/09 17:02:43 UTC

[GitHub] [arrow] rok commented on issue #12553: Support for Compute Functions on Nested Arrays

rok commented on issue #12553:
URL: https://github.com/apache/arrow/issues/12553#issuecomment-1151378310

   > So now my only question is, while this seems like an optimal generalized solution for storage, how much computation is required to explode back out to the dense form in memory to do computation?
   
   I didn't benchmark the conversion while implementing it, but I think the cost will depend heavily on your non-null distribution and even on the dimension order (!).
   It should be pretty easy to benchmark, though: just time `sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(np_array)`.
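   A minimal timing sketch of that suggestion. The array shape and sparsity pattern below are illustrative assumptions, not from the thread; pick values that match your actual data, since (as noted above) the cost likely varies with the non-null distribution:

   ```python
   import time
   import numpy as np
   import pyarrow as pa

   # Hypothetical input: a mostly-zero 3-D array with 1000 random non-zeros.
   rng = np.random.default_rng(0)
   np_array = np.zeros((100, 100, 100))
   idx = rng.integers(0, 100, size=(3, 1000))
   np_array[idx[0], idx[1], idx[2]] = 1.0

   # Time the dense -> sparse CSF conversion.
   start = time.perf_counter()
   sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(np_array)
   elapsed = time.perf_counter() - start
   print(f"dense -> CSF conversion took {elapsed:.4f} s")
   ```

   Repeating the measurement with the dimensions transposed would show whether dimension order matters for your data.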
   
   > In our simple implementation since we are going by whole dimensions only, we can just use broadcast when necessary and then collapse back so the underlying data is just normal numpy arrays?
   
   I want to say yes, but I'm not 100% sure what you mean. Going from `pa.Tensor` to `np.array` and back should be zero-copy AFAIK. Someone please correct me if I'm wrong!
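   A quick way to check the zero-copy claim yourself (a sketch, assuming the roundtrip really does share buffers rather than copy): `np.shares_memory` reports whether the original NumPy array and the array recovered from the `pa.Tensor` view the same underlying memory.

   ```python
   import numpy as np
   import pyarrow as pa

   np_array = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

   # Wrap the NumPy buffer in an Arrow Tensor, then view it back as NumPy.
   tensor = pa.Tensor.from_numpy(np_array)
   roundtrip = tensor.to_numpy()

   # True would confirm both objects share one buffer, i.e. no copy was made.
   print(np.shares_memory(np_array, roundtrip))
   ```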


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org