
Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/05/10 18:43:59 UTC

[GitHub] [arrow] westonpace commented on issue #35498: [C++][Parquet] Parquet write_to_dataset performance regression

westonpace commented on issue #35498:
URL: https://github.com/apache/arrow/issues/35498#issuecomment-1542647235

   > I don't know if no alignment enforcement would be OK
   
   Correct.  As @mapleFU mentions, this could lead to undefined behavior through type punning (this is the error we were getting from Flight).
   
   > I don't know if no alignment enforcement would be OK, but it sounds like a smaller alignment would do. Perhaps a good quick fix is to change ipc::kArrowAlignment in https://github.com/apache/arrow/issues/35498#issuecomment-1539766664 to the byte size of the value's type. My understanding is that [numpy ensures this alignment condition](https://numpy.org/devdocs/dev/alignment.html#).
   
   This seems like the best solution.
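
To make the relaxed condition concrete, here is a minimal sketch of checking alignment against the byte size of the value's type instead of a fixed, larger constant. `IsAlignedTo` and `IsAlignedForValueType` are hypothetical names used for illustration only, not Arrow APIs, and the sketch assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Arrow API): true if `ptr` is a multiple of
// `alignment` bytes. Assumes `alignment` is a power of two.
inline bool IsAlignedTo(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical helper (not an Arrow API): true if `ptr` is aligned to the
// byte size of the value type T, e.g. 4 bytes for int32_t.
template <typename T>
bool IsAlignedForValueType(const void* ptr) {
  return IsAlignedTo(ptr, sizeof(T));
}
```

Under this check, an `int32_t` buffer that starts 4 bytes into a 64-byte-aligned allocation passes, even though it would fail a 64-byte requirement.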
   
   > My understanding is that alignment is "recommended but not required for in memory data", it's only when serializing (IPC) that the requirement is enforced
   
   Correct.  Most of Arrow-C++ aims to tolerate unaligned buffers.  However, Acero doesn't.  These sorts of type punning assumptions are subtle, and there are a lot of compute functions.  If someone wanted to support unaligned buffers there, a first step would be to create unit tests for all the compute functions on unaligned buffers.
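
As a rough sketch of what such a test could look like, the snippet below builds a deliberately misaligned view of `int32_t` data and runs a toy kernel over it. `SumInt32` and `MakeUnaligned` are hypothetical stand-ins, not Arrow or Acero functions; a real test would call the actual compute functions on offset buffers. The kernel reads each value with `std::memcpy`, which is one common way to avoid the undefined behavior of dereferencing a misaligned `int32_t*`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical kernel (not an Arrow function): sums `length` int32 values
// starting at `data`, which may be misaligned. Each value is read with
// memcpy rather than through a casted int32_t pointer.
inline std::int64_t SumInt32(const std::uint8_t* data, std::size_t length) {
  std::int64_t total = 0;
  for (std::size_t i = 0; i < length; ++i) {
    std::int32_t value;
    std::memcpy(&value, data + i * sizeof(std::int32_t), sizeof(value));
    total += value;
  }
  return total;
}

// Hypothetical test helper: copies `values` into `storage` at an offset
// chosen so the returned pointer is NOT aligned for int32_t.
inline const std::uint8_t* MakeUnaligned(const std::vector<std::int32_t>& values,
                                         std::vector<std::uint8_t>* storage) {
  const std::size_t num_bytes = values.size() * sizeof(std::int32_t);
  storage->assign(num_bytes + 1, 0);
  const auto base = reinterpret_cast<std::uintptr_t>(storage->data());
  // Shift by one byte only if the allocation happens to be aligned.
  const std::size_t offset = (base % alignof(std::int32_t) == 0) ? 1 : 0;
  if (!values.empty()) {
    std::memcpy(storage->data() + offset, values.data(), num_bytes);
  }
  return storage->data() + offset;
}
```

Running tests like this under a sanitizer (for example Clang's or GCC's `-fsanitize=alignment`) may help surface kernels that still assume aligned input.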

