Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/06 18:41:39 UTC

[GitHub] [arrow] jonkeane commented on pull request #9615: ARROW-3316: [R] Multi-threaded conversion from R data.frame to Arrow table / record batch

jonkeane commented on pull request #9615:
URL: https://github.com/apache/arrow/pull/9615#issuecomment-833767798


   OK, finally got these benchmarks re-run and put this report together.
   
   TL;DR:
   
   For multi-core operation:
   * Dict types are massively faster (see the dictionary-conversion sketch after this list)
   * Smaller improvements show up on most of the other types we have all-one-type benchmark fixtures for: integers, floats
   * Strings are either the same as or _slightly_ slower
   * The naturalistic datasets we have are a mixture:
     * nyctaxi is faster (especially on the first iteration)
     * fannie + chicago traffic are slightly slower (possibly because they contain more strings?)
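
   As an aside (not from the report itself): the "dict" win corresponds to R factor columns, which get converted to Arrow dictionary-encoded columns. Here's a minimal sketch of checking that conversion; the column names and sizes are made up, loosely mirroring the all-one-type fixtures:

   ```r
   library(arrow)

   # Hypothetical fixture: one column per type; names and sizes are made up.
   n <- 1e6
   df <- data.frame(
     fct = factor(sample(letters, n, replace = TRUE)),  # converts to an Arrow dictionary type
     int = sample.int(n),
     dbl = runif(n),
     chr = sample(letters, n, replace = TRUE),
     stringsAsFactors = FALSE
   )

   tab <- arrow::Table$create(df)
   print(tab$schema)  # the `fct` field should show up as a dictionary<...> type
   ```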
     
   For single-core operation:
   Most datasets/types have very similar performance across the branches. Dicts are the only ones that stand out as seeing a decent speed-up, but nowhere near what we see on the 8-core test.
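
   For anyone who wants to poke at this locally, here's a rough sketch of the single-thread vs. multi-thread comparison. It is not the setup used for the report above; it just times `arrow::Table$create()` with `bench::mark()` while toggling Arrow's CPU thread pool via `arrow::set_cpu_count()`, reusing the made-up `df` fixture from the earlier sketch.

   ```r
   library(arrow)
   library(bench)

   # Multi-threaded conversion: let Arrow use all available cores.
   arrow::set_cpu_count(parallel::detectCores())
   multi <- bench::mark(arrow::Table$create(df), iterations = 10, check = FALSE)

   # Single-threaded conversion: restrict Arrow's CPU thread pool to one thread.
   arrow::set_cpu_count(1)
   single <- bench::mark(arrow::Table$create(df), iterations = 10, check = FALSE)

   multi[, c("median", "mem_alloc")]
   single[, c("median", "mem_alloc")]
   ```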
   
   Here's a zip* of the report
   [parallel-data-conversion.html.zip](https://github.com/apache/arrow/files/6436928/parallel-data-conversion.html.zip)
   
   * zipped to get around GitHub's file-extension restrictions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org