Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/04/14 19:14:51 UTC

[GitHub] [arrow] westonpace commented on issue #34374: [C++] Investigate regressions caused by changing row group size from 64Mi to 1Mi.

westonpace commented on issue #34374:
URL: https://github.com/apache/arrow/issues/34374#issuecomment-1509106552

   It seems the regression is quite stable:
   
   ![image](https://user-images.githubusercontent.com/1696093/232133789-e7919732-ad39-4578-9e16-87b830f54e9b.png)
   
   I did some extensive testing of my own today.  My local results match what we are seeing in the benchmark.  For example, going from 2 to 62 row groups adds two seconds to the write time of a simple file.  I suspect there is a real opportunity somewhere to improve writer performance.
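
For a rough sense of how the row group size setting drives the number of row groups, here is a minimal sketch. It assumes each row group holds at most `row_group_size` rows and that the writer fills every group before starting the next. The helper name `row_group_count` and the 100,000,000-row example are hypothetical and are not taken from the benchmark.

```python
OLD_ROW_GROUP_SIZE = 64 * 1024 * 1024  # 64Mi rows, the previous default
NEW_ROW_GROUP_SIZE = 1024 * 1024       # 1Mi rows, the new default


def row_group_count(num_rows, row_group_size):
    """Return how many row groups are needed to hold num_rows rows.

    Hypothetical helper: assumes every row group except possibly the
    last one is filled to exactly row_group_size rows.
    """
    if num_rows < 0 or row_group_size <= 0:
        raise ValueError("num_rows must be >= 0 and row_group_size must be > 0")
    # Ceiling division using integers only.
    return -(-num_rows // row_group_size)


# Hypothetical table size, chosen only for illustration.
EXAMPLE_ROWS = 100_000_000
old_groups = row_group_count(EXAMPLE_ROWS, OLD_ROW_GROUP_SIZE)
new_groups = row_group_count(EXAMPLE_ROWS, NEW_ROW_GROUP_SIZE)
print(old_groups, new_groups)
```

Any fixed per-row-group cost in the writer is paid once per group, so a larger group count multiplies that cost.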
   
   However, since users can always go back to the old behavior, I think the update to the default remains justified.
   
   @jorisvandenbossche any opinions?  Otherwise, I think we can close this.

