Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/02 10:11:56 UTC

[GitHub] [arrow] hn5092 commented on pull request #7288: ARROW-8963: [C++][Parquet] optimize LeafReader::NextBatch to save memory

hn5092 commented on pull request #7288:
URL: https://github.com/apache/arrow/pull/7288#issuecomment-637438925


   > @hn5092 thank you for the PR. Could you add some benchmarks on your machine that shows this improves things? I might be looking at the wrong place but it appears memory is retained as capacity in the buffers after a reset (assuming nothing is written). I agree that the ordering is strange though.
   
   Tested with a Parquet file of 6_000_000 rows (about 250M), reading 3 columns of type long:
   
   The improvement is about 10%.
   Before and after are shown below:
   before: 
   ![image](https://user-images.githubusercontent.com/10030046/83508123-1472c080-a4fc-11ea-9570-d1685fcc83d6.png)
   
   after:
   ![image](https://user-images.githubusercontent.com/10030046/83508178-26546380-a4fc-11ea-9c1d-7832240ec1fe.png)
   
   
   before profile:
   ![image](https://user-images.githubusercontent.com/10030046/83508253-39673380-a4fc-11ea-909d-a2dd1613722b.png)
   
   after profile:
   ![image](https://user-images.githubusercontent.com/10030046/83508427-80552900-a4fc-11ea-894d-95cfdd8aa5c0.png)
   
   
   In the profiles, the share of time spent in the reserve method drops from 24% to 16%.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org