
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/19 04:17:31 UTC

[GitHub] [arrow] emkornfield commented on pull request #7175: ARROW-8794: [C++] Expand performance coverage of parquet to arrow reading

emkornfield commented on pull request #7175:
URL: https://github.com/apache/arrow/pull/7175#issuecomment-630568952


   > BM_ReadColumn<true,Int32Type> closely reflects the profile I get with a real-life dataset (the NYC taxi dataset), in case this helps guide further performance validation.
   
   I don't think I'm going to be doing much more performance-related work past https://github.com/apache/arrow/pull/7143 (if you don't mind trying it out, it would be good to see whether it improves performance on real-world data). The last potential easy performance win is pushing the remaining all-null/no-null checks directly into the loops (for small batch sizes I wouldn't expect a huge difference there). My main goal is to get full nested functionality working, and I got a little distracted.
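   The idea of doing the all-null/no-null checks inside the loops can be illustrated with a minimal C++ sketch. It assumes a flat, optional column whose definition levels are converted into an LSB-first validity bitmap. The function name `DefLevelsToValidity` and the 64-level batching are hypothetical. This is one plausible reading of the suggestion, not the code in Arrow or in PR #7143.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not Arrow's implementation): convert Parquet definition
// levels of a flat, optional column into an LSB-first validity bitmap. The
// all-null / no-null checks happen per batch inside the loop rather than in a
// separate pass over the whole column chunk. Returns the number of nulls.
int64_t DefLevelsToValidity(const std::vector<int16_t>& def_levels,
                            int16_t max_def_level,
                            std::vector<uint8_t>* validity) {
  assert(validity != nullptr);
  const int64_t num_levels = static_cast<int64_t>(def_levels.size());
  validity->assign(static_cast<size_t>((num_levels + 7) / 8), 0);
  int64_t null_count = 0;
  constexpr int64_t kBatch = 64;  // one 64-bit word of validity bits per batch
  for (int64_t start = 0; start < num_levels; start += kBatch) {
    const int64_t len = std::min(kBatch, num_levels - start);
    uint64_t bits = 0;
    for (int64_t i = 0; i < len; ++i) {
      // A slot is valid (non-null) when its level equals the max definition level.
      bits |= static_cast<uint64_t>(def_levels[start + i] == max_def_level) << i;
    }
    const uint64_t all_valid =
        len == kBatch ? ~uint64_t{0} : ((uint64_t{1} << len) - 1);
    if (bits == 0) {
      // All-null batch: bitmap bytes are already zero, so skip the writes.
      null_count += len;
      continue;
    }
    if (bits != all_valid) {
      // Mixed batch: count nulls. A no-null batch skips this popcount.
      null_count += len - static_cast<int64_t>(std::bitset<64>(bits).count());
    }
    // start is a multiple of 64, so each batch begins on a byte boundary.
    const int64_t num_bytes = (len + 7) / 8;
    for (int64_t b = 0; b < num_bytes; ++b) {
      (*validity)[static_cast<size_t>(start / 8 + b)] =
          static_cast<uint8_t>(bits >> (8 * b));
    }
  }
  return null_count;
}
```

   The per-batch branches only save work when whole batches are all-null or null-free, which is consistent with the expectation above that small batch sizes would not see a large difference.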
   Other changes will probably require a bigger refactoring than I want to take on right now.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org