Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/01 21:40:30 UTC

[GitHub] [arrow-rs] yordan-pavlov commented on issue #200: Use iterators to increase performance of creating Arrow arrays

yordan-pavlov commented on issue #200:
URL: https://github.com/apache/arrow-rs/issues/200#issuecomment-830698410


   UPDATE: I have finally implemented enough to replace ComplexObjectArrayReader with ArrowArrayReader for reading StringArrays, and to run the test query I have been using for much of my performance testing.
   Initial results look promising: the overall query time has dropped from about 125 ms to about 100 ms (roughly 20%).
   Over the next couple of days I will try to write some proper benchmarks to better compare performance against the previous implementation.
   
   In general, I have found that avoiding intermediate arrays as much as possible does help performance, and I believe I have finally been able to validate the idea of using iterators. I also think that switching from iterators to async streams should bring further performance improvements, as an async runtime should be better able to schedule a combination of disk-intensive and CPU-intensive tasks.
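
To illustrate the point about intermediate arrays, here is a minimal, std-only sketch. It is not the ArrowArrayReader code and does not use the arrow-rs API: `StringColumn` and `decode_length_prefixed` are hypothetical names, and the 4-byte little-endian length prefix is a simplified stand-in for an encoded page. The decoder yields borrowed `&str` values lazily, and the builder copies each one straight into the final offsets and values buffers, so no `Vec<String>` is materialized in between.

```rust
/// Hypothetical stand-in for an Arrow-style string array: one contiguous
/// byte buffer plus offsets marking where each value starts and ends.
pub struct StringColumn {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
}

impl StringColumn {
    /// Consumes an iterator of string slices and copies each value directly
    /// into the final buffers, without collecting into an intermediate array.
    pub fn from_iter_direct<'a, I>(iter: I) -> Self
    where
        I: Iterator<Item = &'a str>,
    {
        // Use the iterator's lower size bound as a capacity hint.
        let (lower, _) = iter.size_hint();
        let mut offsets = Vec::with_capacity(lower + 1);
        let mut values = Vec::new();
        offsets.push(0i32);
        for s in iter {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len() as i32);
        }
        StringColumn { offsets, values }
    }

    /// Number of string values stored in the column.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns the i-th value as a slice into the shared byte buffer.
    pub fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("values were built from &str")
    }
}

/// Hypothetical lazy decoder: each value is a 4-byte little-endian length
/// followed by that many UTF-8 bytes. Iteration stops at the end of the
/// input, or at the first truncated or non-UTF-8 value.
pub fn decode_length_prefixed<'a>(mut data: &'a [u8]) -> impl Iterator<Item = &'a str> + 'a {
    std::iter::from_fn(move || {
        if data.len() < 4 {
            return None;
        }
        let (len_bytes, rest) = data.split_at(4);
        let len =
            u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        if rest.len() < len {
            return None;
        }
        let (value, remaining) = rest.split_at(len);
        data = remaining;
        std::str::from_utf8(value).ok()
    })
}
```

Calling `StringColumn::from_iter_direct(decode_length_prefixed(&page))` on a byte slice `page` walks the input once and writes each value a single time. The alternative of first collecting into a `Vec<String>` would allocate once per value and then copy every value again into the final buffer.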
   
   The latest changes can be found here:
   https://github.com/yordan-pavlov/arrow/commit/95ed8a020c2f44f5b30cfffd0682b98022cc4aea

