Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/02/02 20:33:49 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #5157: Optimizer: Avoid too many string cloning in the optimizer

alamb commented on issue #5157:
URL: https://github.com/apache/arrow-datafusion/issues/5157#issuecomment-1414339134

   I looked at the trace and here are my observations:
   
   As @tustvold has said, having `DFSchema` / `DFField` that don't copy their values around (https://github.com/apache/arrow-datafusion/issues/4680) would help immensely.
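   To illustrate why avoiding the copies helps, here is a minimal sketch that assumes fields are shared behind `Arc`, so cloning a schema only bumps reference counts instead of reallocating every `String`. `Field` and `SharedSchema` are hypothetical stand-ins, not DataFusion's actual `DFField` / `DFSchema`, and this is not necessarily the design #4680 settles on.

   ```rust
   use std::sync::Arc;

   // Hypothetical stand-in for a schema field (not DataFusion's `DFField`).
   #[derive(Debug, Clone, PartialEq)]
   struct Field {
       name: String,
   }

   // Hypothetical schema that stores its fields behind `Arc`, so cloning the
   // schema copies only pointers; no field `String` is reallocated.
   #[derive(Debug, Clone)]
   struct SharedSchema {
       fields: Vec<Arc<Field>>,
   }

   impl SharedSchema {
       fn new(fields: Vec<Field>) -> Self {
           Self { fields: fields.into_iter().map(Arc::new).collect() }
       }
   }

   fn main() {
       let schema = SharedSchema::new(vec![
           Field { name: "a".to_string() },
           Field { name: "b".to_string() },
       ]);

       // The clone points at the very same `Field` allocations as the original.
       let copy = schema.clone();
       for (orig, cloned) in schema.fields.iter().zip(copy.fields.iter()) {
           assert!(Arc::ptr_eq(orig, cloned));
       }

       // Two schemas now share the first field, so its strong count is 2.
       println!(
           "field {} strong count: {}",
           copy.fields[0].name,
           Arc::strong_count(&schema.fields[0])
       );
   }
   ```

   The trade-off is an extra pointer indirection per field access in exchange for cheap clones, which matters when optimizer passes rebuild schemas many times.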
   
   A large share of the allocations comes from `DFSchema::merge`; see https://github.com/apache/arrow-datafusion/blob/224c682101949da57aebc36e92e5a881ef3040d4/datafusion/common/src/dfschema.rs#L135-L151
   
   ![Screenshot 2023-02-02 at 3 26 45 PM](https://user-images.githubusercontent.com/490673/216442910-599d96a1-7ffe-450c-b955-66e731c7aaf5.png)
   
   And a large part of that cost comes from the way it discards errors with `.ok()`: those errors are quite expensive to produce.
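   A minimal sketch of the pattern, using hypothetical simplified `Field` / `Schema` types rather than DataFusion's real ones: a lookup that returns `Result` formats an error message on every miss, and calling `.ok()` (or `.is_err()`) still pays for building that message before throwing it away. A plain boolean existence check avoids the allocation entirely. This only shows the general idea; it is not the actual `DFSchema::merge` code or a proposed patch.

   ```rust
   // Hypothetical, simplified types; not DataFusion's API.
   #[derive(Debug, Clone, PartialEq)]
   struct Field {
       name: String,
   }

   #[derive(Debug, Clone)]
   struct Schema {
       fields: Vec<Field>,
   }

   impl Schema {
       fn field_names(&self) -> Vec<String> {
           self.fields.iter().map(|f| f.name.clone()).collect()
       }

       // Error-returning lookup: on a miss it formats a message that lists
       // every field name, which allocates.
       fn index_of(&self, name: &str) -> Result<usize, String> {
           self.fields.iter().position(|f| f.name == name).ok_or_else(|| {
               format!("No field named '{}'. Valid fields are {:?}", name, self.field_names())
           })
       }

       // Costly pattern: every field that is NOT yet present builds an error,
       // which `.ok()` then discards.
       fn merge_via_ok(&mut self, other: &Schema) {
           for f in &other.fields {
               if self.index_of(&f.name).ok().is_none() {
                   self.fields.push(f.clone());
               }
           }
       }

       // Cheaper pattern: a boolean check that never constructs an error.
       fn merge_via_check(&mut self, other: &Schema) {
           for f in &other.fields {
               if !self.fields.iter().any(|existing| existing.name == f.name) {
                   self.fields.push(f.clone());
               }
           }
       }
   }

   fn main() {
       let mut a = Schema { fields: vec![Field { name: "a".to_string() }] };
       let b = Schema {
           fields: vec![Field { name: "a".to_string() }, Field { name: "b".to_string() }],
       };
       let mut a2 = a.clone();

       a.merge_via_ok(&b);
       a2.merge_via_check(&b);

       // Both strategies produce the same merged schema.
       assert_eq!(a.fields, a2.fields);
       println!("{:?}", a.field_names()); // prints ["a", "b"]
   }
   ```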
   
   It also appears that there is copying going on in `unwrap_cast_in_comparison` and in common subexpression elimination.
   


-- 
This is an automated message from the Apache Git Service.