Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/10 22:26:21 UTC

[GitHub] [arrow-datafusion] aprimadi opened a new issue #705: Any reason why logical plan is optimized twice?

aprimadi opened a new issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705


   Hello, I am just curious why the logical plan is being optimized twice.
   
   First, it is optimized here:
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/context.rs#L205
   
   Then it is optimized again when collecting the result:
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/dataframe_impl.rs#L147
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-893145214


   I think we already got the answer to the question.





[GitHub] [arrow-datafusion] alamb edited a comment on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-881388071


   > This way, we can make sure the engine always executes with an optimized plan regardless of whether it's built from SQL or the plan builder.
   
   I like that plan
   





[GitHub] [arrow-datafusion] houqp commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-877873156


   That's a good point. Perhaps the better structure would be to push `optimize` into the query evaluation/materialization methods like `save` and `collect`. This way, we can make sure the engine always executes with an optimized plan regardless of whether it's built from SQL or the plan builder.
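
A minimal sketch of that structure, not DataFusion's actual code: `Plan`, `DataFrame`, `optimize`, `from_sql_like`, and `from_builder` below are hypothetical stand-ins, and the toy "optimizer" only drops filters whose predicate is the literal `true`. Neither constructor optimizes; only the materialization methods do, so plans from both sources take the same path.

```rust
/// Toy logical plan: a hypothetical stand-in, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan(String),
    /// A filter over `input`; the predicate "true" marks a no-op filter.
    Filter { predicate: String, input: Box<Plan> },
}

/// Toy optimizer rule (assumption for illustration): remove no-op filters.
pub fn optimize(plan: &Plan) -> Plan {
    match plan {
        Plan::Scan(table) => Plan::Scan(table.clone()),
        Plan::Filter { predicate, input } => {
            let input = optimize(input);
            if predicate == "true" {
                input
            } else {
                Plan::Filter {
                    predicate: predicate.clone(),
                    input: Box::new(input),
                }
            }
        }
    }
}

/// Hypothetical data frame that stores the plan exactly as it was built.
pub struct DataFrame {
    plan: Plan,
}

impl DataFrame {
    /// Stand-in for the SQL entry point: it plans but does not optimize.
    pub fn from_sql_like(table: &str, predicate: &str) -> Self {
        DataFrame {
            plan: Plan::Filter {
                predicate: predicate.to_string(),
                input: Box::new(Plan::Scan(table.to_string())),
            },
        }
    }

    /// Stand-in for the plan-builder entry point: no optimization here either.
    pub fn from_builder(plan: Plan) -> Self {
        DataFrame { plan }
    }

    /// The plan as built, before any optimization.
    pub fn plan(&self) -> &Plan {
        &self.plan
    }

    /// Materialize in memory. Returns the plan that would be executed.
    pub fn collect(&self) -> Plan {
        optimize(&self.plan)
    }

    /// Materialize to storage. Returns the plan that would be executed.
    pub fn save(&self, _path: &str) -> Plan {
        optimize(&self.plan)
    }
}
```

With this shape, a caller that only ever calls `save` still executes an optimized plan, and the SQL path no longer needs its own `optimize` call.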





[GitHub] [arrow-datafusion] houqp closed issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
houqp closed issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705


   





[GitHub] [arrow-datafusion] houqp commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-877718779


   Probably an oversight? The `optimize` call in the `sql` method does look redundant.





[GitHub] [arrow-datafusion] alamb commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-892862309


   I wonder if this ticket is tracking anything useful or if the question has been answered and we can close the ticket?





[GitHub] [arrow-datafusion] Dandandan commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-877869950


   I think it's not an oversight: `collect` is not necessarily called when executing queries, for example when writing the results to disk. As `collect` loads the result into memory in one thread/partition, this is not always what you want to do.
   We might do something to avoid optimizing twice when calling both functions, but maybe it's not a big deal for now?
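
One possible way to avoid the second pass, sketched under assumptions: the plan carries a marker recording that it has already been optimized, and the optimizer returns such plans unchanged. `PlanState` and `Optimizer` are hypothetical names for illustration, not DataFusion types, and the toy rule simply drops steps named `noop`.

```rust
/// Hypothetical wrapper that records whether a plan was already optimized.
/// The plan itself is modeled as a flat list of step names for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum PlanState {
    Raw(Vec<String>),
    Optimized(Vec<String>),
}

/// Toy optimizer that counts how many times it actually did work.
pub struct Optimizer {
    pub runs: usize,
}

impl Optimizer {
    pub fn new() -> Self {
        Optimizer { runs: 0 }
    }

    /// Optimize a raw plan; return an already optimized plan unchanged.
    pub fn optimize(&mut self, plan: PlanState) -> PlanState {
        match plan {
            // Already optimized: skip the rules entirely.
            PlanState::Optimized(steps) => PlanState::Optimized(steps),
            PlanState::Raw(steps) => {
                self.runs += 1;
                // Toy rule (assumption): drop steps named "noop".
                let kept = steps
                    .into_iter()
                    .filter(|step| step.as_str() != "noop")
                    .collect();
                PlanState::Optimized(kept)
            }
        }
    }
}
```

Calling `optimize` from both the SQL entry point and `collect` would then run the rules only once per plan, at the cost of threading the extra state through the API.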





[GitHub] [arrow-datafusion] alamb commented on issue #705: Any reason why logical plan is optimized twice?

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #705:
URL: https://github.com/apache/arrow-datafusion/issues/705#issuecomment-881388071


   > This way, we can make sure the engine always executes with an optimized plan regardless of whether it's built from SQL or the plan builder.
   
   I like that plan
   

