Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 07:19:47 UTC

[GitHub] [arrow-datafusion] waynexia opened a new pull request, #2715: fix: check redundant fields while building projection plan

waynexia opened a new pull request, #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715

   Signed-off-by: Ruihang Xia <wa...@gmail.com>
   
   # Which issue does this PR close?
   
   
   Closes #2712 .
   
   # Rationale for this change
   
   The CSE optimizer used to produce schemas with redundant fields, and depended on other rules to handle them:
   >This projection plan will merge all fields in the `input.schema()` into its own schema. Redundant project fields are expected to be removed in other optimize phase (like `projection_push_down`).
   
   
   
   # What changes are included in this PR?
   
   The CSE optimizer now checks for redundant fields while building the intermediate projection plan.
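
A minimal sketch of the idea, not the actual patch: the `Field` type and the `build_projection_fields` function below are hypothetical stand-ins, written against the standard library only rather than DataFusion's real schema types. It assumes a field is identified by its name, and it shows how the extracted common subexpressions and the input schema's fields can be merged while skipping any field that is already present.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema field; only the name matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Field {
    name: String,
}

/// Build the field list for an intermediate projection: the extracted
/// common subexpressions come first, followed by every input field whose
/// name has not been added already.
fn build_projection_fields(extracted: &[Field], input: &[Field]) -> Vec<Field> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut fields = Vec::new();
    for field in extracted.iter().chain(input.iter()) {
        // `insert` returns false when the name is already in the set,
        // so a redundant field is dropped instead of duplicated.
        if seen.insert(field.name.as_str()) {
            fields.push(field.clone());
        }
    }
    fields
}
```

If the input schema already contains a field with the same name as an extracted expression, which is what a second optimizer pass could produce, the field is kept once instead of twice.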
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
waynexia commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1157776040

   Sorry for the late reply.
   
   > I wonder if we can somehow add a test for this? Or maybe it is not necessary
   
   A test case for this would be helpful. I'll add one. But I haven't figured out when this behavior became a bug. As I remember, our optimizer used to run the optimization rules for many cycles. It should be at some point after the optimizer started running only once. Anyway, depending on another module's unspecified behavior is dumb; I need to avoid this.




[GitHub] [arrow-datafusion] andygrove merged pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715




[GitHub] [arrow-datafusion] waynexia commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
waynexia commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1157887039

   Running the optimizer once will generate an incorrect schema (one that contains duplicate fields) but won't cause a panic. Only `DFSchema::new_with_metadata()` checks for this. I think adding a test case for CSE that checks for duplicate fields would be good.
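
As a rough illustration of what such a test could assert, here is a hedged sketch. The `first_duplicate` helper is hypothetical and works on plain field names rather than on `DFSchema`, so it only shows the shape of the check, not the test that was added to the repository.

```rust
use std::collections::HashSet;

/// Return the first field name that appears more than once, if any.
/// `names` is a hypothetical flattened view of a plan's output schema.
fn first_duplicate<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false for a name that was already seen.
    names.iter().copied().find(|name| !seen.insert(*name))
}
```

A test for CSE could collect the field names of the optimized plan's schema and assert that this helper returns `None`.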




[GitHub] [arrow-datafusion] waynexia commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
waynexia commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1152055015

   Hi @alamb 👋, does this patch work for the original case?




[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1152219955

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2715?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2715](https://codecov.io/gh/apache/arrow-datafusion/pull/2715?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (835f264) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/67d91a7f1f26a795966c4cc0b200187778ee840c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (67d91a7) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #2715   +/-   ##
   =======================================
     Coverage   84.71%   84.72%           
   =======================================
     Files         270      270           
     Lines       47258    47263    +5     
   =======================================
   + Hits        40036    40042    +6     
   + Misses       7222     7221    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2715?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...tafusion/optimizer/src/common\_subexpr\_eliminate.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2715/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2NvbW1vbl9zdWJleHByX2VsaW1pbmF0ZS5ycw==) | `88.70% <100.00%> (+0.15%)` | :arrow_up: |
   | [datafusion/optimizer/src/optimizer.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2715/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL29wdGltaXplci5ycw==) | `86.66% <100.00%> (ø)` | |
   | [datafusion/common/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2715/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb21tb24vc3JjL3NjYWxhci5ycw==) | `74.82% <0.00%> (-0.12%)` | :arrow_down: |
   | [datafusion/optimizer/src/utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2715/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3V0aWxzLnJz) | `32.79% <0.00%> (+1.07%)` | :arrow_up: |
   
   




[GitHub] [arrow-datafusion] alamb commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1155016056

   I wonder if we can somehow add a test for this? Or maybe it is not necessary




[GitHub] [arrow-datafusion] alamb commented on pull request #2715: fix: check redundant fields while building projection plan

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2715:
URL: https://github.com/apache/arrow-datafusion/pull/2715#issuecomment-1157793324

   >  It should be a point after the optimizer only runs once. Anyway, depending on other module's unspecified behavior is dumb, I need to avoid this.
   
   Yeah, this is a good point -- I didn't know if it was possible to construct a query that hit the same problem without running the optimizer twice.

