Posted to github@arrow.apache.org by "andygrove (via GitHub)" <gi...@apache.org> on 2023/02/05 16:40:21 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #5184: Invalid partition scheme in sort if intermediate projection has dropped a partition column

andygrove opened a new issue, #5184:
URL: https://github.com/apache/arrow-datafusion/issues/5184

   **Describe the bug**
   I am hitting this error with benchmark query 3 in ray-sql:
   
   ```
   physical_plan::to_proto() unsupported expression UnKnownColumn { name: "ps_supplycost" }
   ```
   
   Here is some debug output that I think shows the root cause, collected by walking down the plan. A join uses `ps_supplycost` as a join key, so its partitioning scheme includes `ps_supplycost`. A later projection drops this column, but the partitioning scheme of the `SortExec` still refers to `ps_supplycost`, now as an `UnKnownColumn`.
   
   ```
   plan = SortExec: [s_acctbal@0 DESC,n_name@2 ASC NULLS LAST,s_name@1 ASC NULLS LAST,p_partkey@3 ASC NULLS LAST]
   
   partitioning_scheme = Hash([Column { name: "p_partkey", index: 3 }, UnKnownColumn { name: "ps_supplycost" }], 4)
   ---
   
   plan = ProjectionExec: expr=[s_acctbal@5 as s_acctbal, s_name@2 as s_name, n_name@7 as n_name, p_partkey@0 as p_partkey, p_mfgr@1 as p_mfgr, s_address@3 as s_address, s_phone@4 as s_phone, s_comment@6 as s_comment]
   
   partitioning_scheme = Hash([Column { name: "p_partkey", index: 3 }, UnKnownColumn { name: "ps_supplycost" }], 4)
   ---
   
   plan = ProjectionExec: expr=[p_partkey@0 as p_partkey, p_mfgr@1 as p_mfgr, s_name@8 as s_name, s_address@9 as s_address, s_phone@11 as s_phone, s_acctbal@12 as s_acctbal, s_comment@13 as s_comment, n_name@15 as n_name]
   
   partitioning_scheme = Hash([Column { name: "p_partkey", index: 0 }, UnKnownColumn { name: "ps_supplycost" }], 4)
   ---
   
   plan = CoalesceBatchesExec: target_batch_size=8192
   
   partitioning_scheme = Hash([Column { name: "p_partkey", index: 0 }, Column { name: "ps_supplycost", index: 6 }], 4)
   ---
   
   plan = HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "p_partkey", index: 0 }, Column { name: "ps_partkey", index: 0 }), (Column { name: "ps_supplycost", index: 6 }, Column { name: "__value", index: 1 })]
   
   partitioning_scheme = Hash([Column { name: "p_partkey", index: 0 }, Column { name: "ps_supplycost", index: 6 }], 4)
   ---
   ```
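
To make the suspected failure mode concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's actual types or code: `Expr`, `ProjItem`, `proj`, `remap_partitioning`, and `has_unknown` are hypothetical stand-ins, and the remapping rule (a partition key whose input column is not projected becomes an unknown column) is an assumption inferred from the debug output above.

```rust
// Simplified stand-ins for physical expressions. These are NOT DataFusion's
// real types; they only mirror the shapes seen in the debug output.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column { name: String, index: usize },
    UnknownColumn { name: String },
}

/// One projection entry: the input column index it reads and its output name.
#[derive(Debug, Clone)]
pub struct ProjItem {
    pub input_index: usize,
    pub alias: String,
}

/// Hypothetical helper: build a projection from (input_index, alias) pairs.
pub fn proj(items: &[(usize, &str)]) -> Vec<ProjItem> {
    items
        .iter()
        .map(|(input_index, alias)| ProjItem {
            input_index: *input_index,
            alias: alias.to_string(),
        })
        .collect()
}

/// Rewrite hash-partitioning keys through a projection (assumed rule).
/// A key that is still projected gets its new output index; a key whose
/// input column was dropped can no longer be resolved and becomes unknown.
pub fn remap_partitioning(keys: &[Expr], projection: &[ProjItem]) -> Vec<Expr> {
    keys.iter()
        .map(|key| match key {
            Expr::Column { name, index } => {
                match projection.iter().position(|p| p.input_index == *index) {
                    Some(out_index) => Expr::Column {
                        name: projection[out_index].alias.clone(),
                        index: out_index,
                    },
                    None => Expr::UnknownColumn { name: name.clone() },
                }
            }
            // An already-unknown key stays unknown in every parent operator.
            other => other.clone(),
        })
        .collect()
}

/// True if any partition key can no longer be resolved against the schema.
pub fn has_unknown(keys: &[Expr]) -> bool {
    keys.iter().any(|k| matches!(k, Expr::UnknownColumn { .. }))
}
```

In this model, feeding the join keys from the output above (`p_partkey@0`, `ps_supplycost@6`) through the two projections yields `p_partkey` at index 3 plus an unknown `ps_supplycost`, which matches the scheme printed for the `SortExec`. A check like `has_unknown` is one possible place to fall back to an unspecified partitioning instead of carrying the unresolved key upward; whether that is the right fix here is an open question.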
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove closed issue #5184: physical_plan::to_proto() unsupported expression UnKnownColumn

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove closed issue #5184: physical_plan::to_proto() unsupported expression UnKnownColumn
URL: https://github.com/apache/arrow-datafusion/issues/5184




[GitHub] [arrow-datafusion] mingmwang commented on issue #5184: Invalid partition scheme in sort if intermediate projection has dropped a partition column

Posted by "mingmwang (via GitHub)" <gi...@apache.org>.
mingmwang commented on issue #5184:
URL: https://github.com/apache/arrow-datafusion/issues/5184#issuecomment-1432535854

   Could you please provide a SQL query to reproduce the issue?




[GitHub] [arrow-datafusion] franklsf95 commented on issue #5184: Invalid partition scheme in sort if intermediate projection has dropped a partition column

Posted by "franklsf95 (via GitHub)" <gi...@apache.org>.
franklsf95 commented on issue #5184:
URL: https://github.com/apache/arrow-datafusion/issues/5184#issuecomment-1457012261

   > Could you please provide a SQL query to reproduce the issue?
   
   https://github.com/datafusion-contrib/ray-sql/blob/main/testdata/queries/q3.sql

