You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/02/10 18:53:53 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5132: [BugFix] abort plan if order by column not in select list

alamb commented on code in PR #5132:
URL: https://github.com/apache/arrow-datafusion/pull/5132#discussion_r1100745375


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -403,6 +403,28 @@ impl LogicalPlanBuilder {
                 Ok(())
             })?;
 
+        // if current plan is distinct or current plan is repartition and its child plan is distinct,
+        // then this plan is a select distinct plan
+        let is_select_distinct = match self.plan {
+            LogicalPlan::Distinct(_) => true,
+            LogicalPlan::Repartition(Repartition { ref input, .. }) => {
+                matches!(input.as_ref(), &LogicalPlan::Distinct(_))
+            }
+            _ => false,

Review Comment:
   🤔  that is a good point. If the plan is being created from SQL I think this is the pattern.
   
   However, if it is built directly (via a `DataFrame` or someone using `LogicalPlanBuilder` directly) it is probably incorrect to throw an error here (as they may have meant to to run the distinct before the sort for some reason)
   
   Maybe we should move the check to the SQL planner



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org