You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/31 14:46:10 UTC

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2638: Fix limit pushdown

Ted-Jiang commented on code in PR #2638:
URL: https://github.com/apache/arrow-datafusion/pull/2638#discussion_r885724386


##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -608,4 +790,32 @@ mod test {
 
         Ok(())
     }
+
+    #[test]
+    fn limit_should_push_down_right_outer_join_with_offset() -> Result<()> {
+        let table_scan_1 = test_table_scan()?;
+        let table_scan_2 = test_table_scan_with_name("test2")?;
+
+        let plan = LogicalPlanBuilder::from(table_scan_1)
+            .join(
+                &LogicalPlanBuilder::from(table_scan_2).build()?,
+                JoinType::Right,
+                (vec!["a"], vec!["a"]),
+                None,
+            )?
+            .offset(10)?
+            .limit(1000)?
+            .build()?;
+
+        // Limit pushdown Not supported in Join

Review Comment:
   ```suggestion
           // Limit pushdown with offset supported in right outer join
   ```



##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -40,29 +40,67 @@ impl LimitPushDown {
     }
 }
 
+/// Ancestor indicates the current ancestor in the LogicalPlan tree
+/// when traversing down related to "limit push down".
+enum Ancestor {
+    /// Limit
+    FromLimit,
+    /// Offset
+    FromOffset,
+    /// Other nodes that don't affect the adjustment of "Limit"
+    NotRelevant,
+}
+
+///
+/// When doing limit push down with "offset" and "limit" during traversal,
+/// the "limit" should be adjusted.
+/// limit_push_down is a recursive function that tracks three important information
+/// to make the adjustment.
+///
+/// 1. ancestor: the kind of Ancestor.
+/// 2. ancestor_offset: ancestor's offset value
+/// 3. ancestor_limit: ancestor's limit value
+///
+/// (ancestor_offset, ancestor_limit) is updated in the following cases
+/// 1. Ancestor_Limit(n1) -> .. -> Current_Limit(n2)
+///    When the ancestor is a "Limit" and the current node is a "Limit",
+///    it is updated to (None, min(n1, n2))).
+/// 2. Ancestor_Limit(n1) -> .. -> Current_Offset(m1)
+///    it is updated to (m1, n1 + m1).
+/// 3. Ancestor_Offset(m1) -> .. -> Current_Offset(m2)

Review Comment:
   I think without subquery, it should not allowed two `offset`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org