You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 20:13:36 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2638: Fix limit + offset pushdown

alamb commented on code in PR #2638:
URL: https://github.com/apache/arrow-datafusion/pull/2638#discussion_r887251666


##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -40,29 +40,67 @@ impl LimitPushDown {
     }
 }
 
+/// Ancestor indicates the current ancestor in the LogicalPlan tree
+/// when traversing down related to "limit push down".
+enum Ancestor {
+    /// Limit
+    FromLimit,
+    /// Offset
+    FromOffset,
+    /// Other nodes that don't affect the adjustment of "Limit"
+    NotRelevant,
+}
+
+///
+/// When doing limit push down with "offset" and "limit" during traversal,
+/// the "limit" should be adjusted.
+/// limit_push_down is a recursive function that tracks three important information
+/// to make the adjustment.
+///
+/// 1. ancestor: the kind of Ancestor.
+/// 2. ancestor_offset: ancestor's offset value
+/// 3. ancestor_limit: ancestor's limit value
+///
+/// (ancestor_offset, ancestor_limit) is updated in the following cases
+/// 1. Ancestor_Limit(n1) -> .. -> Current_Limit(n2)
+///    When the ancestor is a "Limit" and the current node is a "Limit",
+///    it is updated to (None, min(n1, n2))).
+/// 2. Ancestor_Limit(n1) -> .. -> Current_Offset(m1)
+///    it is updated to (m1, n1 + m1).
+/// 3. Ancestor_Offset(m1) -> .. -> Current_Offset(m2)
+///    it is updated to (m2, None).
+/// 4. Ancestor_Offset(m1) -> .. -> Current_Limit(n1)
+///    it is updated to (None, n1). Note that this won't happen when we
+///    generate the plan from SQL, it can happen when we build the plan
+///    using LogicalPlanBuilder.
 fn limit_push_down(
     _optimizer: &LimitPushDown,
-    upper_limit: Option<usize>,
+    ancestor: Ancestor,
+    ancestor_offset: Option<usize>,
+    ancestor_limit: Option<usize>,

Review Comment:
   If you wanted to be fancy / more idomatic rust and have the compiler check more of the invariants you could consider encoding the ancestor, offset and limit together.
   
   Something like
   
   ```
   enum Ancestor {
       /// Limit
       FromLimit(usize),
       /// Offset
       FromOffset(usize),
       /// Other nodes that don't affect the adjustment of "Limit"
       NotRelevant,
   }
   ```
   
   



##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -421,7 +496,27 @@ mod test {
             .build()?;
 
         let expected = "Offset: 10\
-        \n  Limit: 1010\
+        \n  Limit: 1000\

Review Comment:
   this plan change is due to the change in semantics of `LogicalPlanBuiler::offset()`, right?



##########
datafusion/core/src/optimizer/limit_push_down.rs:
##########
@@ -421,7 +496,27 @@ mod test {
             .build()?;
 
         let expected = "Offset: 10\
-        \n  Limit: 1010\
+        \n  Limit: 1000\
+        \n    Projection: #test.a\
+        \n      TableScan: test projection=None, limit=1000";
+
+        assert_optimized_plan_eq(&plan, expected);
+
+        Ok(())
+    }
+
+    #[test]
+    fn limit_pushdown_with_limit_after_offset() -> Result<()> {
+        let table_scan = test_table_scan()?;
+
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a")])?
+            .offset(10)?
+            .limit(1000)?
+            .build()?;
+
+        let expected = "Limit: 1000\
+        \n  Offset: 10\
         \n    Projection: #test.a\
         \n      TableScan: test projection=None, limit=1010";

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org