You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/13 20:10:31 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6663: Minor: Add more doc strings to WindowExpr

alamb opened a new pull request, #6663:
URL: https://github.com/apache/arrow-datafusion/pull/6663

   # Which issue does this PR close?
   
   More work towards https://github.com/apache/arrow-datafusion/issues/5781 by incrementally adding docstrings to the relevant `WindowFunction` expressions
   
   # Rationale for this change
   
   This code needs to be well documented if we expose it and even if we don't, documenting it better will make the code easier to understand
   
   # What changes are included in this PR?
   
   More Docstrings that describe the differences of functions on `PartitionEvaluator` more
   
   # Are these changes tested?
   N/A
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6663: Minor: Add more doc strings to WindowExpr

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6663:
URL: https://github.com/apache/arrow-datafusion/pull/6663#discussion_r1230147226


##########
datafusion/physical-expr/src/window/partition_evaluator.rs:
##########
@@ -130,9 +130,41 @@ pub trait PartitionEvaluator: Debug + Send {
         ))
     }
 
-    /// Called for window functions that *do not use* values from the
-    /// the window frame, such as `ROW_NUMBER`, `RANK`, `DENSE_RANK`,
-    /// `PERCENT_RANK`, `CUME_DIST`, `LEAD`, `LAG`).
+    /// Evaluate a window function on the entire input partition,
+    /// `values`, producing one output row for each input.
+    ///
+    /// This function is called once per input *partition* for window
+    /// functions that *do not use* values from the window frame,
+    /// such as `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `PERCENT_RANK`,
+    /// `CUME_DIST`, `LEAD`, `LAG`).
+    ///
+    /// The function is passed the window as the `value` and must
+    /// produce an output column with one output row for every input
+    /// row.
+    ///
+    /// `num_rows` is requied to correctly compute the output in case
+    /// `values.len() == 0`
+    ///
+    /// Using this function is an optimization: certain window
+    /// functions are not affected by the window frame definition, and
+    /// thus using `evaluate`, DataFusion can skip the (costly) window
+    /// frame boundary calculation.
+    ///
+    /// For example, the `LAG` built in window function does not use
+    /// the values of its window frame (it can be computed in one shot
+    /// on the entire partition with `Self::evalute` regardless of the
+    /// window defined in the `OVER` clause)
+    ///
+    /// ```sql
+    /// lag(x, 1) OVER (ROWS BETWEEN 2 PRECEDING AND 3 FOLLOWING)

Review Comment:
   Hmm I think the example would be better with an ORDER BY -- I will update it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6663: Minor: Add more doc strings to WindowExpr

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6663:
URL: https://github.com/apache/arrow-datafusion/pull/6663


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #6663: Minor: Add more doc strings to WindowExpr

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #6663:
URL: https://github.com/apache/arrow-datafusion/pull/6663#discussion_r1229689806


##########
datafusion/physical-expr/src/window/built_in_window_function_expr.rs:
##########
@@ -95,10 +95,13 @@ pub trait BuiltInWindowFunctionExpr: Send + Sync + std::fmt::Debug {
         false
     }
 
-    /// Does the window function use the values from its window frame?
+    /// Does the window function use the values from the window frame,
+    /// if one is specified?
     ///
     /// If this function returns true, [`Self::create_evaluator`] must
-    /// implement [`PartitionEvaluator::evaluate_inside_range`]
+    /// implement [`PartitionEvaluator::evaluate_inside_range`]. See
+    /// details and examples on [`PartitionEvaluator::evaluate`].
+

Review Comment:
   ```suggestion
   ```
   redundant line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #6663: Minor: Add more doc strings to WindowExpr

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on code in PR #6663:
URL: https://github.com/apache/arrow-datafusion/pull/6663#discussion_r1229771349


##########
datafusion/physical-expr/src/window/partition_evaluator.rs:
##########
@@ -130,9 +130,41 @@ pub trait PartitionEvaluator: Debug + Send {
         ))
     }
 
-    /// Called for window functions that *do not use* values from the
-    /// the window frame, such as `ROW_NUMBER`, `RANK`, `DENSE_RANK`,
-    /// `PERCENT_RANK`, `CUME_DIST`, `LEAD`, `LAG`).
+    /// Evaluate a window function on the entire input partition,
+    /// `values`, producing one output row for each input.
+    ///
+    /// This function is called once per input *partition* for window
+    /// functions that *do not use* values from the window frame,
+    /// such as `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `PERCENT_RANK`,
+    /// `CUME_DIST`, `LEAD`, `LAG`).
+    ///
+    /// The function is passed the window as the `value` and must
+    /// produce an output column with one output row for every input
+    /// row.
+    ///
+    /// `num_rows` is requied to correctly compute the output in case
+    /// `values.len() == 0`
+    ///
+    /// Using this function is an optimization: certain window
+    /// functions are not affected by the window frame definition, and
+    /// thus using `evaluate`, DataFusion can skip the (costly) window
+    /// frame boundary calculation.
+    ///
+    /// For example, the `LAG` built in window function does not use
+    /// the values of its window frame (it can be computed in one shot
+    /// on the entire partition with `Self::evalute` regardless of the
+    /// window defined in the `OVER` clause)
+    ///
+    /// ```sql
+    /// lag(x, 1) OVER (ROWS BETWEEN 2 PRECEDING AND 3 FOLLOWING)

Review Comment:
   interesting that lag doesn't require explicit ordering, but it works



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org