You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/04 13:26:56 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

alamb opened a new pull request, #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863

   # Which issue does this PR close?
   
   re #5772
   
   # Rationale for this change
   As I mentioned in https://github.com/apache/arrow-datafusion/pull/5772#pullrequestreview-1363813230 , I found the cognitive load of dealing with `Vec<Option<Vec<PhysicalSortRequirements>>>>` challenging and the fact that `PhysicalSortRequirements` had another `Option` inside it was also challenging when reading the code .
   
   Also, it appears that the `make_sort_requirements_from_exprs` function was not public outside of the `datafusion-physical-expr` crate (so I could not used it while updating IOx -- see https://github.com/influxdata/influxdb_iox/pull/7440). It isn't hard to rewrite but I wanted to help others potentially upgrading by adding more comments and hints of how to create this type of structure.
   
   I also think having additional documentation (and method names that are self documenting)  adding methods makes the intent clearer when reading.
   
   
   # What changes are included in this PR?
   
   1. Make fields on `PhysicalSortRequirement` private
   2. Add doc strings and examples of what the different options are, and the relevant constructors
   3. Add `From` impls 
   
   # Are these changes tested?
   Covered by existing tests
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158902408


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   Changed in [8a073c4](https://github.com/apache/arrow-datafusion/pull/5863/commits/8a073c4ae2252f30c91120cb5e9e3ff3fe540f91)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#issuecomment-1496010473

   > IMO we should do this after the alias PR as de-conflicting that may be much harder than de-conflicting this.
   
   I agree -- this one is a nice to have and has no particular urgency. Lets get your PRs merged first


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157570060


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {

Review Comment:
   I'd suggest `unwrap_or_else` to explicitly avoid constructing a potentially unused object. The compiler will likely take care of this, but I find making intent visible in code structure.



##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   You could have a single `new` and the caller can pass the second argument as `None` if they don't want to constrain sort options. I get the sense that you have two `new`s because you think it is more readable? 



##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -75,21 +79,57 @@ impl PhysicalSortExpr {
 }
 
 /// Represents sort requirement associated with a plan
+///
+/// If the requirement is *exact*
+/// ([`PhysicalSortRequirement::new_exact`]), then the sort
+/// requirement will only be satisfied if it matches both the
+/// expression *and* the sort options.
+///
+/// If the requirement is *`expr_only`
+/// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr
+/// must match, to satisfy the requirement.
+///
+/// # Examples
+///
+/// Given an `exact` sort requirement of (`A`, `DESC NULLS FIRST`):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` does not match (`ASC` vs `DESC`)
+/// * `ORDER BY B DESC NULLS FIRST` does not match (different expr)
+///
+/// Given an `expr_only` sort requirement of (`A`, None):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` matches (`ASC` and `NULL` options ignored)
+/// * `ORDER BY B DESC NULLS FIRST` does not match  (different expr)
 #[derive(Clone, Debug)]
 pub struct PhysicalSortRequirement {
     /// Physical expression representing the column to sort
-    pub expr: Arc<dyn PhysicalExpr>,
+    expr: Arc<dyn PhysicalExpr>,
     /// Option to specify how the given column should be sorted.
-    /// If unspecified, there is no constraint on sort options.
-    pub options: Option<SortOptions>,
+    /// If unspecified, there are constraints on sort options.

Review Comment:
   This comment seems incorrect, did you mean to start with "If specified, ..."?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#issuecomment-1496283506

   The alias PR is #5867. I think apart from a simple name change of `make_sort_requirements_from_exprs`, it should be relatively easy to merge this with that. If we get these two merged up, the state of affairs on this should improve somewhat


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158644529


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   The alternative I had in mind was
   ```rust
   for partition_by in partition_by_exprs {
       sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), None));
       // sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), Some(sort_options)));
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158486095


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   Yes - that is precisely the reason.
   
   For example, I thought it was clearer that the goal was to add a relaxed sort requirement in `datafusion/core/src/physical_plan/windows/mod.rs`  with:
   
   ```rust
    for partition_by in partition_by_exprs {
           sort_reqs.push(PhysicalSortRequirement::new_expr_only(partition_by.clone()))
       }
   ```
   
   Rather than 
   ```rust
    for partition_by in partition_by_exprs {
          sort_reqs.push(PhysicalSortRequirement {
               expr: partition_by.clone(),
               options: None,
           });
       }
   ```
   
   Of course they both do the same thing eventually and I agree it is a preference thing about what makes for more readable code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1159416812


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I think we can use     
   ```rust
   pub fn to_sort_exprs<'a>(
           requirements: impl IntoIterator<Item = &'a PhysicalSortRequirement>,
       ) -> Vec<PhysicalSortExpr> 
   ```
   as signature for `to_sort_exprs` as in `from_sort_exprs`. This would help remove a couple clones in the code. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#issuecomment-1500339173

   Thank you for the reviews and comments @mustafasrepo  and @ozankabak 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157261926


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -318,27 +314,7 @@ pub fn ordering_satisfy_requirement_concrete<F: FnOnce() -> EquivalencePropertie
 pub fn make_sort_exprs_from_requirements(
     required: &[PhysicalSortRequirement],
 ) -> Vec<PhysicalSortExpr> {
-    required
-        .iter()
-        .map(|requirement| {
-            if let Some(options) = requirement.options {
-                PhysicalSortExpr {
-                    expr: requirement.expr.clone(),
-                    options,
-                }
-            } else {
-                PhysicalSortExpr {

Review Comment:
   moved into PhysicalSortExpr::into_sort_expr() where I think it is easier to find and understand (turns out converting back / forth from `PhysicalSortExpr` and `PhysicalSortRequirement` is common



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158882974


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   I can make that change if you prefer it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158491396


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {

Review Comment:
   Indeed -- I agree with you. I originally had `unwrap_or_else` for precisely the reasons you mentioned, but `clippy` told me to use `unwrap_or` instead 😭 
   
   
   
   ```
   error: unnecessary closure used to substitute value for `Option::None`
      --> datafusion/physical-expr/src/sort_expr.rs:185:23
       |
   185 |           let options = options.unwrap_or_else(|| SortOptions {
       |  _______________________^
   186 | |             descending: false,
   187 | |             nulls_first: false,
   188 | |         });
       | |__________^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations
       = note: `-D clippy::unnecessary-lazy-evaluations` implied by `-D warnings`
   help: use `unwrap_or(..)` instead
       |
   185 ~         let options = options.unwrap_or(SortOptions {
   186 +             descending: false,
   187 +             nulls_first: false,
   188 ~         });
       |
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160741345


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   Very cool -- I did not know that. If you think that is a worthwhile change, I would be happy to make a follow on PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160359648


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   @alamb, it seems like using
   ```rust
   pub fn to_sort_exprs<T: Borrow<PhysicalSortRequirement>>(
       requirements: impl IntoIterator<Item = T>,
   ) -> Vec<PhysicalSortExpr> {
       requirements
           .into_iter()
           .map(|e| PhysicalSortExpr::from(e.borrow()))
           .collect()
   }
   
   pub fn from_sort_exprs<T: Borrow<PhysicalSortExpr>>(
       ordering: impl IntoIterator<Item = T>,
   ) -> Vec<PhysicalSortRequirement> {
       ordering
           .into_iter()
           .map(|e| PhysicalSortRequirement::from(e.borrow()))
           .collect()
   }
   ```
   lets us get rid of a bunch of `clone`s and results in simpler call site code. AFAIK, this is a zero-cost mechanism. If you concur, we can use this pattern.
   
   I created [#11](https://github.com/alamb/arrow-datafusion/pull/11) on your repo so you can easily check it out and incorporate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160264651


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I have a feeling there may be a way to do this with a `Borrow` trick to implement a single function that can deal with both owned and referenced arguments without duplicating code 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157261926


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -318,27 +314,7 @@ pub fn ordering_satisfy_requirement_concrete<F: FnOnce() -> EquivalencePropertie
 pub fn make_sort_exprs_from_requirements(
     required: &[PhysicalSortRequirement],
 ) -> Vec<PhysicalSortExpr> {
-    required
-        .iter()
-        .map(|requirement| {
-            if let Some(options) = requirement.options {
-                PhysicalSortExpr {
-                    expr: requirement.expr.clone(),
-                    options,
-                }
-            } else {
-                PhysicalSortExpr {

Review Comment:
   moved into PhysicalSortExpr::into_sort_expr() where I think it is easier to find and understand (I needed a version of this in IOx)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160262740


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   My rationale for taking `PhysicalSortRequirement` rather than `&PhysicalSortRequirement` was actually to avoid a clone in the cases where the caller already had an owned `PhysicalSortRequirement`. Basically the clone() is moved from inside this function to the callsite
   
   Specifically in `datafusion/core/src/physical_optimizer/sort_enforcement.rs`
   
   ```rust
                       // Make sure we preserve the ordering requirements:
                       update_child_to_remove_unnecessary_sort(child, sort_onwards, &plan)?;
                       let sort_expr =
                           PhysicalSortRequirement::to_sort_exprs(required_ordering);
                       add_sort_above(child, sort_expr)?;
                       if is_sort(child) {
   ``` 
   This signature can avoid a clone. 
   
   
   If the signature looks like 
   
   ```rust
   pub fn to_sort_exprs<'a>(
           requirements: impl IntoIterator<Item = &'a PhysicalSortRequirement>,
       ) -> Vec<PhysicalSortExpr> 
   ```
   
   I think it would  require always `clone()`ing the expr and options internally to create an owned `PhysicalSortExpr` in the output
   
   Of course, we are talking about cloning an `Arc` and some `SortOption`s which presumably isn't all that expensive , so maybe it doesn't matter 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160264651


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I have a feeling there may be a way to do this with some `Borrow` trick to implement one function who deals with owned and referenced cases without duplicating code 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160258970


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -75,21 +79,57 @@ impl PhysicalSortExpr {
 }
 
 /// Represents sort requirement associated with a plan
+///
+/// If the requirement is *exact*
+/// ([`PhysicalSortRequirement::new_exact`]), then the sort
+/// requirement will only be satisfied if it matches both the
+/// expression *and* the sort options.
+///
+/// If the requirement is *`expr_only`
+/// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr
+/// must match, to satisfy the requirement.
+///

Review Comment:
   👍  that is a good spot -- fixed in 1db636ada



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1159423873


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -75,21 +79,57 @@ impl PhysicalSortExpr {
 }
 
 /// Represents sort requirement associated with a plan
+///
+/// If the requirement is *exact*
+/// ([`PhysicalSortRequirement::new_exact`]), then the sort
+/// requirement will only be satisfied if it matches both the
+/// expression *and* the sort options.
+///
+/// If the requirement is *`expr_only`
+/// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr
+/// must match, to satisfy the requirement.
+///

Review Comment:
   It seems that we now use `PhysicalSortRequirement::new` for initialization for both cases. These comments seem remainder for old API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#issuecomment-1498649629

   I have made some suggestions. Besides these suggestions, LGTM!. Thanks @alamb.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160738936


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I see. I thought you wanted to make `to_sort_exprs` "smart" in the sense that it can take work with an owned value or a reference, using the owned or reference `From` impl depending on the argument -- making a clone only when necessary.
   
   Anyway I got curious and tried to verify whether the `borrow` mechanism was zero-cost for this purpose; it turns out it is not. It always calls the reference path. I also found how to do this -- here is an [example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=21c9901d73b494c02be88157306f7bf0) for future reference.
   
   For posterity, had we wanted to go this route, the smart `to_sort_exprs` would be:
   ```rust
   pub fn to_sort_exprs<T>(
       requirements: impl IntoIterator<Item = T>,
   ) -> Vec<PhysicalSortExpr>
   where
       PhysicalSortExpr: From<T>,
   {
       requirements
           .into_iter()
           .map(|e| PhysicalSortExpr::from(e))
           .collect()
   }
   ```
   which would route to the right `PhysicalSortExpr::from` depending on the argument.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160634355


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I guess my point was that I think this call:
   
   ```rust
   PhysicalSortRequirement::from(e.borrow()
   ```
   
   Invokes this `From` impl:
   ```
   impl From<&PhysicalSortRequirement> for PhysicalSortExpr {
       fn from(value: &PhysicalSortRequirement) -> Self {
           value.clone().into_sort_expr()
       }
   }
   ```
   
   Which has a `clone` of the value
   
   I have removed the `From<&PhysicalSortRequirement>` traits in 12cecdb4e because it is obscuring the locations of the cloning. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#issuecomment-1496000495

   This LGTM. However, let's discuss timing. We have a type alias PR that is waiting for #5661. IMO we should do this after the alias PR as de-conflicting that will be much harder than de-conflicting this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160768358


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   I think it'd be a good change, I can already see some use cases for it in some upcoming code (e.g. streaming group bys).
   
   I updated the mini-PR I created for experimentation so you can get the changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157570060


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {

Review Comment:
   I'd suggest `unwrap_or_else` to explicitly avoid constructing a potentially unused object. The compiler will likely take care of this, but I find making intent visible in code structure is good practice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157266692


##########
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs:
##########
@@ -128,7 +126,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
     }
 
     fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>> {
-        vec![Some(make_sort_requirements_from_exprs(&self.expr))]
+        vec![Some(PhysicalSortRequirement::exact_from_exprs(&self.expr))]

Review Comment:
   I am imagining that when a user upgrades DataFusion they will and see that the type signature changed from `PhysicalSortExpr` to `PhysicalSortRequirement`. Then when they look at `PhysicalSortRequirement` they can find a function to convert the existing `PhysicalSortExpr` to `PhysicalSortRequirement` -- `PhysicalSortRequirement::exact_from_exprs`
   
   `make_sort_requirements_from_exprs` was both not documented as well as lost in the list of functions
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158644529


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,94 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new `exact` requirement, which must match the
+    /// required options and expression. See
+    /// [`PhysicalSortRequirement`] for examples.
+    pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self {
+        Self {
+            expr,
+            options: Some(options),
+        }
+    }
+
+    /// Creates a new `expr_only` requirement, which must match the
+    /// required expression. See [`PhysicalSortRequirement`] for
+    /// examples.
+    pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self {
+            expr,
+            options: None,
+        }
+    }

Review Comment:
   The alternative I had in mind was
   ```rust
   for partition_by in partition_by_exprs {
       sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), None))
       // sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), Some(sort_options)))
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160792067


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   Thanks -- I tried to pull the commits into https://github.com/apache/arrow-datafusion/pull/5918 but they still seem to be trying to use this impl:
   
   ```
   impl From<&PhysicalSortRequirement> for PhysicalSortExpr {
   ```
   
   I may have messed up the merge somehow
   
   Let's keep collaborating on https://github.com/apache/arrow-datafusion/pull/5918 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1157261926


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -318,27 +314,7 @@ pub fn ordering_satisfy_requirement_concrete<F: FnOnce() -> EquivalencePropertie
 pub fn make_sort_exprs_from_requirements(
     required: &[PhysicalSortRequirement],
 ) -> Vec<PhysicalSortExpr> {
-    required
-        .iter()
-        .map(|requirement| {
-            if let Some(options) = requirement.options {
-                PhysicalSortExpr {
-                    expr: requirement.expr.clone(),
-                    options,
-                }
-            } else {
-                PhysicalSortExpr {

Review Comment:
   moved into PhysicalSortExpr::into_sort_expr() where I think it is easier to find and understand



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comments

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1160359648


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -107,19 +147,87 @@ impl std::fmt::Display for PhysicalSortRequirement {
 }
 
 impl PhysicalSortRequirement {
+    /// Creates a new requirement.
+    ///
+    /// If `options` is `Some(..)`, creates an `exact` requirement,
+    /// which must match both `options` and `expr`.
+    ///
+    /// If `options` is `None`, Creates a new `expr_only` requirement,
+    /// which must match only `expr`.
+    ///
+    /// See [`PhysicalSortRequirement`] for examples.
+    pub fn new(expr: Arc<dyn PhysicalExpr>, options: Option<SortOptions>) -> Self {
+        Self { expr, options }
+    }
+
+    /// Replace the required expression for this requirement with the new one
+    pub fn with_expr(mut self, expr: Arc<dyn PhysicalExpr>) -> Self {
+        self.expr = expr;
+        self
+    }
+
+    /// Converts the `PhysicalSortRequirement` to `PhysicalSortExpr`.
+    /// If required ordering is `None` for an entry, the default
+    /// ordering `ASC, NULLS LAST` is used.
+    ///
+    /// The default is picked to be consistent with
+    /// PostgreSQL: <https://www.postgresql.org/docs/current/queries-order.html>
+    pub fn into_sort_expr(self) -> PhysicalSortExpr {
+        let Self { expr, options } = self;
+
+        let options = options.unwrap_or(SortOptions {
+            descending: false,
+            nulls_first: false,
+        });
+        PhysicalSortExpr { expr, options }
+    }
+
     /// Returns whether this requirement is equal or more specific than `other`.
     pub fn compatible(&self, other: &PhysicalSortRequirement) -> bool {
         self.expr.eq(&other.expr)
             && other.options.map_or(true, |other_opts| {
                 self.options.map_or(false, |opts| opts == other_opts)
             })
     }
-}
 
-pub fn make_sort_requirements_from_exprs(
-    ordering: &[PhysicalSortExpr],
-) -> Vec<PhysicalSortRequirement> {
-    ordering.iter().map(|e| e.clone().into()).collect()
+    /// Returns [`PhysicalSortRequirement`] that requires the exact
+    /// sort of the [`PhysicalSortExpr`]s in `ordering`
+    ///
+    /// This method is designed for
+    /// use implementing [`ExecutionPlan::required_input_ordering`].
+    pub fn from_sort_exprs<'a>(
+        ordering: impl IntoIterator<Item = &'a PhysicalSortExpr>,
+    ) -> Vec<PhysicalSortRequirement> {
+        ordering
+            .into_iter()
+            .map(PhysicalSortRequirement::from)
+            .collect()
+    }
+
+    /// Converts an iterator of [`PhysicalSortRequirement`] into a Vec
+    /// of [`PhysicalSortExpr`]s.
+    ///
+    /// This function converts `PhysicalSortRequirement` to `PhysicalSortExpr`
+    /// for each entry in the input. If required ordering is None for an entry
+    /// default ordering `ASC, NULLS LAST` if given (see [`into_sort_expr`])
+    pub fn to_sort_exprs(
+        requirements: impl IntoIterator<Item = PhysicalSortRequirement>,
+    ) -> Vec<PhysicalSortExpr> {

Review Comment:
   @alamb, it seems like using
   ```rust
   pub fn to_sort_exprs<T: Borrow<PhysicalSortRequirement>>(
       requirements: impl IntoIterator<Item = T>,
   ) -> Vec<PhysicalSortExpr> {
       requirements
           .into_iter()
           .map(|e| PhysicalSortExpr::from(e.borrow()))
           .collect()
   }
   
   pub fn from_sort_exprs<T: Borrow<PhysicalSortExpr>>(
       ordering: impl IntoIterator<Item = T>,
   ) -> Vec<PhysicalSortRequirement> {
       ordering
           .into_iter()
           .map(|e| PhysicalSortRequirement::from(e.borrow()))
           .collect()
   }
   ```
   lets us get rid of a bunch of `clone`s and results in simpler call site code. AFAIK, this is a zero-cost mechanism. If you concur, we can use this pattern.
   
   I created [#11](https://github.com/alamb/arrow-datafusion/pull/11) on your repo so you can easily check it out and incorporate (assuming I'm not overlooking something and this is indeed zero cost obviously).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158487640


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -75,21 +79,57 @@ impl PhysicalSortExpr {
 }
 
 /// Represents sort requirement associated with a plan
+///
+/// If the requirement is *exact*
+/// ([`PhysicalSortRequirement::new_exact`]), then the sort
+/// requirement will only be satisfied if it matches both the
+/// expression *and* the sort options.
+///
+/// If the requirement is *`expr_only`
+/// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr
+/// must match, to satisfy the requirement.
+///
+/// # Examples
+///
+/// Given an `exact` sort requirement of (`A`, `DESC NULLS FIRST`):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` does not match (`ASC` vs `DESC`)
+/// * `ORDER BY B DESC NULLS FIRST` does not match (different expr)
+///
+/// Given an `expr_only` sort requirement of (`A`, None):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` matches (`ASC` and `NULL` options ignored)
+/// * `ORDER BY B DESC NULLS FIRST` does not match  (different expr)
 #[derive(Clone, Debug)]
 pub struct PhysicalSortRequirement {
     /// Physical expression representing the column to sort
-    pub expr: Arc<dyn PhysicalExpr>,
+    expr: Arc<dyn PhysicalExpr>,
     /// Option to specify how the given column should be sorted.
-    /// If unspecified, there is no constraint on sort options.
-    pub options: Option<SortOptions>,
+    /// If unspecified, there are constraints on sort options.

Review Comment:
   Nice catch. I will fix it (restore to the original) 
   
   ```suggestion
       /// If unspecified, there are no constraints on sort options.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5863: Encapsulate `PhysicalSortRequrement` and add more doc comment

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5863:
URL: https://github.com/apache/arrow-datafusion/pull/5863#discussion_r1158487640


##########
datafusion/physical-expr/src/sort_expr.rs:
##########
@@ -75,21 +79,57 @@ impl PhysicalSortExpr {
 }
 
 /// Represents sort requirement associated with a plan
+///
+/// If the requirement is *exact*
+/// ([`PhysicalSortRequirement::new_exact`]), then the sort
+/// requirement will only be satisfied if it matches both the
+/// expression *and* the sort options.
+///
+/// If the requirement is *`expr_only`
+/// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr
+/// must match, to satisfy the requirement.
+///
+/// # Examples
+///
+/// Given an `exact` sort requirement of (`A`, `DESC NULLS FIRST`):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` does not match (`ASC` vs `DESC`)
+/// * `ORDER BY B DESC NULLS FIRST` does not match (different expr)
+///
+/// Given an `expr_only` sort requirement of (`A`, None):
+/// * `ORDER BY A DESC NULLS FIRST` matches
+/// * `ORDER BY A ASC  NULLS FIRST` matches (`ASC` and `NULL` options ignored)
+/// * `ORDER BY B DESC NULLS FIRST` does not match  (different expr)
 #[derive(Clone, Debug)]
 pub struct PhysicalSortRequirement {
     /// Physical expression representing the column to sort
-    pub expr: Arc<dyn PhysicalExpr>,
+    expr: Arc<dyn PhysicalExpr>,
     /// Option to specify how the given column should be sorted.
-    /// If unspecified, there is no constraint on sort options.
-    pub options: Option<SortOptions>,
+    /// If unspecified, there are constraints on sort options.

Review Comment:
   Nice catch. I will fix it (restore to the original) 
   
   Update: fixed in a5cc4c25f0659b8d748e87a2ca95e37e626779c8



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org