Posted to github@arrow.apache.org by "mustafasrepo (via GitHub)" <gi...@apache.org> on 2023/04/05 06:41:49 UTC

[GitHub] [arrow-datafusion] mustafasrepo opened a new pull request, #5874: Refactor to increase readability (MINOR changes)

mustafasrepo opened a new pull request, #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874

# Which issue does this PR close?

N/A

# Rationale for this change

This is a refactoring PR to increase code readability.

# What changes are included in this PR?

`can_skip_sort` and `check_alignment` now return `Result<Option<bool>>` instead of `Result<(bool, bool)>`. Also, since arrow now supports `!` on `SortOptions`, we remove the `reverse_sort_options` method.
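
To illustrate the return-type change, here is a minimal sketch, not the actual DataFusion code. It assumes the old tuple meant `(can_skip, should_reverse)` and that `None` now means "the sort cannot be skipped". The `bool` slices standing in for sort orderings (one `descending` flag per column), the `String` error type, and the name `can_skip_sort_tuple` are all hypothetical.

```rust
/// New shape (sketch): `None` means the sort is still required;
/// `Some(reverse)` means it can be skipped, and `reverse` says whether the
/// existing ordering has to be read in reverse.
/// Each slice holds one hypothetical `descending` flag per sort column.
fn can_skip_sort(existing: &[bool], required: &[bool]) -> Result<Option<bool>, String> {
    if existing.len() != required.len() {
        return Err(format!(
            "ordering length mismatch: {} vs {}",
            existing.len(),
            required.len()
        ));
    }
    if existing == required {
        // Orderings already match, no reversal needed.
        Ok(Some(false))
    } else if existing.iter().zip(required).all(|(e, r)| e != r) {
        // Every column is ordered the opposite way, so reversing works.
        Ok(Some(true))
    } else {
        Ok(None)
    }
}

/// Old shape (assumed): `(can_skip, should_reverse)`.
/// The second flag carries no meaning when the first is `false`,
/// which is the ambiguity that `Option<bool>` removes.
fn can_skip_sort_tuple(existing: &[bool], required: &[bool]) -> Result<(bool, bool), String> {
    Ok(match can_skip_sort(existing, required)? {
        Some(reverse) => (true, reverse),
        None => (false, false),
    })
}
```

With `Option<bool>`, a caller can write `if let Some(reverse) = can_skip_sort(..)?` and never has to remember which element of the tuple is which.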

# Are these changes tested?
Existing tests should continue to pass.

# Are there any user-facing changes?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on pull request #5874: [MINOR]: Refactor to increase readability

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#issuecomment-1498810090

   Thanks @mustafasrepo and @comphead



[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5874: [MINOR]: Refactor to increase readability

Posted by "comphead (via GitHub)" <gi...@apache.org>.

comphead commented on code in PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#discussion_r1158650954


##########
datafusion/common/src/utils.rs:
##########
@@ -162,6 +163,23 @@ where
     Ok(low)
 }
 
+/// This function finds the partition points according to `partition_columns`.
+/// If there are no sort columns, then the result will be a single element
+/// vector containing one partition range spanning all data.
+pub fn evaluate_partition_points(

Review Comment:
   Cool, thanks. One nit: maybe we can come up with a better name for `evaluate_partition_points`. I found a definition of "partition point" in the Informatica system, but I'm not sure it is the same thing we calculate here.
   
   Also, the method returns `lexicographical_partition_ranges`, so maybe we can name the method similarly.




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5874: [MINOR]: Refactor to increase readability

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#discussion_r1158955096


##########
datafusion/physical-expr/src/window/window_expr.rs:
##########
@@ -281,7 +261,7 @@ pub fn reverse_order_bys(order_bys: &[PhysicalSortExpr]) -> Vec<PhysicalSortExpr
         .iter()
         .map(|e| PhysicalSortExpr {
             expr: e.expr.clone(),
-            options: reverse_sort_options(e.options),
+            options: !e.options,

Review Comment:
   ❤️ 
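
For readers unfamiliar with this change, the sketch below shows roughly what `!` on sort options does. `SortOptions` here is a local stand-in rather than the arrow type, and the body of `reverse_sort_options` is an assumption about what the removed helper did (flip both flags).

```rust
/// Local stand-in for arrow's `SortOptions`; only the two flags are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Implementing `Not` is what makes `!options` compile.
/// Both flags are flipped, so "ascending, nulls last" becomes
/// "descending, nulls first".
impl std::ops::Not for SortOptions {
    type Output = SortOptions;

    fn not(self) -> SortOptions {
        SortOptions {
            descending: !self.descending,
            nulls_first: !self.nulls_first,
        }
    }
}

/// Assumed shape of the helper this PR removes; with `Not` available it is
/// just another spelling of `!options`.
fn reverse_sort_options(options: SortOptions) -> SortOptions {
    !options
}
```

Because the reversal is an involution, applying `!` twice gives back the original options.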



##########
datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs:
##########
@@ -628,23 +625,6 @@ impl SortedPartitionByBoundedWindowStream {
             .map(|e| e.evaluate_to_sort_column(batch))
             .collect::<Result<Vec<_>>>()
     }
-
-    /// evaluate the partition points given the sort columns; if the sort columns are

Review Comment:
   It is nice to avoid this repetition 👍 



##########
datafusion/common/src/utils.rs:
##########
@@ -162,6 +163,23 @@ where
     Ok(low)
 }
 
+/// This function finds the partition points according to `partition_columns`.
+/// If there are no sort columns, then the result will be a single element
+/// vector containing one partition range spanning all data.
+pub fn evaluate_partition_points(

Review Comment:
   I agree `evaluate_partition_ranges` sounds more specific. 




[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #5874: [MINOR]: Refactor to increase readability

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.

mustafasrepo commented on code in PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#discussion_r1158694724


##########
datafusion/common/src/utils.rs:
##########
@@ -162,6 +163,23 @@ where
     Ok(low)
 }
 
+/// This function finds the partition points according to `partition_columns`.
+/// If there are no sort columns, then the result will be a single element
+/// vector containing one partition range spanning all data.
+pub fn evaluate_partition_points(

Review Comment:
   I looked up the Informatica definition of "partition point" (I guess you were referring to [this definition](https://docs.informatica.com/data-integration/powercenter/10-5/advanced-workflow-guide/understanding-pipeline-partitioning/partitioning-attributes/partition-points.html)). The two are different. If the name is misleading, it is better to rename this function. What do you think about renaming it to `evaluate_partition_ranges`?




[GitHub] [arrow-datafusion] alamb merged pull request #5874: [MINOR]: Refactor to increase readability

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb merged PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #5874: [MINOR]: Refactor to increase readability

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.

mustafasrepo commented on code in PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#discussion_r1159397476


##########
datafusion/common/src/utils.rs:
##########
@@ -162,6 +163,23 @@ where
     Ok(low)
 }
 
+/// This function finds the partition points according to `partition_columns`.
+/// If there are no sort columns, then the result will be a single element
+/// vector containing one partition range spanning all data.
+pub fn evaluate_partition_points(

Review Comment:
   I have renamed the function `evaluate_partition_points` to `evaluate_partition_ranges`. Thanks for the suggestion, @comphead.
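
As a rough illustration of what the doc comment in the diff describes, here is a std-only sketch. It is not the DataFusion implementation: plain `Vec<i64>` columns stand in for arrow sort columns, and the sketch assumes the rows are already sorted by those columns and that every column has at least `num_rows` entries.

```rust
use std::ops::Range;

/// Sketch: split `0..num_rows` into ranges of consecutive rows that share the
/// same values in every partition column. With no partition columns, the
/// result is a single range spanning all rows.
fn evaluate_partition_ranges(
    num_rows: usize,
    partition_columns: &[Vec<i64>],
) -> Vec<Range<usize>> {
    if partition_columns.is_empty() {
        return vec![0..num_rows];
    }
    let mut ranges = Vec::new();
    let mut start = 0;
    for row in 1..num_rows {
        // A new partition starts wherever any column differs from the previous row.
        let boundary = partition_columns
            .iter()
            .any(|col| col[row] != col[row - 1]);
        if boundary {
            ranges.push(start..row);
            start = row;
        }
    }
    if num_rows > 0 {
        // Close the final partition.
        ranges.push(start..num_rows);
    }
    ranges
}
```

Each returned item is a half-open `start..end` row range rather than a single boundary index, which is why "ranges" describes the result better than "points".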




[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5874: [MINOR]: Refactor to increase readability

Posted by "comphead (via GitHub)" <gi...@apache.org>.

comphead commented on code in PR #5874:
URL: https://github.com/apache/arrow-datafusion/pull/5874#discussion_r1158754212


##########
datafusion/common/src/utils.rs:
##########
@@ -162,6 +163,23 @@ where
     Ok(low)
 }
 
+/// This function finds the partition points according to `partition_columns`.
+/// If there are no sort columns, then the result will be a single element
+/// vector containing one partition range spanning all data.
+pub fn evaluate_partition_points(

Review Comment:
   "Partition ranges" sounds more consistent to me, since in the method's `if` branches we are in fact identifying ranges.


