You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/10 18:48:00 UTC

[GitHub] [arrow-datafusion] drrtuy opened a new pull request, #3782: Optimizer now simplifies multiplication when either left or right arg is a literal zero or one and division, modulo when right arg is one

drrtuy opened a new pull request, #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782

   … 
   
   # Which issue does this PR close?
   
   Closes #3643.
   
    # Rationale for this change
   To improve expression simplification efficiency.  
   
   # What changes are included in this PR?
   
   # Are there any user-facing changes?
   No user-facing changes are expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993277694


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   I think this code will run ~1 per query / expression, so I don't expect the any performance difference to be measurable.
   
   For what it is worth, I think both approaches are reasonable. Thank you both for the discussion. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993277694


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   I think this code will run ~1 per query / expression, so I don't expect the any performance difference to be noticeable.
   
   For what it is worth, I think both approaches are reasonable. Thank you both for the discussion. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993090559


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   pow call make the patch lesser but will do a real multiplication instead of a pointer deref.
   This is perf degradaton IMHO. In the worst case there will be 6 such multiplications(2 for Multiply, 2 for Divide, 2 for Modulo) and there might be multiple scalars binary ops in the expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993518576


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   If the function or the logic will be call many time in the runtime, it is better to use static value or const value which will get good performance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1276573997

   Thanks again @drrtuy and @isidentical  and @liukun4515 for the discussion and review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993092803


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   On the other side I don't see much benefit from using pow except there will be no static array. Not a big profit given a potential perf gain.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r993093223


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   What is your argument in support for using pow() though?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] liukun4515 commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1274702240

   > Looks great to me -- thank you @drrtuy
   > 
   > @liukun4515 can you please confirm this is ok with you as well?
   
   Thanks @alamb  I will take a look this issue and pr  tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r991603019


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,13 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
+                true
+            } else {
+                false
+            }

Review Comment:
   ```suggestion
               *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1276576306

   Benchmark runs are scheduled for baseline = b0f58dda5cd073c54897f58a1f71f289b6942f3c and contender = a226587de5a17f93c175727bca37844ef0bd934a. a226587de5a17f93c175727bca37844ef0bd934a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a6fa90e282c341f888f2454ae1e762fa...2236f2f25c434caba799d992521c4f5c/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1c224d0c4e1a4b748d280617c5751528...2bc00d47ed304f738f9b9cf772c5a323/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/81062408425148b7812a1c70c72e45cd...22fa2fbf86904af4a3b05efb7f742d13/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d6b11d5025d341fc986bca1cc618ae62...6edc7d0dd95443dd8426e6f4cad4efd3/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1276440799

   > I took the liberty of resolving the conflict in [47a13f5](https://github.com/apache/arrow-datafusion/pull/3782/commits/47a13f517ea612557f4bc0199a1fdbcb979c76f5)
   
   Much appreciated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r991655398


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,13 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
+                true
+            } else {
+                false
+            }

Review Comment:
   A way more laconic expression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
isidentical commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r992844418


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   Question: Is there a specific reason to embed a `POWS_OF_TEN` array instead of dynamically calculating it at the runtime? E.g.
   
   ```rust
   i128::pow(10, *_s as u32) == *v
   ``` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] drrtuy commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
drrtuy commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r991660106


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,13 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
+                true
+            } else {
+                false
+            }

Review Comment:
   I updated the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
isidentical commented on code in PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#discussion_r992844418


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
         | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
         Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
         Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Review Comment:
   Question: Is there a specific reason to embed a `POWS_OF_TEN` array instead of dynamically calculating it at the runtime? (or the same goes to taking to `log10()` of the `v`, instead of raising `_s`) E.g.
   
   ```rust
   i128::pow(10, *_s as u32) == *v
   ``` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1276420437

   I took the liberty of resolving the conflict in [47a13f5](https://github.com/apache/arrow-datafusion/pull/3782/commits/47a13f517ea612557f4bc0199a1fdbcb979c76f5)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3782: Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3782:
URL: https://github.com/apache/arrow-datafusion/pull/3782#issuecomment-1276444366

   I plan to merge this PR when CI passes (we are a bit backed up now)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org