You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/04/20 09:59:46 UTC

[GitHub] [arrow] rtpsw opened a new pull request, #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

rtpsw opened a new pull request, #35249:
URL: https://github.com/apache/arrow/pull/35249

   See 35247


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1516056347

   cc @westonpace, @icexelloss 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #35249:
URL: https://github.com/apache/arrow/pull/35249#discussion_r1173082301


##########
cpp/src/arrow/engine/substrait/extension_set.cc:
##########
@@ -944,6 +944,26 @@ ExtensionIdRegistry::SubstraitAggregateToArrow DecodeBasicAggregate(
                                call.id().name, " to have at least one argument");
       }
       case 1: {
+        std::shared_ptr<compute::FunctionOptions> options = nullptr;
+        if (arrow_function_name == "stddev" || arrow_function_name == "variance") {
+          auto maybe_dist = call.GetOption("distribution");

Review Comment:
   Can you add a link to the substrait spec here:
   https://github.com/substrait-io/substrait/blob/73228b4112d79eb1011af0ebb41753ce23ca180c/extensions/functions_arithmetic.yaml#L1240



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #35249:
URL: https://github.com/apache/arrow/pull/35249#discussion_r1173465886


##########
cpp/src/arrow/engine/substrait/extension_set.cc:
##########
@@ -944,6 +944,26 @@ ExtensionIdRegistry::SubstraitAggregateToArrow DecodeBasicAggregate(
                                call.id().name, " to have at least one argument");
       }
       case 1: {
+        std::shared_ptr<compute::FunctionOptions> options = nullptr;
+        if (arrow_function_name == "stddev" || arrow_function_name == "variance") {
+          auto maybe_dist = call.GetOption("distribution");

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1516940713

   Minor comments otherwise looks good


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ianmcook commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1517924180

   Thanks @rtpsw & @icexelloss! Is there a test exercising the `"preference": ["POPULATION"]` case? It would be nice to have that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1517935608

   > Thanks @rtpsw & @icexelloss! Is there a test exercising the `"preference": ["SAMPLE"]` case? It would be nice to have that.
   
   Sure - are you ok with just copying the json plan (but with preference = SAMPLE) into another test case ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1517791894

   @ianmcook @westonpace Do you want to take a look? Ideally I want to merge this soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ianmcook commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1517971686

   > > Thanks @rtpsw & @icexelloss! Is there a test exercising the `"preference": ["SAMPLE"]` case? It would be nice to have that.
   > 
   > Sure - are you ok with just copying the json plan (but with preference = SAMPLE) into another test case ?
   
   That sounds fine to me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1516654007

   > This should pass `VarianceOptions` with `ddof` set to `1` when the `distribution` option is set to `POPULATION`
   
   Good catch. I'll work on adding this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #35249:
URL: https://github.com/apache/arrow/pull/35249


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1516056195

   * Closes: #35247


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #35249:
URL: https://github.com/apache/arrow/pull/35249#discussion_r1173952572


##########
cpp/src/arrow/engine/substrait/extension_set.cc:
##########
@@ -944,6 +944,30 @@ ExtensionIdRegistry::SubstraitAggregateToArrow DecodeBasicAggregate(
                                call.id().name, " to have at least one argument");
       }
       case 1: {
+        std::shared_ptr<compute::FunctionOptions> options = nullptr;
+        if (arrow_function_name == "stddev" || arrow_function_name == "variance") {
+          // See the following URL for the spec of stddev and variance:
+          // https://github.com/substrait-io/substrait/blob/
+          // 73228b4112d79eb1011af0ebb41753ce23ca180c/
+          // extensions/functions_arithmetic.yaml#L1240
+          auto maybe_dist = call.GetOption("distribution");
+          if (maybe_dist) {
+            auto& prefs = **maybe_dist;
+            if (prefs.size() != 1) {
+              return Status::Invalid("expected a single preference for ",
+                                     arrow_function_name, " but got ", prefs.size());
+            }

Review Comment:
   Technically multiple preferences means the user would prefer the first value but will be ok with the second.  In other words, it's similar to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
   
   So you should be able to handle this case.  However, no producers do this yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1524162355

   Benchmark runs are scheduled for baseline = f6bda06f5cf22879060de23021fe05ae2cfb6d9f and contender = 9feee48a7ceffecd9b1fe34293cc005a7958638f. 9feee48a7ceffecd9b1fe34293cc005a7958638f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/dafa8490c4964347a03bdcd00ddc4f06...6c30208138a94170a827b533b6a855e1/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/20bdf7f086044561a07d48717d057b17...e4cca6f1baab4488b397ec17e388c27a/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/939d53d8778a4a9ea0318953f72d448c...4b1c5e71301e457793314a92c21d3def/)
   [Finished :arrow_down:0.21% :arrow_up:0.12%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/cfab4077ac3e472f80b17a63b599e676...129333a268b54c4dbf2125ce68acf6f7/)
   Buildkite builds:
   [Finished] [`9feee48a` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2787)
   [Failed] [`9feee48a` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2821)
   [Finished] [`9feee48a` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2785)
   [Finished] [`9feee48a` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2812)
   [Finished] [`f6bda06f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2786)
   [Failed] [`f6bda06f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2820)
   [Finished] [`f6bda06f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2784)
   [Finished] [`f6bda06f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2811)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35249: GH-35247: [C++] Add Arrow Substrait support for stddev/variance

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35249:
URL: https://github.com/apache/arrow/pull/35249#issuecomment-1524163613

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/939d53d8778a4a9ea0318953f72d448c...4b1c5e71301e457793314a92c21d3def/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org