You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/02/22 06:36:50 UTC

[GitHub] [arrow] westonpace opened a new pull request, #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

westonpace opened a new pull request, #34288:
URL: https://github.com/apache/arrow/pull/34288

   This introduces the special URI `urn:arrow:substrait_simple_extension_function` which can be interpreted as "just call a function from the Arrow function registry by name".  It should only be used in plans where the consumer is Acero as these plans will not likely make any sense elsewhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34288:
URL: https://github.com/apache/arrow/pull/34288#discussion_r1114833746


##########
cpp/src/arrow/engine/substrait/function_test.cc:
##########
@@ -188,6 +188,12 @@ TEST(FunctionMapping, ValidCases) {
        {int8(), int8()},
        "-119",
        int8()},
+      {{kArrowSimpleExtensionFunctionsUri, "add_checked"},

Review Comment:
   thx for the explaination



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1443787955

   Thanks @westonpace I will take a follow up to add a test to use  run_query with UDFs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34288:
URL: https://github.com/apache/arrow/pull/34288#discussion_r1114591126


##########
cpp/src/arrow/engine/substrait/function_test.cc:
##########
@@ -188,6 +188,12 @@ TEST(FunctionMapping, ValidCases) {
        {int8(), int8()},
        "-119",
        int8()},
+      {{kArrowSimpleExtensionFunctionsUri, "add_checked"},

Review Comment:
   I wonder if this also works for UDFs? e.g., if I register a Python UDF by name "my_func", I can also use the same `::arrow::compute::call` to invoke it via `kSimpleSubstraitToArrow` or is there more to it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34288:
URL: https://github.com/apache/arrow/pull/34288#discussion_r1114827710


##########
cpp/src/arrow/engine/substrait/function_test.cc:
##########
@@ -188,6 +188,12 @@ TEST(FunctionMapping, ValidCases) {
        {int8(), int8()},
        "-119",
        int8()},
+      {{kArrowSimpleExtensionFunctionsUri, "add_checked"},

Review Comment:
   It should work for any scalar UDF used in a project node or filter node.  It would also work for aggregate UDFs (though these cannot be specified in python).
   
   `::arrow::compute::call` is evaluated by looking up the function, by name, in the function registry given to the plan.  So the UDF will either need to be added to the default function registry or the custom function registry will need to be specified when the plan is run (I believe pyarrow.substrait.run_query exposes this configuration property but, if not, we can add it easily).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1442462063

   @westonpace I am working on the substrait producer side of this - IIUC, the producer should generate sth like this?
   ```
   extension_uris {
     extension_uri_anchor: 1
   }
   extension_uris {
     extension_uri_anchor: 2
     uri: "urn:arrow:substrait_simple_extension_function"
   }
   extensions {
     extension_function {
       extension_uri_reference: 1
       function_anchor: 1
       name: "multiply"
     }
   }
   extensions {
     extension_function {
       extension_uri_reference: 2
       function_anchor: 2
       name: "my_udf"
     }
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34288:
URL: https://github.com/apache/arrow/pull/34288


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1439509453

   * Closes: #34122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1439509489

   :warning: GitHub issue #34122 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1443494408

   Benchmark runs are scheduled for baseline = b888f4d6c7dc490ce17b9f64d32af23ffc6f4617 and contender = a9552d1825a60d2664f164b9686b673e46848120. a9552d1825a60d2664f164b9686b673e46848120 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e6591b2416ab449b8e5a7faa5b9adbe5...cfdf861deaa0447f940aaa0548ae5dd2/)
   [Failed :arrow_down:0.34% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b7a5892dee2948309a4042c08831e46a...6c0f01656407414a9e66e4d511c80f27/)
   [Finished :arrow_down:4.85% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/88b5edfff5484d78a7d41410a8331bd1...8b689b3b648348ab849d8a37b5a65f09/)
   [Finished :arrow_down:0.48% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/387a28724e434f849ef4a047f0b18b98...2db41be2c30448a6b75f987f030fb9d3/)
   Buildkite builds:
   [Finished] [`a9552d18` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2429)
   [Finished] [`a9552d18` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2459)
   [Finished] [`a9552d18` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2426)
   [Finished] [`a9552d18` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2450)
   [Finished] [`b888f4d6` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2428)
   [Failed] [`b888f4d6` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2458)
   [Finished] [`b888f4d6` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2425)
   [Finished] [`b888f4d6` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2449)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34288: GH-34122: [C++] Allow calling function registry functions without requiring a Substrait mapping

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34288:
URL: https://github.com/apache/arrow/pull/34288#issuecomment-1442532299

   > @westonpace I am working on the substrait producer side of this - IIUC, the producer should generate sth like this?
   
   Looks correct to me. That would call a function named `my_udf` in the function registry.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org