You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/02/04 08:11:37 UTC

[GitHub] [arrow] rtpsw opened a new pull request, #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

rtpsw opened a new pull request, #34042:
URL: https://github.com/apache/arrow/pull/34042

   See #33850


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422667510

   Yaron, Probably easier to create a new branch./PR to fix the issue. I am
   not sure how reopening a merged PR would work.
   
   On Wed, Feb 8, 2023 at 4:17 AM rtpsw ***@***.***> wrote:
   
   > @westonpace <https://github.com/westonpace>, could you please reopen this
   > PR? I already pushed a commit to the same branch.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow/pull/34042#issuecomment-1422276974>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAGBXLFAQPAHW6W7BRHPZBLWWNQCFANCNFSM6AAAAAAURBC37M>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1416693996

   cc @westonpace @icexelloss 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1096718718


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);
+  return kDefaultExtensionProvider;
+}
+
+void set_default_extension_provider(const std::shared_ptr<ExtensionProvider>& provider) {

Review Comment:
   Can we document these method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1419455493

   >  I'd rather stick to something like what we already have as it is very simple. Yes, most providers will want to do something similar but I don't think it is a large burden for them.
   
   This sounds fine to me. We can implement the fallback logic internally. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1418060328

   IMO we can handle the "fallback" behavior in Acero instead of custom code so not everyone that wants to provide custom extension need to implement this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422191511

   @rtpsw @westonpace This broke `test-ubuntu-18.04-cpp` job:
   
   https://github.com/ursacomputing/crossbow/actions/runs/4121522608/jobs/7117290841#step:5:1695
   
   ```text
   FAILED: src/arrow/engine/CMakeFiles/arrow_substrait_shared.dir/substrait/options.cc.o 
   /usr/local/bin/sccache /usr/bin/c++  -DARROW_ENGINE_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC -DARROW_NO_DEPRECATED_API -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DURI_STATIC_BUILD -DUTF8PROC_STATIC -Darrow_substrait_shared_EXPORTS -Isrc -I/arrow/cpp/src -I/arrow/cpp/src/generated -Isubstrait_ep-generated -isystem /arrow/cpp/thirdparty/flatbuffers/include -isystem /arrow/cpp/thirdparty/hadoop/include -isystem orc_ep-install/include -isystem zstd_ep-install/include -isystem utf8proc_ep-install/include -isystem xsimd_ep/src/xsimd_ep-install/include -isystem jemalloc_ep-prefix/src -isystem mimalloc_ep/src/mimalloc_ep/include/mimalloc-2.0 -Wno-noexcept-type  -fdiagnostics-color=always  -Wall -Wno-conversion -Wno-sign-conversion -W
 unused-result -fno-semantic-interposition -msse4.2  -g -Werror -O0 -ggdb -fPIC   -pthread -std=c++1z -MD -MT src/arrow/engine/CMakeFiles/arrow_substrait_shared.dir/substrait/options.cc.o -MF src/arrow/engine/CMakeFiles/arrow_substrait_shared.dir/substrait/options.cc.o.d -o src/arrow/engine/CMakeFiles/arrow_substrait_shared.dir/substrait/options.cc.o -c /arrow/cpp/src/arrow/engine/substrait/options.cc
   /arrow/cpp/src/arrow/engine/substrait/options.cc:130:6: error: 'mutex' in namespace 'std' does not name a type
    std::mutex g_default_extension_provider_mutex;
         ^~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc: In function 'std::shared_ptr<arrow::engine::ExtensionProvider> arrow::engine::default_extension_provider()':
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:8: error: 'unique_lock' is not a member of 'std'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
           ^~~~~~~~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:8: note: suggested alternative: 'unique_copy'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
           ^~~~~~~~~~~
           unique_copy
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:25: error: 'mutex' is not a member of 'std'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                            ^~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:37: error: 'g_default_extension_provider_mutex' was not declared in this scope
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:37: note: suggested alternative: 'default_extension_provider'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                        default_extension_provider
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:32: error: 'lock' was not declared in this scope
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                   ^~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:135:32: note: suggested alternative: 'clock'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                   ^~~~
                                   clock
   /arrow/cpp/src/arrow/engine/substrait/options.cc: In function 'void arrow::engine::set_default_extension_provider(const std::shared_ptr<arrow::engine::ExtensionProvider>&)':
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:8: error: 'unique_lock' is not a member of 'std'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
           ^~~~~~~~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:8: note: suggested alternative: 'unique_copy'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
           ^~~~~~~~~~~
           unique_copy
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:25: error: 'mutex' is not a member of 'std'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                            ^~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:37: error: 'g_default_extension_provider_mutex' was not declared in this scope
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:37: note: suggested alternative: 'default_extension_provider'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                        default_extension_provider
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:32: error: 'lock' was not declared in this scope
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                   ^~~~
   /arrow/cpp/src/arrow/engine/substrait/options.cc:140:32: note: suggested alternative: 'clock'
      std::unique_lock<std::mutex> lock(g_default_extension_provider_mutex);
                                   ^~~~
                                   clock
   ```
   
   Could you check this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1097626452


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);

Review Comment:
   Better safe than sorry I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1097905231


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);
+  return kDefaultExtensionProvider;
+}
+
+void set_default_extension_provider(const std::shared_ptr<ExtensionProvider>& provider) {

Review Comment:
   It is declared in the public header `options.h` and documented there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1419410914

   We could make the chaining of extension providers more explicit (e.g. Acero could have a stack of extension providers).  Or we could create a base class that we offer as a helper / example.  However, I'm not sure that this is a good idea.  I'd rather stick to something like what we already have as it is very simple.  Yes, most providers will want to do something similar but I don't think it is a large burden for them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1416693992

   * Closes: #33850


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1096727243


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);

Review Comment:
   Not a big issue though - IMO this is a bit of overkill but wouldn't hurt.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1097911741


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;

Review Comment:
   You're right we can't use `kName`. I believe the right way to use is `g_name`, which I just used in the recent commit.



##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;

Review Comment:
   You're right we can't use `kName`. I believe the right way is to use `g_name`, which I just used in the recent commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422093375

   Revision: aa571530da72b8e866b465d36413f1ad3af3602a
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-c920441bc9](https://github.com/ursacomputing/crossbow/branches/all?query=actions-c920441bc9)
   
   |Task|Status|
   |----|------|
   |test-ubuntu-18.04-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-c920441bc9-github-test-ubuntu-18.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4121522608/jobs/7117290841)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1418041925

   > One thing wasn't clear to be is how do we dispatch some of the message to our extension provider vs use the default extension provider. For example, for the asof join message, I would imagine that being passed to the existing Acero extension provider - how would we handle that?
   
   The usual pattern would be chaining of an extension provider - something along the lines of
   ```
   class MyExtensionProvider {
   public:
     MyExtensionProvider(const std::shared_ptr<ExtensionProvider>& parent_provider) :
         parent_provider_(parent_provider) {}
   
     Result<RelationInfo> MakeRel(const std::vector<DeclarationInfo>& inputs,
                                  const google::protobuf::Any& rel,
                                  const ExtensionSet& ext_set) override {
       if (rel.Is<substrait_ext::MyRel>()) {
         substrait_ext::MyRel my_rel;
         rel.UnpackTo(&my_rel);
         return MakeMyRel(inputs, my_rel, ext_set);
       }
       return parent_provider_->MakeRel(inputs, rel, ext_set);
     }
   
     // MakeMyRel implementation here
   
   private:
     std::shared_ptr<ExtensionProvider> parent_provider_;
   };
   ```
   and for initialization
   ```
   set_default_extension_provider(std::make_shared<MyExtensionProvider>(default_extension_provider());
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1418075564

   > IMO we can handle the "fallback" behavior in Acero instead of custom code so not everyone that wants to provide custom extension need to implement this.
   
   What do you have in mind for the API? My suggestion would be to encapsulate the chaining pattern via an API like:
   ```
   chain_extension_provider(const std::shared_ptr<ExtensionProvider>& provider);
   ```
   Here `provider` would not need to call a parent because this would be done automatically.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422276974

   @westonpace, could you please reopen this PR?  I already pushed a commit to the same branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1096718632


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);

Review Comment:
   Don't think we need the lock here - IMO this method doesn't need to be thread safe (I cannot imagine who would want to do this in multi threaded way)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1097690244


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);

Review Comment:
   SGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1420676291

   Benchmark runs are scheduled for baseline = 2b661bad06bdb31eb54e1abadf396e95466637ca and contender = 7423f0332cb11eb780f421c07bac71f87bf44a03. 7423f0332cb11eb780f421c07bac71f87bf44a03 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7dce38208957469d82d3900fefd83896...1fa959e4d4b14267a05fc085417464bd/)
   [Failed :arrow_down:0.24% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/e259efc66c5a42c08417b799dd69733c...500a58e0187d433cb34558a9702fec5d/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/003712cda92b4a67b9ef2e1e7e64a30f...8fb37e4c79ad4829a67e9b0cdc521e39/)
   [Finished :arrow_down:0.03% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6300d1d633994c388fb3c41cacc97a99...b82a0da8dee741228f8fbf19232054dd/)
   Buildkite builds:
   [Finished] [`7423f033` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2337)
   [Failed] [`7423f033` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2365)
   [Finished] [`7423f033` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2335)
   [Finished] [`7423f033` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2356)
   [Finished] [`2b661bad` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2336)
   [Finished] [`2b661bad` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2364)
   [Finished] [`2b661bad` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2334)
   [Finished] [`2b661bad` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2355)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1097879089


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;

Review Comment:
   ```suggestion
   std::shared_ptr<ExtensionProvider> default_extension_provider =
       std::make_shared<DefaultExtensionProvider>();
   
   std::mutex default_extension_provider_mutex;
   ```
   
   These aren't constants so we can't use `kName`.  Will this work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34042:
URL: https://github.com/apache/arrow/pull/34042


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1421990857

   @github-actions crossbuw submit test-ubuntu-18.04-cpp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34042:
URL: https://github.com/apache/arrow/pull/34042#discussion_r1096718914


##########
cpp/src/arrow/engine/substrait/options.cc:
##########
@@ -122,11 +122,23 @@ class DefaultExtensionProvider : public BaseExtensionProvider {
   }
 };
 
-std::shared_ptr<ExtensionProvider> ExtensionProvider::kDefaultExtensionProvider =
+namespace {
+
+std::shared_ptr<ExtensionProvider> kDefaultExtensionProvider =
     std::make_shared<DefaultExtensionProvider>();
 
+std::mutex kDefaultExtensionProviderMutex;
+
+}  // namespace
+
 std::shared_ptr<ExtensionProvider> default_extension_provider() {
-  return ExtensionProvider::kDefaultExtensionProvider;
+  std::unique_lock<std::mutex> lock(kDefaultExtensionProviderMutex);
+  return kDefaultExtensionProvider;
+}
+
+void set_default_extension_provider(const std::shared_ptr<ExtensionProvider>& provider) {

Review Comment:
   Also - do we need to expose these in public headers? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1418023500

   One thing wasn't clear to be is how do we dispatch some of the message to our extension provider vs use the default extension provider. For example, for the asof join message, I would imagine that being passed to the existing Acero extension provider - how would we handle that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422841215

   Create a [fix-PR](https://github.com/apache/arrow/pull/34075).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34042: GH-33850: [C++] Allow Substrait's default extension provider to be configured

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34042:
URL: https://github.com/apache/arrow/pull/34042#issuecomment-1422255071

   @kou, I'll handle this. I don't have the same platform readily available, but I have a reasonable guess of the cause.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org