You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/28 07:36:08 UTC

[GitHub] [arrow] siddhantrao23 opened a new pull request, #13446: Add a Secondary Cache to cache gandiva object code

siddhantrao23 opened a new pull request, #13446:
URL: https://github.com/apache/arrow/pull/13446

   Arrow gandiva has a primary cache but this cache doesn't persist across
   restarts. Integrate a new API in project and filter make calls that
   allow the user to specify the implementation of the secondary cache by
   providing a c++ and java interface to this persistant cache.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1168425306

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1168365347

   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
   
   Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r968035541


##########
cpp/src/gandiva/jni/jni_common.cc:
##########
@@ -663,7 +665,17 @@ std::shared_ptr<arrow::Buffer> JavaSecondaryCache::Get(
   auto data = std::shared_ptr<arrow::Buffer>(
       new arrow::Buffer(reinterpret_cast<uint8_t*>(ret_address), ret_size));
 
-  return data;
+  // copy the buffer and release the original memory
+  auto result =
+      arrow::Buffer::CopyNonOwned(*data.get(), arrow::default_cpu_memory_manager());

Review Comment:
   Since it is allocated in gandiva, shouldn't the memory be released on object destruction? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1245339930

   @github-actions crossbow submit java-jars


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r963708073


##########
cpp/src/gandiva/secondary_cache.h:
##########
@@ -0,0 +1,38 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "arrow/buffer.h"
+
+namespace gandiva {
+
+// secondary cache c++ interface
+class GANDIVA_EXPORT SecondaryCacheInterface {
+ public:
+  virtual std::shared_ptr<arrow::Buffer> Get(
+      std::shared_ptr<arrow::Buffer> serialized_expr) = 0;
+
+  virtual void Set(std::shared_ptr<arrow::Buffer> serialized_expr,
+                   std::shared_ptr<arrow::Buffer> value) = 0;
+
+  virtual void ReleaseMemory(long address) = 0;

Review Comment:
   This class shouldn't contain this method. The jni release call should be transparently called by the jni implementation of this interface



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amol- commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by "amol- (via GitHub)" <gi...@apache.org>.
amol- commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1490654256

   Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r919630065


##########
cpp/src/gandiva/projector.cc:
##########
@@ -75,6 +92,17 @@ Status Projector::Make(SchemaPtr schema, const ExpressionVector& exprs,
   // Verify if previous projector obj code was cached
   if (prev_cached_obj != nullptr) {
     is_cached = true;
+  } else if (sec_cache != nullptr) {
+    std::shared_ptr<arrow::Buffer> arrow_buffer = sec_cache->Get(cache_key.Hash());

Review Comment:
   Using just hash as key might lead to collisions..also you need to take into account the cpu features since the generated code will be different in different cpus.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r941426328


##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();
+  engine_str += cpu_attrs_str;
+  engine_str_ = strdup(engine_str.c_str());

Review Comment:
   The linter gave me an error when I tried this not sure why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nivia007 commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
nivia007 commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1209540401

   @github-actions crossbow submit java-jars


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amol- closed pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by "amol- (via GitHub)" <gi...@apache.org>.
amol- closed pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code
URL: https://github.com/apache/arrow/pull/13446


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nivia007 commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
nivia007 commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1209575760

   @github-actions crossbow submit java-jars
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1168425272

   https://issues.apache.org/jira/browse/ARROW-16917


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r919624157


##########
cpp/src/gandiva/filter.cc:
##########
@@ -37,8 +37,15 @@ Filter::Filter(std::unique_ptr<LLVMGenerator> llvm_generator, SchemaPtr schema,
 
 Filter::~Filter() {}
 
+Status Filter::Make(SchemaPtr schema, ConditionPtr condition,
+                    std::shared_ptr<Configuration> config,
+                    std::shared_ptr<Filter>* filter) {
+  return Make(schema, condition, config, nullptr, filter);
+}
+
 Status Filter::Make(SchemaPtr schema, ConditionPtr condition,
                     std::shared_ptr<Configuration> configuration,
+                    std::shared_ptr<SecondaryCacheInterface> sec_cache,

Review Comment:
   This isn't getting used



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r940951048


##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();
+  engine_str += cpu_attrs_str;
+  engine_str_ = strdup(engine_str.c_str());

Review Comment:
   A string variable isn't allowed to be global/static which is why I switched to the char* method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nivia007 commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
nivia007 commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1209524602

   @github-actions crossbow submit java-jars


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1209617193

   ```
   Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
   The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/2826625536
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13446:
URL: https://github.com/apache/arrow/pull/13446#issuecomment-1245358094

   Revision: 9e1d6a83ebc93bfcdc6446f5245aa55f1db1b882
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-6bc4d5c5fc](https://github.com/ursacomputing/crossbow/branches/all?query=actions-6bc4d5c5fc)
   
   |Task|Status|
   |----|------|
   |java-jars|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-6bc4d5c5fc-github-java-jars)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-6bc4d5c5fc-github-java-jars)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r920174811


##########
cpp/src/gandiva/projector.cc:
##########
@@ -75,6 +92,17 @@ Status Projector::Make(SchemaPtr schema, const ExpressionVector& exprs,
   // Verify if previous projector obj code was cached
   if (prev_cached_obj != nullptr) {
     is_cached = true;
+  } else if (sec_cache != nullptr) {
+    std::shared_ptr<arrow::Buffer> arrow_buffer = sec_cache->Get(cache_key.Hash());

Review Comment:
   Added `Engine.Hash()` which includes the cpu attributes and name to the key



##########
cpp/src/gandiva/filter.cc:
##########
@@ -37,8 +37,15 @@ Filter::Filter(std::unique_ptr<LLVMGenerator> llvm_generator, SchemaPtr schema,
 
 Filter::~Filter() {}
 
+Status Filter::Make(SchemaPtr schema, ConditionPtr condition,
+                    std::shared_ptr<Configuration> config,
+                    std::shared_ptr<Filter>* filter) {
+  return Make(schema, condition, config, nullptr, filter);
+}
+
 Status Filter::Make(SchemaPtr schema, ConditionPtr condition,
                     std::shared_ptr<Configuration> configuration,
+                    std::shared_ptr<SecondaryCacheInterface> sec_cache,

Review Comment:
   Added the calls to secondary cache in filter as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r940918137


##########
java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Projector.java:
##########
@@ -187,6 +187,45 @@ public static Projector make(Schema schema, List<ExpressionTree> exprs,
   public static Projector make(Schema schema, List<ExpressionTree> exprs,
       SelectionVectorType selectionVectorType,
       long configurationId) throws GandivaException {

Review Comment:
   Can you add some java unit test..only with no-op sec cache..just to verify the jni flow



##########
java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Projector.java:
##########
@@ -187,6 +187,45 @@ public static Projector make(Schema schema, List<ExpressionTree> exprs,
   public static Projector make(Schema schema, List<ExpressionTree> exprs,
       SelectionVectorType selectionVectorType,
       long configurationId) throws GandivaException {
+    return make(schema, exprs, selectionVectorType, configurationId, null);
+  }
+    
+  /**
+   * Invoke this function to generate LLVM code to evaluate the list of project expressions.
+   * Invoke Projector::Evaluate() against a RecordBatch to evaluate the record batch
+   * against these projections.
+   *
+   * @param schema         Table schema. The field names in the schema should match the fields used
+   *                       to create the TreeNodes
+   * @param exprs          List of expressions to be evaluated against data
+   * @param configOptions  ConfigOptions parameter
+   * @param secondaryCache SecondaryCache persistant cache for gandiva object code.
+   *
+   * @return A native evaluator object that can be used to invoke these projections on a RecordBatch
+   */
+  public static Projector make(Schema schema, List<ExpressionTree> exprs,
+      ConfigurationBuilder.ConfigOptions configOptions,
+      JavaSecondaryCacheInterface secondaryCache) throws GandivaException {
+    return make(schema, exprs, SelectionVectorType.SV_NONE, JniLoader.getConfiguration(configOptions), secondaryCache);
+  }
+    
+  /**
+   * Invoke this function to generate LLVM code to evaluate the list of project expressions.
+   * Invoke Projector::Evaluate() against a RecordBatch to evaluate the record batch
+   * against these projections.
+   *
+   * @param schema              Table schema. The field names in the schema should match the fields used
+   *                            to create the TreeNodes
+   * @param exprs               List of expressions to be evaluated against data
+   * @param selectionVectorType type of selection vector
+   * @param configurationId     Custom configuration created through config builder.
+   * @param secondaryCache      SecondaryCache persistant cache for gandiva object code.

Review Comment:
   nit: probably no need to mention persistent..The implementation may be anything



##########
cpp/src/gandiva/engine.cc:
##########
@@ -89,6 +89,7 @@ std::once_flag llvm_init_once_flag;
 static bool llvm_init = false;
 static llvm::StringRef cpu_name;
 static llvm::SmallVector<std::string, 10> cpu_attrs;
+static char* engine_str_;

Review Comment:
   use a different name..like cpu_details...or something else



##########
java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/JavaSecondaryCacheInterface.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.gandiva.evaluator;
+
+/**
+ * Acts as an interface to the secondary cache class that uses a callback
+ * mechanism from gandiva.
+ */
+public interface JavaSecondaryCacheInterface {
+  /**
+   * Resultant buffer of the get call to the secondary cache.
+   */
+  class ExpandResult {

Review Comment:
   we don't do any expansion right? better rename it to BufferResult or something.



##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();
+  engine_str += cpu_attrs_str;
+  engine_str_ = strdup(engine_str.c_str());

Review Comment:
   why not just store std::string variable



##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();
+  engine_str += cpu_attrs_str;
+  engine_str_ = strdup(engine_str.c_str());
+
   llvm_init = true;
 }
 
+char* Engine::ToString() {

Review Comment:
   also use a different name for this method..since we are only resulting the cpu details



##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();

Review Comment:
   use a different name..like cpu_details...or something



##########
cpp/src/gandiva/projector.cc:
##########
@@ -83,6 +100,21 @@ Status Projector::Make(SchemaPtr schema, const ExpressionVector& exprs,
   std::unique_ptr<LLVMGenerator> llvm_gen;
   ARROW_RETURN_NOT_OK(LLVMGenerator::Make(configuration, is_cached, &llvm_gen));
 
+  if (!is_cached && sec_cache != nullptr) {
+    std::string key = std::string(Engine::ToString()) + cache_key.ToString();

Review Comment:
   also add a comment there why cpu details are required



##########
cpp/src/gandiva/projector.cc:
##########
@@ -83,6 +100,21 @@ Status Projector::Make(SchemaPtr schema, const ExpressionVector& exprs,
   std::unique_ptr<LLVMGenerator> llvm_gen;
   ARROW_RETURN_NOT_OK(LLVMGenerator::Make(configuration, is_cached, &llvm_gen));
 
+  if (!is_cached && sec_cache != nullptr) {
+    std::string key = std::string(Engine::ToString()) + cache_key.ToString();

Review Comment:
   Have a separate method to get the sec cache key..which both projector and filter calls..also maybe add a delimiter char between the engine & cache_key.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r941374071


##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string engine_str = cpu_name.str();
+  engine_str += cpu_attrs_str;
+  engine_str_ = strdup(engine_str.c_str());

Review Comment:
   Why would that be so?



##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string cpu_details = cpu_name.str();
+  cpu_details += cpu_attrs_str;
+  cpu_details_ = strdup(cpu_details.c_str());
+
   llvm_init = true;
 }
 
+char* Engine::getCPUDetails() {

Review Comment:
   Capitalise the function name. Also a better name would be GetCpuIdentifier()



##########
java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorTest.java:
##########
@@ -2599,4 +2599,33 @@ public void testInitCap() throws Exception {
     releaseRecordBatch(batch);
     releaseValueVectors(output);
   }
+
+  public class DummySecondaryCache implements JavaSecondaryCacheInterface {
+    public BufferResult getSecondaryCache(long addrKey, long sizeKey) {
+      return null;
+    }
+
+    public void setSecondaryCache(long addrKey, long sizeKey, long addrValue, long sizeValue) {
+    }
+  }
+
+  @Test
+  public void testSecondaryCache() throws GandivaException {
+    Field a = Field.nullable("a", int32);
+    Field b = Field.nullable("b", int32);
+    List<Field> args = Lists.newArrayList(a, b);
+
+    Field retType = Field.nullable("c", int32);
+    ExpressionTree root = TreeBuilder.makeExpression("add", args, retType);
+
+    List<ExpressionTree> exprs = Lists.newArrayList(root);
+
+    Schema schema = new Schema(args);
+
+    DummySecondaryCache secondaryCache = new DummySecondaryCache();
+
+    Projector eval = Projector.make(schema, exprs,

Review Comment:
   Verify that the DummySecondaryCache Get And Set methods are called by setting boolean flags.
   Also add the evaluation and result checking for completeness sake



##########
cpp/src/gandiva/engine.cc:
##########
@@ -112,9 +113,20 @@ void Engine::InitOnce() {
   }
   ARROW_LOG(INFO) << "Detected CPU Name : " << cpu_name.str();
   ARROW_LOG(INFO) << "Detected CPU Features:" << cpu_attrs_str;
+
+  std::string cpu_details = cpu_name.str();
+  cpu_details += cpu_attrs_str;
+  cpu_details_ = strdup(cpu_details.c_str());
+
   llvm_init = true;
 }
 
+char* Engine::getCPUDetails() {
+  DCHECK_EQ(llvm_init, true);

Review Comment:
   You can add a static string variable inside this method



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r964833628


##########
cpp/src/gandiva/secondary_cache.h:
##########
@@ -0,0 +1,38 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "arrow/buffer.h"
+
+namespace gandiva {
+
+// secondary cache c++ interface
+class GANDIVA_EXPORT SecondaryCacheInterface {
+ public:
+  virtual std::shared_ptr<arrow::Buffer> Get(
+      std::shared_ptr<arrow::Buffer> serialized_expr) = 0;
+
+  virtual void Set(std::shared_ptr<arrow::Buffer> serialized_expr,
+                   std::shared_ptr<arrow::Buffer> value) = 0;
+
+  virtual void ReleaseMemory(long address) = 0;

Review Comment:
   Changed method name to releaseBufferResult and removed it from the c++ interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #13446:
URL: https://github.com/apache/arrow/pull/13446#discussion_r965833974


##########
java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/JavaSecondaryCacheInterface.java:
##########
@@ -36,9 +36,11 @@ public BufferResult(long address, long size) {
 
   }
 
-  BufferResult getSecondaryCache(long addr, long size);
+  BufferResult get(long addr, long size);
 
-  void setSecondaryCache(long addrExpr, long sizeExpr, long addr, long size);
+  void set(long addrExpr, long sizeExpr, long addr, long size);

Review Comment:
   Can you add comments  before the methods that also describes the parameters?
   Also rename the parameters to more clear names..like keyBufAddr, keyBufSize, valueBufAddr, valueBufSize



##########
cpp/src/gandiva/jni/jni_common.cc:
##########
@@ -663,7 +665,17 @@ std::shared_ptr<arrow::Buffer> JavaSecondaryCache::Get(
   auto data = std::shared_ptr<arrow::Buffer>(
       new arrow::Buffer(reinterpret_cast<uint8_t*>(ret_address), ret_size));
 
-  return data;
+  // copy the buffer and release the original memory
+  auto result =
+      arrow::Buffer::CopyNonOwned(*data.get(), arrow::default_cpu_memory_manager());

Review Comment:
   how is this memory released back?



##########
cpp/src/gandiva/jni/jni_common.cc:
##########
@@ -663,7 +665,17 @@ std::shared_ptr<arrow::Buffer> JavaSecondaryCache::Get(
   auto data = std::shared_ptr<arrow::Buffer>(

Review Comment:
   you can just stack allocate here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org