You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "js8544 (via GitHub)" <gi...@apache.org> on 2023/06/14 15:54:46 UTC
[GitHub] [arrow] js8544 opened a new pull request, #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
js8544 opened a new pull request, #36067:
URL: https://github.com/apache/arrow/pull/36067
<!--
Thanks for opening a pull request!
If this is your first pull request you can find detailed information on how
to contribute here:
* [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
* [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
Then could you also rename the pull request title in the following format?
GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
or
MINOR: [${COMPONENT}] ${SUMMARY}
In the case of PARQUET issues on JIRA the title also supports:
PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
-->
### Rationale for this change
<!--
Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
-->
Performance improvement for scalar lookup functions.
### What changes are included in this PR?
<!--
There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
-->
1. Reserve space for hashtable for scalar lookup functions before the insertion process.
2. A benchmark that demonstrates the improvement. Data is pasted in the comments.
### Are these changes tested?
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
4. Serve as another way to document the expected behavior of the code
If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
-->
By existing unit tests.
### Are there any user-facing changes?
<!--
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->
<!--
If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
-->
<!-- **This PR includes breaking changes to public APIs.** -->
<!--
Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
-->
<!-- **This PR contains a "Critical Fix".** -->
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238611574
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238646880
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -134,10 +151,12 @@ BENCHMARK(IndexInInt8SmallSet)->RangeMultiplier(4)->Range(2, 8);
BENCHMARK(IndexInInt16SmallSet)->RangeMultiplier(4)->Range(2, 64);
BENCHMARK(IndexInInt32SmallSet)->RangeMultiplier(4)->Range(2, 64);
BENCHMARK(IndexInInt64SmallSet)->RangeMultiplier(4)->Range(2, 64);
+BENCHMARK(IndexInInt32LargeSet)->RangeMultiplier(10)->Range(100, 10000000);
Review Comment:
Let's make the benchmarks shorter a bit? We don't need so many and so large value sets IMHO.
```suggestion
BENCHMARK(IndexInInt32LargeSet)->RangeMultiplier(100)->Range(1000, 1000000);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1235916639
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Why `10` here instead of `1 << 18`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238642308
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -56,8 +56,8 @@ static void SetLookupBenchmarkString(benchmark::State& state,
template <typename Type>
static void SetLookupBenchmarkNumeric(benchmark::State& state,
const std::string& func_name,
- const int64_t value_set_length) {
- const int64_t array_length = 1 << 18;
+ const int64_t value_set_length,
+ const int64_t array_length) {
Review Comment:
perhaps add `state.counters["value_set_length"] = value_set_length` somewhere below to make results more readable? (same in the other bench function)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238355926
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Well, we should keep the benchmarks meaningful and relevant. A 10-length input array does not really represent real-world use cases.
We can also add this optimization without adding micro-benchmarks for it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238477686
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc:
##########
@@ -36,16 +36,23 @@ namespace {
template <typename Type>
struct SetLookupState : public KernelState {
- explicit SetLookupState(MemoryPool* pool) : lookup_table(pool, 0) {}
+ explicit SetLookupState(MemoryPool* pool) : lookup_table(pool, 0), memory_pool(pool) {}
Review Comment:
Good point. Changed it to `std::optional`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238618296
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc:
##########
@@ -54,8 +61,8 @@ struct SetLookupState : public KernelState {
} else {
return Status::Invalid("value_set should be an array or chunked array");
}
- if (!options.skip_nulls && lookup_table.GetNull() >= 0) {
- null_index = memo_index_to_value_index[lookup_table.GetNull()];
+ if (!options.skip_nulls && lookup_table.value().GetNull() >= 0) {
+ null_index = memo_index_to_value_index[lookup_table.value().GetNull()];
Review Comment:
Can probably use the dereference operator?
```suggestion
if (!options.skip_nulls && lookup_table->GetNull() >= 0) {
null_index = memo_index_to_value_index[lookup_table->GetNull()];
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou merged pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36067:
URL: https://github.com/apache/arrow/pull/36067
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238606021
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Well, we can keep the large set benchmarks, but with a "normal" array size.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238675399
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -56,8 +56,8 @@ static void SetLookupBenchmarkString(benchmark::State& state,
template <typename Type>
static void SetLookupBenchmarkNumeric(benchmark::State& state,
const std::string& func_name,
- const int64_t value_set_length) {
- const int64_t array_length = 1 << 18;
+ const int64_t value_set_length,
+ const int64_t array_length) {
Review Comment:
done
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,52 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238618585
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc:
##########
@@ -54,8 +61,8 @@ struct SetLookupState : public KernelState {
} else {
return Status::Invalid("value_set should be an array or chunked array");
}
- if (!options.skip_nulls && lookup_table.GetNull() >= 0) {
- null_index = memo_index_to_value_index[lookup_table.GetNull()];
+ if (!options.skip_nulls && lookup_table.value().GetNull() >= 0) {
+ null_index = memo_index_to_value_index[lookup_table.value().GetNull()];
Review Comment:
Same in other occurrences below.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238621255
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc:
##########
@@ -54,8 +61,8 @@ struct SetLookupState : public KernelState {
} else {
return Status::Invalid("value_set should be an array or chunked array");
}
- if (!options.skip_nulls && lookup_table.GetNull() >= 0) {
- null_index = memo_index_to_value_index[lookup_table.GetNull()];
+ if (!options.skip_nulls && lookup_table.value().GetNull() >= 0) {
+ null_index = memo_index_to_value_index[lookup_table.value().GetNull()];
Review Comment:
Done. They look nicer indeed!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238675747
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -134,10 +151,12 @@ BENCHMARK(IndexInInt8SmallSet)->RangeMultiplier(4)->Range(2, 8);
BENCHMARK(IndexInInt16SmallSet)->RangeMultiplier(4)->Range(2, 64);
BENCHMARK(IndexInInt32SmallSet)->RangeMultiplier(4)->Range(2, 64);
BENCHMARK(IndexInInt64SmallSet)->RangeMultiplier(4)->Range(2, 64);
+BENCHMARK(IndexInInt32LargeSet)->RangeMultiplier(10)->Range(100, 10000000);
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36067:
URL: https://github.com/apache/arrow/pull/36067#issuecomment-1606361220
Conbench analyzed the 6 benchmark runs on commit `2ce4a38f`.
There were 9 benchmark results indicating a performance regression:
- Commit Run on `ursa-thinkcentre-m75q` at [2023-06-25 05:44:29Z](http://conbench.ursa.dev/compare/runs/a7e4561927c347279aec1272a1d9f746...530d7cc6a6e3438996dc8ae202561b4a/)
- [params=<Add, DoubleType>/size:524288/inverse_null_proportion:100, source=cpp-micro, suite=arrow-compute-scalar-arithmetic-benchmark](http://conbench.ursa.dev/compare/benchmarks/06497aef183a7675800026714e13d13d...06497d498b0e712b80003aa1dac08dc8)
- [params=<Multiply, Int8Type>/size:524288/inverse_null_proportion:100, source=cpp-micro, suite=arrow-compute-scalar-arithmetic-benchmark](http://conbench.ursa.dev/compare/benchmarks/06497aef73607f918000944cfe75116b...06497d49e5c172b780008efad0b1d735)
- and 7 more (see the report linked below)
The [full Conbench report](https://github.com/apache/arrow/runs/14539738701) has more details.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238609790
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
That sounds fine to me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238490730
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Actually, we do use `IndexIn` in a similar fashion. In our recommender system, we need to find if the candidate items (25~50ish) appear in the user's action history (up to 100000 for really acrive users). However, I can remove this if you have a strong opinion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #36067:
URL: https://github.com/apache/arrow/pull/36067#issuecomment-1591552160
Benchmark result:
Before:
```
----------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------
IndexInInt32LargeSet/100 2883 ns 2883 ns 246728 bytes_per_second=13.2302M/s items_per_second=3.46823M/s
IndexInInt32LargeSet/1000 10322 ns 10322 ns 63804 bytes_per_second=3.6958M/s items_per_second=968.833k/s
IndexInInt32LargeSet/10000 139318 ns 139316 ns 4987 bytes_per_second=280.389k/s items_per_second=71.7795k/s
IndexInInt32LargeSet/100000 4588807 ns 4588452 ns 165 bytes_per_second=8.51322k/s items_per_second=2.17938k/s
IndexInInt32LargeSet/1000000 90538493 ns 90536500 ns 10 bytes_per_second=441.811/s items_per_second=110.453/s
IndexInInt32LargeSet/10000000 1428412088 ns 1428369286 ns 1 bytes_per_second=28.004/s items_per_second=7.00099/s
IsInInt32LargeSet/100 2833 ns 2833 ns 234016 bytes_per_second=13.4639M/s items_per_second=3.52948M/s
IsInInt32LargeSet/1000 10376 ns 10376 ns 67001 bytes_per_second=3.6765M/s items_per_second=963.773k/s
IsInInt32LargeSet/10000 148294 ns 148284 ns 4769 bytes_per_second=263.43k/s items_per_second=67.4381k/s
IsInInt32LargeSet/100000 4463505 ns 4463293 ns 147 bytes_per_second=8.75195k/s items_per_second=2.2405k/s
IsInInt32LargeSet/1000000 75740587 ns 75738687 ns 10 bytes_per_second=528.132/s items_per_second=132.033/s
IsInInt32LargeSet/10000000 1317214434 ns 1317158744 ns 1 bytes_per_second=30.3684/s items_per_second=7.5921/s
```
After
```
----------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------
IndexInInt32LargeSet/100 2659 ns 2658 ns 266800 bytes_per_second=14.3493M/s items_per_second=3.76159M/s
IndexInInt32LargeSet/1000 9759 ns 9759 ns 70169 bytes_per_second=3.9091M/s items_per_second=1024.75k/s
IndexInInt32LargeSet/10000 93311 ns 93309 ns 7477 bytes_per_second=418.634k/s items_per_second=107.17k/s
IndexInInt32LargeSet/100000 1729160 ns 1729162 ns 453 bytes_per_second=22.5904k/s items_per_second=5.78315k/s
IndexInInt32LargeSet/1000000 79806851 ns 79805041 ns 9 bytes_per_second=501.221/s items_per_second=125.305/s
IndexInInt32LargeSet/10000000 1112193163 ns 1112056579 ns 1 bytes_per_second=35.9694/s items_per_second=8.99235/s
IsInInt32LargeSet/100 2518 ns 2518 ns 280148 bytes_per_second=15.1526M/s items_per_second=3.97215M/s
IsInInt32LargeSet/1000 9625 ns 9624 ns 72600 bytes_per_second=3.96356M/s items_per_second=1039.02k/s
IsInInt32LargeSet/10000 93759 ns 93754 ns 7256 bytes_per_second=416.647k/s items_per_second=106.662k/s
IsInInt32LargeSet/100000 1608221 ns 1608152 ns 495 bytes_per_second=24.2903k/s items_per_second=6.21832k/s
IsInInt32LargeSet/1000000 73113133 ns 73112833 ns 9 bytes_per_second=547.1/s items_per_second=136.775/s
IsInInt32LargeSet/10000000 1207785785 ns 1207731078 ns 1 bytes_per_second=33.12/s items_per_second=8.27999/s
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1236275360
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
If query length is `1 << 18`, the execution time would be dominated by the "lookup" phase and the optimization I implemented would have very little effect. The idea is that the new `LargeSet` benchmarks are to measure the performance of the "preparation" phase, while the existing `SmallSet` benchmarks are to measure the "lookup" phase.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238359428
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc:
##########
@@ -36,16 +36,23 @@ namespace {
template <typename Type>
struct SetLookupState : public KernelState {
- explicit SetLookupState(MemoryPool* pool) : lookup_table(pool, 0) {}
+ explicit SetLookupState(MemoryPool* pool) : lookup_table(pool, 0), memory_pool(pool) {}
Review Comment:
Ok, but this makes an initial memory allocation for the lookup table which is immediately undone in `Init`.
A possible way around that is to make `lookup_table` a `std::optional<MemoTable>`, to allow deferred initialization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] js8544 commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238609030
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,51 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IndexInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IndexInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "index_in_meta_binary", state.range(0), 10);
}
static void IsInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt16SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int16Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt32SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
}
static void IsInInt64SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int64Type>(state, "is_in_meta_binary", state.range(0),
+ 1 << 18);
+}
+
+static void IsInInt32LargeSet(benchmark::State& state) {
+ SetLookupBenchmarkNumeric<Int32Type>(state, "is_in_meta_binary", state.range(0), 10);
Review Comment:
Ok, how about 1000?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a diff in pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36067:
URL: https://github.com/apache/arrow/pull/36067#discussion_r1238644123
##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc:
##########
@@ -91,35 +91,52 @@ static void IsInStringLargeSet(benchmark::State& state) {
}
static void IndexInInt8SmallSet(benchmark::State& state) {
- SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0));
+ SetLookupBenchmarkNumeric<Int8Type>(state, "index_in_meta_binary", state.range(0),
+ 1 << 18);
Review Comment:
Hmm, can we avoid the repeated hardcoded value?
You could define a constant such as `constexpr int64 kArrayLengthWithSmallSet = 1 << 18`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #36067: GH-36059: [C++][Compute] Reserve space for hashtable for scalar lookup functions
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36067:
URL: https://github.com/apache/arrow/pull/36067#issuecomment-1591531791
:warning: GitHub issue #36059 **has been automatically assigned in GitHub** to PR creator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org