You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/08 17:56:29 UTC
[GitHub] [arrow] save-buffer opened a new pull request, #12843: [C++] TPC-H generator fix and new test
save-buffer opened a new pull request, #12843:
URL: https://github.com/apache/arrow/pull/12843
We'd occasionally get a hang in the multi-column generators. This was due to us creating an insufficient number of tasks, so we'd never actually finish outputting. This fixes it.
This PR also adds a test to exercise it: it generates TPC-H SF 1 all in one plan and verifies the result.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r850045973
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -570,6 +606,46 @@ TEST(TpchNode, Region) {
ASSERT_EQ(num_rows, 5);
}
+TEST(TpchNode, Region) {
+ ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Region));
+ VerifyRegion(res);
+}
+
+TEST(TpchNode, AllTables) {
+ constexpr double kScaleFactor = 0.05;
+ constexpr int kNumTables = 8;
+ std::array<TableNodeFn, kNumTables> tables = {
+ &TpchGen::Supplier, &TpchGen::Part, &TpchGen::PartSupp, &TpchGen::Customer,
+ &TpchGen::Orders, &TpchGen::Lineitem, &TpchGen::Nation, &TpchGen::Region,
+ };
+ using VerifyFn = void(const std::vector<ExecBatch>&, double);
+ std::array<VerifyFn*, kNumTables> verify_fns = {
+ &VerifySupplier, &VerifyPart, &VerifyPartSupp, &VerifyCustomer,
+ &VerifyOrders, &VerifyLineitem, &VerifyNation, &VerifyRegion,
+ };
+
+ std::array<AsyncGenerator<util::optional<ExecBatch>>, kNumTables> gens;
+ std::array<std::vector<ExecBatch>, kNumTables> batches;
+ ExecContext ctx(default_memory_pool(), arrow::internal::GetCpuThreadPool());
+ ASSERT_OK_AND_ASSIGN(std::shared_ptr<ExecPlan> plan, ExecPlan::Make(&ctx));
+ ASSERT_OK_AND_ASSIGN(std::unique_ptr<TpchGen> gen,
+ TpchGen::Make(plan.get(), kScaleFactor));
+ for (int i = 0; i < kNumTables; i++) {
+ ASSERT_OK(AddTableAndSinkToPlan(*plan, *gen, gens[i], tables[i]));
+ }
+
+ ASSERT_OK(plan->Validate());
+ ASSERT_OK(plan->StartProducing());
+ plan->finished().Wait();
Review Comment:
```suggestion
ASSERT_OK(plan->finished().status());
```
Or there is an `ASSERT_FINISHES_OK(plan->finished())` but I think you might need an extra include for that.
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -333,24 +343,15 @@ void CountModifiedComments(const Datum& d, int* good_count, int* bad_count) {
}
}
-TEST(TpchNode, ScaleFactor) {
Review Comment:
All of these methods moving around makes it a little harder to follow changes. Can we maybe pick an order (e.g. ScaleFactor, each table in alphabetical order, AllTables) so we can stick with it? Or is there some meaning behind this new order?
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -365,18 +366,28 @@ TEST(TpchNode, Supplier) {
}
ASSERT_EQ(seen_suppkey.size(), kExpectedRows);
ASSERT_EQ(num_rows, kExpectedRows);
- ASSERT_EQ(good_count, 5);
- ASSERT_EQ(bad_count, 5);
+ ASSERT_EQ(good_count, static_cast<int64_t>(5 * scale_factor));
+ ASSERT_EQ(bad_count, static_cast<int64_t>(5 * scale_factor));
}
-TEST(TpchNode, Part) {
- ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Part));
+TEST(TpchNode, ScaleFactor) {
+ constexpr double kScaleFactor = 0.01;
+ ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Supplier, kScaleFactor));
+ VerifySupplier(res, kScaleFactor);
+}
Review Comment:
Since `AllTables` is now testing a different scale factor for all of the tables I think this test is probably redundant.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098280499
@save-buffer Note you can also run the thread sanitizer build locally using `archery docker run ubuntu-cpp-thread-sanitizer` (or you can pass `-DARROW_USE_TSAN=ON` to CMake).
See https://arrow.apache.org/docs/developers/continuous_integration/archery.html for setting up Archery.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098554412
Revision: e37595074472fd6931a0e7558235bef618a93163
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1854](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1854)
|Task|Status|
|----|------|
|test-ubuntu-20.04-cpp-thread-sanitizer|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1854-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1854-github-test-ubuntu-20.04-cpp-thread-sanitizer)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098553671
I'd be fine with that too
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098306611
Revision: 82f3b575d52062a66f4daf4555332846d619e45a
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1847](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1847)
|Task|Status|
|----|------|
|test-ubuntu-20.04-cpp-thread-sanitizer|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1847-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1847-github-test-ubuntu-20.04-cpp-thread-sanitizer)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098482351
Revision: 716ece57f70987515a364893f27163ac958b98cb
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1850](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1850)
|Task|Status|
|----|------|
|test-ubuntu-20.04-cpp-thread-sanitizer|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1850-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1850-github-test-ubuntu-20.04-cpp-thread-sanitizer)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1104930157
@github-actions crossbow submit *sanitize*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1093286010
Relabled to ARROW-16148 as this does not fit the criteria for a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes) and I had created the JIRA earlier anyways.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r855503777
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -570,6 +600,46 @@ TEST(TpchNode, Region) {
ASSERT_EQ(num_rows, 5);
}
+TEST(TpchNode, Region) {
+ ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Region));
+ VerifyRegion(res);
+}
+
+TEST(TpchNode, AllTables) {
+ constexpr double kScaleFactor = 0.05;
+ constexpr int kNumTables = 8;
+ std::array<TableNodeFn, kNumTables> tables = {
+ &TpchGen::Supplier, &TpchGen::Part, &TpchGen::PartSupp, &TpchGen::Customer,
+ &TpchGen::Orders, &TpchGen::Lineitem, &TpchGen::Nation, &TpchGen::Region,
+ };
+ using VerifyFn = void(const std::vector<ExecBatch>&, double);
+ std::array<VerifyFn*, kNumTables> verify_fns = {
+ &VerifySupplier, &VerifyPart, &VerifyPartSupp, &VerifyCustomer,
+ &VerifyOrders, &VerifyLineitem, &VerifyNation, &VerifyRegion,
+ };
+
+ std::array<AsyncGenerator<util::optional<ExecBatch>>, kNumTables> gens;
+ std::array<std::vector<ExecBatch>, kNumTables> batches;
Review Comment:
Good point - did that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098480581
I reduced the workload size for tsan since it was such an easy change. I think the main bottleneck is the verification, particularly of LineItem, so I guess to speed it up we could make the verification multithreaded (but also more complicated).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1093285752
https://issues.apache.org/jira/browse/ARROW-16148
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1104602082
Seems like valgrind issues are fixed with this latest commit. The run failed because of a timeout in hash-join-node-test which is not affected by this PR.
Thanks to @westonpace for investigating it!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1104978093
Revision: 957b994714b04225c04ad890c42e8b87b0d347c4
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1894](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1894)
|Task|Status|
|----|------|
|test-fedora-r-clang-sanitizer|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1894-azure-test-fedora-r-clang-sanitizer)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1894-azure-test-fedora-r-clang-sanitizer)|
|test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1894-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1894-azure-test-ubuntu-18.04-r-sanitizer)|
|test-ubuntu-20.04-cpp-thread-sanitizer|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1894-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1894-github-test-ubuntu-20.04-cpp-thread-sanitizer)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: [C++] TPC-H generator fix and new test
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1093140666
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
Thanks for opening a pull request!
If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
Then could you also rename pull request title in the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
or
MINOR: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098279184
@github-actions crossbow submit test-ubuntu-20.04-cpp-thread-sanitizer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1108055590
Benchmark runs are scheduled for baseline = 8bd5514f52bf9cc542a389edaf697cbc2c97b752 and contender = 24b48083f102ef9366adb0797f5f8fa53c188a7b. 24b48083f102ef9366adb0797f5f8fa53c188a7b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3da722dffc1c425fbb1baacf246902af...763de36c34e14ad5a19ea6e51b0258cd/)
[Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6f7e0408739e4939b45cf0c079f6cbc7...e420b9dcfa84435cb2c646b6114f9630/)
[Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6eb4313b43a24b69b8f6e95ac540e12e...c2f7bb0573764be4a1355f54b8d69c18/)
[Finished :arrow_down:1.72% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/b33fa74d1d544a3b8dd5724f670dd4c4...d254f2a9be8742e68e526b9fc590604c/)
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/581| `24b48083` ec2-t3-xlarge-us-east-2>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/569| `24b48083` test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/567| `24b48083` ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/579| `24b48083` ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/580| `8bd5514f` ec2-t3-xlarge-us-east-2>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/568| `8bd5514f` test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/566| `8bd5514f` ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/578| `8bd5514f` ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r854945088
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -570,6 +600,46 @@ TEST(TpchNode, Region) {
ASSERT_EQ(num_rows, 5);
}
+TEST(TpchNode, Region) {
+ ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Region));
+ VerifyRegion(res);
+}
+
+TEST(TpchNode, AllTables) {
+ constexpr double kScaleFactor = 0.05;
+ constexpr int kNumTables = 8;
+ std::array<TableNodeFn, kNumTables> tables = {
+ &TpchGen::Supplier, &TpchGen::Part, &TpchGen::PartSupp, &TpchGen::Customer,
+ &TpchGen::Orders, &TpchGen::Lineitem, &TpchGen::Nation, &TpchGen::Region,
+ };
+ using VerifyFn = void(const std::vector<ExecBatch>&, double);
+ std::array<VerifyFn*, kNumTables> verify_fns = {
+ &VerifySupplier, &VerifyPart, &VerifyPartSupp, &VerifyCustomer,
+ &VerifyOrders, &VerifyLineitem, &VerifyNation, &VerifyRegion,
+ };
+
+ std::array<AsyncGenerator<util::optional<ExecBatch>>, kNumTables> gens;
+ std::array<std::vector<ExecBatch>, kNumTables> batches;
Review Comment:
You can probably move `batches` inside the verification loop and then it doesn't need to be `std::array`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r850055862
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -365,18 +366,28 @@ TEST(TpchNode, Supplier) {
}
ASSERT_EQ(seen_suppkey.size(), kExpectedRows);
ASSERT_EQ(num_rows, kExpectedRows);
- ASSERT_EQ(good_count, 5);
- ASSERT_EQ(bad_count, 5);
+ ASSERT_EQ(good_count, static_cast<int64_t>(5 * scale_factor));
+ ASSERT_EQ(bad_count, static_cast<int64_t>(5 * scale_factor));
}
-TEST(TpchNode, Part) {
- ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Part));
+TEST(TpchNode, ScaleFactor) {
+ constexpr double kScaleFactor = 0.01;
+ ASSERT_OK_AND_ASSIGN(auto res, GenerateTable(&TpchGen::Supplier, kScaleFactor));
+ VerifySupplier(res, kScaleFactor);
+}
Review Comment:
Good point, I'll get rid of it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098490647
Given it's a test I think you could probably safely drop the scale factor even further to 0.1 (0.05 for the all tables test). I'm not sure we need to limit the scale factor decrease to TSAN builds either. Given that we are constantly running CI our test time is pretty limited. As long as each node is generating a few batches of data I think we have decent coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098769469
Revision: 3ec0934739591ad6590711ac89c99096fc75f4ba
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1855](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1855)
|Task|Status|
|----|------|
|test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1855-azure-test-conda-cpp-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1855-azure-test-conda-cpp-valgrind)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1099211302
@save-buffer Do you want to tackle the Valgrind failure as part of this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1093285777
:warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098451532
Perhaps we can generally try to make the test faster (which will benefit the TSAN builds but also other builds)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1104546361
@github-actions crossbow submit test-conda-cpp-valgrind
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace closed pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace closed pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
URL: https://github.com/apache/arrow/pull/12843
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1105143580
@jonkeane Does someone generally take a look at R sanitizer failures and open JIRAs for them?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098481457
@github-actions crossbow submit test-ubuntu-20.04-cpp-thread-sanitizer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098448448
It passes in `251103 ms total` on my desktop (i9-9900K).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r850764806
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -333,24 +343,15 @@ void CountModifiedComments(const Datum& d, int* good_count, int* bad_count) {
}
}
-TEST(TpchNode, ScaleFactor) {
Review Comment:
Great. As long as there is some kind of order we can stick with going forwards I'm content.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1104547059
Revision: bf68edde42072abf57d438f478d1970e00799a8f
Submitted crossbow builds: [ursacomputing/crossbow @ actions-1889](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1889)
|Task|Status|
|----|------|
|test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1889-azure-test-conda-cpp-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1889-azure-test-conda-cpp-valgrind)|
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jonkeane commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
jonkeane commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1105208102
You can tag @assignUser on these to take a look + create an issue. Oddly(?), the test-fedora-r-clang-sanitizer failure didn't show up in the nightly last night — and I don't see anything obvious in the commits since then that would trigger it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098553741
@github-actions crossbow submit test-ubuntu-20.04-cpp-thread-sanitizer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098768657
@github-actions crossbow submit test-conda-cpp-valgrind
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098481317
It now passes in `95053 ms total` on my desktop with tsan enabled
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098482240
I think bot doesn't listen to me...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on a diff in pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on code in PR #12843:
URL: https://github.com/apache/arrow/pull/12843#discussion_r850057113
##########
cpp/src/arrow/compute/exec/tpch_node_test.cc:
##########
@@ -333,24 +343,15 @@ void CountModifiedComments(const Datum& d, int* good_count, int* bad_count) {
}
}
-TEST(TpchNode, ScaleFactor) {
Review Comment:
The order is the order in which the tables appear in the spec: Supplier, Part, Partsupp, Customer, Orders, Lineitem, Nation, Region.
ScaleFactor was put under VerifySupplier since it was generating the Supplier table, but I've gotten rid of it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] save-buffer commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
save-buffer commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1098447447
OK it seems that the timeout was not caused by the bug that this PR is fixing (which could cause a real hang even without TSAN), it's just caused by TSAN being slow. Not sure what to do about it, maybe we can exclude this test from tsan?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] assignUser commented on pull request #12843: ARROW-16148: [C++] TPC-H generator cleanup
Posted by GitBox <gi...@apache.org>.
assignUser commented on PR #12843:
URL: https://github.com/apache/arrow/pull/12843#issuecomment-1106673435
@jonkeane I ran `test-fedora-r-clang-sanitizer` locally from the current master HEAD (which contains this PR) with no error, so I am guessing it was fixed somewhere along the line since the rebase 11 days ago?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org