Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/30 15:07:48 UTC
[GitHub] [arrow] pitrou opened a new pull request, #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
pitrou opened a new pull request, #13265:
URL: https://github.com/apache/arrow/pull/13265
Repeated calls to `FileMetaData::AppendRowGroups()` would be O(n²) due to incorrect use of `std::vector::reserve`.
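For readers unfamiliar with this pitfall, here is a minimal sketch of how it typically arises. It is not the Arrow code and not the diff in this PR. The helper `CountReallocations` and its parameters are hypothetical names used only for illustration. The assumption is that each append first calls `reserve(size() + chunk)`. On common standard library implementations this sets the capacity to exactly the requested value, so the vector never gets a chance to grow geometrically, and every call reallocates and copies all existing elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Arrow code): appends `chunk` ints to a vector
// `calls` times and returns the number of calls during which the vector's
// capacity changed, i.e. during which it had to reallocate.
//
// When `reserve_exact` is true, each call first reserves exactly
// size() + chunk, which mimics the assumed pitfall. When it is false,
// the vector is left to apply its own growth policy.
std::size_t CountReallocations(std::size_t calls, std::size_t chunk,
                               bool reserve_exact) {
  assert(chunk > 0);
  std::vector<int> v;
  std::size_t reallocations = 0;
  for (std::size_t i = 0; i < calls; ++i) {
    const std::size_t old_capacity = v.capacity();
    if (reserve_exact) {
      // Requests only as much room as this call needs.
      v.reserve(v.size() + chunk);
    }
    // Append `chunk` zero-valued elements at the end.
    v.insert(v.end(), chunk, 0);
    if (v.capacity() != old_capacity) {
      ++reallocations;
    }
  }
  return reallocations;
}
```

Comparing `CountReallocations(1000, 4, true)` with `CountReallocations(1000, 4, false)` should show far more reallocations in the first case on mainstream implementations. Since each reallocation copies or moves every element already stored, the total work with the per-call `reserve` grows quadratically in the number of elements, while geometric growth keeps appends amortized constant time.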
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279766
Supported benchmark command examples:
`@ursabot benchmark help`
To run all benchmarks:
`@ursabot please benchmark`
To filter benchmarks by language:
`@ursabot please benchmark lang=Python`
`@ursabot please benchmark lang=C++`
`@ursabot please benchmark lang=R`
`@ursabot please benchmark lang=Java`
`@ursabot please benchmark lang=JavaScript`
To filter Python and R benchmarks by name:
`@ursabot please benchmark name=file-write`
`@ursabot please benchmark name=file-write lang=Python`
`@ursabot please benchmark name=file-.*`
To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
`@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3`
For other `command=cpp-micro` options, please see https://github.com/ursacomputing/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py
[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141262609
Using the test script from https://github.com/apache/arrow/pull/13234#issue-1248714863:
* on git master:
```
Created Example File
1 took 0.116 seconds
2 took 0.457 seconds
3 took 1.818 seconds
4 took 8.796 seconds
real 0m11,476s
user 0m11,665s
sys 0m1,140s
```
* with this PR:
```
Created Example File
1 took 0.002 seconds
2 took 0.004 seconds
3 took 0.007 seconds
4 took 0.015 seconds
real 0m0,313s
user 0m0,579s
sys 0m1,246s
```
* with PR https://github.com/apache/arrow/pull/13234:
```
Created Example File
1 took 0.002 seconds
2 took 0.003 seconds
3 took 0.007 seconds
4 took 0.015 seconds
real 0m0,336s
user 0m0,604s
sys 0m1,207s
```
[GitHub] [arrow] github-actions[bot] commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141261768
https://issues.apache.org/jira/browse/ARROW-16613
[GitHub] [arrow] kylebarron commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
kylebarron commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141374954
Wow that's cool. I'm impressed it can match the speed of #13234 without presenting a vectorized API to Python.
[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279980
@ursabot please benchmark
[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279749
@ursabot benchmark
[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141280065
Benchmark runs are scheduled for baseline = 01d8485d17adacbe75dac2a9b97c34dc28ca31f5 and contender = 454dd5932403c826ad883e0d75d705966d37bee0. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2319c8a037064c1b8940d6ac7883d2e7...28965e8654284f3687db4e34bda8f94d/)
[Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ac195ab5c53a4232a600bbf3f4e24d33...a8a0eeedbfbf4913bdb79c7d546ceb54/)
[Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bffab965eb3e4658acbfcee9cf821839...9390af9930d64918b4838c42e4877dee/)
[Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/9df7c51f54174addb662ecc46344342d...e975737806324fdb9c88346748d56007/)
Buildkite builds:
[Scheduled] [`454dd593` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/848)
[Scheduled] [`454dd593` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/846)
[Scheduled] [`454dd593` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/835)
[Scheduled] [`454dd593` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/849)
[Finished] [`01d8485d` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/847)
[Scheduled] [`01d8485d` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/845)
[Scheduled] [`01d8485d` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/834)
[Scheduled] [`01d8485d` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/848)
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1143506688
Benchmark runs are scheduled for baseline = c9f7e28286cecf501bd0375535d9ad035145a0ff and contender = 7fee9db84a8bbefa62f576751e3b51118d12cab3. 7fee9db84a8bbefa62f576751e3b51118d12cab3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/85d8bbf709b9475fab551708da70e57c...6ba68aaa01584700a993838742103441/)
[Failed :arrow_down:0.31% :arrow_up:0.16%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1186501ef1b744a2a0bd82350bc6b01e...6b7134d9e6904e808a65a10fd718093f/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/610e0aefb3ef4f0989c3f95ea40cbe30...99083782a6b94d2198c2f5fbe005806a/)
[Finished :arrow_down:0.59% :arrow_up:0.32%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/e5e2c6d264824b48a3cda32b2af54386...224c0d5839274ccd88187d101d29ce60/)
Buildkite builds:
[Finished] [`7fee9db8` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/860)
[Finished] [`7fee9db8` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/860)
[Finished] [`7fee9db8` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/850)
[Finished] [`7fee9db8` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/862)
[Finished] [`c9f7e282` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/859)
[Failed] [`c9f7e282` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/859)
[Finished] [`c9f7e282` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/849)
[Finished] [`c9f7e282` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/861)
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow] pitrou closed pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
URL: https://github.com/apache/arrow/pull/13265
[GitHub] [arrow] github-actions[bot] commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141261801
:warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.