Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/30 15:07:48 UTC

[GitHub] [arrow] pitrou opened a new pull request, #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

pitrou opened a new pull request, #13265:
URL: https://github.com/apache/arrow/pull/13265

   Repeated calls to `FileMetaData::AppendRowGroups()` would be O(n²) due to incorrect use of `std::vector::reserve`.
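
   To illustrate the pitfall, here is a minimal, self-contained sketch. It is not the Arrow code: `RowGroupInfo`, `AppendExactReserve`, `AppendAmortized` and `CountReallocations` are hypothetical names. It assumes the problematic pattern was reserving exactly `size() + k` before each append. Mainstream standard libraries (libstdc++, libc++, MSVC) allocate exactly the requested capacity in `reserve`, which defeats the vector's geometric growth. As a result, every call reallocates and copies all existing entries.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical stand-in for one row group's metadata entry.
   struct RowGroupInfo {
     long long num_rows;
   };

   // Assumed anti-pattern: reserve exactly the new size before every append.
   // Common standard libraries allocate exactly that capacity, so each call
   // reallocates and copies every existing element: N appends cost O(N^2).
   void AppendExactReserve(std::vector<RowGroupInfo>& dst,
                           const std::vector<RowGroupInfo>& src) {
     dst.reserve(dst.size() + src.size());
     dst.insert(dst.end(), src.begin(), src.end());
   }

   // One possible remedy: let insert() grow the buffer geometrically, so
   // reallocations happen O(log N) times and total copying is amortized O(N).
   void AppendAmortized(std::vector<RowGroupInfo>& dst,
                        const std::vector<RowGroupInfo>& src) {
     dst.insert(dst.end(), src.begin(), src.end());
   }

   // Performs `calls` single-element appends and returns how many times the
   // buffer was reallocated (detected as a change in capacity()).
   template <typename AppendFn>
   std::size_t CountReallocations(AppendFn append, std::size_t calls) {
     std::vector<RowGroupInfo> dst;
     const std::vector<RowGroupInfo> one(1, RowGroupInfo{100});
     std::size_t reallocations = 0;
     for (std::size_t i = 0; i < calls; ++i) {
       const std::size_t old_capacity = dst.capacity();
       append(dst, one);
       if (dst.capacity() != old_capacity) ++reallocations;
     }
     return reallocations;
   }
   ```

   With libstdc++ or libc++, `CountReallocations(AppendExactReserve, 1000)` should report a reallocation on every call, while `CountReallocations(AppendAmortized, 1000)` should report only around a dozen. The exact counts depend on the standard library's growth policy.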


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279766

   
   Supported benchmark command examples:
   
   `@ursabot benchmark help`
   
   To run all benchmarks:
   `@ursabot please benchmark`
   
   To filter benchmarks by language:
   `@ursabot please benchmark lang=Python`
   `@ursabot please benchmark lang=C++`
   `@ursabot please benchmark lang=R`
   `@ursabot please benchmark lang=Java`
   `@ursabot please benchmark lang=JavaScript`
   
   To filter Python and R benchmarks by name:
   `@ursabot please benchmark name=file-write`
   `@ursabot please benchmark name=file-write lang=Python`
   `@ursabot please benchmark name=file-.*`
   
   To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
   `@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3`
   
   For other `command=cpp-micro` options, please see https://github.com/ursacomputing/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py
   




[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141262609

   Using the test script from https://github.com/apache/arrow/pull/13234#issue-1248714863:
   
   * on git master:
   ```
   Created Example File
   1 took 0.116 seconds
   2 took 0.457 seconds
   3 took 1.818 seconds
   4 took 8.796 seconds
   
   real	0m11,476s
   user	0m11,665s
   sys	0m1,140s
   ```
   
   * with this PR:
   ```
   Created Example File
   1 took 0.002 seconds
   2 took 0.004 seconds
   3 took 0.007 seconds
   4 took 0.015 seconds
   
   real	0m0,313s
   user	0m0,579s
   sys	0m1,246s
   ```
   
   * with PR https://github.com/apache/arrow/pull/13234:
   ```
   Created Example File
   1 took 0.002 seconds
   2 took 0.003 seconds
   3 took 0.007 seconds
   4 took 0.015 seconds
   
   real	0m0,336s
   user	0m0,604s
   sys	0m1,207s
   ```




[GitHub] [arrow] github-actions[bot] commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141261768

   https://issues.apache.org/jira/browse/ARROW-16613




[GitHub] [arrow] kylebarron commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
kylebarron commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141374954

   Wow that's cool. I'm impressed it can match the speed of #13234 without presenting a vectorized API to Python.




[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279980

   @ursabot please benchmark




[GitHub] [arrow] pitrou commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141279749

   @ursabot benchmark




[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141280065

   Benchmark runs are scheduled for baseline = 01d8485d17adacbe75dac2a9b97c34dc28ca31f5 and contender = 454dd5932403c826ad883e0d75d705966d37bee0. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2319c8a037064c1b8940d6ac7883d2e7...28965e8654284f3687db4e34bda8f94d/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ac195ab5c53a4232a600bbf3f4e24d33...a8a0eeedbfbf4913bdb79c7d546ceb54/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bffab965eb3e4658acbfcee9cf821839...9390af9930d64918b4838c42e4877dee/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/9df7c51f54174addb662ecc46344342d...e975737806324fdb9c88346748d56007/)
   Buildkite builds:
   [Scheduled] [`454dd593` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/848)
   [Scheduled] [`454dd593` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/846)
   [Scheduled] [`454dd593` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/835)
   [Scheduled] [`454dd593` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/849)
   [Finished] [`01d8485d` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/847)
   [Scheduled] [`01d8485d` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/845)
   [Scheduled] [`01d8485d` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/834)
   [Scheduled] [`01d8485d` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/848)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] ursabot commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1143506688

   Benchmark runs are scheduled for baseline = c9f7e28286cecf501bd0375535d9ad035145a0ff and contender = 7fee9db84a8bbefa62f576751e3b51118d12cab3. 7fee9db84a8bbefa62f576751e3b51118d12cab3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/85d8bbf709b9475fab551708da70e57c...6ba68aaa01584700a993838742103441/)
   [Failed :arrow_down:0.31% :arrow_up:0.16%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1186501ef1b744a2a0bd82350bc6b01e...6b7134d9e6904e808a65a10fd718093f/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/610e0aefb3ef4f0989c3f95ea40cbe30...99083782a6b94d2198c2f5fbe005806a/)
   [Finished :arrow_down:0.59% :arrow_up:0.32%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/e5e2c6d264824b48a3cda32b2af54386...224c0d5839274ccd88187d101d29ce60/)
   Buildkite builds:
   [Finished] [`7fee9db8` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/860)
   [Finished] [`7fee9db8` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/860)
   [Finished] [`7fee9db8` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/850)
   [Finished] [`7fee9db8` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/862)
   [Finished] [`c9f7e282` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/859)
   [Failed] [`c9f7e282` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/859)
   [Finished] [`c9f7e282` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/849)
   [Finished] [`c9f7e282` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/861)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] pitrou closed pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()
URL: https://github.com/apache/arrow/pull/13265




[GitHub] [arrow] github-actions[bot] commented on pull request #13265: ARROW-16613: [C++][Parquet] Fix performance of repeated calls to AppendRowGroups()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13265:
URL: https://github.com/apache/arrow/pull/13265#issuecomment-1141261801

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

