Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 22:54:35 UTC
[GitHub] [arrow] westonpace opened a new pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
westonpace opened a new pull request #12263:
URL: https://github.com/apache/arrow/pull/12263
The test could fail because of a race condition during the write. If the batches happened to be delivered grouped by partition (`AAAAABBBBBCCCCC...`), then by the time we needed to close a file to make space, we could close one whose partition was already complete. That partition never needed a new file later, so we ended up with 5 files for 5 partitions.
Adding `use_threads=False` to the `write_dataset` call was not sufficient. The `arrow::dataset::FileSystemDataset::Write` method was always using the CPU executor for the exec plan. In other scanner methods we base the CPU executor on the scan options (`nullptr` if `scan_options->use_threads` is `false`). Making both of these changes together seems to make the test reliably pass.
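To illustrate why delivery order changes the file count, here is a toy model, not Arrow's actual writer. It assumes a writer that keeps at most `max_open_files` files open and closes the least recently used one when it needs room. The function name, the limit of 3, and the batch sequences are made up for illustration and are not taken from the test.

```python
from collections import OrderedDict


def count_files_created(batch_partitions, max_open_files):
    """Toy model: count how many files get opened for a sequence of batches.

    Each batch is identified only by its partition key. A partition whose
    file was closed needs a brand-new file when another batch arrives.
    """
    open_files = OrderedDict()  # partition -> file number, least recently used first
    files_created = 0
    for partition in batch_partitions:
        if partition in open_files:
            # File is still open: reuse it and mark it as most recently used.
            open_files.move_to_end(partition)
            continue
        if len(open_files) >= max_open_files:
            # No room left: close the least recently used file.
            open_files.popitem(last=False)
        files_created += 1
        open_files[partition] = files_created
    return files_created


# Hypothetical delivery orders for 5 partitions with 5 batches each.
grouped = list("AAAAABBBBBCCCCCDDDDDEEEEE")
interleaved = list("ABCDE" * 5)

print(count_files_created(grouped, max_open_files=3))
print(count_files_created(interleaved, max_open_files=3))
```

In this model the grouped order only ever closes files whose partitions are finished, so it opens exactly one file per partition. The interleaved order keeps closing files that are needed again, so it opens many more. Which order the real writer sees depends on scheduling, which is why the file count was not deterministic.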
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021684894
https://issues.apache.org/jira/browse/ARROW-15438
[GitHub] [arrow] pitrou commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023039625
This fixes the failure for me; on master I could reproduce it quite reliably.
[GitHub] [arrow] westonpace commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021758382
@kszucs Feel free to merge this if you want. This should not block RC6 as it is mostly a flaky test (the threading thing has some practical implications but they are minor)
[GitHub] [arrow] ursabot edited a comment on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023048330
Benchmark runs are scheduled for baseline = 79800d4a374586a1e66bb85fc05966066ba2199a and contender = 5a51c6d2f83cdd47a006c02e624f08f992a0b761. 5a51c6d2f83cdd47a006c02e624f08f992a0b761 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f984ae935a254a7e8be5651e41f465da...154c75dd953d493fa23880beb8856c2d/)
[Finished :arrow_down:2.5% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd1ddf43084f462bad710575a8aeb799...ed5910699d44462d94c10508c26927ab/)
[Finished :arrow_down:0.22% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8fcd168ef244c36b48344095d188308...54ab52cd777248c0bba0f5adcdafe161/)
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow] vibhatha commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
vibhatha commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021760779
Thanks for looking into this @westonpace 👍
[GitHub] [arrow] kszucs commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021686025
Thanks Weston!
@lidavidm could you please verify this locally?
[GitHub] [arrow] pitrou closed pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12263:
URL: https://github.com/apache/arrow/pull/12263