Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 22:54:35 UTC

[GitHub] [arrow] westonpace opened a new pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

westonpace opened a new pull request #12263:
URL: https://github.com/apache/arrow/pull/12263


   The test could fail due to a race condition during writing. If the batches were delivered in partition order (`AAAAABBBBBCCCCC...`), then by the time we need to close a file to make space, we can close a file that is already complete (so we never have to open a new one for that partition later), and we end up with only 5 files for 5 partitions.
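The ordering effect described above can be illustrated with a small simulation. This is a hypothetical model, not Arrow's dataset writer: `count_files_written` is an illustrative helper, and it assumes that when the open-file cap is reached the least recently used file is closed, and that writing to a partition whose file was closed starts a brand-new file. The cap of 2 and the batch strings are arbitrary choices.

```python
from collections import OrderedDict


def count_files_written(batches, max_open_files):
    """Count files produced by a partitioned writer with an open-file cap.

    Hypothetical model (not Arrow's implementation): each character in
    `batches` is the partition key of one batch. When a batch arrives for a
    partition with no open file and the cap is reached, the least recently
    used open file is closed. Any later batch for that partition opens a new file.
    """
    open_files = OrderedDict()  # partition key -> file number, in LRU order
    files_written = 0
    for key in batches:
        if key in open_files:
            open_files.move_to_end(key)  # mark as most recently used
            continue
        if len(open_files) >= max_open_files:
            open_files.popitem(last=False)  # close the least recently used file
        files_written += 1
        open_files[key] = files_written
    return files_written


if __name__ == "__main__":
    # Partition-ordered delivery: each closed file is already complete.
    print(count_files_written("AAAAABBBBBCCCCCDDDDDEEEEE", max_open_files=2))
    # Interleaved delivery: closed files keep being needed again.
    print(count_files_written("ABCDE" * 5, max_open_files=2))
```

Under this model, partition-ordered input yields one file per partition (5), while the interleaved input forces a new file for every batch (25), which is the kind of difference a test on `max_open_files` would rely on.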
   
   Adding `use_threads=False` to the `write_dataset` call was not sufficient on its own. The `arrow::dataset::FileSystemDataset::Write` method always used the CPU executor for the exec plan, whereas other scanner methods derive the CPU executor from the scan options (`nullptr` if `scan_options->use_threads` is `false`). Making both of these changes together seems to make the test pass reliably.
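The executor-selection pattern described above can be sketched in Python. This is an illustrative analogue, not Arrow's C++ code: `run_tasks` and its parameters are hypothetical, and `None` stands in for the `nullptr` executor, meaning tasks run inline on the calling thread in submission order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, use_threads, pool):
    """Run zero-argument callables and return results in completion order.

    Hypothetical analogue of deriving the executor from a use_threads flag:
    when use_threads is False the executor is None (like nullptr in C++),
    so each task runs inline and completion order matches submission order.
    """
    executor = pool if use_threads else None
    if executor is None:
        # Serial path: deterministic, each task finishes before the next starts.
        return [task() for task in tasks]
    # Threaded path: tasks may finish in any order.
    futures = [executor.submit(task) for task in tasks]
    return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    tasks = [lambda i=i: i for i in range(10)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_tasks(tasks, use_threads=False, pool=pool))
        print(sorted(run_tasks(tasks, use_threads=True, pool=pool)))
```

Only the serial path guarantees ordering, which is why the flag must actually reach the code that picks the executor; setting it at the call site alone has no effect if that code ignores it.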
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023048330


   Benchmark runs are scheduled for baseline = 79800d4a374586a1e66bb85fc05966066ba2199a and contender = 5a51c6d2f83cdd47a006c02e624f08f992a0b761. 5a51c6d2f83cdd47a006c02e624f08f992a0b761 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f984ae935a254a7e8be5651e41f465da...154c75dd953d493fa23880beb8856c2d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd1ddf43084f462bad710575a8aeb799...ed5910699d44462d94c10508c26927ab/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8fcd168ef244c36b48344095d188308...54ab52cd777248c0bba0f5adcdafe161/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow] github-actions[bot] commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021684894


   https://issues.apache.org/jira/browse/ARROW-15438



[GitHub] [arrow] ursabot commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023048330


   Benchmark runs are scheduled for baseline = 79800d4a374586a1e66bb85fc05966066ba2199a and contender = 5a51c6d2f83cdd47a006c02e624f08f992a0b761. 5a51c6d2f83cdd47a006c02e624f08f992a0b761 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f984ae935a254a7e8be5651e41f465da...154c75dd953d493fa23880beb8856c2d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd1ddf43084f462bad710575a8aeb799...ed5910699d44462d94c10508c26927ab/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8fcd168ef244c36b48344095d188308...54ab52cd777248c0bba0f5adcdafe161/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow] pitrou commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023039625


   This fixes the failure for me, whereas it could be reproduced quite reliably on master.



[GitHub] [arrow] westonpace commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021758382


   @kszucs Feel free to merge this if you want. This should not block RC6, as it is mostly a flaky test (the threading change has some practical implications, but they are minor).



[GitHub] [arrow] ursabot edited a comment on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023048330


   Benchmark runs are scheduled for baseline = 79800d4a374586a1e66bb85fc05966066ba2199a and contender = 5a51c6d2f83cdd47a006c02e624f08f992a0b761. 5a51c6d2f83cdd47a006c02e624f08f992a0b761 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f984ae935a254a7e8be5651e41f465da...154c75dd953d493fa23880beb8856c2d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd1ddf43084f462bad710575a8aeb799...ed5910699d44462d94c10508c26927ab/)
   [Finished :arrow_down:0.22% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8fcd168ef244c36b48344095d188308...54ab52cd777248c0bba0f5adcdafe161/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow] ursabot edited a comment on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1023048330


   Benchmark runs are scheduled for baseline = 79800d4a374586a1e66bb85fc05966066ba2199a and contender = 5a51c6d2f83cdd47a006c02e624f08f992a0b761. 5a51c6d2f83cdd47a006c02e624f08f992a0b761 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f984ae935a254a7e8be5651e41f465da...154c75dd953d493fa23880beb8856c2d/)
   [Finished :arrow_down:2.5% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd1ddf43084f462bad710575a8aeb799...ed5910699d44462d94c10508c26927ab/)
   [Finished :arrow_down:0.22% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d8fcd168ef244c36b48344095d188308...54ab52cd777248c0bba0f5adcdafe161/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow] vibhatha commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
vibhatha commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021760779


   Thanks for looking into this @westonpace 👍



[GitHub] [arrow] kszucs commented on pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #12263:
URL: https://github.com/apache/arrow/pull/12263#issuecomment-1021686025


   Thanks Weston! 
   
   @lidavidm could you please verify this locally?



[GitHub] [arrow] pitrou closed pull request #12263: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12263:
URL: https://github.com/apache/arrow/pull/12263


   

