
Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/16 13:20:53 UTC

[GitHub] [arrow] raulcd opened a new issue, #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

raulcd opened a new issue, #33699:
URL: https://github.com/apache/arrow/issues/33699

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   According to http://crossbow.voltrondata.com/, the [test-conda-cpp-valgrind](https://github.com/ursacomputing/crossbow/runs/10663265308) job has been failing in our nightly builds for the last 26 days.
   The latest run has several failing tests, all of them due to timeouts:
   ```
   89% tests passed, 8 tests failed out of 70
   
   Label Time Summary:
   arrow-tests        = 1201.63 sec*proc (35 tests)
   arrow_compute      = 1991.41 sec*proc (12 tests)
   arrow_dataset      = 1085.71 sec*proc (10 tests)
   arrow_substrait    =  45.52 sec*proc (1 test)
   filesystem         =  70.61 sec*proc (3 tests)
   parquet-tests      = 799.20 sec*proc (9 tests)
   plasma-tests       = 294.49 sec*proc (3 tests)
   unittest           = 5417.95 sec*proc (70 tests)
   
   Total Test time (real) = 2787.52 sec
   
   The following tests FAILED:
   	 31 - arrow-compute-scalar-test (Timeout)
   	 32 - arrow-compute-vector-test (Timeout)
   	 37 - arrow-compute-hash-join-node-test (Timeout)
   	 39 - arrow-compute-tpch-node-test (Timeout)
   	 40 - arrow-compute-union-node-test (Timeout)
   	 49 - arrow-dataset-scanner-test (Timeout)
   	 50 - arrow-dataset-file-csv-test (Timeout)
   	 65 - parquet-arrow-test (Timeout)
   Errors while running CTest
   ```
   
   ### Component(s)
   
   C++, Continuous Integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1405000361

   I think the strategy should be two-pronged:
   * increase the test timeout on Valgrind CI jobs specifically
   * analyze the runtimes of these long-running tests and see if any particular test cases can be blamed; sometimes a random or stress test can take a lot of time, in which case we can tweak the number of iterations for Valgrind jobs
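The second prong (fewer iterations for random or stress tests under Valgrind) can be sketched in C++ as a helper that scales a test's iteration count. This is a minimal sketch under stated assumptions, not Arrow's actual test code: the environment variable `ARROW_TEST_UNDER_VALGRIND`, the helper names and the default divisor of 10 are all hypothetical. Valgrind's own `RUNNING_ON_VALGRIND` client-request macro from `<valgrind/valgrind.h>` would be an alternative detection mechanism where those headers are available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical detection hook: assumes the Valgrind CI job exports
// ARROW_TEST_UNDER_VALGRIND=1 before running the tests.
bool RunningUnderValgrind() {
  const char* value = std::getenv("ARROW_TEST_UNDER_VALGRIND");
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Returns the number of iterations a random or stress test should run.
// Under Valgrind the count is divided by `valgrind_divisor`, but it never
// drops below one, so the code path is still exercised at least once.
int ScaledIterations(int base_iterations, bool under_valgrind,
                     int valgrind_divisor = 10) {
  assert(base_iterations > 0 && valgrind_divisor > 0);
  if (!under_valgrind) return base_iterations;
  return std::max(1, base_iterations / valgrind_divisor);
}
```

A stress test would compute `const int iterations = ScaledIterations(1000, RunningUnderValgrind());` once and loop over that count. The divisor is a placeholder; in practice it would be tuned from the per-test runtimes gathered in the first step of the analysis.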
   



[GitHub] [arrow] westonpace closed issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace closed issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last 
URL: https://github.com/apache/arrow/issues/33699



[GitHub] [arrow] westonpace commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

Posted by GitBox <gi...@apache.org>.

westonpace commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1398505553

   I tried looking into this a bit more today. I ran `parquet-reader-test` on master, on the last commit that passed (df4cb9588), on a much older commit (54ff2d8777717ea5bb811f3653deeb12fc93452e), and finally on the 8.0.0 release build (from May). All runs came very close to the 5-minute mark (the third run actually went over), but I didn't notice any significant differences between them; the most recent build performed best.
   
   If anything, the oddity is that the Valgrind job ever passed. The `test-conda-cpp-valgrind` job has been failing regularly as far back as September and only passed for a few days in December.
   
   I recommend increasing the timeout, similar to what we did for the TSAN build, and then revisiting this in a few weeks. @pitrou any thoughts?



[GitHub] [arrow] westonpace commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

Posted by GitBox <gi...@apache.org>.

westonpace commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1398508810

   Alternatively, we could try reducing the runtime of these tests when Valgrind is enabled. For example, `parquet-arrow-test` tries many different type variations (8 different decimal combinations), and we could probably trim that down when Valgrind is enabled.
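Trimming a parameterized test under Valgrind could look like the following minimal C++ sketch. It assumes that keeping only the first and last parameter combinations still covers the interesting boundary cases; the function name and the `DecimalParam` struct are hypothetical, and the (precision, scale) pairs in the usage example are placeholders, not the combinations `parquet-arrow-test` actually uses.

```cpp
#include <vector>

// Returns the parameter combinations a parameterized test should run.
// Under Valgrind only the first and last combinations are kept; otherwise
// the full list is returned unchanged.
template <typename T>
std::vector<T> TrimParamsForValgrind(const std::vector<T>& all_params,
                                     bool under_valgrind) {
  if (!under_valgrind || all_params.size() <= 2) return all_params;
  return {all_params.front(), all_params.back()};
}

// Hypothetical stand-in for one decimal type variation (precision, scale).
struct DecimalParam {
  int precision;
  int scale;
};
```

With GoogleTest, the trimmed vector could then be passed to `::testing::ValuesIn` inside an `INSTANTIATE_TEST_SUITE_P` call, so the full matrix still runs on regular CI jobs and only the Valgrind job sees the reduced set.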



[GitHub] [arrow] westonpace commented on issue #33699: [CI][C++] Nightly tests for valgrind have been failing for the last

Posted by GitBox <gi...@apache.org>.

westonpace commented on issue #33699:
URL: https://github.com/apache/arrow/issues/33699#issuecomment-1384138560

   This is an odd collection of tests. I don't think I've ever seen `arrow-compute-union-node-test` fail, and `parquet-reader-test` doesn't really fit in with the rest. Note that the server seems to be under extreme stress (even more than usual for CI):
   
   ```
   2023-01-16T00:36:55.9251228Z  5/70 Test  #6: arrow-extension-type-test ................   Passed    9.28 sec
   ```
   
   That test takes at most 15 ms on my server. Comparing results with the last successful build, things seem slow across the board, even for the tests that pass:
   
   test name | last failure (s) | last success (s)
   --- | --- | ---
   arrow-csv-test | 50.19 | 18.68
   arrow-compute-aggregate-test | 209.02 | 45.40
   arrow-array-test | 63.27 | 21.42
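To quantify "slow across the board", the following C++ sketch divides each failing-build runtime in the table by the corresponding passing-build runtime. The values are copied from the table and assumed to be seconds, as in the CTest output; the struct and function names are illustrative only.

```cpp
#include <string>
#include <vector>

// Per-test runtimes (assumed seconds) for the last failing and last
// successful nightly builds, copied from the table above.
struct TestTiming {
  std::string name;
  double last_failure_sec;
  double last_success_sec;
};

// Slowdown factor of the failing build relative to the last successful one.
double Slowdown(const TestTiming& t) {
  return t.last_failure_sec / t.last_success_sec;
}

const std::vector<TestTiming> kTimings = {
    {"arrow-csv-test", 50.19, 18.68},
    {"arrow-compute-aggregate-test", 209.02, 45.40},
    {"arrow-array-test", 63.27, 21.42},
};
```

For these three passing tests the failing build is roughly 2.7x, 4.6x and 3.0x slower, which supports the point that the slowdown affects the whole job rather than a few specific tests.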
   
   I don't see any significant difference in compilation flags. The only change I can find is that the passing build includes `-ggdb` and the failing build does not.

