Posted to issues@arrow.apache.org by "raulcd (via GitHub)" <gi...@apache.org> on 2023/03/20 14:58:51 UTC

[GitHub] [arrow] raulcd opened a new issue, #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

raulcd opened a new issue, #34643:
URL: https://github.com/apache/arrow/issues/34643

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   For the last ~5 days the nightly tests for [test-build-cpp-fuzz](https://github.com/ursacomputing/crossbow/actions/runs/4463511847/jobs/7838805718) have been failing with:
   ```
   + cp --backup=numbered \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_nested_large_offsets.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_extension.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_primitive_no_batches.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_recursive_nested.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary_unsigned.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_nested.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_union.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_duplicate_fieldnames.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_null.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_decimal.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_interval.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_primitive.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_datetime.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_null_trivial.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_primitive_zerolength.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_decimal256.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_map.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_primitive_large_offsets.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_custom_metadata.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_nested_dictionary.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian/generated_map_non_canonical.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.17.1/generated_union.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/2.0.0-compression/generated_uncompressible_zstd.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/2.0.0-compression/generated_zstd.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/2.0.0-compression/generated_lz4.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/2.0.0-compression/generated_uncompressible_lz4.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_primitive_no_batches.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_nested.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_dictionary.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_decimal.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_interval.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_primitive.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_datetime.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_primitive_zerolength.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_map.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/4.0.0-shareddict/generated_shared_dict.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_nested_large_offsets.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_extension.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_primitive_no_batches.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_recursive_nested.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_dictionary_unsigned.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_nested.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_union.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_duplicate_fieldnames.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_dictionary.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_null.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_decimal.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_interval.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_primitive.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_datetime.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_null_trivial.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_primitive_zerolength.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_decimal256.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_map.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_primitive_large_offsets.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_custom_metadata.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_nested_dictionary.stream \
       /src/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_map_non_canonical.stream \
       /tmp/corpus
   + /src/arrow/cpp/build-support/fuzzing/pack_corpus.py /tmp/corpus /out/arrow-ipc-stream-fuzz_seed_corpus.zip
   Traceback (most recent call last):
     File "/src/arrow/cpp/build-support/fuzzing/pack_corpus.py", line 54, in <module>
       main(sys.argv[1], sys.argv[2])
     File "/src/arrow/cpp/build-support/fuzzing/pack_corpus.py", line 47, in main
       process_dir(Path(corpus_dir), zip_output)
     File "/src/arrow/cpp/build-support/fuzzing/pack_corpus.py", line 39, in process_dir
       raise ValueError("Duplicate hash: {0} (in file {1})"
   ValueError: Duplicate hash: 19530341bcb2ce81c942121d5719bf1ed35c2b8c (in file /tmp/corpus/generated_uncompressible_lz4.stream)
   ERROR:root:Building fuzzers failed.
   Error: Process completed with exit code 1.
   ```
   These are the commits merged between the last successful run and the first failure:
   https://github.com/apache/arrow/compare/fe88d9ad5c346786842913c8d2a369db099b5406...69132a177b0a0e8b7f6a8dac48e7e66fc521013b
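
The traceback suggests that `pack_corpus.py` hashes every seed file and refuses to store two files with the same hash, so two byte-identical inputs abort the build. The Python below is a hypothetical reconstruction of that idea, not the actual script: `pack_corpus` is an illustrative name, and SHA-1 is an assumption based on the 40-hex-digit hash in the error message.

```python
import hashlib
import zipfile
from pathlib import Path


def pack_corpus(corpus_dir: Path, zip_path: Path) -> int:
    """Hypothetical sketch: zip each file in corpus_dir under the SHA-1 of
    its contents, rejecting byte-identical duplicates. Returns entry count."""
    seen = {}  # digest -> first file that produced it
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Sorted for a deterministic order; the real script may differ,
        # which only changes *which* duplicate file gets reported.
        for path in sorted(p for p in corpus_dir.iterdir() if p.is_file()):
            data = path.read_bytes()
            digest = hashlib.sha1(data).hexdigest()
            if digest in seen:
                # Identical contents would map to the same archive entry
                # name, so fail loudly instead of silently dropping one.
                raise ValueError(
                    "Duplicate hash: {0} (in file {1})".format(digest, path))
            seen[digest] = path
            zf.writestr(digest, data)
    return len(seen)
```

With the two `generated_uncompressible_*.stream` files being byte-identical, a check like this fails on whichever of them is visited second.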
   
   ### Component(s)
   
   C++, Continuous Integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476525002

   https://github.com/apache/arrow-testing/pull/88
   
   I poked at it with a Java debugger, and both LZ4 files were actually ZSTD-encoded (oops). I regenerated the files with LZ4 and also tested that (at least in Java) the code path for an uncompressed buffer is actually hit.
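
The "uncompressed buffer" code path refers to how the Arrow IPC format frames compressed buffers: each body buffer is prefixed with its uncompressed length as a little-endian signed 64-bit integer, and a length of -1 marks a buffer the writer stored without compression because compressing it did not help. A minimal Python sketch of that decoding step follows; `decode_ipc_buffer` and its `decompress` callback are hypothetical names (not a pyarrow or Arrow Java API), and it assumes `raw` holds exactly one buffer's bytes as described by the record batch metadata.

```python
import struct
from typing import Callable

# Length prefix value meaning "the bytes that follow are not compressed".
NOT_COMPRESSED = -1


def decode_ipc_buffer(raw: bytes,
                      decompress: Callable[[bytes, int], bytes]) -> bytes:
    """Return the logical bytes of one (possibly compressed) IPC body buffer.

    `decompress` is a caller-supplied codec function (e.g. an LZ4-frame or
    ZSTD decoder) taking (compressed_bytes, uncompressed_length).
    """
    if len(raw) < 8:
        raise ValueError("buffer too short for the 8-byte length prefix")
    # Little-endian signed int64 holding the uncompressed length.
    (uncompressed_len,) = struct.unpack_from("<q", raw, 0)
    payload = raw[8:]
    if uncompressed_len == NOT_COMPRESSED:
        # The "uncompressible" path: payload is the raw buffer as-is.
        return payload
    data = decompress(payload, uncompressed_len)
    if len(data) != uncompressed_len:
        raise ValueError(
            f"expected {uncompressed_len} bytes, got {len(data)}")
    return data
```

Test files like `generated_uncompressible_lz4.stream` exist to exercise the `-1` branch, which is why it matters that they really use the codec their names claim.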


[GitHub] [arrow] lidavidm commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476503018

   Ah, the LZ4 file is actually ZSTD. Let me fix that.


[GitHub] [arrow] raulcd commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476391336

   @zeroshade @benibus Reviewing the commits that could have caused these nightly failures, I suspect this one introduced the issue. Do you think it could be related? https://github.com/apache/arrow/commit/2ec02154d6df05c916679938190a0dd40de1fa5d


[GitHub] [arrow] lidavidm commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476430336

   I do see identical hashes for the two files:
   ```
   a88891b8934f9123e854d0f0369e49385175460c16a6bfb8b7fcb333d318d372  generated_uncompressible_lz4.stream
   a88891b8934f9123e854d0f0369e49385175460c16a6bfb8b7fcb333d318d372  generated_uncompressible_zstd.stream
   ```
   
   Possibly I oops'd when generating the files and didn't notice. I'll have to figure out how to poke at the files by hand to confirm.
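
One rough way to poke at the files by hand: Arrow IPC compression uses either the LZ4 frame format or Zstandard, and each compressed buffer's payload starts with that format's frame magic number (0x184D2204 for LZ4 frames, 0xFD2FB528 for Zstandard, both stored little-endian). Counting those magic numbers in a `.stream` file hints at which codec was actually used. This Python sketch is a heuristic with hypothetical helper names (`count_codec_frames`, `guess_codec`), not an Arrow tool: magic bytes can occur by chance, and buffers stored uncompressed carry no frame header at all.

```python
from pathlib import Path

# Frame magic numbers as they appear on disk (little-endian byte order).
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")   # 0x184D2204
ZSTD_FRAME_MAGIC = bytes.fromhex("28b52ffd")  # 0xFD2FB528


def count_codec_frames(data: bytes) -> dict:
    """Count non-overlapping occurrences of each codec's frame magic."""
    return {
        "lz4": data.count(LZ4_FRAME_MAGIC),
        "zstd": data.count(ZSTD_FRAME_MAGIC),
    }


def guess_codec(path: Path) -> str:
    """Heuristic guess of the codec used in an IPC file: 'lz4', 'zstd',
    or 'unknown' when neither or both magic numbers are present."""
    counts = count_codec_frames(path.read_bytes())
    if counts["lz4"] and not counts["zstd"]:
        return "lz4"
    if counts["zstd"] and not counts["lz4"]:
        return "zstd"
    return "unknown"
```

Running `guess_codec` over `generated_lz4.stream` and `generated_zstd.stream` would be a quick sanity check; for the `uncompressible` variants an `unknown` result is expected, so a byte-level comparison such as the hashes above is the more reliable signal there.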


[GitHub] [arrow] wjones127 closed issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 closed issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`
URL: https://github.com/apache/arrow/issues/34643


[GitHub] [arrow] lidavidm commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476393541

   I suspect that file may not have been generated correctly.


[GitHub] [arrow] lidavidm commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476391960

   It was probably introduced when I added new test files for that PR.


[GitHub] [arrow] raulcd commented on issue #34643: [CI][C++] Nightly tests `test-build-cpp-fuzz` fails with `Duplicate hash`

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on issue #34643:
URL: https://github.com/apache/arrow/issues/34643#issuecomment-1476410889

   I hadn't seen the `pack_corpus.py` file before. I see we are cloning the [oss-fuzz](https://github.com/google/oss-fuzz.git) repository [directly in the job](https://github.com/apache/arrow/blob/main/dev/tasks/fuzz-tests/github.oss-fuzz.yml#L30-L55) and using some of its infrastructure. As an improvement, maybe we could move that job to our docker-compose setup so it's easier to reproduce locally.

