Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/06/30 04:06:26 UTC

[GitHub] [arrow] wgtmac opened a new pull request, #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

wgtmac opened a new pull request, #36406:
URL: https://github.com/apache/arrow/pull/36406

   ### Rationale for this change
   
   Apache ORC has released 1.9.0 recently: https://orc.apache.org/news/2023/06/28/ORC-1.9.0/
   
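   The code base does not compile if we upgrade directly, because the new `createRowBatch` overload (the second declaration below) makes existing calls that rely on the default arguments ambiguous: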
   ```cpp
       virtual std::unique_ptr<ColumnVectorBatch> createRowBatch(
           uint64_t size, MemoryPool& pool,
           bool encoded = false) const = 0;
   
       virtual std::unique_ptr<ColumnVectorBatch> createRowBatch(
           uint64_t size, MemoryPool& pool, bool encoded = false,
           bool useTightNumericVector = false) const = 0;
   
   ```
   
   ### What changes are included in this PR?
   
   Explicitly specify which overload of `createRowBatch` to use in the ORC adapter test.
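   
   A minimal sketch of the ambiguity and of two ways to name one overload explicitly. It uses hypothetical stand-in types (`MemoryPool`, `ColumnVectorBatch`, `Type`, and `FakeType` below are mock-ups, not the real ORC classes), so it only illustrates the C++ overload rules involved and is not the code changed in this PR.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <memory>
   
   // Hypothetical stand-ins for illustration; these are not the real ORC classes.
   struct MemoryPool {};
   struct ColumnVectorBatch {
     int overload_arity;  // 3 or 4: which overload created this batch
   };
   
   struct Type {
     virtual ~Type() = default;
     virtual std::unique_ptr<ColumnVectorBatch> createRowBatch(
         std::uint64_t size, MemoryPool& pool, bool encoded = false) const = 0;
     virtual std::unique_ptr<ColumnVectorBatch> createRowBatch(
         std::uint64_t size, MemoryPool& pool, bool encoded = false,
         bool useTightNumericVector = false) const = 0;
   };
   
   struct FakeType : Type {
     std::unique_ptr<ColumnVectorBatch> createRowBatch(std::uint64_t, MemoryPool&,
                                                       bool) const override {
       return std::make_unique<ColumnVectorBatch>(ColumnVectorBatch{3});
     }
     std::unique_ptr<ColumnVectorBatch> createRowBatch(std::uint64_t, MemoryPool&,
                                                       bool, bool) const override {
       return std::make_unique<ColumnVectorBatch>(ColumnVectorBatch{4});
     }
   };
   
   // Passing all four arguments leaves only the four-parameter overload viable.
   int ArityWhenAllArgumentsAreGiven(const Type& type, MemoryPool& pool) {
     // type.createRowBatch(1024, pool);         // error: call is ambiguous
     // type.createRowBatch(1024, pool, false);  // error: call is ambiguous
     auto batch = type.createRowBatch(1024, pool, /*encoded=*/false,
                                      /*useTightNumericVector=*/false);
     assert(batch != nullptr);
     return batch->overload_arity;
   }
   
   // A cast to the exact member-function type names the three-parameter overload.
   int ArityWhenSelectedByCast(const Type& type, MemoryPool& pool) {
     using ThreeArgFn = std::unique_ptr<ColumnVectorBatch> (Type::*)(
         std::uint64_t, MemoryPool&, bool) const;
     ThreeArgFn fn = static_cast<ThreeArgFn>(&Type::createRowBatch);
     auto batch = (type.*fn)(1024, pool, /*encoded=*/false);
     assert(batch != nullptr);
     return batch->overload_arity;
   }
   ```
   
   In this sketch the four-argument call only compiles against headers that declare the four-parameter overload, while the cast to the three-parameter signature compiles whenever that overload is declared.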
   
   ### Are these changes tested?
   
   Yes, I made sure that all tests build and pass.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1614164862

   I just worked around this by creating a writer and using it to create the batch, instead of calling `createRowBatch` directly on the type object. I think this is fine because it is only used in the test. @kou


[GitHub] [arrow] dongjoon-hyun commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1615309021

   Thank you so much, @wgtmac and @kou .


[GitHub] [arrow] wgtmac commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1614090423

   @kou Please take a look, thanks!


[GitHub] [arrow] wgtmac commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1614133616

   Unfortunately, there is no approach that works with both 1.9.0 and 1.8.4, other than temporarily disabling this test.


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1622795105

   Conbench analyzed the 6 benchmark runs on commit `377811f4`.
   
   There were 8 benchmark results indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-30 08:38:56Z](http://conbench.ursa.dev/compare/runs/ebe2bb01601d46a0a5b32e78cfeeca4b...1effce7f707d4a38b6f91f4bd4a18230/)
     - [params=<SMALL_VECTOR(std::string)>, source=cpp-micro, suite=arrow-small-vector-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649e4e6c0da79bb800014b7de0b72c5...0649e94eddc37d468000e439c3f7b25a)
     - [params=<SMALL_VECTOR(std::string)>, source=cpp-micro, suite=arrow-small-vector-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649e4e6ad6e78038000305234d88c3d...0649e94ecbe57fe48000d3bf99f425bb)
   - and 6 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14812686338) has more details.


[GitHub] [arrow] kou commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1614157792

   Oh... I found the `ORC_VERSION` macro, but we can't use it for this case because it's a string...
   
   We can detect the ORC version with CMake and export it to C++, but that would be overkill for this.
   
   Could you disable the test for now?
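   
   To illustrate why a string-valued version macro is awkward here: the preprocessor cannot compare string literals in `#if`, although a `constexpr` function can parse one. The sketch below assumes a version string of the form `"1.9.0"`; `EXAMPLE_ORC_VERSION`, `ParsedVersion`, `ParseVersion`, and `AtLeast` are hypothetical names, not part of ORC or Arrow.
   
   ```cpp
   // Hypothetical stand-in for a string-valued version macro.
   #define EXAMPLE_ORC_VERSION "1.9.0"
   
   // #if EXAMPLE_ORC_VERSION >= "1.9.0"  // not valid: #if only evaluates
   // #endif                              // integer constant expressions
   
   struct ParsedVersion {
     int major_number;
     int minor_number;
     int patch_number;
   };
   
   // Parses up to three dot-separated decimal components at compile time.
   // Parsing stops at the first character that is neither a digit nor a dot.
   constexpr ParsedVersion ParseVersion(const char* text) {
     int parts[3] = {0, 0, 0};
     int index = 0;
     for (; *text != '\0'; ++text) {
       if (*text == '.') {
         if (++index == 3) break;
       } else if (*text >= '0' && *text <= '9') {
         parts[index] = parts[index] * 10 + (*text - '0');
       } else {
         break;
       }
     }
     return ParsedVersion{parts[0], parts[1], parts[2]};
   }
   
   // True when the version is at least major.minor.
   constexpr bool AtLeast(ParsedVersion version, int major_number,
                          int minor_number) {
     return version.major_number > major_number ||
            (version.major_number == major_number &&
             version.minor_number >= minor_number);
   }
   
   static_assert(AtLeast(ParseVersion(EXAMPLE_ORC_VERSION), 1, 9),
                 "the example string parses as 1.9 or newer");
   ```
   
   Even with such a helper, an `if constexpr` outside a template still requires both branches to compile, so it would not hide a call to an overload that the older headers do not declare. Conditional compilation would need an integer macro, for example one exported by the build system.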


[GitHub] [arrow] kou merged pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #36406:
URL: https://github.com/apache/arrow/pull/36406


[GitHub] [arrow] kou commented on pull request #36406: GH-36405: [C++][ORC] Upgrade ORC to 1.9.0

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #36406:
URL: https://github.com/apache/arrow/pull/36406#issuecomment-1614129087

   Thanks!
   
   Can we write code that works with both ORC 1.9.0 and 1.8.4?
   
   Our CI job that still uses ORC 1.8.4 failed with this change:
   
   https://github.com/apache/arrow/actions/runs/5419179225/jobs/9852016855?pr=36406#step:6:2464
   
   ```text
   FAILED: src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o 
   /opt/conda/envs/arrow/bin/ccache /opt/conda/envs/arrow/bin/x86_64-conda-linux-gnu-c++ -DARROW_EXTRA_ERROR_CONTEXT -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC -DARROW_S3_HAS_CRT -DARROW_WITH_BENCHMARKS_REFERENCE -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DAWS_AUTH_USE_IMPORT_EXPORT -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_COMPRESSION_USE_IMPORT_EXPORT -DAWS_CRT_CPP_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT -DAWS_HTTP_USE_IMPORT_EXPORT -DAWS_IO_USE_IMPORT_EXPORT -DAWS_MQTT_USE_IMPORT_EXPORT -DAWS_MQTT_WITH_WEBSOCKETS -DAWS_S3_USE_IMPORT_EXPORT -DAWS_SDKUTILS_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=10 -DAWS_SDK_VERSION_PATCH=13 -DAWS_USE_EPOLL -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DURI_STATIC_BUILD -I/build/cpp/src -I/arrow/cpp/src -I/arrow/cpp/src/generated -isystem /arrow/cpp/thirdparty/flatbuffers/include -isystem /arrow/cpp/thirdparty/hadoop/include -isystem /build/cpp/jemalloc_ep-prefix/src -isystem /build/cpp/mimalloc_ep/src/mimalloc_ep/include/mimalloc-2.0 -isystem /build/cpp/googletest_ep-prefix/include -Wno-noexcept-type -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/envs/arrow/include -fdiagnostics-color=always -fuse-ld=gold  -Wall -Wno-conversion -Wno-sign-conversion -Wunused-result -Wdate-time -fno-semantic-interposition -msse4.2  -g -Werror -O0 -ggdb -g1 -std=c++17 -fPIE -pthread -DS2N_KYBER512R3_AVX2_BMI2 -DS2N_STACKTRACE -DS2N_CPUID_AVAILABLE -DS2N_FEATURES_AVAILABLE -fPIC -DS2N_FALL_THROUGH_SUPPORTED -DS2N___RESTRICT__SUPPORTED -DS2N_MADVISE_SUPPORTED -DS2N_CLONE_SUPPORTED -DS2N_LIBCRYPTO_SUPPORTS_EVP_MD5_SHA1_HASH -DS2N_LIBCRYPTO_SUPPORTS_EVP_RC4 -DS2N_LIBCRYPTO_SUPPORTS_EVP_MD_CTX_SET_PKEY_CTX -MD -MT src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o -MF src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o.d -o src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o -c /arrow/cpp/src/arrow/adapters/orc/adapter_test.cc
   /arrow/cpp/src/arrow/adapters/orc/adapter_test.cc: In function 'void arrow::{anonymous}::TestUnionConversion(std::shared_ptr<arrow::Array>)':
   /arrow/cpp/src/arrow/adapters/orc/adapter_test.cc:1046:31: error: no matching function for call to 'orc::Type::createRowBatch(int64_t, orc::MemoryPool&, bool, bool)'
    1046 |       orc_type->createRowBatch(array->length(), *liborc::getDefaultPool(),
         |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1047 |                                /*encoded=*/false, /*useTightNumericVector=*/false);
         |                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from /opt/conda/envs/arrow/include/orc/Common.hh:23,
                    from /opt/conda/envs/arrow/include/orc/Reader.hh:23,
                    from /opt/conda/envs/arrow/include/orc/OrcFile.hh:25,
                    from /arrow/cpp/src/arrow/adapters/orc/adapter_test.cc:21:
   /opt/conda/envs/arrow/include/orc/Type.hh:73:47: note: candidate: 'virtual std::unique_ptr<orc::ColumnVectorBatch> orc::Type::createRowBatch(uint64_t, orc::MemoryPool&, bool) const'
      73 |     virtual ORC_UNIQUE_PTR<ColumnVectorBatch> createRowBatch(uint64_t size,
         |                                               ^~~~~~~~~~~~~~
   /opt/conda/envs/arrow/include/orc/Type.hh:73:47: note:   candidate expects 3 arguments, 4 provided
   ```

