Posted to github@arrow.apache.org by "davisusanibar (via GitHub)" <gi...@apache.org> on 2023/04/03 18:18:27 UTC

[GitHub] [arrow] davisusanibar opened a new pull request, #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

davisusanibar opened a new pull request, #34866:
URL: https://github.com/apache/arrow/pull/34866

   ### Rationale for this change
   
   Closes: https://github.com/apache/arrow/issues/34862
   
   ### What changes are included in this PR?
   
   Add `ArrowAcero::arrow_acero_static` to the target link libraries in `java/dataset/CMakeLists.txt`.
   
   ### Are these changes tested?
   
   Tested locally. This still needs to be validated again with `github-actions crossbow submit java-jars`.
   
   ### Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1498180619

   OK!


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1497389567

   @kou After applying the patch, I'm seeing the following errors when linking the tests in my local build:
   
   ```
   [2/12] Linking CXX executable release/arrow-dataset-dataset-test
   FAILED: release/arrow-dataset-dataset-test 
   Undefined symbols for architecture x86_64:
     "parquet::RowGroupMetaData::~RowGroupMetaData()", referenced from:
         arrow::dataset::ParquetFileFragment::TryCountRows(arrow::compute::Expression) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   ...
   [3/12] Linking CXX executable release/arrow-dataset-dataset-writer-test
   FAILED: release/arrow-dataset-dataset-writer-test 
   Undefined symbols for architecture x86_64:
     "parquet::RowGroupMetaData::~RowGroupMetaData()", referenced from:
         arrow::dataset::ParquetFileFragment::TryCountRows(arrow::compute::Expression) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   ...
   [4/12] Linking CXX executable release/arrow-dataset-file-test
   FAILED: release/arrow-dataset-file-test 
   Undefined symbols for architecture x86_64:
     "parquet::RowGroupMetaData::~RowGroupMetaData()", referenced from:
         arrow::dataset::ParquetFileFragment::TryCountRows(arrow::compute::Expression) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   ```


[GitHub] [arrow] github-actions[bot] commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1494772422

   * Closes: #34862


[GitHub] [arrow] github-actions[bot] commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1494917288

   Revision: 30acb2785c900b9f284cb2c583abee588661c568
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-d9771a281a](https://github.com/ursacomputing/crossbow/branches/all?query=actions-d9771a281a)
   
   |Task|Status|
   |----|------|
   |java-jars|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d9771a281a-github-java-jars)](https://github.com/ursacomputing/crossbow/actions/runs/4600894975/jobs/8128077210)|


[GitHub] [arrow] github-actions[bot] commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1494772480

   :warning: GitHub issue #34862 **has been automatically assigned in GitHub** to PR creator.


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1494914052

   @github-actions crossbow submit java-jars


[GitHub] [arrow] github-actions[bot] commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1498182823

   Revision: 0bb28af382d8c623d980fdd27a57a8f3168b0e8a
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-e02b93654f](https://github.com/ursacomputing/crossbow/branches/all?query=actions-e02b93654f)
   
   |Task|Status|
   |----|------|
   |java-jars|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-e02b93654f-github-java-jars)](https://github.com/ursacomputing/crossbow/actions/runs/4622996758/jobs/8176356110)|


[GitHub] [arrow] kou commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1498162292

   Oh, sorry. We need to fix Acero dependencies too.
   Can I push to this branch directly?


[GitHub] [arrow] kou commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1498180790

   @github-actions crossbow submit java-jars


[GitHub] [arrow] kou merged pull request #34866: GH-34862: [C++] Fix ArrowDataset dependencies

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #34866:
URL: https://github.com/apache/arrow/pull/34866


[GitHub] [arrow] ursabot commented on pull request #34866: GH-34862: [C++] Fix ArrowDataset dependencies

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1500780819

   Benchmark runs are scheduled for baseline = f7644ae88cafc4b86d7eb294c238f34b35791388 and contender = 4cbaa53f5c8745a54b6b8d6f34973b0342c53093. 4cbaa53f5c8745a54b6b8d6f34973b0342c53093 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2c780182efd8430196a3a554c994e54a...0d049b7bad484f33a15b10deda1684e4/)
   [Finished :arrow_down:1.99% :arrow_up:0.95%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1e808868ad364539ba727c017932cd45...5c206a2cd639469c982fefe9fb282d62/)
   [Finished :arrow_down:0.77% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8222fe26318c48aa85faacd42bc85e6d...620262abcc1544e4b7c0c5e363aece37/)
   [Finished :arrow_down:0.74% :arrow_up:0.8%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/8709593596c0484287c3d9fd90f3b10d...e2827f23fda845a1818a4fd17acf0d74/)
   Buildkite builds:
   [Finished] [`4cbaa53f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2652)
   [Finished] [`4cbaa53f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2685)
   [Finished] [`4cbaa53f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2650)
   [Finished] [`4cbaa53f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2676)
   [Finished] [`f7644ae8` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2651)
   [Finished] [`f7644ae8` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2684)
   [Finished] [`f7644ae8` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2649)
   [Finished] [`f7644ae8` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2675)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1498164889

   > Oh, sorry. We need to fix Acero dependencies too. Can I push to this branch directly?
   
   Go ahead, please.


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1497345820

   > > Sure!
   > > I think that we should fix this on the C++ side, not the Java side.
   > > Could you revert the current change and apply the following patch?
   > > ```diff
   > > diff --git a/cpp/src/arrow/dataset/CMakeLists.txt b/cpp/src/arrow/dataset/CMakeLists.txt
   > > index e1b14b77c4..bdc65573b4 100644
   > > --- a/cpp/src/arrow/dataset/CMakeLists.txt
   > > +++ b/cpp/src/arrow/dataset/CMakeLists.txt
   > > @@ -25,6 +25,7 @@ set(ARROW_DATASET_SRCS
   > >      discovery.cc
   > >      file_base.cc
   > >      file_ipc.cc
   > > +    file_parquet.cc
   > >      partition.cc
   > >      plan.cc
   > >      projector.cc
   > > @@ -39,39 +40,26 @@ endif()
   > >  
   > >  set(ARROW_DATASET_STATIC_LINK_LIBS)
   > >  set(ARROW_DATASET_SHARED_LINK_LIBS)
   > > -set(ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS)
   > > -set(ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS)
   > > +set(ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS ArrowAcero::arrow_acero_static)
   > > +set(ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS ArrowAcero::arrow_acero_shared)
   > >  
   > >  if(ARROW_CSV)
   > > -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_csv.cc)
   > > +  list(APPEND ARROW_DATASET_SRCS file_csv.cc)
   > >  endif()
   > >  
   > >  if(ARROW_JSON)
   > > -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_json.cc)
   > > +  list(APPEND ARROW_DATASET_SRCS file_json.cc)
   > >  endif()
   > >  
   > >  if(ARROW_ORC)
   > > -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_orc.cc)
   > > -endif()
   > > -
   > > -if(ARROW_PARQUET)
   > > -  list(APPEND ARROW_DATASET_STATIC_LINK_LIBS parquet_static)
   > > -  list(APPEND ARROW_DATASET_SHARED_LINK_LIBS parquet_shared)
   > > -  list(APPEND ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS Parquet::parquet_static)
   > > -  list(APPEND ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS Parquet::parquet_shared)
   > > -  list(APPEND ARROW_DATASET_SRCS file_parquet.cc)
   > > -  list(APPEND ARROW_DATASET_PRIVATE_INCLUDES ${PROJECT_SOURCE_DIR}/src/parquet)
   > > -else()
   > > -  list(APPEND ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS Arrow::arrow_static)
   > > -  list(APPEND ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS Arrow::arrow_shared)
   > > +  list(APPEND ARROW_DATASET_SRCS file_orc.cc)
   > >  endif()
   > >  
   > >  list(APPEND
   > >       ARROW_DATASET_STATIC_LINK_LIBS
   > > -     arrow_static
   > >       arrow_acero_static
   > >       ${ARROW_STATIC_LINK_LIBS})
   > > -list(APPEND ARROW_DATASET_SHARED_LINK_LIBS arrow_shared arrow_acero_shared)
   > > +list(APPEND ARROW_DATASET_SHARED_LINK_LIBS arrow_acero_shared)
   > >  
   > >  add_arrow_lib(arrow_dataset
   > >                CMAKE_PACKAGE_NAME
   > > ```
   > 
   > @kou I just applied that change on the C++ side, but the error persists in the Java JNI Dataset module.
   > 
   > This is probably because, for example, the Java JNI Dataset module needs `arrow::dataset::FileSystemDataset::Write`, which in turn calls `arrow::acero::Declaration::Sequence`, so both `ArrowDataset::arrow_dataset_static` and `ArrowAcero::arrow_acero_static` need to be added to the Java Dataset CMakeLists.
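   > 
   > A minimal sketch of that idea, for illustration only. It assumes the JNI target is named `arrow_java_jni_dataset` (inferred from the object paths in the link command in the error message) and that an `ArrowAcero` CMake package is installed next to `ArrowDataset`; the real `java/dataset/CMakeLists.txt` may be organized differently.
   > 
   > ```cmake
   > # Hypothetical excerpt of java/dataset/CMakeLists.txt.
   > # Assumption: the arrow_java_jni_dataset target is already defined
   > # earlier in the file with add_library().
   > find_package(ArrowDataset REQUIRED)
   > # Assumption: the Arrow C++ install also provides an ArrowAcero package.
   > find_package(ArrowAcero REQUIRED)
   > 
   > # libarrow_dataset.a references arrow::acero symbols, so the Acero
   > # static library has to be on the link line as well.
   > target_link_libraries(arrow_java_jni_dataset
   >                       ArrowDataset::arrow_dataset_static
   >                       ArrowAcero::arrow_acero_static)
   > ```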
   > 
   > Error message:
   > 
   > ```
   > + cmake --build . --config release
   > [1/1] Linking CXX shared library dataset/libarrow_dataset_jni.dylib
   > FAILED: dataset/libarrow_dataset_jni.dylib 
   > : && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.2 -dynamiclib -Wl,-headerpad_max_install_names  -o dataset/libarrow_dataset_jni.dylib -install_name @rpath/libarrow_dataset_jni.dylib dataset/CMakeFiles/arrow_java_jni_dataset.dir/src/main/cpp/jni_wrapper.cc.o dataset/CMakeFiles/arrow_java_jni_dataset.dir/src/main/cpp/jni_util.cc.o  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow_dataset.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libparquet.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow_bundled_dependencies.a  -Xlinker -framework -Xlinker CoreFoundation  -Xlinker -framework -Xlinker Security  /usr/local/opt/openssl@1.1/lib/libssl.a  /usr/local/opt/openssl@1.1/lib/libcrypto.a  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libz.tbd  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libcurl.tbd  /usr/local/Cellar/thrift/0.18.1/lib/libthrift.a && :
   > Undefined symbols for architecture x86_64:
   >   "arrow::acero::Declaration::Sequence(std::__1::vector<arrow::acero::Declaration, std::__1::allocator<arrow::acero::Declaration> >)", referenced from:
   >       arrow::dataset::FileSystemDataset::Write(arrow::dataset::FileSystemDatasetWriteOptions const&, std::__1::shared_ptr<arrow::dataset::Scanner>) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status ()>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)", referenced from:
   >       arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::ScanBatchTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::RecordBatch> const&), arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::ScanBatchTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::RecordBatch> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::DeclarationToStatus(arrow::acero::Declaration, bool, arrow::MemoryPool*, arrow::compute::FunctionRegistry*)", referenced from:
   >       arrow::dataset::FileSystemDataset::Write(arrow::dataset::FileSystemDatasetWriteOptions const&, std::__1::shared_ptr<arrow::dataset::Scanner>) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::ValidateExecNodeInputs(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> > const&, int, char const*)", referenced from:
   >       arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::DeclarationToBatchesAsync(arrow::acero::Declaration, arrow::compute::ExecContext)", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::default_exec_factory_registry()", referenced from:
   >       arrow::dataset::MakeWriteNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       void std::__1::__call_once_proxy[abi:v15006]<std::__1::tuple<arrow::dataset::internal::Initialize()::$_14&&> >(void*) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::MakeScanNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::MakeOrderedSinkNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::MakeAugmentedProjectNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::MapNode::InputFinished(arrow::acero::ExecNode*, int)", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::InputReceived(arrow::acero::ExecNode*, arrow::compute::ExecBatch)", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::PauseProducing(arrow::acero::ExecNode*, int)", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::StartProducing()", referenced from:
   >       arrow::dataset::(anonymous namespace)::TeeNode::StartProducing() in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::ResumeProducing(arrow::acero::ExecNode*, int)", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::StopProducingImpl()", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::Finish()", referenced from:
   >       std::__1::__function::__func<arrow::dataset::(anonymous namespace)::TeeNode::StartProducing()::'lambda1'(), std::__1::allocator<arrow::dataset::(anonymous namespace)::TeeNode::StartProducing()::'lambda1'()>, void ()>::operator()() in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::MapNode::MapNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, std::__1::shared_ptr<arrow::Schema>)", referenced from:
   >       arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::ExecNode::Init()", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::ExecNode::ExecNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, std::__1::shared_ptr<arrow::Schema>)", referenced from:
   >       arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::StopProducing()", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_16::operator()(...) const in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::query_context()", referenced from:
   >       arrow::dataset::(anonymous namespace)::TeeNode::StartProducing() in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::DatasetWritingSinkNodeConsumer::Init(std::__1::shared_ptr<arrow::Schema> const&, arrow::acero::BackpressureControl*, arrow::acero::ExecPlan*) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::StartProducing()::'lambda'()::operator()() const::'lambda'(std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> const&), arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::StartProducing()::'lambda'()::operator()() const::'lambda'(std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()() in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::dataset::InspectedFragment> const&), arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::dataset::InspectedFragment> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       ...
   >   "arrow::acero::ExecPlan::StartProducing()", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::Make(arrow::acero::QueryOptions, arrow::compute::ExecContext, std::__1::shared_ptr<arrow::KeyValueMetadata const>)", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::Make(arrow::compute::ExecContext, std::__1::shared_ptr<arrow::KeyValueMetadata const>)", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::AddNode(std::__1::unique_ptr<arrow::acero::ExecNode, std::__1::default_delete<arrow::acero::ExecNode> >)", referenced from:
   >       arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecPlan::finished()", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_16::operator()(...) const in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >       arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<arrow::dataset::EnumeratedRecordBatch>::WrapResultyOnComplete::Callback<arrow::Future<arrow::dataset::EnumeratedRecordBatch>::ThenOnComplete<arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_18::operator()() const::'lambda'(arrow::dataset::EnumeratedRecordBatch const&), arrow::Future<arrow::dataset::EnumeratedRecordBatch>::PassthruOnFailure<arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_18::operator()() const::'lambda'(arrow::dataset::EnumeratedRecordBatch const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::TracedNode::NoteStartProducing(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) const", referenced from:
   >       arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::Declaration::AddToPlan(arrow::acero::ExecPlan*, arrow::acero::ExecFactoryRegistry*) const", referenced from:
   >       arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::MapNode::ordering() const", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "arrow::acero::ExecNode::ToStringExtra(int) const", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecNode::Validate() const", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "arrow::acero::ExecNode::ordering() const", referenced from:
   >       vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "typeinfo for arrow::acero::MapNode", referenced from:
   >       typeinfo for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >   "typeinfo for arrow::acero::ExecNode", referenced from:
   >       typeinfo for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   "vtable for arrow::acero::ExecNode", referenced from:
   >       arrow::acero::ExecNode::~ExecNode() in libarrow_dataset.a(unity_0_cxx.cxx.o)
   >       arrow::acero::ExecNode::~ExecNode() in libarrow_dataset.a(unity_1_cxx.cxx.o)
   >   NOTE: a missing vtable usually means the first non-inline virtual member function has no definition.
   > ld: symbol(s) not found for architecture x86_64
   > clang: error: linker command failed with exit code 1 (use -v to see invocation)
   > ninja: build stopped: subcommand failed.
   > ```
   
   Please ignore my last comment, @kou. Let me continue building and testing so that I can update the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1497024728

   Sure!
   
   I think that we should fix this on the C++ side, not the Java side.
   
   Could you revert the current change and apply the following patch?
   
   ```diff
   diff --git a/cpp/src/arrow/dataset/CMakeLists.txt b/cpp/src/arrow/dataset/CMakeLists.txt
   index e1b14b77c4..bdc65573b4 100644
   --- a/cpp/src/arrow/dataset/CMakeLists.txt
   +++ b/cpp/src/arrow/dataset/CMakeLists.txt
   @@ -25,6 +25,7 @@ set(ARROW_DATASET_SRCS
        discovery.cc
        file_base.cc
        file_ipc.cc
   +    file_parquet.cc
        partition.cc
        plan.cc
        projector.cc
   @@ -39,39 +40,26 @@ endif()
    
    set(ARROW_DATASET_STATIC_LINK_LIBS)
    set(ARROW_DATASET_SHARED_LINK_LIBS)
   -set(ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS)
   -set(ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS)
   +set(ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS ArrowAcero::arrow_acero_static)
   +set(ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS ArrowAcero::arrow_acero_shared)
    
    if(ARROW_CSV)
   -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_csv.cc)
   +  list(APPEND ARROW_DATASET_SRCS file_csv.cc)
    endif()
    
    if(ARROW_JSON)
   -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_json.cc)
   +  list(APPEND ARROW_DATASET_SRCS file_json.cc)
    endif()
    
    if(ARROW_ORC)
   -  set(ARROW_DATASET_SRCS ${ARROW_DATASET_SRCS} file_orc.cc)
   -endif()
   -
   -if(ARROW_PARQUET)
   -  list(APPEND ARROW_DATASET_STATIC_LINK_LIBS parquet_static)
   -  list(APPEND ARROW_DATASET_SHARED_LINK_LIBS parquet_shared)
   -  list(APPEND ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS Parquet::parquet_static)
   -  list(APPEND ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS Parquet::parquet_shared)
   -  list(APPEND ARROW_DATASET_SRCS file_parquet.cc)
   -  list(APPEND ARROW_DATASET_PRIVATE_INCLUDES ${PROJECT_SOURCE_DIR}/src/parquet)
   -else()
   -  list(APPEND ARROW_DATASET_STATIC_INSTALL_INTERFACE_LIBS Arrow::arrow_static)
   -  list(APPEND ARROW_DATASET_SHARED_INSTALL_INTERFACE_LIBS Arrow::arrow_shared)
   +  list(APPEND ARROW_DATASET_SRCS file_orc.cc)
    endif()
    
    list(APPEND
         ARROW_DATASET_STATIC_LINK_LIBS
   -     arrow_static
         arrow_acero_static
         ${ARROW_STATIC_LINK_LIBS})
   -list(APPEND ARROW_DATASET_SHARED_LINK_LIBS arrow_shared arrow_acero_shared)
   +list(APPEND ARROW_DATASET_SHARED_LINK_LIBS arrow_acero_shared)
    
    add_arrow_lib(arrow_dataset
                  CMAKE_PACKAGE_NAME
   ```


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1497280576

   
   @kou I just applied that change on the C++ side, but the error persists in the Java JNI Dataset module.
   
   This is probably because the Java JNI Dataset module uses, for example, `arrow::dataset::FileSystemDataset::Write`, which in turn calls `arrow::acero::Declaration::Sequence`, so `ArrowDataset::arrow_dataset_static` and `ArrowAcero::arrow_acero_static` would both need to be added to the Java Dataset CMakeLists.
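   
   A minimal sketch of what that could look like, assuming the JNI target is named `arrow_java_jni_dataset` (the name that appears in the link command below) and that the installed Arrow provides `ArrowDataset` and `ArrowAcero` CMake packages. The `find_package` calls and the link keyword are assumptions for illustration, not the actual contents of the Java Dataset CMakeLists:
   
   ```cmake
   # Hypothetical fragment for the Java Dataset JNI CMakeLists.txt.
   # Assumption: both packages are discoverable via CMAKE_PREFIX_PATH.
   find_package(ArrowDataset REQUIRED)
   find_package(ArrowAcero REQUIRED)
   
   # List the dataset archive before the acero archive: with static
   # libraries, the library that defines a symbol should come after
   # the library that references it.
   target_link_libraries(arrow_java_jni_dataset
                         PRIVATE
                         ArrowDataset::arrow_dataset_static
                         ArrowAcero::arrow_acero_static)
   ```
   
   With static archives the linker only resolves `arrow::acero::*` references if `libarrow_acero.a` is on the link line, which is why it may need to be named explicitly here unless the dataset target already exports it as a dependency.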
   
   
   Error message:
   
   ```
   + cmake --build . --config release
   [1/1] Linking CXX shared library dataset/libarrow_dataset_jni.dylib
   FAILED: dataset/libarrow_dataset_jni.dylib 
   : && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.2 -dynamiclib -Wl,-headerpad_max_install_names  -o dataset/libarrow_dataset_jni.dylib -install_name @rpath/libarrow_dataset_jni.dylib dataset/CMakeFiles/arrow_java_jni_dataset.dir/src/main/cpp/jni_wrapper.cc.o dataset/CMakeFiles/arrow_java_jni_dataset.dir/src/main/cpp/jni_util.cc.o  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow_dataset.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libparquet.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow.a  /Users/dsusanibar/voltron/jiraarrow/main3/arrow/cpp-build/cpp-install/lib/libarrow_bundled_dependencies.a  -Xlinker -framework -Xlinker CoreFoundation  -Xlinker -framework -Xlinker Security  /usr/local/opt/openssl@1.1/lib/libssl.a  /usr/local/opt/openssl@1.1/lib/libcrypto.a  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libz.tbd  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libcurl.tbd  /usr/local/Cellar/thrift/0.18.1/lib/libthrift.a && :
   Undefined symbols for architecture x86_64:
     "arrow::acero::Declaration::Sequence(std::__1::vector<arrow::acero::Declaration, std::__1::allocator<arrow::acero::Declaration> >)", referenced from:
         arrow::dataset::FileSystemDataset::Write(arrow::dataset::FileSystemDatasetWriteOptions const&, std::__1::shared_ptr<arrow::dataset::Scanner>) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::QueryContext::ScheduleTask(std::__1::function<arrow::Status ()>, std::__1::basic_string_view<char, std::__1::char_traits<char> >)", referenced from:
         arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::ScanBatchTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::RecordBatch> const&), arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::ScanBatchTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::RecordBatch> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::DeclarationToStatus(arrow::acero::Declaration, bool, arrow::MemoryPool*, arrow::compute::FunctionRegistry*)", referenced from:
         arrow::dataset::FileSystemDataset::Write(arrow::dataset::FileSystemDatasetWriteOptions const&, std::__1::shared_ptr<arrow::dataset::Scanner>) in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::ValidateExecNodeInputs(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> > const&, int, char const*)", referenced from:
         arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::DeclarationToBatchesAsync(arrow::acero::Declaration, arrow::compute::ExecContext)", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::default_exec_factory_registry()", referenced from:
         arrow::dataset::MakeWriteNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         void std::__1::__call_once_proxy[abi:v15006]<std::__1::tuple<arrow::dataset::internal::Initialize()::$_14&&> >(void*) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::MakeScanNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::MakeOrderedSinkNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::MakeAugmentedProjectNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::MapNode::InputFinished(arrow::acero::ExecNode*, int)", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::InputReceived(arrow::acero::ExecNode*, arrow::compute::ExecBatch)", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::PauseProducing(arrow::acero::ExecNode*, int)", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::StartProducing()", referenced from:
         arrow::dataset::(anonymous namespace)::TeeNode::StartProducing() in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::ResumeProducing(arrow::acero::ExecNode*, int)", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::StopProducingImpl()", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::Finish()", referenced from:
         std::__1::__function::__func<arrow::dataset::(anonymous namespace)::TeeNode::StartProducing()::'lambda1'(), std::__1::allocator<arrow::dataset::(anonymous namespace)::TeeNode::StartProducing()::'lambda1'()>, void ()>::operator()() in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::MapNode::MapNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, std::__1::shared_ptr<arrow::Schema>)", referenced from:
         arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::ExecNode::Init()", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::ExecNode::ExecNode(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, std::__1::shared_ptr<arrow::Schema>)", referenced from:
         arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::StopProducing()", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_16::operator()(...) const in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::query_context()", referenced from:
         arrow::dataset::(anonymous namespace)::TeeNode::StartProducing() in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::DatasetWritingSinkNodeConsumer::Init(std::__1::shared_ptr<arrow::Schema> const&, arrow::acero::BackpressureControl*, arrow::acero::ExecPlan*) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::StartProducing()::'lambda'()::operator()() const::'lambda'(std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> const&), arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::StartProducing()::'lambda'()::operator()() const::'lambda'(std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::dataset::Fragment> > ()> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()() in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::WrapResultyOnComplete::Callback<arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::ThenOnComplete<arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::dataset::InspectedFragment> const&), arrow::Future<std::__1::shared_ptr<arrow::dataset::InspectedFragment> >::PassthruOnFailure<arrow::dataset::(anonymous namespace)::ScanNode::ListFragmentTask::operator()()::'lambda'(std::__1::shared_ptr<arrow::dataset::InspectedFragment> const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
         ...
     "arrow::acero::ExecPlan::StartProducing()", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::Make(arrow::acero::QueryOptions, arrow::compute::ExecContext, std::__1::shared_ptr<arrow::KeyValueMetadata const>)", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::Make(arrow::compute::ExecContext, std::__1::shared_ptr<arrow::KeyValueMetadata const>)", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::CountRowsAsync(arrow::internal::Executor*) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::AddNode(std::__1::unique_ptr<arrow::acero::ExecNode, std::__1::default_delete<arrow::acero::ExecNode> >)", referenced from:
         arrow::dataset::(anonymous namespace)::TeeNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::dataset::(anonymous namespace)::ScanNode::Make(arrow::acero::ExecPlan*, std::__1::vector<arrow::acero::ExecNode*, std::__1::allocator<arrow::acero::ExecNode*> >, arrow::acero::ExecNodeOptions const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecPlan::finished()", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_16::operator()(...) const in libarrow_dataset.a(unity_1_cxx.cxx.o)
         arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<arrow::dataset::EnumeratedRecordBatch>::WrapResultyOnComplete::Callback<arrow::Future<arrow::dataset::EnumeratedRecordBatch>::ThenOnComplete<arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_18::operator()() const::'lambda'(arrow::dataset::EnumeratedRecordBatch const&), arrow::Future<arrow::dataset::EnumeratedRecordBatch>::PassthruOnFailure<arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool)::$_18::operator()() const::'lambda'(arrow::dataset::EnumeratedRecordBatch const&)> > > >::invoke(arrow::FutureImpl const&) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::TracedNode::NoteStartProducing(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) const", referenced from:
         arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::Declaration::AddToPlan(arrow::acero::ExecPlan*, arrow::acero::ExecFactoryRegistry*) const", referenced from:
         arrow::dataset::(anonymous namespace)::AsyncScanner::ScanBatchesUnorderedAsync(arrow::internal::Executor*, bool, bool) in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::MapNode::ordering() const", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "arrow::acero::ExecNode::ToStringExtra(int) const", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecNode::Validate() const", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
         vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "arrow::acero::ExecNode::ordering() const", referenced from:
         vtable for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "typeinfo for arrow::acero::MapNode", referenced from:
         typeinfo for arrow::dataset::(anonymous namespace)::TeeNode in libarrow_dataset.a(unity_0_cxx.cxx.o)
     "typeinfo for arrow::acero::ExecNode", referenced from:
         typeinfo for arrow::dataset::(anonymous namespace)::ScanNode in libarrow_dataset.a(unity_1_cxx.cxx.o)
     "vtable for arrow::acero::ExecNode", referenced from:
         arrow::acero::ExecNode::~ExecNode() in libarrow_dataset.a(unity_0_cxx.cxx.o)
         arrow::acero::ExecNode::~ExecNode() in libarrow_dataset.a(unity_1_cxx.cxx.o)
     NOTE: a missing vtable usually means the first non-inline virtual member function has no definition.
   ld: symbol(s) not found for architecture x86_64
   clang: error: linker command failed with exit code 1 (use -v to see invocation)
   ninja: build stopped: subcommand failed.
   
   ```
   
   


[GitHub] [arrow] davisusanibar commented on pull request #34866: GH-34862: [Java] Adding ArrowAcero for Dataset module dependency

Posted by "davisusanibar (via GitHub)" <gi...@apache.org>.
davisusanibar commented on PR #34866:
URL: https://github.com/apache/arrow/pull/34866#issuecomment-1496486790

   Hi @kou, could you help me with this PR?
   
   The Java Dataset module needs `ArrowAcero::arrow_acero` to work with the new library mentioned in the email https://lists.apache.org/thread/5h5g9k9lvbybzl8fnbg4fppxczm42g6r.

