Posted to issues@arrow.apache.org by "wzh241215 (via GitHub)" <gi...@apache.org> on 2023/07/02 07:36:24 UTC

[GitHub] [arrow] wzh241215 opened a new issue, #36432: we use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

wzh241215 opened a new issue, #36432:
URL: https://github.com/apache/arrow/issues/36432

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We use a ThreadPool to read HDFS files. When a std::thread finishes its task, it is destroyed and `hdfsThreadDestructor` may be called.
   ![image](https://github.com/apache/arrow/assets/62052325/e4b246ad-5e73-4f0a-960e-23352beb73d5)
   Inside `hdfsThreadDestructor`, it calls `(*env)->GetJavaVM(env, &vm)`, but it cannot get the VM because the call is stuck in a condition wait, so the program never finishes.
   ![image](https://github.com/apache/arrow/assets/62052325/e46e3383-ca70-4160-b964-27c50a431c97)
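
For readers without access to the screenshots, here is a minimal sketch of how a destructor of this kind gets invoked when a worker thread exits. It assumes that `hdfsThreadDestructor` is registered as a POSIX thread-specific-data destructor through `pthread_key_create`; that is an assumption about libhdfs, not something confirmed in this report. `ThreadState`, `TouchThreadState`, and `RunWorkersAndCountDestructors` are hypothetical names used only for illustration, and the sketch contains no JNI calls.

```cpp
#include <pthread.h>

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-thread state (in libhdfs this would hold JNI data).
struct ThreadState {
  int id;
};

static pthread_key_t g_key;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static std::atomic<int> g_destructor_calls{0};

// Runs on the exiting thread after the thread function returns,
// for every thread that stored a non-null value under g_key.
static void ThreadStateDestructor(void* p) {
  delete static_cast<ThreadState*>(p);
  g_destructor_calls.fetch_add(1);
}

static void MakeKey() {
  int rc = pthread_key_create(&g_key, ThreadStateDestructor);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
}

// Lazily attaches per-thread state the first time a thread calls in.
void TouchThreadState(int id) {
  pthread_once(&g_once, MakeKey);
  if (pthread_getspecific(g_key) == nullptr) {
    pthread_setspecific(g_key, new ThreadState{id});
  }
}

// Starts n workers, joins them, and reports how many destructors ran.
int RunWorkersAndCountDestructors(int n) {
  g_destructor_calls = 0;
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back(TouchThreadState, i);
  }
  for (auto& w : workers) {
    w.join();  // returns only after the thread, including its destructors, has finished
  }
  return g_destructor_calls.load();
}
```

If the destructor blocks, as the stack in this report suggests, `join()` on that worker never returns, which would match a thread pool that hangs during shutdown.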
   
   arrow version: apache-arrow-11.0.0
   hadoop version: branch-3.2.0
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1621764983

   The HDFS file implementation uses libhdfs, which wraps JNI. It might lock when the user issues a read. Is this related?




[GitHub] [arrow] wzh241215 commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "wzh241215 (via GitHub)" <gi...@apache.org>.
wzh241215 commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1639189201

   > What is `hdfsThreadDestructor`?
   
   `hdfsThreadDestructor` runs when an HDFS thread is destroyed and releases some JVM resources. The code looks like this:
   ![image](https://github.com/apache/arrow/assets/62052325/35a76b1e-a03c-4500-af88-91dc90dad22d)
   
   > Are you asking if we can use https://github.com/haohui/libhdfspp instead of our current hdfs implementation?
   
   Yes. We think the Java-based HDFS implementation is inefficient, so we want to use libhdfspp.
   
   
   




[GitHub] [arrow] wzh241215 commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "wzh241215 (via GitHub)" <gi...@apache.org>.
wzh241215 commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1626904441

   > The HDFS file implementation uses libhdfs, which wraps JNI. It might lock when the user issues a read. Is this related?
   
   That seems right. We captured the stack at the time of the deadlock.
   The ORC project uses libhdfspp for all files. Is there any way in Arrow to avoid the JNI wrapper, for example through a compile option or something similar?




[GitHub] [arrow] raulcd commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1621721322

   cc @westonpace maybe?




[GitHub] [arrow] westonpace commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1631652431

   What is `hdfsThreadDestructor`?
   
   > The ORC project uses libhdfspp for all files. Is there any way in Arrow to avoid the JNI wrapper, for example through a compile option or something similar?
   
   Are you asking if we can use https://github.com/haohui/libhdfspp instead of our current hdfs implementation?




[GitHub] [arrow] mapleFU commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1639242075

   `libhdfspp` doesn't use a lock. However, it doesn't have a `pread`, so that may be a problem when you need positional reads.




[GitHub] [arrow] westonpace commented on issue #36432: [C++] We use ThreadPool to read hdfsfile, when the std::thread finished, it call the deadlock!

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #36432:
URL: https://github.com/apache/arrow/issues/36432#issuecomment-1641167339

   > Yes. We think the Java-based HDFS implementation is inefficient, so we want to use libhdfspp.
   
   It should be possible to create a new filesystem. Instead of changing HdfsFileSystem you can create HdfsppFileSystem.
   
   > `libhdfspp` doesn't use a lock. However, it doesn't have a `pread`, so that may be a problem when you need positional reads.
   
   Yes, if the library does not have pread then we have to do a seek followed by a read for `ReadAt`.
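
A minimal sketch of that seek-then-read fallback follows. It uses a `std::istringstream` as a stand-in for a file handle that only supports seek and read; `SeekReadFile` is a hypothetical class, not the Arrow or libhdfspp API. Because the seek and the read share one stream cursor, the pair has to be guarded by a mutex so that concurrent `ReadAt` calls cannot interleave.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical wrapper around a handle that offers only seek + read.
class SeekReadFile {
 public:
  explicit SeekReadFile(std::string data) : stream_(std::move(data)) {}

  // Emulates a positional read: copies up to nbytes starting at position
  // into out and returns the number of bytes read, or -1 if the seek fails.
  int64_t ReadAt(int64_t position, int64_t nbytes, char* out) {
    assert(out != nullptr);
    // Seek and read mutate shared cursor state, so serialize the pair.
    std::lock_guard<std::mutex> guard(mutex_);
    stream_.clear();  // drop EOF/fail flags left over from an earlier short read
    stream_.seekg(static_cast<std::streamoff>(position));
    if (!stream_) {
      return -1;
    }
    stream_.read(out, static_cast<std::streamsize>(nbytes));
    return static_cast<int64_t>(stream_.gcount());
  }

 private:
  std::mutex mutex_;
  std::istringstream stream_;
};
```

The cost of this design is that reads on one handle are serialized, whereas a native `pread` lets them run in parallel. Opening one handle per reader is a possible alternative if that becomes a bottleneck.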

