Posted to issues@arrow.apache.org by "phalexo (via GitHub)" <gi...@apache.org> on 2023/06/01 17:42:58 UTC

[GitHub] [arrow] phalexo opened a new issue, #35876: Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

phalexo opened a new issue, #35876:
URL: https://github.com/apache/arrow/issues/35876

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   /arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
   Segmentation fault (core dumped)
   
   ### Component(s)
   
   C, C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] phalexo commented on issue #35876: Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "phalexo (via GitHub)" <gi...@apache.org>.
phalexo commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1574858207

   I rebuilt Arrow and PyArrow from source, and the problem persists. It is not clear from s3fs.cc where arrow::fs::FinalizeS3 should be called. Calling it just before the AWS connection is terminated did NOT stop the core dump.


[GitHub] [arrow] tvalentyn commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1721467555

   We are seeing this error in some Apache Beam unit tests that use pyarrow. We run our unit tests on multiple operating systems, and after upgrading to pyarrow==13.0.0 in the Windows tests we are seeing
   ```
   C:\arrow\cpp\src\arrow\filesystem\s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
   ```
   and the tests appear to time out.
   
   https://github.com/apache/beam/actions/runs/6177998648/job/16833521908?pr=28437
   
   


[GitHub] [arrow] VelvetAcidChrist commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "VelvetAcidChrist (via GitHub)" <gi...@apache.org>.
VelvetAcidChrist commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1627484397

   Use pyarrow 11.0.0


[GitHub] [arrow] westonpace commented on issue #35876: Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1585145164

   This requirement comes from the AWS SDK for C++, which Arrow uses for S3 support, and is documented [here](https://sdk.amazonaws.com/cpp/api/LATEST/root/html/index.html)
   
   > The AWS SDK for C++ must be initialized by calling Aws::InitAPI. Before the application terminates, the SDK must be shut down by calling Aws::ShutdownAPI. Each method accepts an argument of [Aws::SDKOptions](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/struct_aws_1_1_s_d_k_options.html)
   > 
   > All other calls to the SDK can be performed between these two method calls.
   >
   > All AWS SDK for C++ calls performed between Aws::InitAPI and Aws::ShutdownAPI should either to be contained within a pair of curly braces or should be invoked by functions called between the two methods.
   
   The `FinalizeS3` method must be called once, and it must be called before the application exits (e.g. before any objects with "static storage duration" are destroyed).
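A minimal sketch of one way to honor this ordering: tie the finalize call to an object that lives inside `main()` rather than to a static. `ScopedFinalizer` is a hypothetical helper, not part of Arrow; it accepts any callable returning `bool`, so the sketch can be compiled and exercised without linking Arrow.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <utility>

// Hypothetical helper (not part of Arrow): runs a finalizer exactly once, either
// when FinalizeNow() is called or when the guard is destroyed, whichever is first.
// Declare it as a local variable in main() so it is destroyed before main()
// returns, i.e. before objects with static storage duration are torn down.
class ScopedFinalizer {
 public:
  // `finalize` should return true on success.
  explicit ScopedFinalizer(std::function<bool()> finalize)
      : finalize_(std::move(finalize)) {}

  // Non-copyable, so the finalizer cannot run twice through a copy.
  ScopedFinalizer(const ScopedFinalizer&) = delete;
  ScopedFinalizer& operator=(const ScopedFinalizer&) = delete;

  // Runs the finalizer if it has not run yet and returns its result, so callers
  // can check for errors explicitly before returning from main().
  bool FinalizeNow() {
    if (!finalize_) return true;  // already finalized
    std::function<bool()> f = std::move(finalize_);
    finalize_ = nullptr;  // mark as done before invoking
    return f();
  }

  ~ScopedFinalizer() {
    if (!FinalizeNow()) {
      std::cerr << "Finalization failed" << std::endl;
    }
  }

 private:
  std::function<bool()> finalize_;
};
```

In an Arrow program, you would construct the guard at the top of `main()` after a successful `arrow::fs::EnsureS3Initialized()`, passing a lambda such as `[] { return arrow::fs::EnsureS3Finalized().ok(); }`. Because the guard is an automatic variable, its destructor runs while `main()` is still unwinding, not during static destruction.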


[GitHub] [arrow] westonpace commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1632880136

   It's possible this was fixed in 13.0.0.  You still need to call `FinalizeS3` as I described but we did [fix a bug](https://github.com/apache/arrow/issues/36346) where users could call `FinalizeS3` correctly and still get a crash.


[GitHub] [arrow] amoeba commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1642565382

   Hi @phalexo, could you provide a minimal reproduction here for us to test? It's not clear to me whether you're just using PyArrow and have encountered an issue, or whether you're doing something more complicated.


[GitHub] [arrow] pitrou commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1686653085

   @phalexo @imatiach-msft This should have been fixed in the about-to-be-released version 13.0.0. Can you perhaps try a nightly build? See https://arrow.apache.org/docs/dev/developers/python.html#installing-nightly-packages for how to install them.


[GitHub] [arrow] westonpace commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1589658833

   > I looked at the arrow code, and it is not clear to me where this should happen.
   
   As I said, you must call it once before your application exits.
   
   > Putting the call before the connection is severed did not fix the problem.
   
   I'm sorry.  I don't understand what this means.
   
   For example, this is valid:
   
   ```
   #include <iostream>
   
   #include <arrow/filesystem/s3fs.h>
   
   int main() {
     if (!arrow::fs::EnsureS3Initialized().ok()) {
       std::cerr << "Failed to initialize S3" << std::endl;
       return 1;
     }
     // Your program goes here...
     if (!arrow::fs::EnsureS3Finalized().ok()) {
       std::cerr << "Failed to finalize S3" << std::endl;
       return 1;
     }
     return 0;
   }
   ```
   
   This is not valid:
   
   ```
   #include <memory>
   
   #include <arrow/filesystem/s3fs.h>
   
   struct GlobalS3Filesystem {
     GlobalS3Filesystem() {
       if (!arrow::fs::EnsureS3Initialized().ok()) {
         // handle error
         return;
       }
       arrow::Result<std::shared_ptr<arrow::fs::S3FileSystem>> maybe_s3fs =
           arrow::fs::S3FileSystem::Make({});
       if (!maybe_s3fs.ok()) {
         // handle error
         return;
       }
       s3fs = maybe_s3fs.MoveValueUnsafe();
     }
     ~GlobalS3Filesystem() {
       if (!arrow::fs::EnsureS3Finalized().ok()) {
         // handle error
       }
     }
     std::shared_ptr<arrow::fs::FileSystem> s3fs;
   };
   
   std::shared_ptr<arrow::fs::FileSystem> GetGlobalS3Filesystem() {
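     // Function-local static: its destructor (and therefore EnsureS3Finalized)
     // only runs after main() has returned, which is too late.
     static GlobalS3Filesystem global_s3fs;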
     return global_s3fs.s3fs;
   }
   ```
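To see why the second example finalizes too late, here is a stdlib-only sketch with no Arrow calls; `Tracer`, `g_destroyed`, `TouchFunctionLocalStatic` and `RunProgramBody` are hypothetical names. It shows that an automatic (local) object is destroyed at the end of its scope, while a function-local `static` like `global_s3fs` survives until static destruction, after `main()` has returned.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records destructor calls in order. It is initialized before main() runs, so it
// outlives the function-local static below and can safely be appended to from
// that static's destructor at program exit.
std::vector<std::string> g_destroyed;

// Hypothetical stand-in for GlobalS3Filesystem: it only logs when its destructor
// runs (the point where the invalid example calls EnsureS3Finalized).
struct Tracer {
  explicit Tracer(std::string n) : name(std::move(n)) {}
  ~Tracer() { g_destroyed.push_back(name); }
  std::string name;
};

void TouchFunctionLocalStatic() {
  // Constructed on the first call; destroyed only during static destruction,
  // i.e. after main() has returned.
  static Tracer global_like("function-local static");
  (void)global_like;
}

void RunProgramBody() {
  Tracer local("automatic");  // destroyed when this function returns
  TouchFunctionLocalStatic();
}
```

Calling `RunProgramBody()` logs only `"automatic"`; the static's entry appears only once the program is exiting. By the time a destructor like `~GlobalS3Filesystem()` runs, static destruction is already under way, and other objects with static storage duration may already have been destroyed.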


[GitHub] [arrow] phalexo commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "phalexo (via GitHub)" <gi...@apache.org>.
phalexo commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1585767266

   > This requirement comes from S3 and is documented [here](https://sdk.amazonaws.com/cpp/api/LATEST/root/html/index.html)
   > 
   > > The AWS SDK for C++ must be initialized by calling Aws::InitAPI. Before the application terminates, the SDK must be shut down by calling Aws::ShutdownAPI. Each method accepts an argument of [Aws::SDKOptions](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/struct_aws_1_1_s_d_k_options.html)
   > > All other calls to the SDK can be performed between these two method calls.
   > > All AWS SDK for C++ calls performed between Aws::InitAPI and Aws::ShutdownAPI should either to be contained within a pair of curly braces or should be invoked by functions called between the two methods.
   > 
   > The `FinalizeS3` method must be called once and it must be called before the application exits (e.g. before any objects with "static storage duration" are destroyed)
   
   Yes, so Arrow/PyArrow should clean this up properly. I looked at the arrow code, and it is not clear to me where this should happen. Putting the call before the connection is severed did not fix the problem.
   


[GitHub] [arrow] imatiach-msft commented on issue #35876: [C++] Problem when used from "qlora", huggingface. Does not clean up, and dumps core.

Posted by "imatiach-msft (via GitHub)" <gi...@apache.org>.
imatiach-msft commented on issue #35876:
URL: https://github.com/apache/arrow/issues/35876#issuecomment-1664162217

   Running into this issue as well. We aren't even using Amazon S3 (we use Microsoft Azure), but we still see this error, and it intermittently makes our builds fail after hitting the 6-hour time limit (sometimes builds pass, sometimes they don't):
   https://github.com/microsoft/responsible-ai-toolbox/pull/2212

