Posted to issues@arrow.apache.org by "vdytyniak-exos (via GitHub)" <gi...@apache.org> on 2023/04/28 08:19:11 UTC

[GitHub] [arrow] vdytyniak-exos opened a new issue, #35365: A libcurl function was given a bad argument

vdytyniak-exos opened a new issue, #35365:
URL: https://github.com/apache/arrow/issues/35365

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We use pyarrow to read data from S3 and sometimes we get the following error:
   
   ```
   File "/usr/local/lib/python3.10/dist-packages/{org}/store/storage.py", line 794, in _load_partition
       table = ds.dataset(
     File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
     File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
   OSError: AWS Error NETWORK_CONNECTION during GetObject operation: curlCode: 43, A libcurl function was given a bad argument
   ```
   
   We have tried to find the cause, but the error occurs seemingly at random. Can you help us understand where the libcurl problem might originate?
   
   ### Component(s)
   
   Python
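
Because the error is reported as intermittent, one stopgap while the root cause is unknown is to retry the failing read. The sketch below is a workaround under stated assumptions, not a fix: `is_transient_curl_error` and `call_with_retry` are hypothetical helper names, the match is done on the error text shown in the traceback above, and the attempt count and delays are illustrative.

```python
import time


def is_transient_curl_error(exc):
    """Return True for the OSError text reported in this issue (assumed transient)."""
    message = str(exc)
    return (
        isinstance(exc, OSError)
        and "NETWORK_CONNECTION" in message
        and "curlCode: 43" in message
    )


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); retry with exponential backoff only on the curlCode 43 error."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError as exc:
            # Give up on the last attempt or on any unrelated OSError.
            if attempt == attempts or not is_transient_curl_error(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The dataset read would be wrapped in a zero-argument callable, for example `call_with_retry(lambda: dataset.to_table())`, where `dataset` stands for whatever object the application already builds. Retrying only hides the symptom, so logging each retried exception is probably worthwhile.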


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35365: A libcurl function was given a bad argument

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1544145706

   I did some basic research on the error and didn't find much.  The only possible cause I could find is an incompatibility between the S3 SDK and curl versions (e.g. if the S3 SDK was developed or compiled against one version but linked or run with another).
   
   How are you obtaining pyarrow?  Is it from conda, pip, or a build from source?  Can you use ldd to check which library versions it is linking against?  For example, I use conda so I run this:
   
   ```
   (arrow-release-11) pace@pace-desktop:~$ ldd ~/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/libarrow_python.so.1000.1.0 
   ...
   	libcurl.so.4 => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../../././libcurl.so.4 (0x00007f3ffa8ac000)
   ...
   	libaws-c-s3.so.0unstable => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../.././././libaws-c-s3.so.0unstable (0x00007f3ffa64c000)
   ```
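
The same check can be scripted so it covers every shared library that ships inside the pyarrow package directory. This is a minimal sketch assuming a Linux system with `ldd` on the PATH; `parse_ldd_output` and `linked_libraries` are hypothetical helper names, and the `curl` / `aws` keywords are only a guess at which dependencies are of interest.

```python
import pathlib
import subprocess


def parse_ldd_output(text):
    """Map each 'name => path (address)' line of ldd output to {name: path}."""
    libs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. linux-vdso or the dynamic loader line
        name, _, target = line.partition("=>")
        path = target.rsplit(" (", 1)[0].strip()
        libs[name.strip()] = None if path == "not found" else path
    return libs


def linked_libraries(package_dir, keywords=("curl", "aws")):
    """Run ldd on every *.so* file in package_dir; keep dependencies matching keywords."""
    found = {}
    for lib in sorted(pathlib.Path(package_dir).glob("*.so*")):
        result = subprocess.run(
            ["ldd", str(lib)], capture_output=True, text=True, check=False
        )
        deps = parse_ldd_output(result.stdout)
        found[lib.name] = {
            name: path
            for name, path in deps.items()
            if any(key in name for key in keywords)
        }
    return found
```

One way to call it is `linked_libraries(pathlib.Path(pyarrow.__file__).parent)` after `import pyarrow`. If every entry comes back empty, curl and the AWS SDK are not being resolved as separate shared objects.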




[GitHub] [arrow] vdytyniak-exos commented on issue #35365: A libcurl function was given a bad argument

Posted by "vdytyniak-exos (via GitHub)" <gi...@apache.org>.
vdytyniak-exos commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1544190961

   I install from pip. I don't see libaws-c-s3.so:
   ```
   root@fba404d79f64:/dir# ldd /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_python.so.1000.1.0
   	linux-vdso.so.1 (0x00007ffe52b60000)
   	libarrow_dataset.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_dataset.so.1000 (0x00007f136944b000)
   	libparquet.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libparquet.so.1000 (0x00007f1368d10000)
   	libarrow.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow.so.1000 (0x00007f136663d000)
   	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1366456000)
   	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1366307000)
   	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f13662ec000)
   	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13660f8000)
   	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13660ee000)
   	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13660cb000)
   	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13660c5000)
   	/lib64/ld-linux-x86-64.so.2 (0x00007f13697bd000)
   
   ```




Re: [I] A libcurl function was given a bad argument [arrow]

Posted by "shomilj (via GitHub)" <gi...@apache.org>.
shomilj commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1813845068

   We're facing the same issue: `AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 43` shows up transiently and is hard to reproduce. The only thing that seems to trigger it more often is a higher-latency network connection to S3, so we suspect that a lower layer is not handling the higher latency properly (cc @westonpace in case you have any pointers or additional debugging tips).
   
   @vdytyniak-exos did you ever root cause this?
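
If latency really is a factor, one experiment is to set explicit timeouts and a retry strategy on the S3 filesystem instead of relying on the defaults. This is a sketch under assumptions, not a confirmed remedy for this error: `s3_timeout_options` and `make_s3_filesystem` are hypothetical helpers, the numbers are illustrative, and `connect_timeout`, `request_timeout`, `retry_strategy` and `AwsStandardS3RetryStrategy` are `pyarrow.fs` names as I understand them, so check them against the installed pyarrow version.

```python
def s3_timeout_options(connect_timeout=10.0, request_timeout=60.0):
    """Build timeout keyword arguments (in seconds) for pyarrow.fs.S3FileSystem."""
    if connect_timeout <= 0 or request_timeout <= 0:
        raise ValueError("timeouts must be positive")
    return {
        "connect_timeout": float(connect_timeout),
        "request_timeout": float(request_timeout),
    }


def make_s3_filesystem(region, max_attempts=5, **timeouts):
    """Create an S3FileSystem with explicit timeouts and an SDK-level retry strategy."""
    from pyarrow import fs  # imported lazily so the helper above needs no pyarrow

    return fs.S3FileSystem(
        region=region,
        retry_strategy=fs.AwsStandardS3RetryStrategy(max_attempts=max_attempts),
        **s3_timeout_options(**timeouts),
    )
```

The resulting object can be passed as `filesystem=` to `pyarrow.dataset.dataset(...)`. Comparing error rates with and without these settings on the high-latency link would at least show whether timeouts play a role.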




[GitHub] [arrow] vdytyniak-exos commented on issue #35365: A libcurl function was given a bad argument

Posted by "vdytyniak-exos (via GitHub)" <gi...@apache.org>.
vdytyniak-exos commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1538525821

   > We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.
   > 
   > What version of pyarrow are you using? What OS?
   
   pyarrow=10.0.1
   os: ubuntu:20.04




[GitHub] [arrow] westonpace commented on issue #35365: A libcurl function was given a bad argument

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1545961013

   > I install from pip. I don't see libaws-c-s3.so:
   
   Ah, I think that if you installed from pip, everything is statically linked, which I suppose rules out a version incompatibility.
   
   In that case I'm afraid I'm at a bit of a loss as to how to proceed.  If it could be reproduced reliably, we might try building with a debug version of curl and breaking at the point where the error is generated to figure out exactly what is invalid.




[GitHub] [arrow] westonpace commented on issue #35365: A libcurl function was given a bad argument

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35365:
URL: https://github.com/apache/arrow/issues/35365#issuecomment-1538510360

   We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.
   
   What version of pyarrow are you using?  What OS?

