Posted to issues@arrow.apache.org by "Bramtimm (via GitHub)" <gi...@apache.org> on 2023/03/05 18:10:42 UTC

[GitHub] [arrow] Bramtimm opened a new issue, #34459: Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2

Bramtimm opened a new issue, #34459:
URL: https://github.com/apache/arrow/issues/34459

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   
   Hi there,
   
   Since Arrow 11.0.0.2 we have been running into a segmentation fault when trying to connect to our AWS S3 storage with the Arrow R package (see below). When the file is read locally first, the package works fine. 
   ```
   arrow::s3_bucket('<bucketname>')
   
    *** caught segfault ***
   address 0x28, cause 'memory not mapped'
   
   Traceback:
    1: fs___FileSystemFromUri(uri)
    2: FileSystem$from_uri(bucket)
    3: arrow::s3_bucket("<bucketname>")
   
   Possible actions:
   1: abort (with core dump, if enabled)
   2: normal R exit
   3: exit R without saving workspace
   4: exit R saving workspace
   ```
   
   On closer inspection with a C++ debugger, it seems that the segmentation fault might be related to OpenSSL. We now use `OpenSSL 1.1.1o  3 May 2022`, but we seem to be stuck on the same error with earlier builds. 
   
   
   ```
   Thread 1 "R" received signal SIGSEGV, Segmentation fault.
   0x00007fffeb5900d9 in evp_md_init_internal () from /usr/local/lib64/libcrypto.so.3
   ```
   
   As noted in the installation guide, we are aware of the need to set a C++17 compiler and devtoolset-11 on CentOS, and we use a custom image to build our environment which sets the Makevars etc.  
   
   We are kind of running out of ideas, so any help is much appreciated!
   
   I’ve added the output of `arrow::arrow_info()` and `sessionInfo()` below. 
   ```
   > sessionInfo()
   R version 4.2.2 (2022-10-31)
   Platform: x86_64-pc-linux-gnu (64-bit)
   Running under: Amazon Linux 2
   
   Matrix products: default
   BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
   
   locale:
    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
    [9] LC_ADDRESS=C               LC_TELEPHONE=C            
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
   
   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base     
   
   loaded via a namespace (and not attached):
   [1] compiler_4.2.2
   > R
   Error: object 'R' not found
   > arrow::arrow_info()
   Arrow package version: 11.0.0.2
   
   Capabilities:
                  
   dataset    TRUE
   substrait FALSE
   parquet    TRUE
   json       TRUE
   s3         TRUE
   gcs        TRUE
   utf8proc   TRUE
   re2        TRUE
   snappy     TRUE
   gzip       TRUE
   brotli     TRUE
   zstd       TRUE
   lz4        TRUE
   lz4_frame  TRUE
   lzo       FALSE
   bz2        TRUE
   jemalloc   TRUE
   mimalloc   TRUE
   
   Memory:
                     
   Allocator jemalloc
   Current    0 bytes
   Max        0 bytes
   
   Runtime:
                          
   SIMD Level          avx
   Detected SIMD Level avx
   
   Build:
                              
   C++ Library Version  11.0.0
   C++ Compiler            GNU
   C++ Compiler Version  8.3.1
   ```
   
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34459: [R] Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34459:
URL: https://github.com/apache/arrow/issues/34459#issuecomment-1456933841

   Just to verify, is this a segmentation fault on shutdown?  In other words, does this happen as one of the last things you do?  And, if so, does it still happen if you put a sleep before you exit?  There is a known issue with S3 shutdown.
   
   If that's not it, do you think there is any chance we could get a full backtrace from reproducing the issue with gdb attached?  Something like...
   
   ```
   R -d gdb
   ...
   gdb prompt> run
   ...
   R prompt> source("my_script.R")
   ...
   SEGFAULT
   gdb prompt> bt
   ```




[GitHub] [arrow] Bramtimm commented on issue #34459: [R] Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2

Posted by "Bramtimm (via GitHub)" <gi...@apache.org>.
Bramtimm commented on issue #34459:
URL: https://github.com/apache/arrow/issues/34459#issuecomment-1460710945

   Thanks for your response. The segmentation fault happened irrespective of shutdown, but it seems that we have resolved the issue. It looks like it was caused by swapping `openssl-devel` for `openssl11-devel` for Python 3.11 (i.e. `yum swap -y openssl-devel openssl11-devel`) in our setup before installing Arrow (R) with `install_arrow()`. This seems to cause the segfault in `libcrypto.so.3` when trying to connect to S3. 
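   
   For anyone hitting something similar, a rough way to check which OpenSSL libraries a compiled library resolves to at load time is to filter `ldd` output. This is only a sketch: the helper name `openssl_libs_from_ldd` is made up for illustration, the sample `ldd` lines are canned rather than taken from our system, and the path to `arrow.so` in the commented usage line is an assumption about a typical Linux install of the R package.
   
   ```shell
   # Hypothetical helper (not part of Arrow or R): reads `ldd` output on stdin
   # and prints only the OpenSSL libraries (libssl / libcrypto) together with
   # the path each one resolves to, or "not found" if ldd could not resolve it.
   openssl_libs_from_ldd() {
     awk '/lib(ssl|crypto)\.so/ { print $1 " -> " ($3 == "not" ? "not found" : $3) }'
   }
   
   # Canned ldd-style input (illustrative only), so the filter can be tried anywhere:
   printf '\tlibcrypto.so.3 => /usr/local/lib64/libcrypto.so.3 (0x00007fffeb400000)\n\tlibc.so.6 => /lib64/libc.so.6 (0x00007ffff7000000)\n' |
     openssl_libs_from_ldd
   
   # Assumed usage against the installed package (the arrow.so location is an assumption):
   # ldd "$(Rscript -e 'cat(system.file("libs", "arrow.so", package = "arrow"))')" | openssl_libs_from_ldd
   ```
   
   If this lists an unexpected `libcrypto` (for example one under `/usr/local/lib64` rather than the system copy), that would be consistent with a mixed OpenSSL setup like the one described above.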




[GitHub] [arrow] westonpace closed issue #34459: [R] Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #34459: [R] Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2
URL: https://github.com/apache/arrow/issues/34459




[GitHub] [arrow] westonpace commented on issue #34459: [R] Segmentation fault when trying to connect to AWS S3 Storage on CentOS 7 – Amazon Linux 2

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34459:
URL: https://github.com/apache/arrow/issues/34459#issuecomment-1468458509

   Thanks for the update.  I'm going to close this issue then.  Feel free to reopen if you need more investigation or run into further issues.

