Posted to jira@arrow.apache.org by "Jeroen (Jira)" <ji...@apache.org> on 2021/12/22 12:04:00 UTC

[jira] [Commented] (ARROW-14677) [R][C++] macOS R package arrow segfault on `open_dataset()`

    [ https://issues.apache.org/jira/browse/ARROW-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463767#comment-17463767 ] 

Jeroen commented on ARROW-14677:
--------------------------------

Sorry for the late response.

I think the crash is caused by your installation mixing autobrew libraries with dynamic libraries from Homebrew in `/usr/local/opt`. This in turn is likely a result of your custom R configuration passing `-L/usr/local/lib` when compiling R packages: if Homebrew libraries are installed with the same names as the ones in the autobrew bundle, the former mask the latter.
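
One way to check whether an R build passes this flag is to inspect the linker flags it reports. The following is a minimal shell sketch and not part of autobrew: `links_usr_local_lib` is a hypothetical helper name, and it assumes the flag shows up in the output of `R CMD config LDFLAGS`.

```shell
# Hypothetical helper: succeed if a linker-flag string contains -L/usr/local/lib.
links_usr_local_lib() {
  # $1 is intentionally unquoted so that it is split into individual flags.
  for flag in $1; do
    case "$flag" in
      -L/usr/local/lib|-L/usr/local/lib/) return 0 ;;
    esac
  done
  return 1
}

# Example use (assumes R is on the PATH):
#   links_usr_local_lib "$(R CMD config LDFLAGS)" && echo "R links against /usr/local/lib"
```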

This is probably a rare situation that only arises with custom builds of R, but I have added a workaround to the autobrew download script that renames the static libraries before linking, so that they are no longer masked by the ones in `/usr/local/lib`: [https://github.com/autobrew/scripts/commit/08d8af36ada522b5a79aadfe77c43f798fe600ef]

Can you try again to build the release version of arrow from source using autobrew? Running `install.packages("arrow", type = "source")` should do that.
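
After reinstalling, the libraries that the package's shared object links against can be listed with `otool -L` on macOS. As a rough sketch (the helper name `homebrew_dylibs` is hypothetical), the following filters such a listing for entries under the two prefixes mentioned above. An empty result would suggest that no library from those locations is linked dynamically.

```shell
# Hypothetical helper: read `otool -L` output on stdin and print any linked
# libraries that live under /usr/local/opt or /usr/local/lib.
homebrew_dylibs() {
  # Skip the first line of the listing (the inspected file itself, ending in ":").
  awk '$1 !~ /:$/ && $1 ~ /^\/usr\/local\/(opt|lib)\// { print $1 }'
}

# Example use on macOS (assumes arrow is installed and Rscript is on the PATH):
#   so=$(Rscript -e 'cat(system.file("libs", "arrow.so", package = "arrow"))')
#   otool -L "$so" | homebrew_dylibs
```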

Thanks!

> [R][C++] macOS R package arrow segfault on `open_dataset()`
> -----------------------------------------------------------
>
>                 Key: ARROW-14677
>                 URL: https://issues.apache.org/jira/browse/ARROW-14677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 6.0.0
>            Reporter: Martin Morgan
>            Priority: Major
>
> Following a slack post (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing a public bucket with the R client
> {code:java}
> df <- arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> leads to a segfault
> {code:java}
>   *** caught segfault ***
> address 0x0, cause 'unknown'
> Traceback:
> 1: dataset__DatasetFactory_Finish1(self, unify_schemas)
> 2: factory$Finish(schema, isTRUE(unify_schemas))
> 3: doTryCatch(return(expr), name, parentenv, handler)
> 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 5: tryCatchList(expr, classes, parentenv, handlers)
> 6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e) { handle_parquet_io_error(e, format)})
> 7: arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
>  
> {code}
> The arrow portion of the lldb traceback is
> {code:java}
> (lldb) thread backtrace
> thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
> frame #0: 0x000000012ab2029c libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr() + 46
> frame #1: 0x0000000128bb6ac2 arrow.so`void parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned char const*, unsigned int*, parquet::format::FileMetaData*) + 309
> frame #2: 0x0000000128bb5f49 arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 517
> frame #3: 0x0000000128bace0d arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
> frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 89
> frame #5: 0x0000000128b9cb4a arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer> const&, unsigned int) + 118
> frame #6: 0x0000000128b9df43 arrow.so`parquet::SerializedFile::ParseMetaData() + 607
> frame #7: 0x0000000128b9dc6c arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 214
> frame #8: 0x0000000128b9eb72 arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 58
> frame #9: 0x0000000128c8a988 arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource const&, arrow::dataset::ScanOptions*) const + 286
> frame #10: 0x0000000128c8a72e arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource const&) const + 44
> frame #11: 0x0000000128c0b994 arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions) + 336
> frame #12: 0x0000000128c09079 arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) + 43
> frame #13: 0x0000000128c0c1cf arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions) + 541
> frame #14: 0x0000000128a66805 arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory> const&, bool) + 69
> frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 154
> {code}
> arrow was installed from source on
> {code:java}
> > sessionInfo()
> R Under development (unstable) (2021-10-28 r81109)
> Platform: x86_64-apple-darwin19.6.0 (64-bit)
> Running under: macOS Catalina 10.15.7
> Matrix products: default
> BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] arrow_6.0.0.2
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4 compiler_4.2.0
> [4] BiocManager_1.30.16 magrittr_2.0.1 assertthat_0.2.1
> [7] R6_2.5.1 glue_1.5.0 bit64_4.0.5
> [10] vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
> {code}
> During package installation, the one step that was 'new' to me was the use of autobrew
> {code:java}
> *** Downloading apache-arrow
> Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz
> {code}
> I'm not sure how to validate that this use is consistent with my brew installation.


