Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/09/09 20:18:00 UTC

[jira] [Commented] (ARROW-9946) ParquetFileWriter segfaults when `sink` is a string

    [ https://issues.apache.org/jira/browse/ARROW-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193163#comment-17193163 ] 

Neal Richardson commented on ARROW-9946:
----------------------------------------

The docs were very recently updated to say that it requires an OutputStream: https://ursalabs.org/arrow-r-nightly/reference/ParquetFileWriter.html

But {{ParquetFileWriter$create()}} should validate that {{sink}} is an OutputStream instead of segfaulting. Would you be interested in submitting a PR to fix that? You could also add brief docstrings for ParquetFileWriter's two methods while you're there; it's all within the same 10 lines of parquet.R.
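
A minimal sketch of the kind of check meant here, in base R. The helper name {{assert_output_stream}} is hypothetical and not part of arrow. It assumes that arrow's stream objects carry "OutputStream" in their class vector; the real fix may look different.

{code:r}
# Hypothetical helper (not an arrow function): fail early with a clear R error
# when `sink` is not an OutputStream, instead of passing it down to C++.
assert_output_stream <- function(sink) {
  if (!inherits(sink, "OutputStream")) {
    stop(
      "`sink` must be an OutputStream, not an object of class '",
      class(sink)[1], "'",
      call. = FALSE
    )
  }
  invisible(sink)
}
{code}

Called at the top of {{ParquetFileWriter$create()}}, something like this would turn {{sink = "test.parquet"}} into an ordinary R error rather than a crash.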

> ParquetFileWriter segfaults when `sink` is a string
> ---------------------------------------------------
>
>                 Key: ARROW-9946
>                 URL: https://issues.apache.org/jira/browse/ARROW-9946
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 1.0.1
>         Environment: Ubuntu 20.04
>            Reporter: Karl Dunkle Werner
>            Priority: Minor
>
> Hello again! I have another minor R arrow issue.
>  
> The {{ParquetFileWriter}} docs say that the {{sink}} argument can be a "string which is interpreted as a file path". However, when I try to use a string, I get a segfault because the memory isn't mapped.
>  
> Maybe this is a separate request, but it would also be helpful to have documentation for the methods of the writer created by {{ParquetFileWriter$create()}}.
> Docs link: [https://arrow.apache.org/docs/r/reference/ParquetFileWriter.html]
>  
> {code:r}
> library(arrow)
> sch = schema(a = float32())
> writer = ParquetFileWriter$create(schema = sch, sink = "test.parquet")
> #> *** caught segfault ***
> #> address 0x14100007d, cause 'memory not mapped'
> #> 
> #> Traceback:
> #> 1: parquet___arrow___ParquetFileWriter__Open(schema, sink, properties,     arrow_properties)
> #> 2: shared_ptr_is_null(xp)
> #> 3: shared_ptr(ParquetFileWriter, parquet___arrow___ParquetFileWriter__Open(schema,     sink, properties, arrow_properties))
> #> 4: ParquetFileWriter$create(schema = sch, sink = "test.parquet")
> # This works as expected:
> sink = FileOutputStream$create("test.parquet")
> writer = ParquetFileWriter$create(schema = sch, sink = sink)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)