Posted to jira@arrow.apache.org by "Lennart Tuijnder (Jira)" <ji...@apache.org> on 2021/10/22 07:35:00 UTC

[jira] [Updated] (ARROW-14434) R crashes when making an empty selection for Datasets with DateTime

     [ https://issues.apache.org/jira/browse/ARROW-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lennart Tuijnder updated ARROW-14434:
-------------------------------------
    Description: 
R (3.6.3) crashes when querying a dataset using the "?arrow::Dataset" functionality when all of the following conditions are met:
 * The dataset to query contains a date-time/time column
 * An empty selection is made with dplyr::filter on the Dataset object
 * The dplyr::collect method is called (this is the point at which the crash happens)

The crash happens both when the dataset is stored locally and when it is located on an S3 bucket.

Here is a minimal example to reproduce the bug:
{code:java}
library(dplyr)
library(lubridate)

# If you remove the dateTime column, no crash occurs.
df <- tibble(
	time = seq(5, 10, length.out = 10000),
	dateTime = as_datetime(1511870400) + time # the dateTime column causes the crash
)
file <- tempdir()
arrow::write_dataset(df, file)

testdf <- arrow::open_dataset(file) %>%
	# filter(time > 5 & time < 6) %>% # a non-empty selection does not crash
	filter(time < 5) %>% # an empty selection triggers the crash
	collect() # the crash happens when collect() is called

{code}
R crashes with the following message:

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

I have attached the full R console output produced by running the above code.

 



> R crashes when making an empty selection for Datasets with DateTime
> -------------------------------------------------------------------
>
>                 Key: ARROW-14434
>                 URL: https://issues.apache.org/jira/browse/ARROW-14434
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, R
>    Affects Versions: 5.0.0
>         Environment: OS = Ubuntu 20.04 
> I use the Architect IDE (an IDE based on Eclipse), but the crash also happens in a plain R console. R = 3.6.3
> See attached files for session info output and an R crash report. 
>            Reporter: Lennart Tuijnder
>            Priority: Major
>         Attachments: RConsole.txt, sessionInfoOutput.txt
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)