Posted to jira@arrow.apache.org by "Dragoș Moldovan-Grünfeld (Jira)" <ji...@apache.org> on 2021/12/09 14:17:00 UTC

[jira] [Comment Edited] (ARROW-15041) [R] Flaky BOM removal test

    [ https://issues.apache.org/jira/browse/ARROW-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456465#comment-17456465 ] 

Dragoș Moldovan-Grünfeld edited comment on ARROW-15041 at 12/9/21, 2:16 PM:
----------------------------------------------------------------------------

This seems to be related to the order in which the two files are created and then combined by {{open_dataset()}}: creating file 2 first doesn't trigger a test failure.

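{code:r}
library(arrow)
library(dplyr)    # collect(), tibble(), %>%
library(testthat) # expect_equal()

# make_temp_dir() appears to be a helper from arrow's test suite; outside it, use tempfile()
temp_dir <- tempfile()
dir.create(temp_dir)

# Two CSV files, each starting with a UTF-8 byte-order mark (EF BB BF)
writeLines("\xef\xbb\xbfa,b\n1,2\n", con = file.path(temp_dir, "file1.csv"))
writeLines("\xef\xbb\xbfa,b\n3,4\n", con = file.path(temp_dir, "file2.csv"))

expect_equal(
  open_dataset(temp_dir, format = "csv") %>% collect(),
  tibble(a = c(1, 3), b = c(2, 4))
)
{code}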
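The failure output in the issue description shows the expected rows coming back in the opposite order (3, 4 before 1, 2), so one way to make such a check robust is to compare the data independently of row order. Below is a minimal base-R sketch of that idea, not the fix adopted for this issue: {{sort_rows()}} is a hypothetical helper, and {{actual}} is typed in by hand to mirror the failing run rather than produced by {{open_dataset()}}.

{code:r}
# Hypothetical helper (not part of arrow or testthat): reorder a data
# frame's rows by all of its columns so comparisons ignore row order.
sort_rows <- function(df) {
  ord <- do.call(order, unname(as.list(df)))
  df <- df[ord, , drop = FALSE]
  rownames(df) <- NULL  # drop the permuted row names left by subsetting
  df
}

expected <- data.frame(a = c(1, 3), b = c(2, 4))
# Typed in by hand to mirror the failing CI run: file2.csv's row came first
actual <- data.frame(a = c(3, 1), b = c(4, 2))

isTRUE(all.equal(actual, expected))                        # FALSE: row order differs
isTRUE(all.equal(sort_rows(actual), sort_rows(expected)))  # TRUE
{code}

Sorting on every column keeps the comparison deterministic even when one column has ties. Inside the arrow test itself, a comparable (untested here) option would be to add dplyr's {{arrange(a)}} to the pipeline before {{collect()}}, assuming the installed arrow version supports {{arrange()}} on datasets.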


was (Author: dragosmg):
This seems to be related to the order in which the two files are created and then combined by {{open_dataset()}}. Creating file 2 first doesn't trigger the problem.


> [R] Flaky BOM removal test
> --------------------------
>
>                 Key: ARROW-15041
>                 URL: https://issues.apache.org/jira/browse/ARROW-15041
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 7.0.0
>
>
> The test introduced in ARROW-14644 appears to be flaky.
> See example failed runs:
> https://github.com/apache/arrow/runs/4466790381?check_suite_focus=true#step:8:21277
> https://github.com/apache/arrow/runs/4463832536?check_suite_focus=true#step:9:22039
> {code}
> ── Failure (test-dataset-csv.R:297:3): open_dataset() deals with BOMs (byte-order-marks) correctly ──
> `object` (`actual`) not equal to `expected` (`expected`).
> actual vs expected
>                 a b
> - actual[1, ]   3 4
> + expected[1, ] 1 2
> - actual[2, ]   1 2
> + expected[2, ] 3 4
>   `actual$a`: 3 1
> `expected$a`: 1 3
>   `actual$b`: 4 2
> `expected$b`: 2 4
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)