You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Adam Black (Jira)" <ji...@apache.org> on 2022/07/07 14:11:00 UTC

[jira] [Created] (ARROW-17002) R dplyr queries create locks on FileSystemDataset files

Adam Black created ARROW-17002:
----------------------------------

             Summary: R dplyr queries create locks on FileSystemDataset files
                 Key: ARROW-17002
                 URL: https://issues.apache.org/jira/browse/ARROW-17002
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 8.0.0
            Reporter: Adam Black


I think that dplyr queries on FileSystemDataset objects will create locks that persist unnecessarily. This issue only seems to occur on Windows. I'm using Windows 10. Calling the garbage collector after the dplyr query seems to release the lock.

 

``` r
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# I can delete an arrow dataset that has been opened
write_dataset(iris, "iris")
ds <- open_dataset("iris")
file.exists("iris")
#> [1] TRUE
print(unlink("iris", recursive = T))
#> [1] 0
file.exists("iris")
#> [1] FALSE

# However if I run a dplyr query on the data before deleting it the file is locked.
write_dataset(iris, "iris")
ds <- open_dataset("iris")
file.exists("iris")
#> [1] TRUE

# I think this adds a lock that is not automatically removed (on Windows)
ds %>% count() %>% collect()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1   150

print(unlink("iris", recursive = T))
#> [1] 1
file.exists("iris")
#> [1] TRUE
print(unlink("iris", recursive = T, force = T))
#> [1] 1
file.exists("iris")
#> [1] TRUE
file.remove("iris/part-0.parquet")
#> Warning in file.remove("iris/part-0.parquet"): cannot remove file 'iris/
#> part-0.parquet', reason 'Permission denied'
#> [1] FALSE

# running gc() will clean up the lock and allow the file to be deleted
gc()
#>           used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 1179433   63    2354975 125.8  1664192 88.9
#> Vcells 2095138   16    8388608  64.0  3175226 24.3
print(unlink("iris", recursive = T))
#> [1] 0
file.exists("iris")
#> [1] FALSE
```

<sup>Created on 2022-07-07 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2022-07-07                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  arrow       * 8.0.0   2022-05-09 [1] CRAN (R 4.0.5)
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.5)
#>  backports     1.4.0   2021-11-23 [1] CRAN (R 4.0.5)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.5)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.5)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.0.5)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.0.5)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.0.5)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.5)
#>  dplyr       * 1.0.8   2022-02-08 [1] CRAN (R 4.0.5)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.5)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.0.5)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.5)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.5)
#>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.0.5)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.5)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.5)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.0.5)
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.0.5)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.0.5)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.5)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.5)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.5)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.0.5)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.0.5)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.0.5)
#>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.0.5)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.5)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.5)
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.0.5)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.5)
#>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.0.5)
#>  tibble        3.1.2   2021-05-16 [1] CRAN (R 4.0.5)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.0.5)
#>  tzdb          0.2.0   2021-10-27 [1] CRAN (R 4.0.5)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.0.5)
#>  xfun          0.25    2021-08-06 [1] CRAN (R 4.0.5)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.5)
#> 
#> [1] C:/Users/adam.DESKTOP-D3KQQA1/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.5/library
```

</details>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)