Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/12/12 16:44:00 UTC
[jira] [Commented] (ARROW-17332) [R] error parsing folder path with accent ('c:/Público') in read_csv_arrow
[ https://issues.apache.org/jira/browse/ARROW-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646199#comment-17646199 ]
Dewey Dunnington commented on ARROW-17332:
------------------------------------------
Thank you again for reporting this!
I think this is an issue with {{normalizePath()}}. It seems that {{normalizePath()}} does not mark its output with an encoding, so a latin1 input is silently re-encoded and the result comes back marked as "unknown":
{code:R}
f <- tempfile(fileext = "Público")
write.csv(mtcars, f, row.names = FALSE)
f_utf8 <- enc2utf8(normalizePath(f))
Encoding(f_utf8)
#> [1] "UTF-8"
f_latin1 <- iconv(f, "utf-8", "latin1")
Encoding(f_latin1)
#> [1] "latin1"
f_latin1_normalized <- normalizePath(f_latin1, winslash = "/", mustWork = FALSE)
Encoding(f_latin1_normalized)
#> [1] "unknown"
# The output was silently translated to UTF-8 but was not marked as such
identical(charToRaw(f_latin1), charToRaw(f_latin1_normalized))
#> [1] FALSE
identical(charToRaw(f_utf8), charToRaw(f_latin1_normalized))
#> [1] TRUE
f_utf8_normalized <- normalizePath(f, winslash = "/", mustWork = FALSE)
Encoding(f_utf8_normalized)
#> [1] "unknown"
identical(charToRaw(f_utf8), charToRaw(f_utf8_normalized))
#> [1] TRUE
{code}
I don't know what the intended output encoding is because I don't have an R install at my fingertips where the system encoding is not UTF-8. (Before 11.0.0 I will check with {{docker run ubuntu:latest}}, because I'm almost certain that R runs in a latin1 locale there if you use the default {{apt-get install r-base}}.)
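As a rough sketch of one possible workaround (not necessarily what the eventual fix will look like), the normalized path could be passed through {{enc2utf8()}} so that it is explicitly marked before being handed to anything else. The helper name {{normalize_path_utf8()}} is hypothetical and is not part of arrow or base R. The sketch assumes that the "unknown"-marked output of {{normalizePath()}} is in the native encoding, which is exactly the part I have not been able to verify on a non-UTF-8 system:

```r
# Hypothetical helper (not part of arrow or base R): normalize a path and
# convert the result to UTF-8 so callers do not have to guess the encoding.
normalize_path_utf8 <- function(path) {
  # normalizePath() may return strings whose encoding mark is "unknown"
  normalized <- normalizePath(path, winslash = "/", mustWork = FALSE)
  # enc2utf8() interprets "unknown" strings as being in the native encoding.
  # ASSUMPTION: the bytes returned above really are in the native encoding.
  enc2utf8(normalized)
}

# The path does not need to exist because mustWork = FALSE
p <- normalize_path_utf8(file.path(tempdir(), "P\u00fablico"))
Encoding(p)
validUTF8(p)
```

On a system where the native encoding is UTF-8 this should behave like the {{f_utf8}} case above; whether it does the right thing in a latin1 locale still needs to be checked.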
> [R] error parsing folder path with accent ('c:/Público') in read_csv_arrow
> --------------------------------------------------------------------------
>
> Key: ARROW-17332
> URL: https://issues.apache.org/jira/browse/ARROW-17332
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Lucas Mation
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I am a user trying the R arrow package on a Windows machine.
> To reproduce, create a folder whose name contains a character with a Latin accent:
> ```
> library(arrow)
> p <- 'c:/Público'
> b <- read_csv_arrow(p)
> Error: IOError: Failed to open local file 'c:/Público'. Detail: [Windows error 5] Access is denied.
> ```