Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/12/12 16:44:00 UTC

[jira] [Commented] (ARROW-17332) [R] error parsing folder path with accent ('c:/Público') in read_csv_arrow

    [ https://issues.apache.org/jira/browse/ARROW-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646199#comment-17646199 ] 

Dewey Dunnington commented on ARROW-17332:
------------------------------------------

Thank you again for reporting this!

I think this is an issue with {{normalizePath()}}. It appears that {{normalizePath()}} does not mark its output with an encoding, so a latin1 input comes back silently re-encoded (as UTF-8 bytes in the example below) while being marked as "unknown":

{code:R}
f <- tempfile(fileext = "Público")
write.csv(mtcars, f, row.names = FALSE)
f_utf8 <- enc2utf8(normalizePath(f))
Encoding(f_utf8)
#> [1] "UTF-8"

f_latin1 <- iconv(f, "utf-8", "latin1")
Encoding(f_latin1)
#> [1] "latin1"
f_latin1_normalized <- normalizePath(f_latin1, winslash = "/", mustWork = FALSE)
Encoding(f_latin1_normalized)
#> [1] "unknown"

# The output was silently translated to UTF-8 but was not marked as such
identical(charToRaw(f_latin1), charToRaw(f_latin1_normalized))
#> [1] FALSE
identical(charToRaw(f_utf8), charToRaw(f_latin1_normalized))
#> [1] TRUE

f_utf8_normalized <- normalizePath(f, winslash = "/", mustWork = FALSE)
Encoding(f_utf8_normalized)
#> [1] "unknown"
identical(charToRaw(f_utf8), charToRaw(f_utf8_normalized))
#> [1] TRUE
{code}
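
For anyone who wants to repeat this kind of check on their own paths, the comparison above can be wrapped in a small helper. This is only a sketch: {{describe_encoding()}} is a hypothetical name (it is not part of base R or arrow), and it uses only base R functions ({{Encoding()}}, {{validUTF8()}}, {{charToRaw()}}, {{iconv()}}). It reports the declared encoding next to the actual bytes, which is what makes the "unknown" marking visible.

```r
# Hypothetical helper (not part of base R or arrow): summarize how R marks
# each string's encoding, alongside the raw bytes it actually holds.
describe_encoding <- function(x) {
  stopifnot(is.character(x))
  data.frame(
    # Encoding mark that R attaches to each element
    declared = Encoding(x),
    # Whether the bytes form a valid UTF-8 sequence, regardless of the mark
    valid_utf8 = validUTF8(x),
    # Hex dump of the underlying bytes
    bytes = vapply(
      x,
      function(s) paste(as.character(charToRaw(s)), collapse = " "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# The same text in two encodings; the \u escape yields a UTF-8 marked string
x_utf8 <- "P\u00fablico"
x_latin1 <- iconv(x_utf8, "UTF-8", "latin1")

res <- describe_encoding(c(x_utf8, x_latin1))
print(res)
```

Passing the result of {{normalizePath()}} to the same helper should show whether the bytes changed even though the declared encoding is "unknown".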

I don't know what the intended output encoding is because I don't have an R installation at hand where the system encoding is not UTF-8. (Before 11.0.0 I will check with {{docker run ubuntu:latest}}, because I'm almost certain that it runs R in a latin1 locale if you use the default {{apt-get install r-base}}.)
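
As a rough way to tell which situation a given session is in, base R can report whether the native encoding is UTF-8. This is a minimal sketch using {{l10n_info()}} and {{Sys.getlocale()}}; the variable names are only illustrative, and the output depends on the machine it is run on.

```r
# Report whether the current R session's native encoding is UTF-8.
info <- l10n_info()
native_is_utf8 <- isTRUE(info[["UTF-8"]])

# The character-type locale that determines the native encoding
ctype <- Sys.getlocale("LC_CTYPE")

message("LC_CTYPE: ", ctype)
message("Native encoding is UTF-8: ", native_is_utf8)
```

If the second message says FALSE, the session is one where the behaviour of {{normalizePath()}} on non-ASCII paths still needs to be checked.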


> [R] error parsing folder path with accent ('c:/Público') in read_csv_arrow
> --------------------------------------------------------------------------
>
>                 Key: ARROW-17332
>                 URL: https://issues.apache.org/jira/browse/ARROW-17332
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Lucas Mation
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I am a user trying the R arrow package on a Windows machine.
> To reproduce, create a folder whose name contains a character with a Latin accent:
> ```
> library(arrow)
> p <- 'c:/Público'  
> b <- read_csv_arrow(p)
> Error: IOError: Failed to open local file 'c:/Público'. Detail: [Windows error 5] Access is denied.
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)