Posted to issues@arrow.apache.org by "Neal Richardson (JIRA)" <ji...@apache.org> on 2019/08/09 03:48:00 UTC

[jira] [Commented] (ARROW-6183) [R] factor out tidyselect?

    [ https://issues.apache.org/jira/browse/ARROW-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903539#comment-16903539 ] 

Neal Richardson commented on ARROW-6183:
----------------------------------------

The col_select argument already works with a character vector of column names:
{code:java}
library(arrow)

# Write iris to a temporary CSV, then read back only two columns by name
f <- tempfile()
write.csv(iris, file = f)
df <- read_csv_arrow(f, col_select = c("Sepal.Length", "Species"))
head(df)
#>   Sepal.Length Species
#> 1          5.1  setosa
#> 2          4.9  setosa
#> 3          4.7  setosa
#> 4          4.6  setosa
#> 5          5.0  setosa
#> 6          5.4  setosa
{code}
Perhaps we could improve the documentation to make that explicit.

I'm generally in favor of minimizing dependencies (I was fine with dropping tibble, FWIW), but the tidyselect dependency is not heavy compared with the other dependencies arrow already requires. And since it doesn't require you to be "tidy", including it does not force anyone to take a side in an internecine language war. So personally I'm -0 on removing it.

> [R] factor out tidyselect?
> --------------------------
>
>                 Key: ARROW-6183
>                 URL: https://issues.apache.org/jira/browse/ARROW-6183
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: R
>            Reporter: James Lamb
>            Priority: Minor
>
> I noticed tonight that several functions from the *tidyselect* package are re-exported by *arrow*. Why is this necessary? In my opinion, the *arrow* R package should strive to have as few dependencies as possible and should have no opinion about which parts of the R ecosystem ("tidy" or otherwise) are used with it.
> I think it would be valuable to cut the *tidyselect* re-exports, and to make *feather::read_feather()*'s argument *col_select* take a character vector of column names instead of a "*tidyselect::vars_select()*" object. I think that would be more natural and would be intuitive for a broader group of R users.
> Would you be open to removing *tidyselect* and changing *feather::read_feather()* this way?


