Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/12 16:25:19 UTC

[GitHub] [arrow] thisisnic edited a comment on pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

thisisnic edited a comment on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1011224682


   Thanks for the updates here @toppyy. I've had a proper think about this and, on reflection, I don't think we need to make `open_dataset(td, format = 'csv', read_options = CsvReadOptions$create(skip_rows = 1))` work, because users can already pass `skip_rows` directly: `open_dataset(td, format = 'csv', skip_rows = 1)`.
   
   Using `CsvReadOptions$create()` directly is pretty low-level, so we can probably assume that anyone using it is responsible for making sure things match up themselves. That said, we should absolutely add further documentation showing the recommended way to work with `open_dataset()` so it's clear (perhaps that could be part of this PR if you're interested, but no worries if not).
   
   I feel an alternative solution here might be just to check whether conflicting arguments have been passed to `open_dataset()` (e.g. supplying `read_options` via the ellipsis (`...`) as well as individual read options). It might be something along the lines of adding validation at the end of `open_dataset()` so that, if `!is.null(schema)` and the format is CSV, we ensure that `identical(names(schema), read_options$column_names)` or raise an error.
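   
   As a rough sketch of that check (not the actual implementation): the helper name `validate_csv_column_names()` and its arguments are hypothetical, and it takes plain character vectors standing in for `names(schema)` and `read_options$column_names` so that it runs in base R without arrow. How to treat the case where `column_names` was never supplied is left open here; this version treats it as a mismatch.

```r
# Hypothetical helper, not part of the arrow package.
# schema_names: stand-in for names(schema), or NULL when no schema is given.
# column_names: stand-in for read_options$column_names.
validate_csv_column_names <- function(schema_names, column_names, format = "csv") {
  # Nothing to validate without a schema, or for non-CSV formats.
  if (is.null(schema_names) || !identical(format, "csv")) {
    return(invisible(TRUE))
  }
  # Names must match exactly, including order (assumption: a strict check).
  if (!identical(schema_names, column_names)) {
    stop(
      "`column_names` in the read options must match the names in `schema`",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching names pass silently; a mismatch raises an error.
validate_csv_column_names(c("a", "b"), c("a", "b"))
mismatch <- tryCatch(
  validate_csv_column_names(c("a", "b"), c("x", "y")),
  error = function(e) conditionMessage(e)
)
```

   Inside `open_dataset()` this would presumably be called as something like `validate_csv_column_names(names(schema), read_options$column_names, format)`, but that wiring is an assumption.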
   
   How does that sound?

