Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/06 14:08:25 UTC

[GitHub] [arrow] thisisnic commented on pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

thisisnic commented on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1006618779


   Just pasting here the conversation from JIRA:
   
   > I did, however, run into trouble. Say, for example, the user has set the `skip_rows` option like this:
   `read_options=arrow::CsvReadOptions$create(skip_rows=1)`
   I imagine we'd like to keep whatever options the user has set when we re-create the `CsvReadOptions` object with column names from the schema. The problem is that I cannot access `skip_rows` in the object after it's created, so I cannot use that information to create another instance of `CsvReadOptions` that has both `column_names` and `skip_rows` set (plus any other options).
   
   > Any thoughts? Is there a way to access `skip_rows` and other attributes that I'm unaware of? Of course, one solution is to change the class declaration of `CsvReadOptions` to expose these attributes.
   
   Thanks for opening this draft PR! After checking out a copy of your branch, I now understand much better what's going on here and why it's not working.
   
   Currently, the approaches that come to mind for me are:
   1. As you say, update the `CsvReadOptions` class so we can modify its contents after creation.
   2. Update the signature of `CsvFileFormat$create()`, setting `read_options` to a default value of `NULL`, and then call `csv_file_format_read_opts()` later in the function, passing in both the schema and any user-set values.
   
   I haven't fully fleshed out the second option or tested to see if it'll work, but if it does I'd be in favour of doing it that way so we can make the change we need without having to modify the structure of the existing classes.
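
   To make the idea behind the second option concrete, here is a minimal base-R sketch of the pattern: hold the user's settings as a plain list until the schema is known, then construct the options object once with everything merged. `make_read_options()` and `read_opts_from_schema()` are hypothetical stand-ins written for this illustration, not the arrow API, and letting the schema's names take precedence is an assumption rather than a decision made in this PR.

```r
# Hypothetical stand-in for an options constructor (NOT the arrow API).
# Once built, treat the result as opaque, like `CsvReadOptions`.
make_read_options <- function(skip_rows = 0L, column_names = character(0)) {
  list(skip_rows = skip_rows, column_names = column_names)
}

# Hypothetical helper: merge the user's plain arguments with the column
# names taken from the schema, then call the constructor exactly once.
# Assumption: the schema's names win over any user-supplied column_names.
read_opts_from_schema <- function(schema_names, user_args = list()) {
  args <- modifyList(user_args, list(column_names = schema_names))
  do.call(make_read_options, args)
}

# The user's skip_rows survives because it was never locked inside an
# already-constructed object.
opts <- read_opts_from_schema(c("id", "value"), list(skip_rows = 1L))
```

   The point of the sketch is only that the merge happens before construction, so nothing needs to be read back out of an existing options object.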

