Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/11 15:03:40 UTC

[GitHub] [arrow] romainfrancois commented on pull request #7807: ARROW-6537 [R]: Pass column_types to CSV reader

romainfrancois commented on pull request #7807:
URL: https://github.com/apache/arrow/pull/7807#issuecomment-691148776

It feels complicated to bend the various options from `ParseOptions`, `ConvertOptions` and `ReadOptions` into something that looks like `readr::`, as they mean different things.

E.g. `ConvertOptions/column_types`, which we now handle with a `schema`, is only used to specify the types of some columns:

```
/// Optional per-column types (disabling type inference on those columns)
```

and then `ReadOptions/column_names` gives the names of all the columns:

```
/// Column names for the target table.
/// If empty, fall back on autogenerate_column_names.
std::vector<std::string> column_names;
/// Whether to autogenerate column names if `column_names` is empty.
/// If true, column names will be of the form "f0", "f1"...
/// If false, column names will be read from the first CSV row after `skip_rows`.
bool autogenerate_column_names = false;
```
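To make the fallback order in those comments concrete, here is a minimal standalone sketch. It is not Arrow code: `ReadOptionsSketch`, `ResolvedNames` and `ResolveColumnNames` are hypothetical names that only mirror the two fields quoted above, and the logic simply follows the documented comments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the two ReadOptions fields quoted above.
struct ReadOptionsSketch {
  std::vector<std::string> column_names;
  bool autogenerate_column_names = false;
};

// Result of the name resolution: the column names, plus whether the first
// CSV row (after skip_rows) is still data or was consumed as a header.
struct ResolvedNames {
  std::vector<std::string> names;
  bool first_row_is_data;
};

ResolvedNames ResolveColumnNames(const ReadOptionsSketch& opts,
                                 const std::vector<std::string>& first_row) {
  // 1. Explicit names win; the first row is then ordinary data.
  if (!opts.column_names.empty()) {
    return {opts.column_names, true};
  }
  // 2. Otherwise autogenerate "f0", "f1", ... if requested.
  if (opts.autogenerate_column_names) {
    std::vector<std::string> names;
    for (std::size_t i = 0; i < first_row.size(); ++i) {
      names.push_back("f" + std::to_string(i));
    }
    return {names, true};
  }
  // 3. Otherwise the first row is the header.
  return {first_row, false};
}
```

This is where the mismatch with `readr::` shows up: there a single `col_names` argument covers all three cases, whereas here they are spread over a vector and a boolean.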

There is also `ConvertOptions/include_columns` to control which columns to keep:

```
/// If non-empty, indicates the names of columns from the CSV file that should
/// be actually read and converted (in the vector's order).
/// Columns not in this vector will be ignored.
std::vector<std::string> include_columns;
```
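As a rough illustration of how `column_types` and `include_columns` interact once the column names are known, here is a standalone sketch under stated assumptions. `ConvertOptionsSketch` and `PlanOutputColumns` are hypothetical, types are plain strings rather than `arrow::DataType`, and throwing on an unknown name is a choice made for this sketch, not a claim about how the Arrow reader behaves.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two ConvertOptions fields discussed above.
struct ConvertOptionsSketch {
  // Partial mapping: only the columns whose type is fixed up front.
  std::map<std::string, std::string> column_types;
  // If non-empty, the columns to keep, in output order.
  std::vector<std::string> include_columns;
};

// Returns (name, type) pairs for the output table. The type "inferred" marks
// a column that has no entry in column_types and is left to type inference.
std::vector<std::pair<std::string, std::string>> PlanOutputColumns(
    const std::vector<std::string>& file_columns,
    const ConvertOptionsSketch& opts) {
  // include_columns, when non-empty, both selects and orders the output.
  const std::vector<std::string>& selected =
      opts.include_columns.empty() ? file_columns : opts.include_columns;

  std::vector<std::pair<std::string, std::string>> plan;
  for (const std::string& name : selected) {
    // Sketch-only simplification: reject names that are not in the file.
    if (std::find(file_columns.begin(), file_columns.end(), name) ==
        file_columns.end()) {
      throw std::invalid_argument("unknown column: " + name);
    }
    auto it = opts.column_types.find(name);
    plan.emplace_back(
        name, it != opts.column_types.end() ? it->second : std::string("inferred"));
  }
  return plan;
}
```

With file columns `a, b, c`, `column_types = {b: int64}` and `include_columns = {c, b}`, the plan is `c` (inferred) followed by `b` (int64). So three separate options together cover what `readr::` expresses through `col_names`, `col_types` and `col_select`.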
