You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/29 14:00:44 UTC

[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

bkietz commented on pull request #7033:
URL: https://github.com/apache/arrow/pull/7033#issuecomment-621229069


   @jorisvandenbossche 
   
   > do we plan to use the CsvFileFormat also for writing? ... For Parquet there is a ReaderOptions that grouped all options related to reading in the ParquetFileFormat
   
   We added ReaderOptions to avoid collisions with extant parquet write options. Since we don't have CSV write options avoiding collisions seems like specualtive generality; this can be handled when/if we add CSV writing. Also note that (for example) `ParseOptions::delimiter` would also be used when writing
   
   > I assume this is because the ParseOptions are used in its entirety, and for csv.ConvertOptions and csv.ReadOptions, only part of them is valid to set here? (another option could also be to check that those options are not set and if set raise an error to the user about that)
   
   That is indeed why I used only ParseOptions whole. It seemed better to me than documenting which fields are ignored or emitting errors when things aren't defaulted.
   
   > Do we want to allow reading csv files without a header line with column names?
   
   I can add `column_names` and `autogenerate*` too if required, but maybe that could wait for a follow up? Since the way I handled `/{Convert,Read}Options/` is evidently not transparent maybe everything except ParseOptions should wait as well?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org