You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/21 13:12:17 UTC

[GitHub] [arrow] paddyhoran commented on a change in pull request #7238: ARROW-8877: [Rust] [DataFusion] introduce CsvReadOption struct to simplify UX for read CSV data

paddyhoran commented on a change in pull request #7238:
URL: https://github.com/apache/arrow/pull/7238#discussion_r428641550



##########
File path: rust/datafusion/src/execution/physical_plan/csv.rs
##########
@@ -28,6 +28,34 @@ use arrow::csv;
 use arrow::datatypes::Schema;
 use arrow::record_batch::RecordBatch;
 
+/// CSV file read option
+#[derive(Copy, Clone)]
+pub struct CsvReadOptions<'a> {
+    /// Does the CSV file have a header?
+    ///
+    /// If schema inference is run on a file with no headers, default column names
+    /// are created.
+    pub has_header: bool,
+    /// An optional column delimiter. Defaults to `b','`.
+    pub delimiter: Option<u8>,
+    /// An optional schema representing the CSV files. If None, CSV reader will try to infer it
+    /// based on data in file.
+    pub schema: Option<&'a Schema>,
+    /// number of rows to read from CSV files for schema inference if needed. Defaults to 1000.
+    pub schema_infer_read_records: Option<usize>,

Review comment:
       Just a nit really but there are a number of defaults that are set through `default`.  But don't we need a `new` or similar to always enforce these defaults as the comments state.  For instance, you can currently create a `CsvReadOptions` where `schema_infer_read_records` is `None` and then we have to make sure we enforce the default (1000) somewhere else where it's used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org