You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/21 14:58:53 UTC

[GitHub] [arrow] houqp commented on a change in pull request #7238: ARROW-8877: [Rust] [DataFusion] introduce CsvReadOption struct to simplify UX for read CSV data

houqp commented on a change in pull request #7238:
URL: https://github.com/apache/arrow/pull/7238#discussion_r428708023



##########
File path: rust/datafusion/src/execution/physical_plan/csv.rs
##########
@@ -28,6 +28,34 @@ use arrow::csv;
 use arrow::datatypes::Schema;
 use arrow::record_batch::RecordBatch;
 
+/// CSV file read option
+#[derive(Copy, Clone)]
+pub struct CsvReadOptions<'a> {
+    /// Does the CSV file have a header?
+    ///
+    /// If schema inference is run on a file with no headers, default column names
+    /// are created.
+    pub has_header: bool,
+    /// An optional column delimiter. Defaults to `b','`.
+    pub delimiter: Option<u8>,
+    /// An optional schema representing the CSV files. If None, CSV reader will try to infer it
+    /// based on data in file.
+    pub schema: Option<&'a Schema>,
+    /// number of rows to read from CSV files for schema inference if needed. Defaults to 1000.
+    pub schema_infer_read_records: Option<usize>,

Review comment:
       I think to enforce that, it's best to switch to builder pattern. The UX will look like this:
   
   ```rust
       ctx.register_csv(
           "aggregate_test_100",
           &format!("{}/csv/aggregate_test_100.csv", testdata),
           CsvReadOptions::new().schema_infer_read_records(50).delimiter(b'|'),
           true,
       )?;
   ```
   
   What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org