You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/02/15 14:30:50 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4908: Allow `SessionContext::read_csv`, etc to read multiple files

alamb commented on code in PR #4908:
URL: https://github.com/apache/arrow-datafusion/pull/4908#discussion_r1107191243


##########
datafusion/core/src/execution/context.rs:
##########
@@ -627,22 +664,18 @@ impl SessionContext {
     ///
     /// For more control such as reading multiple files, you can use
     /// [`read_table`](Self::read_table) with a [`ListingTable`].
-    async fn _read_type<'a>(
+    async fn _read_type<'a, P: DataFilePaths>(
         &self,
-        table_path: impl AsRef<str>,
+        table_paths: P,
         options: impl ReadOptions<'a>,
     ) -> Result<DataFrame> {
-        let table_path = ListingTableUrl::parse(table_path)?;
+        let table_paths = table_paths.to_urls()?;

Review Comment:
   ❤️ 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -2764,6 +2797,9 @@ mod tests {
         async fn call_read_csv(&self) -> DataFrame;
         async fn call_read_avro(&self) -> DataFrame;
         async fn call_read_parquet(&self) -> DataFrame;
+        async fn call_read_csvs(&self) -> DataFrame;

Review Comment:
   Are these methods still needed?



##########
datafusion/core/src/execution/context.rs:
##########
@@ -106,6 +106,43 @@ use super::options::{
     AvroReadOptions, CsvReadOptions, NdJsonReadOptions, ParquetReadOptions, ReadOptions,
 };
 
+/// DataFilePaths adds a method to convert strings and vector of strings to vector of URLs.
+/// This is primarily used as restriciton on the input to methods like read_csv, read_avro
+/// and the rest.

Review Comment:
   ```suggestion
   /// DataFilePaths adds a method to convert strings and vector of strings to vector of [`ListingTableUrl`] URLs.
   /// This allows methods such [`SessionContext::read_csv`] and `[`SessionContext::read_avro`]
   /// to take either a single file or multiple files. 
   ```



##########
datafusion/core/src/execution/context.rs:
##########
@@ -106,6 +106,43 @@ use super::options::{
     AvroReadOptions, CsvReadOptions, NdJsonReadOptions, ParquetReadOptions, ReadOptions,
 };
 
+/// DataFilePaths adds a method to convert strings and vector of strings to vector of URLs.
+/// This is primarily used as restriciton on the input to methods like read_csv, read_avro
+/// and the rest.
+pub trait DataFilePaths {
+    /// Parse to a list of URLs
+    fn to_urls(self) -> Result<Vec<ListingTableUrl>>;
+}
+
+impl DataFilePaths for &str {

Review Comment:
   technically speaking I think this might be a breaking change as the methods used to take `impl AsRef<str>` 
   
   However, I think it is fine for this case as there is an easy workaround (call `as_ref()`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org