You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/20 18:01:58 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3907: Add serde for plans with tables from `TableProviderFactory`s

alamb commented on code in PR #3907:
URL: https://github.com/apache/arrow-datafusion/pull/3907#discussion_r1000956863


##########
datafusion/proto/src/logical_plan.rs:
##########
@@ -760,55 +804,21 @@ impl AsLogicalPlan for LogicalPlanNode {
                     .collect::<Result<Vec<_>, _>>()?;
 
                 if let Some(listing_table) = source.downcast_ref::<ListingTable>() {
-                    let any = listing_table.options().format.as_any();
-                    let file_format_type = if let Some(parquet) =
-                        any.downcast_ref::<ParquetFormat>()
-                    {
-                        FileFormatType::Parquet(protobuf::ParquetFormat {
-                            enable_pruning: parquet.enable_pruning(),
-                        })
-                    } else if let Some(csv) = any.downcast_ref::<CsvFormat>() {
-                        FileFormatType::Csv(protobuf::CsvFormat {
-                            delimiter: byte_to_string(csv.delimiter())?,
-                            has_header: csv.has_header(),
-                        })
-                    } else if any.is::<AvroFormat>() {
-                        FileFormatType::Avro(protobuf::AvroFormat {})
-                    } else {
-                        return Err(proto_error(format!(
-                            "Error converting file format, {:?} is invalid as a datafusion foramt.",
-                            listing_table.options().format
-                        )));
-                    };
-                    Ok(protobuf::LogicalPlanNode {
-                        logical_plan_type: Some(LogicalPlanType::ListingScan(
-                            protobuf::ListingTableScanNode {
-                                file_format_type: Some(file_format_type),
-                                table_name: table_name.to_owned(),
-                                collect_stat: listing_table.options().collect_stat,
-                                file_extension: listing_table
-                                    .options()
-                                    .file_extension
-                                    .clone(),
-                                table_partition_cols: listing_table
-                                    .options()
-                                    .table_partition_cols
-                                    .clone(),
-                                paths: listing_table
-                                    .table_paths()
-                                    .iter()
-                                    .map(|x| x.to_string())
-                                    .collect(),
-                                schema: Some(schema),
-                                projection,
-                                filters,
-                                target_partitions: listing_table
-                                    .options()
-                                    .target_partitions
-                                    as u32,
-                            },
-                        )),
-                    })
+                    Self::serialize_listing_table(
+                        table_name,
+                        projection,
+                        schema,
+                        filters,
+                        listing_table,
+                    )
+                } else if let Some(custom_table) = source.downcast_ref::<CustomTable>() {

Review Comment:
   So this seems to be the key issue -- if `source` isn't some type that DataFusion knows about it can't do a useful job serializing it.
   
   I don't fully understand how a `CustomTable` struct will help -- at some point the client code needs to get the concrete  `TableProvider`, not a `CustomTable`.
   
   I think there was a similar issue serializing "extensions" for which @thinkharderdev  added 
   
   ```rust
   pub trait LogicalExtensionCodec: Debug + Send + Sync {
   ```
   
   I wonder if we can follow the same model here , something like add `LogicalExtensionCodec::try_encoded_table_provider` and `LogicalExtensionCodec::try_decode_table_provider`
   
   
   ```rust
   if let Some(listing_table) = source.downcast_ref::<ListingTable>() {
     // ...
   } else {
     // delegate to provided extension codec:
     extension_codec.try_encode_table_provider(
                   table_name,
                   source,
                   filters,
                   projection,
     ); 
   } 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org