Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/14 10:42:34 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5545: Support arbitrary user defined partition column in `ListingTable` (rather than assuming they are always Dictionary encoded)

alamb commented on code in PR #5545:
URL: https://github.com/apache/arrow-datafusion/pull/5545#discussion_r1135343654


##########
datafusion/core/src/physical_plan/file_format/mod.rs:
##########
@@ -64,6 +65,26 @@ use std::{
 
 use super::{ColumnStatistics, Statistics};
 
+/// Convert logical type of partition column to physical type: `Dictionary(UInt16, val_type)`.
+///
+/// You CAN use this to specify types for partition columns. However you MAY also choose not to dictionary-encode the

Review Comment:
   This looks good -- I think it is important to help people choose when to use these functions, but I can add that to https://github.com/apache/arrow-datafusion/pull/5576 as a follow-on.
   
   I also think these functions might be easier to find if they were named something more connected to what they do (dictionary encode), perhaps `wrap_partition_type_in_dict`, but that is just a preference.
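
   To illustrate the naming idea, here is a minimal sketch of what such a helper does. It wraps a value type as `Dictionary(UInt16, val_type)`, as described in the doc comment under review. The `DataType` enum below is a hypothetical stand-in for a small subset of arrow's `DataType`, so that the sketch compiles without the `arrow` crate. The function body is an assumption and not necessarily the code in this PR.

   ```rust
   /// Hypothetical stand-in for a small subset of arrow's `DataType`
   /// (an assumption for illustration; the real code would use the arrow crate).
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Utf8,
       Int64,
       UInt16,
       Dictionary(Box<DataType>, Box<DataType>),
   }

   /// Wrap a partition column's logical value type in a dictionary type
   /// with `UInt16` keys: `Dictionary(UInt16, val_type)`.
   fn wrap_partition_type_in_dict(val_type: DataType) -> DataType {
       DataType::Dictionary(Box::new(DataType::UInt16), Box::new(val_type))
   }
   ```

   With a name like this, the call site reads as what it does: `wrap_partition_type_in_dict(DataType::Utf8)` yields a dictionary type with `UInt16` keys and `Utf8` values.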



##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -1646,10 +1646,10 @@ mod tests {
 
         let meta = local_unpartitioned_file(filename);
 
-        let schema = ParquetFormat::default()
+        let schema = dbg!(ParquetFormat::default()

Review Comment:
   Did you intend to leave this `dbg!` in?


