You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/23 19:25:13 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2771: Convert batch_size to config option

alamb commented on code in PR #2771:
URL: https://github.com/apache/arrow-datafusion/pull/2771#discussion_r905381166


##########
datafusion/core/src/execution/context.rs:
##########
@@ -1118,10 +1117,24 @@ impl SessionConfig {
         self
     }
 
-    /// Convert configuration to name-value pairs
+    /// Get the currently configured batch size
+    pub fn batch_size(&self) -> usize {
+        self.config_options.get_u32(OPT_BATCH_SIZE) as usize
+    }
+
+    /// Get the current configuration options
+    pub fn config_options(&self) -> &ConfigOptions {
+        &self.config_options
+    }
+
+    /// Convert configuration options to name-value pairs with values converted to strings. Note
+    /// that this method will eventually be deprecated and replaced by [config_options].
     pub fn to_props(&self) -> HashMap<String, String> {
         let mut map = HashMap::new();
-        map.insert(BATCH_SIZE.to_owned(), format!("{}", self.batch_size));
+        map.insert(
+            BATCH_SIZE.to_owned(),
+            format!("{}", self.config_options.get_u32(OPT_BATCH_SIZE)),
+        );

Review Comment:
   I wonder if it is possible to avoid having to enumerate all options again in this function -- what about maybe  converting all entries in `self.config_options`?



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1055,12 +1050,16 @@ impl SessionConfig {
         self.set(key, ScalarValue::Boolean(Some(value)))
     }
 
+    /// Set a u32 configuration option
+    pub fn set_u32(self, key: &str, value: u32) -> Self {
+        self.set(key, ScalarValue::UInt32(Some(value)))
+    }
+
     /// Customize batch size
-    pub fn with_batch_size(mut self, n: usize) -> Self {
+    pub fn with_batch_size(self, n: usize) -> Self {
         // batch size must be greater than zero
         assert!(n > 0);
-        self.batch_size = n;
-        self
+        self.set_u32(OPT_BATCH_SIZE, n as u32)

Review Comment:
   Why not use `u64` as batch size (why are we storing as a u32 and then casting back and forth to `usize`)?



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1055,12 +1050,16 @@ impl SessionConfig {
         self.set(key, ScalarValue::Boolean(Some(value)))
     }
 
+    /// Set a u32 configuration option
+    pub fn set_u32(self, key: &str, value: u32) -> Self {
+        self.set(key, ScalarValue::UInt32(Some(value)))
+    }
+
     /// Customize batch size
-    pub fn with_batch_size(mut self, n: usize) -> Self {
+    pub fn with_batch_size(self, n: usize) -> Self {

Review Comment:
   I like this approach (with a named function `with_batch_size` that calls into the generic implementation below). Seems like a good idea to me
   
   👍 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1055,12 +1050,16 @@ impl SessionConfig {
         self.set(key, ScalarValue::Boolean(Some(value)))
     }
 
+    /// Set a u32 configuration option

Review Comment:
   ```suggestion
       /// Set a generic `u32` configuration option
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org