Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/27 23:07:35 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2802: Correct schema nullability declaration in tests

alamb commented on code in PR #2802:
URL: https://github.com/apache/arrow-datafusion/pull/2802#discussion_r907870656


##########
datafusion/core/src/physical_optimizer/aggregate_statistics.rs:
##########
@@ -276,8 +276,8 @@ mod tests {
     /// Mock data using a MemoryExec which has an exact count statistic
     fn mock_data() -> Result<Arc<MemoryExec>> {
         let schema = Arc::new(Schema::new(vec![
-            Field::new("a", DataType::Int32, false),
-            Field::new("b", DataType::Int32, false),
+            Field::new("a", DataType::Int32, true),
+            Field::new("b", DataType::Int32, true),

Review Comment:
   This is a pretty easy-to-understand example of the issue -- prior to this PR, the fields `"a"` and `"b"` are declared as `nullable=false`, but then 5 lines lower `NULL` data is inserted 🤦
   
   
   ```rust
           let batch = RecordBatch::try_new(
               Arc::clone(&schema),
               vec![
                   Arc::new(Int32Array::from(vec![Some(1), Some(2), None])),
                   Arc::new(Int32Array::from(vec![Some(4), None, Some(6)])),
               ],
           )?;
   ```
   
   Now that `RecordBatch::try_new` validates nullability, the schema must match the data; otherwise an error results.
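   
   For illustration, here is a minimal, self-contained sketch (not code from this PR) showing the corrected declaration lining up with the data; it assumes the standalone `arrow` crate paths rather than DataFusion's re-exports:
   
   ```rust
   use std::sync::Arc;
   
   use arrow::array::Int32Array;
   use arrow::datatypes::{DataType, Field, Schema};
   use arrow::error::ArrowError;
   use arrow::record_batch::RecordBatch;
   
   fn main() -> Result<(), ArrowError> {
       // Declare both columns as nullable (third argument = true),
       // matching the arrays below, which contain `None` values.
       let schema = Arc::new(Schema::new(vec![
           Field::new("a", DataType::Int32, true),
           Field::new("b", DataType::Int32, true),
       ]));
   
       // With `nullable = false` this call would return an error, because
       // `RecordBatch::try_new` checks each column's null count against the schema.
       let batch = RecordBatch::try_new(
           Arc::clone(&schema),
           vec![
               Arc::new(Int32Array::from(vec![Some(1), Some(2), None])),
               Arc::new(Int32Array::from(vec![Some(4), None, Some(6)])),
           ],
       )?;
       assert_eq!(batch.num_rows(), 3);
   
       Ok(())
   }
   ```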



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org