You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "istvan-fodor (via GitHub)" <gi...@apache.org> on 2024/03/21 20:10:26 UTC

[PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

istvan-fodor opened a new pull request, #5541:
URL: https://github.com/apache/arrow-rs/pull/5541

   # Which issue does this PR close?
   
   Closes #1353 
   
   # Rationale for this change
    
   `FixedSizeListBuilder` doesn't allow users to specify the inner field of elements and defaults to the following FieldRef: `Arc::new(Field::new("item", values_data.data_type().clone(), true))`
   
   We would like to keep the default behavior as-is, but also 
   
   # What changes are included in this PR?
   
   1. Added `with_field` to `FixedSizeListBuilder`, copied over from `GenericListBuilder`
   2. Added new `field: Option<FieldRef>` to `FixedSizeListBuilder`
   3. Implemented field logic patterned after `GenericListBuilder` in `finish()` and `finish_build()`
   4. Changed unsafe build to safe build (see user-facing changes)
   5. Unit tests for 3 scenarios:
       1. `with_field` used
       2. `with_field` used with non-nullable field, expect panic
       3. `with_field` used with wrong type, expect panic  
   
   # Are there any user-facing changes?
   
   Yes.
   
   The `finish_cloned()` and `finish()` methods no longer use the `build_unchecked()` method, but rely on `build()`. If the build fails, the thread panics. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1535870036


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,17 +212,23 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
 
-        let array_data = unsafe { array_data.build_unchecked() };

Review Comment:
   I don't understand the rationale for this change.  Is there some case where invalid arrays are created?
   
   Switching to use build() means the data is now checked twice (once when added to the builder and once when the array is built)
   
   If there are ways to construct an invalid FIxedListArray via the builder APIs, I think those should be fixed  rather than running the expensive `build()` checks again.
   
   For example, if you are trying to validate that a builder with a field that was declared to be non-null actually has no nulls, it would be much faster to leave the `build_unchecked()` call and do a separate check that `array_data.null_count()` was zero
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2043096936

   @alamb @tustvold , I added some changes.
   
   1. null check now follows the `FixedSizeListArray::try_new` logic
   2. Added two new unit tests to test scenario with nulls: `test_fixed_size_list_array_builder_with_field_and_null` and `test_fixed_size_list_array_builder_cloned_with_field_and_null`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1535870036


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,17 +212,23 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
 
-        let array_data = unsafe { array_data.build_unchecked() };

Review Comment:
   I don't understand the rationale for this change.  Is there some case where invalid arrays are created?
   
   Switching to use build() means the data is now checked twice (once when added to the builder and once when the array is built)
   
   If there are ways to construct an invalid FIxedListArray via the builder APIs, I think those should be fixed  rather than running the expensive `build()` checks again.
   
   For example, if you are trying to validate that a builder with a field that was declared to be non-null actually has no nulls, it would be much faster to leave the `build_unchecked()` call and do a separate check that `array_data.null_count()` was zero
   
   Likewise, you could verify the DataType of the `Field` once as part of `with_field` call, rather than after construction



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2033961199

   I could be missing something but I don't see a comment on line 328? Is it possible that it is pending and the review hasn't been submitted?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2034856444

   > I could be missing something but I don't see a comment on line 328? Is it possible that it is pending and the review hasn't been submitted?
   
   Yes, that was the case. I just changed it to a regular comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1535864337


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -247,6 +271,89 @@ mod tests {
         assert_eq!(3, list_array.value_length());
     }
 
+    #[test]
+    fn test_fixed_size_list_array_builder_with_field() {
+        let values_builder = Int32Builder::new();
+        let mut builder = FixedSizeListBuilder::new(values_builder, 3).with_field(Field::new(
+            "list_element",
+            DataType::Int32,
+            true,
+        ));
+
+        //  [[0, 1, 2], null, [3, null, 5], [6, 7, null]]
+        builder.values().append_value(0);
+        builder.values().append_value(1);
+        builder.values().append_value(2);
+        builder.append(true);
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.append(false);
+        builder.values().append_value(3);
+        builder.values().append_null();
+        builder.values().append_value(5);
+        builder.append(true);
+        builder.values().append_value(6);
+        builder.values().append_value(7);
+        builder.values().append_null();
+        builder.append(true);
+        let list_array = builder.finish();

Review Comment:
   since `finish` and `finished_cloned` are separate codepaths, can we also add a basic test for `finished_cloned` too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2036795573

   There is currently an ongoing discussion w.r.t null handling, in the short term you could possibly use the array constructors directly - https://docs.rs/arrow-array/latest/arrow_array/array/struct.FixedSizeListArray.html#method.try_new


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1555879140


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);
+        let mut builder = builder.with_field(Field::new("list_element", DataType::Int32, false));
+        let list_array = builder.finish();
+
+        assert_eq!(DataType::Int32, list_array.value_type());
+        assert_eq!(4, list_array.len());
+        assert_eq!(0, list_array.null_count());
+        assert_eq!(6, list_array.value_offset(2));
+        assert_eq!(3, list_array.value_length());
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "field is nullable = false, but the values_builder contains null values"

Review Comment:
   hi @alamb. this is not a 0 sized list. I created the `make_list_builder` to construct builders (you recommended something similar when I first opened the PR). The two boolean parameters control whether to include a null in the builder and to include a particular fixed size list that has a null element.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1555016618


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);

Review Comment:
   > If the field is set to non-nullable, this element will fail the null check. I wanted to check with you if this is logically right or not?
   
   I think I would expect it to be consistent with whatever `FixedSlizeListArray::try_new` did



##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,13 +227,31 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
+
+        let field = match &self.field {
+            Some(f) => {
+                assert_eq!(
+                    f.data_type(),
+                    values_data.data_type(),
+                    "DataType of field ({}) should be the same as the values_builder DataType ({})",
+                    f.data_type(),
+                    values_data.data_type()
+                );
+                if !f.is_nullable() {
+                    assert!(
+                        values_data.null_count() == 0,

Review Comment:
   I think it is valid to have a zero length array (which would also have no nulls, but the null count would be zero)
   
   
   ```suggestion
                           values_data.null_count() == 0 || values_data.len() == 0,
   ```
   
   (same for the check below)



##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);
+        let mut builder = builder.with_field(Field::new("list_element", DataType::Int32, false));
+        let list_array = builder.finish();
+
+        assert_eq!(DataType::Int32, list_array.value_type());
+        assert_eq!(4, list_array.len());
+        assert_eq!(0, list_array.null_count());
+        assert_eq!(6, list_array.value_offset(2));
+        assert_eq!(3, list_array.value_length());
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "field is nullable = false, but the values_builder contains null values"

Review Comment:
   this doesn't seem right -- the builder doesn't have null values, instead it has no values at all.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1536526479


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -247,6 +298,180 @@ mod tests {
         assert_eq!(3, list_array.value_length());
     }
 
+    #[test]
+    fn test_fixed_size_list_array_builder_with_field() {
+        let values_builder = Int32Builder::new();
+        let mut builder = FixedSizeListBuilder::new(values_builder, 3).with_field(Field::new(
+            "list_element",
+            DataType::Int32,
+            true,
+        ));
+
+        //  [[0, 1, 2], null, [3, null, 5], [6, 7, null]]
+        builder.values().append_value(0);
+        builder.values().append_value(1);
+        builder.values().append_value(2);
+        builder.append(true);
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.append(false);
+        builder.values().append_value(3);
+        builder.values().append_null();
+        builder.values().append_value(5);
+        builder.append(true);
+        builder.values().append_value(6);
+        builder.values().append_value(7);
+        builder.values().append_null();
+        builder.append(true);

Review Comment:
   Makes sense, I'll go through the tests and rework the builders.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1550174382


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);

Review Comment:
   The logic we apply for FixedSizeList and StructArray elsewhere is to permit "masked" nulls, this is because a null must always take up space. I am not sure if we want to replicate this logic here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1549918256


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);

Review Comment:
   @alamb , I had a question about this test in particular. With the new capability to set the field as a non-nullable we have the situation where the builder has a null element built like this: 
   
   ```rust
    builder.values().append_null();
    builder.values().append_null();
    builder.values().append_null();
    builder.append(false);
   ```
   
   If the field is set to non-nullable, this element will fail the null check. I wanted to check with you if this is logically right or not? Are we making a distinction between nulls in the values builder, where let's say 1 of the 3 elements of a particular fixed size list is null (which would be a valid fail on non-nullability) or nulls in the main builder (which is built up as 3 nulls + a `append(false)` in the values builder). To me the second one is questionable and comes down to how you want this to work. If this test should not fail with the second complete null case, then I have to go back and revise the null check of the values builder and somehow control for null elements of the main builder. 
   
       



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "SteampunkIslande (via GitHub)" <gi...@apache.org>.
SteampunkIslande commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2039674046

   > There is currently an ongoing discussion w.r.t null handling, in the short term you could possibly use the array constructors directly - https://docs.rs/arrow-array/latest/arrow_array/array/struct.FixedSizeListArray.html#method.try_new
   
   Thanks, but I really need a builder. Tried implementing it myself with no luck so far. I'm pretty surprised that the builder doesn't know its field name though. There should be a way to know right ? At least I'm hoping that with your pull request it will be fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2045114367

   Thanks you for sticking with this @istvan-fodor  🙏 
   
   Also, thank you @tustvold  for the follow up in https://github.com/apache/arrow-rs/pull/5612 🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1535950940


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,17 +212,23 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
 
-        let array_data = unsafe { array_data.build_unchecked() };

Review Comment:
   The rationale is that with giving the ability to the user to change the field we use, the values_builder and the field can be out of sync.  
   
   If we use build_unchecked, the build will not fail as expected, so the unit tests 
       `builder::fixed_size_list_builder::tests::test_fixed_size_list_array_builder_with_field_null_panic` and
       `builder::fixed_size_list_builder::tests::test_fixed_size_list_array_builder_with_field_type_panic`will fail. 
       
    I think the best place would be to do this check in `with_field`, asserting that the nullable and data_type are the same as the values-builder, but I didn't find a way to actually get the values from values_builder. 
     
     If we stick with `build_unchecked`, we could have this check once the `values_builder` is built (data_type and nullable are readily available on the built `ArrayData`).
     
     I am updating the PR with this version and let me know if this is better. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1535950940


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,17 +212,23 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
 
-        let array_data = unsafe { array_data.build_unchecked() };

Review Comment:
   The rationale is that with giving the ability to the user to change the field we use, the values_builder and the field can be out of sync, so yes, you can construct invalid arrays.  
   
   If we use build_unchecked, the build will not fail as expected, so the unit tests 
       `builder::fixed_size_list_builder::tests::test_fixed_size_list_array_builder_with_field_null_panic` and
       `builder::fixed_size_list_builder::tests::test_fixed_size_list_array_builder_with_field_type_panic`will fail. 
       
    I think the best place would be to do this check in `with_field`, asserting that the nullable and data_type are the same as the values-builder, but I didn't find a way to actually get the values from values_builder. 
     
     If we stick with `build_unchecked`, we could have this check once the `values_builder` is built (data_type and nullable are readily available on the built `ArrayData`).
     
     I am updating the PR with this version and let me know if this is better. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1536212668


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -179,22 +179,37 @@ where
             self.list_len,
             len,
         );
+
+        let nulls = self.null_buffer_builder.finish();
+
         let field = match &self.field {
-            Some(f) => f.clone(),
+            Some(f) => {
+                assert_eq!(
+                    f.data_type(),
+                    values_data.data_type(),
+                    "DataType of field ({}) should be the same as the values_builder DataType ({})",

Review Comment:
   👍 



##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -194,17 +212,23 @@ where
         );
 
         let nulls = self.null_buffer_builder.finish_cloned();
-        let array_data = ArrayData::builder(DataType::FixedSizeList(
-            Arc::new(Field::new("item", values_data.data_type().clone(), true)),
-            self.list_len,
-        ))
-        .len(len)
-        .add_child_data(values_data)
-        .nulls(nulls);
 
-        let array_data = unsafe { array_data.build_unchecked() };

Review Comment:
   Looks good 👍 



##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -247,6 +298,180 @@ mod tests {
         assert_eq!(3, list_array.value_length());
     }
 
+    #[test]
+    fn test_fixed_size_list_array_builder_with_field() {
+        let values_builder = Int32Builder::new();
+        let mut builder = FixedSizeListBuilder::new(values_builder, 3).with_field(Field::new(
+            "list_element",
+            DataType::Int32,
+            true,
+        ));
+
+        //  [[0, 1, 2], null, [3, null, 5], [6, 7, null]]
+        builder.values().append_value(0);
+        builder.values().append_value(1);
+        builder.values().append_value(2);
+        builder.append(true);
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.values().append_null();
+        builder.append(false);
+        builder.values().append_value(3);
+        builder.values().append_null();
+        builder.values().append_value(5);
+        builder.append(true);
+        builder.values().append_value(6);
+        builder.values().append_value(7);
+        builder.values().append_null();
+        builder.append(true);

Review Comment:
   It might be nice to avoid repeating this same setup multiple times in the tests (and it would make it clear the tests share the same data)
   
   for example
   ```rust
   fn make_list_builder() -> FixedSizedListBuilder<i32> { 
     ...
   }
   ```
   
   
   ```suggestion
           let builder = make_list_builder();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "istvan-fodor (via GitHub)" <gi...@apache.org>.
istvan-fodor commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2022857302

   @alamb , not sure you saw my comment on line 328. Can you check? The new behavior of non-nullable fields raises an additional question about null fields and I wanted to make sure this is in line with the rest of the arrow builders.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#discussion_r1550174382


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -248,9 +325,37 @@ mod tests {
     }
 
     #[test]
-    fn test_fixed_size_list_array_builder_finish_cloned() {
+    fn test_fixed_size_list_array_builder_with_field() {
+        let builder = make_list_builder(false, false);

Review Comment:
   The logic we apply for FixedSizeList and StructArray elsewhere is to permit "masked" nulls, this is because a null must always take up space. I am not sure if we want to replicate this logic here.
   
   See https://docs.rs/arrow-array/latest/arrow_array/array/struct.FixedSizeListArray.html#method.try_new



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "SteampunkIslande (via GitHub)" <gi...@apache.org>.
SteampunkIslande commented on PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541#issuecomment-2036779718

   Hello ! I'm very interested in this PR, I really need FixedSizeList columns and didn't find a workaround that makes the FixedSizeListBuilder create a column with an appropriate name. So I'm really looking forward for this to be accepted


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: implemented with_field() for FixedSizeListBuilder [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #5541:
URL: https://github.com/apache/arrow-rs/pull/5541


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org