Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/02 10:36:45 UTC

[GitHub] [arrow-rs] alamb commented on a change in pull request #511: Fix parquet definition levels

alamb commented on a change in pull request #511:
URL: https://github.com/apache/arrow-rs/pull/511#discussion_r662913943



##########
File path: parquet/src/arrow/arrow_writer.rs
##########
@@ -712,6 +711,47 @@ mod tests {
         writer.close().unwrap();
     }
 
+    #[test]
+    fn arrow_writer_list_non_null() {
+        // define schema
+        let schema = Schema::new(vec![Field::new(
+            "a",
+            DataType::List(Box::new(Field::new("item", DataType::Int32, false))),
+            false,
+        )]);
+
+        // create some data
+        let a_values = Int32Array::from(vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+
+        // Construct a buffer for value offsets, for the nested array:
+        //  [[1], [2, 3], [], [4, 5, 6], [7, 8, 9, 10]]
+        let a_value_offsets =
+            arrow::buffer::Buffer::from(&[0, 1, 3, 3, 6, 10].to_byte_slice());
+
+        // Construct a list array from the above two
+        let a_list_data = ArrayData::builder(DataType::List(Box::new(Field::new(
+            "item",
+            DataType::Int32,
+            false,
+        ))))
+        .len(5)
+        .add_buffer(a_value_offsets)
+        .add_child_data(a_values.data().clone())
+        .build();
+        let a = ListArray::from(a_list_data);
+
+        // build a record batch
+        let batch =
+            RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(a)]).unwrap();
+
+        assert_eq!(batch.column(0).data().null_count(), 0);
+
+        let file = get_temp_file("test_arrow_writer_list_non_null.parquet", &[]);
+        let mut writer = ArrowWriter::try_new(file, Arc::new(schema), None).unwrap();
+        writer.write(&batch).unwrap();
+        writer.close().unwrap();

Review comment:
I noticed that other tests, such as `arrow_writer_binary`, also read the data back from the parquet file and validate the results, which confirms that the data survives the roundtrip.

Would it make sense to do the same in this test?
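
For anyone checking what such a roundtrip assertion should expect, here is a minimal, standard-library-only sketch of how the offsets buffer in the test maps the flat values onto the nested lists named in the test's comment. It does not use any arrow-rs or parquet API; `decode_list_offsets` is a hypothetical helper introduced purely for illustration, and it assumes the offsets are non-decreasing and within the bounds of the values slice.

```rust
/// Hypothetical helper (not part of arrow-rs): expand a flat values slice and
/// an Arrow-style offsets slice into nested lists.
///
/// List `i` covers `values[offsets[i]..offsets[i + 1]]`, so `n + 1` offsets
/// describe `n` lists, and two equal neighbouring offsets give an empty list.
/// Assumes offsets are non-decreasing and in bounds; panics otherwise.
fn decode_list_offsets(values: &[i32], offsets: &[usize]) -> Vec<Vec<i32>> {
    offsets
        .windows(2) // each adjacent pair (start, end) delimits one list
        .map(|pair| values[pair[0]..pair[1]].to_vec())
        .collect()
}
```

Calling it with the values `1..=10` and the offsets `[0, 1, 3, 3, 6, 10]` from the test gives five lists, the third of which is empty. A read-back check would need to reproduce that shape, including the empty list, in a column with no nulls.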
