Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/02 15:57:39 UTC

[GitHub] [arrow] carols10cents opened a new pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

carols10cents opened a new pull request #8330:
URL: https://github.com/apache/arrow/pull/8330


   Note that this PR goes to the rust-parquet-arrow-writer branch, not master.
   
   Inspired by tests in cpp/src/parquet/arrow/arrow_reader_writer_test.cc
   
   These perform round-trip Arrow -> Parquet -> Arrow conversions of a single RecordBatch with a single column of values of each of the supported data types, and some of the unsupported ones.
   
   Tests that currently fail are marked with either `#[should_panic]` (if they
   fail because of a panic) or `#[ignore]` (if they fail because the values
   don't match).
   
   I am comparing the RecordBatch's column data directly before and after the round trip; I'm not sure whether this is appropriate, because for some data types the `null_bitmap` doesn't match and I'm not sure whether it's supposed to.
   
   So I would love advice on that front, and I would love to know whether these tests are useful!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703962361


   > @nevi-me I went ahead and merged in [your PR](https://github.com/integer32llc/arrow/pull/1), rebased this PR's branch on rust-parquet-arrow-writer, added one commit to explicitly not support Float16 in the `match` in `write_leaves`, and force pushed. Do you think this can be merged or do more of the ignored tests need to be passing? I'm going to take a look at reading the schema metadata next, in any case.
   
   I'll merge this in my morning (GMT+2). I'm happy with the tests; they also help break further work into units so others can help us out. I'll open JIRAs for the cases that still fail.





[GitHub] [arrow] alamb commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

alamb commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-702838970


   > I am comparing the RecordBatch's column data directly before and after the round trip; I'm not sure whether this is appropriate, because for some data types the null_bitmap doesn't match and I'm not sure whether it's supposed to.
   
   I think the null bitmap should definitely match after the round trip (as it signals which rows are null, which should be preserved).
   
   I am not sure the actual data arrays have to be the same -- what needs to be the same is the data rows where the null_bit is 0 (aka we should probably be checking only for non-null rows)
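   
   For illustration, a minimal sketch of comparing only the rows that hold values, using `Int32Array` as a stand-in (the helper is hypothetical, not part of this PR):
   
```rust
use arrow::array::{Array, Int32Array};

// Hypothetical helper: require that null positions line up, but compare
// values only at slots that are non-null, since null slots may hold
// arbitrary data.
fn assert_non_null_rows_equal(expected: &Int32Array, actual: &Int32Array) {
    assert_eq!(expected.len(), actual.len());
    for i in 0..expected.len() {
        // the round trip should preserve which slots are null
        assert_eq!(expected.is_null(i), actual.is_null(i));
        if !expected.is_null(i) {
            // values are only meaningful where the slot holds a value
            assert_eq!(expected.value(i), actual.value(i));
        }
    }
}
```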
   
   





[GitHub] [arrow] andygrove commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

andygrove commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704320070


   @nevi-me @carols10cents I ran the merge tool and the changes appear to have been committed to the branch but this issue did not get closed, so I am closing it manually.





[GitHub] [arrow] nevi-me commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703060945


   > And yes I agree they should be compared separately, as there is generally no requirement on the values of null fields (generally, you'll want to initialize to some known values to avoid security issues).
   
   There was a time ~2 years ago when null slots would contain garbage/unknown values, but I haven't seen that in a very long time.
   With nested arrays we'll also run into this when comparing the data directly: a struct with null slots whose child is a non-nullable primitive would come back as a nullable primitive after the roundtrip.
   
   @emkornfield in the above example, does the C++ Arrow implementation allow creating such a struct? Asked differently: is this validation that we should add at the Arrow level in Rust, so that if you create a nullable struct, its children can't be non-nullable?
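   
   For reference, a sketch of the schema shape in question (the function name is hypothetical; it just spells out "nullable struct, non-nullable child"):
   
```rust
use arrow::datatypes::{DataType, Field, Schema};

// A nullable struct whose child is declared non-nullable. When a struct
// slot is null, the child value at that slot is undefined, so after a
// Parquet roundtrip the child effectively comes back as nullable.
fn nullable_struct_schema() -> Schema {
    let child = Field::new("f", DataType::Int32, false); // non-nullable child
    let parent = Field::new("s", DataType::Struct(vec![child]), true); // nullable parent
    Schema::new(vec![parent])
}
```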





[GitHub] [arrow] nevi-me commented on a change in pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

nevi-me commented on a change in pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#discussion_r499099739



##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
         writer.write(&batch).unwrap();
         writer.close().unwrap();
     }
+
+    const SMALL_SIZE: usize = 100;
+
+    fn roundtrip(filename: &str, expected_batch: RecordBatch) {
+        let file = get_temp_file(filename, &[]);
+
+        let mut writer = ArrowWriter::try_new(
+            file.try_clone().unwrap(),
+            expected_batch.schema(),
+            None,
+        )
+        .unwrap();
+        writer.write(&expected_batch).unwrap();
+        writer.close().unwrap();
+
+        let reader = SerializedFileReader::new(file).unwrap();
+        let mut arrow_reader = ParquetFileArrowReader::new(Rc::new(reader));
+        let mut record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
+
+        let actual_batch = record_batch_reader.next_batch().unwrap().unwrap();
+
+        assert_eq!(expected_batch.schema(), actual_batch.schema());
+        assert_eq!(expected_batch.num_columns(), actual_batch.num_columns());
+        assert_eq!(expected_batch.num_rows(), actual_batch.num_rows());
+        for i in 0..expected_batch.num_columns() {
+            let expected_data = expected_batch.column(i).data();
+            let actual_data = actual_batch.column(i).data();
+
+            assert_eq!(expected_data.data_type(), actual_data.data_type());
+            assert_eq!(expected_data.len(), actual_data.len());
+            assert_eq!(expected_data.null_count(), actual_data.null_count());
+            assert_eq!(expected_data.offset(), actual_data.offset());
+            assert_eq!(expected_data.buffers(), actual_data.buffers());
+            assert_eq!(expected_data.child_data(), actual_data.child_data());
+            assert_eq!(expected_data.null_bitmap(), actual_data.null_bitmap());
+        }
+    }
+
+    fn one_column_roundtrip(filename: &str, values: ArrayRef, nullable: bool) {
+        let schema = Schema::new(vec![Field::new(
+            "col",
+            values.data_type().clone(),
+            nullable,
+        )]);
+        let expected_batch =
+            RecordBatch::try_new(Arc::new(schema), vec![values]).unwrap();
+
+        roundtrip(filename, expected_batch);
+    }
+
+    fn values_required<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<I::Item>> + Array + 'static,
+        I: IntoIterator,
+    {
+        let raw_values: Vec<_> = iter.into_iter().collect();
+        let values = Arc::new(A::from(raw_values));
+        one_column_roundtrip(filename, values, false);
+    }
+
+    fn values_optional<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<Option<I::Item>>> + Array + 'static,
+        I: IntoIterator,
+    {
+        let optional_raw_values: Vec<_> = iter
+            .into_iter()
+            .enumerate()
+            .map(|(i, v)| if i % 2 == 0 { None } else { Some(v) })
+            .collect();
+        let optional_values = Arc::new(A::from(optional_raw_values));
+        one_column_roundtrip(filename, optional_values, true);
+    }
+
+    fn required_and_optional<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<I::Item>> + From<Vec<Option<I::Item>>> + Array + 'static,
+        I: IntoIterator + Clone,
+    {
+        values_required::<A, I>(iter.clone(), filename);
+        values_optional::<A, I>(iter, filename);
+    }
+
+    #[test]
+    #[should_panic(expected = "Null arrays not supported")]
+    fn null_single_column() {
+        let values = Arc::new(NullArray::new(SMALL_SIZE));
+        one_column_roundtrip("null_single_column", values.clone(), true);
+        one_column_roundtrip("null_single_column", values, false);
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "Attempting to write an Arrow type that is not yet implemented"
+    )]
+    fn bool_single_column() {
+        required_and_optional::<BooleanArray, _>(
+            [true, false].iter().cycle().copied().take(SMALL_SIZE),
+            "bool_single_column",
+        );
+    }
+
+    #[test]
+    fn i8_single_column() {
+        required_and_optional::<Int8Array, _>(0..SMALL_SIZE as i8, "i8_single_column");
+    }
+
+    #[test]
+    fn i16_single_column() {
+        required_and_optional::<Int16Array, _>(0..SMALL_SIZE as i16, "i16_single_column");
+    }
+
+    #[test]
+    fn i32_single_column() {
+        required_and_optional::<Int32Array, _>(0..SMALL_SIZE as i32, "i32_single_column");
+    }
+
+    #[test]
+    fn i64_single_column() {
+        required_and_optional::<Int64Array, _>(0..SMALL_SIZE as i64, "i64_single_column");
+    }
+
+    #[test]
+    fn u8_single_column() {
+        required_and_optional::<UInt8Array, _>(0..SMALL_SIZE as u8, "u8_single_column");
+    }
+
+    #[test]
+    fn u16_single_column() {
+        required_and_optional::<UInt16Array, _>(
+            0..SMALL_SIZE as u16,
+            "u16_single_column",
+        );
+    }
+
+    #[test]
+    fn u32_single_column() {
+        required_and_optional::<UInt32Array, _>(
+            0..SMALL_SIZE as u32,
+            "u32_single_column",
+        );
+    }
+
+    #[test]
+    fn u64_single_column() {
+        required_and_optional::<UInt64Array, _>(
+            0..SMALL_SIZE as u64,
+            "u64_single_column",
+        );
+    }
+
+    // How do we create Float16 values, which aren't supported in Rust?
+
+    #[test]
+    fn f32_single_column() {
+        required_and_optional::<Float32Array, _>(
+            (0..SMALL_SIZE).map(|i| i as f32),
+            "f32_single_column",
+        );
+    }
+
+    #[test]
+    fn f64_single_column() {
+        required_and_optional::<Float64Array, _>(
+            (0..SMALL_SIZE).map(|i| i as f64),
+            "f64_single_column",
+        );
+    }
+
+    // The timestamp array types don't implement From<Vec<T>> because they need the timezone
+    // argument, and they also don't support building from a Vec<Option<T>>, so call
+    // one_column_roundtrip manually instead of calling required_and_optional for these tests.
+
+    #[test]
+    #[ignore] // Timestamp support isn't correct yet
+    fn timestamp_second_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampSecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_second_single_column", values, false);
+    }
+
+    #[test]
+    fn timestamp_millisecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampMillisecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_millisecond_single_column", values, false);
+    }
+
+    #[test]
+    fn timestamp_microsecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampMicrosecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_microsecond_single_column", values, false);
+    }
+
+    #[test]
+    #[ignore] // Timestamp support isn't correct yet
+    fn timestamp_nanosecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampNanosecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_nanosecond_single_column", values, false);
+    }
+
+    #[test]
+    fn date32_single_column() {
+        required_and_optional::<Date32Array, _>(
+            0..SMALL_SIZE as i32,
+            "date32_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Date support isn't correct yet

Review comment:
       The failure because of `ArrowError("underlying Arrow error: Compute error: Casting from Date64(Millisecond) to Int32 not supported")` can be addressed by adding a cast in `arrow::compute::kernels::cast`.
   
   We fail to cast the data at `arrow_writer::197`. The main reason some casts aren't supported is that I was worried about type loss if one converts something like date32 (number of days since epoch) to int64 and then back to date64 (number of millis since epoch). Such a chain should be equivalent to a direct date32-to-date64 cast, which I think it currently isn't.
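   
   To make the unit mismatch concrete, here is a sketch of the per-value scaling a lossless date32 -> date64 widening needs (plain Rust, not the actual cast kernel):
   
```rust
const MILLIS_PER_DAY: i64 = 24 * 60 * 60 * 1000; // 86_400_000

// date32 counts days since the Unix epoch; date64 counts milliseconds.
// A lossless widening scales by milliseconds-per-day instead of
// reinterpreting the integer value.
fn date32_to_date64(days_since_epoch: i32) -> i64 {
    days_since_epoch as i64 * MILLIS_PER_DAY
}

#[test]
fn widening_preserves_the_date() {
    // day 1 is 1970-01-02 in both representations
    assert_eq!(date32_to_date64(1), 86_400_000);
}
```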

##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
+    #[test]
+    #[ignore] // Date support isn't correct yet
+    fn date64_single_column() {
+        required_and_optional::<Date64Array, _>(
+            0..SMALL_SIZE as i64,
+            "date64_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time32_second_single_column() {
+        required_and_optional::<Time32SecondArray, _>(
+            0..SMALL_SIZE as i32,
+            "time32_second_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time32_millisecond_single_column() {
+        required_and_optional::<Time32MillisecondArray, _>(
+            0..SMALL_SIZE as i32,
+            "time32_millisecond_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time64_microsecond_single_column() {
+        required_and_optional::<Time64MicrosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "time64_microsecond_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time64_nanosecond_single_column() {
+        required_and_optional::<Time64NanosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "time64_nanosecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_second_single_column() {
+        required_and_optional::<DurationSecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_second_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_millisecond_single_column() {
+        required_and_optional::<DurationMillisecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_millisecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_microsecond_single_column() {
+        required_and_optional::<DurationMicrosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_microsecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_nanosecond_single_column() {
+        required_and_optional::<DurationNanosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_nanosecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Currently unreachable because data type not supported")]
+    fn interval_year_month_single_column() {
+        required_and_optional::<IntervalYearMonthArray, _>(
+            0..SMALL_SIZE as i32,
+            "interval_year_month_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Currently unreachable because data type not supported")]
+    fn interval_day_time_single_column() {
+        required_and_optional::<IntervalDayTimeArray, _>(
+            0..SMALL_SIZE as i64,
+            "interval_day_time_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Binary support isn't correct yet - null_bitmap doesn't match
+    fn binary_single_column() {
+        let one_vec: Vec<u8> = (0..SMALL_SIZE as u8).collect();
+        let many_vecs: Vec<_> = std::iter::repeat(one_vec).take(SMALL_SIZE).collect();
+        let many_vecs_iter = many_vecs.iter().map(|v| v.as_slice());
+
+        // BinaryArrays can't be built from Vec<Option<&[u8]>>, so only call `values_required`
+        values_required::<BinaryArray, _>(many_vecs_iter, "binary_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large Binary support isn't correct yet

Review comment:
       In this case, the reader is supposed to get the correct type from the serialized schema, and then do the conversion to a large array.
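   
   As a sketch of that reader-side fixup (the helper is hypothetical, and it assumes the `arrow::compute::cast` kernel supports, or gains, a `Binary` -> `LargeBinary` widening):
   
```rust
use arrow::array::{Array, ArrayRef};
use arrow::compute::cast;
use arrow::datatypes::DataType;
use arrow::error::Result;

// Read the column as the physical type Parquet stored (BYTE_ARRAY -> Binary),
// then widen it to the logical type recorded in the serialized Arrow schema.
fn widen_to_schema_type(column: ArrayRef, logical: &DataType) -> Result<ArrayRef> {
    if column.data_type() == logical {
        Ok(column)
    } else {
        cast(&column, logical)
    }
}
```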

##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
+    // How do we create Float16 values, which aren't supported in Rust?

Review comment:
       We can't create them, so we always leave `f16` as unsupported.

##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
+    #[test]
+    #[ignore] // Large Binary support isn't correct yet
+    fn large_binary_single_column() {
+        let one_vec: Vec<u8> = (0..SMALL_SIZE as u8).collect();
+        let many_vecs: Vec<_> = std::iter::repeat(one_vec).take(SMALL_SIZE).collect();
+        let many_vecs_iter = many_vecs.iter().map(|v| v.as_slice());
+
+        // LargeBinaryArrays can't be built from Vec<Option<&[u8]>>, so only call `values_required`
+        values_required::<LargeBinaryArray, _>(
+            many_vecs_iter,
+            "large_binary_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // String support isn't correct yet - null_bitmap doesn't match
+    fn string_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE).map(|i| i.to_string()).collect();
+        let raw_strs = raw_values.iter().map(|s| s.as_str());
+
+        required_and_optional::<StringArray, _>(raw_strs, "string_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large String support isn't correct yet - null_bitmap and buffers don't match
+    fn large_string_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE).map(|i| i.to_string()).collect();
+        let raw_strs = raw_values.iter().map(|s| s.as_str());
+
+        required_and_optional::<LargeStringArray, _>(
+            raw_strs,
+            "large_string_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "Reading parquet list array into arrow is not supported yet!"
+    )]
+    fn list_single_column() {
+        let a_values = Int32Array::from(vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+        let a_value_offsets =
+            arrow::buffer::Buffer::from(&[0, 1, 3, 3, 6, 10].to_byte_slice());
+        let a_list_data = ArrayData::builder(DataType::List(Box::new(DataType::Int32)))
+            .len(5)
+            .add_buffer(a_value_offsets)
+            .add_child_data(a_values.data())
+            .build();
+        let a = ListArray::from(a_list_data);
+
+        let values = Arc::new(a);
+        one_column_roundtrip("list_single_column", values, false);
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "Reading parquet list array into arrow is not supported yet!"
+    )]
+    fn large_list_single_column() {
+        let a_values = Int32Array::from(vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+        let a_value_offsets =
+            arrow::buffer::Buffer::from(&[0i64, 1, 3, 3, 6, 10].to_byte_slice());
+        let a_list_data =
+            ArrayData::builder(DataType::LargeList(Box::new(DataType::Int32)))
+                .len(5)
+                .add_buffer(a_value_offsets)
+                .add_child_data(a_values.data())
+                .build();
+        let a = LargeListArray::from(a_list_data);
+
+        let values = Arc::new(a);
+        one_column_roundtrip("large_list_single_column", values, false);
+    }
+
+    #[test]
+    #[ignore] // Struct support isn't correct yet - null_bitmap doesn't match
+    fn struct_single_column() {
+        let a_values = Int32Array::from(vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+        let struct_field_a = Field::new("f", DataType::Int32, false);
+        let s = StructArray::from(vec![(struct_field_a, Arc::new(a_values) as ArrayRef)]);

Review comment:
       The cause might be this. See https://issues.apache.org/jira/browse/ARROW-5408.







[GitHub] [arrow] emkornfield commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

emkornfield commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703034687


   > I am not sure the actual data arrays have to be the same -- what needs to be the same is the data rows where the null_bit is 0 (aka we should probably be checking only for non-null rows)
   
   Small nit: I'm pretty sure null_bit = 1 is actually where values should be compared (thinking of it as a validity bitmap instead of a null bitmap helps). And yes I agree they should be compared separately, as there is generally no requirement on the values of null fields (generally, you'll want to initialize to some known values to avoid security issues).
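   
   A minimal sketch of that convention (bits numbered least-significant-first, per the Arrow format spec; the helper is hypothetical):
   
```rust
// Bit i of the validity bitmap is 1 when slot i holds a valid value and 0
// when the slot is null.
fn is_valid(validity: &[u8], i: usize) -> bool {
    (validity[i / 8] >> (i % 8)) & 1 == 1
}

#[test]
fn validity_bit_semantics() {
    let bitmap = [0b0000_0101u8]; // slots 0 and 2 valid, slot 1 null
    assert!(is_valid(&bitmap, 0));
    assert!(!is_valid(&bitmap, 1));
    assert!(is_valid(&bitmap, 2));
}
```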





[GitHub] [arrow] carols10cents commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

carols10cents commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704223459


   @nevi-me Done! https://issues.apache.org/jira/secure/ViewProfile.jspa?name=carols10cents





[GitHub] [arrow] carols10cents commented on a change in pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

carols10cents commented on a change in pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#discussion_r499015678



##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
         writer.write(&batch).unwrap();
         writer.close().unwrap();
     }
+
+    const SMALL_SIZE: usize = 100;
+
+    fn roundtrip(filename: &str, expected_batch: RecordBatch) {
+        let file = get_temp_file(filename, &[]);
+
+        let mut writer = ArrowWriter::try_new(
+            file.try_clone().unwrap(),
+            expected_batch.schema(),
+            None,
+        )
+        .unwrap();
+        writer.write(&expected_batch).unwrap();
+        writer.close().unwrap();
+
+        let reader = SerializedFileReader::new(file).unwrap();
+        let mut arrow_reader = ParquetFileArrowReader::new(Rc::new(reader));
+        let mut record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
+
+        let actual_batch = record_batch_reader.next_batch().unwrap().unwrap();
+
+        assert_eq!(expected_batch.schema(), actual_batch.schema());
+        assert_eq!(expected_batch.num_columns(), actual_batch.num_columns());
+        assert_eq!(expected_batch.num_rows(), actual_batch.num_rows());
+        for i in 0..expected_batch.num_columns() {
+            let expected_data = expected_batch.column(i).data();
+            let actual_data = actual_batch.column(i).data();
+
+            assert_eq!(expected_data.data_type(), actual_data.data_type());
+            assert_eq!(expected_data.len(), actual_data.len());
+            assert_eq!(expected_data.null_count(), actual_data.null_count());
+            assert_eq!(expected_data.offset(), actual_data.offset());
+            assert_eq!(expected_data.buffers(), actual_data.buffers());
+            assert_eq!(expected_data.child_data(), actual_data.child_data());
+            assert_eq!(expected_data.null_bitmap(), actual_data.null_bitmap());
+        }
+    }
+
+    fn one_column_roundtrip(filename: &str, values: ArrayRef, nullable: bool) {
+        let schema = Schema::new(vec![Field::new(
+            "col",
+            values.data_type().clone(),
+            nullable,
+        )]);
+        let expected_batch =
+            RecordBatch::try_new(Arc::new(schema), vec![values]).unwrap();
+
+        roundtrip(filename, expected_batch);
+    }
+
+    fn values_required<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<I::Item>> + Array + 'static,
+        I: IntoIterator,
+    {
+        let raw_values: Vec<_> = iter.into_iter().collect();
+        let values = Arc::new(A::from(raw_values));
+        one_column_roundtrip(filename, values, false);
+    }
+
+    fn values_optional<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<Option<I::Item>>> + Array + 'static,
+        I: IntoIterator,
+    {
+        let optional_raw_values: Vec<_> = iter
+            .into_iter()
+            .enumerate()
+            .map(|(i, v)| if i % 2 == 0 { None } else { Some(v) })
+            .collect();
+        let optional_values = Arc::new(A::from(optional_raw_values));
+        one_column_roundtrip(filename, optional_values, true);
+    }
+
+    fn required_and_optional<A, I>(iter: I, filename: &str)
+    where
+        A: From<Vec<I::Item>> + From<Vec<Option<I::Item>>> + Array + 'static,
+        I: IntoIterator + Clone,
+    {
+        values_required::<A, I>(iter.clone(), filename);
+        values_optional::<A, I>(iter, filename);
+    }
+
+    #[test]
+    #[should_panic(expected = "Null arrays not supported")]
+    fn null_single_column() {
+        let values = Arc::new(NullArray::new(SMALL_SIZE));
+        one_column_roundtrip("null_single_column", values.clone(), true);
+        one_column_roundtrip("null_single_column", values, false);
+    }
+
+    #[test]
+    #[should_panic(
+        expected = "Attempting to write an Arrow type that is not yet implemented"
+    )]
+    fn bool_single_column() {
+        required_and_optional::<BooleanArray, _>(
+            [true, false].iter().cycle().copied().take(SMALL_SIZE),
+            "bool_single_column",
+        );
+    }
+
+    #[test]
+    fn i8_single_column() {
+        required_and_optional::<Int8Array, _>(0..SMALL_SIZE as i8, "i8_single_column");
+    }
+
+    #[test]
+    fn i16_single_column() {
+        required_and_optional::<Int16Array, _>(0..SMALL_SIZE as i16, "i16_single_column");
+    }
+
+    #[test]
+    fn i32_single_column() {
+        required_and_optional::<Int32Array, _>(0..SMALL_SIZE as i32, "i32_single_column");
+    }
+
+    #[test]
+    fn i64_single_column() {
+        required_and_optional::<Int64Array, _>(0..SMALL_SIZE as i64, "i64_single_column");
+    }
+
+    #[test]
+    fn u8_single_column() {
+        required_and_optional::<UInt8Array, _>(0..SMALL_SIZE as u8, "u8_single_column");
+    }
+
+    #[test]
+    fn u16_single_column() {
+        required_and_optional::<UInt16Array, _>(
+            0..SMALL_SIZE as u16,
+            "u16_single_column",
+        );
+    }
+
+    #[test]
+    fn u32_single_column() {
+        required_and_optional::<UInt32Array, _>(
+            0..SMALL_SIZE as u32,
+            "u32_single_column",
+        );
+    }
+
+    #[test]
+    fn u64_single_column() {
+        required_and_optional::<UInt64Array, _>(
+            0..SMALL_SIZE as u64,
+            "u64_single_column",
+        );
+    }
+
+    // How to create Float16 values that aren't supported in Rust?
+
+    #[test]
+    fn f32_single_column() {
+        required_and_optional::<Float32Array, _>(
+            (0..SMALL_SIZE).map(|i| i as f32),
+            "f32_single_column",
+        );
+    }
+
+    #[test]
+    fn f64_single_column() {
+        required_and_optional::<Float64Array, _>(
+            (0..SMALL_SIZE).map(|i| i as f64),
+            "f64_single_column",
+        );
+    }
+
+    // The timestamp array types don't implement From<Vec<T>> because they need the timezone
+    // argument, and they also doesn't support building from a Vec<Option<T>>, so call
+    // one_column_roundtrip manually instead of calling required_and_optional for these tests.
+
+    #[test]
+    #[ignore] // Timestamp support isn't correct yet
+    fn timestamp_second_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampSecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_second_single_column", values, false);
+    }
+
+    #[test]
+    fn timestamp_millisecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampMillisecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_millisecond_single_column", values, false);
+    }
+
+    #[test]
+    fn timestamp_microsecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampMicrosecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_microsecond_single_column", values, false);
+    }
+
+    #[test]
+    #[ignore] // Timestamp support isn't correct yet
+    fn timestamp_nanosecond_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();
+        let values = Arc::new(TimestampNanosecondArray::from_vec(raw_values, None));
+
+        one_column_roundtrip("timestamp_nanosecond_single_column", values, false);
+    }
+
+    #[test]
+    fn date32_single_column() {
+        required_and_optional::<Date32Array, _>(
+            0..SMALL_SIZE as i32,
+            "date32_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Date support isn't correct yet
+    fn date64_single_column() {
+        required_and_optional::<Date64Array, _>(
+            0..SMALL_SIZE as i64,
+            "date64_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time32_second_single_column() {
+        required_and_optional::<Time32SecondArray, _>(
+            0..SMALL_SIZE as i32,
+            "time32_second_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time32_millisecond_single_column() {
+        required_and_optional::<Time32MillisecondArray, _>(
+            0..SMALL_SIZE as i32,
+            "time32_millisecond_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time64_microsecond_single_column() {
+        required_and_optional::<Time64MicrosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "time64_microsecond_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Time support isn't correct yet
+    fn time64_nanosecond_single_column() {
+        required_and_optional::<Time64NanosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "time64_nanosecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_second_single_column() {
+        required_and_optional::<DurationSecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_second_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_millisecond_single_column() {
+        required_and_optional::<DurationMillisecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_millisecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_microsecond_single_column() {
+        required_and_optional::<DurationMicrosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_microsecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Converting Duration to parquet not supported")]
+    fn duration_nanosecond_single_column() {
+        required_and_optional::<DurationNanosecondArray, _>(
+            0..SMALL_SIZE as i64,
+            "duration_nanosecond_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Currently unreachable because data type not supported")]
+    fn interval_year_month_single_column() {
+        required_and_optional::<IntervalYearMonthArray, _>(
+            0..SMALL_SIZE as i32,
+            "interval_year_month_single_column",
+        );
+    }
+
+    #[test]
+    #[should_panic(expected = "Currently unreachable because data type not supported")]
+    fn interval_day_time_single_column() {
+        required_and_optional::<IntervalDayTimeArray, _>(
+            0..SMALL_SIZE as i64,
+            "interval_day_time_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // Binary support isn't correct yet - null_bitmap doesn't match
+    fn binary_single_column() {
+        let one_vec: Vec<u8> = (0..SMALL_SIZE as u8).collect();
+        let many_vecs: Vec<_> = std::iter::repeat(one_vec).take(SMALL_SIZE).collect();
+        let many_vecs_iter = many_vecs.iter().map(|v| v.as_slice());
+
+        // BinaryArrays can't be built from Vec<Option<&[u8]>>, so only call `values_required`
+        values_required::<BinaryArray, _>(many_vecs_iter, "binary_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large Binary support isn't correct yet

Review comment:
       This test currently fails because both the `LargeBinary` and `Binary` Arrow types are converted to the Parquet `BYTE_ARRAY` type. I think that conversion is what's intended, but I can't find any C++ tests that specifically exercise Arrow LargeBinary data, so I'm not sure whether the roundtrip is supposed to lose the "large" information or whether writing or reading is supposed to fail as unsupported.
   
   ```
   ---- arrow::arrow_writer::tests::large_binary_single_column stdout ----
   thread 'arrow::arrow_writer::tests::large_binary_single_column' panicked at 'assertion failed: `(left == right)`
     left: `Schema { fields: [Field { name: "col", data_type: LargeBinary, nullable: false, dict_id: 0, dict_is_ordered: false }], metadata: {} }`,
    right: `Schema { fields: [Field { name: "col", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered: false }], metadata: {} }`', parquet/src/arrow/arrow_writer.rs:712:9
   ```
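
   If the lossy roundtrip turns out to be the intended behavior, one way to express that in the test would be to assert against a coerced schema instead of the original one. This is only a sketch under that assumption; whether dropping the "large" information is desired is exactly the open question:

   ```rust
   // Sketch: assumes LargeBinary is *supposed* to come back as Binary.
   // `actual_batch` is the batch read back in the `roundtrip` helper.
   let coerced = Schema::new(vec![Field::new("col", DataType::Binary, false)]);
   assert_eq!(&coerced, actual_batch.schema().as_ref());
   ```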

##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
         writer.write(&batch).unwrap();
         writer.close().unwrap();
     }
+
+    #[test]
+    #[ignore] // Large Binary support isn't correct yet
+    fn large_binary_single_column() {
+        let one_vec: Vec<u8> = (0..SMALL_SIZE as u8).collect();
+        let many_vecs: Vec<_> = std::iter::repeat(one_vec).take(SMALL_SIZE).collect();
+        let many_vecs_iter = many_vecs.iter().map(|v| v.as_slice());
+
+        // LargeBinaryArrays can't be built from Vec<Option<&[u8]>>, so only call `values_required`
+        values_required::<LargeBinaryArray, _>(
+            many_vecs_iter,
+            "large_binary_single_column",
+        );
+    }
+
+    #[test]
+    #[ignore] // String support isn't correct yet - null_bitmap doesn't match
+    fn string_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE).map(|i| i.to_string()).collect();
+        let raw_strs = raw_values.iter().map(|s| s.as_str());
+
+        required_and_optional::<StringArray, _>(raw_strs, "string_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large String support isn't correct yet - null_bitmap and buffers don't match

Review comment:
       This is actually the same issue as with `LargeBinary`: the "large" part of `LargeUtf8` isn't coming back, and I'm not sure of the intended way to handle that.
   
   ```
   ---- arrow::arrow_writer::tests::large_string_single_column stdout ----
   thread 'arrow::arrow_writer::tests::large_string_single_column' panicked at 'assertion failed: `(left == right)`
     left: `Schema { fields: [Field { name: "col", data_type: LargeUtf8, nullable: false, dict_id: 0, dict_is_ordered: false }], metadata: {} }`,
    right: `Schema { fields: [Field { name: "col", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false }], metadata: {} }`', parquet/src/arrow/arrow_writer.rs:712:9
   ```
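
   If both large types are meant to roundtrip lossily, a small helper could make these assertions uniform across the LargeBinary and LargeUtf8 tests. A sketch only; `coerced_schema` is a hypothetical addition, relying on the test module's existing `Schema`/`Field`/`DataType` imports:

   ```rust
   // Hypothetical helper: the schema we'd expect to read back if the
   // "large" information is dropped (LargeUtf8 -> Utf8, LargeBinary -> Binary).
   fn coerced_schema(coerced_type: DataType) -> Schema {
       Schema::new(vec![Field::new("col", coerced_type, false)])
   }

   // e.g. assert_eq!(&coerced_schema(DataType::Utf8), actual_batch.schema().as_ref());
   ```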




----------------------------------------------------------------



[GitHub] [arrow] nevi-me commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704208332


   Hi @andygrove @jorgecarleitao, could you please merge this for me? The merge tool is failing to push to GitHub on my end :(


----------------------------------------------------------------



[GitHub] [arrow] carols10cents commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703830182


   @nevi-me I went ahead and merged in [your PR](https://github.com/integer32llc/arrow/pull/1), rebased this PR's branch on rust-parquet-arrow-writer, added one commit to explicitly not support Float16 in the `match` in `write_leaves`, and force pushed. Do you think this can be merged or do more of the ignored tests need to be passing? I'm going to take a look at reading the schema metadata next, in any case.


----------------------------------------------------------------



[GitHub] [arrow] andygrove closed pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #8330:
URL: https://github.com/apache/arrow/pull/8330


   


----------------------------------------------------------------



[GitHub] [arrow] github-actions[bot] commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704207669


   https://issues.apache.org/jira/browse/ARROW-10191


----------------------------------------------------------------



[GitHub] [arrow] nevi-me commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704378123


   > @nevi-me @carols10cents I ran the merge tool and the changes appear to have been committed to the branch but this PR did not get closed, so I am closing it manually.
   
   It's been the case since I started merging on the branch. I've been meaning to open a JIRA for this, but keep forgetting. I've opened https://issues.apache.org/jira/browse/ARROW-10198


----------------------------------------------------------------



[GitHub] [arrow] nevi-me commented on a change in pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#discussion_r499125186



##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
         writer.write(&batch).unwrap();
         writer.close().unwrap();
     }
+    #[test]
+    #[ignore] // String support isn't correct yet - null_bitmap doesn't match
+    fn string_single_column() {
+        let raw_values: Vec<_> = (0..SMALL_SIZE).map(|i| i.to_string()).collect();
+        let raw_strs = raw_values.iter().map(|s| s.as_str());
+
+        required_and_optional::<StringArray, _>(raw_strs, "string_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large String support isn't correct yet - null_bitmap and buffers don't match

Review comment:
       Could you please have a look at the function `parquet_to_arrow_schema_by_columns` and https://issues.apache.org/jira/browse/ARROW-10168? I suspect this might be why the arrow metadata gets lost during the roundtrip.
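
   For reference, one check that could narrow this down: after writing, inspect the Parquet key/value metadata to see whether a serialized Arrow schema survived at all. This is a sketch; the `key_value_metadata()` accessor is an assumption (cf. ARROW-10168), not a confirmed part of this crate's `FileMetaData` API:

   ```rust
   // Sketch: `reader` is the SerializedFileReader from the `roundtrip` helper.
   // "ARROW:schema" is the metadata key the C++ implementation writes.
   let file_metadata = reader.metadata().file_metadata();
   if let Some(kv) = file_metadata.key_value_metadata() {
       assert!(kv.iter().any(|pair| pair.key == "ARROW:schema"));
   }
   ```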




----------------------------------------------------------------



[GitHub] [arrow] emkornfield commented on a change in pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#discussion_r499108858



##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -688,4 +688,413 @@ mod tests {
         writer.write(&batch).unwrap();
         writer.close().unwrap();
     }
+    #[test]
+    #[ignore] // Binary support isn't correct yet - null_bitmap doesn't match
+    fn binary_single_column() {
+        let one_vec: Vec<u8> = (0..SMALL_SIZE as u8).collect();
+        let many_vecs: Vec<_> = std::iter::repeat(one_vec).take(SMALL_SIZE).collect();
+        let many_vecs_iter = many_vecs.iter().map(|v| v.as_slice());
+
+        // BinaryArrays can't be built from Vec<Option<&[u8]>>, so only call `values_required`
+        values_required::<BinaryArray, _>(many_vecs_iter, "binary_single_column");
+    }
+
+    #[test]
+    #[ignore] // Large Binary support isn't correct yet

Review comment:
       I'm currently working on adding some of this support in C++. Based on Parquet metadata alone there is not a great way to distinguish these types. The way C++ handles it is that on serialization the Arrow schema for the table is added as file metadata (base64-encoded). When reading back [we deserialize](https://github.com/apache/arrow/blob/477c1021ac013f22389baf9154fb9ad0cf814bec/cpp/src/parquet/arrow/schema.cc#L653) it and use the types there to make an additional determination. Currently we only use it for [extension types](https://github.com/apache/arrow/blob/477c1021ac013f22389baf9154fb9ad0cf814bec/cpp/src/parquet/arrow/schema.cc#L692).
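
   Translated to Rust, that approach might look roughly like the sketch below. `KeyValueMetadata` is a local stand-in for the Parquet key/value metadata type, `schema_ipc_bytes` would come from IPC-serializing the Arrow schema (not shown), and `base64::encode` assumes the `base64` crate; "ARROW:schema" is the key the C++ writer uses:

   ```rust
   // Local stand-in for a Parquet key/value file metadata entry.
   struct KeyValueMetadata {
       key: String,
       value: String,
   }

   // Embed the base64-encoded, IPC-serialized Arrow schema so a reader can
   // recover types (like LargeBinary/LargeUtf8) that Parquet can't express.
   fn arrow_schema_metadata(schema_ipc_bytes: &[u8]) -> KeyValueMetadata {
       KeyValueMetadata {
           key: "ARROW:schema".to_string(),
           value: base64::encode(schema_ipc_bytes),
       }
   }
   ```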




----------------------------------------------------------------



[GitHub] [arrow] andygrove edited a comment on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
andygrove edited a comment on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704320070


   @nevi-me @carols10cents I ran the merge tool and the changes appear to have been committed to the branch but this PR did not get closed, so I am closing it manually.


----------------------------------------------------------------



[GitHub] [arrow] carols10cents commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703596937


   @nevi-me you should have an invite!


----------------------------------------------------------------



[GitHub] [arrow] carols10cents commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-702815429


   cc @nevi-me 


----------------------------------------------------------------



[GitHub] [arrow] nevi-me commented on pull request #8330: ARROW-10191: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-704204575


   @carols10cents could you please create a JIRA account at https://issues.apache.org so that we can assign the tasks you've worked on to you? Thanks :)


----------------------------------------------------------------



[GitHub] [arrow] nevi-me commented on pull request #8330: [Rust] [Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8330:
URL: https://github.com/apache/arrow/pull/8330#issuecomment-703063319


   I rebased the parquet branch against the main one, but now I seem to have picked up additional commits on this PR; a rebase should fix that. @carols10cents, could you please grant me write permission on Integer32's fork so I can rebase there?


----------------------------------------------------------------