Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/03/04 14:10:36 UTC

[GitHub] [arrow] wgtmac commented on a diff in pull request #34416: GH-34262: [C++][ORC] Support union type

wgtmac commented on code in PR #34416:
URL: https://github.com/apache/arrow/pull/34416#discussion_r1125476231


##########
cpp/src/arrow/adapters/orc/adapter_test.cc:
##########
@@ -840,6 +840,23 @@ TEST_F(TestORCWriterSingleArray, WriteListOfMap) {
   AssertArrayWriteReadEqual(array, array, kDefaultSmallMemStreamSize * 10);
 }
 
+TEST_F(TestORCWriterSingleArray, WriteSparseUnion) {
+  const int64_t num_rows = 1024;
+  auto type =
+      sparse_union({field("_union_0", utf8()), field("_union_1", int32())}, {0, 1});
+  auto array = checked_pointer_cast<SparseUnionArray>(rand.ArrayOf(type, num_rows, 0.4));
+  ArrayVector children;
+  for (int i = 0; i < array->num_fields(); ++i) {
+    ASSERT_OK_AND_ASSIGN(auto flattened_child, array->GetFlattenedField(i));
+    children.emplace_back(std::move(flattened_child));
+  }
+  auto flattened_array = std::make_shared<SparseUnionArray>(
+      array->type(), array->length(), std::move(children), array->type_codes(),
+      array->offset());

Review Comment:
   The random array generator fills the unselected slots in the child arrays of a `SparseUnionArray` with random values. However, ORC files only contain a dense union type, so these unselected values are never written to the file, and we can never read them back to compare for equality in the unit test. In this PR, I set unselected values to null when reading from the file. Flattening the `SparseUnionArray` before writing therefore makes the round-trip equality check of a `SparseUnionArray` straightforward.
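
To illustrate the reasoning, here is a minimal standalone sketch (C++17) of why the raw random array cannot round-trip while the flattened one can. It is a toy model, not the Arrow or ORC API: `ToySparseUnion`, `Flatten`, `RoundTripThroughDense`, and `Equal` are hypothetical names invented for this example, and the "dense" write/read step only mimics the behavior described above (selected values persisted, unselected slots read back as null).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model of a two-child sparse union (NOT the Arrow API): every child has
// one slot per row, and type_ids[i] names the child selected at row i
// (0 = string child, anything else = int child).
struct ToySparseUnion {
  std::vector<int8_t> type_ids;
  std::vector<std::optional<std::string>> strings;  // child 0
  std::vector<std::optional<int32_t>> ints;         // child 1
};

// Null out every slot that is not selected by type_ids. This mirrors what
// flattening the children achieves in the test: unselected values disappear.
ToySparseUnion Flatten(const ToySparseUnion& in) {
  assert(in.strings.size() == in.type_ids.size());
  assert(in.ints.size() == in.type_ids.size());
  ToySparseUnion out = in;
  for (std::size_t i = 0; i < in.type_ids.size(); ++i) {
    if (in.type_ids[i] == 0) {
      out.ints[i] = std::nullopt;
    } else {
      out.strings[i] = std::nullopt;
    }
  }
  return out;
}

// Simulated write/read through a dense layout (an assumption of this sketch).
ToySparseUnion RoundTripThroughDense(const ToySparseUnion& in) {
  // "Write": keep only the selected value of each row.
  std::vector<std::optional<std::string>> dense_strings;
  std::vector<std::optional<int32_t>> dense_ints;
  for (std::size_t i = 0; i < in.type_ids.size(); ++i) {
    if (in.type_ids[i] == 0) {
      dense_strings.push_back(in.strings[i]);
    } else {
      dense_ints.push_back(in.ints[i]);
    }
  }
  // "Read": rebuild full-length children with null in every unselected slot.
  ToySparseUnion out;
  out.type_ids = in.type_ids;
  std::size_t next_string = 0;
  std::size_t next_int = 0;
  for (int8_t id : in.type_ids) {
    if (id == 0) {
      out.strings.push_back(dense_strings[next_string++]);
      out.ints.push_back(std::nullopt);
    } else {
      out.strings.push_back(std::nullopt);
      out.ints.push_back(dense_ints[next_int++]);
    }
  }
  return out;
}

// Slot-by-slot equality, including the unselected slots.
bool Equal(const ToySparseUnion& a, const ToySparseUnion& b) {
  return a.type_ids == b.type_ids && a.strings == b.strings && a.ints == b.ints;
}
```

In this model, an input whose unselected slots hold random values compares unequal to its own round-trip result, whereas `Flatten(input)` compares equal to its round-trip result, which is the property the unit test relies on.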


