Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/16 16:30:00 UTC

[GitHub] [arrow] pitrou opened a new pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

pitrou opened a new pull request #8939:
URL: https://github.com/apache/arrow/pull/8939


   An empty struct type (with no child fields) is not easy to write in Parquet,
   since Parquet only represents the data of leaf nodes.
   We would need a way to distinguish between null and non-null (empty) struct values.
   It would probably require a dummy primitive node.
   
   Until we implement such a solution, raise a clear `NotImplemented` error
   when an empty struct is encountered.
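
   To illustrate the description above, here is a minimal sketch of why an empty struct has nowhere to store its null/non-null information. It assumes a simplified schema tree; `ToyNode`, `Leaf`, `Group` and `CountLeafColumns` are hypothetical names for illustration and are not part of the Parquet C++ API. Since only leaf nodes carry data, a group with no children contributes zero columns.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a Parquet schema node.
// This is NOT parquet::schema::Node; it only models the tree shape.
struct ToyNode {
  std::string name;
  bool is_group = false;                           // true for struct-like nodes
  std::vector<std::shared_ptr<ToyNode>> children;  // used only by groups
};

// Builds a primitive (leaf) node.
std::shared_ptr<ToyNode> Leaf(std::string name) {
  auto node = std::make_shared<ToyNode>();
  node->name = std::move(name);
  node->is_group = false;
  return node;
}

// Builds a group node, possibly with no children at all.
std::shared_ptr<ToyNode> Group(std::string name,
                               std::vector<std::shared_ptr<ToyNode>> children) {
  auto node = std::make_shared<ToyNode>();
  node->name = std::move(name);
  node->is_group = true;
  node->children = std::move(children);
  return node;
}

// Counts the leaf columns under a node. In this model, leaf columns are the
// only place where values (and hence nullness information) can be stored.
int CountLeafColumns(const ToyNode& node) {
  if (!node.is_group) {
    return 1;
  }
  int total = 0;
  for (const auto& child : node.children) {
    assert(child != nullptr);
    total += CountLeafColumns(*child);
  }
  return total;
}
```

   In this sketch, a struct with two primitive fields yields two leaf columns, while an empty struct yields none, so a null struct value and a non-null empty one would be indistinguishable unless a dummy primitive node were added.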


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#discussion_r544488947



##########
File path: cpp/src/parquet/schema.cc
##########
@@ -550,11 +550,11 @@ std::unique_ptr<Node> Unflatten(const format::SchemaElement* elements, int lengt
     int field_id = current_id++;
     const void* opaque_element = static_cast<const void*>(&element);
 
-    if (element.num_children == 0) {
+    if (!element.__isset.num_children) {

Review comment:
       Hmm, why would it be an or? If `num_children == 0`, this can still be a group node.
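
       The difference between the checks discussed here can be sketched as follows. This is only an illustration under assumptions: `ToySchemaElement` is a hypothetical stand-in for the Thrift-generated `format::SchemaElement`, its presence flag is named `isset` rather than `__isset`, and the default value of 0 for an unset `num_children` is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the presence flags of a Thrift-generated struct.
struct ToyIsSet {
  bool num_children = false;
};

// Hypothetical stand-in for format::SchemaElement (only the relevant field).
struct ToySchemaElement {
  int32_t num_children = 0;  // assumed to default to 0 when unset
  ToyIsSet isset;            // the generated code calls this member __isset
};

// Helper for building test elements.
ToySchemaElement MakeElement(bool has_num_children, int32_t num_children) {
  assert(num_children >= 0);
  ToySchemaElement element;
  element.isset.num_children = has_num_children;
  element.num_children = has_num_children ? num_children : 0;
  return element;
}

// Check removed by the diff: any element with zero children is a leaf.
bool IsLeafByValue(const ToySchemaElement& e) { return e.num_children == 0; }

// Check added by the diff: only an element without the field is a leaf.
bool IsLeafByPresence(const ToySchemaElement& e) {
  return !e.isset.num_children;
}

// The "or" variant raised in review: unset, or set to zero.
bool IsLeafByEither(const ToySchemaElement& e) {
  return !e.isset.num_children || e.num_children == 0;
}
```

       With these definitions, an element that explicitly sets `num_children` to 0 is classified as a leaf by the value-based and "or" checks, but as a (childless) group by the presence-based check. An element that never sets the field is a leaf under all three.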







[GitHub] [arrow] pitrou commented on a change in pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#discussion_r544488560



##########
File path: cpp/src/parquet/arrow/schema.cc
##########
@@ -113,12 +114,21 @@ Status StructToNode(const std::shared_ptr<::arrow::StructType>& type,
                     const WriterProperties& properties,
                     const ArrowWriterProperties& arrow_properties, NodePtr* out) {
   std::vector<NodePtr> children(type->num_fields());
-  for (int i = 0; i < type->num_fields(); i++) {
-    RETURN_NOT_OK(FieldToNode(type->field(i)->name(), type->field(i), properties,
-                              arrow_properties, &children[i]));
+  if (type->num_fields() != 0) {
+    for (int i = 0; i < type->num_fields(); i++) {
+      RETURN_NOT_OK(FieldToNode(type->field(i)->name(), type->field(i), properties,
+                                arrow_properties, &children[i]));
+    }
+  } else {
+    // XXX (ARROW-10928) We could add a dummy primitive node but that would
+    // require special handling when writing and reading, to avoid column index
+    // mismatches.
+    return Status::NotImplemented(
+        "Cannot write struct type with no child fields to Parquet. "

Review comment:
       Hmm, I can add the struct field name to the message, but I'm not sure I understand the suggestion about "dummy".
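
       One way the field name could be included in the error is sketched below. This is a hedged illustration, not the code in this pull request: `ToyStatus` and `CheckStructWritable` are hypothetical names, the message wording is invented for the example, and it assumes the field name is available at the point of the check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for arrow::Status.
struct ToyStatus {
  bool ok;
  std::string message;
};

// Rejects a struct field with no children, naming the field in the error.
// Assumption: the caller knows the field name at this point.
ToyStatus CheckStructWritable(const std::string& field_name,
                              const std::vector<std::string>& child_names) {
  if (child_names.empty()) {
    ToyStatus status;
    status.ok = false;
    status.message = "Cannot write struct field '" + field_name +
                     "' with no child fields to Parquet.";
    return status;
  }
  ToyStatus status;
  status.ok = true;
  return status;
}

// Usage example exercising both branches.
void ExampleUsage() {
  assert(!CheckStructWritable("payload", {}).ok);
  assert(CheckStructWritable("payload", {"id"}).ok);
}
```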







[GitHub] [arrow] emkornfield commented on a change in pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#discussion_r544474953



##########
File path: cpp/src/parquet/schema.cc
##########
@@ -550,11 +550,11 @@ std::unique_ptr<Node> Unflatten(const format::SchemaElement* elements, int lengt
     int field_id = current_id++;
     const void* opaque_element = static_cast<const void*>(&element);
 
-    if (element.num_children == 0) {
+    if (!element.__isset.num_children) {

Review comment:
       Should this be an "or" (field unset, or `num_children == 0`)?







[GitHub] [arrow] emkornfield commented on a change in pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#discussion_r544474457



##########
File path: cpp/src/parquet/arrow/schema.cc
##########
@@ -113,12 +114,21 @@ Status StructToNode(const std::shared_ptr<::arrow::StructType>& type,
                     const WriterProperties& properties,
                     const ArrowWriterProperties& arrow_properties, NodePtr* out) {
   std::vector<NodePtr> children(type->num_fields());
-  for (int i = 0; i < type->num_fields(); i++) {
-    RETURN_NOT_OK(FieldToNode(type->field(i)->name(), type->field(i), properties,
-                              arrow_properties, &children[i]));
+  if (type->num_fields() != 0) {
+    for (int i = 0; i < type->num_fields(); i++) {
+      RETURN_NOT_OK(FieldToNode(type->field(i)->name(), type->field(i), properties,
+                                arrow_properties, &children[i]));
+    }
+  } else {
+    // XXX (ARROW-10928) We could add a dummy primitive node but that would
+    // require special handling when writing and reading, to avoid column index
+    // mismatches.
+    return Status::NotImplemented(
+        "Cannot write struct type with no child fields to Parquet. "

Review comment:
       Would the struct field have a name here? Consider removing "dummy" from the message.







[GitHub] [arrow] pitrou commented on a change in pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#discussion_r544489391



##########
File path: cpp/src/parquet/schema.cc
##########
@@ -550,11 +550,11 @@ std::unique_ptr<Node> Unflatten(const format::SchemaElement* elements, int lengt
     int field_id = current_id++;
     const void* opaque_element = static_cast<const void*>(&element);
 
-    if (element.num_children == 0) {
+    if (!element.__isset.num_children) {

Review comment:
       That said, the new check seems to cause failures in some Python Parquet tests (perhaps because of legacy files?). Let me see.







[GitHub] [arrow] pitrou closed pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8939:
URL: https://github.com/apache/arrow/pull/8939


   





[GitHub] [arrow] github-actions[bot] commented on pull request #8939: ARROW-10928: [C++] Better Parquet error when trying to write empty struct

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8939:
URL: https://github.com/apache/arrow/pull/8939#issuecomment-746600174


   https://issues.apache.org/jira/browse/ARROW-10928

