You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/16 08:17:47 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #8178: ARROW-9969: [C++] RecordBatch schema has correct types

pitrou commented on a change in pull request #8178:
URL: https://github.com/apache/arrow/pull/8178#discussion_r489250800



##########
File path: cpp/src/arrow/table_builder.cc
##########
@@ -62,7 +62,21 @@ Status RecordBatchBuilder::Flush(bool reset_builders,
     }
     length = fields[i]->length();
   }
-  *batch = RecordBatch::Make(schema_, length, std::move(fields));
+
+  // For certain types like dictionaries, types may not be fully
+  // determined before we have flushed. Make sure that the RecordBatch
+  // gets the correct types in schema.
+  // See: #ARROW-9969
+  std::shared_ptr<Schema> schema(schema_);
+  std::vector<std::shared_ptr<Field>> schema_fields(schema->fields());
+  for (int i = 0; i < this->num_fields(); ++i) {
+    if (!schema_fields[i]->type()->Equals(fields[i]->type())) {
+      std::shared_ptr<Field> new_type(schema_fields[i]->WithType(fields[i]->type()));
+      ARROW_ASSIGN_OR_RAISE(schema, schema->SetField(i, new_type));
+    }
+  }

Review comment:
       For the record, the entire loop may be O(N^2) if there are N types to replace.
   It may be better to do it in two steps:
   1. detect if at least one type needs replacing
   2. if so, create a new schema with a new fields vector
   
   Or you could always create a new schema anyway...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org