Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/08 16:55:53 UTC
[GitHub] [arrow] bkietz commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning
bkietz commented on a change in pull request #7608:
URL: https://github.com/apache/arrow/pull/7608#discussion_r451690654
##########
File path: cpp/src/arrow/dataset/partition.cc
##########
@@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory {
}
}
- return impl.Finish(&dictionaries_);
+ auto schema_result = impl.Finish(&dictionaries_);
+ field_names_ = impl.FieldNames();
+ return schema_result;
Review comment:
`Finish` doesn't mutate `name_to_index_`, and that is the only data member accessed by `FieldNames()`. I don't see why `Finish` needs to be called first.
```suggestion
field_names_ = impl.FieldNames();
return impl.Finish(&dictionaries_);
```
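To illustrate the ordering argument, here is a minimal standalone sketch. `ToyInspectImpl` is a hypothetical stand-in, not the real inspection class in partition.cc, and it assumes (as stated above) that `Finish` only reads `name_to_index_`. Under that assumption `FieldNames()` returns the same result whether it is called before or after `Finish`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the inspection helper; not Arrow's actual class.
class ToyInspectImpl {
 public:
  // Record a field name the first time it is seen, assigning the next index.
  void Insert(const std::string& name) {
    name_to_index_.emplace(name, static_cast<int>(name_to_index_.size()));
  }

  // Reads name_to_index_ only; returns names ordered by their index.
  std::vector<std::string> FieldNames() const {
    std::vector<std::string> names(name_to_index_.size());
    for (const auto& kv : name_to_index_) names[kv.second] = kv.first;
    return names;
  }

  // Marked const to encode the assumption that finishing does not
  // modify name_to_index_. Returns the number of fields (toy result).
  int Finish() const { return static_cast<int>(name_to_index_.size()); }

 private:
  std::map<std::string, int> name_to_index_;
};
```

If `Finish` did mutate `name_to_index_` in the real code, the original ordering would matter and the suggestion above would not apply.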
##########
File path: cpp/src/arrow/dataset/partition.cc
##########
@@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory {
}
}
- return impl.Finish(&dictionaries_);
+ auto schema_result = impl.Finish(&dictionaries_);
+ field_names_ = impl.FieldNames();
+ return schema_result;
}
Result<std::shared_ptr<Partitioning>> Finish(
const std::shared_ptr<Schema>& schema) const override {
- return std::shared_ptr<Partitioning>(new HivePartitioning(schema, dictionaries_));
+ for (FieldRef ref : field_names_) {
+ // ensure all of field_names_ are present in schema
+ RETURN_NOT_OK(ref.FindOne(*schema).status());
+ }
+
+ // drop fields which aren't in field_names_
+ auto out_schema = SchemaFromColumnNames(schema, field_names_);
+
+ return std::make_shared<HivePartitioning>(std::move(out_schema), dictionaries_);
Review comment:
The check against `field_names_` is only relevant if `dictionaries_` is non-empty, which can only occur if `Inspect` has been called (and `field_names_` has therefore been initialized).
```suggestion
if (dictionaries_.empty()) {
return std::make_shared<HivePartitioning>(schema, dictionaries_);
} else {
for (FieldRef ref : field_names_) {
// ensure all of field_names_ are present in schema
RETURN_NOT_OK(ref.FindOne(*schema).status());
}
// drop fields which aren't in field_names_
auto out_schema = SchemaFromColumnNames(schema, field_names_);
return std::make_shared<HivePartitioning>(std::move(out_schema), dictionaries_);
}
```
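The control flow of this suggestion can be sketched without Arrow. Everything below is hypothetical: `ToyFactory`, `ToyResult`, and the use of plain string vectors in place of `Schema` and the dictionaries are assumptions made only to keep the example self-contained. It shows the guard: with no dictionaries the schema passes through untouched, otherwise every discovered field must be present and the rest are dropped.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result type: ok == false stands in for a non-OK Status.
struct ToyResult {
  bool ok;
  std::vector<std::string> fields;
};

// Hypothetical stand-in for the factory state; not Arrow's real types.
struct ToyFactory {
  std::vector<std::string> field_names;   // filled in by Inspect
  std::vector<std::string> dictionaries;  // non-empty only after Inspect

  ToyResult Finish(const std::vector<std::string>& schema) const {
    if (dictionaries.empty()) {
      // Inspect was never called, so there is nothing to validate against.
      return ToyResult{true, schema};
    }
    for (const std::string& name : field_names) {
      // Ensure every discovered field is present in the schema.
      if (std::find(schema.begin(), schema.end(), name) == schema.end()) {
        return ToyResult{false, {}};
      }
    }
    // All discovered fields are present; keep only those.
    return ToyResult{true, field_names};
  }
};
```

The early return keeps the no-`Inspect` path identical to the old behavior, which is the point of guarding on `dictionaries_`.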