Posted to jira@arrow.apache.org by "&res (Jira)" <ji...@apache.org> on 2021/07/11 12:13:00 UTC
[jira] [Created] (ARROW-13301) BaseListBuilder constructor should check the provided type is a list
&res created ARROW-13301:
----------------------------
Summary: BaseListBuilder constructor should check the provided type is a list
Key: ARROW-13301
URL: https://issues.apache.org/jira/browse/ARROW-13301
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 4.0.1
Reporter: &res
I've noticed that I can create a ListBuilder with a type that is not a ListType (in particular a StructType).
I'm talking about the following constructor:
{code:c++}
BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                const std::shared_ptr<DataType>& type)
{code}
I think this constructor should enforce that the given type is a ListType.
It could also enforce that the element type of the given ListType matches the type of the value_builder.
Alternatively, that constructor could be made private (since `BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder)` should be enough for most use cases).
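The check proposed above could look roughly like the sketch below. It uses simplified stand-in types so that it compiles on its own: `TypeId`, `DataType`, `Field`, `MakeType`, `CheckListBuilderType` and `IsAccepted` are all hypothetical names for illustration, not Arrow's classes. In Arrow itself the first test would presumably compare `type->id()` against `Type::LIST`, and because a constructor cannot return a Status, the check might have to live in a factory function or a debug assertion instead.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT Arrow's classes.
enum class TypeId { INT32, STRING, STRUCT, LIST };

struct DataType;

struct Field {
  std::string name;
  std::shared_ptr<DataType> type;
};

struct DataType {
  TypeId id;
  std::vector<Field> children;  // struct fields, or the single list item field
};

std::shared_ptr<DataType> MakeType(TypeId id, std::vector<Field> children = {}) {
  auto type = std::make_shared<DataType>();
  type->id = id;
  type->children = std::move(children);
  return type;
}

// Hypothetical check: reject anything that is not a list type, then require
// the list's item type to agree with the value builder's type.
// The comparison is shallow (type ids only) to keep the sketch short.
void CheckListBuilderType(const std::shared_ptr<DataType>& type,
                          const std::shared_ptr<DataType>& value_type) {
  if (!type || !value_type || type->id != TypeId::LIST) {
    throw std::invalid_argument("ListBuilder requires a list type");
  }
  if (type->children.size() != 1 ||
      type->children[0].type->id != value_type->id) {
    throw std::invalid_argument("list item type does not match value builder");
  }
}

// Convenience wrapper that turns the exception into a bool.
bool IsAccepted(const std::shared_ptr<DataType>& type,
                const std::shared_ptr<DataType>& value_type) {
  try {
    CheckListBuilderType(type, value_type);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

With this in place, passing the struct type where a list type is expected is rejected up front instead of silently producing a builder.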
Here's an example where I'm trying to create a "ListType(list<item: struct<return_code: int32, message: string>>)".
When I create the ListBuilder, I've noticed that it works with type set to either of:
# ListType(list<item: struct<return_code: int32, message: string>>)
# StructType(struct<return_code: int32, message: string>)
In the first case the underlying type is: ListType(list<item: struct<return_code: int32, message: string>>)
But in the second case the underlying type is ListType(list<return_code: struct<return_code: int32, message: string>>). The subtle difference is that the ListType field name has been changed from item to the name of the first field of the struct (return_code).
I think it's because BaseListBuilder uses `type->field(0)` to get the name of the list field, but it uses `value_builder_->type()` to get the type.
See:
{code:c++}
BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                const std::shared_ptr<DataType>& type)
    : ArrayBuilder(pool),
      offsets_builder_(pool),
      value_builder_(value_builder),
      value_field_(type->field(0)->WithType(NULLPTR)) {}
// ...
std::shared_ptr<DataType> type() const override {
  return std::make_shared<TYPE>(value_field_->WithType(value_builder_->type()));
}
{code}
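To make the mechanism easier to see in isolation, here is a toy model of the same pattern. It is only a sketch: `ToyType`, `ToyFields`, `Make`, `ToString` and `ToyListBuilder` are hypothetical stand-ins, not Arrow code. As in the constructor quoted above, the toy builder keeps only the name of field 0 of whatever type it is given, and `type()` pairs that name with the value builder's type.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration only; these are NOT Arrow's classes.
struct ToyType;
using ToyFields = std::vector<std::pair<std::string, std::shared_ptr<ToyType>>>;

struct ToyType {
  std::string name;  // e.g. "int32", "struct" or "list"
  ToyFields fields;  // (field name, field type) pairs; empty for leaf types
};

std::shared_ptr<ToyType> Make(std::string name, ToyFields fields = {}) {
  auto type = std::make_shared<ToyType>();
  type->name = std::move(name);
  type->fields = std::move(fields);
  return type;
}

// Renders a type as e.g. "list<item: int32>".
std::string ToString(const ToyType& type) {
  if (type.fields.empty()) return type.name;
  std::string out = type.name + "<";
  for (std::size_t i = 0; i < type.fields.size(); ++i) {
    if (i > 0) out += ", ";
    out += type.fields[i].first + ": " + ToString(*type.fields[i].second);
  }
  return out + ">";
}

class ToyListBuilder {
 public:
  // Like the quoted constructor: only the *name* of field 0 is kept,
  // regardless of whether `type` is actually a list.
  ToyListBuilder(std::shared_ptr<ToyType> value_type, const ToyType& type)
      : value_type_(std::move(value_type)),
        item_name_(type.fields.at(0).first) {}

  // Like the quoted type(): the kept name is paired with the value type.
  ToyType type() const {
    ToyType out;
    out.name = "list";
    out.fields.emplace_back(item_name_, value_type_);
    return out;
  }

 private:
  std::shared_ptr<ToyType> value_type_;
  std::string item_name_;
};
```

Given a list type, the toy builder reports `list<item: struct<...>>`; given the struct type, field 0 is `return_code`, so it reports `list<return_code: struct<...>>`, which is the renaming described above.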
Here's an example that reproduces the issue:
{code}
// Assumes BOOST_TEST_MODULE is defined elsewhere in the test binary.
#include <boost/test/unit_test.hpp>
#include <arrow/api.h>

BOOST_AUTO_TEST_CASE(IsThereABugWithArrays) {
  const arrow::FieldVector fields = {
      arrow::field("return_code", arrow::int32()),
      arrow::field("message", arrow::utf8())};
  const std::shared_ptr<arrow::DataType> struct_data_type =
      arrow::struct_(fields);
  const std::shared_ptr<arrow::DataType> list_of_struct_data_type =
      arrow::list(struct_data_type);
  const std::shared_ptr<arrow::Schema> schema =
      arrow::schema({arrow::field("search_results", list_of_struct_data_type)});
  arrow::MemoryPool *pool = arrow::default_memory_pool();
  std::shared_ptr<arrow::Int32Builder> return_code_builder =
      std::make_shared<arrow::Int32Builder>(pool);
  std::shared_ptr<arrow::StringBuilder> message_builder =
      std::make_shared<arrow::StringBuilder>(pool);
  std::vector<std::shared_ptr<arrow::ArrayBuilder>> struct_fields_builders = {
      return_code_builder, message_builder};
  std::shared_ptr<arrow::StructBuilder> struct_builder =
      std::make_shared<arrow::StructBuilder>(
          struct_data_type, pool, struct_fields_builders);
  // Expected usage: the builder is given a list type.
  std::shared_ptr<arrow::ListBuilder> list_builder(
      std::make_shared<arrow::ListBuilder>(
          pool, struct_builder, list_of_struct_data_type));
  BOOST_REQUIRE(list_builder->type()->Equals(list_of_struct_data_type));
  // This should not be allowed:
  std::shared_ptr<arrow::ListBuilder> list_builder_using_struct_dtype(
      std::make_shared<arrow::ListBuilder>(
          pool, struct_builder, struct_data_type));
  // The list field ends up named after the struct's first field.
  std::shared_ptr<arrow::DataType> wrong_data_type =
      std::make_shared<arrow::ListType>(
          arrow::field("return_code", struct_data_type));
  BOOST_REQUIRE(!list_builder_using_struct_dtype->type()->Equals(list_of_struct_data_type));
  BOOST_REQUIRE(list_builder_using_struct_dtype->type()->Equals(wrong_data_type));
}
{code}