Posted to jira@arrow.apache.org by "&res (Jira)" <ji...@apache.org> on 2021/07/11 12:13:00 UTC

[jira] [Created] (ARROW-13301) BaseListBuilder constructor should check the provided type is a list

&res created ARROW-13301:
----------------------------

             Summary: BaseListBuilder constructor should check the provided type is a list 
                 Key: ARROW-13301
                 URL: https://issues.apache.org/jira/browse/ARROW-13301
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 4.0.1
            Reporter: &res


I've noticed that I can create a ListBuilder with a type that is not a ListType (in particular a StructType). 

I'm talking about the following constructor:
{code:c++}
BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                  const std::shared_ptr<DataType>& type)
 {code}

I think this constructor should enforce that the given type is a ListType. 
It could also enforce that the value type of the given ListType matches the type of the value_builder. 
Alternatively, that constructor could be made private, since `BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder)` should be enough for most use cases.


Here's an example where I'm trying to create a "ListType(list<item: struct<return_code: int32, message: string>>)".

When I create the ListBuilder, I've noticed that it works with the type set to either:
# ListType(list<item: struct<return_code: int32, message: string>>) 
# StructType(struct<return_code: int32, message: string>)

In the first case the underlying type is: ListType(list<item: struct<return_code: int32, message: string>>)

But in the second case the underlying type is ListType(list<return_code: struct<return_code: int32, message: string>>). The subtle difference is that the ListType field name has changed from item to the name of the first field of the struct (return_code).

I think this is because BaseListBuilder uses `type->field(0)` to get the name of the list field, but `value_builder_->type()` to get its type.

See: 

{code:c++}
  BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                  const std::shared_ptr<DataType>& type)
      : ArrayBuilder(pool),
        offsets_builder_(pool),
        value_builder_(value_builder),
        value_field_(type->field(0)->WithType(NULLPTR)) {}
// ...
std::shared_ptr<DataType> type() const override {
    return std::make_shared<TYPE>(value_field_->WithType(value_builder_->type()));
  }

{code}
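
To make the mechanism easier to see in isolation, here is a minimal sketch that does not use Arrow at all. ToyType, ToyField, ToyListBuilder and MakeCheckedListBuilder are hypothetical stand-ins I made up for illustration; they only mimic the two lines quoted above (name from `type->field(0)`, type from the value builder), plus the kind of check I'm suggesting. It is not how Arrow implements or should implement it.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::Field / arrow::DataType (NOT the Arrow classes).
struct ToyType;
using TypePtr = std::shared_ptr<ToyType>;

struct ToyField {
  std::string name;
  TypePtr type;
};

struct ToyType {
  enum class Kind { Int32, String, Struct, List };
  Kind kind;
  std::vector<ToyField> fields;  // struct members, or the single list item field

  std::string ToString() const {
    if (kind == Kind::Int32) return "int32";
    if (kind == Kind::String) return "string";
    std::string out = (kind == Kind::Struct) ? "struct<" : "list<";
    for (std::size_t i = 0; i < fields.size(); ++i) {
      if (i > 0) out += ", ";
      out += fields[i].name + ": " + fields[i].type->ToString();
    }
    return out + ">";
  }
};

TypePtr Primitive(ToyType::Kind kind) {
  return std::make_shared<ToyType>(ToyType{kind, {}});
}
TypePtr StructOf(std::vector<ToyField> fields) {
  return std::make_shared<ToyType>(ToyType{ToyType::Kind::Struct, std::move(fields)});
}
TypePtr ListOf(TypePtr item) {
  return std::make_shared<ToyType>(
      ToyType{ToyType::Kind::List, {ToyField{"item", std::move(item)}}});
}

// Mimics the quoted constructor: the item NAME is read from type->field(0),
// while the item TYPE always comes from the value builder.
class ToyListBuilder {
 public:
  ToyListBuilder(TypePtr value_type, const TypePtr& type)
      : value_type_(std::move(value_type)), item_name_(type->fields.at(0).name) {}

  TypePtr type() const {
    return std::make_shared<ToyType>(
        ToyType{ToyType::Kind::List, {ToyField{item_name_, value_type_}}});
  }

 private:
  TypePtr value_type_;
  std::string item_name_;
};

// Hypothetical checked factory: rejects a non-list type, and a list type whose
// item type differs from the value type (compared by string, for brevity).
std::optional<ToyListBuilder> MakeCheckedListBuilder(TypePtr value_type,
                                                     const TypePtr& type) {
  if (!value_type || !type || type->kind != ToyType::Kind::List) return std::nullopt;
  if (type->fields.size() != 1) return std::nullopt;
  if (type->fields[0].type->ToString() != value_type->ToString()) return std::nullopt;
  return ToyListBuilder(std::move(value_type), type);
}
```

With a struct of (return_code, message) as the value type, passing the struct itself as `type` to ToyListBuilder yields a list whose item field is named return_code, while the checked factory returns an empty optional for that same call.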

Here's an example that reproduces the issue:

{code:c++}
BOOST_AUTO_TEST_CASE(IsThereABugWithArrays) {
  const arrow::FieldVector fields = {
      arrow::field("return_code", arrow::int32()),
      arrow::field("message", arrow::utf8())};

  const std::shared_ptr<arrow::DataType> struct_data_type =
      arrow::struct_(fields);
  const std::shared_ptr<arrow::DataType> list_of_struct_data_type =
      arrow::list(struct_data_type);

  const std::shared_ptr<arrow::Schema> schema =
      arrow::schema({arrow::field("search_results", list_of_struct_data_type)});

  arrow::MemoryPool *pool = arrow::default_memory_pool();

  std::shared_ptr<arrow::Int32Builder> return_code_builder =
      std::make_shared<arrow::Int32Builder>(pool);
  std::shared_ptr<arrow::StringBuilder> message_builder =
      std::make_shared<arrow::StringBuilder>(pool);
  std::vector<std::shared_ptr<arrow::ArrayBuilder>> struct_fields_builders = {
      return_code_builder, message_builder};


  std::shared_ptr<arrow::StructBuilder> struct_builder =
      std::make_shared<arrow::StructBuilder>(
          struct_data_type, pool, struct_fields_builders);
  std::shared_ptr<arrow::ListBuilder> list_builder(
      std::make_shared<arrow::ListBuilder>(
          pool, struct_builder, list_of_struct_data_type));

  BOOST_REQUIRE(list_builder->type()->Equals(list_of_struct_data_type));

  // This should not be allowed:
  std::shared_ptr<arrow::ListBuilder> list_builder_using_struct_dtype(
      std::make_shared<arrow::ListBuilder>(
          pool, struct_builder, struct_data_type));

  std::shared_ptr<arrow::DataType> wrong_data_type = std::make_shared<arrow::ListType> (
      arrow::field("return_code", struct_data_type)
      );

  BOOST_REQUIRE(!list_builder_using_struct_dtype->type()->Equals(list_of_struct_data_type));
  BOOST_REQUIRE(list_builder_using_struct_dtype->type()->Equals(wrong_data_type));

}
{code}









--
This message was sent by Atlassian Jira
(v8.3.4#803005)