Posted to dev@arrow.apache.org by Kenta Murata <mu...@gmail.com> on 2020/08/03 02:31:30 UTC

[DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type

Hi folks,

When given a dictionary type, the arrow::MakeBuilder function creates a
dictionary builder backed by AdaptiveIntBuilder and ignores the bit width
of the DictionaryType's index type.
I want to know whether this behavior is intentional.

I think this behavior is useful when I want a dictionary builder
backed by AdaptiveIntBuilder.
But the result of the following code is a little surprising.

```cpp
#include <arrow/api.h>
#include <arrow/util/logging.h>
#include <iostream>
#include <memory>

int
main()
{
  // Request a dictionary type whose index type is explicitly int32.
  auto dict_type = arrow::dictionary(arrow::int32(), arrow::utf8());
  std::unique_ptr<arrow::ArrayBuilder> out;
  ARROW_CHECK_OK(arrow::MakeBuilder(arrow::default_memory_pool(), dict_type, &out));
  // Print the type the builder actually reports.
  std::cout << "type: " << out->type()->ToString() << std::endl;
  return 0;
}
```

Running this code prints the following message.

    type: dictionary<values=string, indices=int8, ordered=0>

I got `indices=int8` even though the dictionary type has an int32 index type.
I guess most people would expect to get `indices=int32` here.

-- 
Kenta Murata

Re: [DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type

Posted by Kenta Murata <mr...@mrkn.jp>.
Agreed.

I created ARROW-9642 and opened a pull request for it.
https://github.com/apache/arrow/pull/7898

On Tue, Aug 4, 2020 at 6:32 Wes McKinney <we...@gmail.com> wrote:

>
> It seems useful to use the index type to set the starting bit width of
> the builder. I guess we can preserve the behavior of expanding to the
> next bit width when overflowing the smaller integer types.



--
Regards,
Kenta Murata

Re: [DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type

Posted by Wes McKinney <we...@gmail.com>.
It seems useful to use the index type to set the starting bit width of
the builder. I guess we can preserve the behavior of expanding to the
next bit width when overflowing the smaller integer types.