Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/06/13 20:51:20 UTC

[RESULT] [VOTE] Formalizing "Extension Type" metadata in Arrow binary protocol

The vote carries with 3 binding +1 votes and 3 non-binding +1 votes.

Thanks everyone

On Thu, Jun 13, 2019 at 12:12 AM Sutou Kouhei <ko...@clear-code.com> wrote:
>
> +1
>
> In <CA...@mail.gmail.com>
>   "[VOTE] Formalizing "Extension Type" metadata in Arrow binary protocol" on Mon, 10 Jun 2019 15:28:22 -0500,
>   Wes McKinney <we...@gmail.com> wrote:
>
> > hi folks,
> >
> > In two mailing list threads [1] [2] we have discussed adding an
> > "extension type" mechanism to the Arrow binary/IPC protocol. The idea
> > is to be able to "annotate" built-in Arrow data types with a type name
> > and serialized type data/metadata so that users can implement their
> > own custom columnar data containers that contain application-defined
> > business logic not built into the Arrow libraries. This is designed
> > to be non-obtrusive: readers who are not aware of an extension type
> > can interact with the built-in Arrow type opaquely, and propagate the
> > extension metadata unmodified.
> >
> > As two examples:
> >
> > * "uuid" may annotate "fixed size binary of value width 16 bytes"
> > * "latitude-longitude" may annotate "struct<lat: double, lon: double>"
> > or similar
> >
> > An implementation may provide specialized columnar containers with
> > additional business logic around manipulating such data in-memory as
> > required for application development.
> >
> > We also have prototype implementations of this mechanism ready to go
> > in C++ and Java. I have proposed language additions to the
> > specification [3] and the C++ implementation with the following
> > tenets:
> >
> > - The custom_metadata Flatbuffers field shall use the colon character
> > ":" as a namespace separator
> > - "ARROW" is designated as a reserved namespace in custom_metadata,
> > for example "ARROW:property"
> > - There may be multiple levels of namespacing, for example:
> > "ARROW:myorg:property_name"
> > - Extension type fields "ARROW:extension:name" and
> > "ARROW:extension:metadata" are reserved in custom_metadata to enable
> > serialization of extension type information
> > - The details of implementation and how extension types are exposed to
> > library users are implementation-dependent
> >
> > Please vote to accept these changes (see [3] for the actual changes).
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1: Adopt these changes into the Arrow columnar format specification
> > [ ] +0: . . .
> > [ ] -1: I disagree because . . .
> >
> > Here is my vote: +1
> >
> > [1]: https://lists.apache.org/thread.html/96c3f5fe64f45a4c5ccac0562dbfd356b76cd722aa521100b5988d40@%3Cdev.arrow.apache.org%3E
> > [2]: https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
> > [3]: https://github.com/apache/arrow/pull/4332
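
For readers skimming the archive, the tenets in the quoted proposal can be
illustrated with a minimal sketch. It is not taken from the C++ or Java
prototypes: custom_metadata is modeled here as a plain Python dict of strings
(an assumption; the real field is a Flatbuffers key/value list), the helper
names split_key, is_reserved, extension_info and passthrough are hypothetical,
and treating a missing "ARROW:extension:metadata" entry as an empty string is
also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Key names taken from the proposal text.
RESERVED_NAMESPACE = "ARROW"
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


def split_key(key: str) -> List[str]:
    """Split a custom_metadata key into namespace levels on ':' (hypothetical helper)."""
    return key.split(":")


def is_reserved(key: str) -> bool:
    """True if the key's top-level namespace is the reserved "ARROW" namespace."""
    return split_key(key)[0] == RESERVED_NAMESPACE


def extension_info(custom_metadata: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (extension name, serialized metadata), or None for a plain field.

    Assumption: a missing metadata entry is treated as an empty string.
    """
    name = custom_metadata.get(EXT_NAME_KEY)
    if name is None:
        return None
    return name, custom_metadata.get(EXT_METADATA_KEY, "")


def passthrough(storage_type: str, custom_metadata: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Model a reader that does not know the extension type.

    It works with the built-in storage type and copies the metadata unchanged.
    """
    return storage_type, dict(custom_metadata)


# "uuid" annotating a 16-byte fixed-size binary field, as in the proposal's example.
# The storage type is written as a descriptive string for illustration only.
uuid_metadata = {
    EXT_NAME_KEY: "uuid",
    EXT_METADATA_KEY: "",
    "myapp:owner": "team-a",  # hypothetical application-defined key
}
uuid_storage = "fixed_size_binary[16]"

forwarded_type, forwarded_metadata = passthrough(uuid_storage, uuid_metadata)
```

An extension-aware reader would call extension_info to decide which container
to build, while an unaware reader only needs passthrough, which is what keeps
the mechanism non-obtrusive.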