You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/29 17:00:01 UTC

[GitHub] [arrow] zeroshade opened a new pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

zeroshade opened a new pull request #10203:
URL: https://github.com/apache/arrow/pull/10203


   Getting the extension metadata recognized for the integration tests with extension types also had the side effect of being a solution for the custom metadata integration tests, so i've also enabled those for Go.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630327365



##########
File path: go/arrow/internal/testing/types/extension_types.go
##########
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package types contains user-defined types for use in the tests for the arrow package
+package types
+
+import (
+	"encoding/binary"
+	"fmt"
+	"reflect"
+
+	"github.com/apache/arrow/go/arrow"
+	"github.com/apache/arrow/go/arrow/array"
+	"golang.org/x/xerrors"
+)
+
+// UuidArray is a simple array which is a FixedSizeBinary(16)
+type UuidArray struct {

Review comment:
       small style nit: UUID seems more consistent with other abbreviation handling in Go?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630311178



##########
File path: go/arrow/array/extension.go
##########
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package array
+
+import (
+	"reflect"
+
+	"github.com/apache/arrow/go/arrow"
+	"github.com/apache/arrow/go/arrow/memory"
+	"golang.org/x/xerrors"
+)
+
+// ExtensionArray is the interface that needs to be implemented to handle
+// user-defined extension type arrays. In order to ensure consistency and
+// proper behavior, all ExtensionArray types must embed ExtensionArrayBase
+// in order to meet the interface which provides the default implementation
+// and handling for the array while allowing custom behavior to be built
+// on top of it.
+type ExtensionArray interface {
+	Interface
+	// ExtensionType returns the datatype as per calling DataType(), but
+	// already cast to ExtensionType
+	ExtensionType() arrow.ExtensionType
+	// Storage returns the underlying storage array for this array.
+	Storage() Interface
+
+	// by having a non-exported function in the interface, it means that
+	// consumers must embed ExtensionArrayBase in their structs in order
+	// to fulfill this interface.
+	mustEmbedExtensionArrayBase()
+}
+
+// two extension arrays are equal if their data types are equal and
+// their underlying storage arrays are equal.
+func arrayEqualExtension(l, r ExtensionArray) bool {
+	if !arrow.TypeEqual(l.DataType(), r.DataType()) {
+		return false
+	}
+
+	return ArrayEqual(l.Storage(), r.Storage())
+}
+
+// two extension arrays are approximately equal if their data types are
+// equal and their underlying storage arrays are approximately equal.
+func arrayApproxEqualExtension(l, r ExtensionArray, opt equalOption) bool {
+	if !arrow.TypeEqual(l.DataType(), r.DataType()) {
+		return false
+	}
+
+	return arrayApproxEqual(l.Storage(), r.Storage(), opt)
+}
+
+// NewExtensionArrayWithStorage constructs a new ExtensionArray from the provided
+// ExtensionType and uses the provided storage interface as the underlying storage.
+// This will not release the storage array passed in so consumers should call Release

Review comment:
       so users will generally call release right after construction?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630316132



##########
File path: go/arrow/compare_test.go
##########
@@ -27,7 +27,7 @@ func TestTypeEqual(t *testing.T) {
 		checkMetadata bool
 	}{
 		{
-			nil, nil, false, false,
+			nil, nil, true, false,

Review comment:
       why this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630564562



##########
File path: go/arrow/compare_test.go
##########
@@ -27,7 +27,7 @@ func TestTypeEqual(t *testing.T) {
 		checkMetadata bool
 	}{
 		{
-			nil, nil, false, false,
+			nil, nil, true, false,

Review comment:
       in order for simplifying the type comparison code when handling lists / extension types I made a slight change to the logic here so that if you call `TypeEqual` with both arguments being `nil` then it returns `true` that they are equal. I was actually surprised when i found that this wasn't the case already such that `TypeEqual(nil, nil)` previously returned false. Since it simplified the code in other areas and also makes sense for nil to be equal to nil for a call to `TypeEqual` I didn't see an issue with making that change which only affected this one unit test which expected the false for two nils.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-829511852


   https://issues.apache.org/jira/browse/ARROW-5385


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-840661181


   +1 thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630561454



##########
File path: go/arrow/array/extension.go
##########
@@ -0,0 +1,236 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package array
+
+import (
+	"reflect"
+
+	"github.com/apache/arrow/go/arrow"
+	"github.com/apache/arrow/go/arrow/memory"
+	"golang.org/x/xerrors"
+)
+
+// ExtensionArray is the interface that needs to be implemented to handle
+// user-defined extension type arrays. In order to ensure consistency and
+// proper behavior, all ExtensionArray types must embed ExtensionArrayBase
+// in order to meet the interface which provides the default implementation
+// and handling for the array while allowing custom behavior to be built
+// on top of it.
+type ExtensionArray interface {
+	Interface
+	// ExtensionType returns the datatype as per calling DataType(), but
+	// already cast to ExtensionType
+	ExtensionType() arrow.ExtensionType
+	// Storage returns the underlying storage array for this array.
+	Storage() Interface
+
+	// by having a non-exported function in the interface, it means that
+	// consumers must embed ExtensionArrayBase in their structs in order
+	// to fulfill this interface.
+	mustEmbedExtensionArrayBase()
+}
+
+// two extension arrays are equal if their data types are equal and
+// their underlying storage arrays are equal.
+func arrayEqualExtension(l, r ExtensionArray) bool {
+	if !arrow.TypeEqual(l.DataType(), r.DataType()) {
+		return false
+	}
+
+	return ArrayEqual(l.Storage(), r.Storage())
+}
+
+// two extension arrays are approximately equal if their data types are
+// equal and their underlying storage arrays are approximately equal.
+func arrayApproxEqualExtension(l, r ExtensionArray, opt equalOption) bool {
+	if !arrow.TypeEqual(l.DataType(), r.DataType()) {
+		return false
+	}
+
+	return arrayApproxEqual(l.Storage(), r.Storage(), opt)
+}
+
+// NewExtensionArrayWithStorage constructs a new ExtensionArray from the provided
+// ExtensionType and uses the provided storage interface as the underlying storage.
+// This will not release the storage array passed in so consumers should call Release

Review comment:
       If they use this constructor to pass an already existing `array.Interface` then they still need to manually call `Release` on the storage `array.Interface` they pass in since this doesn't `take ownership` but instead effectively uses `Retain` to be another reference to the same data.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630315454



##########
File path: go/arrow/compare.go
##########
@@ -46,34 +46,45 @@ func TypeEqual(left, right DataType, opts ...TypeEqualOption) bool {
 
 	switch {
 	case left == nil || right == nil:
-		return false
+		return left == nil && right == nil
 	case left.ID() != right.ID():
 		return false
 	}
 
-	// StructType is the only type that has metadata.
-	l, ok := left.(*StructType)
-	if !ok || cfg.metadata {
-		return reflect.DeepEqual(left, right)
-	}
-
-	r := right.(*StructType)
-	switch {
-	case len(l.fields) != len(r.fields):
-		return false
-	case !reflect.DeepEqual(l.index, r.index):
-		return false
-	}
-	for i := range l.fields {
-		leftField, rightField := l.fields[i], r.fields[i]
-		switch {
-		case leftField.Name != rightField.Name:
+	switch l := left.(type) {
+	case ExtensionType:
+		return l.ExtensionEquals(right.(ExtensionType))
+	case *ListType:
+		if !TypeEqual(l.Elem(), right.(*ListType).Elem(), opts...) {
 			return false
-		case leftField.Nullable != rightField.Nullable:
+		}
+		if cfg.metadata {
+			return l.Meta.Equal(right.(*ListType).Meta)
+		}
+		return true
+	case *StructType:
+		r := right.(*StructType)
+		switch {
+		case len(l.fields) != len(r.fields):
 			return false
-		case !TypeEqual(leftField.Type, rightField.Type, opts...):
+		case !reflect.DeepEqual(l.index, r.index):

Review comment:
       what is the scope of reflect.DeepEqual, i'm wondering if there are chances transient data might cause false negatives.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-830852832


   @emkornfield @jorgecarleitao bump


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630321847



##########
File path: go/arrow/datatype_extension.go
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+
+	"golang.org/x/xerrors"
+)
+
+var (
+	// global extension type registry, initially left null to avoid paying
+	// the cost if no extension types are used.
+	// the choice to use a sync.Map here is because it's expected that most
+	// use cases would be to register some number of types at initialization
+	// or otherwise and leave them rather than a pattern of repeatedly registering
+	// and unregistering types. As per the documentation for sync.Map
+	// (https://pkg.go.dev/sync#Map), it is specialized for the case where an entry
+	// is written once but read many times which fits our case here as we register
+	// a type once and then have to read it many times when deserializing messages
+	// with that type.
+	extTypeRegistry *sync.Map
+	// used for initializing the registry once and only once
+	initReg sync.Once
+)
+
+// convenience function to ensure that the type registry is initialized once
+// and only once in a goroutine-safe manner.
+func getExtTypeRegistry() *sync.Map {
+	initReg.Do(func() { extTypeRegistry = &sync.Map{} })
+	return extTypeRegistry
+}
+
+// RegisterExtensionType registers the provided ExtensionType by calling ExtensionName
+// to use as a Key for registrying the type. If a type with the same name is already
+// registered then this will return an error saying so, otherwise it will return nil
+// if successful registering the type.
+// This function is safe to call from multiple goroutines simultaneously.
+func RegisterExtensionType(typ ExtensionType) error {
+	name := typ.ExtensionName()
+	registry := getExtTypeRegistry()
+	if _, existed := registry.LoadOrStore(name, typ); existed {
+		return xerrors.Errorf("arrow: type extension with name %s already defined", name)
+	}
+	return nil
+}
+
+// UnregisterExtensionType removes the type with the given name from the registry
+// causing any messages with that type which come in to be expressed with their
+// metadata and underlying type instead of the extension type that isn't known.
+// This function is safe to call from multiple goroutines simultaneously.
+func UnregisterExtensionType(typName string) error {
+	registry := getExtTypeRegistry()
+	if _, loaded := registry.LoadAndDelete(typName); !loaded {
+		return xerrors.Errorf("arrow: no type extension with name %s found", typName)
+	}
+	return nil
+}
+
+// GetExtensionType retrieves and returns the extension type of the given name
+// from the global extension type registry. If the type isn't found it will return
+// nil. This function is safe to call from multiple goroutines concurrently.
+func GetExtensionType(typName string) ExtensionType {
+	registry := getExtTypeRegistry()
+	if val, ok := registry.Load(typName); ok {
+		return val.(ExtensionType)
+	}
+	return nil
+}
+
+// ExtensionType is an interface for handling user-defined types. They must be
+// DataTypes and must embed arrow.ExtensionBase in them in order to work properly
+// ensuring that they always have the expected base behavior.
+//
+// The arrow.ExtensionBase that needs to be embedded implements the DataType interface
+// leaving the remaining functions having to be implemented by the actual user-defined
+// type in order to be handled properly.
+type ExtensionType interface {
+	DataType
+	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
+	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
+	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
+	// when creating arrays of this ExtensionType by using reflect.New
+	ArrayType() reflect.Type
+	// ExtensionName is what will be used when registering / unregistering this extension
+	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
+	// as long as the parameter is used in the ExtensionName to distinguish the instances
+	// in the global Extension Type registry.
+	// The return from this is also what will be placed in the metadata for IPC communication
+	// under the key ARROW:extension:name
+	ExtensionName() string
+	// StorageType returns the underlying storage type which is used by this extension
+	// type. It is already implemented by the ExtensionBase struct and thus does not need
+	// to be re-implemented by a user-defined type.
+	StorageType() DataType
+	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
+	ExtensionEquals(ExtensionType) bool
+	// Serialize should produce any extra metadata necessary for initializing an instance of
+	// this user-defined type. Not all user-defined types require this and it is valid to return
+	// nil from this function or an empty slice. This is used for the IPC format and will be
+	// added to metadata for IPC communication under the key ARROW:extension:metadata
+	// This should be implemented such that it is valid to be called by multiple goroutines
+	// concurrently.
+	Serialize() []byte

Review comment:
       hmm, I need to double check but I think this actually needs to be a UTF8 string not raw bytes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r631180370



##########
File path: go/arrow/datatype_extension.go
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+
+	"golang.org/x/xerrors"
+)
+
+var (
+	// global extension type registry, initially left null to avoid paying
+	// the cost if no extension types are used.
+	// the choice to use a sync.Map here is because it's expected that most
+	// use cases would be to register some number of types at initialization
+	// or otherwise and leave them rather than a pattern of repeatedly registering
+	// and unregistering types. As per the documentation for sync.Map
+	// (https://pkg.go.dev/sync#Map), it is specialized for the case where an entry
+	// is written once but read many times which fits our case here as we register
+	// a type once and then have to read it many times when deserializing messages
+	// with that type.
+	extTypeRegistry *sync.Map
+	// used for initializing the registry once and only once
+	initReg sync.Once
+)
+
+// convenience function to ensure that the type registry is initialized once
+// and only once in a goroutine-safe manner.
+func getExtTypeRegistry() *sync.Map {
+	initReg.Do(func() { extTypeRegistry = &sync.Map{} })
+	return extTypeRegistry
+}
+
+// RegisterExtensionType registers the provided ExtensionType by calling ExtensionName
+// to use as a Key for registrying the type. If a type with the same name is already
+// registered then this will return an error saying so, otherwise it will return nil
+// if successful registering the type.
+// This function is safe to call from multiple goroutines simultaneously.
+func RegisterExtensionType(typ ExtensionType) error {
+	name := typ.ExtensionName()
+	registry := getExtTypeRegistry()
+	if _, existed := registry.LoadOrStore(name, typ); existed {
+		return xerrors.Errorf("arrow: type extension with name %s already defined", name)
+	}
+	return nil
+}
+
+// UnregisterExtensionType removes the type with the given name from the registry
+// causing any messages with that type which come in to be expressed with their
+// metadata and underlying type instead of the extension type that isn't known.
+// This function is safe to call from multiple goroutines simultaneously.
+func UnregisterExtensionType(typName string) error {
+	registry := getExtTypeRegistry()
+	if _, loaded := registry.LoadAndDelete(typName); !loaded {
+		return xerrors.Errorf("arrow: no type extension with name %s found", typName)
+	}
+	return nil
+}
+
+// GetExtensionType retrieves and returns the extension type of the given name
+// from the global extension type registry. If the type isn't found it will return
+// nil. This function is safe to call from multiple goroutines concurrently.
+func GetExtensionType(typName string) ExtensionType {
+	registry := getExtTypeRegistry()
+	if val, ok := registry.Load(typName); ok {
+		return val.(ExtensionType)
+	}
+	return nil
+}
+
+// ExtensionType is an interface for handling user-defined types. They must be
+// DataTypes and must embed arrow.ExtensionBase in them in order to work properly
+// ensuring that they always have the expected base behavior.
+//
+// The arrow.ExtensionBase that needs to be embedded implements the DataType interface
+// leaving the remaining functions having to be implemented by the actual user-defined
+// type in order to be handled properly.
+type ExtensionType interface {
+	DataType
+	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
+	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
+	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
+	// when creating arrays of this ExtensionType by using reflect.New
+	ArrayType() reflect.Type
+	// ExtensionName is what will be used when registering / unregistering this extension
+	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
+	// as long as the parameter is used in the ExtensionName to distinguish the instances
+	// in the global Extension Type registry.
+	// The return from this is also what will be placed in the metadata for IPC communication
+	// under the key ARROW:extension:name
+	ExtensionName() string
+	// StorageType returns the underlying storage type which is used by this extension
+	// type. It is already implemented by the ExtensionBase struct and thus does not need
+	// to be re-implemented by a user-defined type.
+	StorageType() DataType
+	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
+	ExtensionEquals(ExtensionType) bool
+	// Serialize should produce any extra metadata necessary for initializing an instance of
+	// this user-defined type. Not all user-defined types require this and it is valid to return
+	// nil from this function or an empty slice. This is used for the IPC format and will be
+	// added to metadata for IPC communication under the key ARROW:extension:metadata
+	// This should be implemented such that it is valid to be called by multiple goroutines
+	// concurrently.
+	Serialize() []byte

Review comment:
       I think this likely depends on the language binding.  The [schema](https://github.com/apache/arrow/blob/master/format/Schema.fbs#L311) defines both key and values as string, this is a unfortunate design flaw.  From the [flatbuffers spec](https://google.github.io/flatbuffers/flatbuffers_guide_writing_schema.html) strings should only include UTF-8 or ASCII, I think C++ gets lucky because std::string has no enforcement on top of it.  I think in other languages (I would guess java) this would cause problems).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #10203:
URL: https://github.com/apache/arrow/pull/10203


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630565256



##########
File path: go/arrow/internal/testing/types/extension_types.go
##########
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package types contains user-defined types for use in the tests for the arrow package
+package types
+
+import (
+	"encoding/binary"
+	"fmt"
+	"reflect"
+
+	"github.com/apache/arrow/go/arrow"
+	"github.com/apache/arrow/go/arrow/array"
+	"golang.org/x/xerrors"
+)
+
+// UuidArray is a simple array which is a FixedSizeBinary(16)
+type UuidArray struct {

Review comment:
       good catch, that's correct. UUID would be more consistent. I've updated to that now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-829432587


   Tagging @emkornfield @sbinet for visibility and reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-833922467


   Well that was annoying to track down but I finally fixed the failures. oy.
   
   @emkornfield this is finally ready for review again when you have the time to do so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-830160886


   Yap, sorry about that. We fixed it in Rust's end; if you re-trigger it should run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-830177086


   @jorgecarleitao The tests are still failing on rust build failures :( Any ideas?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-831658811


   Sorry will try to take a look my availability this week might be fairly limited.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r631194443



##########
File path: go/arrow/datatype_extension.go
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+
+	"golang.org/x/xerrors"
+)
+
+var (
+	// global extension type registry, initially left null to avoid paying
+	// the cost if no extension types are used.
+	// the choice to use a sync.Map here is because it's expected that most
+	// use cases would be to register some number of types at initialization
+	// or otherwise and leave them rather than a pattern of repeatedly registering
+	// and unregistering types. As per the documentation for sync.Map
+	// (https://pkg.go.dev/sync#Map), it is specialized for the case where an entry
+	// is written once but read many times which fits our case here as we register
+	// a type once and then have to read it many times when deserializing messages
+	// with that type.
+	extTypeRegistry *sync.Map
+	// used for initializing the registry once and only once
+	initReg sync.Once
+)
+
+// convenience function to ensure that the type registry is initialized once
+// and only once in a goroutine-safe manner.
+func getExtTypeRegistry() *sync.Map {
+	initReg.Do(func() { extTypeRegistry = &sync.Map{} })
+	return extTypeRegistry
+}
+
+// RegisterExtensionType registers the provided ExtensionType by calling ExtensionName
+// to use as a Key for registrying the type. If a type with the same name is already
+// registered then this will return an error saying so, otherwise it will return nil
+// if successful registering the type.
+// This function is safe to call from multiple goroutines simultaneously.
+func RegisterExtensionType(typ ExtensionType) error {
+	name := typ.ExtensionName()
+	registry := getExtTypeRegistry()
+	if _, existed := registry.LoadOrStore(name, typ); existed {
+		return xerrors.Errorf("arrow: type extension with name %s already defined", name)
+	}
+	return nil
+}
+
+// UnregisterExtensionType removes the type with the given name from the registry
+// causing any messages with that type which come in to be expressed with their
+// metadata and underlying type instead of the extension type that isn't known.
+// This function is safe to call from multiple goroutines simultaneously.
+func UnregisterExtensionType(typName string) error {
+	registry := getExtTypeRegistry()
+	if _, loaded := registry.LoadAndDelete(typName); !loaded {
+		return xerrors.Errorf("arrow: no type extension with name %s found", typName)
+	}
+	return nil
+}
+
+// GetExtensionType retrieves and returns the extension type of the given name
+// from the global extension type registry. If the type isn't found it will return
+// nil. This function is safe to call from multiple goroutines concurrently.
+func GetExtensionType(typName string) ExtensionType {
+	registry := getExtTypeRegistry()
+	if val, ok := registry.Load(typName); ok {
+		return val.(ExtensionType)
+	}
+	return nil
+}
+
+// ExtensionType is an interface for handling user-defined types. They must be
+// DataTypes and must embed arrow.ExtensionBase in them in order to work properly
+// ensuring that they always have the expected base behavior.
+//
+// The arrow.ExtensionBase that needs to be embedded implements the DataType interface
+// leaving the remaining functions having to be implemented by the actual user-defined
+// type in order to be handled properly.
+type ExtensionType interface {
+	DataType
+	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
+	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
+	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
+	// when creating arrays of this ExtensionType by using reflect.New
+	ArrayType() reflect.Type
+	// ExtensionName is what will be used when registering / unregistering this extension
+	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
+	// as long as the parameter is used in the ExtensionName to distinguish the instances
+	// in the global Extension Type registry.
+	// The return from this is also what will be placed in the metadata for IPC communication
+	// under the key ARROW:extension:name
+	ExtensionName() string
+	// StorageType returns the underlying storage type which is used by this extension
+	// type. It is already implemented by the ExtensionBase struct and thus does not need
+	// to be re-implemented by a user-defined type.
+	StorageType() DataType
+	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
+	ExtensionEquals(ExtensionType) bool
+	// Serialize should produce any extra metadata necessary for initializing an instance of
+	// this user-defined type. Not all user-defined types require this and it is valid to return
+	// nil from this function or an empty slice. This is used for the IPC format and will be
+	// added to metadata for IPC communication under the key ARROW:extension:metadata
+	// This should be implemented such that it is valid to be called by multiple goroutines
+	// concurrently.
+	Serialize() []byte

Review comment:
       ok, i'll make the changes to have this be a string here, since golang strings are utf8.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r631395293



##########
File path: go/arrow/datatype_extension.go
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+
+	"golang.org/x/xerrors"
+)
+
+var (
+	// global extension type registry, initially left null to avoid paying
+	// the cost if no extension types are used.
+	// the choice to use a sync.Map here is because it's expected that most
+	// use cases would be to register some number of types at initialization
+	// or otherwise and leave them rather than a pattern of repeatedly registering
+	// and unregistering types. As per the documentation for sync.Map
+	// (https://pkg.go.dev/sync#Map), it is specialized for the case where an entry
+	// is written once but read many times which fits our case here as we register
+	// a type once and then have to read it many times when deserializing messages
+	// with that type.
+	extTypeRegistry *sync.Map
+	// used for initializing the registry once and only once
+	initReg sync.Once
+)
+
+// convenience function to ensure that the type registry is initialized once
+// and only once in a goroutine-safe manner.
+func getExtTypeRegistry() *sync.Map {
+	initReg.Do(func() { extTypeRegistry = &sync.Map{} })
+	return extTypeRegistry
+}
+
+// RegisterExtensionType registers the provided ExtensionType by calling ExtensionName
+// to use as a Key for registrying the type. If a type with the same name is already
+// registered then this will return an error saying so, otherwise it will return nil
+// if successful registering the type.
+// This function is safe to call from multiple goroutines simultaneously.
+func RegisterExtensionType(typ ExtensionType) error {
+	name := typ.ExtensionName()
+	registry := getExtTypeRegistry()
+	if _, existed := registry.LoadOrStore(name, typ); existed {
+		return xerrors.Errorf("arrow: type extension with name %s already defined", name)
+	}
+	return nil
+}
+
+// UnregisterExtensionType removes the type with the given name from the registry
+// causing any messages with that type which come in to be expressed with their
+// metadata and underlying type instead of the extension type that isn't known.
+// This function is safe to call from multiple goroutines simultaneously.
+func UnregisterExtensionType(typName string) error {
+	registry := getExtTypeRegistry()
+	if _, loaded := registry.LoadAndDelete(typName); !loaded {
+		return xerrors.Errorf("arrow: no type extension with name %s found", typName)
+	}
+	return nil
+}
+
+// GetExtensionType retrieves and returns the extension type of the given name
+// from the global extension type registry. If the type isn't found it will return
+// nil. This function is safe to call from multiple goroutines concurrently.
+func GetExtensionType(typName string) ExtensionType {
+	registry := getExtTypeRegistry()
+	if val, ok := registry.Load(typName); ok {
+		return val.(ExtensionType)
+	}
+	return nil
+}
+
+// ExtensionType is an interface for handling user-defined types. They must be
+// DataTypes and must embed arrow.ExtensionBase in them in order to work properly
+// ensuring that they always have the expected base behavior.
+//
+// The arrow.ExtensionBase that needs to be embedded implements the DataType interface
+// leaving the remaining functions having to be implemented by the actual user-defined
+// type in order to be handled properly.
+type ExtensionType interface {
+	DataType
+	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
+	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
+	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
+	// when creating arrays of this ExtensionType by using reflect.New
+	ArrayType() reflect.Type
+	// ExtensionName is what will be used when registering / unregistering this extension
+	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
+	// as long as the parameter is used in the ExtensionName to distinguish the instances
+	// in the global Extension Type registry.
+	// The return from this is also what will be placed in the metadata for IPC communication
+	// under the key ARROW:extension:name
+	ExtensionName() string
+	// StorageType returns the underlying storage type which is used by this extension
+	// type. It is already implemented by the ExtensionBase struct and thus does not need
+	// to be re-implemented by a user-defined type.
+	StorageType() DataType
+	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
+	ExtensionEquals(ExtensionType) bool
+	// Serialize should produce any extra metadata necessary for initializing an instance of
+	// this user-defined type. Not all user-defined types require this and it is valid to return
+	// nil from this function or an empty slice. This is used for the IPC format and will be
+	// added to metadata for IPC communication under the key ARROW:extension:metadata
+	// This should be implemented such that it is valid to be called by multiple goroutines
+	// concurrently.
+	Serialize() []byte

Review comment:
       updated changing it to be a string.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630562673



##########
File path: go/arrow/compare.go
##########
@@ -46,34 +46,45 @@ func TypeEqual(left, right DataType, opts ...TypeEqualOption) bool {
 
 	switch {
 	case left == nil || right == nil:
-		return false
+		return left == nil && right == nil
 	case left.ID() != right.ID():
 		return false
 	}
 
-	// StructType is the only type that has metadata.
-	l, ok := left.(*StructType)
-	if !ok || cfg.metadata {
-		return reflect.DeepEqual(left, right)
-	}
-
-	r := right.(*StructType)
-	switch {
-	case len(l.fields) != len(r.fields):
-		return false
-	case !reflect.DeepEqual(l.index, r.index):
-		return false
-	}
-	for i := range l.fields {
-		leftField, rightField := l.fields[i], r.fields[i]
-		switch {
-		case leftField.Name != rightField.Name:
+	switch l := left.(type) {
+	case ExtensionType:
+		return l.ExtensionEquals(right.(ExtensionType))
+	case *ListType:
+		if !TypeEqual(l.Elem(), right.(*ListType).Elem(), opts...) {
 			return false
-		case leftField.Nullable != rightField.Nullable:
+		}
+		if cfg.metadata {
+			return l.Meta.Equal(right.(*ListType).Meta)
+		}
+		return true
+	case *StructType:
+		r := right.(*StructType)
+		switch {
+		case len(l.fields) != len(r.fields):
 			return false
-		case !TypeEqual(leftField.Type, rightField.Type, opts...):
+		case !reflect.DeepEqual(l.index, r.index):

Review comment:
       it recursively goes through the objects to confirm that they are equal or not. if you look up at the deleted line 64 above this, I actually didn't change this logic at all despite what the diff shows. The diff isn't handling the changed order of the code well, but the previous code was already doing `!reflect.DeepEqual(l.index, r.index)` so there would be no change in logic from the existing setup for struct types here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on a change in pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on a change in pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#discussion_r630554457



##########
File path: go/arrow/datatype_extension.go
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"fmt"
+	"reflect"
+	"sync"
+
+	"golang.org/x/xerrors"
+)
+
+var (
+	// global extension type registry, initially left null to avoid paying
+	// the cost if no extension types are used.
+	// the choice to use a sync.Map here is because it's expected that most
+	// use cases would be to register some number of types at initialization
+	// or otherwise and leave them rather than a pattern of repeatedly registering
+	// and unregistering types. As per the documentation for sync.Map
+	// (https://pkg.go.dev/sync#Map), it is specialized for the case where an entry
+	// is written once but read many times which fits our case here as we register
+	// a type once and then have to read it many times when deserializing messages
+	// with that type.
+	extTypeRegistry *sync.Map
+	// used for initializing the registry once and only once
+	initReg sync.Once
+)
+
+// convenience function to ensure that the type registry is initialized once
+// and only once in a goroutine-safe manner.
+func getExtTypeRegistry() *sync.Map {
+	initReg.Do(func() { extTypeRegistry = &sync.Map{} })
+	return extTypeRegistry
+}
+
+// RegisterExtensionType registers the provided ExtensionType by calling ExtensionName
+// to use as a Key for registrying the type. If a type with the same name is already
+// registered then this will return an error saying so, otherwise it will return nil
+// if successful registering the type.
+// This function is safe to call from multiple goroutines simultaneously.
+func RegisterExtensionType(typ ExtensionType) error {
+	name := typ.ExtensionName()
+	registry := getExtTypeRegistry()
+	if _, existed := registry.LoadOrStore(name, typ); existed {
+		return xerrors.Errorf("arrow: type extension with name %s already defined", name)
+	}
+	return nil
+}
+
+// UnregisterExtensionType removes the type with the given name from the registry
+// causing any messages with that type which come in to be expressed with their
+// metadata and underlying type instead of the extension type that isn't known.
+// This function is safe to call from multiple goroutines simultaneously.
+func UnregisterExtensionType(typName string) error {
+	registry := getExtTypeRegistry()
+	if _, loaded := registry.LoadAndDelete(typName); !loaded {
+		return xerrors.Errorf("arrow: no type extension with name %s found", typName)
+	}
+	return nil
+}
+
+// GetExtensionType retrieves and returns the extension type of the given name
+// from the global extension type registry. If the type isn't found it will return
+// nil. This function is safe to call from multiple goroutines concurrently.
+func GetExtensionType(typName string) ExtensionType {
+	registry := getExtTypeRegistry()
+	if val, ok := registry.Load(typName); ok {
+		return val.(ExtensionType)
+	}
+	return nil
+}
+
+// ExtensionType is an interface for handling user-defined types. They must be
+// DataTypes and must embed arrow.ExtensionBase in them in order to work properly
+// ensuring that they always have the expected base behavior.
+//
+// The arrow.ExtensionBase that needs to be embedded implements the DataType interface
+// leaving the remaining functions having to be implemented by the actual user-defined
+// type in order to be handled properly.
+type ExtensionType interface {
+	DataType
+	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
+	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
+	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
+	// when creating arrays of this ExtensionType by using reflect.New
+	ArrayType() reflect.Type
+	// ExtensionName is what will be used when registering / unregistering this extension
+	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
+	// as long as the parameter is used in the ExtensionName to distinguish the instances
+	// in the global Extension Type registry.
+	// The return from this is also what will be placed in the metadata for IPC communication
+	// under the key ARROW:extension:name
+	ExtensionName() string
+	// StorageType returns the underlying storage type which is used by this extension
+	// type. It is already implemented by the ExtensionBase struct and thus does not need
+	// to be re-implemented by a user-defined type.
+	StorageType() DataType
+	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
+	ExtensionEquals(ExtensionType) bool
+	// Serialize should produce any extra metadata necessary for initializing an instance of
+	// this user-defined type. Not all user-defined types require this and it is valid to return
+	// nil from this function or an empty slice. This is used for the IPC format and will be
+	// added to metadata for IPC communication under the key ARROW:extension:metadata
+	// This should be implemented such that it is valid to be called by multiple goroutines
+	// concurrently.
+	Serialize() []byte

Review comment:
       The c++ unit tests write raw bytes of uint32 values in the "parametric" extension type used in the unit tests, so I assumed it could be any bytes and wasn't required to be utf-8.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeroshade commented on pull request #10203: ARROW-5385: [Go] Implement EXTENSION datatype

Posted by GitBox <gi...@apache.org>.
zeroshade commented on pull request #10203:
URL: https://github.com/apache/arrow/pull/10203#issuecomment-829577758


   The integration test seems to be failing on some rust build failure that is unrelated to this change 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org