You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "yevgenypats (via GitHub)" <gi...@apache.org> on 2023/03/04 19:45:13 UTC

[GitHub] [arrow] yevgenypats opened a new pull request, #34454: GH-34453: [GO] Support Builders for user defined extensions

yevgenypats opened a new pull request, #34454:
URL: https://github.com/apache/arrow/pull/34454

   This should serve as discussion as it's a medium change but this should Close https://github.com/apache/arrow/issues/34453 and give users the ability to define custom Builder for their extensions just like they define ExtensionTypes and ExtensionArrays


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1128584791


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   Re breaking, I don't think this will breaking anything as there is a [default](https://github.com/apache/arrow/pull/34454/files#diff-7fe93f07a80f96f7f99153486503543e8bb9443496a71ad078d6c6e9a8a5c39eR169) in the `ExtensionType` that just returns `nil` to keep this backward compatible so if this is not implement I call the old `NewExtensionBuilder` in the `NewBuilder` function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1131373789


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,172 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType

Review Comment:
   Sounds good. Thanks, I'll try those for next PR! Tests passing now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1128364805


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   For the second point: Note that you had to add the `NewBuilder` method to the existing extension types. By adding a new method to the interface, anyone who has their own extension types defined that doesn't already have this method defined will have a compile error when they upgrade to this version of Arrow which adds the method because their structs will no longer meet the interface for `ExtensionType`.
   
   >  But actually I think this should 1) provide better performance, no reflection. 2) much better type safety, and better developer experience as it is clear what should be the type of this function (I also wanted to return Builder but had to fallback to interface due to some cyclic import that didn't want to fix part of this PR).
   
   In general, I wouldn't expect creating a builder to be a performance bottleneck as consumers shouldn't create builders repeatedly. That said, a couple ideas I've had so far:
   
   * Like was previously done for Arrays, we could shift the definition of the `Builder` interface to the `arrow` package directly and then add an alias in the `array` package to ensure we don't break any consumers (with a deprecated message telling people to point at `arrow.Builder` instead.
   * If we're going in this direction, rather than passing the allocator and the extension type, we should pass an `ExtensionBuilder` to the method and have this just wrap it and return the wrapped builder. The consumer can also retrieve the extension type and the allocator from the builder directly if they need to. So perhaps something like `WrapBuilder(ExtensionBuilder) Builder`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34454:
URL: https://github.com/apache/arrow/pull/34454#issuecomment-1464686323

   Benchmark runs are scheduled for baseline = 71f3c568af8fe8a6f886ffddb4318b728046a01a and contender = 5219de339b7b34d8af0de73ebef4ca3391b19936. 5219de339b7b34d8af0de73ebef4ca3391b19936 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7eb85a38dc1f484c932b68360f31d38f...8f707f69c3f64220a785bbf1435626c2/)
   [Failed :arrow_down:0.24% :arrow_up:0.09%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ed8881cee9a24b299969352c262532b2...3e8df6ab044046e6a3469def1d6d029a/)
   [Finished :arrow_down:1.79% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c9a553a60d5b470389944c9c22b206e5...37275bb086fc458592ed8f71a5793bb0/)
   [Finished :arrow_down:0.82% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/0750c6e8e1cd408ea9ee5ab7c3274503...8604339ea21040088bf84dffd9b04349/)
   Buildkite builds:
   [Finished] [`5219de33` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2510)
   [Failed] [`5219de33` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2540)
   [Finished] [`5219de33` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2508)
   [Finished] [`5219de33` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2531)
   [Finished] [`71f3c568` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2509)
   [Finished] [`71f3c568` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2539)
   [Finished] [`71f3c568` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2507)
   [Finished] [`71f3c568` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2530)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1132597425


##########
go/arrow/internal/testing/types/extension_test.go:
##########
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package types_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"testing"
+
+	"github.com/apache/arrow/go/v12/arrow"
+	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/internal/testing/types"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+var testUUID = uuid.New()
+
+func TestExtensionBuilder(t *testing.T) {
+	extBuilder := array.NewExtensionBuilder(memory.DefaultAllocator, types.NewUUIDType())
+	builder := types.NewUUIDBuilder(extBuilder)
+	builder.Append(testUUID)
+	arr := builder.NewArray()
+	arrStr := arr.String()
+	assert.Equal(t, "[\""+testUUID.String()+"\"]", arrStr)
+	jsonStr, err := json.Marshal(arr)
+	assert.NoError(t, err)
+
+	arr1, _, err := array.FromJSON(memory.DefaultAllocator, types.NewUUIDType(), bytes.NewReader(jsonStr))
+	assert.NoError(t, err)
+	assert.Equal(t, arr, arr1)
+}
+
+func TestExtensionRecordBuilder(t *testing.T) {

Review Comment:
   that works. sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1129803239


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   @zeroshade can you please take a look at this comment ^ ? (This is why I didn't re-requested a re-review yet as need your guidance here)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1128369139


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,26 +18,181 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType
+}
+
+func NewUUIDBuilder(mem memory.Allocator, dtype arrow.ExtensionType) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: array.NewExtensionBuilder(mem, dtype),
+		dtype:            dtype.(*UUIDType),
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)
+}
+
+func (b *UUIDBuilder) UnmarshalOne(dec *json.Decoder) error {
+	t, err := dec.Token()

Review Comment:
   Ah, I realized we don't need to manually create default implementations of `UnmarshalOne` etc. because we embed the underlying Storage Builder anyways, so you're right. It will call the underlying one automatically if you don't implement it. All good here then!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1131329338


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,172 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType

Review Comment:
   Oh ok. Fixed. I wasn't able to find it from the action. Where should I look for it next time? (Can I run those linters locally as well?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1129865484


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   Ah I missed that! Sorry! and you do check for the nil in `NewBuilder`, okay that's fair. So that avoids the breaking, you're completely correct.
   
   So just need to address the other point in possibly changing the signature / solving the import cycle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1132595433


##########
go/arrow/datatype_extension.go:
##########
@@ -122,18 +122,17 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	

Review Comment:
   whitespace :smile: should be an empty line without the whitespace please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1131366885


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,172 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType

Review Comment:
   So that was actually found by the Go build failing, and I saw it by looking at the log file for the failing Go workflows.
   
   You can also run the linters (or most of the workflows) by following the instructions here: https://arrow.apache.org/docs/developers/continuous_integration/archery.html to install the `archery` utility in the repo and then running `archery docker run <job>` as described [here](https://arrow.apache.org/docs/developers/continuous_integration/docker.html).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1129932087


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   re using `Builder` in the signature I think it might be more complex than it seems (I tried it initially). For example `newData *Data` function which means we need also to take `Data` to `arrow` pacakge. In that case it might be even easier to move `datatype_extension.go` to `array` because we can import from `array` the `arrow` package but not the otherway around (Also, there are even more complications because the interface has private functions which would cause compilation error if we move it to a different package).  Can't think of a much easier way without a big refactor which I think better to avoid atm and just do the runtime check. WDYT ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34454:
URL: https://github.com/apache/arrow/pull/34454#issuecomment-1461531514

   Ready now for another review, though looks like some workflows are failing. I think this is not connected though to this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34454:
URL: https://github.com/apache/arrow/pull/34454#issuecomment-1457652754

   > Took a look through, thoughts?
   
   @zeroshade  Thanks for the initial review! Commented and also linked to a few example on how we use it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34454: GH-34453: [GO] Support Builders for user defined extensions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34454:
URL: https://github.com/apache/arrow/pull/34454#issuecomment-1454854448

   * Closes: #34453


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1127408562


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   On the first part, we can keep the same pattern, sure. But actually I think this should 1) provide better performance, no reflection. 2) much better type safety, and better developer experience as it is clear what should be the type of this function (I also wanted to return Builder but had to fallback to interface due to some cyclic import that didn't want to fix part of this PR).
   
   Re 2nd thing, not sure I followed, can you share an example of what this going to break? The only place that we use this function now is [here](https://github.com/apache/arrow/pull/34454/files#diff-9a870facf93f4b8b367b13881398e3918d0627e348079d33017231d2b389d8bcR322)  and I fallback to the previous builder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34454: GH-34453: [GO] Support Builders for user defined extensions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34454:
URL: https://github.com/apache/arrow/pull/34454#issuecomment-1454854454

   :warning: GitHub issue #34453 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1132021652


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,171 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+}
+
+func NewUUIDBuilder(bldr *array.ExtensionBuilder) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: bldr,
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)
+}
+
+func (b *UUIDBuilder) UnmarshalOne(dec *json.Decoder) error {
+	t, err := dec.Token()
+	if err != nil {
+		return err
+	}
+
+	var val uuid.UUID
+	switch v := t.(type) {
+	case string:
+		data, err := uuid.Parse(v)
+		if err != nil {
+			return err
+		}
+		val = data
+	case []byte:
+		data, err := uuid.ParseBytes(v)
+		if err != nil {
+			return err
+		}
+		val = data

Review Comment:
   I just copied that from other places in the code but seems if I remove it then all is good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1132719426


##########
go/arrow/datatype_extension.go:
##########
@@ -122,18 +122,17 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1127408562


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   On the first part, we can keep the same pattern, sure. But actually I think this should 1) provide better performance, no reflection. 2) much better type safety, and better developer experience as it is clear what should be the type of this function (I also wanted to return Builder but had to fallback to interface due to some cyclic import that didn't want to fix part of this PR).
   
   Re 2nd thing, not sure I followed, can you share an example of that it's going to break? The only place that we use this function now is [here](https://github.com/apache/arrow/pull/34454/files#diff-9a870facf93f4b8b367b13881398e3918d0627e348079d33017231d2b389d8bcR322)  and I fallback to the previous builder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1131244795


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,172 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType

Review Comment:
   the failures are because this field is unused here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1131368102


##########
go/arrow/array/extension_builder.go:
##########
@@ -0,0 +1,21 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package array
+
+type ExtensionBuilderWrapper interface {

Review Comment:
   can you add a Godoc comment here to explain what this is for and how to use it? It also might be worthwhile to move the entirety of the `ExtensionBuilder` implementation to this file and update the comment on it to explain the functionality you've added as far as for customizing the behavior.



##########
go/arrow/internal/testing/types/extension_test.go:
##########
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package types_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"testing"
+
+	"github.com/apache/arrow/go/v12/arrow"
+	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/internal/testing/types"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+var testUUID = uuid.New()
+
+func TestExtensionBuilder(t *testing.T) {
+	extBuilder := array.NewExtensionBuilder(memory.DefaultAllocator, types.NewUUIDType())
+	builder := types.NewUUIDBuilder(extBuilder)
+	builder.Append(testUUID)
+	arr := builder.NewArray()
+	arrStr := arr.String()
+	assert.Equal(t, "[\""+testUUID.String()+"\"]", arrStr)
+	jsonStr, err := json.Marshal(arr)
+	assert.NoError(t, err)
+
+	arr1, _, err := array.FromJSON(memory.DefaultAllocator, types.NewUUIDType(), bytes.NewReader(jsonStr))
+	assert.NoError(t, err)
+	assert.Equal(t, arr, arr1)
+}
+
+func TestExtensionRecordBuilder(t *testing.T) {

Review Comment:
   it would also be probably worthwhile to add a test for the `String()` method of `UUIDArray` , since you did the work to have it print the string representation of the UUID instead of the raw bytes when calling `String()`



##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,171 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+}
+
+func NewUUIDBuilder(bldr *array.ExtensionBuilder) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: bldr,
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}

Review Comment:
   is `UnsafeAppend` actually needed here at all?



##########
go/arrow/datatype_extension.go:
##########
@@ -122,18 +122,16 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-

Review Comment:
   please put the empty line back here, I'd prefer to keep a single empty line between the public and private method declarations here.



##########
go/arrow/internal/testing/types/extension_test.go:
##########
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package types_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"testing"
+
+	"github.com/apache/arrow/go/v12/arrow"
+	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/internal/testing/types"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+var testUUID = uuid.New()
+
+func TestExtensionBuilder(t *testing.T) {
+	extBuilder := array.NewExtensionBuilder(memory.DefaultAllocator, types.NewUUIDType())
+	builder := types.NewUUIDBuilder(extBuilder)

Review Comment:
   use
   
   ```go
   mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
   defer mem.AssertSize(t, 0)
   ```
   
   to ensure that the memory is handled properly (for example, you're missing the `Release` calls here for the builders and generated arrays



##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,171 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+}
+
+func NewUUIDBuilder(bldr *array.ExtensionBuilder) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: bldr,
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)

Review Comment:
   this could be done more efficiently if desired via resize + `UnsafeAppend` or other methods. Not necessary for this PR but just something to think about.



##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,20 +18,171 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+}
+
+func NewUUIDBuilder(bldr *array.ExtensionBuilder) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: bldr,
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)
+}
+
+func (b *UUIDBuilder) UnmarshalOne(dec *json.Decoder) error {
+	t, err := dec.Token()
+	if err != nil {
+		return err
+	}
+
+	var val uuid.UUID
+	switch v := t.(type) {
+	case string:
+		data, err := uuid.Parse(v)
+		if err != nil {
+			return err
+		}
+		val = data
+	case []byte:
+		data, err := uuid.ParseBytes(v)
+		if err != nil {
+			return err
+		}
+		val = data

Review Comment:
   Is this case actually getting tested by the tests you wrote? Check the test coverage please to confirm. I think it would only go through the String case. I'm not positive though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade merged pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #34454:
URL: https://github.com/apache/arrow/pull/34454


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1132020015


##########
go/arrow/internal/testing/types/extension_test.go:
##########
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package types_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"testing"
+
+	"github.com/apache/arrow/go/v12/arrow"
+	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/internal/testing/types"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+var testUUID = uuid.New()
+
+func TestExtensionBuilder(t *testing.T) {
+	extBuilder := array.NewExtensionBuilder(memory.DefaultAllocator, types.NewUUIDType())
+	builder := types.NewUUIDBuilder(extBuilder)
+	builder.Append(testUUID)
+	arr := builder.NewArray()
+	arrStr := arr.String()
+	assert.Equal(t, "[\""+testUUID.String()+"\"]", arrStr)
+	jsonStr, err := json.Marshal(arr)
+	assert.NoError(t, err)
+
+	arr1, _, err := array.FromJSON(memory.DefaultAllocator, types.NewUUIDType(), bytes.NewReader(jsonStr))
+	assert.NoError(t, err)
+	assert.Equal(t, arr, arr1)
+}
+
+func TestExtensionRecordBuilder(t *testing.T) {

Review Comment:
   I think this is tested via the `MarshalJSON` in a way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1126868015


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,26 +18,181 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType
+}
+
+func NewUUIDBuilder(mem memory.Allocator, dtype arrow.ExtensionType) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: array.NewExtensionBuilder(mem, dtype),
+		dtype:            dtype.(*UUIDType),
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)
+}
+
+func (b *UUIDBuilder) UnmarshalOne(dec *json.Decoder) error {
+	t, err := dec.Token()

Review Comment:
   Should we have default implementations of the Unmarshal methods for the base `ExtensionBuilder` which just forwards to the storage type?



##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   should we follow the same pattern as `ArrayType` and have a `BuilderType` method which returns a `reflect.Type` that we use to wrap the `ExtensionBuilder` with? This also avoids the import cycle.
   
   Another thing to consider is that this is going to break any and all existing Extension types in other consumers' codebases. We should probably make a second interface type which contains the `BuilderType` method so that we can just use a type assertion test in `NewBuilder` rather than break existing consumers?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1130613280


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   I like that! Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1127414058


##########
go/arrow/internal/testing/types/extension_types.go:
##########
@@ -18,26 +18,181 @@
 package types
 
 import (
+	"bytes"
 	"encoding/binary"
 	"fmt"
 	"reflect"
+	"strings"
+
+	"github.com/goccy/go-json"
 
 	"github.com/apache/arrow/go/v12/arrow"
 	"github.com/apache/arrow/go/v12/arrow/array"
+	"github.com/apache/arrow/go/v12/arrow/memory"
+	"github.com/google/uuid"
 	"golang.org/x/xerrors"
 )
 
+type UUIDBuilder struct {
+	*array.ExtensionBuilder
+	dtype *UUIDType
+}
+
+func NewUUIDBuilder(mem memory.Allocator, dtype arrow.ExtensionType) *UUIDBuilder {
+	b := &UUIDBuilder{
+		ExtensionBuilder: array.NewExtensionBuilder(mem, dtype),
+		dtype:            dtype.(*UUIDType),
+	}
+	return b
+}
+
+func (b *UUIDBuilder) Append(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).Append(v[:])
+}
+
+func (b *UUIDBuilder) UnsafeAppend(v uuid.UUID) {
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).UnsafeAppend(v[:])
+}
+
+func (b *UUIDBuilder) AppendValues(v []uuid.UUID, valid []bool) {
+	data := make([][]byte, len(v))
+	for i, v := range v {
+		data[i] = v[:]
+	}
+	b.ExtensionBuilder.Builder.(*array.FixedSizeBinaryBuilder).AppendValues(data, valid)
+}
+
+func (b *UUIDBuilder) UnmarshalOne(dec *json.Decoder) error {
+	t, err := dec.Token()

Review Comment:
   If you don't implement `UnmarshalOne` this should call the underlying ExtensionBuilder `UnmarshalOne`. This part was critical for us because otherwise we can't use any of `Marshal`, `Unmarshal` functions as they will always result `base64` encoded values instead of what is a fit for the extension. For UUID for example we want to unmarshal a uuid or in case of IP Address (a string of ip address while storing this as a binary).
   
   Here is an example of a few types we implemented on top of this PR: https://github.com/cloudquery/filetypes/tree/main/internal/cqarrow (This is just a PoC and will eventually go into our [SDK](https://github.com/cloudquery/plugin-sdk) which is used by all our [plugins](https://github.com/cloudquery/cloudquery) )
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34454: GH-34453: [Go] Support Builders for user defined extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34454:
URL: https://github.com/apache/arrow/pull/34454#discussion_r1130035871


##########
go/arrow/datatype_extension.go:
##########
@@ -122,7 +124,9 @@ type ExtensionType interface {
 	// If the storage type is incorrect or something else is invalid with the data this should
 	// return nil and an appropriate error.
 	Deserialize(storageType DataType, data string) (ExtensionType, error)
-
+	// this should return array.Builder interface but we cannot import due to cycle import, so we use 
+	// interface{} instead. At least for 
+	NewBuilder(mem memory.Allocator, dt ExtensionType) interface{}

Review Comment:
   Darn, that's really annoying..... I agree it's better to avoid a big refactor atm. 
   
   I still want to change the signature though instead of having consumers have to create both the underlying storage builder & wrap it.
   
   Instead of adding this method to the `ExtensionType` interface, we could just go with an interface defined in the `array` package which would let us use `Builder` in the method definition. something like:
   
   ```go
   type ExtensionTypeCustomBuilder interface {
       NewBuilder(ExtensionBuilder) Builder
   }
   ```
   
   Then in `builder.go` you can do:
   
   ```go
   case arrow.EXTENSION:
           typ := dtype.(arrow.ExtensionType)
           bldr := NewExtensionBuilder(mem, typ)
           if custom, ok := typ.(ExtensionTypeCustomBuilder); ok {
                  return custom.NewBuilder(bldr)
           }
           return bldr
   ```
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org