You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "yevgenypats (via GitHub)" <gi...@apache.org> on 2023/02/24 21:57:05 UTC

[GitHub] [arrow] yevgenypats opened a new pull request, #34343: [GO][CSV] Support lists

yevgenypats opened a new pull request, #34343:
URL: https://github.com/apache/arrow/pull/34343

   Still WIP as it only support write but wanted to open a PR so it will be available for comments.
   
   This PR only handles list (of all supported type in recursive manner), follow-up PR will include extensions.
   
   https://github.com/apache/arrow/issues/34334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34343: GH-34334: [Go][CSV] Support lists

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1445350167

   :warning: GitHub issue #34334 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119010162


##########
go/arrow/csv/reader_test.go:
##########
@@ -350,10 +358,12 @@ rec[2]["f32"]: [(null)]
 rec[2]["f64"]: [(null)]
 rec[2]["str"]: [(null)]
 rec[2]["ts"]: [(null)]
+rec[2]["list(i64)"]: [(null)]
+rec[2]["uuid"]: (extension_type<storage=fixed_size_binary[16]>)[(null)]
 `
-
-	if got, want := out.String(), want; got != want {
-		t.Fatalf("invalid output:\ngot= %s\nwant=%s\n", got, want)
+	got, want := out.String(), want
+	if diff := cmp.Diff(want, got); diff != "" {
+		t.Fatalf("invalid output: (-want, +got) = %s", diff)

Review Comment:
   instead of adding a new dependency, we could use `github.com/stretchr/testify/assert` which will also create the diff for you as you need by using `assert.Equal`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1461591626

   Benchmark runs are scheduled for baseline = 88d39d5d7b9cdad6385b84d96b4e0e65069037c3 and contender = 2f3f41f05e4bfae24344ee7a4b39ad6719f98751. 2f3f41f05e4bfae24344ee7a4b39ad6719f98751 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/48b6c5a3af9c45cd88350dd193051aa1...82247c00bc614b7198d5102c636b3ae2/)
   [Failed :arrow_down:0.09% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ae19ccfa47b54a32983d84569ce71583...cb297380ca494c219e6fa6c7fbb9669e/)
   [Finished :arrow_down:1.79% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2eb56eb16cf94c408ef050310e1142f6...40bb5b87bdbb402193d74079f52a5e70/)
   [Failed :arrow_down:0.06% :arrow_up:0.06%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6f009a6c03a94a0db37eafb820af9a90...1ba4188478244e568e334dbd9956e86c/)
   Buildkite builds:
   [Finished] [`2f3f41f0` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2498)
   [Finished] [`2f3f41f0` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2528)
   [Finished] [`2f3f41f0` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2496)
   [Finished] [`2f3f41f0` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2519)
   [Finished] [`88d39d5d` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2497)
   [Failed] [`88d39d5d` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2527)
   [Finished] [`88d39d5d` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2495)
   [Failed] [`88d39d5d` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2518)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1128282068


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +726,32 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	listBldr := field.(*array.ListBuilder)
+	listBldr.Append(true)
+	valueBldr := listBldr.ValueBuilder()
+	reader := csv.NewReader(strings.NewReader(str))
+	items, err := reader.Read()
+	if err != nil {
+		r.err = err
+		return
+	}
+	for _, str := range items {
+		r.initFieldConverter(valueBldr)(str)
+	}

Review Comment:
   a future enhancement might be to cache this somehow... but it's not needed for this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458590604

   @yevgenypats it is unrelated to this PR, I'm addressing it in #34488


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119001805


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")

Review Comment:
   You probably need to use an actual `csv` reader here instead of just splitting on `","`. The reason is so that quotes are handled properly. Imagine a list of strings where one of the strings has a comma in them. Normal CSV handling would use quotes like `foo,bar,"something,baz"` to handle it. But if you just split on `","` it won't handle the quotes correctly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458586857

   Thanks @zeroshade ! BTW - any idea why the `GO / AMD64 macOS 11 Go 1.17/1.18 - CGO` fails? seems unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade merged pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #34343:
URL: https://github.com/apache/arrow/pull/34343


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1128671750


##########
go/arrow/csv/reader.go:
##########
@@ -731,15 +731,19 @@ func (r *Reader) parseList(field array.Builder, str string) {
 		field.AppendNull()
 		return
 	}
-	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+	if !(strings.HasPrefix(str, "{") && strings.HasSuffix(str, "}")) {
 		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
-		field.AppendNull()
 		return
 	}
 	str = strings.TrimPrefix(str, "{")
 	str = strings.TrimSuffix(str, "}")

Review Comment:
   I can change this but I feel the explicit way is clearer and potentially more efficient. But let me know if you want it the other way. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1460449123

   @yevgenypats Turns out there was a typo in the go workflow yaml in the fix for the cgo macos tests, please rebase main again to get the Go workflows running again (we fixed the typo) thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119006466


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")
+	if len(strs) == 0 {
+		field.AppendNull()
+		return
+	}
+	listBldr := field.(*array.ListBuilder)
+	listBldr.Append(true)
+	valueBldr := listBldr.ValueBuilder()
+	for _, str := range strs {
+		r.initFieldConverter(valueBldr)(str)
+	}
+}
+
+func (r *Reader) parseFixedSizeBinary(field array.Builder, str string, byteWidth int) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	field.(*array.FixedSizeBinaryBuilder).Append([]byte(str))
+}

Review Comment:
   We should base64 decode the string so that we can get actual binary data. We should also verify the length of the decoded binary against `byteWidth`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119004002


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")
+	if len(strs) == 0 {
+		field.AppendNull()
+		return
+	}

Review Comment:
   an empty list is not equivalent to null. You should call `AppendEmpty` or you can call `Append(true)` and just not add anything to the value builder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1120470176


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")
+	if len(strs) == 0 {
+		field.AppendNull()
+		return
+	}
+	listBldr := field.(*array.ListBuilder)
+	listBldr.Append(true)
+	valueBldr := listBldr.ValueBuilder()
+	for _, str := range strs {
+		r.initFieldConverter(valueBldr)(str)
+	}
+}
+
+func (r *Reader) parseFixedSizeBinary(field array.Builder, str string, byteWidth int) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	field.(*array.FixedSizeBinaryBuilder).Append([]byte(str))
+}

Review Comment:
   I'd argue we probably want to make sure that any solution to allow the customization of the behavior *should* default to just calling the underlying storage type if there is no customized version available.
   
   It might make sense to allow providing a custom marshaller / unmarshaller when constructing the Array Type or the ExtensionType itself? And then just defaulting to the underlying storage if that custom one is nil? Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34343: [GO][CSV] Support lists

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1444576561

   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458399645

   @zeroshade I've updated this PR to only include list support as figured out it will be easier to do extensions in a follow-up PR potentially after https://github.com/apache/arrow/pull/34454.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1129707589


##########
go/arrow/csv/reader.go:
##########
@@ -731,15 +731,19 @@ func (r *Reader) parseList(field array.Builder, str string) {
 		field.AppendNull()
 		return
 	}
-	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+	if !(strings.HasPrefix(str, "{") && strings.HasSuffix(str, "}")) {
 		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
-		field.AppendNull()
 		return
 	}
 	str = strings.TrimPrefix(str, "{")
 	str = strings.TrimSuffix(str, "}")

Review Comment:
   Looking at the implementations from the `strings` package, I'm pretty sure that calling `strings.Trim` is more efficient than calling `TrimPrefix` and `TrimSuffix` separately. At minimum it reduces the number of code branches that get used. Personally I'd prefer using `strings.Trim` over the separate calls to prefix/suffix. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1129814269


##########
go/arrow/csv/reader.go:
##########
@@ -731,15 +731,19 @@ func (r *Reader) parseList(field array.Builder, str string) {
 		field.AppendNull()
 		return
 	}
-	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+	if !(strings.HasPrefix(str, "{") && strings.HasSuffix(str, "}")) {
 		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
-		field.AppendNull()
 		return
 	}
 	str = strings.TrimPrefix(str, "{")
 	str = strings.TrimSuffix(str, "}")

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1128574767


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +726,32 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}

Review Comment:
   Good catch. Fixed and added tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1128553464


##########
go/arrow/csv/reader.go:
##########
@@ -731,15 +731,19 @@ func (r *Reader) parseList(field array.Builder, str string) {
 		field.AppendNull()
 		return
 	}
-	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+	if !(strings.HasPrefix(str, "{") && strings.HasSuffix(str, "}")) {
 		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
-		field.AppendNull()
 		return
 	}
 	str = strings.TrimPrefix(str, "{")
 	str = strings.TrimSuffix(str, "}")

Review Comment:
   I think this can be condensed into a single call to `strings.Trim(str, "{}")`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34343: GH-34334: [Go][CSV] Support lists

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1445350146

   * Closes: #34334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1460540765

   > @yevgenypats Turns out there was a typo in the go workflow yaml in the fix for the cgo macos tests, please rebase main again to get the Go workflows running again (we fixed the typo) thanks!
   
   @zeroshade done! I think this should be ready to go 🚢 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458860363

   > If you rebase in from master, that'll pick up the fix for the failing macos tests. Once you add a test for the error case, I'm happy to merge this :)
   
   Awesome. Done!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458835173

   If you rebase in from master, that'll pick up the fix for the failing macos tests. Once you add a test for the error case, I'm happy to merge this :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119004002


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")
+	if len(strs) == 0 {
+		field.AppendNull()
+		return
+	}

Review Comment:
   an empty list is not equivalent to null. You should call `AppendEmpty` or you can call `Append(true)` and just not add anything to the value builder. Essentially, you don't need the separate check for when `len(strs) == 0` :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support lists and extensions

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1119281784


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +734,39 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}
+	str = strings.TrimPrefix(str, "{")
+	str = strings.TrimSuffix(str, "}")
+	strs := strings.Split(str, ",")
+	if len(strs) == 0 {
+		field.AppendNull()
+		return
+	}
+	listBldr := field.(*array.ListBuilder)
+	listBldr.Append(true)
+	valueBldr := listBldr.ValueBuilder()
+	for _, str := range strs {
+		r.initFieldConverter(valueBldr)(str)
+	}
+}
+
+func (r *Reader) parseFixedSizeBinary(field array.Builder, str string, byteWidth int) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	field.(*array.FixedSizeBinaryBuilder).Append([]byte(str))
+}

Review Comment:
   Good call. Now that I think about it, when handling an extension type I think we shouldn't default to the underlying storage but rather then to a `MarshalJSON` that is also needed here - https://github.com/apache/arrow/issues/34292#issuecomment-1446791648 WDYT ? As for example in case of uuid I actually don't want it to default to the underlying storage implementation neither in JSON nor in CSV. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34343:
URL: https://github.com/apache/arrow/pull/34343#discussion_r1128279930


##########
go/arrow/csv/reader.go:
##########
@@ -721,6 +726,32 @@ func (r *Reader) parseDecimal256(field array.Builder, str string, prec, scale in
 	field.(*array.Decimal256Builder).Append(val)
 }
 
+func (r *Reader) parseList(field array.Builder, str string) {
+	if r.isNull(str) {
+		field.AppendNull()
+		return
+	}
+	if !(strings.HasPrefix(str, "{") || strings.HasSuffix(str, "}")) {
+		r.err = errors.New("invalid list format. should start with '{' and end with '}'")
+		field.AppendNull()
+		return
+	}

Review Comment:
   The logic here isn't quite correct. It should be 
   
   `if !strings.HasPrefix(str, "{") || !strings.HasSuffix(str, "}") {`
   
   The current logic would only error if *both* are missing. Is there a unit test which tests the case where there is only a prefix "{" or only a suffix "}"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34343: GH-34334: [Go][CSV] Support list fields

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34343:
URL: https://github.com/apache/arrow/pull/34343#issuecomment-1458573709

   @yevgenypats This looks good to me other than my last two comments. Please add a couple tests for checking error cases and an empty list case. After that this LGTM :smile: thanks again for this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org