You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "metalmatze (via GitHub)" <gi...@apache.org> on 2023/05/24 17:16:30 UTC

[GitHub] [arrow] metalmatze opened a new pull request, #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

metalmatze opened a new pull request, #35744:
URL: https://github.com/apache/arrow/pull/35744

   ### Rationale for this change
   
   See #35711 
   
   ### What changes are included in this PR?
   
   Adding `*BinaryDictionaryBuilder.GetValueIndex(i int) int` to get the value's index in the dictionary. Then two methods to get a `[]byte` and `string` results. `Value(i int) []byte` and `ValueStr(i int) string`.
   
   ### Are these changes tested?
   
   So far I have added very little testing. Mostly some sort of integration test to read back values from the BinaryDictionaryBuilder. We should definitely add some test for the `*BinaryMemoTable) Value(i int) []byte`. 
   
   ### Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   Just additions, no breaking API changes. 
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->
   
   cc @zeroshade 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35744:
URL: https://github.com/apache/arrow/pull/35744#issuecomment-1561645979

   * Closes: #35711


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #35744:
URL: https://github.com/apache/arrow/pull/35744#discussion_r1205903529


##########
go/arrow/array/dictionary.go:
##########
@@ -1293,6 +1293,41 @@ func (b *BinaryDictionaryBuilder) InsertStringDictValues(arr *String) (err error
 	return
 }
 
+func (b *BinaryDictionaryBuilder) GetValueIndex(i int) int {
+	switch b := b.idxBuilder.Builder.(type) {
+	case *Uint8Builder:
+		return int(b.Value(i))
+	case *Int8Builder:
+		return int(b.Value(i))
+	case *Uint16Builder:
+		return int(b.Value(i))
+	case *Int16Builder:
+		return int(b.Value(i))
+	case *Uint32Builder:
+		return int(b.Value(i))
+	case *Int32Builder:
+		return int(b.Value(i))
+	case *Uint64Builder:
+		return int(b.Value(i))
+	case *Int64Builder:
+		return int(b.Value(i))
+	default:
+		return -1
+	}
+}
+
+func (b *BinaryDictionaryBuilder) Value(i int) []byte {
+	switch mt := b.memoTable.(type) {
+	case *hashing.BinaryMemoTable:
+		return mt.Value(i)
+	}
+	return nil
+}
+
+func (b *BinaryDictionaryBuilder) ValueStr(i int) string {
+	return string(b.Value(i))

Review Comment:
   `return *(*string)(unsafe.Pointer(&b))` will return the string without copying the bytes. :smile:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] metalmatze commented on a diff in pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "metalmatze (via GitHub)" <gi...@apache.org>.
metalmatze commented on code in PR #35744:
URL: https://github.com/apache/arrow/pull/35744#discussion_r1205852804


##########
go/arrow/array/dictionary.go:
##########
@@ -1293,6 +1293,41 @@ func (b *BinaryDictionaryBuilder) InsertStringDictValues(arr *String) (err error
 	return
 }
 
+func (b *BinaryDictionaryBuilder) GetValueIndex(i int) int {
+	switch b := b.idxBuilder.Builder.(type) {
+	case *Uint8Builder:
+		return int(b.Value(i))
+	case *Int8Builder:
+		return int(b.Value(i))
+	case *Uint16Builder:
+		return int(b.Value(i))
+	case *Int16Builder:
+		return int(b.Value(i))
+	case *Uint32Builder:
+		return int(b.Value(i))
+	case *Int32Builder:
+		return int(b.Value(i))
+	case *Uint64Builder:
+		return int(b.Value(i))
+	case *Int64Builder:
+		return int(b.Value(i))
+	default:
+		return -1
+	}
+}
+
+func (b *BinaryDictionaryBuilder) Value(i int) []byte {
+	switch mt := b.memoTable.(type) {
+	case *hashing.BinaryMemoTable:
+		return mt.Value(i)
+	}
+	return nil
+}
+
+func (b *BinaryDictionaryBuilder) ValueStr(i int) string {
+	return string(b.Value(i))

Review Comment:
   Just now, I saw that this actually allocates quite a lot of bytes... 
   
   For now, I have moved our code to use the `Value(i int) []byte` method above and then using `bytes.Equal` to compared our to bytes converted string with what we read from this builder. 
   
   Is there a better way you use to convert these bytes to strings in the arrow project? I'm thinking of the [yoloString](https://github.com/prometheus/prometheus/blob/cb045c0e4b94bbf3eee174d91b5ef2b8553948d5/model/textparse/promparse.go#L425) Prometheus has.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35744:
URL: https://github.com/apache/arrow/pull/35744#issuecomment-1561646014

   :warning: GitHub issue #35711 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade merged pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #35744:
URL: https://github.com/apache/arrow/pull/35744


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35744:
URL: https://github.com/apache/arrow/pull/35744#issuecomment-1569375265

   Benchmark runs are scheduled for baseline = 422174e9b6c926fdb6f55741f1484e9b3e0cb001 and contender = 762bd2ab2ebd99b64022b4d42f4a8b035cbf2dd9. 762bd2ab2ebd99b64022b4d42f4a8b035cbf2dd9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/cb530c1fb1cf449db576c1cb8d9ae9cc...7a9809f00b354854b0fffda60edf9c19/)
   [Failed :arrow_down:0.0% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6d3c4c647ea6437fb656fa7ceeb8b8d5...aee36284c8624ddda9749e91841ba6d4/)
   [Finished :arrow_down:0.31% :arrow_up:2.75%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/ec05dc4b15b64024b9044660493f7a95...4fb4c8a6c2f1474bba77b618ccbdd55e/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6705161f8d9e4e97bd6ae247e123436b...074a8870f24d445496fbd88a268ab37e/)
   Buildkite builds:
   [Finished] [`762bd2ab` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2949)
   [Failed] [`762bd2ab` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2985)
   [Finished] [`762bd2ab` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2950)
   [Failed] [`762bd2ab` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2975)
   [Finished] [`422174e9` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2948)
   [Finished] [`422174e9` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2984)
   [Finished] [`422174e9` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2949)
   [Failed] [`422174e9` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2974)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] metalmatze commented on pull request #35744: GH-35711: [Go] Add `Value` and `GetValueIndex` methods to some builders

Posted by "metalmatze (via GitHub)" <gi...@apache.org>.
metalmatze commented on PR #35744:
URL: https://github.com/apache/arrow/pull/35744#issuecomment-1563308900

   Rebased. :slightly_smiling_face: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org