You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "candiduslynx (via GitHub)" <gi...@apache.org> on 2023/06/01 11:11:47 UTC

[GitHub] [arrow] candiduslynx opened a new pull request, #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

candiduslynx opened a new pull request, #35872:
URL: https://github.com/apache/arrow/pull/35872

   ### Rationale for this change
   
   When comparing `array.Struct` values with `array.ApproxEqual` the validity bitmap of the struct itself should take precedence:
   > When reading the struct array the parent validity bitmap takes priority.
   
   This follows a brief discussion from #35851.
   
   ### What changes are included in this PR?
   
   `array.arrayApproxEqualStruct` will check the fields data only if the struct elem is valid.
   
   ### Are these changes tested?
   
   `pqarrow` tests are updated accordingly (no special treatment for structs, just `array.ApproxEqual`
   
   ### Are there any user-facing changes?
   
   `array.ApproxEqual` behavior changed to match the docs about validity bitmap precedence.
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35872:
URL: https://github.com/apache/arrow/pull/35872#issuecomment-1572621091

   @zeroshade I've updated the code to use bitutils visit, thanks for the suggestion!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #35872:
URL: https://github.com/apache/arrow/pull/35872#discussion_r1213320182


##########
go/arrow/array/compare.go:
##########
@@ -735,10 +735,14 @@ func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption)
 }
 
 func arrayApproxEqualStruct(left, right *Struct, opt equalOption) bool {
-	for i, lf := range left.fields {
-		rf := right.fields[i]
-		if !arrayApproxEqual(lf, rf, opt) {
-			return false
+	for i := 0; i < left.Len(); i++ {
+		if left.IsNull(i) {
+			continue
+		}

Review Comment:
   shouldn't this be:
   
   ```go
   if left.IsNull(i) {
       if !right.IsNull(i) {
           return false
       }
       continue
   }
   ```
   ?



##########
go/arrow/array/compare.go:
##########
@@ -735,10 +735,14 @@ func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption)
 }
 
 func arrayApproxEqualStruct(left, right *Struct, opt equalOption) bool {
-	for i, lf := range left.fields {
-		rf := right.fields[i]
-		if !arrayApproxEqual(lf, rf, opt) {
-			return false
+	for i := 0; i < left.Len(); i++ {
+		if left.IsNull(i) {
+			continue
+		}
+		for j := range left.fields {
+			if !sliceApproxEqual(left.fields[j], int64(i), int64(i+1), right.fields[j], int64(i), int64(i+1), opt) {
+				return false
+			}
 		}

Review Comment:
   We should probably use a `bitutils.SetBitRunReader` to simply skip the indices that are null in the parent bitmap and call `sliceApproxEqual` on the runs of valid values. Something like:
   
   ```go
   compareRun := func(pos, length int64) error {
           for i := range left.fields {
                   if !sliceApproxEqual(left.fields[i], pos, pos+length, right.fields[i], pos, pos+length, opt) {
                       return arrow.ErrInvalid
                   }
           }
           return nil
   }
   
   return bitutils.VisitSetBitRuns(left.NullBitmapBytes(), left.Offset(), left.Len(), compareRun) == nil
   ```
   
   This way we create fewer slices and do more optimized comparisons rather than creating a new slice for *every* index.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35872:
URL: https://github.com/apache/arrow/pull/35872#issuecomment-1574457116

   Benchmark runs are scheduled for baseline = 3299d12efc91220237266bfa6f985f9eb37492f8 and contender = 61692b69e4954f50ced110bf46dd92041748776f. 61692b69e4954f50ced110bf46dd92041748776f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/51319ae050bc4f42a407741149430cdd...1d42cf0f3a28412fb425b28e2b22028a/)
   [Finished :arrow_down:0.97% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1672a789a631405d829a73eb0877ff20...76ec182de9a740bbb4d24884644d1da7/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bae2482171784558b00aee594194b8fc...01701b4a2331457b966152cab9b4de59/)
   [Finished :arrow_down:0.36% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/3e2dfcab7176455ea7a083ef517f1e7c...456f7e9db9a7442e9c504f6a41111175/)
   Buildkite builds:
   [Finished] [`61692b69` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2970)
   [Finished] [`61692b69` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/3006)
   [Finished] [`61692b69` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2971)
   [Finished] [`61692b69` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2996)
   [Finished] [`3299d12e` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2969)
   [Finished] [`3299d12e` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/3005)
   [Finished] [`3299d12e` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2970)
   [Finished] [`3299d12e` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2995)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35872:
URL: https://github.com/apache/arrow/pull/35872#issuecomment-1571851941

   @zeroshade this PR includes some changes to pqarrow pkg as well, but they are minor (exit early on the found value, dedup code).
   I can extract it into a minor PR, if that's required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35872:
URL: https://github.com/apache/arrow/pull/35872#discussion_r1213378184


##########
go/arrow/array/compare.go:
##########
@@ -735,10 +735,14 @@ func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption)
 }
 
 func arrayApproxEqualStruct(left, right *Struct, opt equalOption) bool {
-	for i, lf := range left.fields {
-		rf := right.fields[i]
-		if !arrayApproxEqual(lf, rf, opt) {
-			return false
+	for i := 0; i < left.Len(); i++ {
+		if left.IsNull(i) {
+			continue
+		}

Review Comment:
   we already checked for this in `baseArrayEqual`, see other `approx` implementations



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35872:
URL: https://github.com/apache/arrow/pull/35872#discussion_r1213391728


##########
go/arrow/array/compare.go:
##########
@@ -735,10 +735,14 @@ func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption)
 }
 
 func arrayApproxEqualStruct(left, right *Struct, opt equalOption) bool {
-	for i, lf := range left.fields {
-		rf := right.fields[i]
-		if !arrayApproxEqual(lf, rf, opt) {
-			return false
+	for i := 0; i < left.Len(); i++ {
+		if left.IsNull(i) {
+			continue
+		}
+		for j := range left.fields {
+			if !sliceApproxEqual(left.fields[j], int64(i), int64(i+1), right.fields[j], int64(i), int64(i+1), opt) {
+				return false
+			}
 		}

Review Comment:
   Done in fb156f8a1198495134b0841a2c4021f6b1c6f619



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35872:
URL: https://github.com/apache/arrow/pull/35872#issuecomment-1571851120

   :warning: GitHub issue #35871 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade merged pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #35872:
URL: https://github.com/apache/arrow/pull/35872


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35872: GH-35871: [Go] Account for struct validity bitmap in `array.ApproxEqual`

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35872:
URL: https://github.com/apache/arrow/pull/35872#discussion_r1213378184


##########
go/arrow/array/compare.go:
##########
@@ -735,10 +735,14 @@ func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption)
 }
 
 func arrayApproxEqualStruct(left, right *Struct, opt equalOption) bool {
-	for i, lf := range left.fields {
-		rf := right.fields[i]
-		if !arrayApproxEqual(lf, rf, opt) {
-			return false
+	for i := 0; i < left.Len(); i++ {
+		if left.IsNull(i) {
+			continue
+		}

Review Comment:
   When we get to `arrayApproxEqualStruct` we already [checked](https://github.com/apache/arrow/blob/main/go/arrow/array/compare.go#L479) for this (data type, len, nullN & validity bitmap) in [`baseArrayEqual`](https://github.com/apache/arrow/blob/main/go/arrow/array/compare.go#L616-L628)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org