
Posted to github@arrow.apache.org by "zeroshade (via GitHub)" <gi...@apache.org> on 2023/03/21 19:08:45 UTC

[GitHub] [arrow] zeroshade opened a new pull request, #34666: GH-32950: [Go] REE Benchmarks

zeroshade opened a new pull request, #34666:
URL: https://github.com/apache/arrow/pull/34666

   
   ### Rationale for this change
   Adding benchmarks for `run_end_encode` and `run_end_decode` to track and hopefully improve the performance of these kernels.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] zeroshade merged pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade merged PR #34666:
URL: https://github.com/apache/arrow/pull/34666



[GitHub] [arrow] ursabot commented on pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #34666:
URL: https://github.com/apache/arrow/pull/34666#issuecomment-1480595483

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/355ec80c40d24e8598d541956f4ea05e...012aa8ed8d4a4fcebe7fe6883d8117b7/)
   



[GitHub] [arrow] zeroshade commented on pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on PR #34666:
URL: https://github.com/apache/arrow/pull/34666#issuecomment-1478444495

   CC @felipecrv 



[GitHub] [arrow] github-actions[bot] commented on pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #34666:
URL: https://github.com/apache/arrow/pull/34666#issuecomment-1478444590

   * Closes: #32950



[GitHub] [arrow] zeroshade commented on a diff in pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on code in PR #34666:
URL: https://github.com/apache/arrow/pull/34666#discussion_r1145116487


##########
go/arrow/compute/vector_run_end_test.go:
##########
@@ -295,3 +299,125 @@ func TestRunEndFunctions(t *testing.T) {
 		})
 	}
 }
+
+func benchRunEndEncode(b *testing.B, sz int, nullProb float64, runEndType, valueType arrow.DataType) {
+	b.Run("encode", func(b *testing.B) {
+		var (
+			mem = memory.NewCheckedAllocator(memory.DefaultAllocator)
+			rng = gen.NewRandomArrayGenerator(seed, mem)
+		)
+
+		values := rng.ArrayOf(valueType.ID(), int64(sz), nullProb)
+		b.Cleanup(func() {
+			values.Release()
+		})
+
+		var (
+			res   compute.Datum
+			err   error
+			ctx   = compute.WithAllocator(context.Background(), mem)
+			input = &compute.ArrayDatum{Value: values.Data()}
+			opts  = compute.RunEndEncodeOptions{RunEndType: runEndType}
+
+			byts int64
+		)
+
+		for _, buf := range values.Data().Buffers() {
+			if buf != nil {
+				byts += int64(buf.Len())
+			}
+		}
+
+		b.SetBytes(byts)
+		b.ResetTimer()
+		for n := 0; n < b.N; n++ {
+			res, err = compute.RunEndEncode(ctx, opts, input)

Review Comment:
   Right now it uses a fixed seed to ensure consistency, but that's a good point: this benchmark is essentially still benchmarking a random normal distribution of *values*, with no guarantees about the distribution of runs or run lengths. It might make more sense to restrict the min/max values to better guarantee some runs.




[GitHub] [arrow] ursabot commented on pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #34666:
URL: https://github.com/apache/arrow/pull/34666#issuecomment-1480595076

   Benchmark runs are scheduled for baseline = fd5d7107436b2cf6ace361edc7f732a3d48c0f0e and contender = a1153a8cd374455438d7ae4b4293c8bbc6e3abd3. a1153a8cd374455438d7ae4b4293c8bbc6e3abd3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/74f26a594ee641e78c0afc4bd8ed091c...185153d983e04725adf2e4fc388468a9/)
   [Failed :arrow_down:0.45% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/355ec80c40d24e8598d541956f4ea05e...012aa8ed8d4a4fcebe7fe6883d8117b7/)
   [Finished :arrow_down:0.51% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5a5f82e41bd04a7aaf6354a45d9bf90d...ac8a4f4332fc480e8c83f378ac8dba56/)
   [Finished :arrow_down:0.22% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2ae948166db94be2ac798d291aaad6c0...24d4e25f8c4f4adbb89fb8b3568bc264/)
   Buildkite builds:
   [Finished] [`a1153a8c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2558)
   [Finished] [`a1153a8c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2588)
   [Finished] [`a1153a8c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2556)
   [Finished] [`a1153a8c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2579)
   [Finished] [`fd5d7107` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2557)
   [Failed] [`fd5d7107` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2587)
   [Finished] [`fd5d7107` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2555)
   [Finished] [`fd5d7107` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2578)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow] github-actions[bot] commented on pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #34666:
URL: https://github.com/apache/arrow/pull/34666#issuecomment-1478444648

   :warning: GitHub issue #32950 **has been automatically assigned in GitHub** to PR creator.



[GitHub] [arrow] felipecrv commented on a diff in pull request #34666: GH-32950: [Go] REE Benchmarks

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.

felipecrv commented on code in PR #34666:
URL: https://github.com/apache/arrow/pull/34666#discussion_r1145111994


##########
go/arrow/compute/vector_run_end_test.go:
##########
@@ -295,3 +299,125 @@ func TestRunEndFunctions(t *testing.T) {
 		})
 	}
 }
+
+func benchRunEndEncode(b *testing.B, sz int, nullProb float64, runEndType, valueType arrow.DataType) {
+	b.Run("encode", func(b *testing.B) {
+		var (
+			mem = memory.NewCheckedAllocator(memory.DefaultAllocator)
+			rng = gen.NewRandomArrayGenerator(seed, mem)
+		)
+
+		values := rng.ArrayOf(valueType.ID(), int64(sz), nullProb)
+		b.Cleanup(func() {
+			values.Release()
+		})
+
+		var (
+			res   compute.Datum
+			err   error
+			ctx   = compute.WithAllocator(context.Background(), mem)
+			input = &compute.ArrayDatum{Value: values.Data()}
+			opts  = compute.RunEndEncodeOptions{RunEndType: runEndType}
+
+			byts int64
+		)
+
+		for _, buf := range values.Data().Buffers() {
+			if buf != nil {
+				byts += int64(buf.Len())
+			}
+		}
+
+		b.SetBytes(byts)
+		b.ResetTimer()
+		for n := 0; n < b.N; n++ {
+			res, err = compute.RunEndEncode(ctx, opts, input)

Review Comment:
   The measured throughput will vary widely depending on the size of the runs. Does the REE random generator pay attention to this? Meaning: does it generate a normal distribution of run lengths?
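
   One way to check what run lengths a generated input actually contains is to measure them directly before benchmarking. The sketch below is a standalone illustration; `runLengths` is a hypothetical helper, not something from the PR or the Arrow Go API, and it works on a plain slice rather than an Arrow array.

```go
package main

import "fmt"

// runLengths returns the length of each maximal run of equal adjacent
// values, in order of appearance. It returns nil for empty input.
func runLengths(vals []int32) []int {
	var lens []int
	for i := 0; i < len(vals); {
		// Advance j to the first index whose value differs from vals[i].
		j := i + 1
		for j < len(vals) && vals[j] == vals[i] {
			j++
		}
		lens = append(lens, j-i)
		i = j
	}
	return lens
}

func main() {
	vals := []int32{7, 7, 7, 1, 1, 9, 9, 9, 9, 2}
	fmt.Println(runLengths(vals)) // prints [3 2 4 1]
}
```

   Feeding the benchmark input through a helper like this would show whether the data contains meaningful runs at all, or mostly runs of length 1.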


