Posted to issues@solr.apache.org by "Mark Robert Miller (Jira)" <ji...@apache.org> on 2021/09/02 06:40:00 UTC

[jira] [Commented] (SOLR-15560) Optimize JavaBinCodec encode/decode performance.

    [ https://issues.apache.org/jira/browse/SOLR-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408581#comment-17408581 ] 

Mark Robert Miller commented on SOLR-15560:
-------------------------------------------

I’m about done here. There is one small thing I still want to look into, and some things I want to remove and clean up. This also depends on a separate issue for random benchmark data generation.

I had two main concerns with these improvements.

One was that this change looks much, much better with just the right kind of large data. So I have pushed my average comparisons to be much less favorable to it.

Two, this change has the potential to generate more garbage and copy more byte arrays, and I didn’t want anyone indexing huge docs and/or fields to go backwards. For my latest peek at that, I started up a before-and-after test for encoding - that’s the memory eater. I ran the benchmark with 18 threads; a scale parameter lets you scale up various data sizes. I monitored CPU and heap use.

Starting at 2x scale, this change ran twice as fast as the current encoding, used 4% CPU vs 60%, and used half the heap of the current impl.

Ramping up to 3x, the current impl almost immediately OOMs. It doesn’t even enter the benchmark; it dies setting up the data - the benchmark setup encodes benchmark data for the decode bench (it sets up all the data whether you run decode or not). Threads and CPU immediately ramp up, and the OOM hits hard and fast. This change runs the benchmark fine. I went to 4x: runs fine. 5x: runs fine. 6x: runs fine. 7x: we are now using most of the heap, but it runs fine. I stopped there.

> Optimize JavaBinCodec encode/decode performance.
> ------------------------------------------------
>
>                 Key: SOLR-15560
>                 URL: https://issues.apache.org/jira/browse/SOLR-15560
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Minor
>         Attachments: javabin.decode.1.before.json, javabin.decode.2.after.json, javabin.decode.before.and.after.compare.png, javabin.decode.before.and.after.summary.png, javabin.encode.before.and.after.compare.png, javabin.encode.before.and.after.summary.png, javabin.encode.decode.compare.png, javabin.encode.decode.summary.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Javabin performance can be pretty impactful on search side scatter / gather and especially the /export handler.
> It turns out that the large switch JavaBin uses to dispatch based on type is a hot spot that is too large to be inlined.
> You can pull some less common paths out into another method to address this.
> I have not benchmarked this yet, and it’s possible other bottlenecks may dampen the win, but I noticed the following on the ref branch (with a couple of other optimizations that were not nearly as widely affecting or quite as hot):
> When you run the tests, you get the best results in “client” mode - i.e. you prevent the C2 compiler from kicking in. Say I can run the core nightly tests serially on my laptop in about 8 minutes with C1; C2 might take another 2 to 3 minutes on top. This is because the work C2 does compiling, optimizing, and deoptimizing such a diverse workload ends up being the dominant performance drag.
> With a bit of key optimization here, running the tests with C2 ends up about on par with stopping at C1, even though C2 still dominates everything else.
> That’s a pretty impactful win, to be able to move the needle like that.
> Why such a win on C2 without C1 also jumping forward? It’s much more manageable to get the bytecode of a non-inlined hot method below C2’s size threshold for inlining than below C1’s.
> So this should be a decent win, I hope. There are a variety of differences that may outweigh it, though:
>  * JavaBin on master has tail recursion.
>  * It generates a tremendous number of byte arrays.
>  * It converts between UTF-8 and UTF-16.
>  * It does the encoding manually (the JVM can cheat).
>  * It has a number of classes that extend it (vs 1 here).
>  * Lots of other things.
> I’m optimistic we can see some gain though.
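
The dispatch-splitting idea described above can be sketched roughly as follows. This is an illustrative example only - the class, method, and tag values are hypothetical and not the actual JavaBinCodec code. The point is that HotSpot refuses to inline methods whose bytecode exceeds a size threshold (tunable via -XX:MaxInlineSize / -XX:FreqInlineSize), so moving the long tail of rare cases into a separate method keeps the hot dispatch path small enough to inline:

```java
// Hypothetical sketch of splitting a hot dispatch switch (not real Solr code).
public class DispatchSketch {

    // Hot path: handles only the common type tags, delegating the rest.
    // Keeping this method's bytecode small lets the JIT inline it at call sites.
    static int decode(int tag) {
        switch (tag) {
            case 0:  return 10;          // e.g. a NULL-like tag
            case 1:  return 20;          // e.g. a string tag
            case 2:  return 30;          // e.g. an int tag
            default: return decodeRare(tag);
        }
    }

    // Cold path: the long tail of type tags lives here, so its bytecode
    // does not count against the hot method's inlining budget.
    static int decodeRare(int tag) {
        switch (tag) {
            case 3:  return 40;
            case 4:  return 50;
            // ... many more rare cases would go here ...
            default: throw new IllegalArgumentException("unknown tag: " + tag);
        }
    }

    public static void main(String[] args) {
        if (decode(1) != 20) throw new AssertionError("hot path failed");
        if (decode(4) != 50) throw new AssertionError("cold path failed");
        System.out.println("ok");
    }
}
```

The behavior is identical either way; only the bytecode size of the hot method changes, which is what decides whether the JIT inlines it.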



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org