Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2016/07/14 15:45:20 UTC
[jira] [Commented] (SOLR-9296) Examine SortingResponseWriter with
an eye towards removing extra object creation
[ https://issues.apache.org/jira/browse/SOLR-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377139#comment-15377139 ]
Erick Erickson commented on SOLR-9296:
--------------------------------------
Some preliminary results:
I'm particularly interested in any pointers the Lucene people have. I've now poked at this enough to understand the issues, so I should be able to make use of whatever you can offer.
Bottom line: I've instrumented the SortingResponseWriter class to
1> not write to the client for testing
2> try to reduce object creation
3> report summary results only.
On 10M rows (see the table below) I'm seeing 0-11% improvements in rows/second, with one outlier (mv bool fields) showing 5% worse performance. I'm also seeing somewhat spikier response times with the old way of doing things, but that's probably within the margin of error of my measurements.
4M fewer char[] objects created (VisualVM).
Roughly the same number of other types of objects created.
40M total objects created. NOTE: I had to stop looking after 2.16M rows were processed since VisualVM was slowing the system to a crawl.
There's still some work to do to understand why roughly the same number of String objects were created, but I think this is encouraging enough to pursue.
First, any suggestions for the most vexing thing of all? Let's say I have to convert an integer to a char[] to output it. Currently that can be done with a formatter that takes an "Appendable". Great, I can reuse one StringBuilder/StringBuffer, resetting the length to 0 each time. Unfortunately, there's no way to get at the underlying char[] buffer without copying it. The OpenStringBuilder class that I'd like to use (a Lucene utility class) doesn't work because the formatter checks for instanceof StringBuffer and/or StringBuilder and asserts otherwise. So I wind up copying to a char[] (I keep one per field) and writing that.
I have a char[] cbuf that I can reuse for the entire export (one for each field), so it looks like this:
format(val, sb);  // sb is a StringBuffer or StringBuilder, depending on what the formatter requires
sb.getChars(0, sb.length(), cbuf, 0);
writer.write(cbuf, 0, sb.length());
What I'd like is to avoid the getChars(...) call.
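For anyone following along, here is a minimal Java sketch of the pattern above, plus one possible way to skip the getChars(...) copy for ints by writing the digits straight into the reusable char[]. Everything here is hypothetical illustration, not the code in SortingResponseWriter: the class and method names are made up, sb.append(val) stands in for the real Appendable-based formatter, and formatInt is a hand-rolled conversion that would still need equivalents for the other field types.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch; names are illustrative only.
final class ReusedBufferSketch {
    // One builder and one char[] reused for every value of a field.
    private final StringBuilder sb = new StringBuilder(16);
    private final char[] cbuf = new char[11]; // "-2147483648" is 11 chars

    // Current pattern: format into the builder, copy out, then write.
    void writeViaBuilder(int val, Writer out) throws IOException {
        sb.setLength(0);              // reuse the builder instead of allocating
        sb.append(val);               // stand-in for the Appendable-based formatter
        int len = sb.length();
        sb.getChars(0, len, cbuf, 0); // the copy I would like to avoid
        out.write(cbuf, 0, len);
    }

    // Fills buf from the right; returns the index of the first character.
    static int formatInt(int val, char[] buf) {
        int pos = buf.length;
        boolean negative = val < 0;
        int n = negative ? val : -val; // stay negative so MIN_VALUE cannot overflow
        do {
            buf[--pos] = (char) ('0' - (n % 10)); // n % 10 is in -9..0
            n /= 10;
        } while (n != 0);
        if (negative) {
            buf[--pos] = '-';
        }
        return pos;
    }

    // Alternative: no intermediate builder, no getChars copy.
    void writeDirect(int val, Writer out) throws IOException {
        int start = formatInt(val, cbuf);
        out.write(cbuf, start, cbuf.length - start);
    }

    // Runs both paths over a few values so the output can be compared.
    static String demo() {
        ReusedBufferSketch s = new ReusedBufferSketch();
        StringWriter a = new StringWriter();
        StringWriter b = new StringWriter();
        try {
            for (int v : new int[] {42, -7, 0}) {
                s.writeViaBuilder(v, a);
                a.write(',');
                s.writeDirect(v, b);
                b.write(',');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return a + "|" + b;
    }
}
```

The direct version only works because an int has a strict upper bound on its printed length, so the buffer can be sized once up front. It trades the general-purpose formatter for type-specific code, which may or may not be worth it.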
I'm traveling today so I won't post the code until perhaps tomorrow. So far:
I've taken out a bunch of conversions to String and created some classes that reuse a char[] to move data around. I also created a "null writer" so that the client having to read 10M rows is no longer a variable in the tests.
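A rough idea of what such a null writer could look like, as a sketch only. The class name and the character counter are assumptions for illustration; the actual test writer and its instrumentation are not shown in this comment.

```java
import java.io.Writer;

// Hypothetical sketch of a "null writer": it discards everything it is given
// and only keeps a running count, so no client has to consume the export.
final class CountingNullWriter extends Writer {
    private long charsWritten;

    @Override
    public void write(char[] cbuf, int off, int len) {
        charsWritten += len; // throw the data on the floor, remember the size
    }

    @Override
    public void flush() {
        // nothing buffered, nothing to flush
    }

    @Override
    public void close() {
        // no underlying resource to release
    }

    long getCharsWritten() {
        return charsWritten;
    }

    // Small usage example: two writes totalling 5 characters.
    static long demo() {
        CountingNullWriter w = new CountingNullWriter();
        char[] data = {'1', '2', '3', '4', '5'};
        w.write(data, 0, 3);
        w.write(data, 3, 2);
        return w.getCharsWritten();
    }
}
```

Because the other write(...) overloads in java.io.Writer funnel into write(char[], int, int), overriding that one method is enough to swallow all output.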
On a preliminary run (exporting 10M rows of various types: int, long, string) the number of allocated objects dropped by about 4M char[] (of 40M total objects), while most of the other object counts remained about the same. I was surprised that the number of String objects stayed similar. I expected that to drop, so I need to dig at that some more.
Speed-wise I'm seeing up to an 11% improvement in throughput, mostly in the single-valued case. Why mv should be different I'm not sure yet. Writing mv fields varies from being 5% or so _worse_ (boolfield) to 10% or so better.
These measurements were taken with a null writer that just threw the bits on the floor, plus a bit of instrumentation to return the aggregate. No VisualVM was attached; that was just for object counting and gets in the way of perf... badly. Before taking any of these I did a full export of all the fields to try to remove loading and the like from the measurements. Then, for each row below, I exported all 10.2M docs three times. You'll notice in the following that all the runs return int_sv, which I used as the sort criterion, figuring that would stay constant. Numbers are new/old in thousands of rows per second, so the first entry says "for returning 10.2M single-valued string and integer fields, the new code processed 170K/second and the old code processed 152K/second". The times for the three runs behind each entry were very similar.
Fields exported      new/old (thousands of rows/second)
str_sv,int_sv        170/152
int_sv               193/175
long_sv,int_sv       176/165
date_sv,int_sv       167/145
bool_sv,int_sv       186/172
double_sv,int_sv     171/156
str_mv,int_sv        131/122
int_mv,int_sv        149/138
long_mv,int_sv       146/147
date_mv,int_sv       120/120
bool_mv,int_sv       174/183
double_mv,int_sv     124/125
str_sv,int_sv,date_sv,bool_sv,double_sv,str_mv,int_mv,date_mv,bool_mv,double_mv     55/55
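For reading the new/old pairs as percentages, a small sketch of the arithmetic, assuming the change is measured relative to the old rate. The class and method names are made up; the two sample rows are taken from the numbers above.

```java
// Hypothetical helper: relative change of the new rate versus the old one.
final class ThroughputChange {
    // Positive means the new code is faster, negative means slower.
    static double percentChange(double newRate, double oldRate) {
        return (newRate - oldRate) / oldRate * 100.0;
    }

    public static void main(String[] args) {
        // Rates are in thousands of rows/second.
        System.out.printf("str_sv,int_sv:  %+.1f%%%n", percentChange(170, 152));
        System.out.printf("bool_mv,int_sv: %+.1f%%%n", percentChange(174, 183));
    }
}
```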
> Examine SortingResponseWriter with an eye towards removing extra object creation
> --------------------------------------------------------------------------------
>
> Key: SOLR-9296
> URL: https://issues.apache.org/jira/browse/SOLR-9296
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 6.2, master (7.0)
> Reporter: Erick Erickson
> Assignee: Erick Erickson
>
> Assigning to myself just to keep from losing track of it. Anyone who wants to take it, please feel free!
> While looking at SOLR-9166 I noticed that SortingResponseWriter does a toString for each field it writes out. At a _very_ preliminary examination it seems like we create a lot of String objects that need to be GC'd. Could we reduce this by using some kind of CharsRef/ByteBuffer/Whatever?
> I've only looked at this briefly, not quite sure what the gotchas are but throwing it out for discussion.
> Some initial thoughts:
> 1> for the fixed types (numerics, dates, booleans) there's a strict upper limit on the size of each value so we can allocate something up-front.
> 2> for string fields, we already get a chars ref so just pass that through?
> 3> must make sure that whatever does the actual writing transfers all the bytes before returning.
> I'm sure I won't get to this for a week or perhaps more, so grab it if you have the bandwidth.