Posted to dev@lucene.apache.org by Adrien Grand <jp...@gmail.com> on 2015/08/01 01:23:52 UTC

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Hi Tomás,

I suspect this might be related to LUCENE-5938. We changed the default
rewrite method for multi-term queries to load documents into a sparse
bit set first, and only upgrade to a dense bit set once we know that
many documents match. When there are lots of terms to intersect, we
end up spending significant CPU time building the sparse bit set only
to eventually upgrade to a dense bit set as before. This might be what
you are seeing.
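
For intuition, here is a minimal, self-contained sketch of that two-phase
pattern using plain JDK collections; the class name and threshold are made up
for illustration and this is not Lucene's actual DocIdSet building code. The
cost shows up when everything collected into the sparse structure has to be
replayed into the dense one:

import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class UpgradingDocIdCollector {
  private final int maxDoc;
  private final int upgradeThreshold;        // switch to a dense bitmap past this
  private Set<Integer> sparse = new HashSet<>();
  private BitSet dense;                      // null until we upgrade

  UpgradingDocIdCollector(int maxDoc) {
    this.maxDoc = maxDoc;
    this.upgradeThreshold = Math.max(1, maxDoc >>> 7);  // arbitrary for the sketch
  }

  void add(int doc) {
    if (dense != null) {
      dense.set(doc);
      return;
    }
    sparse.add(doc);
    if (sparse.size() > upgradeThreshold) {
      // The extra CPU cost: every doc collected so far is replayed into the
      // dense structure before collection continues.
      dense = new BitSet(maxDoc);
      for (int d : sparse) {
        dense.set(d);
      }
      sparse = null;
    }
  }

  boolean contains(int doc) {
    return dense != null ? dense.get(doc) : sparse.contains(doc);
  }
}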

You might see the issue less with the population field because it has
fewer unique values, so postings lists are longer and the DocIdSet
building logic can upgrade to a dense bit set more quickly.

Mike noticed this slowness when working on BKD trees, and we changed
this first phase to use a plain int[] array that we sort and
deduplicate instead of a fancier sparse bit set (LUCENE-6645),
which seemed to make things faster. Would it be possible for you to
also check a 5.3 snapshot?
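
For comparison, a minimal sketch of the int[] idea, again with made-up names
and not the actual LUCENE-6645 code: doc IDs are simply appended to a growing
array, and the sort/deduplication happens once at the end instead of paying a
per-doc cost in a sparse structure.

import java.util.Arrays;

final class IntBufferDocIdCollector {
  private int[] buffer = new int[64];
  private int size = 0;

  void add(int doc) {
    if (size == buffer.length) {
      buffer = Arrays.copyOf(buffer, buffer.length << 1);  // amortized growth
    }
    buffer[size++] = doc;   // plain append, no hashing or bit twiddling per doc
  }

  /** Sorts once at the end and removes duplicate doc IDs. */
  int[] finish() {
    Arrays.sort(buffer, 0, size);
    int unique = 0;
    for (int i = 0; i < size; i++) {
      if (unique == 0 || buffer[i] != buffer[unique - 1]) {
        buffer[unique++] = buffer[i];
      }
    }
    return Arrays.copyOf(buffer, unique);
  }
}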




On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
<to...@gmail.com> wrote:
> Hi, I'm seeing some query performance degradation between 4.10.4 and 5.2.1.
> It doesn't happen with all the queries, but for queries like range queries
> on fields with many different values the average time in 5.2.1 is worse than
> in 4.10.4. Is anyone seeing something similar?
>
> Test Details:
> * Single thread running queries continuously. I ran the test twice for each
> Solr version.
> * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge instance
> with all the defaults but with 5G heap. Index in local disk (SSD)
> * Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud
> mode, no distributed search.
> * "allCountries" geonames dataset (~8M small docs). No updates during the
> test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4
> (fits entirely in RAM)
> * jdk1.8.0_45
>
> Queries: 3k boolean queries (generated with terms from the dataset) with
> range queries as filters on "tlongitude" and "tlatitude" fields with
> randomly generated bounds, e.g.
> q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>
> Fields are:
> <field name="tlatitude" type="tdouble"/>
> <field name="tlongitude" type="tdouble"/>
> Field Type:
> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
> positionIncrementGap="0"/>
>
> In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
> average.
>
> http://snag.gy/2yPPM.jpg
>
> Doing only the boolean queries shows no performance difference between 4.10
> and 5.2; same thing if I do filters on a string field instead of the range
> queries.
>
> When using "double" field type (precisionStep="0"), the difference was
> bigger:
>
> longitude/latitude fields:
> <field name="longitude" type="double" docValues="true"/>
> <field name="latitude" type="double" docValues="true"/>
> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
> positionIncrementGap="0"/>
>
> http://snag.gy/Vi5uk.jpg
> I understand this is not the best field type definition for range queries,
> I'm just trying to understand the difference between the two versions and
> why.
>
> Performance was OK when doing range queries on the "population" field
> (long), but that field doesn't have many distinct values: only 300k out of
> the 8.3M docs have a population value other than 0. On the
> other hand, doing range queries on the _version_ field did show a graph
> similar to the previous one:
>
> <field name="_version_" type="long" indexed="true" stored="true"/>
> <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> positionIncrementGap="0"/>
>
> http://snag.gy/4tc7e.jpg
>
> Any idea what could be causing this? Is this expected after some known
> change?
>
> With Solr 4.10, a single CPU core stays at high utilization during the test
> (close to 100%), but with Solr 5.2, different cores go up and down in
> utilization continuously; that's probably because of the different Jetty
> version, I suppose.
> GC pattern looks similar in both. For both Solr versions I'm using the
> settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>
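
(Background on why precisionStep matters so much for the range filters quoted
above: a trie-encoded numeric field also indexes each value at reduced
precisions, so a range query can cover whole prefixes instead of enumerating
every distinct value, while precisionStep=0 leaves only the full-precision
terms for the multi-term rewrite to visit. A rough, self-contained
illustration of the encoding idea, not Lucene's actual NumericUtils code:)

import java.util.ArrayList;
import java.util.List;

final class TriePrefixTerms {
  // Assumes precisionStep >= 1; in Solr, precisionStep=0 means only the
  // full-precision term is indexed.
  static List<String> prefixTerms(long value, int precisionStep) {
    List<String> terms = new ArrayList<>();
    for (int shift = 0; shift < 64; shift += precisionStep) {
      terms.add("shift=" + shift + " value=" + Long.toHexString(value >>> shift));
    }
    return terms;
  }

  public static void main(String[] args) {
    // With precisionStep=8, each 64-bit value is indexed as 8 terms of
    // decreasing precision (shift 0, 8, ..., 56).
    prefixTerms(0x123456789ABCDEF0L, 8).forEach(System.out::println);
  }
}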



-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Yonik Seeley <ys...@gmail.com>.
On Mon, Aug 3, 2015 at 6:07 PM, Anshum Gupta <an...@anshumgupta.net> wrote:
> Guess bypassing that check and always returning true instead would help
> confirm that. If that's the case, we should just initialize the timeout to
> Long.MAX_VALUE and check for that to short-circuit?
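
(A minimal sketch of the short-circuit being discussed, with made-up names
rather than the actual ExitableDirectoryReader code: when no timeout is
configured, the per-term clock read is skipped entirely.)

final class TimeLimitedTermsIteration {
  private final long deadlineNanos;   // Long.MAX_VALUE means "no timeout configured"

  TimeLimitedTermsIteration(long deadlineNanos) {
    this.deadlineNanos = deadlineNanos;
  }

  boolean next() {
    // Only consult the clock when a real deadline was set, instead of calling
    // System.nanoTime() for every term visited.
    if (deadlineNanos != Long.MAX_VALUE && System.nanoTime() > deadlineNanos) {
      throw new RuntimeException("query timed out");
    }
    return delegateNext();
  }

  private boolean delegateNext() {
    return false;   // stand-in for the wrapped TermsEnum.next() call
  }
}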

Another possibility is to run queries without a time limit on a reader
that isn't wrapped.
That could possibly be friendlier to branch prediction with mixed
queries (some without timeouts and some with).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
These are the tests I re-ran:
1) Boolean queries with filters (range queries) on a tdouble field
(precisionStep=8). Solr 4.10.4 and 5.2.1 are the official releases.
5.3.0-Exitable is a snapshot of the 5.x branch; 5.3.0-Not-Exitable is the same
snapshot where I don't wrap the reader in ExitableDirectoryReader [1]:
http://snag.gy/CClfF.jpg

2) The same queries, but the filters are made with "double" fields instead
(precisionStep=0):
http://snag.gy/jKsrV.jpg

I created https://issues.apache.org/jira/browse/SOLR-7875 and
https://issues.apache.org/jira/browse/SOLR-7876

[1] The code I changed in SolrIndexSearcher for the test:

  private static DirectoryReader wrapReader(SolrCore core, DirectoryReader reader)
      throws IOException {
    assert reader != null;
    // The UninvertingReader wrap is kept in both configurations.
    DirectoryReader wrapped = UninvertingReader.wrap(reader,
        core.getLatestSchema().getUninversionMap(reader));
    // Added for the test: a system property that skips the ExitableDirectoryReader wrap.
    boolean disableExitableDirectory = Boolean.getBoolean("solr.disableQueryTimeout");
    if (disableExitableDirectory) {
      return wrapped;
    }
    return ExitableDirectoryReader.wrap(wrapped, SolrQueryTimeoutImpl.getInstance());
  }
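
(For reference: Boolean.getBoolean reads a JVM system property, so with this
change the wrapping is skipped by starting Solr with
-Dsolr.disableQueryTimeout=true, for example appended to SOLR_OPTS in
solr.in.sh.)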


On Tue, Aug 4, 2015 at 10:13 AM, Tomás Fernández Löbbe <
tomasflobbe@gmail.com> wrote:

> The stack traces I shared are also from the EC2 instances, using Amazon
> Linux. I can also include a test in Mac OS if you are interested in the
> results.
> These are some more tests I ran using 5.3 (r1693977): "Exitable" is with
> the ExitableDirectoryReader wrap, "Not-Exitable" skips the wrapping of
> the reader completely in SolrIndexSearcher.
>
> Boolean queries with filters on the tdouble fields "tlongitude" and
> "tlatitude":
> http://snag.gy/1zjUy.jpg
> The difference between the two is very small. 5.3 looks better in
> performance than 5.2.1 (probably the fix that Adrien mentioned earlier),
> but I ran the test from a different network/location, so the network
> latency may be different. I'll run the test against 5.2.1 and 4.10.4 again
> and update those results. Here is that preliminary graph:
> http://snag.gy/jR43Z.jpg
>
> The "ExitableDirectory" vs. "Not ExitableDirectory" did have a huge
> difference when running the range queries with the double field
> (precisionStep=0):
> http://snag.gy/frW8f.jpg
>
>
>
> On Tue, Aug 4, 2015 at 1:43 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> Hi,
>>
>> I just asked, if maybe the stack dumps were from his MacOSX local
>> development machine. Otherwise the time difference between 4.10 and 5.2 on
>> MacOSX should be much larger - just as a cross check!
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Adrien Grand [mailto:jpountz@gmail.com]
>> > Sent: Tuesday, August 04, 2015 10:35 AM
>> > To: dev@lucene.apache.org
>> > Subject: Re: Understanding performance degradation in range queries
>> > between Solr 5.2.1 and 4.10.4
>> >
>> > Tomás said Solr was running on EC2 in his initial email, so this can't
>> be
>> > MacOSX.
>> >
>> > On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> > > Hi,
>> > >
>> > > Just related: Do you run tests on MacOSX? NanoTime is damn slow there!
>> > > And its also not monotonic! There wss a discussion on hotspot-dev
>> > > about that one year ago, but no solution until now.
>> > >
>> > > Macos had no real nanotime impl in kernel so java falls back to wall
>> > > clock, which is slow and non monotonic with ntp or vmware... (see test
>> > > failures in the past on policeman when ntp adjusted clock).
>> > >
>> > > If you run on Linux this could also be a good test: on osx the
>> > > difference to prev version should be even larger.
>> > >
>> > > Uwe
>> > >
>> > >
>> > > Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta
>> > > <an...@anshumgupta.net>:
>> > >>
>> > >> Guess bypassing that check and always returning true instead would
>> > >> help confirm that. If that's the case, we should just initialize the
>> > >> timeout to Long.MAX_VALUE and check for that to short-circuit?
>> > >>
>> > >> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
>> > >> <to...@gmail.com> wrote:
>> > >>>
>> > >>> Yes, I saw that, but thought it could be the underlying
>> > >>> implementation, not the "ExitableTermsEnum" wrapper. Maybe it's
>> > >>> related to the calls to System.nanoTime then...
>> > >>>
>> > >>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com>
>> > wrote:
>> > >>>>
>> > >>>> Thanks for sharing the traces, it looks like my intuition was
>> wrong.
>> > >>>> :) They seem to point to
>> > >>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks
>> > >>>> whether the time is out before delegating.
>> > >>>>
>> > >>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>> > >>>> <to...@gmail.com> wrote:
>> > >>>> > Thanks Adrien,
>> > >>>> > I'll run the tests with 5.3 snapshot and post the results here.
>> > >>>> > In case this helps, this is the hprof samples output
>> > >>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-
>> > user/hprof_output.
>> > >>>> > txt)
>> > >>>> > for
>> > >>>> > 4.10.4 and 5.2.1 in my test:
>> > >>>> >
>> > >>>> > Solr 4.10.4:
>> > >>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>> > >>>> > rank   self  accum   count trace method
>> > >>>> >    1 75.07% 75.07%  182812 300523
>> > >>>> > java.net.PlainSocketImpl.socketAccept
>> > >>>> >    2  4.27% 79.34%   10408 301576
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMet
>> > aData
>> > >>>> >    3  4.15% 83.49%   10108 301585
>> > >>>> > org.apache.lucene.index.FilteredTermsEnum.docs
>> > >>>> >    4  3.46% 86.95%    8419 301582
>> > >>>> > org.apache.lucene.index.FilteredTermsEnum.next
>> > >>>> >    5  2.49% 89.44%    6070 301573
>> > >>>> > java.net.SocketInputStream.socketRead0
>> > >>>> >    6  1.99% 91.43%    4848 301599
>> > >>>> >
>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> > >>>> >    7  1.98% 93.42%    4824 301583
>> > >>>> >
>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> > >>>> >    8  1.57% 94.99%    3824 301589
>> > >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> > >>>> >    9  1.44% 96.42%    3504 301594
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
>> > m.refillDocs
>> > >>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>> > >>>> >   11  0.98% 98.50%    2388 301598
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
>> > m.nextDoc
>> > >>>> >   12  0.62% 99.12%    1522 301600
>> > >>>> > org.apache.lucene.store.DataInput.readVInt
>> > >>>> >   13  0.21% 99.33%     500 301612
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>> > >>>> >   14  0.07% 99.39%     167 301601
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> > >>>> >   15  0.06% 99.45%     139 301619
>> java.lang.System.identityHashCode
>> > >>>> >   16  0.05% 99.50%     114 301632
>> > >>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>> > >>>> >   17  0.04% 99.54%      92 300708
>> java.util.zip.Inflater.inflateBytes
>> > >>>> >   18  0.03% 99.57%      76 301624
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
>> > orBlock
>> > >>>> >   19  0.03% 99.59%      68 300613
>> java.lang.ClassLoader.defineClass1
>> > >>>> >   20  0.03% 99.62%      68 301615
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   21  0.03% 99.65%      62 301635
>> > >>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>> > >>>> >   22  0.02% 99.66%      41 301664
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   23  0.01% 99.68%      31 301629
>> > >>>> > org.apache.lucene.util.FixedBitSet.<init>
>> > >>>> > CPU SAMPLES END
>> > >>>> >
>> > >>>> > Solr 5.2.1:
>> > >>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>> > >>>> > rank   self  accum   count trace method
>> > >>>> >    1 51.38% 51.38%  120954 301291
>> > >>>> > sun.nio.ch.EPollArrayWrapper.epollWait
>> > >>>> >    2 25.69% 77.07%   60477 301292
>> > >>>> > sun.nio.ch.ServerSocketChannelImpl.accept0
>> > >>>> >    3 10.59% 87.66%   24923 301369
>> > >>>> >
>> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>> > >>>> >    4  2.20% 89.86%    5182 301414
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>> > >>>> >    5  2.01% 91.87%    4742 301384
>> > >>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>> > >>>> >    6  1.25% 93.12%    2944 301434
>> > >>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>> > >>>> >    7  1.11% 94.23%    2612 301367
>> > >>>> >
>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> > >>>> >    8  0.94% 95.17%    2204 301390
>> org.apache.lucene.util.BitSet.or
>> > >>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>> > >>>> >   10  0.78% 96.87%    1825 301449
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
>> > m.refillDocs
>> > >>>> >   11  0.73% 97.60%    1717 301378
>> > >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> > >>>> >   12  0.73% 98.33%    1715 301374
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   13  0.33% 98.66%     787 301387
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
>> > m.nextDoc
>> > >>>> >   14  0.16% 98.82%     374 301426
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
>> > m.nextDoc
>> > >>>> >   15  0.10% 98.93%     245 301382
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   16  0.09% 99.02%     219 301381
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> > >>>> >   17  0.09% 99.11%     207 301370
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   18  0.06% 99.17%     153 301416
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   19  0.06% 99.24%     151 301427
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   20  0.06% 99.30%     151 301441
>> > >>>> > org.apache.lucene.store.DataInput.readVInt
>> > >>>> >   21  0.06% 99.36%     147 301389
>> java.lang.System.identityHashCode
>> > >>>> >   22  0.06% 99.42%     140 301375
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   23  0.04% 99.47%     104 301425
>> org.apache.lucene.util.BitSet.or
>> > >>>> >   24  0.03% 99.50%      76 301423
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
>> > m.nextDoc
>> > >>>> >   25  0.03% 99.53%      74 301454
>> > >>>> >
>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> > >>>> >   26  0.03% 99.56%      65 301432
>> > >>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
>> > >>>> >   27  0.02% 99.58%      53 301456
>> > >>>> > org.apache.lucene.util.FixedBitSet.or
>> > >>>> >   28  0.02% 99.60%      52 300077
>> java.lang.ClassLoader.defineClass1
>> > >>>> >   29  0.02% 99.63%      50 301464
>> > >>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>> > >>>> >   30  0.02% 99.64%      39 301438
>> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   31  0.02% 99.66%      37 301465
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
>> > orBlock
>> > >>>> >   32  0.02% 99.67%      36 301419
>> > >>>> >
>> > >>>> >
>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDoc
>> > >>>> > sEnum.nextDoc
>> > >>>> > CPU SAMPLES END
>> > >>>> >
>> > >>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jpountz@gmail.com
>> >
>> > >>>> > wrote:
>> > >>>> >>
>> > >>>> >> Hi Tomás,
>> > >>>> >>
>> > >>>> >> I suspect this might be related to LUCENE-5938. We changed the
>> > >>>> >> default rewrite method for multi-term queries to load documents
>> > >>>> >> into a sparse bit set first first, and only upgrade to a dense
>> > >>>> >> bit set when we know many documents match. When there are lots
>> > >>>> >> of terms to intersect, then we end up spending significant cpu
>> > >>>> >> time to build the sparse bit set to eventually upgrade to a
>> > >>>> >> dense bit set like before. This might be what you are seeing.
>> > >>>> >>
>> > >>>> >> You might see the issue less with the population field because
>> > >>>> >> it has fewer unique values, so postings lists are longer and the
>> > >>>> >> DocIdSet building logic can upgrade quicker to a dense bit set.
>> > >>>> >>
>> > >>>> >> Mike noticed this slowness when working on BDK trees and we
>> > >>>> >> changed this first phase to use a plain int[] array that we sort
>> > >>>> >> and deduplicate instead of a more fancy sparse bit set
>> > >>>> >> (LUCENE-6645), which seemed to make things faster. Would it be
>> > >>>> >> possible for you to also check a 5.3 snapshot?
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>> > >>>> >> <to...@gmail.com> wrote:
>> > >>>> >> > Hi, I'm seeing some query performance degradation between
>> > >>>> >> > 4.10.4 and 5.2.1.
>> > >>>> >> > It doesn't happen with all the queries, but for queries like
>> > >>>> >> > range queries on fields with many different values the average
>> > >>>> >> > time in 5.2.1 is worse than in 4.10.4. Is anyone seeing
>> > >>>> >> > something similar?
>> > >>>> >> >
>> > >>>> >> > Test Details:
>> > >>>> >> > * Single thread running queries continuously. I run the test
>> > >>>> >> > twice for each Solr version.
>> > >>>> >> > * JMeter running on my laptop, Solr running on EC2, on an
>> > >>>> >> > m3.xlarge instance with all the defaults but with 5G heap.
>> > >>>> >> > Index in local disk (SSD)
>> > >>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not
>> > >>>> >> > in SolrCloud mode, no distributed search.
>> > >>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
>> > >>>> >> > during the test. Index Size is around 1.1GB for Solr 5.2.1 and
>> > >>>> >> > 1.3GB for Solr
>> > >>>> >> > 4.10.4
>> > >>>> >> > (fits entirely in RAM)
>> > >>>> >> > * jdk1.8.0_45
>> > >>>> >> >
>> > >>>> >> > Queries: 3k boolean queries (generated with terms from the
>> > >>>> >> > dataset) with range queries as filters on "tlongitude" and
>> > >>>> >> > "tlatitude" fields with randomly generated bounds, e.g.
>> > >>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO
>> > X]&fq=tlatitude:[Y
>> > >>>> >> > TO Z]
>> > >>>> >> >
>> > >>>> >> > Fields are:
>> > >>>> >> > <field name="tlatitude" type="tdouble"/> <field
>> > >>>> >> > name="tlongitude" type="tdouble"/> Field Type:
>> > >>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
>> > >>>> >> > precisionStep="8"
>> > >>>> >> > positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than
>> > >>>> >> > 5.2.1 in average.
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/2yPPM.jpg
>> > >>>> >> >
>> > >>>> >> > Doing only the boolean queries show no performance difference
>> > >>>> >> > between
>> > >>>> >> > 4.10
>> > >>>> >> > and 5.2, same thing if I do filters on a string field instead
>> > >>>> >> > of the range queries.
>> > >>>> >> >
>> > >>>> >> > When using "double" field type (precisionStep="0"), the
>> > >>>> >> > difference was
>> > >>>> >> > bigger:
>> > >>>> >> >
>> > >>>> >> > longitude/latitude fields:
>> > >>>> >> > <field name="longitude" type="double" docValues="true"/>
>> > >>>> >> > <field name="latitude" type="double" docValues="true"/>
>> > >>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
>> > >>>> >> > precisionStep="0"
>> > >>>> >> > positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/Vi5uk.jpg
>> > >>>> >> > I understand this is not the best field type definition for
>> > >>>> >> > range queries, I'm just trying to understand the difference
>> > >>>> >> > between the two versions and why.
>> > >>>> >> >
>> > >>>> >> > Performance was OK when doing range queries on the
>> > "population"
>> > >>>> >> > field
>> > >>>> >> > (long), but that field doesn't have many different values,
>> > >>>> >> > only 300k out of the 8.3M docs have the population field with
>> > >>>> >> > a value different to 0. On the other hand, doing range queries
>> > >>>> >> > on the _version_ field did show a graph similar to the
>> > >>>> >> > previous one:
>> > >>>> >> >
>> > >>>> >> > <field name="_version_" type="long" indexed="true"
>> > >>>> >> > stored="true"/> <fieldType name="long"
>> > class="solr.TrieLongField" precisionStep="0"
>> > >>>> >> > positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/4tc7e.jpg
>> > >>>> >> >
>> > >>>> >> > Any idea what could be causing this? Is this expected after
>> > >>>> >> > some known change?
>> > >>>> >> >
>> > >>>> >> > With Solr 4.10, a single CPU core remains high during the test
>> > >>>> >> > (close to 100%), but with Solr 5.2, different cores go up and
>> > >>>> >> > down in utilization continuously. That's probably because of
>> > >>>> >> > the different Jetty version I suppose.
>> > >>>> >> > GC pattern looks similar in both. For both Solr versions I'm
>> > >>>> >> > using the settings that ship with Solr (in solr.in.sh) except
>> > >>>> >> > for Xmx and Xms
>> > >>>> >> >
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >> Adrien
>> > >>>> >>
>> > >>>> >> ----------------------------------------------------------------
>> > >>>> >> ----- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > >>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> > >>>> >>
>> > >>>> >
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> --
>> > >>>> Adrien
>> > >>>>
>> > >>>> -------------------------------------------------------------------
>> > >>>> -- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
>> > >>>> additional commands, e-mail: dev-help@lucene.apache.org
>> > >>>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Anshum Gupta
>> > >
>> > >
>> > > --
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, 28213 Bremen
>> > > http://www.thetaphi.de
>> >
>> >
>> >
>> > --
>> > Adrien
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
>> additional
>> > commands, e-mail: dev-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
The stack traces I shared are also from the EC2 instances, using Amazon
Linux. I can also include a test in Mac OS if you are interested in the
results.
These are some more tests I ran using 5.3 (r1693977): "Exitable" is with the
ExitableDirectoryReader wrap, "Not-Exitable" skips the wrapping of the
reader completely in SolrIndexSearcher.

Boolean queries with filters on the tdouble fields "tlongitude" and
"tlatitude":
http://snag.gy/1zjUy.jpg
The difference between the two is very small. 5.3 looks better in
performance than 5.2.1 (probably the fix that Adrien mentioned earlier),
but I ran the test from a different network/location, so the network
latency may be different. I'll run the test against 5.2.1 and 4.10.4 again
and update those results. Here is that preliminary graph:
http://snag.gy/jR43Z.jpg

The "ExitableDirectory" vs. "Not ExitableDirectory" did have a huge
difference when running the range queries with the double field
(precisionStep=0):
http://snag.gy/frW8f.jpg



On Tue, Aug 4, 2015 at 1:43 AM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> I just asked, if maybe the stack dumps were from his MacOSX local
> development machine. Otherwise the time difference between 4.10 and 5.2 on
> MacOSX should be much larger - just as a cross check!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Adrien Grand [mailto:jpountz@gmail.com]
> > Sent: Tuesday, August 04, 2015 10:35 AM
> > To: dev@lucene.apache.org
> > Subject: Re: Understanding performance degradation in range queries
> > between Solr 5.2.1 and 4.10.4
> >
> > Tomás said Solr was running on EC2 in his initial email, so this can't be
> > MacOSX.
> >
> > On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > > Hi,
> > >
> > > Just related: Do you run tests on MacOSX? NanoTime is damn slow there!
> > > And its also not monotonic! There wss a discussion on hotspot-dev
> > > about that one year ago, but no solution until now.
> > >
> > > Macos had no real nanotime impl in kernel so java falls back to wall
> > > clock, which is slow and non monotonic with ntp or vmware... (see test
> > > failures in the past on policeman when ntp adjusted clock).
> > >
> > > If you run on Linux this could also be a good test: on osx the
> > > difference to prev version should be even larger.
> > >
> > > Uwe
> > >
> > >
> > > Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta
> > > <an...@anshumgupta.net>:
> > >>
> > >> Guess bypassing that check and always returning true instead would
> > >> help confirm that. If that's the case, we should just initialize the
> > >> timeout to Long.MAX_VALUE and check for that to short-circuit?
> > >>
> > >> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
> > >> <to...@gmail.com> wrote:
> > >>>
> > >>> Yes, I saw that, but thought it could be the underlying
> > >>> implementation, not the "ExitableTermsEnum" wrapper. Maybe it's
> > >>> related to the calls to System.nanoTime then...
> > >>>
> > >>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com>
> > wrote:
> > >>>>
> > >>>> Thanks for sharing the traces, it looks like my intuition was wrong.
> > >>>> :) They seem to point to
> > >>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks
> > >>>> whether the time is out before delegating.
> > >>>>
> > >>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
> > >>>> <to...@gmail.com> wrote:
> > >>>> > Thanks Adrien,
> > >>>> > I'll run the tests with 5.3 snapshot and post the results here.
> > >>>> > In case this helps, this is the hprof samples output
> > >>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-
> > user/hprof_output.
> > >>>> > txt)
> > >>>> > for
> > >>>> > 4.10.4 and 5.2.1 in my test:
> > >>>> >
> > >>>> > Solr 4.10.4:
> > >>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
> > >>>> > rank   self  accum   count trace method
> > >>>> >    1 75.07% 75.07%  182812 300523
> > >>>> > java.net.PlainSocketImpl.socketAccept
> > >>>> >    2  4.27% 79.34%   10408 301576
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMet
> > aData
> > >>>> >    3  4.15% 83.49%   10108 301585
> > >>>> > org.apache.lucene.index.FilteredTermsEnum.docs
> > >>>> >    4  3.46% 86.95%    8419 301582
> > >>>> > org.apache.lucene.index.FilteredTermsEnum.next
> > >>>> >    5  2.49% 89.44%    6070 301573
> > >>>> > java.net.SocketInputStream.socketRead0
> > >>>> >    6  1.99% 91.43%    4848 301599
> > >>>> >
> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> > >>>> >    7  1.98% 93.42%    4824 301583
> > >>>> >
> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> > >>>> >    8  1.57% 94.99%    3824 301589
> > >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> > >>>> >    9  1.44% 96.42%    3504 301594
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
> > m.refillDocs
> > >>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
> > >>>> >   11  0.98% 98.50%    2388 301598
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
> > m.nextDoc
> > >>>> >   12  0.62% 99.12%    1522 301600
> > >>>> > org.apache.lucene.store.DataInput.readVInt
> > >>>> >   13  0.21% 99.33%     500 301612
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
> > >>>> >   14  0.07% 99.39%     167 301601
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> > >>>> >   15  0.06% 99.45%     139 301619
> java.lang.System.identityHashCode
> > >>>> >   16  0.05% 99.50%     114 301632
> > >>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
> > >>>> >   17  0.04% 99.54%      92 300708
> java.util.zip.Inflater.inflateBytes
> > >>>> >   18  0.03% 99.57%      76 301624
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
> > orBlock
> > >>>> >   19  0.03% 99.59%      68 300613
> java.lang.ClassLoader.defineClass1
> > >>>> >   20  0.03% 99.62%      68 301615
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> > >>>> >   21  0.03% 99.65%      62 301635
> > >>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
> > >>>> >   22  0.02% 99.66%      41 301664
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> > >>>> >   23  0.01% 99.68%      31 301629
> > >>>> > org.apache.lucene.util.FixedBitSet.<init>
> > >>>> > CPU SAMPLES END
> > >>>> >
> > >>>> > Solr 5.2.1:
> > >>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
> > >>>> > rank   self  accum   count trace method
> > >>>> >    1 51.38% 51.38%  120954 301291
> > >>>> > sun.nio.ch.EPollArrayWrapper.epollWait
> > >>>> >    2 25.69% 77.07%   60477 301292
> > >>>> > sun.nio.ch.ServerSocketChannelImpl.accept0
> > >>>> >    3 10.59% 87.66%   24923 301369
> > >>>> >
> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
> > >>>> >    4  2.20% 89.86%    5182 301414
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
> > >>>> >    5  2.01% 91.87%    4742 301384
> > >>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
> > >>>> >    6  1.25% 93.12%    2944 301434
> > >>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
> > >>>> >    7  1.11% 94.23%    2612 301367
> > >>>> >
> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> > >>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
> > >>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
> > >>>> >   10  0.78% 96.87%    1825 301449
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> > m.refillDocs
> > >>>> >   11  0.73% 97.60%    1717 301378
> > >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> > >>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
> > >>>> >   13  0.33% 98.66%     787 301387
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> > m.nextDoc
> > >>>> >   14  0.16% 98.82%     374 301426
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> > m.nextDoc
> > >>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
> > >>>> >   16  0.09% 99.02%     219 301381
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> > >>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
> > >>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
> > >>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
> > >>>> >   20  0.06% 99.30%     151 301441
> > >>>> > org.apache.lucene.store.DataInput.readVInt
> > >>>> >   21  0.06% 99.36%     147 301389
> java.lang.System.identityHashCode
> > >>>> >   22  0.06% 99.42%     140 301375
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> > >>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
> > >>>> >   24  0.03% 99.50%      76 301423
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> > m.nextDoc
> > >>>> >   25  0.03% 99.53%      74 301454
> > >>>> >
> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> > >>>> >   26  0.03% 99.56%      65 301432
> > >>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
> > >>>> >   27  0.02% 99.58%      53 301456
> > >>>> > org.apache.lucene.util.FixedBitSet.or
> > >>>> >   28  0.02% 99.60%      52 300077
> java.lang.ClassLoader.defineClass1
> > >>>> >   29  0.02% 99.63%      50 301464
> > >>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
> > >>>> >   30  0.02% 99.64%      39 301438
> > >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> > >>>> >   31  0.02% 99.66%      37 301465
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
> > orBlock
> > >>>> >   32  0.02% 99.67%      36 301419
> > >>>> >
> > >>>> >
> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDoc
> > >>>> > sEnum.nextDoc
> > >>>> > CPU SAMPLES END
> > >>>> >
> > >>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
> > >>>> > wrote:
> > >>>> >>
> > >>>> >> Hi Tomás,
> > >>>> >>
> > >>>> >> I suspect this might be related to LUCENE-5938. We changed the
> > >>>> >> default rewrite method for multi-term queries to load documents
> > >>>> >> into a sparse bit set first first, and only upgrade to a dense
> > >>>> >> bit set when we know many documents match. When there are lots
> > >>>> >> of terms to intersect, then we end up spending significant cpu
> > >>>> >> time to build the sparse bit set to eventually upgrade to a
> > >>>> >> dense bit set like before. This might be what you are seeing.
> > >>>> >>
> > >>>> >> You might see the issue less with the population field because
> > >>>> >> it has fewer unique values, so postings lists are longer and the
> > >>>> >> DocIdSet building logic can upgrade quicker to a dense bit set.
> > >>>> >>
> > >>>> >> Mike noticed this slowness when working on BDK trees and we
> > >>>> >> changed this first phase to use a plain int[] array that we sort
> > >>>> >> and deduplicate instead of a more fancy sparse bit set
> > >>>> >> (LUCENE-6645), which seemed to make things faster. Would it be
> > >>>> >> possible for you to also check a 5.3 snapshot?
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
> > >>>> >> <to...@gmail.com> wrote:
> > >>>> >> > Hi, I'm seeing some query performance degradation between
> > >>>> >> > 4.10.4 and 5.2.1.
> > >>>> >> > It doesn't happen with all the queries, but for queries like
> > >>>> >> > range queries on fields with many different values the average
> > >>>> >> > time in 5.2.1 is worse than in 4.10.4. Is anyone seeing
> > >>>> >> > something similar?
> > >>>> >> >
> > >>>> >> > Test Details:
> > >>>> >> > * Single thread running queries continuously. I run the test
> > >>>> >> > twice for each Solr version.
> > >>>> >> > * JMeter running on my laptop, Solr running on EC2, on an
> > >>>> >> > m3.xlarge instance with all the defaults but with 5G heap.
> > >>>> >> > Index in local disk (SSD)
> > >>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not
> > >>>> >> > in SolrCloud mode, no distributed search.
> > >>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
> > >>>> >> > during the test. Index Size is around 1.1GB for Solr 5.2.1 and
> > >>>> >> > 1.3GB for Solr
> > >>>> >> > 4.10.4
> > >>>> >> > (fits entirely in RAM)
> > >>>> >> > * jdk1.8.0_45
> > >>>> >> >
> > >>>> >> > Queries: 3k boolean queries (generated with terms from the
> > >>>> >> > dataset) with range queries as filters on "tlongitude" and
> > >>>> >> > "tlatitude" fields with randomly generated bounds, e.g.
> > >>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO
> > X]&fq=tlatitude:[Y
> > >>>> >> > TO Z]
> > >>>> >> >
> > >>>> >> > Fields are:
> > >>>> >> > <field name="tlatitude" type="tdouble"/> <field
> > >>>> >> > name="tlongitude" type="tdouble"/> Field Type:
> > >>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
> > >>>> >> > precisionStep="8"
> > >>>> >> > positionIncrementGap="0"/>
> > >>>> >> >
> > >>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than
> > >>>> >> > 5.2.1 in average.
> > >>>> >> >
> > >>>> >> > http://snag.gy/2yPPM.jpg
> > >>>> >> >
> > >>>> >> > Doing only the boolean queries show no performance difference
> > >>>> >> > between
> > >>>> >> > 4.10
> > >>>> >> > and 5.2, same thing if I do filters on a string field instead
> > >>>> >> > of the range queries.
> > >>>> >> >
> > >>>> >> > When using "double" field type (precisionStep="0"), the
> > >>>> >> > difference was
> > >>>> >> > bigger:
> > >>>> >> >
> > >>>> >> > longitude/latitude fields:
> > >>>> >> > <field name="longitude" type="double" docValues="true"/>
> > >>>> >> > <field name="latitude" type="double" docValues="true"/>
> > >>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
> > >>>> >> > precisionStep="0"
> > >>>> >> > positionIncrementGap="0"/>
> > >>>> >> >
> > >>>> >> > http://snag.gy/Vi5uk.jpg
> > >>>> >> > I understand this is not the best field type definition for
> > >>>> >> > range queries, I'm just trying to understand the difference
> > >>>> >> > between the two versions and why.
> > >>>> >> >
> > >>>> >> > Performance was OK when doing range queries on the
> > "population"
> > >>>> >> > field
> > >>>> >> > (long), but that field doesn't have many different values,
> > >>>> >> > only 300k out of the 8.3M docs have the population field with
> > >>>> >> > a value different to 0. On the other hand, doing range queries
> > >>>> >> > on the _version_ field did show a graph similar to the
> > >>>> >> > previous one:
> > >>>> >> >
> > >>>> >> > <field name="_version_" type="long" indexed="true"
> > >>>> >> > stored="true"/> <fieldType name="long"
> > class="solr.TrieLongField" precisionStep="0"
> > >>>> >> > positionIncrementGap="0"/>
> > >>>> >> >
> > >>>> >> > http://snag.gy/4tc7e.jpg
> > >>>> >> >
> > >>>> >> > Any idea what could be causing this? Is this expected after
> > >>>> >> > some known change?
> > >>>> >> >
> > >>>> >> > With Solr 4.10, a single CPU core remains high during the test
> > >>>> >> > (close to 100%), but with Solr 5.2, different cores go up and
> > >>>> >> > down in utilization continuously. That's probably because of
> > >>>> >> > the different Jetty version I suppose.
> > >>>> >> > GC pattern looks similar in both. For both Solr versions I'm
> > >>>> >> > using the settings that ship with Solr (in solr.in.sh) except
> > >>>> >> > for Xmx and Xms
> > >>>> >> >
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> --
> > >>>> >> Adrien
> > >>>> >>
> > >>>> >> ----------------------------------------------------------------
> > >>>> >> ----- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > >>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
> > >>>> >>
> > >>>> >
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Adrien
> > >>>>
> > >>>> -------------------------------------------------------------------
> > >>>> -- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> > >>>> additional commands, e-mail: dev-help@lucene.apache.org
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Anshum Gupta
> > >
> > >
> > > --
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, 28213 Bremen
> > > http://www.thetaphi.de
> >
> >
> >
> > --
> > Adrien
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> > commands, e-mail: dev-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

RE: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

I just asked whether the stack dumps were maybe from his MacOSX local development machine. Otherwise the time difference between 4.10 and 5.2 on MacOSX should be much larger - just as a cross check!
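
(If anyone wants to cross-check the System.nanoTime() overhead on a given box,
a quick-and-dirty loop like the one below gives a rough per-call number; it is
only an illustration, a proper measurement would use JMH.)

public final class NanoTimeCost {
  public static void main(String[] args) {
    final int iterations = 50_000_000;
    long sink = 0;                      // keep the JIT from dropping the calls
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      sink += System.nanoTime();        // the call whose cost we want to estimate
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("~" + (elapsed / (double) iterations)
        + " ns per System.nanoTime() call (ignore: " + sink + ")");
  }
}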

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Adrien Grand [mailto:jpountz@gmail.com]
> Sent: Tuesday, August 04, 2015 10:35 AM
> To: dev@lucene.apache.org
> Subject: Re: Understanding performance degradation in range queries
> between Solr 5.2.1 and 4.10.4
> 
> Tomás said Solr was running on EC2 in his initial email, so this can't be
> MacOSX.
> 
> On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > Hi,
> >
> > Just related: Do you run tests on MacOSX? NanoTime is damn slow there!
> > And its also not monotonic! There wss a discussion on hotspot-dev
> > about that one year ago, but no solution until now.
> >
> > Macos had no real nanotime impl in kernel so java falls back to wall
> > clock, which is slow and non monotonic with ntp or vmware... (see test
> > failures in the past on policeman when ntp adjusted clock).
> >
> > If you run on Linux this could also be a good test: on osx the
> > difference to prev version should be even larger.
> >
> > Uwe
> >
> >
> > Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta
> > <an...@anshumgupta.net>:
> >>
> >> Guess bypassing that check and always returning true instead would
> >> help confirm that. If that's the case, we should just initialize the
> >> timeout to Long.MAX_VALUE and check for that to short-circuit?
> >>
> >> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
> >> <to...@gmail.com> wrote:
> >>>
> >>> Yes, I saw that, but thought it could be the underlying
> >>> implementation, not the "ExitableTermsEnum" wrapper. Maybe it's
> >>> related to the calls to System.nanoTime then...
> >>>
> >>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com>
> wrote:
> >>>>
> >>>> Thanks for sharing the traces, it looks like my intuition was wrong.
> >>>> :) They seem to point to
> >>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks
> >>>> whether the time is out before delegating.
> >>>>
> >>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
> >>>> <to...@gmail.com> wrote:
> >>>> > Thanks Adrien,
> >>>> > I'll run the tests with 5.3 snapshot and post the results here.
> >>>> > In case this helps, this is the hprof samples output
> >>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-
> user/hprof_output.
> >>>> > txt)
> >>>> > for
> >>>> > 4.10.4 and 5.2.1 in my test:
> >>>> >
> >>>> > Solr 4.10.4:
> >>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
> >>>> > rank   self  accum   count trace method
> >>>> >    1 75.07% 75.07%  182812 300523
> >>>> > java.net.PlainSocketImpl.socketAccept
> >>>> >    2  4.27% 79.34%   10408 301576
> >>>> >
> >>>> >
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMet
> aData
> >>>> >    3  4.15% 83.49%   10108 301585
> >>>> > org.apache.lucene.index.FilteredTermsEnum.docs
> >>>> >    4  3.46% 86.95%    8419 301582
> >>>> > org.apache.lucene.index.FilteredTermsEnum.next
> >>>> >    5  2.49% 89.44%    6070 301573
> >>>> > java.net.SocketInputStream.socketRead0
> >>>> >    6  1.99% 91.43%    4848 301599
> >>>> >
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >>>> >    7  1.98% 93.42%    4824 301583
> >>>> >
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >>>> >    8  1.57% 94.99%    3824 301589
> >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >>>> >    9  1.44% 96.42%    3504 301594
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
> m.refillDocs
> >>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
> >>>> >   11  0.98% 98.50%    2388 301598
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnu
> m.nextDoc
> >>>> >   12  0.62% 99.12%    1522 301600
> >>>> > org.apache.lucene.store.DataInput.readVInt
> >>>> >   13  0.21% 99.33%     500 301612
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
> >>>> >   14  0.07% 99.39%     167 301601
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >>>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
> >>>> >   16  0.05% 99.50%     114 301632
> >>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
> >>>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
> >>>> >   18  0.03% 99.57%      76 301624
> >>>> >
> >>>> >
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
> orBlock
> >>>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
> >>>> >   20  0.03% 99.62%      68 301615
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >>>> >   21  0.03% 99.65%      62 301635
> >>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
> >>>> >   22  0.02% 99.66%      41 301664
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >>>> >   23  0.01% 99.68%      31 301629
> >>>> > org.apache.lucene.util.FixedBitSet.<init>
> >>>> > CPU SAMPLES END
> >>>> >
> >>>> > Solr 5.2.1:
> >>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
> >>>> > rank   self  accum   count trace method
> >>>> >    1 51.38% 51.38%  120954 301291
> >>>> > sun.nio.ch.EPollArrayWrapper.epollWait
> >>>> >    2 25.69% 77.07%   60477 301292
> >>>> > sun.nio.ch.ServerSocketChannelImpl.accept0
> >>>> >    3 10.59% 87.66%   24923 301369
> >>>> >
> org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
> >>>> >    4  2.20% 89.86%    5182 301414
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
> >>>> >    5  2.01% 91.87%    4742 301384
> >>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
> >>>> >    6  1.25% 93.12%    2944 301434
> >>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
> >>>> >    7  1.11% 94.23%    2612 301367
> >>>> >
> org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
> >>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
> >>>> >   10  0.78% 96.87%    1825 301449
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> m.refillDocs
> >>>> >   11  0.73% 97.60%    1717 301378
> >>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
> >>>> >   13  0.33% 98.66%     787 301387
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> m.nextDoc
> >>>> >   14  0.16% 98.82%     374 301426
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> m.nextDoc
> >>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
> >>>> >   16  0.09% 99.02%     219 301381
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
> >>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
> >>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
> >>>> >   20  0.06% 99.30%     151 301441
> >>>> > org.apache.lucene.store.DataInput.readVInt
> >>>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
> >>>> >   22  0.06% 99.42%     140 301375
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
> >>>> >   24  0.03% 99.50%      76 301423
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnu
> m.nextDoc
> >>>> >   25  0.03% 99.53%      74 301454
> >>>> >
> org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >>>> >   26  0.03% 99.56%      65 301432
> >>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
> >>>> >   27  0.02% 99.58%      53 301456
> >>>> > org.apache.lucene.util.FixedBitSet.or
> >>>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
> >>>> >   29  0.02% 99.63%      50 301464
> >>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
> >>>> >   30  0.02% 99.64%      39 301438
> >>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >>>> >   31  0.02% 99.66%      37 301465
> >>>> >
> >>>> >
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFlo
> orBlock
> >>>> >   32  0.02% 99.67%      36 301419
> >>>> >
> >>>> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDoc
> >>>> > sEnum.nextDoc
> >>>> > CPU SAMPLES END
> >>>> >
> >>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
> >>>> > wrote:
> >>>> >>
> >>>> >> Hi Tomás,
> >>>> >>
> >>>> >> I suspect this might be related to LUCENE-5938. We changed the
> >>>> >> default rewrite method for multi-term queries to load documents
> >>>> >> into a sparse bit set first first, and only upgrade to a dense
> >>>> >> bit set when we know many documents match. When there are lots
> >>>> >> of terms to intersect, then we end up spending significant cpu
> >>>> >> time to build the sparse bit set to eventually upgrade to a
> >>>> >> dense bit set like before. This might be what you are seeing.
> >>>> >>
> >>>> >> You might see the issue less with the population field because
> >>>> >> it has fewer unique values, so postings lists are longer and the
> >>>> >> DocIdSet building logic can upgrade quicker to a dense bit set.
> >>>> >>
> >>>> >> Mike noticed this slowness when working on BDK trees and we
> >>>> >> changed this first phase to use a plain int[] array that we sort
> >>>> >> and deduplicate instead of a more fancy sparse bit set
> >>>> >> (LUCENE-6645), which seemed to make things faster. Would it be
> >>>> >> possible for you to also check a 5.3 snapshot?
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
> >>>> >> <to...@gmail.com> wrote:
> >>>> >> > Hi, I'm seeing some query performance degradation between
> >>>> >> > 4.10.4 and 5.2.1.
> >>>> >> > It doesn't happen with all the queries, but for queries like
> >>>> >> > range queries on fields with many different values the average
> >>>> >> > time in 5.2.1 is worse than in 4.10.4. Is anyone seeing
> >>>> >> > something similar?
> >>>> >> >
> >>>> >> > Test Details:
> >>>> >> > * Single thread running queries continuously. I run the test
> >>>> >> > twice for each Solr version.
> >>>> >> > * JMeter running on my laptop, Solr running on EC2, on an
> >>>> >> > m3.xlarge instance with all the defaults but with 5G heap.
> >>>> >> > Index in local disk (SSD)
> >>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not
> >>>> >> > in SolrCloud mode, no distributed search.
> >>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
> >>>> >> > during the test. Index Size is around 1.1GB for Solr 5.2.1 and
> >>>> >> > 1.3GB for Solr
> >>>> >> > 4.10.4
> >>>> >> > (fits entirely in RAM)
> >>>> >> > * jdk1.8.0_45
> >>>> >> >
> >>>> >> > Queries: 3k boolean queries (generated with terms from the
> >>>> >> > dataset) with range queries as filters on "tlongitude" and
> >>>> >> > "tlatitude" fields with randomly generated bounds, e.g.
> >>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO
> X]&fq=tlatitude:[Y
> >>>> >> > TO Z]
> >>>> >> >
> >>>> >> > Fields are:
> >>>> >> > <field name="tlatitude" type="tdouble"/> <field
> >>>> >> > name="tlongitude" type="tdouble"/> Field Type:
> >>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
> >>>> >> > precisionStep="8"
> >>>> >> > positionIncrementGap="0"/>
> >>>> >> >
> >>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than
> >>>> >> > 5.2.1 in average.
> >>>> >> >
> >>>> >> > http://snag.gy/2yPPM.jpg
> >>>> >> >
> >>>> >> > Doing only the boolean queries show no performance difference
> >>>> >> > between
> >>>> >> > 4.10
> >>>> >> > and 5.2, same thing if I do filters on a string field instead
> >>>> >> > of the range queries.
> >>>> >> >
> >>>> >> > When using "double" field type (precisionStep="0"), the
> >>>> >> > difference was
> >>>> >> > bigger:
> >>>> >> >
> >>>> >> > longitude/latitude fields:
> >>>> >> > <field name="longitude" type="double" docValues="true"/>
> >>>> >> > <field name="latitude" type="double" docValues="true"/>
> >>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
> >>>> >> > precisionStep="0"
> >>>> >> > positionIncrementGap="0"/>
> >>>> >> >
> >>>> >> > http://snag.gy/Vi5uk.jpg
> >>>> >> > I understand this is not the best field type definition for
> >>>> >> > range queries, I'm just trying to understand the difference
> >>>> >> > between the two versions and why.
> >>>> >> >
> >>>> >> > Performance was OK when doing range queries on the
> "population"
> >>>> >> > field
> >>>> >> > (long), but that field doesn't have many different values,
> >>>> >> > only 300k out of the 8.3M docs have the population field with
> >>>> >> > a value different to 0. On the other hand, doing range queries
> >>>> >> > on the _version_ field did show a graph similar to the
> >>>> >> > previous one:
> >>>> >> >
> >>>> >> > <field name="_version_" type="long" indexed="true"
> >>>> >> > stored="true"/> <fieldType name="long"
> class="solr.TrieLongField" precisionStep="0"
> >>>> >> > positionIncrementGap="0"/>
> >>>> >> >
> >>>> >> > http://snag.gy/4tc7e.jpg
> >>>> >> >
> >>>> >> > Any idea what could be causing this? Is this expected after
> >>>> >> > some known change?
> >>>> >> >
> >>>> >> > With Solr 4.10, a single CPU core remains high during the test
> >>>> >> > (close to 100%), but with Solr 5.2, different cores go up and
> >>>> >> > down in utilization continuously. That's probably because of
> >>>> >> > the different Jetty version I suppose.
> >>>> >> > GC pattern looks similar in both. For both Solr versions I'm
> >>>> >> > using the settings that ship with Solr (in solr.in.sh) except
> >>>> >> > for Xmx and Xms
> >>>> >> >
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> --
> >>>> >> Adrien
> >>>> >>
> >>>> >> ----------------------------------------------------------------
> >>>> >> ----- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>> >>
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Adrien
> >>>>
> >>>> -------------------------------------------------------------------
> >>>> -- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> >>>> additional commands, e-mail: dev-help@lucene.apache.org
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> >
> >
> > --
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, 28213 Bremen
> > http://www.thetaphi.de
> 
> 
> 
> --
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> commands, e-mail: dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Adrien Grand <jp...@gmail.com>.
Tomás said Solr was running on EC2 in his initial email, so this can't
be MacOSX.
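
As an aside, and an assumption on my part rather than something measured in this
thread: even on Linux/EC2 the clock source matters. Xen-based instance types such
as m3 often run with the "xen" clocksource, which makes clock_gettime(), and
therefore System.nanoTime(), noticeably more expensive than "tsc". A quick,
illustrative way to check from Java:

import java.nio.file.Files;
import java.nio.file.Paths;

// Prints the active Linux clocksource ("tsc", "xen", "kvm-clock", ...).
// Throwaway helper for this thread only, not part of Solr or Lucene.
public class ClockSourceCheck {
  public static void main(String[] args) throws Exception {
    String path = "/sys/devices/system/clocksource/clocksource0/current_clocksource";
    System.out.println(new String(Files.readAllBytes(Paths.get(path)), "UTF-8").trim());
  }
}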

On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi,
>
> Just related: Do you run tests on MacOSX? NanoTime is damn slow there! And
> it's also not monotonic! There was a discussion on hotspot-dev about that one
> year ago, but no solution until now.
>
> MacOS has no real nanoTime impl in the kernel, so Java falls back to the wall
> clock, which is slow and non-monotonic with NTP or VMware... (see test failures
> in the past on Policeman Jenkins when NTP adjusted the clock).
>
> If you run on Linux this could also be a good test: on OS X the difference to
> the previous version should be even larger.
>
> Uwe
>
>
> Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta
> <an...@anshumgupta.net>:
>>
>> Guess bypassing that check and always returning true instead would help
>> confirm that. If that's the case, we should just initialize the timeout to
>> Long.MAX_VALUE and check for that to short-circuit?
>>
>> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
>> <to...@gmail.com> wrote:
>>>
>>> Yes, I saw that, but thought it could be the underlying implementation,
>>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
>>> System.nanoTime then...
>>>
>>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com> wrote:
>>>>
>>>> Thanks for sharing the traces, it looks like my intuition was wrong.
>>>> :) They seem to point to
>>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
>>>> the time is out before delegating.
>>>>
>>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>>>> <to...@gmail.com> wrote:
>>>> > Thanks Adrien,
>>>> > I'll run the tests with 5.3 snapshot and post the results here. In
>>>> > case this
>>>> > helps, this is the hprof samples output
>>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>>> > for
>>>> > 4.10.4 and 5.2.1 in my test:
>>>> >
>>>> > Solr 4.10.4:
>>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>>> > rank   self  accum   count trace method
>>>> >    1 75.07% 75.07%  182812 300523
>>>> > java.net.PlainSocketImpl.socketAccept
>>>> >    2  4.27% 79.34%   10408 301576
>>>> >
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>>> >    3  4.15% 83.49%   10108 301585
>>>> > org.apache.lucene.index.FilteredTermsEnum.docs
>>>> >    4  3.46% 86.95%    8419 301582
>>>> > org.apache.lucene.index.FilteredTermsEnum.next
>>>> >    5  2.49% 89.44%    6070 301573
>>>> > java.net.SocketInputStream.socketRead0
>>>> >    6  1.99% 91.43%    4848 301599
>>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>>> >    7  1.98% 93.42%    4824 301583
>>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>>> >    8  1.57% 94.99%    3824 301589
>>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>>> >    9  1.44% 96.42%    3504 301594
>>>> >
>>>> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>>>> >   11  0.98% 98.50%    2388 301598
>>>> >
>>>> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>>> >   12  0.62% 99.12%    1522 301600
>>>> > org.apache.lucene.store.DataInput.readVInt
>>>> >   13  0.21% 99.33%     500 301612
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>>> >   14  0.07% 99.39%     167 301601
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>>>> >   16  0.05% 99.50%     114 301632
>>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>>>> >   18  0.03% 99.57%      76 301624
>>>> >
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>>>> >   20  0.03% 99.62%      68 301615
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   21  0.03% 99.65%      62 301635
>>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>>> >   22  0.02% 99.66%      41 301664
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   23  0.01% 99.68%      31 301629
>>>> > org.apache.lucene.util.FixedBitSet.<init>
>>>> > CPU SAMPLES END
>>>> >
>>>> > Solr 5.2.1:
>>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>>> > rank   self  accum   count trace method
>>>> >    1 51.38% 51.38%  120954 301291
>>>> > sun.nio.ch.EPollArrayWrapper.epollWait
>>>> >    2 25.69% 77.07%   60477 301292
>>>> > sun.nio.ch.ServerSocketChannelImpl.accept0
>>>> >    3 10.59% 87.66%   24923 301369
>>>> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>>> >    4  2.20% 89.86%    5182 301414
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>>> >    5  2.01% 91.87%    4742 301384
>>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>>> >    6  1.25% 93.12%    2944 301434
>>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>>> >    7  1.11% 94.23%    2612 301367
>>>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>>>> >   10  0.78% 96.87%    1825 301449
>>>> >
>>>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>>> >   11  0.73% 97.60%    1717 301378
>>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>>>> >   13  0.33% 98.66%     787 301387
>>>> >
>>>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   14  0.16% 98.82%     374 301426
>>>> >
>>>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>>>> >   16  0.09% 99.02%     219 301381
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>>>> >   20  0.06% 99.30%     151 301441
>>>> > org.apache.lucene.store.DataInput.readVInt
>>>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>>>> >   22  0.06% 99.42%     140 301375
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>>>> >   24  0.03% 99.50%      76 301423
>>>> >
>>>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   25  0.03% 99.53%      74 301454
>>>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>>> >   26  0.03% 99.56%      65 301432
>>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
>>>> >   27  0.02% 99.58%      53 301456
>>>> > org.apache.lucene.util.FixedBitSet.or
>>>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>>>> >   29  0.02% 99.63%      50 301464
>>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>>> >   30  0.02% 99.64%      39 301438
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   31  0.02% 99.66%      37 301465
>>>> >
>>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>>> >   32  0.02% 99.67%      36 301419
>>>> >
>>>> > org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> > CPU SAMPLES END
>>>> >
>>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi Tomás,
>>>> >>
>>>> >> I suspect this might be related to LUCENE-5938. We changed the
>>>> >> default
>>>> >> rewrite method for multi-term queries to load documents into a sparse
> >>>> >> bit set first, and only upgrade to a dense bit set when we know
>>>> >> many documents match. When there are lots of terms to intersect, then
>>>> >> we end up spending significant cpu time to build the sparse bit set
>>>> >> to
>>>> >> eventually upgrade to a dense bit set like before. This might be what
>>>> >> you are seeing.
>>>> >>
>>>> >> You might see the issue less with the population field because it has
>>>> >> fewer unique values, so postings lists are longer and the DocIdSet
>>>> >> building logic can upgrade quicker to a dense bit set.
>>>> >>
> >>>> >> Mike noticed this slowness when working on BKD trees and we changed
>>>> >> this first phase to use a plain int[] array that we sort and
>>>> >> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
>>>> >> which seemed to make things faster. Would it be possible for you to
>>>> >> also check a 5.3 snapshot?
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>>>> >> <to...@gmail.com> wrote:
>>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4
>>>> >> > and
>>>> >> > 5.2.1.
>>>> >> > It doesn't happen with all the queries, but for queries like range
>>>> >> > queries
>>>> >> > on fields with many different values the average time in 5.2.1 is
>>>> >> > worse
>>>> >> > than
>>>> >> > in 4.10.4. Is anyone seeing something similar?
>>>> >> >
>>>> >> > Test Details:
>>>> >> > * Single thread running queries continuously. I run the test twice
>>>> >> > for
>>>> >> > each
>>>> >> > Solr version.
>>>> >> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
>>>> >> > instance
>>>> >> > with all the defaults but with 5G heap. Index in local disk (SSD)
>>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>>>> >> > SolrCloud
>>>> >> > mode, no distributed search.
>>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
>>>> >> > during
>>>> >> > the
>>>> >> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr
>>>> >> > 4.10.4
>>>> >> > (fits entirely in RAM)
>>>> >> > * jdk1.8.0_45
>>>> >> >
>>>> >> > Queries: 3k boolean queries (generated with terms from the dataset)
>>>> >> > with
>>>> >> > range queries as filters on "tlongitude" and "tlatitude" fields
>>>> >> > with
>>>> >> > randomly generated bounds, e.g.
>>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>>>> >> >
>>>> >> > Fields are:
>>>> >> > <field name="tlatitude" type="tdouble"/>
>>>> >> > <field name="tlongitude" type="tdouble"/>
>>>> >> > Field Type:
>>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
>>>> >> > precisionStep="8"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than 5.2.1
>>>> >> > in
>>>> >> > average.
>>>> >> >
>>>> >> > http://snag.gy/2yPPM.jpg
>>>> >> >
>>>> >> > Doing only the boolean queries show no performance difference
>>>> >> > between
>>>> >> > 4.10
>>>> >> > and 5.2, same thing if I do filters on a string field instead of
>>>> >> > the
>>>> >> > range
>>>> >> > queries.
>>>> >> >
>>>> >> > When using "double" field type (precisionStep="0"), the difference
>>>> >> > was
>>>> >> > bigger:
>>>> >> >
>>>> >> > longitude/latitude fields:
>>>> >> > <field name="longitude" type="double" docValues="true"/>
>>>> >> > <field name="latitude" type="double" docValues="true"/>
>>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
>>>> >> > precisionStep="0"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > http://snag.gy/Vi5uk.jpg
>>>> >> > I understand this is not the best field type definition for range
>>>> >> > queries,
>>>> >> > I'm just trying to understand the difference between the two
>>>> >> > versions
>>>> >> > and
>>>> >> > why.
>>>> >> >
>>>> >> > Performance was OK when doing range queries on the "population"
>>>> >> > field
>>>> >> > (long), but that field doesn't have many different values, only
>>>> >> > 300k out
>>>> >> > of
>>>> >> > the 8.3M docs have the population field with a value different to
>>>> >> > 0. On
>>>> >> > the
>>>> >> > other hand, doing range queries on the _version_ field did show a
>>>> >> > graph
>>>> >> > similar to the previous one:
>>>> >> >
>>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > http://snag.gy/4tc7e.jpg
>>>> >> >
>>>> >> > Any idea what could be causing this? Is this expected after some
>>>> >> > known
>>>> >> > change?
>>>> >> >
>>>> >> > With Solr 4.10, a single CPU core remains high during the test
>>>> >> > (close to
>>>> >> > 100%), but with Solr 5.2, different cores go up and down in
>>>> >> > utilization
>>>> >> > continuously. That's probably because of the different Jetty
>>>> >> > version I
>>>> >> > suppose.
>>>> >> > GC pattern looks similar in both. For both Solr versions I'm using
>>>> >> > the
>>>> >> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Adrien
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>
>>
>>
>>
>> --
>> Anshum Gupta
>
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de



-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

Just related: Do you run tests on MacOSX? NanoTime is damn slow there! And it's also not monotonic! There was a discussion on hotspot-dev about that one year ago, but no solution until now.

MacOS has no real nanoTime impl in the kernel, so Java falls back to the wall clock, which is slow and non-monotonic with NTP or VMware... (see test failures in the past on Policeman Jenkins when NTP adjusted the clock).

If you run on Linux this could also be a good test: on OS X the difference to the previous version should be even larger.
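
A throwaway sketch to compare platforms (plain Java, no JMH, so the numbers are
only indicative; this is an illustration, not a proper benchmark):

public class NanoTimeCost {
  public static void main(String[] args) {
    long sink = 0;
    // warm up so the loop and the nanoTime path get JIT-compiled
    for (int i = 0; i < 1_000_000; i++) {
      sink += System.nanoTime();
    }
    final int iters = 20_000_000;
    long start = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      sink += System.nanoTime();
    }
    long elapsed = System.nanoTime() - start;
    // print sink so the JIT cannot eliminate the timed loop
    System.out.println(String.format("~%.1f ns per call (sink=%d)",
        elapsed / (double) iters, sink));
  }
}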

Uwe

Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta <an...@anshumgupta.net>:
>Guess bypassing that check and always returning true instead would help
>confirm that. If that's the case, we should just initialize the timeout
>to
>Long.MAX_VALUE and check for that to short-circuit?
>
>On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
><tomasflobbe@gmail.com
>> wrote:
>
>> Yes, I saw that, but thought it could be the underlying
>implementation,
>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls
>to
>> System.nanoTime then...
>>
>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com>
>wrote:
>>
>>> Thanks for sharing the traces, it looks like my intuition was wrong.
>>> :) They seem to point to
>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
>>> the time is out before delegating.
>>>
>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>>> <to...@gmail.com> wrote:
>>> > Thanks Adrien,
>>> > I'll run the tests with 5.3 snapshot and post the results here. In
>case
>>> this
>>> > helps, this is the hprof samples output
>>> >
>(-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>> for
>>> > 4.10.4 and 5.2.1 in my test:
>>> >
>>> > Solr 4.10.4:
>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>> > rank   self  accum   count trace method
>>> >    1 75.07% 75.07%  182812 300523
>java.net.PlainSocketImpl.socketAccept
>>> >    2  4.27% 79.34%   10408 301576
>>> >
>org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>> >    3  4.15% 83.49%   10108 301585
>>> > org.apache.lucene.index.FilteredTermsEnum.docs
>>> >    4  3.46% 86.95%    8419 301582
>>> > org.apache.lucene.index.FilteredTermsEnum.next
>>> >    5  2.49% 89.44%    6070 301573
>java.net.SocketInputStream.socketRead0
>>> >    6  1.99% 91.43%    4848 301599
>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    7  1.98% 93.42%    4824 301583
>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    8  1.57% 94.99%    3824 301589
>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >    9  1.44% 96.42%    3504 301594
>>> >
>>>
>org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>>> >   11  0.98% 98.50%    2388 301598
>>> >
>>>
>org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>> >   12  0.62% 99.12%    1522 301600
>>> org.apache.lucene.store.DataInput.readVInt
>>> >   13  0.21% 99.33%     500 301612
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>> >   14  0.07% 99.39%     167 301601
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   15  0.06% 99.45%     139 301619
>java.lang.System.identityHashCode
>>> >   16  0.05% 99.50%     114 301632
>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>> >   17  0.04% 99.54%      92 300708
>java.util.zip.Inflater.inflateBytes
>>> >   18  0.03% 99.57%      76 301624
>>> >
>>>
>org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   19  0.03% 99.59%      68 300613
>java.lang.ClassLoader.defineClass1
>>> >   20  0.03% 99.62%      68 301615
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   21  0.03% 99.65%      62 301635
>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>> >   22  0.02% 99.66%      41 301664
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.01% 99.68%      31 301629
>>> org.apache.lucene.util.FixedBitSet.<init>
>>> > CPU SAMPLES END
>>> >
>>> > Solr 5.2.1:
>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>> > rank   self  accum   count trace method
>>> >    1 51.38% 51.38%  120954 301291
>sun.nio.ch.EPollArrayWrapper.epollWait
>>> >    2 25.69% 77.07%   60477 301292
>>> sun.nio.ch.ServerSocketChannelImpl.accept0
>>> >    3 10.59% 87.66%   24923 301369
>>> >
>org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>> >    4  2.20% 89.86%    5182 301414
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>> >    5  2.01% 91.87%    4742 301384
>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>> >    6  1.25% 93.12%    2944 301434
>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>> >    7  1.11% 94.23%    2612 301367
>>> >
>org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>>> >   10  0.78% 96.87%    1825 301449
>>> >
>>>
>org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>> >   11  0.73% 97.60%    1717 301378
>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>>> >   13  0.33% 98.66%     787 301387
>>> >
>>>
>org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   14  0.16% 98.82%     374 301426
>>> >
>>>
>org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>>> >   16  0.09% 99.02%     219 301381
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>>> >   20  0.06% 99.30%     151 301441
>>> org.apache.lucene.store.DataInput.readVInt
>>> >   21  0.06% 99.36%     147 301389
>java.lang.System.identityHashCode
>>> >   22  0.06% 99.42%     140 301375
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>>> >   24  0.03% 99.50%      76 301423
>>> >
>>>
>org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   25  0.03% 99.53%      74 301454
>>> >
>org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >   26  0.03% 99.56%      65 301432
>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
>>> >   27  0.02% 99.58%      53 301456
>org.apache.lucene.util.FixedBitSet.or
>>> >   28  0.02% 99.60%      52 300077
>java.lang.ClassLoader.defineClass1
>>> >   29  0.02% 99.63%      50 301464
>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>> >   30  0.02% 99.64%      39 301438
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   31  0.02% 99.66%      37 301465
>>> >
>>>
>org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   32  0.02% 99.67%      36 301419
>>> >
>>>
>org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > CPU SAMPLES END
>>> >
>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi Tomás,
>>> >>
>>> >> I suspect this might be related to LUCENE-5938. We changed the
>default
>>> >> rewrite method for multi-term queries to load documents into a
>sparse
>>> >> bit set first, and only upgrade to a dense bit set when we
>know
>>> >> many documents match. When there are lots of terms to intersect,
>then
>>> >> we end up spending significant cpu time to build the sparse bit
>set to
>>> >> eventually upgrade to a dense bit set like before. This might be
>what
>>> >> you are seeing.
>>> >>
>>> >> You might see the issue less with the population field because it
>has
>>> >> fewer unique values, so postings lists are longer and the
>DocIdSet
>>> >> building logic can upgrade quicker to a dense bit set.
>>> >>
>>> >> Mike noticed this slowness when working on BKD trees and we
>changed
>>> >> this first phase to use a plain int[] array that we sort and
>>> >> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
>>> >> which seemed to make things faster. Would it be possible for you
>to
>>> >> also check a 5.3 snapshot?
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>>> >> <to...@gmail.com> wrote:
>>> >> > Hi, I'm seeing some query performance degradation between
>4.10.4 and
>>> >> > 5.2.1.
>>> >> > It doesn't happen with all the queries, but for queries like
>range
>>> >> > queries
>>> >> > on fields with many different values the average time in 5.2.1
>is
>>> worse
>>> >> > than
>>> >> > in 4.10.4. Is anyone seeing something similar?
>>> >> >
>>> >> > Test Details:
>>> >> > * Single thread running queries continuously. I run the test
>twice
>>> for
>>> >> > each
>>> >> > Solr version.
>>> >> > * JMeter running on my laptop, Solr running on EC2, on an
>m3.xlarge
>>> >> > instance
>>> >> > with all the defaults but with 5G heap. Index in local disk
>(SSD)
>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>>> >> > SolrCloud
>>> >> > mode, no distributed search.
>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
>during
>>> >> > the
>>> >> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for
>Solr
>>> >> > 4.10.4
>>> >> > (fits entirely in RAM)
>>> >> > * jdk1.8.0_45
>>> >> >
>>> >> > Queries: 3k boolean queries (generated with terms from the
>dataset)
>>> with
>>> >> > range queries as filters on "tlongitude" and "tlatitude" fields
>with
>>> >> > randomly generated bounds, e.g.
>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y
>TO Z]
>>> >> >
>>> >> > Fields are:
>>> >> > <field name="tlatitude" type="tdouble"/>
>>> >> > <field name="tlongitude" type="tdouble"/>
>>> >> > Field Type:
>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
>>> precisionStep="8"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than
>5.2.1 in
>>> >> > average.
>>> >> >
>>> >> > http://snag.gy/2yPPM.jpg
>>> >> >
>>> >> > Doing only the boolean queries show no performance difference
>between
>>> >> > 4.10
>>> >> > and 5.2, same thing if I do filters on a string field instead
>of the
>>> >> > range
>>> >> > queries.
>>> >> >
>>> >> > When using "double" field type (precisionStep="0"), the
>difference
>>> was
>>> >> > bigger:
>>> >> >
>>> >> > longitude/latitude fields:
>>> >> > <field name="longitude" type="double" docValues="true"/>
>>> >> > <field name="latitude" type="double" docValues="true"/>
>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
>>> precisionStep="0"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/Vi5uk.jpg
>>> >> > I understand this is not the best field type definition for
>range
>>> >> > queries,
>>> >> > I'm just trying to understand the difference between the two
>versions
>>> >> > and
>>> >> > why.
>>> >> >
>>> >> > Performance was OK when doing range queries on the "population"
>field
>>> >> > (long), but that field doesn't have many different values, only
>300k
>>> out
>>> >> > of
>>> >> > the 8.3M docs have the population field with a value different
>to 0.
>>> On
>>> >> > the
>>> >> > other hand, doing range queries on the _version_ field did show
>a
>>> graph
>>> >> > similar to the previous one:
>>> >> >
>>> >> > <field name="_version_" type="long" indexed="true"
>stored="true"/>
>>> >> > <fieldType name="long" class="solr.TrieLongField"
>precisionStep="0"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/4tc7e.jpg
>>> >> >
>>> >> > Any idea what could be causing this? Is this expected after
>some
>>> known
>>> >> > change?
>>> >> >
>>> >> > With Solr 4.10, a single CPU core remains high during the test
>>> (close to
>>> >> > 100%), but with Solr 5.2, different cores go up and down in
>>> utilization
>>> >> > continuously. That's probably because of the different Jetty
>version
>>> I
>>> >> > suppose.
>>> >> > GC pattern looks similar in both. For both Solr versions I'm
>using
>>> the
>>> >> > settings that ship with Solr (in solr.in.sh) except for Xmx and
>Xms
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Adrien
>>> >>
>>> >>
>---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Adrien
>>>
>>>
>---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>
>
>-- 
>Anshum Gupta

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I'll run the test with the ExitableDirectoryReader wrapper removed to check
whether this is the issue. I'll post my results here.
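
For anyone following along, the change amounts to something like the sketch
below. This is an approximation of the idea, not the literal SolrIndexSearcher
code; the actual wrapping site and surrounding details in 5.2.1 may differ.

import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.ExitableDirectoryReader;
import org.apache.lucene.index.QueryTimeout;

public class ReaderWrapping {
  // Normal path: wrap with ExitableDirectoryReader so every TermsEnum.next()
  // also performs a timeout check. Test path: return the plain reader so
  // ExitableTermsEnum never ends up in the hot loop.
  static DirectoryReader maybeWrap(DirectoryReader reader, QueryTimeout timeout,
                                   boolean exitable) throws IOException {
    return exitable ? ExitableDirectoryReader.wrap(reader, timeout) : reader;
  }
}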

On Mon, Aug 3, 2015 at 6:07 PM, Anshum Gupta <an...@anshumgupta.net> wrote:

> Guess bypassing that check and always returning true instead would help
> confirm that. If that's the case, we should just initialize the timeout to
> Long.MAX_VALUE and check for that to short-circuit?
>
> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe <
> tomasflobbe@gmail.com> wrote:
>
>> Yes, I saw that, but thought it could be the underlying implementation,
>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
>> System.nanoTime then...
>>
>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com> wrote:
>>
>>> Thanks for sharing the traces, it looks like my intuition was wrong.
>>> :) They seem to point to
>>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
>>> the time is out before delegating.
>>>
>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>>> <to...@gmail.com> wrote:
>>> > Thanks Adrien,
>>> > I'll run the tests with 5.3 snapshot and post the results here. In
>>> case this
>>> > helps, this is the hprof samples output
>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>> for
>>> > 4.10.4 and 5.2.1 in my test:
>>> >
>>> > Solr 4.10.4:
>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>> > rank   self  accum   count trace method
>>> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
>>> >    2  4.27% 79.34%   10408 301576
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>> >    3  4.15% 83.49%   10108 301585
>>> > org.apache.lucene.index.FilteredTermsEnum.docs
>>> >    4  3.46% 86.95%    8419 301582
>>> > org.apache.lucene.index.FilteredTermsEnum.next
>>> >    5  2.49% 89.44%    6070 301573
>>> java.net.SocketInputStream.socketRead0
>>> >    6  1.99% 91.43%    4848 301599
>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    7  1.98% 93.42%    4824 301583
>>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>> >    8  1.57% 94.99%    3824 301589
>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >    9  1.44% 96.42%    3504 301594
>>> >
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>>> >   11  0.98% 98.50%    2388 301598
>>> >
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>> >   12  0.62% 99.12%    1522 301600
>>> org.apache.lucene.store.DataInput.readVInt
>>> >   13  0.21% 99.33%     500 301612
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>> >   14  0.07% 99.39%     167 301601
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>>> >   16  0.05% 99.50%     114 301632
>>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>>> >   18  0.03% 99.57%      76 301624
>>> >
>>> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>>> >   20  0.03% 99.62%      68 301615
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   21  0.03% 99.65%      62 301635
>>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>> >   22  0.02% 99.66%      41 301664
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.01% 99.68%      31 301629
>>> org.apache.lucene.util.FixedBitSet.<init>
>>> > CPU SAMPLES END
>>> >
>>> > Solr 5.2.1:
>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>> > rank   self  accum   count trace method
>>> >    1 51.38% 51.38%  120954 301291
>>> sun.nio.ch.EPollArrayWrapper.epollWait
>>> >    2 25.69% 77.07%   60477 301292
>>> sun.nio.ch.ServerSocketChannelImpl.accept0
>>> >    3 10.59% 87.66%   24923 301369
>>> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>> >    4  2.20% 89.86%    5182 301414
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>> >    5  2.01% 91.87%    4742 301384
>>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>> >    6  1.25% 93.12%    2944 301434
>>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>> >    7  1.11% 94.23%    2612 301367
>>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>>> >   10  0.78% 96.87%    1825 301449
>>> >
>>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>> >   11  0.73% 97.60%    1717 301378
>>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>>> >   13  0.33% 98.66%     787 301387
>>> >
>>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   14  0.16% 98.82%     374 301426
>>> >
>>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>>> >   16  0.09% 99.02%     219 301381
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>>> >   20  0.06% 99.30%     151 301441
>>> org.apache.lucene.store.DataInput.readVInt
>>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>>> >   22  0.06% 99.42%     140 301375
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>>> >   24  0.03% 99.50%      76 301423
>>> >
>>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> >   25  0.03% 99.53%      74 301454
>>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>> >   26  0.03% 99.56%      65 301432
>>> > org.apache.lucene.util.BitDocIdSet$Builder.or
>>> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
>>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>>> >   29  0.02% 99.63%      50 301464
>>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>> >   30  0.02% 99.64%      39 301438
>>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>> >   31  0.02% 99.66%      37 301465
>>> >
>>> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>> >   32  0.02% 99.67%      36 301419
>>> >
>>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>> > CPU SAMPLES END
>>> >
>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi Tomás,
>>> >>
>>> >> I suspect this might be related to LUCENE-5938. We changed the default
>>> >> rewrite method for multi-term queries to load documents into a sparse
>>> >> bit set first, and only upgrade to a dense bit set when we know
>>> >> many documents match. When there are lots of terms to intersect, then
>>> >> we end up spending significant cpu time to build the sparse bit set to
>>> >> eventually upgrade to a dense bit set like before. This might be what
>>> >> you are seeing.
>>> >>
>>> >> You might see the issue less with the population field because it has
>>> >> fewer unique values, so postings lists are longer and the DocIdSet
>>> >> building logic can upgrade quicker to a dense bit set.
>>> >>
>>> >> Mike noticed this slowness when working on BKD trees and we changed
>>> >> this first phase to use a plain int[] array that we sort and
>>> >> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
>>> >> which seemed to make things faster. Would it be possible for you to
>>> >> also check a 5.3 snapshot?
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>>> >> <to...@gmail.com> wrote:
>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
>>> >> > 5.2.1.
>>> >> > It doesn't happen with all the queries, but for queries like range
>>> >> > queries
>>> >> > on fields with many different values the average time in 5.2.1 is
>>> worse
>>> >> > than
>>> >> > in 4.10.4. Is anyone seeing something similar?
>>> >> >
>>> >> > Test Details:
>>> >> > * Single thread running queries continuously. I run the test twice
>>> for
>>> >> > each
>>> >> > Solr version.
>>> >> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
>>> >> > instance
>>> >> > with all the defaults but with 5G heap. Index in local disk (SSD)
>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>>> >> > SolrCloud
>>> >> > mode, no distributed search.
>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates
>>> during
>>> >> > the
>>> >> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr
>>> >> > 4.10.4
>>> >> > (fits entirely in RAM)
>>> >> > * jdk1.8.0_45
>>> >> >
>>> >> > Queries: 3k boolean queries (generated with terms from the dataset)
>>> with
>>> >> > range queries as filters on "tlongitude" and "tlatitude" fields with
>>> >> > randomly generated bounds, e.g.
>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>>> >> >
>>> >> > Fields are:
>>> >> > <field name="tlatitude" type="tdouble"/>
>>> >> > <field name="tlongitude" type="tdouble"/>
>>> >> > Field Type:
>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
>>> precisionStep="8"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than 5.2.1
>>> in
>>> >> > average.
>>> >> >
>>> >> > http://snag.gy/2yPPM.jpg
>>> >> >
>>> >> > Doing only the boolean queries show no performance difference
>>> between
>>> >> > 4.10
>>> >> > and 5.2, same thing if I do filters on a string field instead of the
>>> >> > range
>>> >> > queries.
>>> >> >
>>> >> > When using "double" field type (precisionStep="0"), the difference
>>> was
>>> >> > bigger:
>>> >> >
>>> >> > longitude/latitude fields:
>>> >> > <field name="longitude" type="double" docValues="true"/>
>>> >> > <field name="latitude" type="double" docValues="true"/>
>>> >> > <fieldType name="double" class="solr.TrieDoubleField"
>>> precisionStep="0"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/Vi5uk.jpg
>>> >> > I understand this is not the best field type definition for range
>>> >> > queries,
>>> >> > I'm just trying to understand the difference between the two
>>> versions
>>> >> > and
>>> >> > why.
>>> >> >
>>> >> > Performance was OK when doing range queries on the "population"
>>> field
>>> >> > (long), but that field doesn't have many different values, only
>>> 300k out
>>> >> > of
>>> >> > the 8.3M docs have the population field with a value different to
>>> 0. On
>>> >> > the
>>> >> > other hand, doing range queries on the _version_ field did show a
>>> graph
>>> >> > similar to the previous one:
>>> >> >
>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>>> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > http://snag.gy/4tc7e.jpg
>>> >> >
>>> >> > Any idea what could be causing this? Is this expected after some
>>> known
>>> >> > change?
>>> >> >
>>> >> > With Solr 4.10, a single CPU core remains high during the test
>>> (close to
>>> >> > 100%), but with Solr 5.2, different cores go up and down in
>>> utilization
>>> >> > continuously. That's probably because of the different Jetty
>>> version I
>>> >> > suppose.
>>> >> > GC pattern looks similar in both. For both Solr versions I'm using
>>> the
>>> >> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Adrien
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Adrien
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>
>
> --
> Anshum Gupta
>

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Anshum Gupta <an...@anshumgupta.net>.
Guess bypassing that check and always returning true instead would help
confirm that. If that's the case, we should just initialize the timeout to
Long.MAX_VALUE and check for that to short-circuit?
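
Something along these lines, as a rough sketch of the idea rather than the
actual Lucene class (the name and the exception type are illustrative):

import java.io.IOException;

import org.apache.lucene.index.FilterLeafReader;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// When no timeout was requested (deadline == Long.MAX_VALUE), skip the
// System.nanoTime() call entirely and just delegate to the wrapped enum.
class ShortCircuitingTermsEnum extends FilterLeafReader.FilterTermsEnum {
  private final long deadlineNanos; // Long.MAX_VALUE means "no timeout configured"

  ShortCircuitingTermsEnum(TermsEnum in, long deadlineNanos) {
    super(in);
    this.deadlineNanos = deadlineNanos;
  }

  @Override
  public BytesRef next() throws IOException {
    if (deadlineNanos != Long.MAX_VALUE && System.nanoTime() > deadlineNanos) {
      // the real wrapper throws ExitableDirectoryReader.ExitingReaderException here
      throw new RuntimeException("Query exceeded allowed time");
    }
    return in.next();
  }
}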

On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe <tomasflobbe@gmail.com
> wrote:

> Yes, I saw that, but thought it could be the underlying implementation,
> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
> System.nanoTime then...
>
> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com> wrote:
>
>> Thanks for sharing the traces, it looks like my intuition was wrong.
>> :) They seem to point to
>> ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
>> the time is out before delegating.
>>
>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>> <to...@gmail.com> wrote:
>> > Thanks Adrien,
>> > I'll run the tests with 5.3 snapshot and post the results here. In case
>> this
>> > helps, this is the hprof samples output
>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>> for
>> > 4.10.4 and 5.2.1 in my test:
>> >
>> > Solr 4.10.4:
>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>> > rank   self  accum   count trace method
>> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
>> >    2  4.27% 79.34%   10408 301576
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>> >    3  4.15% 83.49%   10108 301585
>> > org.apache.lucene.index.FilteredTermsEnum.docs
>> >    4  3.46% 86.95%    8419 301582
>> > org.apache.lucene.index.FilteredTermsEnum.next
>> >    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
>> >    6  1.99% 91.43%    4848 301599
>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> >    7  1.98% 93.42%    4824 301583
>> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> >    8  1.57% 94.99%    3824 301589
>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> >    9  1.44% 96.42%    3504 301594
>> >
>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>> >   11  0.98% 98.50%    2388 301598
>> >
>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>> >   12  0.62% 99.12%    1522 301600
>> org.apache.lucene.store.DataInput.readVInt
>> >   13  0.21% 99.33%     500 301612
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>> >   14  0.07% 99.39%     167 301601
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>> >   16  0.05% 99.50%     114 301632
>> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>> >   18  0.03% 99.57%      76 301624
>> >
>> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>> >   20  0.03% 99.62%      68 301615
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> >   21  0.03% 99.65%      62 301635
>> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>> >   22  0.02% 99.66%      41 301664
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> >   23  0.01% 99.68%      31 301629
>> org.apache.lucene.util.FixedBitSet.<init>
>> > CPU SAMPLES END
>> >
>> > Solr 5.2.1:
>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>> > rank   self  accum   count trace method
>> >    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>> >    2 25.69% 77.07%   60477 301292
>> sun.nio.ch.ServerSocketChannelImpl.accept0
>> >    3 10.59% 87.66%   24923 301369
>> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>> >    4  2.20% 89.86%    5182 301414
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>> >    5  2.01% 91.87%    4742 301384
>> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>> >    6  1.25% 93.12%    2944 301434
>> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>> >    7  1.11% 94.23%    2612 301367
>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>> >   10  0.78% 96.87%    1825 301449
>> >
>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>> >   11  0.73% 97.60%    1717 301378
>> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>> >   13  0.33% 98.66%     787 301387
>> >
>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> >   14  0.16% 98.82%     374 301426
>> >
>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>> >   16  0.09% 99.02%     219 301381
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>> >   20  0.06% 99.30%     151 301441
>> org.apache.lucene.store.DataInput.readVInt
>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>> >   22  0.06% 99.42%     140 301375
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>> >   24  0.03% 99.50%      76 301423
>> >
>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> >   25  0.03% 99.53%      74 301454
>> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> >   26  0.03% 99.56%      65 301432
>> > org.apache.lucene.util.BitDocIdSet$Builder.or
>> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>> >   29  0.02% 99.63%      50 301464
>> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>> >   30  0.02% 99.64%      39 301438
>> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> >   31  0.02% 99.66%      37 301465
>> >
>> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>> >   32  0.02% 99.67%      36 301419
>> >
>> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> > CPU SAMPLES END
>> >
>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com>
>> wrote:
>> >>
>> >> Hi Tomás,
>> >>
>> >> I suspect this might be related to LUCENE-5938. We changed the default
>> >> rewrite method for multi-term queries to load documents into a sparse
>> >> bit set first, and only upgrade to a dense bit set when we know
>> >> many documents match. When there are lots of terms to intersect, then
>> >> we end up spending significant cpu time to build the sparse bit set to
>> >> eventually upgrade to a dense bit set like before. This might be what
>> >> you are seeing.
>> >>
>> >> You might see the issue less with the population field because it has
>> >> fewer unique values, so postings lists are longer and the DocIdSet
>> >> building logic can upgrade quicker to a dense bit set.
>> >>
>> >> Mike noticed this slowness when working on BKD trees and we changed
>> >> this first phase to use a plain int[] array that we sort and
>> >> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
>> >> which seemed to make things faster. Would it be possible for you to
>> >> also check a 5.3 snapshot?
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>> >> <to...@gmail.com> wrote:
>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
>> >> > 5.2.1.
>> >> > It doesn't happen with all the queries, but for queries like range
>> >> > queries
>> >> > on fields with many different values the average time in 5.2.1 is
>> worse
>> >> > than
>> >> > in 4.10.4. Is anyone seeing something similar?
>> >> >
>> >> > Test Details:
>> >> > * Single thread running queries continuously. I run the test twice
>> for
>> >> > each
>> >> > Solr version.
>> >> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
>> >> > instance
>> >> > with all the defaults but with 5G heap. Index in local disk (SSD)
>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>> >> > SolrCloud
>> >> > mode, no distributed search.
>> >> > * "allCountries" geonames dataset (~8M small docs). No updates during
>> >> > the
>> >> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr
>> >> > 4.10.4
>> >> > (fits entirely in RAM)
>> >> > * jdk1.8.0_45
>> >> >
>> >> > Queries: 3k boolean queries (generated with terms from the dataset)
>> with
>> >> > range queries as filters on "tlongitude" and "tlatitude" fields with
>> >> > randomly generated bounds, e.g.
>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>> >> >
>> >> > Fields are:
>> >> > <field name="tlatitude" type="tdouble"/>
>> >> > <field name="tlongitude" type="tdouble"/>
>> >> > Field Type:
>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
>> precisionStep="8"
>> >> > positionIncrementGap="0"/>
>> >> >
>> >> > In this case, Solr 4.10.4 was between 20% to 30% faster than 5.2.1 in
>> >> > average.
>> >> >
>> >> > http://snag.gy/2yPPM.jpg
>> >> >
>> >> > Doing only the boolean queries show no performance difference between
>> >> > 4.10
>> >> > and 5.2, same thing if I do filters on a string field instead of the
>> >> > range
>> >> > queries.
>> >> >
>> >> > When using "double" field type (precisionStep="0"), the difference
>> was
>> >> > bigger:
>> >> >
>> >> > longitude/latitude fields:
>> >> > <field name="longitude" type="double" docValues="true"/>
>> >> > <field name="latitude" type="double" docValues="true"/>
>> >> > <fieldType name="double" class="solr.TrieDoubleField"
>> precisionStep="0"
>> >> > positionIncrementGap="0"/>
>> >> >
>> >> > http://snag.gy/Vi5uk.jpg
>> >> > I understand this is not the best field type definition for range
>> >> > queries,
>> >> > I'm just trying to understand the difference between the two versions
>> >> > and
>> >> > why.
>> >> >
>> >> > Performance was OK when doing range queries on the "population" field
>> >> > (long), but that field doesn't have many different values, only 300k
>> out
>> >> > of
>> >> > the 8.3M docs have the population field with a value different to 0.
>> On
>> >> > the
>> >> > other hand, doing range queries on the _version_ field did show a
>> graph
>> >> > similar to the previous one:
>> >> >
>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>> >> > positionIncrementGap="0"/>
>> >> >
>> >> > http://snag.gy/4tc7e.jpg
>> >> >
>> >> > Any idea what could be causing this? Is this expected after some
>> known
>> >> > change?
>> >> >
>> >> > With Solr 4.10, a single CPU core remains high during the test
>> (close to
>> >> > 100%), but with Solr 5.2, different cores go up and down in
>> utilization
>> >> > continuously. That's probably because of the different Jetty version
>> I
>> >> > suppose.
>> >> > GC pattern looks similar in both. For both Solr versions I'm using
>> the
>> >> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Adrien
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >
>>
>>
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>


-- 
Anshum Gupta

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Yes, I saw that, but thought it could be the underlying implementation, not
the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
System.nanoTime then...
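
One way to quantify that, as a hedged sketch using Lucene's QueryTimeout hook
directly rather than Solr's implementation: wrap the reader with a timeout that
never fires but counts how often it is consulted; in a real implementation each
of those calls is a System.nanoTime() comparison.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.ExitableDirectoryReader;
import org.apache.lucene.index.QueryTimeout;

// A QueryTimeout that never times out but counts invocations;
// ExitableTermsEnum.next() performs one such check per term it visits.
public class CountingTimeout implements QueryTimeout {
  public final AtomicLong checks = new AtomicLong();

  @Override
  public boolean shouldExit() {
    checks.incrementAndGet(); // stand-in for the System.nanoTime() comparison
    return false;             // never actually time out
  }

  public static DirectoryReader wrap(DirectoryReader reader, CountingTimeout timeout)
      throws IOException {
    return ExitableDirectoryReader.wrap(reader, timeout);
  }
}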

On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <jp...@gmail.com> wrote:

> Thanks for sharing the traces, it looks like my intuition was wrong.
> :) They seem to point to
> ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
> the time is out before delegating.
>
> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
> <to...@gmail.com> wrote:
> > Thanks Adrien,
> > I'll run the tests with 5.3 snapshot and post the results here. In case
> this
> > helps, this is the hprof samples output
> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt) for
> > 4.10.4 and 5.2.1 in my test:
> >
> > Solr 4.10.4:
> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
> > rank   self  accum   count trace method
> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
> >    2  4.27% 79.34%   10408 301576
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
> >    3  4.15% 83.49%   10108 301585
> > org.apache.lucene.index.FilteredTermsEnum.docs
> >    4  3.46% 86.95%    8419 301582
> > org.apache.lucene.index.FilteredTermsEnum.next
> >    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
> >    6  1.99% 91.43%    4848 301599
> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >    7  1.98% 93.42%    4824 301583
> > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
> >    8  1.57% 94.99%    3824 301589
> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >    9  1.44% 96.42%    3504 301594
> >
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
> >   11  0.98% 98.50%    2388 301598
> >
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
> >   12  0.62% 99.12%    1522 301600
> org.apache.lucene.store.DataInput.readVInt
> >   13  0.21% 99.33%     500 301612
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
> >   14  0.07% 99.39%     167 301601
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
> >   16  0.05% 99.50%     114 301632
> > org.apache.lucene.codecs.lucene41.ForUtil.readBlock
> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
> >   18  0.03% 99.57%      76 301624
> >
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
> >   20  0.03% 99.62%      68 301615
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   21  0.03% 99.65%      62 301635
> > org.apache.solr.search.SolrIndexSearcher.getDocSetNC
> >   22  0.02% 99.66%      41 301664
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   23  0.01% 99.68%      31 301629
> org.apache.lucene.util.FixedBitSet.<init>
> > CPU SAMPLES END
> >
> > Solr 5.2.1:
> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
> > rank   self  accum   count trace method
> >    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
> >    2 25.69% 77.07%   60477 301292
> sun.nio.ch.ServerSocketChannelImpl.accept0
> >    3 10.59% 87.66%   24923 301369
> > org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
> >    4  2.20% 89.86%    5182 301414
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
> >    5  2.01% 91.87%    4742 301384
> > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
> >    6  1.25% 93.12%    2944 301434
> > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
> >    7  1.11% 94.23%    2612 301367
> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
> >   10  0.78% 96.87%    1825 301449
> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
> >   11  0.73% 97.60%    1717 301378
> > org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
> >   13  0.33% 98.66%     787 301387
> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   14  0.16% 98.82%     374 301426
> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
> >   16  0.09% 99.02%     219 301381
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
> >   20  0.06% 99.30%     151 301441
> org.apache.lucene.store.DataInput.readVInt
> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
> >   22  0.06% 99.42%     140 301375
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
> >   24  0.03% 99.50%      76 301423
> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> >   25  0.03% 99.53%      74 301454
> > org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
> >   26  0.03% 99.56%      65 301432
> > org.apache.lucene.util.BitDocIdSet$Builder.or
> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
> >   29  0.02% 99.63%      50 301464
> > org.apache.lucene.codecs.lucene50.ForUtil.readBlock
> >   30  0.02% 99.64%      39 301438
> > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
> >   31  0.02% 99.66%      37 301465
> >
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
> >   32  0.02% 99.67%      36 301419
> >
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> > CPU SAMPLES END
> >
> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com> wrote:
> >>
> >> Hi Tomás,
> >>
> >> I suspect this might be related to LUCENE-5938. We changed the default
> >> rewrite method for multi-term queries to load documents into a sparse
> >> bit set first, and only upgrade to a dense bit set when we know
> >> many documents match. When there are lots of terms to intersect, we
> >> end up spending significant CPU time to build the sparse bit set to
> >> eventually upgrade to a dense bit set like before. This might be what
> >> you are seeing.
> >>
> >> You might see the issue less with the population field because it has
> >> fewer unique values, so postings lists are longer and the DocIdSet
> >> building logic can upgrade quicker to a dense bit set.
> >>
> >> Mike noticed this slowness when working on BKD trees and we changed
> >> this first phase to use a plain int[] array that we sort and
> >> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
> >> which seemed to make things faster. Would it be possible for you to
> >> also check a 5.3 snapshot?
> >>
> >>
> >>
> >>
> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
> >> <to...@gmail.com> wrote:
> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
> >> > 5.2.1.
> >> > It doesn't happen with all the queries, but for queries like range
> >> > queries
> >> > on fields with many different values the average time in 5.2.1 is
> worse
> >> > than
> >> > in 4.10.4. Is anyone seeing something similar?
> >> >
> >> > Test Details:
> >> > * Single thread running queries continuously. I ran the test twice for
> >> > each
> >> > Solr version.
> >> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
> >> > instance
> >> > with all the defaults but with 5G heap. Index in local disk (SSD)
> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
> >> > SolrCloud
> >> > mode, no distributed search.
> >> > * "allCountries" geonames dataset (~8M small docs). No updates during
> >> > the
> >> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr
> >> > 4.10.4
> >> > (fits entirely in RAM)
> >> > * jdk1.8.0_45
> >> >
> >> > Queries: 3k boolean queries (generated with terms from the dataset)
> with
> >> > range queries as filters on "tlongitude" and "tlatitude" fields with
> >> > randomly generated bounds, e.g.
> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
> >> >
> >> > Fields are:
> >> > <field name="tlatitude" type="tdouble"/>
> >> > <field name="tlongitude" type="tdouble"/>
> >> > Field Type:
> >> > <fieldType name="tdouble" class="solr.TrieDoubleField"
> precisionStep="8"
> >> > positionIncrementGap="0"/>
> >> >
> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
> >> > average.
> >> >
> >> > http://snag.gy/2yPPM.jpg
> >> >
> >> > Doing only the boolean queries shows no performance difference between
> >> > 4.10
> >> > and 5.2, same thing if I do filters on a string field instead of the
> >> > range
> >> > queries.
> >> >
> >> > When using "double" field type (precisionStep="0"), the difference was
> >> > bigger:
> >> >
> >> > longitude/latitude fields:
> >> > <field name="longitude" type="double" docValues="true"/>
> >> > <field name="latitude" type="double" docValues="true"/>
> >> > <fieldType name="double" class="solr.TrieDoubleField"
> precisionStep="0"
> >> > positionIncrementGap="0"/>
> >> >
> >> > http://snag.gy/Vi5uk.jpg
> >> > I understand this is not the best field type definition for range
> >> > queries,
> >> > I'm just trying to understand the difference between the two versions
> >> > and
> >> > why.
> >> >
> >> > Performance was OK when doing range queries on the "population" field
> >> > (long), but that field doesn't have many different values, only 300k
> out
> >> > of
> >> > the 8.3M docs have the population field with a value different to 0.
> On
> >> > the
> >> > other hand, doing range queries on the _version_ field did show a
> graph
> >> > similar to the previous one:
> >> >
> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> >> > positionIncrementGap="0"/>
> >> >
> >> > http://snag.gy/4tc7e.jpg
> >> >
> >> > Any idea what could be causing this? Is this expected after some known
> >> > change?
> >> >
> >> > With Solr 4.10, a single CPU core remains high during the test (close
> to
> >> > 100%), but with Solr 5.2, different cores go up and down in
> utilization
> >> > continuously. That's probably because of the different Jetty version I
> >> > suppose.
> >> > GC pattern looks similar in both. For both Solr versions I'm using the
> >> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
> >> >
> >>
> >>
> >>
> >> --
> >> Adrien
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Adrien Grand <jp...@gmail.com>.
Thanks for sharing the traces. It looks like my intuition was wrong.
:) They seem to point to
ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether
the time is out before delegating.
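
To make the wrapping concrete: the reader wraps each TermsEnum and checks a
deadline before forwarding every next() call, so the check is paid once per
enumerated term. Below is a minimal sketch of that delegation pattern, using
invented class and method names rather than the actual Lucene API:

import java.io.IOException;

/** Illustrative stand-in for a terms enumeration; not the real TermsEnum API. */
interface SimpleTermsEnum {
    /** Returns the next term, or null when the enumeration is exhausted. */
    String next() throws IOException;
}

/** Checks a deadline before delegating each next() call to the wrapped enum. */
final class TimeLimitingTermsEnum implements SimpleTermsEnum {
    private final SimpleTermsEnum in;  // the real enumeration being wrapped
    private final long deadlineNanos;  // absolute point at which to give up

    TimeLimitingTermsEnum(SimpleTermsEnum in, long timeoutMillis) {
        this.in = in;
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    @Override
    public String next() throws IOException {
        // Paid on every term, before the delegate does any work.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new RuntimeException("query exceeded the allowed time");
        }
        return in.next();
    }
}

A range query on a high-cardinality trie field visits a lot of terms per
segment, so a per-term check like this adds up, which is consistent with the
~10.6% self time attributed to ExitableTermsEnum.next in the 5.2.1 samples.
(In Solr the wrapping is presumably tied to the timeAllowed support.)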

On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
<to...@gmail.com> wrote:
> Thanks Adrien,
> I'll run the tests with a 5.3 snapshot and post the results here. In case this
> helps, this is the hprof samples output
> (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt) for
> 4.10.4 and 5.2.1 in my test:
>
> Solr 4.10.4:
> CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
> rank   self  accum   count trace method
>    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
>    2  4.27% 79.34%   10408 301576
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>    3  4.15% 83.49%   10108 301585
> org.apache.lucene.index.FilteredTermsEnum.docs
>    4  3.46% 86.95%    8419 301582
> org.apache.lucene.index.FilteredTermsEnum.next
>    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
>    6  1.99% 91.43%    4848 301599
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>    7  1.98% 93.42%    4824 301583
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>    8  1.57% 94.99%    3824 301589
> org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>    9  1.44% 96.42%    3504 301594
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>   11  0.98% 98.50%    2388 301598
> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>   12  0.62% 99.12%    1522 301600 org.apache.lucene.store.DataInput.readVInt
>   13  0.21% 99.33%     500 301612
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>   14  0.07% 99.39%     167 301601
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>   16  0.05% 99.50%     114 301632
> org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>   18  0.03% 99.57%      76 301624
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>   20  0.03% 99.62%      68 301615
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>   21  0.03% 99.65%      62 301635
> org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>   22  0.02% 99.66%      41 301664
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>   23  0.01% 99.68%      31 301629 org.apache.lucene.util.FixedBitSet.<init>
> CPU SAMPLES END
>
> Solr 5.2.1:
> CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
> rank   self  accum   count trace method
>    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 25.69% 77.07%   60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
>    3 10.59% 87.66%   24923 301369
> org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>    4  2.20% 89.86%    5182 301414
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>    5  2.01% 91.87%    4742 301384
> org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>    6  1.25% 93.12%    2944 301434
> java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>    7  1.11% 94.23%    2612 301367
> org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>   10  0.78% 96.87%    1825 301449
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>   11  0.73% 97.60%    1717 301378
> org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>   13  0.33% 98.66%     787 301387
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>   14  0.16% 98.82%     374 301426
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>   16  0.09% 99.02%     219 301381
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>   20  0.06% 99.30%     151 301441 org.apache.lucene.store.DataInput.readVInt
>   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>   22  0.06% 99.42%     140 301375
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>   24  0.03% 99.50%      76 301423
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>   25  0.03% 99.53%      74 301454
> org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>   26  0.03% 99.56%      65 301432
> org.apache.lucene.util.BitDocIdSet$Builder.or
>   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
>   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>   29  0.02% 99.63%      50 301464
> org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>   30  0.02% 99.64%      39 301438
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>   31  0.02% 99.66%      37 301465
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>   32  0.02% 99.67%      36 301419
> org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
> CPU SAMPLES END
>
> On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com> wrote:
>>
>> Hi Tomás,
>>
>> I suspect this might be related to LUCENE-5938. We changed the default
>> rewrite method for multi-term queries to load documents into a sparse
>> bit set first, and only upgrade to a dense bit set when we know
>> many documents match. When there are lots of terms to intersect, we
>> end up spending significant CPU time to build the sparse bit set to
>> eventually upgrade to a dense bit set like before. This might be what
>> you are seeing.
>>
>> You might see the issue less with the population field because it has
>> fewer unique values, so postings lists are longer and the DocIdSet
>> building logic can upgrade quicker to a dense bit set.
>>
>> Mike noticed this slowness when working on BKD trees and we changed
>> this first phase to use a plain int[] array that we sort and
>> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
>> which seemed to make things faster. Would it be possible for you to
>> also check a 5.3 snapshot?
>>
>>
>>
>>
>> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>> <to...@gmail.com> wrote:
>> > Hi, I'm seeing some query performance degradation between 4.10.4 and
>> > 5.2.1.
>> > It doesn't happen with all the queries, but for queries like range
>> > queries
>> > on fields with many different values the average time in 5.2.1 is worse
>> > than
>> > in 4.10.4. Is anyone seeing something similar?
>> >
>> > Test Details:
>> > * Single thread running queries continuously. I ran the test twice for
>> > each
>> > Solr version.
>> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
>> > instance
>> > with all the defaults but with 5G heap. Index in local disk (SSD)
>> > * Plain Solr releases, nothing custom. Single Solr core, not in
>> > SolrCloud
>> > mode, no distributed search.
>> > * "allCountries" geonames dataset (~8M small docs). No updates during
>> > the
>> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr
>> > 4.10.4
>> > (fits entirely in RAM)
>> > * jdk1.8.0_45
>> >
>> > Queries: 3k boolean queries (generated with terms from the dataset) with
>> > range queries as filters on "tlongitude" and "tlatitude" fields with
>> > randomly generated bounds, e.g.
>> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>> >
>> > Fields are:
>> > <field name="tlatitude" type="tdouble"/>
>> > <field name="tlongitude" type="tdouble"/>
>> > Field Type:
>> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
>> > positionIncrementGap="0"/>
>> >
>> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
>> > average.
>> >
>> > http://snag.gy/2yPPM.jpg
>> >
>> > Doing only the boolean queries shows no performance difference between
>> > 4.10
>> > and 5.2, same thing if I do filters on a string field instead of the
>> > range
>> > queries.
>> >
>> > When using "double" field type (precisionStep="0"), the difference was
>> > bigger:
>> >
>> > longitude/latitude fields:
>> > <field name="longitude" type="double" docValues="true"/>
>> > <field name="latitude" type="double" docValues="true"/>
>> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
>> > positionIncrementGap="0"/>
>> >
>> > http://snag.gy/Vi5uk.jpg
>> > I understand this is not the best field type definition for range
>> > queries,
>> > I'm just trying to understand the difference between the two versions
>> > and
>> > why.
>> >
>> > Performance was OK when doing range queries on the "population" field
>> > (long), but that field doesn't have many different values, only 300k out
>> > of
>> > the 8.3M docs have the population field with a value different to 0. On
>> > the
>> > other hand, doing range queries on the _version_ field did show a graph
>> > similar to the previous one:
>> >
>> > <field name="_version_" type="long" indexed="true" stored="true"/>
>> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>> > positionIncrementGap="0"/>
>> >
>> > http://snag.gy/4tc7e.jpg
>> >
>> > Any idea what could be causing this? Is this expected after some known
>> > change?
>> >
>> > With Solr 4.10, a single CPU core remains high during the test (close to
>> > 100%), but with Solr 5.2, different cores go up and down in utilization
>> > continuously. That's probably because of the different Jetty version I
>> > suppose.
>> > GC pattern looks similar in both. For both Solr versions I'm using the
>> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>> >
>>
>>
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>



-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Thanks Adrien,
I'll run the tests with a 5.3 snapshot and post the results here. In case
this helps, this is the hprof samples output
(-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt) for
4.10.4 and 5.2.1 in my test:

Solr 4.10.4:
CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
rank   self  accum   count trace method
   1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
   2  4.27% 79.34%   10408 301576
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
   3  4.15% 83.49%   10108 301585
org.apache.lucene.index.FilteredTermsEnum.docs
   4  3.46% 86.95%    8419 301582
org.apache.lucene.index.FilteredTermsEnum.next
   5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
   6  1.99% 91.43%    4848 301599
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
   7  1.98% 93.42%    4824 301583
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
   8  1.57% 94.99%    3824 301589
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
   9  1.44% 96.42%    3504 301594
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
  10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
  11  0.98% 98.50%    2388 301598
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
  12  0.62% 99.12%    1522 301600 org.apache.lucene.store.DataInput.readVInt
  13  0.21% 99.33%     500 301612
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
  14  0.07% 99.39%     167 301601
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
  15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
  16  0.05% 99.50%     114 301632
org.apache.lucene.codecs.lucene41.ForUtil.readBlock
  17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
  18  0.03% 99.57%      76 301624
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
  19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
  20  0.03% 99.62%      68 301615
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
  21  0.03% 99.65%      62 301635
org.apache.solr.search.SolrIndexSearcher.getDocSetNC
  22  0.02% 99.66%      41 301664
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
  23  0.01% 99.68%      31 301629 org.apache.lucene.util.FixedBitSet.<init>
CPU SAMPLES END

Solr 5.2.1:
CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
rank   self  accum   count trace method
   1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
   2 25.69% 77.07%   60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
   3 10.59% 87.66%   24923 301369
org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
   4  2.20% 89.86%    5182 301414
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
   5  2.01% 91.87%    4742 301384
org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
   6  1.25% 93.12%    2944 301434
java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
   7  1.11% 94.23%    2612 301367
org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
   8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
   9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
  10  0.78% 96.87%    1825 301449
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
  11  0.73% 97.60%    1717 301378
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
  12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
  13  0.33% 98.66%     787 301387
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
  14  0.16% 98.82%     374 301426
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
  15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
  16  0.09% 99.02%     219 301381
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
  17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
  18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
  19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
  20  0.06% 99.30%     151 301441 org.apache.lucene.store.DataInput.readVInt
  21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
  22  0.06% 99.42%     140 301375
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
  23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
  24  0.03% 99.50%      76 301423
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
  25  0.03% 99.53%      74 301454
org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
  26  0.03% 99.56%      65 301432
org.apache.lucene.util.BitDocIdSet$Builder.or
  27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
  28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
  29  0.02% 99.63%      50 301464
org.apache.lucene.codecs.lucene50.ForUtil.readBlock
  30  0.02% 99.64%      39 301438
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
  31  0.02% 99.66%      37 301465
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
  32  0.02% 99.67%      36 301419
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
CPU SAMPLES END

On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Tomás,
>
> I suspect this might be related to LUCENE-5938. We changed the default
> rewrite method for multi-term queries to load documents into a sparse
> bit set first, and only upgrade to a dense bit set when we know
> many documents match. When there are lots of terms to intersect, we
> end up spending significant CPU time to build the sparse bit set to
> eventually upgrade to a dense bit set like before. This might be what
> you are seeing.
>
> You might see the issue less with the population field because it has
> fewer unique values, so postings lists are longer and the DocIdSet
> building logic can upgrade quicker to a dense bit set.
>
> Mike noticed this slowness when working on BKD trees and we changed
> this first phase to use a plain int[] array that we sort and
> deduplicate instead of a more fancy sparse bit set (LUCENE-6645),
> which seemed to make things faster. Would it be possible for you to
> also check a 5.3 snapshot?
>
>
>
>
> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
> <to...@gmail.com> wrote:
> > Hi, I'm seeing some query performance degradation between 4.10.4 and
> 5.2.1.
> > It doesn't happen with all the queries, but for queries like range
> queries
> > on fields with many different values the average time in 5.2.1 is worse
> than
> > in 4.10.4. Is anyone seeing something similar?
> >
> > Test Details:
> > * Single thread running queries continuously. I ran the test twice for
> each
> > Solr version.
> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
> instance
> > with all the defaults but with 5G heap. Index in local disk (SSD)
> > * Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud
> > mode, no distributed search.
> > * "allCountries" geonames dataset (~8M small docs). No updates during the
> > test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4
> > (fits entirely in RAM)
> > * jdk1.8.0_45
> >
> > Queries: 3k boolean queries (generated with terms from the dataset) with
> > range queries as filters on "tlongitude" and "tlatitude" fields with
> > randomly generated bounds, e.g.
> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
> >
> > Fields are:
> > <field name="tlatitude" type="tdouble"/>
> > <field name="tlongitude" type="tdouble"/>
> > Field Type:
> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
> > positionIncrementGap="0"/>
> >
> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
> > average.
> >
> > http://snag.gy/2yPPM.jpg
> >
> > Doing only the boolean queries shows no performance difference between
> 4.10
> > and 5.2, same thing if I do filters on a string field instead of the
> range
> > queries.
> >
> > When using "double" field type (precisionStep="0"), the difference was
> > bigger:
> >
> > longitude/latitude fields:
> > <field name="longitude" type="double" docValues="true"/>
> > <field name="latitude" type="double" docValues="true"/>
> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
> > positionIncrementGap="0"/>
> >
> > http://snag.gy/Vi5uk.jpg
> > I understand this is not the best field type definition for range
> queries,
> > I'm just trying to understand the difference between the two versions and
> > why.
> >
> > Performance was OK when doing range queries on the "population" field
> > (long), but that field doesn't have many different values, only 300k out
> of
> > the 8.3M docs have the population field with a value different to 0. On
> the
> > other hand, doing range queries on the _version_ field did show a graph
> > similar to the previous one:
> >
> > <field name="_version_" type="long" indexed="true" stored="true"/>
> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> > positionIncrementGap="0"/>
> >
> > http://snag.gy/4tc7e.jpg
> >
> > Any idea what could be causing this? Is this expected after some known
> > change?
> >
> > With Solr 4.10, a single CPU core remains high during the test (close to
> > 100%), but with Solr 5.2, different cores go up and down in utilization
> > continuously. That's probably because of the different Jetty version I
> > suppose.
> > GC pattern looks similar in both. For both Solr versions I'm using the
> > settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
> >
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>