Posted to users@solr.apache.org by Jan Høydahl <ja...@cominvent.com> on 2023/06/01 07:51:20 UTC

Re: Atomic updates too slow in Solr 8 vs Solr 7

Yes, the numbers should be similar. Thanks for digging in. If there is a big regression, it could be worthwhile running the same test on v8.0, 8.1, 8.2 etc. to pinpoint the version in which the regression happened. Perhaps it could even be scripted :)

Solr unfortunately does not yet have a comprehensive official nightly benchmark run that would catch such regressions. But the community is working on it; there are some tests on http://mostly.cool maintained by Ishan, but I have no idea how to add new tests.
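
If someone wants to script that version sweep, here is a rough sketch of a SolrJ harness (one running Solr instance per version under test; the ports, core name and id scheme are my assumptions, not tested code):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RtgVersionSweep {
  public static void main(String[] args) throws Exception {
    // One running Solr instance per version under test (ports are assumptions)
    String[][] versions = {
        {"8.0", "http://localhost:8980/solr/techproducts"},
        {"8.1", "http://localhost:8981/solr/techproducts"},
        {"8.2", "http://localhost:8982/solr/techproducts"},
    };
    for (String[] v : versions) {
      try (HttpSolrClient client = new HttpSolrClient.Builder(v[1]).build()) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
          client.getById("doc-" + i); // issues one real-time get per id
        }
        System.out.printf("Solr %s: %d ms%n", v[0],
            (System.nanoTime() - start) / 1_000_000);
      }
    }
  }
}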

Jan

> On 31 May 2023, at 23:50, Rahul Goswami <ra...@gmail.com> wrote:
> 
> Sure, I can do that. Let me create an index with a few million docs, call
> RTG with a few million iterations on it, and note the times between 7.x and
> 8.x. I assume this should be sufficient?
> 
> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <ja...@cominvent.com> wrote:
> 
>> Would be nice to determine whether RTG is orders of magnitude slower in
>> 8.x than in 7.x, and whether it is the main culprit. Then we could isolate
>> the testing to RTG only and not involve atomic update?
>> 
>> Jan
>> 
>>> On 31 May 2023, at 21:33, Rahul Goswami <ra...@gmail.com> wrote:
>>> 
>>> I don’t have any nested documents. And the results are consistent across
>>> multiple runs. I tried looking for similar issues in the mailing list, but
>>> couldn’t find anything relevant. So if you do happen to find any JIRAs
>>> addressing it, that would be really helpful (thanks!).
>>> 
>>> To Jan’s question about RTG taking more time in Solr 8.x, I can say with
>>> good certainty that this is the case. Although it does look into the
>>> transaction logs first, thread dumps suggest that it is the next phase
>>> (when it doesn't find the doc in the tlog) which seems to be time-consuming.
>>> It tries to look up the document via the current searcher
>>> (searcher.getFirstMatch()). Proceeding further in the stack, it is this
>>> call where many threads are spending time:
>>> 
>>> 
>>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
>>> 
>>> Although this call is the same in 7.7.2 and 8.11.1, quite likely
>>> something changed in Lucene's FST.java which is causing the slowness. I am
>>> trying to dig further and might also ask folks on the Lucene mailing list.
>>> Thanks.
>>> 
>>> 
>>> 
>>> On Wed, May 31, 2023 at 11:36 AM Srijan <sh...@gmail.com> wrote:
>>> 
>>>> I would love some profiling as well. I know 8.8 or 8.9 had some
>>>> performance problems with atomic update, but this was later addressed. I
>>>> can't find the JIRA at the moment, though. Also, I am on 8.11.1 and atomic
>>>> update is not an issue for me.
>>>> 
>>>> By the way, do you happen to have nested docs?
>>>> 
>>>> 
>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <ja...@cominvent.com> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> MMap is most important for searching. Indexing bypasses the cache by
>>>>> using direct IO.
>>>>> 
>>>>> I have noticed slow real-time get on Solr 8.x during atomic update
>>>>> myself. It would be interesting to see a comparison with profiling. RTG
>>>>> gets the document from the transaction log, I believe? Could there be
>>>>> some RTG changes in 8.x that caused such a slowdown?
>>>>> 
>>>>> Jan Høydahl
>>>>> 
>>>>>> On 31 May 2023, at 16:57, Rahul Goswami <ra...@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for the response, Shawn. We are using Windows Server with pretty
>>>>>> huge indexes (multiple-TB cores). With MMap, I have observed that the
>>>>>> machine just completely freezes with high CPU and memory usage, to a
>>>>>> point where it becomes impossible to even connect to it. SimpleFS works
>>>>>> out well for us in this case.
>>>>>> 
>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes the
>>>>>> crawl in nearly 1/5th the time taken by Solr 8. Hence there should be
>>>>>> something OUTSIDE the directory factory in the code which is causing this.
>>>>>> 
>>>>>> Thanks,
>>>>>> Rahul
>>>>>> 
>>>>>> 
>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <ap...@elyograg.org> wrote:
>>>>>>> 
>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
>>>>>>>> Environment details:
>>>>>>>> - Java 11 on Windows Server
>>>>>>>> - Xms1536m Xmx3072m
>>>>>>>> - Indexing client code running 15 parallel threads indexing in batches of 1000
>>>>>>>> - using SimpleFSDirectoryFactory (since MMap doesn't quite work well on
>>>>>>>>   Windows for our index sizes, which commonly run north of 1 TB)
>>>>>>> 
>>>>>>> Don't change the directoryFactory.  You *WANT* Solr to use MMAP for your
>>>>>>> indexes.  Not using MMAP is likely to slow things down considerably.
>>>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java.  Which of
>>>>>>> course requires 64-bit hardware.
>>>>>>>
>>>>>>> 32-bit systems and software cannot properly deal with data larger than
>>>>>>> about 2GB.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>> 
>>>>> 
>>>> 
>> 
>> 


Re: Atomic updates too slow in Solr 8 vs Solr 7

Posted by Rahul Goswami <ra...@gmail.com>.
Sorry to have dropped the ball on this for the past couple of days. I created
the JIRA with the information shared on this mail thread:

https://issues.apache.org/jira/browse/SOLR-16838

Running further benchmarks reveals that the slowness is in the
searcher.getFirstMatch() call inside getInputDocument().

*Solr 7.7.2*
2023-06-01 21:17:34.492 WARN  (qtp1034094674-41) [   x:techproducts]
o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime:
508053 ; *searcherFetchTime: 3229011 *

*Solr 8*
2023-06-01 20:43:31.767 WARN  (qtp391506011-56) [   x:techproducts]
o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime:
410873 ; *searcherFetchTime: 33296008 *
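
For reference, a minimal sketch of how such per-phase timing could be
collected (hypothetical class and field names; not the actual SOLR-16838 patch):

import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Accumulates nanoseconds spent in each RTG phase across all requests.
public class RtgTimingStats {
  static final AtomicLong tlogFetchTime = new AtomicLong();
  static final AtomicLong searcherFetchTime = new AtomicLong();

  // Wraps one lookup (the tlog fetch, or the searcher.getFirstMatch() call
  // inside getInputDocument()) and adds its elapsed time to the counter.
  static <T> T timed(AtomicLong counter, Supplier<T> op) {
    long start = System.nanoTime();
    try {
      return op.get();
    } finally {
      counter.addAndGet(System.nanoTime() - start);
    }
  }
}

Wrapping the two lookups this way yields cumulative totals like the ones
logged above.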

Starting a discussion on the Lucene list as well.

Thanks,
Rahul


Re: Atomic updates too slow in Solr 8 vs Solr 7

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
> Solr unfortunately does not yet have a comprehensive official nightly
> benchmark run that would catch such regressions. But the community is
> working on it; there are some tests on http://mostly.cool maintained by
> Ishan, but I have no idea how to add new tests.

I'll publish steps to add new tests soon.

For now, check out solr-bench from GitHub.com/fullstorydev/solr-bench

Run an existing test as follows:

mvn clean compile assembly:single

./stress.sh -c <commitid> suites/cluster-test.json

Results will be written out in suites/results/.



Re: Atomic updates too slow in Solr 8 vs Solr 7

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

The next step, I believe, is for you to open a Solr JIRA issue for this
regression, as it is clearly a genuine one.
Then we can use that JIRA to coordinate, even if the fix ends up being in
Lucene code.

Once we have a Solr JIRA, it may make sense to open a mail thread on the users@lucene list to get some Lucene expertise involved in digging.

Jan


Re: Atomic updates too slow in Solr 8 vs Solr 7

Posted by Rahul Goswami <ra...@gmail.com>.
So I ran the test to index 5 million docs this time (batches of 1000 docs
in 15 parallel threads). To eliminate the network overhead and get as
accurate a benchmark as possible, I used an AtomicLong to measure the time
around the RTG call in DistributedUpdateProcessor across all calls (
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416).
I did this for both Solr 7.7.2 and Solr 8.11.1, and built the solr-core.jar
to replace it in the Solr webapp lib.
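
For anyone reproducing this, here is a rough sketch of the kind of indexing
client described above (SolrJ; the core URL, id scheme and field name are
my assumptions):

import java.util.*;
import java.util.concurrent.*;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateBench {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(15);
    try (HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/techproducts").build()) {
      for (int t = 0; t < 15; t++) {
        final int thread = t;
        pool.submit(() -> {
          List<SolrInputDocument> batch = new ArrayList<>(1000);
          for (int i = 0; i < 5_000_000 / 15; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("id", "doc-" + thread + "-" + i);
            // The "set" modifier makes this an atomic update, which triggers
            // an RTG for every document on the server side.
            doc.setField("popularity", Collections.singletonMap("set", i));
            batch.add(doc);
            if (batch.size() == 1000) {
              client.add(batch);
              batch.clear();
            }
          }
          if (!batch.isEmpty()) client.add(batch); // flush the last partial batch
          return null;
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.DAYS);
      client.commit();
    }
  }
}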

RTG in Solr 8.x is ~10x slower. Here are the numbers (times are in
milliseconds):

*Solr 7.7.2 *: 2023-06-01 15:39:48.272 WARN  (qtp1034094674-24) [
x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg time:7293486*
*Solr 8.11.1: *2023-06-01 04:46:24.758 WARN  (qtp391506011-71) [
x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg
time:72029877*

Let me know if I can help further with debugging. At this point I suspect
the slowness is on Lucene's side, in the findTargetArc() method in FST.java.
I am running more analysis on my side in parallel and will share if/when I
find more.
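
To isolate that, a microbenchmark along these lines could be compiled once
against Lucene 7.7.2 and once against 8.11.1 and pointed at a copy of the
same index (a sketch; the "id" field and id values are assumptions):

import java.nio.file.Paths;
import org.apache.lucene.index.*;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SeekExactBench {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(Paths.get(args[0])))) {
      long found = 0, start = System.nanoTime();
      for (int i = 0; i < 1_000_000; i++) {
        BytesRef id = new BytesRef("doc-" + i);
        // Mirrors what searcher.getFirstMatch() does per segment: seekExact()
        // goes through SegmentTermsEnum and FST.findTargetArc().
        for (LeafReaderContext ctx : reader.leaves()) {
          Terms terms = ctx.reader().terms("id");
          if (terms == null) continue;
          if (terms.iterator().seekExact(id)) { found++; break; }
        }
      }
      System.out.printf("found=%d in %d ms%n", found,
          (System.nanoTime() - start) / 1_000_000);
    }
  }
}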

-Rahul