Posted to users@solr.apache.org by Ron Haines <mi...@gmail.com> on 2023/06/14 17:25:59 UTC

Re: join query parser performance

FYI, I am finally getting back to this.  I apologize for the delay.

I am going to try the 'method=topLevelDV' option to see if that makes
a difference.  I will run the same tests used below and follow up with
results.



Some more details about this scenario:

   - The 'user query': some of them are quite simple edismax queries, e.g.
   q=Maricopa county ethel
   - From a content point of view, updates do not happen very
   frequently.  We typically get batches of updates spread out over the
   course of the day.
   - I am not quite sure what you are asking for with the 'collection
   definitions'.  The main collection is about 27 million docs, across 96
   shards, 2 replicas.  The fromIndex 'join' collection is quite small, about
   80k docs, single shard, but replicated across the 96 shards.
   - The table below shows the qtimes and response times, run both
   with and without the 'join', plus resultCount for reference.
   - It is a small test sample of 12 queries, run single-threaded.
      - Note that the qtimes, on average, for this small query set, increase
      about 40% with the join.

search_qtime   responseTime   search_qtime   responseTime   resultCount
(no join)      (no join)      (with join)    (with join)
------------   ------------   ------------   ------------   -----------
        1748           3179           2834           4292        471894
        1557           2865           1794           3108           332
         929           2278           1261           2654        541282
         813           2107           1036           2322         15347
         413           1730            539           1838            42
         388           1725            678           2027           313
        1095           2481           1453           2821        435627
         829           2263           1310           2739           299
         838           2103           1081           2358         86049
        1236           2610           1911           3283         77881
         950           2274           1313           2661         15160
         763           2066            885           2184           738
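
As a cross-check of the "about 40%" figure, here is a minimal Python sketch that recomputes the increase from the twelve qtime pairs listed above. The function names are made up for illustration, and the two ways of averaging (ratio of totals vs. mean of per-query ratios) are simply two reasonable readings of "on average"; the thread does not say which one was used.

```python
# (qtime without join, qtime with join), copied from the 12 test queries above.
QTIMES = [
    (1748, 2834), (1557, 1794), (929, 1261), (813, 1036),
    (413, 539), (388, 678), (1095, 1453), (829, 1310),
    (838, 1081), (1236, 1911), (950, 1313), (763, 885),
]


def aggregate_increase_pct(rows):
    """Percent increase of the summed with-join qtime over the summed no-join qtime."""
    base = sum(no_join for no_join, _ in rows)
    joined = sum(with_join for _, with_join in rows)
    return (joined / base - 1.0) * 100.0


def mean_per_query_increase_pct(rows):
    """Mean of the per-query percent increases (each query weighted equally)."""
    return sum(with_join / no_join - 1.0 for no_join, with_join in rows) / len(rows) * 100.0


if __name__ == "__main__":
    print(f"ratio of totals:    {aggregate_increase_pct(QTIMES):.1f}%")
    print(f"mean of per-query:  {mean_per_query_increase_pct(QTIMES):.1f}%")
```

Both readings land between 39% and 40% for this sample, which is consistent with the figure quoted above.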

What is most concerning is the CPU increase that we see in Solr.  Here is
a more 'concurrent' test, at about 12 qps, which is not a 'full'
load...maybe 50%.  This test 'held up', meaning we did not get into any
trouble.

I hope these images come through...but here is a CPU profile for a 1-hour
test with no 'join' being used:

[image: image.png]

And here is the same 1-hour test, using the 'join', run twice.  Note the
difference in the 'scale' of CPU for these 2 tests vs. the one above, from a
'cores' point of view:

[image: image.png]

Like I said, I'll run these same tests with 'method=topLevelDV' and
see if it changes the behavior.
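
For reference, a minimal sketch of how the two filter variants could be built for such an A/B test. The helper name `negative_join_fq` is hypothetical; the field and collection names come from the fq quoted in this thread. Swapping `score=none` for `method=topLevelDV` (rather than keeping both) is an assumption, as is the expectation that `topLevelDV` needs docValues on the `from` and `to` fields; both should be checked against the Solr Reference Guide for the version in use.

```python
from urllib.parse import urlencode


def negative_join_fq(local_params):
    """Build the negative join filter; ${q} is left for Solr to substitute."""
    body = " ".join(f"{key}={value}" for key, value in local_params.items())
    return "-{!join " + body + "}${q}"


# Names taken from the fq quoted in this thread.
BASE = {"fromIndex": "primary_rollup", "from": "group_id_mv", "to": "group_member_id"}

# The filter as currently used.
current_fq = negative_join_fq({**BASE, "score": "none"})

# Candidate for the follow-up test (assumption: method replaces score=none).
candidate_fq = negative_join_fq({**BASE, "method": "topLevelDV"})

# URL-encoded request parameters for one of the sample user queries.
params = urlencode({"defType": "edismax", "q": "Maricopa county ethel", "fq": candidate_fq})

if __name__ == "__main__":
    print(current_fq)
    print(candidate_fq)
    print(params)
```

Keeping everything else identical between the two runs means any change in qtime or CPU can be attributed to the join method alone.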

Thx

Ron Haines

On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Ron, how often are both indices updated? Presumably, if they are static,
> the filter cache may help.
> It's worth making sure that the app gives the filter cache a chance.
> To better understand the problem, it is worth taking a few thread dumps
> under load: a deep stack gives a clue to the hotspot (or just take a
> sampling profile). Once we know the hot spot we can think about a workaround.
> https://issues.apache.org/jira/browse/SOLR-16717 is about sharding
> "fromIndex".
> https://issues.apache.org/jira/browse/SOLR-16242 is about keeping the
> "local/to" index cache when fromIndex is updated.
>
> On Thu, May 25, 2023 at 5:01 PM Andy Lester <an...@petdance.com> wrote:
>
> >
> >
> > > On May 25, 2023, at 7:51 AM, Ron Haines <mi...@gmail.com> wrote:
> > >
> > > So, when this feature is enabled, this negative &fq gets added:
> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id
> > > score=none}${q}
> >
> >
> > Can we see collection definitions of both the source collection and the
> > join? Also, a sample query, not just the one parameter? Also, how often
> > are either of these collections updated? One thing that killed off an
> > entire project that we were doing was that the join table was getting
> > updated about once a minute, and this destroyed all our caching, and made
> > the queries we wanted to do unusable.
> >
> >
> > Thanks,
> > Andy
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Re: join query parser performance

Posted by Mikhail Khludnev <mk...@apache.org>.

They are just idling (IMHO). What you are looking for is a deep stack
containing SearchHandler, QueryComponent, and IndexSearcher lines.
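
A minimal Python sketch of that triage, assuming the dump has been saved as text with one block per thread, blocks separated by blank lines, and the thread name on the first line of each block (the admin console output may need light cleanup to match). The function names, the second thread in the sample, and its frames are hypothetical and only serve to show a match.

```python
# Class-name fragments that indicate a thread is inside Solr search handling.
MARKERS = ("SearchHandler", "QueryComponent", "IndexSearcher")


def split_threads(dump_text):
    """Split a text dump into (thread_name, frames) pairs, one per blank-line-separated block."""
    threads = []
    for block in dump_text.split("\n\n"):
        lines = [line.strip() for line in block.strip().splitlines()]
        if lines:
            threads.append((lines[0], lines[1:]))
    return threads


def busy_search_threads(dump_text, markers=MARKERS):
    """Return names of threads whose stack mentions any of the marker classes."""
    return [
        name
        for name, frames in split_threads(dump_text)
        if any(marker in frame for frame in frames for marker in markers)
    ]


# First block: the idle Jetty pool thread from this thread (abbreviated).
# Second block: a hypothetical busy thread, for illustration only.
SAMPLE = """\
qtp1278319954-13565 (13565)
- java.base@17.0.7/jdk.internal.misc.Unsafe.park(Native Method)
- org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)

qtp1278319954-99999 (99999)
- org.apache.solr.search.SolrIndexSearcher.search(...)
- org.apache.solr.handler.component.QueryComponent.process(...)
- org.apache.solr.handler.component.SearchHandler.handleRequestBody(...)
"""

if __name__ == "__main__":
    print(busy_search_threads(SAMPLE))
```

Threads parked in `idleJobPoll` are filtered out, so only stacks that are actually executing a search remain to be inspected for the hotspot.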

On Mon, Jun 19, 2023 at 2:53 PM Ron Haines <mi...@gmail.com> wrote:

> did get a thread dump, via admin console, while full load test is running:
>
> Seeing a lot of these....most in the 300k-900k ms range....yes 300-900
> seconds.  Any chance these are just idle threads, waiting for work?
>
> qtp1278319954-13565 (13565)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5a0f92c5
>
>    - java.base@17.0.7/jdk.internal.misc.Unsafe.park(Native Method)
>    - java.base@17.0.7/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
>    - java.base@17.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
>    - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
>    - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
>    - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
>    - java.base@17.0.7/java.lang.Thread.run(Unknown Source)


-- 
Sincerely yours
Mikhail Khludnev

Re: join query parser performance

Posted by Ron Haines <mi...@gmail.com>.

I did get a thread dump, via the admin console, while the full load test was
running.

I am seeing a lot of these, most in the 300k-900k ms range (yes, 300-900
seconds).  Any chance these are just idle threads, waiting for work?

qtp1278319954-13565 (13565)

java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5a0f92c5

   - java.base@17.0.7/jdk.internal.misc.Unsafe.park(Native Method)
   - java.base@17.0.7/java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
   - java.base@17.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
   - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
   - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
   - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
   - java.base@17.0.7/java.lang.Thread.run(Unknown Source)

Re: join query parser performance

Posted by Mikhail Khludnev <mk...@apache.org>.

Just for reference, this is how it can be solved with an index-time join.
If we put full copies of the group_member docs in as children, then we can
search for parent docs and remove the joined results,
 e.g. q={!v=$mainq} -{!v=$grpmembrz} -{!parent .. filters=$grpmembrz v=$mainq}
There are a few details, and you can see how cumbersome updates will be, but
if the docs are not really huge and the update rate is moderate, it might be
a way to go instead of a query-time join.

On Thu, Jun 15, 2023 at 3:08 PM Ron Haines <mi...@gmail.com> wrote:

> yes, we would return 'D'.
>
> So, are you asking why not just do the join in the main index?  I started
> that way, then realized that a document and the document it 'belongs' to
> both need to be on the same shard for the join to work.  That's when I moved
> to the 'fromIndex' approach and created the small 'fromIndex' collection
> (under 200k docs), single-sharded, replicated across all of the shards of
> the main collection.
>


-- 
Sincerely yours
Mikhail Khludnev

Re: join query parser performance

Posted by Ron Haines <mi...@gmail.com>.

yes, we would return 'D'.

So, are you asking why not just do the join in the main index?  I started
that way, then realized that a document and the document it 'belongs' to
both need to be on the same shard for the join to work.  That's when I moved
to the 'fromIndex' approach and created the small 'fromIndex' collection
(under 200k docs), single-sharded, replicated across all of the shards of
the main collection.

On Thu, Jun 15, 2023 at 5:57 AM Mikhail Khludnev <mk...@apache.org> wrote:

> Thanks for the clarification, Ron.
> Why is the membership extracted into a separate index?
> A join is heavy anyway, but running it cross-core is even heavier.
>
> The example you give is not really specific. I can implement it via
> fq=-group_member_id:*
>
> Let's extend it:
> doc#   group_id   group_member_id
> 1      A          C
> 2      B          -
> 3      C          -
> 4      D          *G*
> 5      E          B
> 6      F          -
> 7      G
>
> So, if a user runs a query that finds docs A,B,C,D,E,F (not G),
> should it return D?
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Re: join query parser performance

Posted by Mikhail Khludnev <mk...@apache.org>.

Thanks for the clarification, Ron.
Why is the membership extracted into a separate index?
A join is heavy anyway, but running it cross-core is even heavier.

The example you give is not really specific. I can implement it via
fq=-group_member_id:*

Let's extend it:
doc#   group_id   group_member_id
1      A          C
2      B          -
3      C          -
4      D          *G*
5      E          B
6      F          -
7      G

So, if a user runs a query that finds docs A,B,C,D,E,F (not G),
should it return D?
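
To make the difference between the two filters concrete, here is a small in-memory Python sketch of both on the original and the extended example. It assumes D should be returned, as Ron's reply in this thread states, and it treats a missing group_member_id for G as "no membership". It models only the intended semantics, not how Solr evaluates the join; the function and variable names are made up for illustration.

```python
# group_id -> set of group_ids the doc belongs to (a set, since a doc can
# belong to more than one group according to the thread).
ORIGINAL = {"A": {"C"}, "B": set(), "C": set(), "D": {"C"}, "E": {"B"}, "F": set()}
EXTENDED = {**ORIGINAL, "D": {"G"}, "G": set()}

# Docs found by the user query in both examples (G is not matched).
MATCHED = ["A", "B", "C", "D", "E", "F"]


def exclude_any_member(docs, matched):
    """Models fq=-group_member_id:* : drop every matched doc that has any membership."""
    return [doc for doc in matched if not docs[doc]]


def exclude_members_of_matched(docs, matched):
    """Models the negative join: drop a doc only if a group it belongs to also matched."""
    matched_ids = set(matched)
    return [doc for doc in matched if docs[doc].isdisjoint(matched_ids)]


if __name__ == "__main__":
    for label, docs in (("original", ORIGINAL), ("extended", EXTENDED)):
        print(label, exclude_any_member(docs, MATCHED), exclude_members_of_matched(docs, MATCHED))
```

On the original example both filters keep B, C and F. On the extended example only the join-style filter keeps D, because G, the group D belongs to, is not among the matched docs; that is the case the simple `fq=-group_member_id:*` cannot express.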


On Thu, Jun 15, 2023 at 6:01 AM Ron Haines <mi...@gmail.com> wrote:

> Adding more context as to why we are using the 'join'.
>
> We have a collection of documents where all documents have a 'group_id'
> (which is essentially the doc's id).  And some docs have a
> 'group_member_id' that indicates that the doc belongs to a 'group_id'.  For
> example:
>
> doc#   group_id   group_member_id
> 1      A          C
> 2      B          -
> 3      C          -
> 4      D          C
> 5      E          B
> 6      F          -
>
> So, if a user runs a query that finds docs A,B,C,D,E,F, we do not want to
> include any of the documents that belong to any of the group_id's.  So, for
> this search we really want a result count of 3 (docs B, C, F).
> We want to exclude:
> A because it belongs to C
> D because it belongs to C
> E because it belongs to B
>
> This negative 'join' &fq is how we are excluding these docs.  Note that a
> document can 'belong' to more than 1 document.  So, yes, it does affect the
> result count, if that was a question.
>
> Thanks for the suggestions.  I still have to run the test with
> 'method=topLevelDV', and I will pursue getting thread dumps.  Thx.  More to
> come....
>
> On Wed, Jun 14, 2023 at 4:26 PM Mikhail Khludnev <mk...@apache.org> wrote:
>
> > Note: images are stripped by the mailing list.
> > Well, if we apply a heavy operation (join), it's reasonable that it warms
> > the CPU.
> > It should impact the number of results. Does it?
> > Overall, the usage seems non-typical: the query looks like role-based
> > access control (or a group membership problem), but has dismax as a
> > sub-query. Can't docs be remodelled somehow in a more efficient manner?
> > It's worth understanding what keeps the CPU busy; usually a few thread
> > dumps under load give a useful clue.
> > Also, if the "to" side is huge and highly sharded, the "from" side is
> > small, and updates are rare, an index-time join via {!parent} may work
> > well. Caveat: it may be cumbersome.
> > PS: I suggested two JIRAs earlier; I don't think they are applicable
> > here.
> >
> > On Wed, Jun 14, 2023 at 8:26 PM Ron Haines <mi...@gmail.com> wrote:
> >
> > > Fyi, I am finally getting back to this.  I apologize for the delay.
> > >
> > >
> > >
> > > I am going to try using the ‘method=topLevelDV’ option to see if that
> > > makes a difference.  I will run same tests used below, and follow up
> with
> > > results.
> > >
> > >
> > >
> > > As far as more details about this scenario:
> > >
> > >    - Per the ‘user query’.  Some of them are quite simple, edismax,
> > >    q=Maricopa county ethel
> > >    - from a content point of view, updates are not happening very
> > >    frequently.  Typically get batches of updates spread out over the
> > course of
> > >    the day.
> > >    - not quite sure what you are asking for per the 'collection
> > >    definitions'.  The main collection is about 27 million docs, across
> 96
> > >    shards, 2 replicas. The fromIndex 'join' collection is quite
> > small...about
> > >    80k docs, single shard, but replicated across the 96 shards.
> > >    - in the table below are the qtimes, response times, run both
> > >    with/without using the ‘join’.  Also have resultCount, for
> reference.
> > >    - it is a small test sample iof 12 queries, single-threaded,
> > >       - Note, the qtimes…on average, for this small query set,
> increases
> > >       about 40% with the join
> > >
> > >
> > > search_qtime - no join
> > >
> > > responseTime - no join
> > >
> > > search_qtime - with join
> > >
> > > responseTime - with join
> > >
> > > resultCount
> > >
> > > 1748
> > >
> > > 3179
> > >
> > > 2834
> > >
> > > 4292
> > >
> > > 471894
> > >
> > > 1557
> > >
> > > 2865
> > >
> > > 1794
> > >
> > > 3108
> > >
> > > 332
> > >
> > > 929
> > >
> > > 2278
> > >
> > > 1261
> > >
> > > 2654
> > >
> > > 541282
> > >
> > > 813
> > >
> > > 2107
> > >
> > > 1036
> > >
> > > 2322
> > >
> > > 15347
> > >
> > > 413
> > >
> > > 1730
> > >
> > > 539
> > >
> > > 1838
> > >
> > > 42
> > >
> > > 388
> > >
> > > 1725
> > >
> > > 678
> > >
> > > 2027
> > >
> > > 313
> > >
> > > 1095
> > >
> > > 2481
> > >
> > > 1453
> > >
> > > 2821
> > >
> > > 435627
> > >
> > > 829
> > >
> > > 2263
> > >
> > > 1310
> > >
> > > 2739
> > >
> > > 299
> > >
> > > 838
> > >
> > > 2103
> > >
> > > 1081
> > >
> > > 2358
> > >
> > > 86049
> > >
> > > 1236
> > >
> > > 2610
> > >
> > > 1911
> > >
> > > 3283
> > >
> > > 77881
> > >
> > > 950
> > >
> > > 2274
> > >
> > > 1313
> > >
> > > 2661
> > >
> > > 15160
> > >
> > > 763
> > >
> > > 2066
> > >
> > > 885
> > >
> > > 2184
> > >
> > > 738
> > >
> > > What is most concerning is the cpu increase that we see in Solr.  Here is
> > > a more 'concurrent' test, at about 12 qps, but it is not at a 'full'
> > > load...maybe 50%.  This test 'held up', meaning we did not get into any
> > > trouble.
> > >
> > >
> > > Hope these images come thru...but, here is a cpu profile for a 1 hour
> > > test with no 'join' being used,
> > >
> > >
> > > [image: image.png]
> > >
> > > And, here is the same 1 hour test, using the 'join', run twice.  Note the
> > > difference in 'scale' of cpu of these 2 tests vs. the one above, from a
> > > 'cores' point of view:
> > > [image: image.png]
> > >
> > > Like I said, I'll run these same tests with the ‘method=topLevelDV’, and
> > > see if it changes behavior.
> > >
> > > Thx
> > >
> > > Ron Haines
> > >
> > > On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <mk...@apache.org> wrote:
> > >
> > >> Ron, how often are both indices updated? Presumably if they are static,
> > >> filter cache may help.
> > >> It's worth making sure that the app gives a chance to filter cache.
> > >> To better understand the problem it is worth taking a few thread dumps
> > >> under load: a deep stack gives a clue for hotspot (or just take a sampling
> > >> profile). Once we know the hot spot we can think about a workaround.
> > >> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
> > >> "fromIndex"
> > >> https://issues.apache.org/jira/browse/SOLR-16242 about keeping "local/to"
> > >> index cache when fromIndex is updated.
> > >>
> > >> On Thu, May 25, 2023 at 5:01 PM Andy Lester <an...@petdance.com> wrote:
> > >>
> > >> >
> > >> >
> > >> > > On May 25, 2023, at 7:51 AM, Ron Haines <mi...@gmail.com> wrote:
> > >> > >
> > >> > > So, when this feature is enabled, this negative &fq gets added:
> > >> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id
> > >> > > score=none}${q}
> > >> >
> > >> >
> > >> > Can we see collection definitions of both the source collection and the
> > >> > join? Also, a sample query, not just the one parameter? Also, how often
> > >> > are either of these collections updated? One thing that killed off an
> > >> > entire project that we were doing was that the join table was getting
> > >> > updated about once a minute, and this destroyed all our caching, and made
> > >> > the queries we wanted to do unusable.
> > >> >
> > >> >
> > >> > Thanks,
> > >> > Andy
> > >>
> > >>
> > >>
> > >> --
> > >> Sincerely yours
> > >> Mikhail Khludnev
> > >>
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev

Re: join query parser performance

Posted by Ron Haines <mi...@gmail.com>.

Adding more context as to why we are using the 'join'.

We have a collection of documents where all documents have a 'group_id'
(which is essentially the doc's id).  And, some docs have a
'group_member_id' that indicates if that doc belongs to a 'group_id'.  For
example:

doc#   group_id   group_member_id
1      A          C
2      B          -
3      C          -
4      D          C
5      E          B
6      F          -

So, if a user runs a query that finds docs A, B, C, D, E, F, we do not want
to include any document that belongs to one of the group_ids.  So, for
this search we really want a result count of 3 (docs B, C, F).
We want to exclude:
A because it belongs to C
D because it belongs to C
E because it belongs to B

This negative 'join' &fq is how we are excluding these docs.  Note that a
document can 'belong' to more than 1 document.  So, yes, it does affect the
result count, if that was a question.
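
To make the intended exclusion concrete, here is a toy Python model of the six-doc example above. It is only a sketch and not Solr code: the field name 'group_member_ids' is made up (a list, since a doc can belong to more than one doc), and it assumes the 'from' side of the join yields exactly the group_ids of the docs the user query matched.

```python
# Toy model of the six-document example; this is not Solr code.
# "group_member_ids" is a hypothetical list field, because a doc can
# belong to more than one other doc.
DOCS = [
    {"group_id": "A", "group_member_ids": ["C"]},
    {"group_id": "B", "group_member_ids": []},
    {"group_id": "C", "group_member_ids": []},
    {"group_id": "D", "group_member_ids": ["C"]},
    {"group_id": "E", "group_member_ids": ["B"]},
    {"group_id": "F", "group_member_ids": []},
]


def exclude_group_members(matched_docs):
    """Drop every matched doc that belongs to a group_id the query also matched."""
    # Assumed "from" side of the join: group ids of all docs the query matched.
    matched_group_ids = {doc["group_id"] for doc in matched_docs}
    # Negative filter: keep a doc only if none of its memberships point
    # at one of those group ids.
    return [
        doc["group_id"]
        for doc in matched_docs
        if not any(m in matched_group_ids for m in doc["group_member_ids"])
    ]


remaining = exclude_group_members(DOCS)
print(remaining)  # ['B', 'C', 'F']
```

A, D and E drop out because the doc they belong to (C or B) is also in the matched set, which leaves the result count of 3.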

Thanks for the suggestions.  I still have to run the test with the
'method=topLevelDV', and I will pursue getting thread dumps.  Thx.  More to
come....
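
For reference, this is roughly how the two variants of the request could be built for an A/B run. It is a sketch only: the helper name 'build_search_params' is made up, the fq is the one quoted earlier in the thread, and keeping 'score=none' next to 'method=topLevelDV' is an assumption that the test itself would have to confirm.

```python
from urllib.parse import urlencode


def build_search_params(user_query, join_method=None):
    """Return Solr request params with the negative join fq from this thread.

    join_method is optional; when given it is appended as the join parser's
    'method' local param (e.g. 'topLevelDV').
    """
    local_params = [
        "fromIndex=primary_rollup",
        "from=group_id_mv",
        "to=group_member_id",
        "score=none",  # kept from the original fq; an assumption for the new variant
    ]
    if join_method is not None:
        local_params.append("method=" + join_method)
    return {
        "defType": "edismax",
        "q": user_query,
        # ${q} is left for Solr to substitute with the user query.
        "fq": "-{!join " + " ".join(local_params) + "}${q}",
    }


baseline = build_search_params("Maricopa county ethel")
candidate = build_search_params("Maricopa county ethel", join_method="topLevelDV")

# URL-encoded query string for the variant with the explicit join method.
print(urlencode(candidate))
```

Running the same 12 queries with 'baseline' and 'candidate' keeps everything but the join method identical between the two runs.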

On Wed, Jun 14, 2023 at 4:26 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Note: images are shredded in the mailing list.
> Well, if we apply a heavy operation (join), it's reasonable that it warms
> up the CPU. It should impact the number of results. Does it?
> Overall, the usage seems non-typical: the query looks like role-based access
> control (or a group membership problem), but has dismax as a sub-query. Can't
> the docs be remodelled somehow in a more efficient manner?
> It's worth understanding what keeps the CPU busy; usually a few thread dumps
> under load give a useful clue.
> Also, if the "to" side is huge and highly sharded, the "from" side is small,
> and updates are rare, an index-time join via {!parent} may work well.
> Caveat: it may be cumbersome.
> PS: I suggested two JIRAs earlier; I don't think they are applicable here.

Re: join query parser performance

Posted by Mikhail Khludnev <mk...@apache.org>.

Note: images are shredded in the mailing list.
Well, if we apply a heavy operation (join), it's reasonable that it warms
up the CPU. It should impact the number of results. Does it?
Overall, the usage seems non-typical: the query looks like role-based access
control (or a group membership problem), but has dismax as a sub-query. Can't
the docs be remodelled somehow in a more efficient manner?
It's worth understanding what keeps the CPU busy; usually a few thread dumps
under load give a useful clue.
Also, if the "to" side is huge and highly sharded, the "from" side is small,
and updates are rare, an index-time join via {!parent} may work well.
Caveat: it may be cumbersome.
PS: I suggested two JIRAs earlier; I don't think they are applicable here.


-- 
Sincerely yours
Mikhail Khludnev