Posted to dev@solr.apache.org by Mikhail Khludnev <mk...@apache.org> on 2021/12/14 21:24:14 UTC

Inventory updates via join query and caches

Hello, Colleagues.
I want to discuss one frequent use case: inventory updates.
Let's say we can't reindex docs when inventory numbers are updated. We can put
the inventory in a separate index and apply fq={!join ..
fromIndex=inventory}left:(0 TO *]. Once it's cached in the main index filter
cache, it gets a good response time. We can even shard the main collection but
keep the inventory as a single shard. Ok.
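For concreteness, in SolrJ the request could look roughly like this (the
collection name, the join keys, and the "left" stock field are made up for
illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinFilterExample {
  public static void main(String[] args) throws Exception {
    // the big "products" core holds the docs we never reindex;
    // the small "inventory" core holds the stock counts
    try (HttpSolrClient products = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/products").build()) {
      SolrQuery q = new SolrQuery("*:*");
      // cross-core join: keep only products whose inventory row has at least one item left
      q.addFilterQuery("{!join fromIndex=inventory from=sku to=sku}left:[1 TO *]");
      QueryResponse rsp = products.query(q);
      System.out.println("in stock: " + rsp.getResults().getNumFound());
    }
  }
}
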
The sad moment occurs when a commit goes into the inventory core: after the
searcher is refreshed, there will be cache misses on those inventory queries,
and many of them hit the new inventory searcher at once. That's not good. I can
think of two workarounds:
 - relax {!join} equality regarding the fromIndex timestamp, so for some time
it serves outdated inventory, which is ok; then we need to somehow
evict, invalidate, or regenerate the inventory filter
 - a newSearcher listener in the inventory core can introspect the main core's
cache entries, find the {!join .. fromIndex=inventory}... entries, regenerate
them, and insert the results (roughly as in the sketch below).
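To make the second workaround concrete, the listener could look roughly like
this (class, core, and field names are made up; it would be registered as a
newSearcher listener in the inventory core's solrconfig.xml, and it simply
re-parses the join and asks the main core's searcher for its DocSet, so the
filterCache entry is repopulated before user queries miss):

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.core.AbstractSolrEventListener;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class InventoryJoinWarmer extends AbstractSolrEventListener {
  // hypothetical names: the main ("to") core and the join filter we want to keep warm
  private static final String MAIN_CORE = "products";
  private static final String JOIN_FQ =
      "{!join fromIndex=inventory from=sku to=sku}left:[1 TO *]";

  public InventoryJoinWarmer(SolrCore inventoryCore) {
    super(inventoryCore);
  }

  @Override
  public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
    // fired on the inventory core; recompute the join DocSet against the main core
    // so its filterCache entry is refreshed instead of being missed by live traffic
    try (SolrCore mainCore = getCore().getCoreContainer().getCore(MAIN_CORE)) {
      RefCounted<SolrIndexSearcher> searcherRef = mainCore.getSearcher();
      SolrQueryRequest req = new LocalSolrQueryRequest(mainCore, new ModifiableSolrParams());
      try {
        Query join = QParser.getParser(JOIN_FQ, req).getQuery();
        searcherRef.get().getDocSet(join); // computes the DocSet and caches it
      } catch (Exception e) {
        throw new RuntimeException(e);
      } finally {
        req.close();
        searcherRef.decref();
      }
    }
  }
}
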
I'm afraid to even think about the queryResult cache.

Is it worth having something like this in the Solr distro?

-- 
Sincerely yours
Mikhail Khludnev

Re: Inventory updates via join query and caches

Posted by Mikhail Khludnev <mk...@apache.org>.
Ok. It took a while to scratch some code together with a test:
https://issues.apache.org/jira/browse/SOLR-16242
https://github.com/apache/solr/pull/623 Please chime in!

On Tue, Feb 15, 2022 at 11:01 AM Mikhail Khludnev <mk...@apache.org> wrote:

> It turned out to be a little bit more optimistic after I moved the cache
> check into QueryWrapper.createWeight(Searcher, ...,... ). TBC.
>
> Joel,
> Regarding moving inventory into the main index, I'm afraid it requires
> frequent commits into the main index and impacts search latency.
>
> On Mon, Feb 14, 2022 at 12:45 AM Mikhail Khludnev <mk...@apache.org> wrote:
>
>> Hi, David and Joel.
>> It took a while. I kicked tires a little
>> https://github.com/apache/solr/pull/623
>> I introduced {!join cacheEventually=true} param. It yields false positive
>> JoinQueries (ignores fromCore timestamp), and backed on docsets reside in
>> the user cache.
>> Cache listener doesn't suit for this purpose - fresh "from" searcher
>> isn't available for refreshing queries. So, I made it work with special
>> update processor which registered at inventory ("from") core and refreshes
>> user cache of "to" searcher with regenerator.warm.
>> You know, it's even work passing a simple test.
>> Here's the bummer q=*:*&fq={!join cacheEventually=true fromCore=inventory
>> ..}.. if it's cached in query result cache, and commit into main index
>> starts to warm query result cache with a new "to" searcher, and it picks up
>> old searcher doc set. Boom. Presumably it can worked around by
>> q={!cache=false}... or disabling query result cache, but it seems not so
>> elegant, as I thought.
>>
>> On Mon, Dec 20, 2021 at 4:09 AM Joel Bernstein <jo...@gmail.com>
>> wrote:
>>
>>> The second approach (newSearcher listener) is a nice approach if the
>>> filter cache is too full to rely on auto-warming.
>>> Static warming queries fail on cross core joins but I believe succeed on
>>> a self core join. So you could move the inventory into the same core and
>>> use a static warming query. The downside to this is the pollution of the
>>> main index with ever changing inventory segments.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>>
>>> On Sun, Dec 19, 2021 at 6:10 PM David Smiley <ds...@apache.org> wrote:
>>>
>>>> I'm not sure there is a clean/simple solution to this specific
>>>> problem.  But I could imagine a more general & simple feature that could
>>>> solve this scenario, with just a bit more work by the user.
>>>>
>>>> Imagine an optional cache-key on ExtendedQuery auto-parsed, perhaps
>>>> with local-param "cacheKey".  It would wrap any Query with one having a
>>>> special equals & hashcode on this key.  Solr wouldn't parse the string for
>>>> this query so long as it can look it up in a special cache of these.  That
>>>> special cache would be Map<String,Query> with weak values such that if it's
>>>> not used anymore (e.g. not in the filter cache), it would be GC'ed.  This
>>>> would be useful for expensive queries that might resolve from some
>>>> network location (e.g. access control filters that refer to data in
>>>> who-knows-where).  So that's useful on its own but doesn't solve your
>>>> conundrum.  Then, imagine some new request handler that allows you to
>>>> provide this key & query and have it perform a filter cache save,
>>>> overwriting whatever entry that may have been there.  You could even do
>>>> this in a newSearcher event on the inventory core, calling into the primary
>>>> product core.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <mk...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello, Colleagues.
>>>>> I want to discuss one frequent usecase: inventory updates.
>>>>> Let's say we can't reindex docs when inventory numbers updated. We can
>>>>> put inventory in separate index, and apply fq={!join ..
>>>>> fromIndex=inventory}left:(0 TO *]. Once it's cached in main index filter
>>>>> cache it gets a good response time. We can even shard main collection, but
>>>>> keep inventory single shard. Ok.
>>>>> The sad moment occurs when commit goes into inventory core, after
>>>>> searcher is refreshed it's going to be cache misses on those inventory
>>>>> queries, and many of them go into new inventory searcher. That's not good.
>>>>> I can think of two workarounds:
>>>>>  - relax {!join} equality regarding fromIndex timestamp, so for some
>>>>> time it will be outdated inventory, but it's ok. And then we need to
>>>>> somehow, evict, invalidate, regenerate inventory filter
>>>>>  - newSearcher listener in inventory core can introspect main core
>>>>> cache entries find {!join .. fromIndex=inventory}... regenerate and insert
>>>>> results.
>>>>> I'm afraid to think about queryResult cache.
>>>>>
>>>>> Is it worth to have something like this in Solr distro?
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>>
>>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Inventory updates via join query and caches

Posted by Mikhail Khludnev <mk...@apache.org>.
Things turned out to be a bit more optimistic after I moved the cache
check into QueryWrapper.createWeight(Searcher, ..., ...). TBC.

Joel,
Regarding moving the inventory into the main index: I'm afraid it would require
frequent commits into the main index and would impact search latency.

On Mon, Feb 14, 2022 at 12:45 AM Mikhail Khludnev <mk...@apache.org> wrote:

> Hi, David and Joel.
> It took a while. I kicked tires a little
> https://github.com/apache/solr/pull/623
> I introduced {!join cacheEventually=true} param. It yields false positive
> JoinQueries (ignores fromCore timestamp), and backed on docsets reside in
> the user cache.
> Cache listener doesn't suit for this purpose - fresh "from" searcher isn't
> available for refreshing queries. So, I made it work with special update
> processor which registered at inventory ("from") core and refreshes user
> cache of "to" searcher with regenerator.warm.
> You know, it's even work passing a simple test.
> Here's the bummer q=*:*&fq={!join cacheEventually=true fromCore=inventory
> ..}.. if it's cached in query result cache, and commit into main index
> starts to warm query result cache with a new "to" searcher, and it picks up
> old searcher doc set. Boom. Presumably it can worked around by
> q={!cache=false}... or disabling query result cache, but it seems not so
> elegant, as I thought.
>
> On Mon, Dec 20, 2021 at 4:09 AM Joel Bernstein <jo...@gmail.com> wrote:
>
>> The second approach (newSearcher listener) is a nice approach if the
>> filter cache is too full to rely on auto-warming.
>> Static warming queries fail on cross core joins but I believe succeed on
>> a self core join. So you could move the inventory into the same core and
>> use a static warming query. The downside to this is the pollution of the
>> main index with ever changing inventory segments.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Sun, Dec 19, 2021 at 6:10 PM David Smiley <ds...@apache.org> wrote:
>>
>>> I'm not sure there is a clean/simple solution to this specific problem.
>>> But I could imagine a more general & simple feature that could solve this
>>> scenario, with just a bit more work by the user.
>>>
>>> Imagine an optional cache-key on ExtendedQuery auto-parsed, perhaps with
>>> local-param "cacheKey".  It would wrap any Query with one having a special
>>> equals & hashcode on this key.  Solr wouldn't parse the string for
>>> this query so long as it can look it up in a special cache of these.  That
>>> special cache would be Map<String,Query> with weak values such that if it's
>>> not used anymore (e.g. not in the filter cache), it would be GC'ed.  This
>>> would be useful for expensive queries that might resolve from some
>>> network location (e.g. access control filters that refer to data in
>>> who-knows-where).  So that's useful on its own but doesn't solve your
>>> conundrum.  Then, imagine some new request handler that allows you to
>>> provide this key & query and have it perform a filter cache save,
>>> overwriting whatever entry that may have been there.  You could even do
>>> this in a newSearcher event on the inventory core, calling into the primary
>>> product core.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <mk...@apache.org>
>>> wrote:
>>>
>>>> Hello, Colleagues.
>>>> I want to discuss one frequent usecase: inventory updates.
>>>> Let's say we can't reindex docs when inventory numbers updated. We can
>>>> put inventory in separate index, and apply fq={!join ..
>>>> fromIndex=inventory}left:(0 TO *]. Once it's cached in main index filter
>>>> cache it gets a good response time. We can even shard main collection, but
>>>> keep inventory single shard. Ok.
>>>> The sad moment occurs when commit goes into inventory core, after
>>>> searcher is refreshed it's going to be cache misses on those inventory
>>>> queries, and many of them go into new inventory searcher. That's not good.
>>>> I can think of two workarounds:
>>>>  - relax {!join} equality regarding fromIndex timestamp, so for some
>>>> time it will be outdated inventory, but it's ok. And then we need to
>>>> somehow, evict, invalidate, regenerate inventory filter
>>>>  - newSearcher listener in inventory core can introspect main core
>>>> cache entries find {!join .. fromIndex=inventory}... regenerate and insert
>>>> results.
>>>> I'm afraid to think about queryResult cache.
>>>>
>>>> Is it worth to have something like this in Solr distro?
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>>
>>>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Inventory updates via join query and caches

Posted by Mikhail Khludnev <mk...@apache.org>.
Hi, David and Joel.
It took a while, but I kicked the tires a little:
https://github.com/apache/solr/pull/623
I introduced a {!join cacheEventually=true} param. It yields false-positive
JoinQueries (it ignores the fromCore timestamp) and is backed by docsets
residing in the user cache.
A cache listener doesn't suit this purpose - the fresh "from" searcher isn't
available for refreshing queries. So I made it work with a special update
processor which is registered on the inventory ("from") core and refreshes the
user cache of the "to" searcher with regenerator.warm, roughly as sketched below.
You know, it even works, passing a simple test.
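The shape of it is something like this (the real code is in the PR above; the
core and cache names here are made up, and it assumes the "to" core declares a
user cache with a regenerator and a non-zero autowarmCount):

import java.io.IOException;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.apache.solr.util.RefCounted;

public class RefreshJoinCacheProcessorFactory extends UpdateRequestProcessorFactory {
  private static final String TO_CORE = "products";       // the main ("to") core, made up
  private static final String USER_CACHE = "joinDocSets"; // user cache holding the join docsets, made up

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processCommit(CommitUpdateCommand cmd) throws IOException {
        super.processCommit(cmd); // let the commit on the inventory ("from") core happen first
        try (SolrCore toCore = req.getCore().getCoreContainer().getCore(TO_CORE)) {
          RefCounted<SolrIndexSearcher> ref = toCore.getSearcher();
          try {
            SolrIndexSearcher toSearcher = ref.get();
            SolrCache cache = toSearcher.getCache(USER_CACHE);
            if (cache != null) {
              // re-runs the configured regenerator over the existing keys,
              // replacing the stale docsets in place
              cache.warm(toSearcher, cache);
            }
          } finally {
            ref.decref();
          }
        }
      }
    };
  }
}
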
Here's the bummer: q=*:*&fq={!join cacheEventually=true fromCore=inventory
..}.. If it's cached in the query result cache, and a commit into the main index
starts to warm the query result cache with a new "to" searcher, it picks up
the old searcher's doc set. Boom. Presumably it can be worked around with
q={!cache=false}... or by disabling the query result cache, but that seems less
elegant than I'd thought.

On Mon, Dec 20, 2021 at 4:09 AM Joel Bernstein <jo...@gmail.com> wrote:

> The second approach (newSearcher listener) is a nice approach if the
> filter cache is too full to rely on auto-warming.
> Static warming queries fail on cross core joins but I believe succeed on a
> self core join. So you could move the inventory into the same core and use
> a static warming query. The downside to this is the pollution of the main
> index with ever changing inventory segments.
>
>
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sun, Dec 19, 2021 at 6:10 PM David Smiley <ds...@apache.org> wrote:
>
>> I'm not sure there is a clean/simple solution to this specific problem.
>> But I could imagine a more general & simple feature that could solve this
>> scenario, with just a bit more work by the user.
>>
>> Imagine an optional cache-key on ExtendedQuery auto-parsed, perhaps with
>> local-param "cacheKey".  It would wrap any Query with one having a special
>> equals & hashcode on this key.  Solr wouldn't parse the string for
>> this query so long as it can look it up in a special cache of these.  That
>> special cache would be Map<String,Query> with weak values such that if it's
>> not used anymore (e.g. not in the filter cache), it would be GC'ed.  This
>> would be useful for expensive queries that might resolve from some
>> network location (e.g. access control filters that refer to data in
>> who-knows-where).  So that's useful on its own but doesn't solve your
>> conundrum.  Then, imagine some new request handler that allows you to
>> provide this key & query and have it perform a filter cache save,
>> overwriting whatever entry that may have been there.  You could even do
>> this in a newSearcher event on the inventory core, calling into the primary
>> product core.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <mk...@apache.org> wrote:
>>
>>> Hello, Colleagues.
>>> I want to discuss one frequent usecase: inventory updates.
>>> Let's say we can't reindex docs when inventory numbers updated. We can
>>> put inventory in separate index, and apply fq={!join ..
>>> fromIndex=inventory}left:(0 TO *]. Once it's cached in main index filter
>>> cache it gets a good response time. We can even shard main collection, but
>>> keep inventory single shard. Ok.
>>> The sad moment occurs when commit goes into inventory core, after
>>> searcher is refreshed it's going to be cache misses on those inventory
>>> queries, and many of them go into new inventory searcher. That's not good.
>>> I can think of two workarounds:
>>>  - relax {!join} equality regarding fromIndex timestamp, so for some
>>> time it will be outdated inventory, but it's ok. And then we need to
>>> somehow, evict, invalidate, regenerate inventory filter
>>>  - newSearcher listener in inventory core can introspect main core cache
>>> entries find {!join .. fromIndex=inventory}... regenerate and insert
>>> results.
>>> I'm afraid to think about queryResult cache.
>>>
>>> Is it worth to have something like this in Solr distro?
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>

-- 
Sincerely yours
Mikhail Khludnev

Re: Inventory updates via join query and caches

Posted by Joel Bernstein <jo...@gmail.com>.
The second approach (the newSearcher listener) is a nice one if the filter
cache is too full to rely on auto-warming.
Static warming queries fail on cross-core joins but, I believe, succeed on a
self-core join. So you could move the inventory into the same core and use
a static warming query. The downside is the pollution of the main
index with ever-changing inventory segments.

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Dec 19, 2021 at 6:10 PM David Smiley <ds...@apache.org> wrote:

> I'm not sure there is a clean/simple solution to this specific problem.
> But I could imagine a more general & simple feature that could solve this
> scenario, with just a bit more work by the user.
>
> Imagine an optional cache-key on ExtendedQuery auto-parsed, perhaps with
> local-param "cacheKey".  It would wrap any Query with one having a special
> equals & hashcode on this key.  Solr wouldn't parse the string for
> this query so long as it can look it up in a special cache of these.  That
> special cache would be Map<String,Query> with weak values such that if it's
> not used anymore (e.g. not in the filter cache), it would be GC'ed.  This
> would be useful for expensive queries that might resolve from some
> network location (e.g. access control filters that refer to data in
> who-knows-where).  So that's useful on its own but doesn't solve your
> conundrum.  Then, imagine some new request handler that allows you to
> provide this key & query and have it perform a filter cache save,
> overwriting whatever entry that may have been there.  You could even do
> this in a newSearcher event on the inventory core, calling into the primary
> product core.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <mk...@apache.org> wrote:
>
>> Hello, Colleagues.
>> I want to discuss one frequent usecase: inventory updates.
>> Let's say we can't reindex docs when inventory numbers updated. We can
>> put inventory in separate index, and apply fq={!join ..
>> fromIndex=inventory}left:(0 TO *]. Once it's cached in main index filter
>> cache it gets a good response time. We can even shard main collection, but
>> keep inventory single shard. Ok.
>> The sad moment occurs when commit goes into inventory core, after
>> searcher is refreshed it's going to be cache misses on those inventory
>> queries, and many of them go into new inventory searcher. That's not good.
>> I can think of two workarounds:
>>  - relax {!join} equality regarding fromIndex timestamp, so for some time
>> it will be outdated inventory, but it's ok. And then we need to somehow,
>> evict, invalidate, regenerate inventory filter
>>  - newSearcher listener in inventory core can introspect main core cache
>> entries find {!join .. fromIndex=inventory}... regenerate and insert
>> results.
>> I'm afraid to think about queryResult cache.
>>
>> Is it worth to have something like this in Solr distro?
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

Re: Inventory updates via join query and caches

Posted by David Smiley <ds...@apache.org>.
I'm not sure there is a clean/simple solution to this specific problem.
But I could imagine a more general & simple feature that could solve this
scenario, with just a bit more work by the user.

Imagine an optional cache key on ExtendedQuery, auto-parsed, perhaps with the
local-param "cacheKey".  It would wrap any Query with one having special
equals & hashCode based on this key.  Solr wouldn't parse the string for
this query so long as it can look it up in a special cache of these.  That
special cache would be a Map<String,Query> with weak values, such that if it's
not used anymore (e.g. not in the filter cache), it would be GC'ed.  This
would be useful for expensive queries that might resolve from some
network location (e.g. access control filters that refer to data in
who-knows-where).  So that's useful on its own but doesn't solve your
conundrum.  Then, imagine some new request handler that allows you to
provide this key & query and have it perform a filter cache save,
overwriting whatever entry may have been there.  You could even do
this in a newSearcher event on the inventory core, calling into the primary
product core.
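
Concretely, the wrapper might look something like this (purely hypothetical,
nothing like it exists in Solr today; matching comes from the delegate, while
cache identity comes from the key):

import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

public class CacheKeyQuery extends Query {
  private final String cacheKey;
  private final Query delegate;

  public CacheKeyQuery(String cacheKey, Query delegate) {
    this.cacheKey = Objects.requireNonNull(cacheKey);
    this.delegate = Objects.requireNonNull(delegate);
  }

  @Override
  public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
    // matching/scoring is entirely the delegate's; rewrite it first as the searcher normally would
    return searcher.rewrite(delegate).createWeight(searcher, scoreMode, boost);
  }

  @Override
  public void visit(QueryVisitor visitor) {
    delegate.visit(visitor);
  }

  @Override
  public String toString(String field) {
    return "cacheKey(" + cacheKey + ")";
  }

  // equality and hashing are on the key alone, so a later request (or a warming call)
  // with the same key hits the same filterCache entry without re-parsing the delegate
  @Override
  public boolean equals(Object other) {
    return other instanceof CacheKeyQuery && cacheKey.equals(((CacheKeyQuery) other).cacheKey);
  }

  @Override
  public int hashCode() {
    return cacheKey.hashCode();
  }
}

The weak-valued registry could then be as small as a Caffeine cache built with
weakValues(), keyed by the cacheKey string.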

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello, Colleagues.
> I want to discuss one frequent usecase: inventory updates.
> Let's say we can't reindex docs when inventory numbers updated. We can put
> inventory in separate index, and apply fq={!join ..
> fromIndex=inventory}left:(0 TO *]. Once it's cached in main index filter
> cache it gets a good response time. We can even shard main collection, but
> keep inventory single shard. Ok.
> The sad moment occurs when commit goes into inventory core, after searcher
> is refreshed it's going to be cache misses on those inventory queries, and
> many of them go into new inventory searcher. That's not good. I can think
> of two workarounds:
>  - relax {!join} equality regarding fromIndex timestamp, so for some time
> it will be outdated inventory, but it's ok. And then we need to somehow,
> evict, invalidate, regenerate inventory filter
>  - newSearcher listener in inventory core can introspect main core cache
> entries find {!join .. fromIndex=inventory}... regenerate and insert
> results.
> I'm afraid to think about queryResult cache.
>
> Is it worth to have something like this in Solr distro?
>
> --
> Sincerely yours
> Mikhail Khludnev
>