Posted to solr-user@lucene.apache.org by ramyogi <ra...@gmail.com> on 2020/07/01 19:51:09 UTC
Re: Suggestion or recommendation for NRT
Even though the same documents are indexed over and over again by incremental
updates, the index size keeps increasing.
Am I missing some configuration that would make optimization happen internally?
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Suggestion or recommendation for NRT
Posted by ramyogi <ra...@gmail.com>.
Hi Team, any suggestions or recommendations for the approach described above
to get better search performance?
Re: Suggestion or recommendation for NRT
Posted by ramyogi <ra...@gmail.com>.
Thanks a lot for taking the time to answer my questions.
We have two environments, ENV A and ENV B. Both run on the same instance type
(r5.2xlarge, so the same RAM), with the same number of shards and the same
replica type (NRT) for the collection.
ENV A has a collection that is optimized (segment count 1 and
numDocs = maxDoc). It is used only for search requests; no delta updates are
triggered.
ENV B has the same collection copied from ENV A, with continuous delta
updates in progress, so it serves both indexing and search requests. Indexing
uses the Kafka Connect plugin, which uses SolrJ with
solr.commit.within=300000 (milliseconds, i.e. 5 minutes).
We are comparing search performance between the two environments with an
automated test that runs a batch of queries.
Regarding search warm-up, this is our query configuration:
<query>
  <maxBooleanClauses>10000</maxBooleanClauses>
  <filterCache class="solr.FastLRUCache"
               size="10120"
               initialSize="4192"
               autowarmCount="0"/>
  <cache name="perSegFilter"
         class="solr.search.LRUCache"
         size="10"
         initialSize="0"
         autowarmCount="10"
         regenerator="solr.NoOpRegenerator"/>
  <fieldValueCache class="solr.FastLRUCache"
                   size="4096"
                   autowarmCount="1024"
                   showItems="32"/>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <queryResultWindowSize>20</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
      </lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
      </lst>
    </arr>
  </listener>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>24</maxWarmingSearchers>
</query>
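For comparison, a rough server-side counterpart to the solr.commit.within=300000 setting above is sketched below as a solrconfig.xml fragment. It is an illustration under assumptions, not a recommendation from this thread: the 60000 ms hard-commit interval is an arbitrary example value, and autoSoftCommit applies to every update the core receives rather than only to requests that carry commitWithin. Each soft commit opens a new searcher, so the newSearcher listener above runs again and, with autowarmCount="0" on the filterCache, that cache starts out empty.

```xml
<!-- Fragment of solrconfig.xml; the interval values are illustrative. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes segments to disk and starts a new transaction log.
       openSearcher=false means searches do not see the new documents yet. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: opens a new searcher so recent updates become visible,
       no later than 300000 ms (5 minutes) after an update arrives. -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```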
Re: Suggestion or recommendation for NRT
Posted by Erick Erickson <er...@gmail.com>.
That seems high, but it can be tricky to get meaningful performance
tests. Are you running with some kind of test runner? Do you have, say,
3-4 thousand queries you run? Are you running the tests after warming
the searchers?
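On the warming question: the newSearcher listener in the configuration above only runs q=*:* with facet=true. The sketch below warms with queries shaped more like test traffic; the field names category, title and price and their values are hypothetical placeholders, to be replaced with ones from the real queries. Because a newSearcher warm-up runs while the previous searcher is still serving requests, filter queries and sorts in it populate the filterCache entries and load the sort data before the new searcher takes over.

```xml
<!-- Goes inside <query> in solrconfig.xml, replacing the *:* warm-up.
     Field names and values are hypothetical; use ones from real queries. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">category:books</str>
      <str name="sort">price asc</str>
      <str name="rows">20</str>
    </lst>
    <lst>
      <str name="q">title:solr</str>
      <str name="fq">category:electronics</str>
    </lst>
  </arr>
</listener>
```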
Also, if you optimized down to one segment, _then_ tried
adding docs and measuring, you are not getting accurate results.
See: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
Best,
Erick
> On Jul 1, 2020, at 5:55 PM, ramyogi <ra...@gmail.com> wrote:
>
> Thanks Erick for the details and the reference; they helped me understand
> segment merging better.
> When I compare an uninterrupted, optimized collection (segment count 1)
> serving only search requests with a collection doing indexing and search in
> parallel, the second is much slower: for example, the first responds in
> about 100 ms on average, the second in around 400 ms.
>
> Is that expected behaviour, i.e. a trade-off we have to accept if we index
> and search in the same collection in parallel? Or can we still tune some
> parameters for better performance? If so, please suggest some.
Re: Suggestion or recommendation for NRT
Posted by ramyogi <ra...@gmail.com>.
Thanks Erick for the details and the reference; they helped me understand
segment merging better.
When I compare an uninterrupted, optimized collection (segment count 1)
serving only search requests with a collection doing indexing and search in
parallel, the second is much slower: for example, the first responds in
about 100 ms on average, the second in around 400 ms.
Is that expected behaviour, i.e. a trade-off we have to accept if we index
and search in the same collection in parallel? Or can we still tune some
parameters for better performance? If so, please suggest some.
Re: Suggestion or recommendation for NRT
Posted by Erick Erickson <er...@gmail.com>.
Updated documents are marked as deleted in the
old segment and added to a new segment. Merges
can happen when commits occur, and only then is the
space occupied by deleted documents reclaimed.
Which segments are merged on commit depends
on a number of factors.
Unless you can prove the extra space is a problem,
you should just ignore the issue. The percentage of
deleted documents should max out at around 33%
on Solr 7.5+.
For background on merging, see:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
The third animation (TieredMergePolicy) is the default.
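For reference, a sketch of the merge-policy section of solrconfig.xml that spells out the default explicitly, assuming Solr 7.5 or later, where TieredMergePolicyFactory passes named parameters to the matching TieredMergePolicy setters. deletesPctAllowed is the setting behind the roughly 33% ceiling mentioned above; Lucene is expected to reject values outside 20 to 50. Lowering it reclaims space from deleted documents sooner at the cost of more merge I/O, which is only worth it if the extra space is shown to be a problem.

```xml
<!-- Fragment of solrconfig.xml (Solr 7.5+ assumed). -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- Both values below are the TieredMergePolicy defaults. -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- Target ceiling for the percentage of deleted documents in the index;
         33 mirrors the ~33% figure above. Lower values merge more eagerly. -->
    <double name="deletesPctAllowed">33</double>
  </mergePolicyFactory>
</indexConfig>
```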
Best,
Erick
> On Jul 1, 2020, at 3:51 PM, ramyogi <ra...@gmail.com> wrote:
>
> Even though the same documents are indexed over and over again by incremental
> updates, the index size keeps increasing.
> Am I missing some configuration that would make optimization happen internally?