Posted to solr-user@lucene.apache.org by ramyogi <ra...@gmail.com> on 2020/07/01 19:51:09 UTC

Re: Suggestion or recommendation for NRT

Even though the same documents are indexed over and over again through
incremental updates, the index size keeps increasing.
Am I missing any configuration that would make optimization happen internally?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Suggestion or recommendation for NRT

Posted by ramyogi <ra...@gmail.com>.
Hi team, do you have any suggestions or recommendations for the above approach,
which we are using to get better search performance?




Re: Suggestion or recommendation for NRT

Posted by ramyogi <ra...@gmail.com>.
Thanks a lot for taking the time to respond to my questions.

We have two environments, ENV A and ENV B. Both have the same RAM capacity
(r5.2xlarge), the same number of shards, and the same replica type (NRT) for
the collection.

ENV A - It has a collection that is optimized (segment count 1 and
numDocs = maxDoc). It is used only for search requests. No delta updates are
being triggered.


ENV B - It has the same collection, copied from ENV A, with continuous delta
updates in progress, so it serves both indexing and search requests. Indexing
uses a Kafka Connect plugin that uses SolrJ with
solr.commit.within=300000 (milliseconds).
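
For context, here is a rough sketch of the server-side counterpart to a
client-supplied commitWithin, as it would appear in the <updateHandler>
section of solrconfig.xml. The values are assumptions for illustration and are
not taken from either environment: the 60000 ms hard-commit interval is
arbitrary, and the 300000 ms soft-commit interval simply mirrors the
commitWithin setting above.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to stable storage but does not open a new
       searcher, so it does not affect what queries can see. -->
  <autoCommit>
    <maxTime>60000</maxTime> <!-- assumed value, in milliseconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher and makes recent updates visible.
       Each new searcher triggers the newSearcher warming queries. -->
  <autoSoftCommit>
    <maxTime>300000</maxTime> <!-- mirrors commitWithin=300000 -->
  </autoSoftCommit>
</updateHandler>
```

A longer soft-commit interval means fewer new searchers and less cache churn,
at the cost of updates taking longer to become visible to searches.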


We are comparing search performance between the two environments using an
automated test that runs a batch of queries.

Regarding search warmup:

    <query>

        <maxBooleanClauses>10000</maxBooleanClauses>

        <filterCache class="solr.FastLRUCache"
                     size="10120"
                     initialSize="4192"
                     autowarmCount="0"/>

        
        <cache name="perSegFilter"
               class="solr.search.LRUCache"
               size="10"
               initialSize="0"
               autowarmCount="10"
               regenerator="solr.NoOpRegenerator"/>

        <fieldValueCache class="solr.FastLRUCache"
                         size="4096"
                         autowarmCount="1024"
                         showItems="32"/>

        <enableLazyFieldLoading>true</enableLazyFieldLoading>

        <queryResultWindowSize>20</queryResultWindowSize>

        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

        <listener event="newSearcher" class="solr.QuerySenderListener">
            <arr name="queries">
                <lst>
                    <str name="q">*:*</str>
                    <str name="facet">true</str>
                </lst>
            </arr>
        </listener>
        <listener event="firstSearcher" class="solr.QuerySenderListener">
            <arr name="queries">
                <lst>
                    <str name="q">*:*</str>
                    <str name="facet">true</str>
                </lst>
            </arr>
        </listener>

        <useColdSearcher>false</useColdSearcher>

        <maxWarmingSearchers>24</maxWarmingSearchers>

    </query>




Re: Suggestion or recommendation for NRT

Posted by Erick Erickson <er...@gmail.com>.
That seems high. It can be tricky to get tests right. Are you running with
some kind of test runner? Do you have, say, 3-4 thousand queries
you run? Are you running the tests after warming the searchers?

Also, if you have indexed down to one segment and _then_ tried
adding docs and measuring, you are not getting accurate results.

See: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

> On Jul 1, 2020, at 5:55 PM, ramyogi <ra...@gmail.com> wrote:
> 
> Thanks, Erick, for the details and the reference, which help me better
> understand segment merging.
> When I compare search performance on the uninterrupted, optimized collection
> (segment count 1) with the collection that handles indexing and search in
> parallel, the response time on the latter is 3 times higher.
> For example, the first responds in 100 ms on average, but the second takes
> around 400 ms.
> 
> Is this expected behaviour, that is, a trade-off we have to accept when we
> index and search the same collection in parallel?
> Or can we still tune some parameters for better performance? If so, please
> suggest some.
> 
> 
> 


Re: Suggestion or recommendation for NRT

Posted by ramyogi <ra...@gmail.com>.
Thanks, Erick, for the details and the reference, which help me better
understand segment merging.
When I compare search performance on the uninterrupted, optimized collection
(segment count 1) with the collection that handles indexing and search in
parallel, the response time on the latter is 3 times higher.
For example, the first responds in 100 ms on average, but the second takes
around 400 ms.

Is this expected behaviour, that is, a trade-off we have to accept when we
index and search the same collection in parallel?
Or can we still tune some parameters for better performance? If so, please
suggest some.




Re: Suggestion or recommendation for NRT

Posted by Erick Erickson <er...@gmail.com>.
Updated documents are marked as deleted in the
old segment and added to a new segment. When
commits happen, merges occur and only then is the
space occupied by the deleted document reclaimed.

Which segments are merged on commit depends
on a number of factors.

Unless you can prove the extra space is a problem,
you should just ignore the issue. The percentage of
deleted documents should max out at around 33%
on Solr 7.5+.

For background on merging, see:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

The third animation (TieredMergePolicy) is the default.
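
As a minimal sketch, this is roughly how TieredMergePolicy can be declared
explicitly inside the <indexConfig> section of solrconfig.xml. The values are
assumed to match the defaults at the time of writing, and deletesPctAllowed is
assumed to be the setting behind the roughly 33% ceiling on deleted documents
mentioned above. Check the reference guide for your Solr version before
changing any of them.

```xml
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- How many segments may be merged in a single merge. -->
    <int name="maxMergeAtOnce">10</int>
    <!-- How many similarly sized segments are allowed per tier
         before a merge is triggered. -->
    <int name="segmentsPerTier">10</int>
    <!-- Target upper bound on the percentage of deleted documents
         in the index (available on Solr 7.5+). -->
    <double name="deletesPctAllowed">33.0</double>
  </mergePolicyFactory>
</indexConfig>
```

Lowering deletesPctAllowed reclaims space sooner but causes more merge I/O, so
it is only worth considering if the extra space is a demonstrated problem.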

Best,
Erick

> On Jul 1, 2020, at 3:51 PM, ramyogi <ra...@gmail.com> wrote:
> 
> Even though the same documents are indexed over and over again through
> incremental updates, the index size keeps increasing.
> Am I missing any configuration that would make optimization happen internally?
> 
> 
> 