Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2019/02/06 14:58:51 UTC

Re: Indexing in one collection affect index in another collection

Hi everyone,

Does anyone have any further updates on this issue?
Thank you.

Regards,
Edwin

On Wed, 30 Jan 2019 at 14:17, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi everyone,
>
> We have repeated the setup and indexing on the latest Solr 7.6.0.
>
> However, we faced exactly the same issue as in Solr 7.5.0: searches on the
> customers collection slowed down once we indexed the policies collection.
>
> Regards,
> Edwin
>
> On Wed, 30 Jan 2019 at 01:19, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> Thanks for the reply and suggestion.
>>
>> Yes, we have installed RamMap, and are analyzing the results from there.
>> The problem we are facing is that once the query for that collection
>> becomes slow, it will not be fast again even after we restart Solr or the
>> entire machine.
>>
>> Regards,
>> Edwin
>>
>> On Tue, 29 Jan 2019 at 20:30, <pa...@ub.unibe.ch> wrote:
>>
>>> Hi
>>>
>>> If the reason for the difference in speed is that the index is being
>>> read from disk, I would expect that the first query would be slow, but
>>> subsequent queries on the same collection should speed up. A query on the
>>> other collection could then be slower. In this case I would say that this
>>> is normal behavior. The OS file cache cannot be relied upon to give the
>>> same results in different circumstances, including different software
>>> versions.
>>>
>>> You may wish to install the RamMap tool[1], [2], although you may be
>>> having the inverse problem to that described in [1]. You can then see how
>>> much space is used by the cache and other demands.
>>>
>>> If subsequent queries are fast, then to me it does not seem like a
>>> problem for a development machine. For production you may wish to store
>>> the indices in RAM and/or change from Windows to Linux, if it is important
>>> that all queries, including the first, are very fast.
>>>
>>> Have a nice day
>>> Paul
>>>
>>> -----Original Message-----
>>> From: Shawn Heisey <ap...@elyograg.org>
>>> Sent: Tuesday, 29 January 2019 13:25
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Indexing in one collection affect index in another
>>> collection
>>>
>>> On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote:
>>> > My guess is that the problem began after we changed our
>>> > searchFields_tcs schema:
>>> >
>>> > *From*:
>>> > <dynamicField name="*_tcs"  type="text_chinese" indexed="true"
>>> > stored="true" multiValued="true" termVectors="true"
>>> > termPositions="true" termOffsets="true"/>
>>> >
>>> > *To:*
>>> > <dynamicField name="*_tcs"  type="text_chinese" indexed="true"
>>> > stored="true" multiValued="true" storeOffsetsWithPositions="true"
>>> > termVectors="true" termPositions="false" termOffsets="false"/>
>>>
>>> Adding termVectors will make the index bigger.  Potentially much bigger.
>>> This will increase the overall RAM requirement of the server,
>>> especially if the server is handling software other than Solr.  Anything
>>> that makes the index bigger can affect performance.
>>>
>>> > The above change was made in order to use the unified highlighter that
>>> > Solr recommends (postings with light term vectors), which Solr's
>>> > documentation claims is the fastest.
>>> >
>>> > My best guess is that Solr 7.5.0 has some bugs that slow down the whole
>>> > index and queries with the new approach (the new dynamicField schema
>>> > above), affecting the OS file caching of the index or causing other
>>> > issues.
>>> >
>>> > So I kindly suggest that you look deeper and see whether such bugs
>>> > exist.
>>>
>>> I know almost nothing about highlighting.  I wouldn't be able to look
>>> for bugs.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
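
One way to tell the two cases in this thread apart (a cold OS file cache versus a query that stays slow) is to time the same query several times in a row. The following is a minimal Python sketch, not something posted to the list: the function names, the ratio threshold of 3 and the run_query callable are all assumptions made for illustration. run_query stands for whatever sends one search request to the collection under test.

```python
import statistics
import time


def time_repeated(run_query, repeats=5):
    """Call run_query() `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: e.g. one HTTP search request to Solr
        durations.append(time.perf_counter() - start)
    return durations


def looks_like_cold_cache(durations, ratio=3.0):
    """Return True if only the first query is slow.

    The first timing is compared with the median of the remaining ones.
    The ratio of 3.0 is an arbitrary assumption, not a Solr recommendation.
    """
    if len(durations) < 2:
        raise ValueError("need at least two timings")
    first, rest = durations[0], durations[1:]
    return first > ratio * statistics.median(rest)
```

If looks_like_cold_cache returns True, the pattern matches Paul's description of a first query being read from disk. If every repetition is equally slow, as Edwin reports even after a restart, file caching alone is a less convincing explanation.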

Re: Indexing in one collection affect index in another collection

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi all,

This issue is still surfacing in the new Solr 8.0.0.
We can't really figure out what the issue is, as it also occurs on a system
with more memory.

Does anyone have any further insights on this?

Regards,
Edwin

On Fri, 15 Feb 2019 at 18:40, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Shawn,
>
> This issue is also occurring in the new Solr 7.7.0, with the same
> data size of only 20 GB.
>
> Regards,
> Edwin
>
> On Fri, 8 Feb 2019 at 23:53, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
>
>> Hi Shawn,
>>
>> Thanks for your reply.
>>
>> Space in the OS disk cache could be the issue, but we didn't
>> face this problem previously, especially in our other setup using Solr
>> 6.5.1, which contains much more data (more than 1 TB), as compared to our
>> current setup in Solr 7.6.0, in which the data size is only 20 GB.
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On Wed, 6 Feb 2019 at 23:52, Shawn Heisey <ap...@elyograg.org> wrote:
>>
>>> On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
>>> > Hi everyone,
>>> >
>>> > Does anyone have any further updates on this issue?
>>>
>>> It is my strong belief that all the software running on this server
>>> OTHER than Solr is competing with Solr for space in the OS disk cache,
>>> and that Solr's data is getting pushed out of that cache.
>>>
>>> Best guess is that with only one collection, the disk cache was able to
>>> hold onto Solr's data better, and that with another collection present,
>>> there's not enough disk cache space available to cache both of them
>>> effectively.
>>>
>>> I think you're going to need a dedicated machine for Solr, so Solr isn't
>>> competing for system resources.
>>>
>>> Thanks,
>>> Shawn
>>>
>>

Re: Indexing in one collection affect index in another collection

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Shawn,

This issue is also occurring in the new Solr 7.7.0, with the same data
size of only 20 GB.

Regards,
Edwin

On Fri, 8 Feb 2019 at 23:53, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> Space in the OS disk cache could be the issue, but we didn't
> face this problem previously, especially in our other setup using Solr
> 6.5.1, which contains much more data (more than 1 TB), as compared to our
> current setup in Solr 7.6.0, in which the data size is only 20 GB.
>
> Regards,
> Edwin
>
>
>
> On Wed, 6 Feb 2019 at 23:52, Shawn Heisey <ap...@elyograg.org> wrote:
>
>> On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
>> > Hi everyone,
>> >
>> > Does anyone have any further updates on this issue?
>>
>> It is my strong belief that all the software running on this server
>> OTHER than Solr is competing with Solr for space in the OS disk cache,
>> and that Solr's data is getting pushed out of that cache.
>>
>> Best guess is that with only one collection, the disk cache was able to
>> hold onto Solr's data better, and that with another collection present,
>> there's not enough disk cache space available to cache both of them
>> effectively.
>>
>> I think you're going to need a dedicated machine for Solr, so Solr isn't
>> competing for system resources.
>>
>> Thanks,
>> Shawn
>>
>

Re: Indexing in one collection affect index in another collection

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Shawn,

Thanks for your reply.

Space in the OS disk cache could be the issue, but we didn't
face this problem previously, especially in our other setup using Solr
6.5.1, which contains much more data (more than 1 TB), as compared to our
current setup in Solr 7.6.0, in which the data size is only 20 GB.

Regards,
Edwin



On Wed, 6 Feb 2019 at 23:52, Shawn Heisey <ap...@elyograg.org> wrote:

> On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
> > Hi everyone,
> >
> > Does anyone have any further updates on this issue?
>
> It is my strong belief that all the software running on this server
> OTHER than Solr is competing with Solr for space in the OS disk cache,
> and that Solr's data is getting pushed out of that cache.
>
> Best guess is that with only one collection, the disk cache was able to
> hold onto Solr's data better, and that with another collection present,
> there's not enough disk cache space available to cache both of them
> effectively.
>
> I think you're going to need a dedicated machine for Solr, so Solr isn't
> competing for system resources.
>
> Thanks,
> Shawn
>

Re: Indexing in one collection affect index in another collection

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
> Hi everyone,
> 
> Does anyone have any further updates on this issue?

It is my strong belief that all the software running on this server 
OTHER than Solr is competing with Solr for space in the OS disk cache, 
and that Solr's data is getting pushed out of that cache.

Best guess is that with only one collection, the disk cache was able to 
hold onto Solr's data better, and that with another collection present, 
there's not enough disk cache space available to cache both of them 
effectively.

I think you're going to need a dedicated machine for Solr, so Solr isn't 
competing for system resources.

Thanks,
Shawn
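
The disk cache argument above comes down to simple arithmetic: the memory left over after the Solr heap and all other software is what the OS can use to cache index files. A rough Python sketch of that estimate follows. The function name and every figure in the usage lines are hypothetical, apart from the 20 GB total index size mentioned in this thread (the 12 GB / 8 GB split between the two collections is also made up).

```python
def disk_cache_headroom_gb(total_ram_gb, jvm_heap_gb, other_software_gb,
                           index_sizes_gb):
    """Estimate how much RAM is left once all indexes are fully cached.

    A negative result means the indexes cannot all sit in the OS disk
    cache at once, so queries on one collection may evict another's data.
    """
    available_for_cache = total_ram_gb - jvm_heap_gb - other_software_gb
    return available_for_cache - sum(index_sizes_gb)


# Hypothetical machine: 64 GB RAM, 8 GB Solr heap, two collections (12 + 8 GB).
light_load = disk_cache_headroom_gb(64, 8, 30, [12, 8])   # other software: 30 GB
heavy_load = disk_cache_headroom_gb(64, 8, 45, [12, 8])   # other software: 45 GB
print(light_load, heavy_load)
```

Good performance does not strictly require the whole index to fit in the cache, so a negative figure is a warning sign rather than proof. It does show why the same 20 GB of index data can behave differently depending on what else runs on the server.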