Posted to solr-user@lucene.apache.org by Rallavagu <ra...@gmail.com> on 2016/07/21 15:18:47 UTC

solr.NRTCachingDirectoryFactory

Solr 5.4.1 with embedded Jetty, with SolrCloud enabled

We have a Solr deployment (approximately 3 million documents) with both 
write and search operations happening. We have a requirement to have 
updates available immediately (NRT). The directory factory is configured 
with the default "solr.NRTCachingDirectoryFactory". Considering that 
caches are invalidated and rebuilt every time there is an update, I 
assume that "solr.NRTCachingDirectoryFactory" would memory-map the index 
files, so "reading from disk" would be as simple and quick as reading 
from memory and would not incur any significant performance degradation. 
Am I right in my assumption? We have allocated a significant amount of 
RAM (48G total physical memory, 12G heap; total index size on disk is 
15G), but I am not sure I am seeing optimal QTimes for searches. Any 
inputs are welcome. Thanks in advance.
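
For context, the stock solrconfig.xml declares the factory roughly like 
this (a sketch, not the poster's actual config; NRTCachingDirectory wraps 
the default directory, typically MMapDirectory on 64-bit systems, and 
additionally caches small freshly flushed segments in heap):

```xml
<!-- solrconfig.xml (sketch): NRTCachingDirectoryFactory delegates to the
     default directory implementation (usually MMapDirectory on 64-bit
     Linux) and keeps small, recently flushed segments in heap so NRT
     readers avoid a round-trip to disk. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```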

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.
Thanks Mikhail.

I have been unable to locate the bottleneck so far. Will try jstack and other tools.

On 8/25/16 11:40 PM, Mikhail Khludnev wrote:
> Rough sampling under load makes sense as usual. JMC is one of the suitable
> tools for this.
> Sometimes even just jstack <PID> or looking at SolrAdmin/Threads is enough.
> If only a small fraction of documents is updated and the bottleneck is the
> filterCache, you can experiment with segmented filters, which suit NRT
> better:
> http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html

Re: solr.NRTCachingDirectoryFactory

Posted by Mikhail Khludnev <mk...@apache.org>.
Rough sampling under load makes sense as usual. JMC is one of the suitable
tools for this.
Sometimes even just jstack <PID> or looking at SolrAdmin/Threads is enough.
If only a small fraction of documents is updated and the bottleneck is the
filterCache, you can experiment with segmented filters, which suit NRT
better:
http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html


On Fri, Aug 26, 2016 at 2:56 AM, Rallavagu <ra...@gmail.com> wrote:

> Follow up update ...
>
> Set autowarm count to zero for caches for NRT and I could negotiate
> latency from 2 min to 5 min :)
>
> However, I am still seeing high QTimes and wondering where else I can look.
> Should I debug the code or run some tools to isolate bottlenecks (disk I/O,
> CPU, or the query itself)? Looking for some tuning advice. Thanks.


-- 
Sincerely yours
Mikhail Khludnev

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.
Follow up update ...

Set autowarm count to zero for caches for NRT and I could negotiate 
latency from 2 min to 5 min :)

However, I am still seeing high QTimes and wondering where else I can 
look. Should I debug the code or run some tools to isolate bottlenecks 
(disk I/O, CPU, or the query itself)? Looking for some tuning advice. Thanks.


On 7/26/16 9:42 AM, Erick Erickson wrote:
> And, I might add, you should look through your old logs
> and see how long it takes to open a searcher. Let's
> say Shawn's lower bound is what you see, i.e.
> it takes a minute each to execute all the autowarming
> in filterCache and queryResultCache... So your current
> latency is _at least_ 2 minutes between the time something
> is indexed and it's available for search just for autowarming.
>
> Plus up to another 2 minutes for your soft commit interval
> to expire.
>
> So if your business people haven't noticed a 4 minute
> latency yet, tell them they don't know what they're talking
> about when they insist on the NRT interval being a few
> seconds ;).
>
> Best,
> Erick

Re: solr.NRTCachingDirectoryFactory

Posted by Erick Erickson <er...@gmail.com>.
And, I might add, you should look through your old logs
and see how long it takes to open a searcher. Let's
say Shawn's lower bound is what you see, i.e.
it takes a minute each to execute all the autowarming
in filterCache and queryResultCache... So your current
latency is _at least_ 2 minutes between the time something
is indexed and it's available for search just for autowarming.

Plus up to another 2 minutes for your soft commit interval
to expire.

So if your business people haven't noticed a 4 minute
latency yet, tell them they don't know what they're talking
about when they insist on the NRT interval being a few
seconds ;).

Best,
Erick

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.

On 7/26/16 5:46 AM, Shawn Heisey wrote:
> On 7/22/2016 10:15 AM, Rallavagu wrote:
>>     <filterCache class="solr.FastLRUCache"
>>                  size="5000"
>>                  initialSize="5000"
>>                  autowarmCount="500"/>
>>
>
>>     <queryResultCache class="solr.LRUCache"
>>                      size="20000"
>>                      initialSize="20000"
>>                      autowarmCount="500"/>
>
> As Erick indicated, these settings are incompatible with Near Real Time
> updates.
>
> With those settings, every time you commit and create a new searcher,
> Solr will execute up to 1000 queries (potentially 500 for each of the
> caches above) before that new searcher will begin returning new results.
>
> I do not know how fast your filter queries execute when they aren't
> cached... but even if they only take 100 milliseconds each, that could
> take up to a minute for filterCache warming.  If each one takes two
> seconds and there are 500 entries in the cache, then autowarming the
> filterCache would take nearly 17 minutes. You would also need to wait
> for the warming queries on queryResultCache.
>
> The autowarmCount on my filterCache is 4, and warming that cache *still*
> sometimes takes ten or more seconds to complete.
>
> If you want true NRT, you need to set all your autowarmCount values to
> zero.  The tradeoff with NRT is that your caches are ineffective
> immediately after a new searcher is created.
Will look into this and make changes as suggested.

>
> Looking at the "top" screenshot ... you have plenty of memory to cache
> the entire index.  Unless your queries are extreme, this is usually
> enough for good performance.
>
> One possible problem is that cache warming is taking far longer than
> your autoSoftCommit interval, and the server is constantly busy making
> thousands of warming queries.  Reducing autowarmCount, possibly to zero,
> *might* fix that. I would expect higher CPU load than what your
> screenshot shows if this were happening, but it still might be the problem.
Great point. Thanks for the help.

>
> Thanks,
> Shawn
>

Re: solr.NRTCachingDirectoryFactory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/22/2016 10:15 AM, Rallavagu wrote:
>     <filterCache class="solr.FastLRUCache"
>                  size="5000"
>                  initialSize="5000"
>                  autowarmCount="500"/>
>

>     <queryResultCache class="solr.LRUCache"
>                      size="20000"
>                      initialSize="20000"
>                      autowarmCount="500"/>

As Erick indicated, these settings are incompatible with Near Real Time
updates.

With those settings, every time you commit and create a new searcher,
Solr will execute up to 1000 queries (potentially 500 for each of the
caches above) before that new searcher will begin returning new results.

I do not know how fast your filter queries execute when they aren't
cached... but even if they only take 100 milliseconds each, that could
take up to a minute for filterCache warming.  If each one takes two
seconds and there are 500 entries in the cache, then autowarming the
filterCache would take nearly 17 minutes. You would also need to wait
for the warming queries on queryResultCache.
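
Shawn's arithmetic can be checked directly; a quick sketch with the 
numbers from this paragraph:

```python
# Autowarming replays up to autowarmCount queries per cache when a new
# searcher opens. The numbers below come straight from this thread.
autowarm_count = 500

# Optimistic case: each uncached filter query takes ~100 ms.
fast_case_s = autowarm_count * 0.100          # ~50 s, "up to a minute"

# Pessimistic case: each uncached filter query takes ~2 s.
slow_case_min = autowarm_count * 2.0 / 60.0   # ~16.7 min, "nearly 17 minutes"

# The same happens again, separately, for queryResultCache.
total_warming_queries = 2 * autowarm_count    # up to 1000 queries per commit

print(fast_case_s, round(slow_case_min, 1), total_warming_queries)
```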

The autowarmCount on my filterCache is 4, and warming that cache *still*
sometimes takes ten or more seconds to complete.

If you want true NRT, you need to set all your autowarmCount values to
zero.  The tradeoff with NRT is that your caches are ineffective
immediately after a new searcher is created.
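
Applied to the cache settings quoted earlier, that advice amounts to the 
following (a sketch; sizes kept from the thread, only autowarmCount changed):

```xml
<!-- NRT-friendly variant: no warming queries run when a new searcher
     opens, so commits become visible quickly; the caches start cold
     after each new searcher instead. -->
<filterCache class="solr.FastLRUCache"
             size="5000"
             initialSize="5000"
             autowarmCount="0"/>

<queryResultCache class="solr.LRUCache"
                  size="20000"
                  initialSize="20000"
                  autowarmCount="0"/>
```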

Looking at the "top" screenshot ... you have plenty of memory to cache
the entire index.  Unless your queries are extreme, this is usually
enough for good performance.
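
The memory figures from the original post support this; a rough sketch 
(it ignores other processes and any off-heap allocations):

```python
# Figures from the original post.
total_ram_gb = 48
jvm_heap_gb = 12
index_size_gb = 15

# Whatever the JVM does not take is available to the OS page cache,
# which is what actually caches the memory-mapped index files.
page_cache_gb = total_ram_gb - jvm_heap_gb   # 36 GB

# The whole 15 GB index fits with room to spare.
fits = index_size_gb < page_cache_gb
print(page_cache_gb, fits)
```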

One possible problem is that cache warming is taking far longer than
your autoSoftCommit interval, and the server is constantly busy making
thousands of warming queries.  Reducing autowarmCount, possibly to zero,
*might* fix that. I would expect higher CPU load than what your
screenshot shows if this were happening, but it still might be the problem.

Thanks,
Shawn


Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.

On 7/22/16 9:56 AM, Erick Erickson wrote:
> OK, scratch autowarming. In fact your autowarm counts
> are quite high, I suspect far past "diminishing returns".
> I usually see autowarm counts < 64, but YMMV.
>
> Are you seeing actual hit ratios that are decent on
> those caches (admin UI>>plugins/stats>>cache>>...)
> And your cache sizes are also quite high in my experience,
> it's probably worth measuring the utilization there as well.
> And, BTW, your filterCache can occupy up to 2G of your heap.
> That's probably not your central problem, but it's something
> to consider.
Will look into it.
>
> So I don't know why your queries are taking that long, my
> assumption is that they may simply be very complex queries,
> or you have grouping on or.....
Queries are a bit complex for sure.
>
> I guess the next thing I'd do is start trying to characterize
> what queries are slow. Grouping? Pivot Faceting? 'cause
> from everything you've said so far it's surprising that you're
> seeing queries take this long, something doesn't feel right
> but what it is I don't have a clue.

Thanks

>
> Best,
> Erick

Re: solr.NRTCachingDirectoryFactory

Posted by Erick Erickson <er...@gmail.com>.
OK, scratch autowarming. In fact your autowarm counts
are quite high, I suspect far past "diminishing returns".
I usually see autowarm counts < 64, but YMMV.

Are you seeing actual hit ratios that are decent on
those caches (admin UI>>plugins/stats>>cache>>...)
And your cache sizes are also quite high in my experience,
it's probably worth measuring the utilization there as well.
And, BTW, your filterCache can occupy up to 2G of your heap.
That's probably not your central problem, but it's something
to consider.
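
The "up to 2G" figure follows from the worst case where each filterCache 
entry holds a bitset with one bit per document; a sketch with the numbers 
from this thread:

```python
# Worst case: a cached filter is stored as a bitset of one bit per doc.
num_docs = 3_000_000        # index size from the original post
filter_cache_size = 5_000   # size= from the posted filterCache config

bytes_per_entry = num_docs // 8                  # 375,000 bytes per entry
max_bytes = bytes_per_entry * filter_cache_size  # 1,875,000,000 bytes

print(round(max_bytes / 2**30, 2))               # ~1.75 GiB, i.e. "up to 2G"
```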

So I don't know why your queries are taking that long, my
assumption is that they may simply be very complex queries,
or you have grouping on or.....

I guess the next thing I'd do is start trying to characterize
what queries are slow. Grouping? Pivot Faceting? 'cause
from everything you've said so far it's surprising that you're
seeing queries take this long, something doesn't feel right
but what it is I don't have a clue.

Best,
Erick
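
One concrete way to start characterizing slow queries (my suggestion, not 
something named in the thread) is Solr's debug=timing output, which breaks 
QTime down per search component. A sketch that extracts the dominant 
component from the "process" section of such a response; the numbers here 
are made up for illustration:

```python
# Sketch: given the "debug" -> "timing" section of a Solr response
# (add &debug=timing to the query URL), find the dominant component.
# All timings below are invented sample values, in milliseconds.
timing = {
    "time": 850.0,
    "process": {
        "time": 830.0,
        "query": {"time": 120.0},
        "facet": {"time": 650.0},
        "debug": {"time": 60.0},
    },
}

# Keep only real components; the "time" key is the section total.
components = {
    name: info["time"]
    for name, info in timing["process"].items()
    if name != "time" and isinstance(info, dict)
}
slowest = max(components, key=components.get)
print(slowest, components[slowest])
```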

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.

On 7/22/16 8:34 AM, Erick Erickson wrote:
> Mostly this sounds like a problem that could be cured with
> autowarming. But two things are conflicting here:
> 1> you say "We have a requirement to have updates available immediately (NRT)"
> 2> your docs aren't available for 120 seconds given your autoSoftCommit
> settings unless you're specifying
> -Dsolr.autoSoftCommit.maxTime=some_other_interval
> as a startup parameter.
>
Yes. We have 120 seconds available.

> So assuming you really do have a 120 second autocommit time, you should be
> able to smooth out the spikes by appropriate autowarming. You also haven't
> indicated what your filterCache and queryResultCache settings are. They
> come with a default of 0 for autowarm. But what is their size? And do you
> see a correlation between longer queries on every 2-minute interval? And
> do you have some test harness in place (jmeter works well) to demonstrate
> that differences in your configuration help or hurt? I can't over-emphasize the
> importance of this, otherwise if you rely on somebody simply saying "it's slow"
> you have no way to know what effect changes have.

Here is the cache configuration.

     <filterCache class="solr.FastLRUCache"
                  size="5000"
                  initialSize="5000"
                  autowarmCount="500"/>

     <!-- Query Result Cache

          Caches results of searches - ordered lists of document ids
          (DocList) based on a query, a sort, and the range of documents 
requested.
       -->
     <queryResultCache class="solr.LRUCache"
                      size="20000"
                      initialSize="20000"
                      autowarmCount="500"/>

     <!-- Document Cache

          Caches Lucene Document objects (the stored fields for each
          document).  Since Lucene internal document ids are transient,
          this cache will not be autowarmed.
       -->
     <documentCache class="solr.LRUCache"
                    size="100000"
                    initialSize="100000"
                    autowarmCount="0"/>

We have run load tests using JMeter pointing directly to Solr, and also 
tests pointing to the application that queries Solr. In both cases, we 
have noticed slower results.

Thanks

>
> Best,
> Erick

Re: solr.NRTCachingDirectoryFactory

Posted by Erick Erickson <er...@gmail.com>.
Mostly this sounds like a problem that could be cured with
autowarming. But two things are conflicting here:
1> you say "We have a requirement to have updates available immediately (NRT)"
2> your docs aren't available for 120 seconds given your autoSoftCommit
settings unless you're specifying
-Dsolr.autoSoftCommit.maxTime=some_other_interval
as a startup parameter.

So assuming you really do have a 120 second autocommit time, you should be
able to smooth out the spikes by appropriate autowarming. You also haven't
indicated what your filterCache and queryResultCache settings are. They
come with a default of 0 for autowarm. But what is their size? And do you
see a correlation between longer queries and the 2 minute intervals? And
do you have some test harness in place (jmeter works well) to demonstrate
that differences in your configuration help or hurt? I can't over-emphasize the
importance of this, otherwise if you rely on somebody simply saying "it's slow"
you have no way to know what effect changes have.

Best,
Erick


On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely
>> dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64
>> (mockbuild@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
>> Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then
> grab a screenshot of the terminal window.  Share it with a site like
> dropbox, imgur, or something similar, and send the URL.  You'll end up
> with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory
> details from that.
>
> Thanks,
> Shawn
>

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.
Also, here is the link to screenshot.

https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-07-22%20at%2010.40.21%20AM.png

Thanks

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely
>> dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64
>> (mockbuild@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
>> Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then
> grab a screenshot of the terminal window.  Share it with a site like
> dropbox, imgur, or something similar, and send the URL.  You'll end up
> with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory
> details from that.
>
> Thanks,
> Shawn
>

Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.
Here is the snapshot of memory usage from "top" as you mentioned. First 
row is "solr" process. Thanks.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29468 solr      20   0 27.536g 0.013t 3.297g S  45.7 27.6   4251:45 java
21366 root      20   0 14.499g 217824  12952 S   1.0  0.4 192:11.54 java
 2077 root      20   0 14.049g 190824   9980 S   0.7  0.4  62:44.00 java
  511 root      20   0  125792  56848  56616 S   0.0  0.1   9:33.23 systemd-journal
  316 splunk    20   0  232056  44284  11804 S   0.7  0.1  84:52.74 splunkd
 1045 root      20   0  257680  39956   6836 S   0.3  0.1   7:05.78 puppet
32631 root      20   0  360956  39292   4788 S   0.0  0.1   4:55.37 mcollectived
  703 root      20   0  250372   9000    976 S   0.0  0.0   1:35.52 rsyslogd
 1058 nslcd     20   0  454192   6004   2996 S   0.0  0.0  15:08.87 nslcd

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely
>> dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64
>> (mockbuild@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
>> Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then
> grab a screenshot of the terminal window.  Share it with a site like
> dropbox, imgur, or something similar, and send the URL.  You'll end up
> with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory
> details from that.
>
> Thanks,
> Shawn
>

Re: solr.NRTCachingDirectoryFactory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/21/2016 11:25 PM, Rallavagu wrote:
> There is no other software running on the system and it is completely
> dedicated to Solr. It is running on Linux. Here is the full version.
>
> Linux version 3.8.13-55.1.6.el7uek.x86_64
> (mockbuild@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
> Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015 

Run the top program, press shift-M to sort by memory usage, and then
grab a screenshot of the terminal window.  Share it with a site like
dropbox, imgur, or something similar, and send the URL.  You'll end up
with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory
details from that.

Thanks,
Shawn


Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.

On 7/21/16 9:16 PM, Shawn Heisey wrote:
> On 7/21/2016 9:37 AM, Rallavagu wrote:
>> I suspect swapping as well. But, for my understanding - are the index
>> files from disk memory mapped automatically at the startup time?
>
> They are *mapped* at startup time, but they are not *read* at startup.
> The mapping just sets up a virtual address space for the entire file,
> but until something actually reads the data from the disk, it will not
> be in memory.  Getting the data in memory is what makes mmap fast.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
>> We are not performing "commit" after every update and here is the
>> configuration for softCommit and hardCommit.
>>
>> <autoCommit>
>>        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>        <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>        <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
>> </autoSoftCommit>
>>
>> I am seeing QTimes (for searches) swing between 2 and 10 seconds.
>> Some queries showed slowness caused by faceting (debug=true). Since
>> we adjusted indexing, facet times have improved, but basic query
>> QTime is still high, so I am wondering where else I can look. Is
>> there a way to debug (instrument) a query on a Solr node?
>
> Assuming you have not defined the maxTime system properties mentioned in
> those configs, that config means you will potentially be creating a new
> searcher every two minutes ... but if you are sending explicit commits
> or using commitWithin on your updates, then the true situation may be
> very different than what's configured here.
>
>>>> We have allocated significant amount of RAM (48G total
>>>> physical memory, 12G heap, Total index disk size is 15G)
>
> Assuming there's no other software on the system besides the one
> instance of Solr with a 12GB heap, this would mean that you have enough
> room to cache the entire index.  What OS are you running on? With that
> information, I may be able to relay some instructions that will help
> determine what the complete memory situation is on your server.

There is no other software running on the system and it is completely 
dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64 
(mockbuild@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red 
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015

Thanks

>
> Thanks,
> Shawn
>

Re: solr.NRTCachingDirectoryFactory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/21/2016 9:37 AM, Rallavagu wrote:
> I suspect swapping as well. But, for my understanding - are the index
> files from disk memory mapped automatically at the startup time?

They are *mapped* at startup time, but they are not *read* at startup. 
The mapping just sets up a virtual address space for the entire file,
but until something actually reads the data from the disk, it will not
be in memory.  Getting the data in memory is what makes mmap fast.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
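
The lazy behavior described above can be seen with any memory map, not just Lucene's — a minimal sketch in plain Python (not Solr-specific):

```python
import mmap
import os
import tempfile

# Write a small file, then map it: mmap() only reserves virtual address
# space; pages are faulted in from disk the first time they are touched.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 4096)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # At this point the mapping exists but no data has been read yet.
    first = mm[0:4]          # this access triggers the actual page-in
    assert first == b"xxxx"
    mm.close()

os.remove(path)
```

The same applies at Solr startup: the index is mapped, but only queries (or explicit warming) actually pull pages into the OS cache.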

> We are not performing "commit" after every update and here is the
> configuration for softCommit and hardCommit.
>
> <autoCommit>
>        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>        <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>        <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
> </autoSoftCommit>
>
> I am seeing QTimes (for searches) swing between 2 and 10 seconds.
> Some queries showed slowness caused by faceting (debug=true). Since we
> adjusted indexing, facet times have improved, but basic query QTime is
> still high, so I am wondering where else I can look. Is there a way to
> debug (instrument) a query on a Solr node?

Assuming you have not defined the maxTime system properties mentioned in
those configs, that config means you will potentially be creating a new
searcher every two minutes ... but if you are sending explicit commits
or using commitWithin on your updates, then the true situation may be
very different than what's configured here.

>>> We have allocated significant amount of RAM (48G total
>>> physical memory, 12G heap, Total index disk size is 15G)

Assuming there's no other software on the system besides the one
instance of Solr with a 12GB heap, this would mean that you have enough
room to cache the entire index.  What OS are you running on? With that
information, I may be able to relay some instructions that will help
determine what the complete memory situation is on your server.

Thanks,
Shawn


Re: solr.NRTCachingDirectoryFactory

Posted by Rallavagu <ra...@gmail.com>.
Thanks Erick.

On 7/21/16 8:25 AM, Erick Erickson wrote:
> bq: map index files so "reading from disk" will be as simple and quick
> as reading from memory hence would not incur any significant
> performance degradation.
>
> Well, if
> 1> the read has already been done. First time a page of the file is
> accessed, it must be read from disk.
> 2> You have enough physical memory that _all_ of the files can be held
> in memory at once.
>
> <2> is a little tricky since the big slowdown comes from swapping
> eventually. But in an LRU scheme, that may be OK if the oldest pages
> are the stored=true data which are only accessed to return the top N,
> not to satisfy the search.
I suspect swapping as well. But, for my understanding - are the index 
files from disk memory mapped automatically at the startup time?
>
> What are your QTimes anyway? Define "optimal"....
>
> I'd really push back on this statement: "We have a requirement to have
> updates available immediately (NRT)". Truly? You can't set
> expectations that 5 seconds will be needed (or 10?). Often this is an
> artificial requirement that does no real service to the user, it's
> just something people think they want. If this means you're sending a
> commit after every document, it's actually a really bad practice
> that'll get you into trouble eventually. Plus you won't be able to do
> any autowarming which will read data from disk into the OS memory and
> smooth out any spikes

We are not performing "commit" after every update and here is the 
configuration for softCommit and hardCommit.

<autoCommit>
        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
        <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
        <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
</autoSoftCommit>

I am seeing QTimes (for searches) swing between 2 and 10 seconds. Some 
queries showed slowness caused by faceting (debug=true). Since we 
adjusted indexing, facet times have improved, but basic query QTime is 
still high, so I am wondering where else I can look. Is there a way to 
debug (instrument) a query on a Solr node?
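
On instrumenting a query: adding debug=true (or debug=timing) to a request makes Solr return a per-component timing breakdown in the response. A sketch of pulling the slowest components out of such a response (the sample data below is illustrative, not from this deployment):

```python
import json

# Illustrative shape of the "timing" section returned with debug=true;
# real responses list every configured search component by name.
sample = json.loads("""
{
  "time": 1250.0,
  "prepare": {"time": 5.0,
              "query": {"time": 2.0}, "facet": {"time": 1.0}},
  "process": {"time": 1245.0,
              "query": {"time": 240.0}, "facet": {"time": 1000.0}}
}
""")

def slowest_components(timing, phase="process"):
    """Return (component, ms) pairs for one phase, slowest first."""
    items = [(name, v["time"]) for name, v in timing[phase].items()
             if isinstance(v, dict)]
    return sorted(items, key=lambda kv: kv[1], reverse=True)

print(slowest_components(sample))   # facet dominates this sample query
```

Running something like this against the timing sections of a few slow queries should show whether the time goes to faceting, the query itself, or another component.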

>
> FWIW,
> Erick
>
> On Thu, Jul 21, 2016 at 8:18 AM, Rallavagu <ra...@gmail.com> wrote:
>> Solr 5.4.1 with embedded jetty with cloud enabled
>>
>> We have a Solr deployment (approximately 3 million documents) with both
>> write and search operations happening. We have a requirement to have updates
>> available immediately (NRT). Configured with default
>> "solr.NRTCachingDirectoryFactory" for directory factory. Considering the
>> fact that every time there is an update, caches are invalidated and re-built
>> I assume that "solr.NRTCachingDirectoryFactory" would memory map index files
>> so "reading from disk" will be as simple and quick as reading from memory
>> hence would not incur any significant performance degradation. Am I right in
>> my assumption? We have allocated significant amount of RAM (48G total
>> physical memory, 12G heap, Total index disk size is 15G) but not sure if I
>> am seeing the optimal QTimes (for searches). Any inputs are welcome. Thanks
>> in advance.

Re: solr.NRTCachingDirectoryFactory

Posted by Erick Erickson <er...@gmail.com>.
bq: map index files so "reading from disk" will be as simple and quick
as reading from memory hence would not incur any significant
performance degradation.

Well, if
1> the read has already been done. First time a page of the file is
accessed, it must be read from disk.
2> You have enough physical memory that _all_ of the files can be held
in memory at once.

<2> is a little tricky since the big slowdown comes from swapping
eventually. But in an LRU scheme, that may be OK if the oldest pages
are the stored=true data which are only accessed to return the top N,
not to satisfy the search.

What are your QTimes anyway? Define "optimal"....

I'd really push back on this statement: "We have a requirement to have
updates available immediately (NRT)". Truly? You can't set
expectations that 5 seconds will be needed (or 10?). Often this is an
artificial requirement that does no real service to the user, it's
just something people think they want. If this means you're sending a
commit after every document, it's actually a really bad practice
that'll get you into trouble eventually. Plus you won't be able to do
any autowarming which will read data from disk into the OS memory and
smooth out any spikes.
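
Autowarming can also be driven explicitly with warming queries in solrconfig.xml — a sketch of a newSearcher listener (the query and facet field below are placeholders, not from this deployment):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- fired against every new searcher before it serves traffic -->
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
    <lst><str name="q">*:*</str>
         <str name="facet">true</str>
         <str name="facet.field">category</str></lst>
  </arr>
</listener>
```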

FWIW,
Erick

On Thu, Jul 21, 2016 at 8:18 AM, Rallavagu <ra...@gmail.com> wrote:
> Solr 5.4.1 with embedded jetty with cloud enabled
>
> We have a Solr deployment (approximately 3 million documents) with both
> write and search operations happening. We have a requirement to have updates
> available immediately (NRT). Configured with default
> "solr.NRTCachingDirectoryFactory" for directory factory. Considering the
> fact that every time there is an update, caches are invalidated and re-built
> I assume that "solr.NRTCachingDirectoryFactory" would memory map index files
> so "reading from disk" will be as simple and quick as reading from memory
> hence would not incur any significant performance degradation. Am I right in
> my assumption? We have allocated significant amount of RAM (48G total
> physical memory, 12G heap, Total index disk size is 15G) but not sure if I
> am seeing the optimal QTimes (for searches). Any inputs are welcome. Thanks
> in advance.