Posted to solr-user@lucene.apache.org by vishal patel <vi...@outlook.com> on 2019/06/05 11:35:42 UTC

Query takes a long time Solr 6.1.0

We have 2 shards and 2 replicas in our live environment, and we also have multiple collections. We perform heavy search and update loads.

-> I have attached some queries that take a long time to execute. Why do they take so much time? Is it because of the query length?

-> Sometimes a replica goes into recovery mode, and we cannot identify the cause from the log, but GC pause times are 15 to 20 seconds. Ideally, what should the GC pause time be? Does GC pause time increase because of indexing or searching documents?

My Solr live data:

Collection        Total documents   shard1 size (GB)   shard2 size (GB)
documents         20419967          117                99.4
commentdetails    18305485          6.47               6.83
documentcontent   8810482           191                102
forms             4316563           80.1               76.4

Regards,
Vishal


Re: Query takes a long time Solr 6.1.0

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/10/2019 3:24 AM, vishal patel wrote:
> We have 27 collections, and each collection has many schema fields. In live, a very large number of search and index create/update requests come in, and most of the search requests involve sorting, faceting, grouping, and long queries.
> On average, about 40GB of heap is used, so we gave the JVM 80GB of memory.

Unless you've been watching an actual *graph* of heap usage over a 
significant amount of time, you can't learn anything useful from it.

And it's very possible that you can't get anything useful even from a 
graph, unless that graph is generated by analyzing a lengthy garbage 
collection log.
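
For reference, collecting that kind of log usually just means enabling GC logging with rotation in solr.in.sh (or solr.in.cmd on Windows). The following is only a sketch for Solr 6.x on Java 8 with the stock bin/solr scripts, so check it against your own install:

# Sketch only: verbose GC logging with rotation so several days of history
# survive for tools like GCViewer. Adjust file counts/sizes as needed.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
GC_LOG_OPTS="$GC_LOG_OPTS -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution"
GC_LOG_OPTS="$GC_LOG_OPTS -XX:+PrintGCApplicationStoppedTime"
GC_LOG_OPTS="$GC_LOG_OPTS -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M"
# bin/solr normally adds -Xloggc pointing at solr_gc.log in the Solr logs dir.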

> our directory in solrconfig.xml
> <directoryFactory name="DirectoryFactory"
>                      class="${solr.directoryFactory:solr.MMapDirectoryFactory}">
> </directoryFactory>

When using MMAP, one of the memory columns should show a total that's 
approximately equal to the max heap plus the size of all indexes being 
handled by Solr.  None of the columns in your Resource Monitor memory 
screenshot show numbers over 400GB, which is what I would expect based 
on what you said about the index size.

MMapDirectoryFactory is a decent choice, but Solr's default of 
NRTCachingDirectoryFactory is probably better.  Switching to NRT will 
not help whatever is causing your performance problems, though.
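
If you do want to try the default, note that your config reads the solr.directoryFactory system property, so a sketch like this in solr.in.sh would switch it without editing solrconfig.xml (removing the <directoryFactory> element entirely is just as good):

# Sketch: override the ${solr.directoryFactory:solr.MMapDirectoryFactory}
# placeholder so Solr uses NRTCachingDirectoryFactory instead.
SOLR_OPTS="$SOLR_OPTS -Dsolr.directoryFactory=solr.NRTCachingDirectoryFactory"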

> Here are our schema file, solrconfig XML, and GC log; please review them. Is anything wrong, or do you have any suggestions for improvement?
> https://drive.google.com/drive/folders/1wV9bdQ5-pP4s4yc8jrYNz77YYVRmT7FG

That GC log covers a grand total of three and a half minutes.  It's 
useless.  Heap usage is nearly constant for the full time at about 30GB. 
  Without a much more comprehensive log, I cannot offer any useful 
advice.  I'm looking for logs that last several hours, and a few DAYS
would be better.

Your caches are commented out, so that is not contributing to heap 
usage.  Another reason to drop the heap size, maybe.

> 2019-06-06T11:55:53.456+0100: 1053797.556: Total time for which application threads were stopped: 42.4594545 seconds, Stopping threads took: 26.7301882 seconds

Part of the problem here is that stopping threads took 26 seconds.  I 
have never seen anything that high before.  It should only take a 
*small* fraction of a second to stop all threads.  Something seems to be 
going very wrong here.  One thing that it *might* be is something called 
"the four month bug", which is fixed by adding -XX:+PerfDisableSharedMem 
to the JVM options.  Here's a link to the blog post about that problem:

https://www.evanjones.ca/jvm-mmap-pause.html
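
If you want to try that, the flag goes into the Solr start options, something like this in solr.in.sh (sketch only):

# Sketch: work around the "four month bug" (JVM perf counters backed by
# mmap'd files under /tmp/hsperfdata causing stop-the-world hiccups).
SOLR_OPTS="$SOLR_OPTS -XX:+PerfDisableSharedMem"
# Trade-off: this also makes the process invisible to jps/jstat.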

It's not clear whether the 42 seconds *includes* the 26 seconds, or 
whether there was 42 seconds of pause AFTER the threads were stopped.  I 
would imagine that the larger number includes the smaller number.  Might 
need to ask Oracle engineers.  Pause times like this do not surprise me 
with a heap this big, but 26 seconds to stop threads sounds like a major 
issue, and I am not sure about what might be causing it.  My guess about 
the four month bug above is a shot in the dark that might be completely 
wrong.

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
> An 80GB heap is ENORMOUS.  And you have two of those per server.  Do you
> *know* that you need a heap that large?  You only have 50 million
> documents total, two instances that each have 80GB seems completely
> unnecessary.  I would think that one instance with a much smaller heap
> would handle just about anything you could throw at 50 million documents.

> With 160GB taken by heaps, you're leaving less than 100GB of memory to
> cache over 700GB of index.  This is not going to work well, especially
> if your index doesn't have many fields that are stored.  It will cause a
> lot of disk I/O.

We have 27 collections, and each collection has many schema fields. In live, a very large number of search and index create/update requests come in, and most of the search requests involve sorting, faceting, grouping, and long queries.
On average, about 40GB of heap is used, so we gave the JVM 80GB of memory.

> Unless you have changed the DirectoryFactory to something that's not
> default, your process listing does not reflect over 700GB of index data.
> If you have changed the DirectoryFactory, then I would strongly
> recommend removing that part of your config and letting Solr use its
> default.

our directory in solrconfig.xml
<directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.MMapDirectoryFactory}">
</directoryFactory>

Here are our schema file, solrconfig XML, and GC log; please review them. Is anything wrong, or do you have any suggestions for improvement?
https://drive.google.com/drive/folders/1wV9bdQ5-pP4s4yc8jrYNz77YYVRmT7FG


GC log ::
2019-06-06T11:55:37.729+0100: 1053781.828: [GC (Allocation Failure) 1053781.828: [ParNew
Desired survivor size 3221205808 bytes, new threshold 8 (max 8)
- age   1:  268310312 bytes,  268310312 total
- age   2:  220271984 bytes,  488582296 total
- age   3:   75942632 bytes,  564524928 total
- age   4:   76397104 bytes,  640922032 total
- age   5:  126931768 bytes,  767853800 total
- age   6:   92672080 bytes,  860525880 total
- age   7:    2810048 bytes,  863335928 total
- age   8:   11755104 bytes,  875091032 total
: 15126407K->1103229K(17476288K), 15.7272287 secs] 45423308K->31414239K(80390848K), 15.7274518 secs] [Times: user=212.05 sys=16.08, real=15.73 secs]
Heap after GC invocations=68829 (full 187):
 par new generation   total 17476288K, used 1103229K [0x0000000080000000, 0x0000000580000000, 0x0000000580000000)
  eden space 13981056K,   0% used [0x0000000080000000, 0x0000000080000000, 0x00000003d5560000)
  from space 3495232K,  31% used [0x00000004aaab0000, 0x00000004ee00f508, 0x0000000580000000)
  to   space 3495232K,   0% used [0x00000003d5560000, 0x00000003d5560000, 0x00000004aaab0000)
 concurrent mark-sweep generation total 62914560K, used 30311010K [0x0000000580000000, 0x0000001480000000, 0x0000001480000000)
 Metaspace       used 50033K, capacity 50805K, committed 53700K, reserved 55296K
}
2019-06-06T11:55:53.456+0100: 1053797.556: Total time for which application threads were stopped: 42.4594545 seconds, Stopping threads took: 26.7301882 seconds

Why did the GC pause for 42 seconds?

There is heavy searching and indexing (create & update) in our SolrCloud.
So, should we divide the cloud among the 27 collections? Should we add one more shard?

________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Friday, June 7, 2019 9:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/6/2019 5:45 AM, vishal patel wrote:
> One server (256GB RAM) hosts the two Solr instances below, along with other applications:
> 1) shards1 (80GB heap, 790GB Storage, 449GB Indexed data)
> 2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)
>
> The second server (256GB RAM and 1 TB storage) hosts the two Solr instances below, along with other applications:
> 1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
> 2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)

An 80GB heap is ENORMOUS.  And you have two of those per server.  Do you
*know* that you need a heap that large?  You only have 50 million
documents total, two instances that each have 80GB seems completely
unnecessary.  I would think that one instance with a much smaller heap
would handle just about anything you could throw at 50 million documents.

With 160GB taken by heaps, you're leaving less than 100GB of memory to
cache over 700GB of index.  This is not going to work well, especially
if your index doesn't have many fields that are stored.  It will cause a
lot of disk I/O.

> Both server memory and disk usage:
> https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Unless you have changed the DirectoryFactory to something that's not
default, your process listing does not reflect over 700GB of index data.
  If you have changed the DirectoryFactory, then I would strongly
recommend removing that part of your config and letting Solr use its
default.

> Note: On average, about 40GB of heap is normally used in each Solr instance. When a replica goes down, disk I/O is high at that time and the GC pause time is above 15 seconds. We cannot tell from the logs the exact cause of the replica recovering or going down. Is it due to the GC pause? High disk I/O? A time-consuming query? Or heavy indexing?

With an 80GB heap, I'm not really surprised you're seeing GC pauses
above 15 seconds.  I have seen pauses that long with a heap that's only 8GB.

GC pauses lasting that long will cause problems with SolrCloud.  Nodes
going into recovery is common.

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/6/2019 5:45 AM, vishal patel wrote:
> One server (256GB RAM) hosts the two Solr instances below, along with other applications:
> 1) shards1 (80GB heap, 790GB Storage, 449GB Indexed data)
> 2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)
>
> The second server (256GB RAM and 1 TB storage) hosts the two Solr instances below, along with other applications:
> 1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
> 2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)

An 80GB heap is ENORMOUS.  And you have two of those per server.  Do you 
*know* that you need a heap that large?  You only have 50 million 
documents total, two instances that each have 80GB seems completely 
unnecessary.  I would think that one instance with a much smaller heap 
would handle just about anything you could throw at 50 million documents.
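
Purely as an illustration -- the actual number has to come from watching real GC graphs, not from this sketch -- a smaller heap is set in solr.in.sh along these lines:

# Sketch: start much lower than 80g and only grow the heap if GC logs show
# the old generation staying near capacity. 16g is just a placeholder.
SOLR_HEAP="16g"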

With 160GB taken by heaps, you're leaving less than 100GB of memory to 
cache over 700GB of index.  This is not going to work well, especially 
if your index doesn't have many fields that are stored.  It will cause a 
lot of disk I/O.

> Both server memory and disk usage:
> https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Unless you have changed the DirectoryFactory to something that's not 
default, your process listing does not reflect over 700GB of index data. 
  If you have changed the DirectoryFactory, then I would strongly 
recommend removing that part of your config and letting Solr use its 
default.

> Note: On average, about 40GB of heap is normally used in each Solr instance. When a replica goes down, disk I/O is high at that time and the GC pause time is above 15 seconds. We cannot tell from the logs the exact cause of the replica recovering or going down. Is it due to the GC pause? High disk I/O? A time-consuming query? Or heavy indexing?

With an 80GB heap, I'm not really surprised you're seeing GC pauses 
above 15 seconds.  I have seen pauses that long with a heap that's only 8GB.

GC pauses lasting that long will cause problems with SolrCloud.  Nodes 
going into recovery is common.

Thanks,
Shawn

Re: Re: Query takes a long time Solr 6.1.0

Posted by David Hastings <ha...@gmail.com>.
There isn't anything wrong aside from your query being poorly thought out.

On Fri, Jun 7, 2019 at 11:04 AM vishal patel <vi...@outlook.com>
wrote:

> Is anyone looking into my issue?
>
>
> ________________________________
> From: vishal patel
> Sent: Thursday, June 6, 2019 5:15:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query takes a long time Solr 6.1.0
>
> Thanks for your reply.
>
> > How much index data is on one server with 256GB of memory?  What is the
> > max heap size on the Solr instance?  Is there only one Solr instance?
>
> One server (256GB RAM) hosts the two Solr instances below, along with
> other applications:
> 1) shards1 (80GB heap, 790GB Storage, 449GB Indexed data)
> 2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)
>
> The second server (256GB RAM and 1 TB storage) hosts the two Solr instances
> below, along with other applications:
> 1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
> 2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)
>
> Both server memory and disk usage:
> https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5
>
> Note: On average, about 40GB of heap is normally used in each Solr instance.
> When a replica goes down, disk I/O is high at that time and the GC pause
> time is above 15 seconds. We cannot tell from the logs the exact cause of
> the replica recovering or going down. Is it due to the GC pause? High disk
> I/O? A time-consuming query? Or heavy indexing?
>
> Regards,
> Vishal
> ________________________________
> From: Shawn Heisey <ap...@elyograg.org>
> Sent: Wednesday, June 5, 2019 7:10 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query takes a long time Solr 6.1.0
>
> On 6/5/2019 7:08 AM, vishal patel wrote:
> > I have attached RAR file but not attached properly. Again attached txt
> file.
> >
> > For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> > and 1 TB storage. One shard and another shard replica in one server.
>
> You got lucky.  Even text files usually don't make it to the list --
> yours did this time.  Use a file sharing website in the future.
>
> That is a massive query.  The primary reason that Lucene defaults to a
> maxBooleanClauses value of 1024, which you are definitely exceeding
> here, is that queries with that many clauses tend to be slow and consume
> massive levels of resources.  It might not be possible to improve the
> query speed very much here if you cannot reduce the size of the query.
>
> Your query doesn't look like it is simple enough to replace with the
> terms query parser, which has better performance than a boolean query
> with thousands of "OR" clauses.
>
> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?
>
> The screenshot mentioned here will most likely relay all the info I am
> looking for.  Be sure the sort is correct:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
> You will not be able to successfully attach the screenshot to a message.
>   That will require a file sharing website.
>
> Thanks,
> Shawn
>

Fwd: Re: Query takes a long time Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Is anyone looking into my issue?


________________________________
From: vishal patel
Sent: Thursday, June 6, 2019 5:15:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

Thanks for your reply.

> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?

One server (256GB RAM) hosts the two Solr instances below, along with other applications:
1) shards1 (80GB heap, 790GB Storage, 449GB Indexed data)
2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)

The second server (256GB RAM and 1 TB storage) hosts the two Solr instances below, along with other applications:
1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Note: On average, about 40GB of heap is normally used in each Solr instance. When a replica goes down, disk I/O is high at that time and the GC pause time is above 15 seconds. We cannot tell from the logs the exact cause of the replica recovering or going down. Is it due to the GC pause? High disk I/O? A time-consuming query? Or heavy indexing?

Regards,
Vishal
________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Wednesday, June 5, 2019 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/5/2019 7:08 AM, vishal patel wrote:
> I have attached RAR file but not attached properly. Again attached txt file.
>
> For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> and 1 TB storage. One shard and another shard replica in one server.

You got lucky.  Even text files usually don't make it to the list --
yours did this time.  Use a file sharing website in the future.

That is a massive query.  The primary reason that Lucene defaults to a
maxBooleanClauses value of 1024, which you are definitely exceeding
here, is that queries with that many clauses tend to be slow and consume
massive levels of resources.  It might not be possible to improve the
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the
terms query parser, which has better performance than a boolean query
with thousands of "OR" clauses.

How much index data is on one server with 256GB of memory?  What is the
max heap size on the Solr instance?  Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am
looking for.  Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message.
  That will require a file sharing website.

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for your reply.

> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?

One server (256GB RAM) hosts the two Solr instances below, along with other applications:
1) shards1 (80GB heap, 790GB Storage, 449GB Indexed data)
2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)

The second server (256GB RAM and 1 TB storage) hosts the two Solr instances below, along with other applications:
1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Note: On average, about 40GB of heap is normally used in each Solr instance. When a replica goes down, disk I/O is high at that time and the GC pause time is above 15 seconds. We cannot tell from the logs the exact cause of the replica recovering or going down. Is it due to the GC pause? High disk I/O? A time-consuming query? Or heavy indexing?

Regards,
Vishal
________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Wednesday, June 5, 2019 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/5/2019 7:08 AM, vishal patel wrote:
> I have attached RAR file but not attached properly. Again attached txt file.
>
> For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> and 1 TB storage. One shard and another shard replica in one server.

You got lucky.  Even text files usually don't make it to the list --
yours did this time.  Use a file sharing website in the future.

That is a massive query.  The primary reason that Lucene defaults to a
maxBooleanClauses value of 1024, which you are definitely exceeding
here, is that queries with that many clauses tend to be slow and consume
massive levels of resources.  It might not be possible to improve the
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the
terms query parser, which has better performance than a boolean query
with thousands of "OR" clauses.

How much index data is on one server with 256GB of memory?  What is the
max heap size on the Solr instance?  Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am
looking for.  Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message.
  That will require a file sharing website.

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/5/2019 7:08 AM, vishal patel wrote:
> I have attached RAR file but not attached properly. Again attached txt file.
> 
> For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram 
> and 1 TB storage. One shard and another shard replica in one server.

You got lucky.  Even text files usually don't make it to the list -- 
yours did this time.  Use a file sharing website in the future.

That is a massive query.  The primary reason that Lucene defaults to a 
maxBooleanClauses value of 1024, which you are definitely exceeding 
here, is that queries with that many clauses tend to be slow and consume 
massive levels of resources.  It might not be possible to improve the 
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the 
terms query parser, which has better performance than a boolean query 
with thousands of "OR" clauses.
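
For the simpler pieces -- a flat OR list of values against a single field -- the terms parser looks roughly like this; the collection, field, and values here are made up purely for illustration:

# Hypothetical example: match any doc_id in a comma-separated list without
# building thousands of OR clauses.
curl "http://localhost:8983/solr/documents/select" \
  --data-urlencode "q={!terms f=doc_id}1001,1002,1003,1004" \
  --data-urlencode "rows=10" \
  --data-urlencode "wt=json"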

How much index data is on one server with 256GB of memory?  What is the 
max heap size on the Solr instance?  Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am 
looking for.  Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message. 
  That will require a file sharing website.

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
I attached a RAR file, but it did not come through properly. I have attached a txt file again.

For 2 shards and 2 replicas, we have 2 servers, each with 256 GB RAM and 1 TB storage. Each server hosts one shard and a replica of the other shard.

________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Wednesday, June 5, 2019 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/5/2019 5:35 AM, vishal patel wrote:
> We have 2 shards and 2 replicas in Live also have multiple collections.
> We are performing heavy search and update.

There is no information here about how many servers are serving those
four shard replicas.

> -> I have *attached* some query which takes time for executing. why does
> it take too much time? Due to the query length?

No attachments made it to the list.  Attachments rarely make it --
you'll need to find some other way to share content.

> -> Some times replica goes in recovery mode and from the log, we can not
> identify the issue but GC pause time 15 to 20 seconds. Ideally what
> should be GC pause time? GC pause time increase due to indexing or
> searching documents?

Individual GC pauses that are long are caused by having a large heap
that undergoes a full collection.  Long pauses from multiple collections
are typically caused by a heap that's too small.  When the heap is
properly sized and GC is tuned well, full collections will be very rare,
and the generation-specific collections will typically be very fast.

> My Solr live data :

This indicates that your total size for shard1 is almost 400 gigabytes,
and your total size for shard2 is almost 300 gigabytes.

If you have 400 or 700 GB of data on one server, then you will need a
SIGNIFICANT amount of memory in that server, with most of it NOT
allocated to the heap for Solr.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn

Re: Query takes a long time Solr 6.1.0

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/5/2019 5:35 AM, vishal patel wrote:
> We have 2 shards and 2 replicas in Live also have multiple collections. 
> We are performing heavy search and update.

There is no information here about how many servers are serving those 
four shard replicas.

> -> I have *attached* some query which takes time for executing. why does 
> it take too much time? Due to the query length?

No attachments made it to the list.  Attachments rarely make it -- 
you'll need to find some other way to share content.

> -> Some times replica goes in recovery mode and from the log, we can not 
> identify the issue but GC pause time 15 to 20 seconds. Ideally what 
> should be GC pause time? GC pause time increase due to indexing or 
> searching documents?

Individual GC pauses that are long are caused by having a large heap 
that undergoes a full collection.  Long pauses from multiple collections 
are typically caused by a heap that's too small.  When the heap is 
properly sized and GC is tuned well, full collections will be very rare, 
and the generation-specific collections will typically be very fast.
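
For reference, GC tuning on Solr 6.x usually means adjusting GC_TUNE in solr.in.sh. The CMS-style flags below are only a sketch of the kind of settings involved (similar in spirit to what bin/solr ships with), not a recommendation for your cluster:

# Sketch: example GC_TUNE override; every value must be validated against
# your own GC logs before relying on it.
GC_TUNE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=3"
GC_TUNE="$GC_TUNE -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=8"
GC_TUNE="$GC_TUNE -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50"
GC_TUNE="$GC_TUNE -XX:+CMSScavengeBeforeRemark -XX:+ParallelRefProcEnabled"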

> My Solr live data :

This indicates that your total size for shard1 is almost 400 gigabytes, 
and your total size for shard2 is almost 300 gigabytes.

If you have 400 or 700 GB of data on one server, then you will need a 
SIGNIFICANT amount of memory in that server, with most of it NOT 
allocated to the heap for Solr.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn