Posted to solr-user@lucene.apache.org by Modassar Ather <mo...@gmail.com> on 2015/11/02 07:30:54 UTC

Very high memory and CPU utilization.

Hi,

I have a setup of a 12-shard cluster started with 28gb of memory each on a
single server. There are no replicas. The size of the index is around 90gb
on each shard. The Solr version is 5.2.1.
When I query "network se*", the memory utilization goes up to 24-26 gb and
the query takes around 3+ minutes to execute. Also, the CPU utilization
goes up to 400% on a few of the nodes.

Kindly note that the use of the wildcard in the above query cannot be
restricted.

Please help me understand why the memory utilization is so high. Please
correct me if I am wrong that it is because of the term expansion of *se**.
Why is the CPU utilization so high, and why is more than one core used? As
far as I understand, querying is single-threaded.

Help me understand the behavior of the query timeout. How is the client
notified about the query timeout?
How can I disable replication (as it is implicitly enabled) permanently?
In our case we are not using it, but we can see warnings related to leader
election.

Thanks,
Modassar

Re: Very high memory and CPU utilization.

Posted by jim ferenczi <ji...@gmail.com>.
Well, it seems that doing q="network se*" is working, but not in the way you
expect. Doing q="network se*" would not trigger a prefix query; the "*"
character would be treated like any other character. I suspect that your
query is in fact "network se" (assuming you're using a StandardTokenizer)
and that the word "se" is very popular in your documents. That would
explain the slow response time. The bottom line is that doing "network se*"
will not trigger a prefix query at all (I may be wrong, but this is the
expected behaviour for Solr up to 4.3).
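
One way to verify how such a phrase is actually parsed is Solr's debug
output; a sketch of the request, with host, port, collection and field name
as placeholders:

    http://localhost:8983/solr/collection1/select?q=field:%22network+se*%22&debugQuery=true&wt=json

The parsedquery entry in the debug section shows whether the trailing *
survives analysis or is stripped by the tokenizer.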

Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
The problem is with the same query as a phrase: q="network se*".

The last "." is the full stop of the sentence; the query is
q=field:"network se*"

Best,
Modassar

Re: Very high memory and CPU utilization.

Posted by jim ferenczi <ji...@gmail.com>.
Oops, I did not read the thread carefully.
*The problem is with the same query as a phrase: q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected, because in such a case, if the number of expansions for "se*" is
big, then you have to check the positions for a significant number of
words. I don't know if there is a limit on the number of expansions for a
prefix query contained in a phrase query, but I would look at this first
(limit the number of expansions per prefix search to, let's say, the N most
significant words based on their frequency, for instance).
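
If wildcards inside phrases are genuinely required, the
ComplexPhraseQParserPlugin (available in Solr since 4.8) parses them as
real prefix queries; a sketch, assuming the same field name as above:

    q={!complexphrase inOrder=true}field:"network se*"

Note that this still expands the prefix, so a very common prefix such as
se* can remain expensive.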

Re: Very high memory and CPU utilization.

Posted by jim ferenczi <ji...@gmail.com>.
*I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*

Well, those 28GB of heap are the memory "reserved" for your Solr
application, though some parts of the index (not to say all of it) are
retrieved via MMap (if you use the default MMapDirectory), which does not
use the heap at all. This is a very important part of Lucene/Solr: the heap
should be sized in a way that leaves a significant amount of RAM available
for the index. If not, then you rely on the speed of your disk; if you have
SSDs it's better, but reads are still significantly slower with SSDs than
with direct RAM access. Another thing to keep in mind is that mmap will
always try to put things in RAM, which is why I suspect that swap activity
is killing your performance.
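
A quick way to see how much RAM is actually left for the OS disk cache (a
sketch, assuming a Linux host):

    free -g    # the 'cached' column approximates RAM available for the index
    top        # 'cached' in the summary header shows the same figure

If 'cached' is far below the index size, most reads will hit the disk.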

2015-11-02 11:55 GMT+01:00 Modassar Ather <mo...@gmail.com>:

> Thanks Jim for your response.
>
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
> I am not able to get the above point. So when I start Solr with 28g RAM,
> for all the activities related to Solr it should not go beyond 28g. And the
> remaining heap will be used for activities other than Solr. Please help me
> understand.
>
> *Also the CPU utilization goes up to 400% on a few of the nodes:*
> You said that only one machine is used, so I assumed that 400% cpu is for
> a single process (one solr node), right?
> Yes, you are right that 400% is for a single process.
> The disks are SSDs.
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <ji...@gmail.com> wrote:
>
> > *if it correlates with the bad performance you're seeing. One important
> > thing to notice is that a significant part of your index needs to be in
> > RAM (especially if you're using SSDs) in order to achieve good
> > performance.*
> >
> > Especially if you're not using SSDs, sorry ;)
> >
> > 2015-11-02 11:38 GMT+01:00 jim ferenczi <ji...@gmail.com>:
> >
> > > 12 shards with 28GB for the heap and 90GB for each index means that
> > > you need at least 336GB for the heap (assuming you're using all of it,
> > > which may easily be the case considering the way the GC is handling
> > > memory) and ~1TB for the index. Let's say that you don't need your
> > > entire index in RAM; the problem as I see it is that you don't have
> > > enough RAM for your index + heap. Assuming your machine has 370GB of
> > > RAM, there are only 34GB left for your index, and 1TB/34GB means that
> > > you can only have 1/30 of your entire index in RAM. I would advise you
> > > to check the swap activity on the machine and see if it correlates
> > > with the bad performance you're seeing. One important thing to notice
> > > is that a significant part of your index needs to be in RAM
> > > (especially if you're using SSDs) in order to achieve good
> > > performance:
> > >
> > > *As mentioned above this is a big machine with 370+ gb of RAM and Solr
> > > (12 nodes total) is assigned 336 GB. The rest is still good for other
> > > system activities.*
> > > The remaining size after you removed the heap usage should be reserved
> > > for the index (not only the other system activities).
> > >
> > > *Also the CPU utilization goes up to 400% on a few of the nodes:*
> > > You said that only one machine is used, so I assumed that 400% cpu is
> > > for a single process (one solr node), right?
> > > This seems impossible if you are sure that only one query is played at
> > > a time and no indexing is performed. The best thing to do is to dump
> > > stack traces of the Solr nodes during the query and check what the
> > > threads are doing.
> > >
> > > Jim
> > >
> > > 2015-11-02 10:38 GMT+01:00 Modassar Ather <mo...@gmail.com>:
> > >
> > > > Just to add one more point: one external Zookeeper instance is also
> > > > running on this particular machine.
> > > >
> > > > Regards,
> > > > Modassar
> > > >
> > > > On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather
> > > > <modather1981@gmail.com> wrote:
> > > >
> > > > > Hi Toke,
> > > > > Thanks for your response. My comments in-line.
> > > > >
> > > > > That is 12 machines, running a shard each?
> > > > > No! This is a single big machine with 12 shards on it.
> > > > >
> > > > > What is the total amount of physical memory on each machine?
> > > > > Around 370 gb on the single machine.
> > > > >
> > > > > Well, se* probably expands to a great deal of documents, but a
> > > > > huge bump in memory utilization and 3 minutes+ sounds strange.
> > > > >
> > > > > - What are your normal query times?
> > > > > A few simple queries return within a couple of seconds. But the
> > > > > more complex queries with proximity and wildcards have taken more
> > > > > than 3-4 minutes, and sometimes some queries have timed out too,
> > > > > where the timeout is set to 5 minutes.
> > > > > - How many hits do you get from 'network se*'?
> > > > > More than a million records.
> > > > > - How many results do you return (the rows-parameter)?
> > > > > It is the default one, 10. Grouping is enabled on a field.
> > > > > - If you issue a query without wildcards, but with approximately
> > > > > the same amount of hits as 'network se*', how long does it take?
> > > > > A query resulting in around half a million records returns within
> > > > > a couple of seconds.
> > > > >
> > > > > That is strange, yes. Have you checked the logs to see if
> > > > > something unexpected is going on while you test?
> > > > > Have not seen anything in particular. Will try to check again.
> > > > >
> > > > > If you are using spinning drives and only have 32GB of RAM in
> > > > > total in each machine, you are probably struggling just to keep
> > > > > things running.
> > > > > As mentioned above this is a big machine with 370+ gb of RAM and
> > > > > Solr (12 nodes total) is assigned 336 GB. The rest is still good
> > > > > for other system activities.
> > > > >
> > > > > Thanks,
> > > > > Modassar
> > > > >
> > > > > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen
> > > > > <te@statsbiblioteket.dk> wrote:
> > > > >
> > > > > > On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> > > > > > > I have a setup of a 12-shard cluster started with 28gb of
> > > > > > > memory each on a single server. There are no replicas. The
> > > > > > > size of the index is around 90gb on each shard. The Solr
> > > > > > > version is 5.2.1.
> > > > > >
> > > > > > That is 12 machines, running a shard each?
> > > > > >
> > > > > > What is the total amount of physical memory on each machine?
> > > > > >
> > > > > > > When I query "network se*", the memory utilization goes up to
> > > > > > > 24-26 gb and the query takes around 3+ minutes to execute.
> > > > > > > Also, the CPU utilization goes up to 400% on a few of the
> > > > > > > nodes.
> > > > > >
> > > > > > Well, se* probably expands to a great deal of documents, but a
> > > > > > huge bump in memory utilization and 3 minutes+ sounds strange.
> > > > > >
> > > > > > - What are your normal query times?
> > > > > > - How many hits do you get from 'network se*'?
> > > > > > - How many results do you return (the rows-parameter)?
> > > > > > - If you issue a query without wildcards, but with approximately
> > > > > > the same amount of hits as 'network se*', how long does it take?
> > > > > >
> > > > > > > Why is the CPU utilization so high, and why is more than one
> > > > > > > core used? As far as I understand, querying is
> > > > > > > single-threaded.
> > > > > >
> > > > > > That is strange, yes. Have you checked the logs to see if
> > > > > > something unexpected is going on while you test?
> > > > > >
> > > > > > > How can I disable replication (as it is implicitly enabled)
> > > > > > > permanently? In our case we are not using it, but we can see
> > > > > > > warnings related to leader election.
> > > > > >
> > > > > > If you are using spinning drives and only have 32GB of RAM in
> > > > > > total in each machine, you are probably struggling just to keep
> > > > > > things running.
> > > > > >
> > > > > > - Toke Eskildsen, State and University Library, Denmark

Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
I monitored swap activity for the query using vmstat. The *so* and *si*
columns show 0 until the completion of the query. Also, top showed 0
against swap. This means there was no scarcity of physical memory; swap
activity does not seem to be a bottleneck.
Kindly note that I ran this on an 8-node cluster with 30 gb of RAM and 140
gb of index on each node.
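
For reference, a typical invocation for this kind of check (assuming Linux;
the 5-second interval is arbitrary):

    vmstat 5    # the 'si' (swap-in) and 'so' (swap-out) columns should stay at 0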

Regards,
Modassar

On Mon, Nov 2, 2015 at 5:27 PM, Modassar Ather <mo...@gmail.com>
wrote:

> Okay. I guess your observation of 400% for a single core is with top and
> looking at that core's entry? If so, the 400% can be explained by
> excessive garbage collection. You could turn GC logging on to check
> that. With a bit of luck, GC would be the cause of the slowdown.
>
> Yes, it is with the top command. I will check GC activity and try to
> relate it to CPU usage.
>
> The query q=network se* is quick enough in our system too. It takes around
> 3-4 seconds for around 8 million records.
> The problem is with the same query as a phrase: q="network se*".
> Can you please share your experience with such queries, where the wildcard
> expansion is huge like in the query above?
>
> I changed my SolrCloud setup from 12 shards to 8 shards and gave each
> shard 30 GB of RAM on the same machine with the same index size
> (re-indexed), but could not see a significant improvement for the given
> query.
>
> I will check the swap activity.
>
> Also, can you please share your experiences with respect to RAM, GC, Solr
> cache setup, etc., as it seems from your comment that the SolrCloud
> environment you have is similar to the one I work on?
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 5:20 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
>
>> On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
>> > The remaining size after you removed the heap usage should be reserved
>> > for the index (not only the other system activities).
>> > I am not able to get the above point. So when I start Solr with 28g
>> > RAM, for all the activities related to Solr it should not go beyond
>> > 28g. And the remaining heap will be used for activities other than
>> > Solr. Please help me understand.
>>
>> It is described here:
>> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>>
>> I will be quick to add that I do not agree with Shawn (the primary
>> author of the page) on the stated limits and find that the page in
>> general ignores that performance requirements differ a great deal.
>> Nevertheless, it is very true that Solr performance is tied to the
>> amount of OS disk cache:
>>
>> You can have a machine with 10TB of RAM, but Solr performance will still
>> be poor if you use it all for JVMs.
>>
>> Practically all modern operating systems use free memory for disk cache.
>> Free memory is the memory not used for JVMs or other programs. It might
>> be that you have a lot less than 30-40GB free: If you are on a Linux
>> server, try calling 'top' and see what it says under 'cached'.
>>
>> Related, I support jim's suggestion to inspect the swap activity:
>> In the past we had problems with a machine that insisted on swapping
>> excessively, although there was high IO and free memory.
>>
>> > The disks are SSDs.
>>
>> That makes your observations stranger still.
>>
>>
>> - Toke Eskildsen, State and University Library, Denmark

Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2015-11-02 at 14:17 +0100, Toke Eskildsen wrote:
> http://rosalind:52300/solr/collection1/select?q=%22der+se*%
> 22&wt=json&indent=true&facet=false&group=true&group.field=domain
> 
> gets expanded to
> 
> "parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
> svane* | description:\"kan svane\")) ())/no_coord"

Wrong copy-paste, sorry. The correct expansion of "der se*" is

"rawquerystring": "\"der se*\"",

"querystring": "\"der se*\"",

"parsedquery": "(+DisjunctionMaxQuery((content_text:se | author:der se*
| text:se | title:se | url:der se* | description:se)) ())/no_coord",

"parsedquery_toString": "+(content_text:se | author:der se* | text:se |
title:se | url:der se* | description:se) ()",

"QParser": "ExtendedDismaxQParser",



This supports jim's claim that "foo bar*" is probably not doing what you
(Modassar) think it is doing.


- Toke Eskildsen, State and University Library, Denmark



Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2015-11-03 at 11:09 +0530, Modassar Ather wrote:
> It is around 90GB of index (around 8 million documents) on one shard and
> there are 12 such shards. As per my understanding the sharding is required
> for this case. Please help me understand if it is not required.

Except for an internal limit of 2 billion documents/shard (or 2 billion
unique values in a field in a single shard), there are no requirements
as such.

Our shards are 900GB / 200M+ documents and work well for our use case,
but it all depends on what you are doing. Your heaps are quite large
already, so merging into a single shard would probably require a heap so
large that you would run into trouble with garbage collection.


Your problem seems to be query processing speed. If your machine is not
maxed out by many concurrent requests, sharding should help you there:
As you have noticed, it allows the search to take advantage of multiple
processors.


- Toke Eskildsen, State and University Library, Denmark



Re: Very high memory and CPU utilization.

Posted by Walter Underwood <wu...@wunderwood.org>.
One rule of thumb for Solr is to shard after you reach 100 million documents. With large documents, you might want to shard sooner.

We are running an unsharded index of 7 million documents (55GB) without problems.

The EdgeNgramFilter generates a set of prefix terms for each term in the document. For the term “secondary”, it would generate:

s
se
sec
seco
secon
second
seconda
secondar
secondary

Obviously, this makes the index larger. But it makes prefix match a simple lookup, without needing wildcards.
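
A minimal sketch of what such an index-time analysis chain could look like
in schema.xml (the field type name and gram sizes here are hypothetical):

    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emits s, se, sec, ..., secondary for each term -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <!-- no n-gram filter at query time: the typed prefix matches a stored gram directly -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>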

Again, we can help you more if you describe what you are trying to do.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
Thanks Walter for your response,

It is around 90GB of index (around 8 million documents) on one shard and
there are 12 such shards. As per my understanding the sharding is required
for this case. Please help me understand if it is not required.

We have requirements where we need to provide full wildcard support to our
users.
I will try using EdgeNgramFilter. Can you please help me understand if
EdgeNgramFilter can be a replacement for wildcards?
There are situations where the words may be extended with some special
characters, e.g. for se* there can be a match like "secondary-school",
which also needs to be considered.

Regards,
Modassar



Re: Very high memory and CPU utilization.

Posted by Walter Underwood <wu...@wunderwood.org>.
To back up a bit, how many documents are in this 90GB index? You might not need to shard at all.

Why are you sending a query with a trailing wildcard? Are you matching the prefix of words, for query completion? If so, look at the suggester, which is designed to solve exactly that. Or you can use the EdgeNgramFilter to index prefixes. That will make your index larger, but prefix searches will be very fast.
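
For the suggester route, a minimal sketch of the solrconfig.xml wiring
(suggester, handler and field names here are hypothetical):

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <str name="suggestAnalyzerFieldType">text_general</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">mySuggester</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components"><str>suggest</str></arr>
    </requestHandler>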

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:

> The query q=network se* is quick enough in our system too. It takes
> around 3-4 seconds for around 8 million records.
> 
> The problem is with the same query as a phrase: q="network se*".

I misunderstood your query then. I tried replicating it with
q="der se*"

http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain

gets expanded to

"parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"

The result was 1,043,258,271 hits in 15,211 ms


Interestingly enough, a search for 
q="kan svane*"
resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
billion+ documents. On that note,
q=se*
resulted in -951812427 hits in 194,276 ms.

Now this is interesting. The negative number seems to be caused by
grouping (3,343,154,869 - 2^32 = -951,812,427, so the grouped hit count
overflows a signed 32-bit integer), but I finally got the response time
up in the minutes. Still no memory problems though. Hits without
grouping were 3,343,154,869.

For comparison,
q=http
resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
count was 7,062,516,538. Twice the hits of 'se*' in half the time.

> I changed my SolrCloud setup from 12 shards to 8 shards and gave each
> shard 30 GB of RAM on the same machine with the same index size
> (re-indexed), but could not see a significant improvement for the
> given query.

Strange. I would have expected the extra free memory for disk cache to
help performance.

> Also, can you please share your experience with respect to RAM, GC,
> Solr cache setup etc., as it seems from your comments that the SolrCloud
> environment you have is similar to the one I work on?
> 
There is a short write up at
https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen, State and University Library, Denmark




Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check
that. With a bit of luck GC would be the cause of the slowdown.

Yes, it is with the top command. I will check GC activity and try to
relate it to the CPU usage.

The query q=network se* is quick enough in our system too. It takes around
3-4 seconds for around 8 million records.
The problem is with the same query as a phrase: q="network se*".
Can you please share your experience with such queries, where the wildcard
expansion is huge like in the query above?
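
(For what it's worth: since Solr 4.8 the ComplexPhraseQParserPlugin supports
wildcards inside phrase queries explicitly, e.g.

  q={!complexphrase inOrder=true}field:"network se*"

where "field" is a placeholder for the actual field name. It still has to
expand se*, so it may be just as heavy, but at least the semantics are well
defined.)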

I changed my SolrCloud setup from 12 shards to 8 shards and gave each shard
30 GB of RAM on the same machine with the same index size (re-indexed) but
could not see a significant improvement for the given query.

I will check the swap activity.

Also, can you please share your experience with respect to RAM, GC, Solr
cache setup etc., as it seems from your comments that the SolrCloud
environment you have is similar to the one I work on?

Regards,
Modassar

On Mon, Nov 2, 2015 at 5:20 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
> > The remaining size after you removed the heap usage should be reserved
> > for the index (not only the other system activities).
> > I am not able to get the above point. So when I start Solr with a 28g
> > heap, all the activities related to Solr should not go beyond 28g, and
> > the remaining memory will be used for activities other than Solr.
> > Please help me understand.
>
> It is described here:
> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> I will be quick to add that I do not agree with Shawn (the primary
> author of the page) on the stated limits and find that the page in
> general ignores that performance requirements differ a great deal.
> Nevertheless, it is very true that Solr performance is tied to the
> amount of OS disk cache:
>
> You can have a machine with 10TB of RAM, but Solr performance will still
> be poor if you use it all for JVMs.
>
> Practically all modern operating systems use free memory for disk cache.
> Free memory is the memory not used for JVMs or other programs. It might
> be that you have a lot less than 30-40GB free: If you are on a Linux
> server, try calling 'top' and see what it says under 'cached'.
>
> Related, I support jim's suggestion to inspect the swap activity:
> In the past we had a problem with a machine that insisted on swapping
> excessively, causing high IO, even though there was free memory.
>
> > The disks are SSDs.
>
> That makes your observations stranger still.
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>

Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
> I am not able to get the above point. So when I start Solr with a 28g
> heap, all the activities related to Solr should not go beyond 28g, and the
> remaining memory will be used for activities other than Solr. Please help
> me understand.

It is described here:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

I will be quick to add that I do not agree with Shawn (the primary
author of the page) on the stated limits and find that the page in
general ignores that performance requirements differ a great deal.
Nevertheless, it is very true that Solr performance is tied to the
amount of OS disk cache:

You can have a machine with 10TB of RAM, but Solr performance will still
be poor if you use it all for JVMs.

Practically all modern operating systems use free memory for disk cache.
Free memory is the memory not used for JVMs or other programs. It might
be that you have a lot less than 30-40GB free: If you are on a Linux
server, try calling 'top' and see what it says under 'cached'.

Related, I support jim's suggestion to inspect the swap activity:
In the past we had a problem with a machine that insisted on swapping
excessively, causing high IO, even though there was free memory.
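
On a Linux box the quick checks would be something like the following (the
exact column names vary a bit between procps versions):

  free -h     # the 'cached' / 'buff/cache' column is the OS disk cache
  vmstat 1    # the 'si' and 'so' columns show swap-in/swap-out per second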

> The disks are SSDs.

That makes your observations stranger still.


- Toke Eskildsen, State and University Library, Denmark



Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
Thanks Jim for your response.

The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with a 28g
heap, all the activities related to Solr should not go beyond 28g, and the
remaining memory will be used for activities other than Solr. Please help
me understand.

*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used so I assumed that 400% cpu is for a
single process (one solr node), right?
Yes, you are right that 400% is for a single process.
The disks are SSDs.

Regards,
Modassar

On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <ji...@gmail.com> wrote:

> *if it correlates with the bad performance you're seeing. One important
> thing to notice is that a significant part of your index needs to be in RAM
> (especially if you're using SSDs) in order to achieve good performance.*
>
> Especially if you're not using SSDs, sorry ;)
>
> 2015-11-02 11:38 GMT+01:00 jim ferenczi <ji...@gmail.com>:
>
> > 12 shards with 28GB for the heap and 90GB for each index means that you
> > need at least 336GB for the heap (assuming you're using all of it, which
> > may easily be the case considering the way the GC is handling memory)
> > and ~1TB for the index. Even if you don't need your entire index in RAM,
> > the problem as I see it is that you don't have enough RAM for your index
> > + heap. Assuming your machine has 370GB of RAM, there are only 34GB left
> > for your index; 1TB/34GB means that you can only have about 1/30 of your
> > entire index in RAM. I would advise you to check the swap activity on
> > the machine and see if it correlates with the bad performance you're
> > seeing. One important thing to notice is that a significant part of your
> > index needs to be in RAM (especially if you're using SSDs) in order to
> > achieve good performance:
> >
> >
> >
> > *As mentioned above this is a big machine with 370+ gb of RAM and Solr
> > (12 nodes total) is assigned 336 GB. The rest is still good for other
> > system activities.*
> > The remaining size after you removed the heap usage should be reserved
> for
> > the index (not only the other system activities).
> >
> >
> > *Also the CPU utilization goes up to 400% in a few of the nodes:*
> > You said that only one machine is used so I assumed that 400% cpu is
> > for a single process (one solr node), right?
> > This seems impossible if you are sure that only one query is executed
> > at a time and no indexing is performed. The best thing to do is to dump
> > stack traces of the solr nodes during the query and check what the
> > threads are doing.
> >
> > Jim
> >
> >
> >
> > 2015-11-02 10:38 GMT+01:00 Modassar Ather <mo...@gmail.com>:
> >
> >> Just to add one more point: one external ZooKeeper instance is also
> >> running on this particular machine.
> >>
> >> Regards,
> >> Modassar
> >>
> >> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <mo...@gmail.com>
> >> wrote:
> >>
> >> > Hi Toke,
> >> > Thanks for your response. My comments in-line.
> >> >
> >> > That is 12 machines, running a shard each?
> >> > No! This is a single big machine with 12 shards on it.
> >> >
> >> > What is the total amount of physical memory on each machine?
> >> > Around 370 gb on the single machine.
> >> >
> >> > Well, se* probably expands to a great deal of documents, but a huge
> bump
> >> > in memory utilization and 3 minutes+ sounds strange.
> >> >
> >> > - What are your normal query times?
> >> > A few simple queries return within a couple of seconds. But the more
> >> > complex queries with proximity and wildcards have taken more than 3-4
> >> > minutes, and sometimes queries have timed out too, where the timeout
> >> > is set to 5 minutes.
> >> > - How many hits do you get from 'network se*'?
> >> > More than a million records.
> >> > - How many results do you return (the rows-parameter)?
> >> > It is the default one, 10. Grouping is enabled on a field.
> >> > - If you issue a query without wildcards, but with approximately the
> >> > same amount of hits as 'network se*', how long does it take?
> >> > A query resulting in around half a million records returns within a
> >> > couple of seconds.
> >> >
> >> > That is strange, yes. Have you checked the logs to see if something
> >> > unexpected is going on while you test?
> >> > Have not seen anything in particular. Will try to check again.
> >> >
> >> > If you are using spinning drives and only have 32GB of RAM in total in
> >> > each machine, you are probably struggling just to keep things running.
> >> > As mentioned above this is a big machine with 370+ gb of RAM and Solr
> >> > (12 nodes total) is assigned 336 GB. The rest is still good for other
> >> > system activities.
> >> >
> >> > Thanks,
> >> > Modassar
> >> >
> >> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <
> te@statsbiblioteket.dk>
> >> > wrote:
> >> >
> >> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> >> >> > I have a setup of 12 shard cluster started with 28gb memory each
> on a
> >> >> > single server. There are no replica. The size of index is around
> >> 90gb on
> >> >> > each shard. The Solr version is 5.2.1.
> >> >>
> >> >> That is 12 machines, running a shard each?
> >> >>
> >> >> What is the total amount of physical memory on each machine?
> >> >>
> >> >> > When I query "network se*", the memory utilization goes upto 24-26
> gb
> >> >> and
> >> >> > the query takes around 3+ minutes to execute. Also the CPU
> >> utilization
> >> >> goes
> >> >> > upto 400% in few of the nodes.
> >> >>
> >> >> Well, se* probably expands to a great deal of documents, but a huge
> >> bump
> >> >> in memory utilization and 3 minutes+ sounds strange.
> >> >>
> >> >> - What are your normal query times?
> >> >> - How many hits do you get from 'network se*'?
> >> >> - How many results do you return (the rows-parameter)?
> >> >> - If you issue a query without wildcards, but with approximately the
> >> >> same amount of hits as 'network se*', how long does it take?
> >> >>
> >> >> > Why the CPU utilization is so high and more than one core is used.
> >> >> > As far as I understand querying is single threaded.
> >> >>
> >> >> That is strange, yes. Have you checked the logs to see if something
> >> >> unexpected is going on while you test?
> >> >>
> >> >> > How can I disable replication(as it is implicitly enabled)
> >> permanently
> >> >> as
> >> >> > in our case we are not using it but can see warnings related to
> >> leader
> >> >> > election?
> >> >>
> >> >> If you are using spinning drives and only have 32GB of RAM in total
> in
> >> >> each machine, you are probably struggling just to keep things
> running.
> >> >>
> >> >>
> >> >> - Toke Eskildsen, State and University Library, Denmark
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >
> >
>

Re: Very high memory and CPU utilization.

Posted by jim ferenczi <ji...@gmail.com>.
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*

Especially if you're not using SSDs, sorry ;)

2015-11-02 11:38 GMT+01:00 jim ferenczi <ji...@gmail.com>:

> 12 shards with 28GB for the heap and 90GB for each index means that you
> need at least 336GB for the heap (assuming you're using all of it, which
> may easily be the case considering the way the GC is handling memory) and
> ~1TB for the index. Even if you don't need your entire index in RAM, the
> problem as I see it is that you don't have enough RAM for your index +
> heap. Assuming your machine has 370GB of RAM, there are only 34GB left for
> your index; 1TB/34GB means that you can only have about 1/30 of your
> entire index in RAM. I would advise you to check the swap activity on the
> machine and
> see if it correlates with the bad performance you're seeing. One important
> thing to notice is that a significant part of your index needs to be in RAM
> (especially if you're using SSDs) in order to achieve good performance:
>
>
>
> *As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
> nodes total) is assigned 336 GB. The rest is still good for other system
> activities.*
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
>
>
> *Also the CPU utilization goes up to 400% in a few of the nodes:*
> You said that only one machine is used so I assumed that 400% cpu is for a
> single process (one solr node), right?
> This seems impossible if you are sure that only one query is executed at a
> time and no indexing is performed. The best thing to do is to dump stack
> traces of the solr nodes during the query and check what the threads are
> doing.
>
> Jim
>
>
>
> 2015-11-02 10:38 GMT+01:00 Modassar Ather <mo...@gmail.com>:
>
>> Just to add one more point: one external ZooKeeper instance is also
>> running on this particular machine.
>>
>> Regards,
>> Modassar
>>
>> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <mo...@gmail.com>
>> wrote:
>>
>> > Hi Toke,
>> > Thanks for your response. My comments in-line.
>> >
>> > That is 12 machines, running a shard each?
>> > No! This is a single big machine with 12 shards on it.
>> >
>> > What is the total amount of physical memory on each machine?
>> > Around 370 gb on the single machine.
>> >
>> > Well, se* probably expands to a great deal of documents, but a huge bump
>> > in memory utilization and 3 minutes+ sounds strange.
>> >
>> > - What are your normal query times?
>> > A few simple queries return within a couple of seconds. But the more
>> > complex queries with proximity and wildcards have taken more than 3-4
>> > minutes, and sometimes queries have timed out too, where the timeout is
>> > set to 5 minutes.
>> > - How many hits do you get from 'network se*'?
>> > More than a million records.
>> > - How many results do you return (the rows-parameter)?
>> > It is the default one, 10. Grouping is enabled on a field.
>> > - If you issue a query without wildcards, but with approximately the
>> > same amount of hits as 'network se*', how long does it take?
>> > A query resulting in around half a million records returns within a
>> > couple of seconds.
>> >
>> > That is strange, yes. Have you checked the logs to see if something
>> > unexpected is going on while you test?
>> > Have not seen anything in particular. Will try to check again.
>> >
>> > If you are using spinning drives and only have 32GB of RAM in total in
>> > each machine, you are probably struggling just to keep things running.
>> > As mentioned above this is a big machine with 370+ gb of RAM and Solr
>> > (12 nodes total) is assigned 336 GB. The rest is still good for other
>> > system activities.
>> >
>> > Thanks,
>> > Modassar
>> >
>> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
>> > wrote:
>> >
>> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
>> >> > I have a setup of 12 shard cluster started with 28gb memory each on a
>> >> > single server. There are no replica. The size of index is around
>> 90gb on
>> >> > each shard. The Solr version is 5.2.1.
>> >>
>> >> That is 12 machines, running a shard each?
>> >>
>> >> What is the total amount of physical memory on each machine?
>> >>
>> >> > When I query "network se*", the memory utilization goes upto 24-26 gb
>> >> and
>> >> > the query takes around 3+ minutes to execute. Also the CPU
>> utilization
>> >> goes
>> >> > upto 400% in few of the nodes.
>> >>
>> >> Well, se* probably expands to a great deal of documents, but a huge
>> bump
>> >> in memory utilization and 3 minutes+ sounds strange.
>> >>
>> >> - What are your normal query times?
>> >> - How many hits do you get from 'network se*'?
>> >> - How many results do you return (the rows-parameter)?
>> >> - If you issue a query without wildcards, but with approximately the
>> >> same amount of hits as 'network se*', how long does it take?
>> >>
>> >> > Why the CPU utilization is so high and more than one core is used.
>> >> > As far as I understand querying is single threaded.
>> >>
>> >> That is strange, yes. Have you checked the logs to see if something
>> >> unexpected is going on while you test?
>> >>
>> >> > How can I disable replication(as it is implicitly enabled)
>> permanently
>> >> as
>> >> > in our case we are not using it but can see warnings related to
>> leader
>> >> > election?
>> >>
>> >> If you are using spinning drives and only have 32GB of RAM in total in
>> >> each machine, you are probably struggling just to keep things running.
>> >>
>> >>
>> >> - Toke Eskildsen, State and University Library, Denmark
>> >>
>> >>
>> >>
>> >
>>
>
>

Re: Very high memory and CPU utilization.

Posted by jim ferenczi <ji...@gmail.com>.
12 shards with 28GB for the heap and 90GB for each index means that you
need at least 336GB for the heap (assuming you're using all of it, which
may easily be the case considering the way the GC is handling memory) and
~1TB for the index. Even if you don't need your entire index in RAM, the
problem as I see it is that you don't have enough RAM for your index +
heap. Assuming your machine has 370GB of RAM, there are only 34GB left for
your index; 1TB/34GB means that you can only have about 1/30 of your
entire index in RAM. I would advise you to check the swap activity on the
machine and
see if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance:



*As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
nodes total) is assigned 336 GB. The rest is still good for other system
activities.*
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).


*Also the CPU utilization goes up to 400% in a few of the nodes:*
You said that only one machine is used so I assumed that 400% cpu is for a
single process (one solr node), right?
This seems impossible if you are sure that only one query is executed at a
time and no indexing is performed. The best thing to do is to dump stack
traces of the solr nodes during the query and check what the threads are
doing.
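
A sketch of how to capture that, assuming a JDK is available on the server
(the pid and output path are examples):

  jps -l                                         # find the pid of each Solr node
  jstack -l <solr_pid> > /tmp/solr-threads.txt   # repeat a few times while the slow query runs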

Jim



2015-11-02 10:38 GMT+01:00 Modassar Ather <mo...@gmail.com>:

> Just to add one more point: one external ZooKeeper instance is also
> running on this particular machine.
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <mo...@gmail.com>
> wrote:
>
> > Hi Toke,
> > Thanks for your response. My comments in-line.
> >
> > That is 12 machines, running a shard each?
> > No! This is a single big machine with 12 shards on it.
> >
> > What is the total amount of physical memory on each machine?
> > Around 370 gb on the single machine.
> >
> > Well, se* probably expands to a great deal of documents, but a huge bump
> > in memory utilization and 3 minutes+ sounds strange.
> >
> > - What are your normal query times?
> > A few simple queries return within a couple of seconds. But the more
> > complex queries with proximity and wildcards have taken more than 3-4
> > minutes, and sometimes queries have timed out too, where the timeout is
> > set to 5 minutes.
> > - How many hits do you get from 'network se*'?
> > More than a million records.
> > - How many results do you return (the rows-parameter)?
> > It is the default one, 10. Grouping is enabled on a field.
> > - If you issue a query without wildcards, but with approximately the
> > same amount of hits as 'network se*', how long does it take?
> > A query resulting in around half a million records returns within a
> > couple of seconds.
> >
> > That is strange, yes. Have you checked the logs to see if something
> > unexpected is going on while you test?
> > Have not seen anything in particular. Will try to check again.
> >
> > If you are using spinning drives and only have 32GB of RAM in total in
> > each machine, you are probably struggling just to keep things running.
> > As mentioned above this is a big machine with 370+ gb of RAM and Solr
> > (12 nodes total) is assigned 336 GB. The rest is still good for other
> > system activities.
> >
> > Thanks,
> > Modassar
> >
> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> > wrote:
> >
> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> >> > I have a setup of 12 shard cluster started with 28gb memory each on a
> >> > single server. There are no replica. The size of index is around 90gb
> on
> >> > each shard. The Solr version is 5.2.1.
> >>
> >> That is 12 machines, running a shard each?
> >>
> >> What is the total amount of physical memory on each machine?
> >>
> >> > When I query "network se*", the memory utilization goes upto 24-26 gb
> >> and
> >> > the query takes around 3+ minutes to execute. Also the CPU utilization
> >> goes
> >> > upto 400% in few of the nodes.
> >>
> >> Well, se* probably expands to a great deal of documents, but a huge bump
> >> in memory utilization and 3 minutes+ sounds strange.
> >>
> >> - What are your normal query times?
> >> - How many hits do you get from 'network se*'?
> >> - How many results do you return (the rows-parameter)?
> >> - If you issue a query without wildcards, but with approximately the
> >> same amount of hits as 'network se*', how long does it take?
> >>
> >> > Why the CPU utilization is so high and more than one core is used.
> >> > As far as I understand querying is single threaded.
> >>
> >> That is strange, yes. Have you checked the logs to see if something
> >> unexpected is going on while you test?
> >>
> >> > How can I disable replication(as it is implicitly enabled) permanently
> >> as
> >> > in our case we are not using it but can see warnings related to leader
> >> > election?
> >>
> >> If you are using spinning drives and only have 32GB of RAM in total in
> >> each machine, you are probably struggling just to keep things running.
> >>
> >>
> >> - Toke Eskildsen, State and University Library, Denmark
> >>
> >>
> >>
> >
>

Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
Just to add one more point: one external ZooKeeper instance is also
running on this particular machine.

Regards,
Modassar

On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <mo...@gmail.com>
wrote:

> Hi Toke,
> Thanks for your response. My comments in-line.
>
> That is 12 machines, running a shard each?
> No! This is a single big machine with 12 shards on it.
>
> What is the total amount of physical memory on each machine?
> Around 370 gb on the single machine.
>
> Well, se* probably expands to a great deal of documents, but a huge bump
> in memory utilization and 3 minutes+ sounds strange.
>
> - What are your normal query times?
> A few simple queries return within a couple of seconds. But the more
> complex queries with proximity and wildcards have taken more than 3-4
> minutes, and sometimes queries have timed out too, where the timeout is
> set to 5 minutes.
> - How many hits do you get from 'network se*'?
> More than a million records.
> - How many results do you return (the rows-parameter)?
> It is the default one, 10. Grouping is enabled on a field.
> - If you issue a query without wildcards, but with approximately the
> same amount of hits as 'network se*', how long does it take?
> A query resulting in around half a million records returns within a
> couple of seconds.
>
> That is strange, yes. Have you checked the logs to see if something
> unexpected is going on while you test?
> Have not seen anything in particular. Will try to check again.
>
> If you are using spinning drives and only have 32GB of RAM in total in
> each machine, you are probably struggling just to keep things running.
> As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
> nodes total) is assigned 336 GB. The rest is still good for other system
> activities.
>
> Thanks,
> Modassar
>
> On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
>
>> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
>> > I have a setup of 12 shard cluster started with 28gb memory each on a
>> > single server. There are no replica. The size of index is around 90gb on
>> > each shard. The Solr version is 5.2.1.
>>
>> That is 12 machines, running a shard each?
>>
>> What is the total amount of physical memory on each machine?
>>
>> > When I query "network se*", the memory utilization goes upto 24-26 gb
>> and
>> > the query takes around 3+ minutes to execute. Also the CPU utilization
>> goes
>> > upto 400% in few of the nodes.
>>
>> Well, se* probably expands to a great deal of documents, but a huge bump
>> in memory utilization and 3 minutes+ sounds strange.
>>
>> - What are your normal query times?
>> - How many hits do you get from 'network se*'?
>> - How many results do you return (the rows-parameter)?
>> - If you issue a query without wildcards, but with approximately the
>> same amount of hits as 'network se*', how long does it take?
>>
>> > Why the CPU utilization is so high and more than one core is used.
>> > As far as I understand querying is single threaded.
>>
>> That is strange, yes. Have you checked the logs to see if something
>> unexpected is going on while you test?
>>
>> > How can I disable replication(as it is implicitly enabled) permanently
>> as
>> > in our case we are not using it but can see warnings related to leader
>> > election?
>>
>> If you are using spinning drives and only have 32GB of RAM in total in
>> each machine, you are probably struggling just to keep things running.
>>
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>

Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2015-11-02 at 14:34 +0530, Modassar Ather wrote:

> No! This is a single big machine with 12 shards on it.
> Around 370 gb on the single machine.

Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check
that. With a bit of luck GC would be the cause of the slowdown.
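
If it is not already enabled by your start script, GC logging can be turned
on with the standard JVM flags, for example (the log path is just an
example):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/solr_gc.log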

> A few simple queries return within a couple of seconds. But the
> more complex queries with proximity and wildcards have taken more
> than 3-4 minutes, and sometimes queries have timed out too, where the
> timeout is set to 5 minutes.

The proximity information seems relevant here.

> - How many results do you return (the rows-parameter)?
> It is the default one, 10. Grouping is enabled on a field.

If you have group.ngroups=true, that would be heavy (and require a lot of
memory), but as your non-wildcard searches with many hits are fast, that
is probably not the problem here.
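
For reference, that is the difference between a plain

  group=true&group.field=yourfield

request and one that also sets group.ngroups=true, which has to count every
distinct group across the index (the field name is a placeholder).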

Toke:
> If you are using spinning drives and only have 32GB of RAM in total in
> each machine, you are probably struggling just to keep things running.
> 
> As mentioned above this is a big machine with 370+ gb of RAM and Solr
> (12 nodes total) is assigned 336 GB. The rest is still good for
> other system activities.

Assuming the storage is spinning drives, it is quite a small machine,
measured by cache memory vs. index size: You have 30-40GB free for disk
cache and your index is 1TB, so ~3%. Unless you have a great deal of
stored content, 3% for disk caching means that there will be a high
amount of IO during a search. It works for you when the queries are
simple field:term, but I am not surprised that it doesn't work well in
other cases.

By nature, truncated queries touch a lot of terms, which means a lot
of lookups. I have no in-depth knowledge on how these lookups are
performed, but I guesstimate that they are IO-intensive.


Coincidentally we also run a machine with multiple Solrs, terabytes of
index data and not much memory (< 1%) for disk cache. One difference
being that it is backed by SSDs. I tried doing a few ad-hoc searches
with grouping turned on (search terms are Danish words):

q=ostekiks 38,646 hits, 530 ms.
q=ost* 49,713,655 hits, 2,190 ms.
q=køer mælk 1,232,445 hits, 767 ms.
q=kat mad* 10,926,107 hits, 4,624 ms.
q="kaniner harer"~50 161,009 hits, 726 ms.
q=kantarel 337,279 hits, 455 ms.
q=deres kan* 245,719,036 hits, 13,565 ms.

This was with Solr 4.10. No special garbage collection activity
occurred. Heap usage stayed well below 8GB per Solr, which is the
standard behaviour of our system.

In short, I could not replicate your observed special activity based on
the queries you have described. I have no reason to believe that Solr
5.3 should perform worse in this aspect.

The SSDs are probably part of the explanation, but I suspect we are
missing something else. It should not make a difference (as your
non-truncated queries are fast), but could you try to reduce the slow
request to the simplest possible? No grouping, faceting or other special
processing, just q=network se*
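
Something like this, with host, port, collection and field adjusted to your
setup:

  curl 'http://localhost:8983/solr/collection1/select?q=network+se*&rows=10&wt=json'

and then the %22-quoted phrase variant (q=%22network+se*%22) for comparison.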


- Toke Eskildsen, State and University Library, Denmark




Re: Very high memory and CPU utilization.

Posted by Modassar Ather <mo...@gmail.com>.
Hi Toke,
Thanks for your response. My comments in-line.

That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.

What is the total amount of physical memory on each machine?
Around 370 gb on the single machine.

Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
A few simple queries return within a couple of seconds. But the more
complex queries with proximity and wildcards have taken more than 3-4
minutes, and sometimes queries have timed out too, where the timeout is
set to 5 minutes.
- How many hits do you get from 'network se*'?
More than a million records.
- How many results do you return (the rows-parameter)?
It is the default one, 10. Grouping is enabled on a field.
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?
A query resulting in around half a million records returns within a
couple of seconds.

That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?
Have not seen anything in particular. Will try to check again.

If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.
As mentioned above this is a big machine with 370+ gb of RAM and Solr (12
nodes total) is assigned 336 GB. The rest is still good for other system
activities.

Thanks,
Modassar

On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> > I have a setup of 12 shard cluster started with 28gb memory each on a
> > single server. There are no replica. The size of index is around 90gb on
> > each shard. The Solr version is 5.2.1.
>
> That is 12 machines, running a shard each?
>
> What is the total amount of physical memory on each machine?
>
> > When I query "network se*", the memory utilization goes upto 24-26 gb and
> > the query takes around 3+ minutes to execute. Also the CPU utilization
> goes
> > upto 400% in few of the nodes.
>
> Well, se* probably expands to a great deal of documents, but a huge bump
> in memory utilization and 3 minutes+ sounds strange.
>
> - What are your normal query times?
> - How many hits do you get from 'network se*'?
> - How many results do you return (the rows-parameter)?
> - If you issue a query without wildcards, but with approximately the
> same amount of hits as 'network se*', how long does it take?
>
> > Why the CPU utilization is so high and more than one core is used.
> > As far as I understand querying is single threaded.
>
> That is strange, yes. Have you checked the logs to see if something
> unexpected is going on while you test?
>
> > How can I disable replication(as it is implicitly enabled) permanently as
> > in our case we are not using it but can see warnings related to leader
> > election?
>
> If you are using spinning drives and only have 32GB of RAM in total in
> each machine, you are probably struggling just to keep things running.
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>

Re: Very high memory and CPU utilization.

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> I have a setup of 12 shard cluster started with 28gb memory each on a
> single server. There are no replica. The size of index is around 90gb on
> each shard. The Solr version is 5.2.1.

That is 12 machines, running a shard each?

What is the total amount of physical memory on each machine?

> When I query "network se*", the memory utilization goes upto 24-26 gb and
> the query takes around 3+ minutes to execute. Also the CPU utilization goes
> upto 400% in few of the nodes.

Well, se* probably expands to a great deal of documents, but a huge bump
in memory utilization and 3 minutes+ sounds strange.

- What are your normal query times?
- How many hits do you get from 'network se*'?
- How many results do you return (the rows-parameter)?
- If you issue a query without wildcards, but with approximately the
same amount of hits as 'network se*', how long does it take?

> Why the CPU utilization is so high and more than one core is used.
> As far as I understand querying is single threaded.

That is strange, yes. Have you checked the logs to see if something
unexpected is going on while you test?

> How can I disable replication(as it is implicitly enabled) permanently as
> in our case we are not using it but can see warnings related to leader
> election?

If you are using spinning drives and only have 32GB of RAM in total in
each machine, you are probably struggling just to keep things running.


- Toke Eskildsen, State and University Library, Denmark