
Posted to solr-user@lucene.apache.org by ba ba <so...@gmail.com> on 2009/11/06 03:20:13 UTC

CPU Max Utilization

Greetings,

I'm running a solr instance with 100 million documents in it. The index is
18 GB.

The strange behavior I'm seeing is that CPU utilization gets maxed out. I'm
running on an 8-core machine with 32 GB of RAM. Every concurrent query I run
uses up one of the cores. So if I run 1 query at a time, I'm using all the
CPU of one core; if I run 8 concurrent queries, I'm using up all of the
cores.

Is it normal to have such high CPU utilization? If not, what am I doing
wrong here? The only thing I have modified is the schema.xml file, to
correspond to the documents I want to store. Everything else uses the
default values in all the config files.

Thanks.
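
For context on the "one core per concurrent query" observation: in a servlet
container each request is typically served by a single thread, so a CPU-bound
query can keep at most one core busy, and N such queries in flight can keep N
cores busy. The Java sketch below reproduces only that effect and does not
involve Solr at all. simulateQuery is a hypothetical stand-in for a CPU-bound
query, and the iteration count is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CoreSaturationSketch {
    // Hypothetical stand-in for one CPU-bound query: pure computation, no I/O waits.
    static long simulateQuery(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs `concurrency` simulated queries at once, one thread each,
    // and returns how many of them completed.
    static int runConcurrent(int concurrency, final int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int i = 0; i < concurrency; i++) {
                Callable<Long> task = () -> simulateQuery(iterations);
                futures.add(pool.submit(task));
            }
            int done = 0;
            for (Future<Long> f : futures) {
                f.get(); // wait for this simulated query to finish
                done++;
            }
            return done;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // While this runs, a tool such as top should show roughly `cores` busy cores.
        int completed = runConcurrent(cores, 50_000_000);
        System.out.println("cores=" + cores + ", completed=" + completed);
    }
}
```

Seen this way, one busy core per in-flight query is not surprising on its
own; the more useful question is how long each query keeps its core busy.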

Re: CPU Max Utilization

Posted by Walter Underwood <wu...@wunderwood.org>.

Are you requesting results by relevance or are you sorting by a field?

How many results are you requesting?

Are you using real user queries (with repetition) or a flat
distribution of queries?

wunder


Re: CPU Max Utilization

Posted by Otis Gospodnetic <ot...@yahoo.com>.

doFilter is a Servlet Filter method -- SolrDispatchFilter is a Servlet Filter and all requests go through that method.  You need to dig deeper in your profiler.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
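
To illustrate why a profiler can attribute nearly all CPU time to doFilter:
a profiler's inclusive time for a method covers everything that method calls,
so an outermost entry point will always appear to own the whole request. The
Java sketch below is not Solr code. doFilterTimed and search are hypothetical
names that only mimic the shape of a filter delegating to the real work.

```java
class InclusiveTimeSketch {
    // Hypothetical stand-in for the work done below the filter
    // (query parsing, searching, writing the response).
    static long search(int work) {
        long acc = 0;
        for (int i = 0; i < work; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Mimics a servlet filter: almost no work of its own, then it delegates.
    // Returns { inclusive nanos of the filter, nanos spent in search, search result }.
    static long[] doFilterTimed(int work) {
        long filterStart = System.nanoTime();

        long searchStart = System.nanoTime();
        long result = search(work);
        long searchNanos = System.nanoTime() - searchStart;

        long filterNanos = System.nanoTime() - filterStart;
        return new long[] { filterNanos, searchNanos, result };
    }

    public static void main(String[] args) {
        long[] t = doFilterTimed(50_000_000);
        System.out.println("filter inclusive ns: " + t[0]);
        System.out.println("search ns:           " + t[1]);
        // The filter's own (self) time is the small difference between the two.
        System.out.println("filter self ns:      " + (t[0] - t[1]));
    }
}
```

In a real profiler the equivalent step is to expand the call tree under
doFilter, or sort by self time, to find the method that actually burns the CPU.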




Re: CPU Max Utilization

Posted by ba ba <so...@gmail.com>.

After doing some more testing, I've seen the performance decrease yet again.
It happens after Solr has been running for about half an hour. I left my test
running over the weekend and saw the CPU usage go down to a reasonable level
at the end of the weekend. It is the same problem, where the CPU is at maximum
usage. I attached a profiler to the Solr instance and found that 99% of the
CPU time is spent in the doFilter method of the SolrDispatchFilter class.

Does anyone know why all of the CPU would be hogged by this particular
method?

I'm requesting by relevance without sorting. I'm requesting 500 results per
query. There are no repetitions in the query set.

As for the fields, I'm using String and SortableInt fields. There are 3
String fields and 3 SortableInt fields in my schema. One of the String
fields is multivalued. The fields are quite small, since the index is only
18 GB for 100 million documents.

Thanks,
Brad


Re: CPU Max Utilization

Posted by ba ba <so...@gmail.com>.

After looking at the question about sorting, it seems that the schema
was using the SortableIntField class. When I did not return these fields in
the queries, I got reasonable CPU usage. If I search on only one of these
SortableIntFields, I get the bad query performance. I think the problem is
that the schema is using a sortable field when I don't need one.

Thanks for the help.

-Brad
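
One way to leave fields out of the response, as described above, is Solr's
fl (field list) request parameter together with rows. The Java sketch below
only builds such a request URL. The host, port, and the field names "title"
and "id" are assumptions for illustration and are not taken from the schema
discussed in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SelectUrlSketch {
    // Builds a /select URL that asks for `rows` results and returns only
    // the fields named in `fieldList` (comma-separated).
    static String buildSelectUrl(String baseUrl, String query, int rows, String fieldList) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows
                    + "&fl=" + URLEncoder.encode(fieldList, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical base URL and field names.
        String url = buildSelectUrl("http://localhost:8983/solr", "title:solr", 500, "id,score");
        System.out.println(url);
    }
}
```

Comparing CPU usage for the same query set with and without the
SortableInt fields in fl is a cheap way to confirm whether returning
those fields is what costs the time.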


Re: CPU Max Utilization

Posted by Otis Gospodnetic <ot...@yahoo.com>.

You may also want to share some sample queries and your field definitions, and tell us how long a core remains 100% utilized.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


