Posted to solr-user@lucene.apache.org by Jean-Sebastien Vachon <je...@wantedanalytics.com> on 2013/08/14 17:09:33 UTC

Huge discrepancy between QTime and ElapsedTime

Hi All,

I am running some benchmarks to tune our Solr 4.3 cloud and noticed that while the reported QTime is quite satisfactory (100 ms or so), the elapsed time is quite large (around 5 seconds). The collection contains 12.8M documents and the index size on disk is about 35 GB. I have only one shard and 4 replicas (we intend to have 5 shards, but wanted to see how Solr would perform with only one shard so that we could benefit from all Solr functions).

I checked for huge GC pauses but found none. I also checked for intensive I/O and found none. All five nodes have 48 GB of RAM, of which 4 GB is allocated to Tomcat 7 and Solr. The caches have a hit ratio over 80%. ZooKeeper is running on the same boxes (5 instances, one per node) but there does not seem to be much activity going on.

This is a sample query:

http://10.0.5.211:8201/solr/Current/select?fq=position_first_seen_date_id:[3484 TO 3516]&q= (title:java OR semi_clean_title:java OR ad_description:java)&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id, position_posted_date_id, position_refreshed_date_id, position_job_type_id, position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id, state_id,county_id,msa_id,country_id,position_id,position_job_type_mva, ad_activity_status_id, position_score, ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id, is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id, title, semi_clean_title, ad_description, position_description, ad_bls_salary, position_bls_salary, covering_source_id, content_model_id,position_soc_2011_8_code&group.field=position_id&group=true&group.ngroups=false&group.main=true&sort=position_first_seen_date_id desc,score desc

Any idea what could cause this?

Re: Huge discrepancy between QTime and ElapsedTime

Posted by Scott Lundgren <sc...@carbonblack.com>.
Jean-Sebastien,

We have had similar issues.  In our case, total query times varied between
100ms and as much as 120s (that's right, 120,000ms) even while QTime stayed
low.  The times were so long that they resulted in timeouts upstream.

We settled on the following hypothesis:

Retrieval (clock time, as we call it) requires that the matching documents
be fetched from disk, decompressed (in our case, anyway), and processed.  We
found that we had numerous "very large" documents in our index - on the
order of tens of MB or more - and that performance in those cases was
horrific.

As an example, we could execute a simple query and see QTime of 100ms and
clock time of 300ms when sorted in such a way that the smallest documents
came first.  In particular, we would retrieve the first 10 documents out of
a result set of thousands.  If we reversed the sort order so that the
largest documents came first, we saw the very large clock times mentioned
above.

Our conclusion was that practical search times - as opposed to QTime - are
a function of the size of the retrieved documents.  We therefore settled on
an approach that splits large logical documents into numerous underlying
documents, thereby limiting each one's size, as sketched below.
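
Roughly, the indexing side of that split looks something like this (a
sketch only - the field names, chunk size, and SolrJ 4.x setup here are
illustrative, not our actual code):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ChunkedIndexer {
        // Cap the stored text per underlying document at ~64 KB.
        private static final int CHUNK_SIZE = 64 * 1024;

        public static void indexInChunks(HttpSolrServer server, String logicalId,
                                         String bigText) throws Exception {
            int chunkNum = 0;
            for (int off = 0; off < bigText.length(); off += CHUNK_SIZE) {
                String chunk = bigText.substring(off,
                        Math.min(off + CHUNK_SIZE, bigText.length()));
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", logicalId + "_" + chunkNum); // unique key per chunk
                doc.addField("logical_id", logicalId);          // ties chunks together
                doc.addField("chunk_num", chunkNum++);
                doc.addField("body", chunk);                    // the big stored field
                server.add(doc);
            }
            server.commit();
        }
    }

At query time, grouping on logical_id collapses the chunks back into a
single logical result.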

FWIW.

Scott


On Wed, Aug 14, 2013 at 10:09 AM, Jean-Sebastien Vachon <
jean-sebastien.vachon@wantedanalytics.com> wrote:

> Hi All,
>
> I am running some benchmarks to tune our Solr 4.3 cloud and noticed that
> while the reported QTime is quite satisfactory (100 ms or so), the elapsed
> time is quite large (around 5 seconds). The collection contains 12.8M
> documents and the index size on disk is about 35 GB. I have only one shard
> and 4 replicas (we intend to have 5 shards but wanted to see how Solr would
> perform with only one shard so that we could benefit from all Solr
> functions).
>
> [...]
>
> Any idea what could cause this?
>



-- 
Scott Lundgren
Director of Engineering
Carbon Black, Inc.
(210) 204-0483 | scott.lundgren@carbonblack.com

RE: Huge discrepancy between QTime and ElapsedTime

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
Thanks Shawn and Scott for your feedback. It is really appreciated.

> -----Original Message-----
> From: Shawn Heisey [mailto:solr@elyograg.org]
> Sent: August-14-13 12:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Huge discrepancy between QTime and ElapsedTime
> 
> On 8/14/2013 9:09 AM, Jean-Sebastien Vachon wrote:
> > I am running some benchmarks to tune our Solr 4.3 cloud and noticed
> > that while the reported QTime is quite satisfactory (100 ms or so),
> > the elapsed time is quite large (around 5 seconds). The collection
> > contains 12.8M documents and the index size on disk is about 35 GB. I
> > have only one shard and 4 replicas (we intend to have 5 shards but
> > wanted to see how Solr would perform with only one shard so that we
> > could benefit from all Solr functions).
> 
> As your other reply from Scott says, you may be dealing with the fact that
> Solr must fetch stored field data from the index on disk and decompress it.
> Solr 4.1 and later have compressed stored fields.  There is no way other than
> writing custom Solr code to turn off the compression.  If the documents are
> very large, the decompression step can be a big performance penalty.
> 
> You have a VERY large field list - fl parameter.  Have you tried just leaving that
> parameter off so that Solr will return all stored fields instead of identifying
> each field?  This might not help at all, I'm just putting it out there as
> something to try.

I will give it a try. 

> You also have grouping enabled.  From what I understand, that can be slow.
> If you turn that off, what happens to your elapsed times?

Yes, grouping is slow, but not nearly as bad as it was in Solr 1.4, which we are still using in production with a similar index (actually it has 17M documents on 6 shards, but all on the same server). I expect grouping to be much faster in 4.x than in 1.4, and I don't have this problem in 1.4. It's true, however, that I have some additional stored fields in my new setup, but this was done to limit the number of times I have to fetch information from MySQL.

> Your free RAM vs. index size is good, assuming that there's nothing else on
> your Solr servers.  With 12.8 million documents plus the use of grouping and
> sorting, you might need a larger java heap.  Try increasing it to 5GB as an
> initial test and see if that makes any difference, either good or bad.
> 
> Your email says you checked for huge GC, but without knowing exactly how
> you checked, it's difficult to know what you would have actually found.

I turned GC logging on and analyzed the resulting file. I've also taken a few heap dumps, and all generations seem to be properly sized.
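For reference, GC logging was enabled with flags along these lines (the log path here is just illustrative):

    -verbose:gc
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -Xloggc:/var/log/tomcat7/gc.log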
I will also try the larger heap to see if it affects the performance.

> Thanks,
> Shawn

Re: Huge discrepancy between QTime and ElapsedTime

Posted by Yonik Seeley <yo...@lucidworks.com>.
On Wed, Aug 14, 2013 at 12:39 PM, Shawn Heisey <so...@elyograg.org> wrote:
> You also have grouping enabled.  From what I understand, that can be slow.
> If you turn that off, what happens to your elapsed times?

QTime would include that.  It includes everything up until the point
where the response starts streaming back (which is why it doesn't
include the time for loading stored document fields... that part is
streamed).
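
A quick way to see that split from SolrJ is to compare the reported QTime
with a client-side measurement. A minimal sketch (the URL and query here
are illustrative, using the SolrJ 4.x API):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimingCheck {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                    new HttpSolrServer("http://10.0.5.211:8201/solr/Current");
            SolrQuery q = new SolrQuery("title:java");
            q.setRows(10);

            long start = System.currentTimeMillis();
            QueryResponse rsp = server.query(q);
            long wallClock = System.currentTimeMillis() - start;

            // QTime is server-side search time only; the gap versus wall-clock
            // time is mostly stored-field loading, decompression, and transfer.
            System.out.println("QTime:      " + rsp.getQTime() + " ms");
            System.out.println("Wall clock: " + wallClock + " ms");
        }
    }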

-Yonik
http://lucidworks.com

Re: Huge discrepancy between QTime and ElapsedTime

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/14/2013 9:09 AM, Jean-Sebastien Vachon wrote:
> I am running some benchmarks to tune our Solr 4.3 cloud and noticed that while the reported QTime is quite satisfactory (100 ms or so), the elapsed time is quite large (around 5 seconds). The collection contains 12.8M documents and the index size on disk is about 35 GB. I have only one shard and 4 replicas (we intend to have 5 shards but wanted to see how Solr would perform with only one shard so that we could benefit from all Solr functions).

As the reply you got from Scott says, you may be dealing with the fact 
that Solr must fetch stored field data from the index on disk and 
decompress it.  Solr 4.1 and later compress stored fields, and there is 
no way to turn the compression off short of writing custom Solr code.  
If the documents are very large, the decompression step can be a big 
performance penalty.
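
One thing worth checking (assuming the stock solrconfig.xml layout) is 
that lazy field loading is enabled in the <query> section, so fields you 
don't request aren't fully loaded up front; whether it helps much against 
the compressed stored-field format is something you would have to test:

    <enableLazyFieldLoading>true</enableLazyFieldLoading>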

You have a VERY large field list - fl parameter.  Have you tried just 
leaving that parameter off so that Solr will return all stored fields 
instead of identifying each field?  This might not help at all, I'm just 
putting it out there as something to try.
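
For example, your query with the fl parameter dropped and everything else 
kept would look like this:

    http://10.0.5.211:8201/solr/Current/select?fq=position_first_seen_date_id:[3484 TO 3516]&q=(title:java OR semi_clean_title:java OR ad_description:java)&rows=10&start=0&group.field=position_id&group=true&group.ngroups=false&group.main=true&sort=position_first_seen_date_id desc,score desc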

You also have grouping enabled.  From what I understand, that can be 
slow.  If you turn that off, what happens to your elapsed times?

Your free RAM vs. index size is good, assuming that there's nothing else 
on your Solr servers.  With 12.8 million documents plus the use of 
grouping and sorting, you might need a larger java heap.  Try increasing 
it to 5GB as an initial test and see if that makes any difference, 
either good or bad.
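
With Tomcat 7, that usually means something like this in bin/setenv.sh (a 
sketch - adjust for however your startup scripts are arranged):

    # bin/setenv.sh is sourced by catalina.sh at startup
    CATALINA_OPTS="$CATALINA_OPTS -Xms5g -Xmx5g"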

Your email says you checked for huge GC, but without knowing exactly how 
you checked, it's difficult to know what you would have actually found.

Thanks,
Shawn