Posted to solr-user@lucene.apache.org by Rohit Harchandani <rh...@gmail.com> on 2012/11/05 22:00:37 UTC

Re: Solr 4.0 simultaneous query problem

Hi,
So it seems that when I query multiple shards for 5000 documents with a
sort, Solr first queries all shards to get a list of document ids, then
adds those ids to the original query and queries all the shards again.
This second step, joining the query results on the unique ids to fetch the
remaining fields, is turning out to be really slow: looking up a list of
unique ids takes a while. Is there any config change that would make this
faster?
Also, what does isDistrib=false mean in the queries Solr generates
internally?
Thanks,
Rohit
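
The two passes described above can be modelled with a small sketch. This is a toy in-memory simulation, not Solr code: the shard contents, the "price" sort field and both function names are made up for illustration. It only shows why a sorted multi-shard query needs one pass to collect ids and sort values and a second pass to fetch the full documents. The per-shard requests are answered from the local index only, which is presumably what the non-distributed flag on the internal queries indicates.

```python
import heapq

def shard_top_ids(shard_docs, sort_field, limit):
    """Pass 1 on one shard: return only (sort value, id) pairs for its
    best `limit` documents, not the full documents."""
    return heapq.nsmallest(limit, ((d[sort_field], d["id"]) for d in shard_docs))

def distributed_query(shards, sort_field, start, rows):
    # Pass 1: every shard returns up to start + rows candidates, because
    # any of them could land in the requested window after merging.
    limit = start + rows
    candidates = []
    for name, docs in shards.items():
        for sort_value, doc_id in shard_top_ids(docs, sort_field, limit):
            candidates.append((sort_value, doc_id, name))
    candidates.sort()
    window = candidates[start:start + rows]

    # Pass 2: fetch the full documents by id. This toy version only
    # touches shards that own a winning id (an assumption of the sketch).
    wanted = {}
    for _, doc_id, name in window:
        wanted.setdefault(name, set()).add(doc_id)
    fetched = {}
    for name, ids in wanted.items():
        for doc in shards[name]:
            if doc["id"] in ids:
                fetched[doc["id"]] = doc
    return [fetched[doc_id] for _, doc_id, _ in window]

# Hypothetical data: three shards, ascending sort on a made-up "price" field.
shards = {
    "A": [{"id": "a1", "price": 30}, {"id": "a2", "price": 10}],
    "B": [{"id": "b1", "price": 20}],
    "C": [{"id": "c1", "price": 40}, {"id": "c2", "price": 5}],
}
page = distributed_query(shards, "price", start=1, rows=2)
```

With rows=5000 the id list carried into the second pass has up to 5000 entries, which is the slow lookup described above.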

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani <rh...@gmail.com> wrote:

> Hi,
>
> The same query is always fired for 500 rows. The only thing that differs
> is the "start" parameter.
>
> The 3 shards are in the same instance on the same server. They all have
> the same schema, but the inherent type of the documents is different. Also,
> most of the app's queries go to shard "A", which has the smallest index
> size (4gb).
>
> The query is made to a "master" shard, which by default goes to all 3
> shards for results. (Also, the query that I am trying matches documents
> only in shard "A" mentioned above.)
>
> Will try debugQuery now and post it here.
>
> Thanks,
> Rohit
>
>
>
>
> On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> Maybe you can narrow this down a little further.  Are there some
>> queries that are faster and some slower?  Is there a pattern?  Can you
>> share examples of slow queries?  Have you tried &debugQuery=true?
>> These 3 shards.... is each of them on its own server or?  Is the slow
>> one always the one that hits the biggest shard?  Do they hold the same
>> type of data?  How come their sizes are so different?
>>
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> Performance Monitoring - http://sematext.com/spm/index.html
>>
>>
>> On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani <rh...@gmail.com>
>> wrote:
>> > Hi all,
>> > I have an application which queries a Solr instance having 3 shards
>> > (4gb, 13gb and 30gb index size respectively) with 6 million documents
>> > in all.
>> > When I start 10 threads in my app to make simultaneous queries to Solr
>> > (with rows=500 and a different start parameter, sort on 1 field and no
>> > facets), each returning 500 different documents, I sometimes see that
>> > most of the responses come back within no time (500ms-1000ms), but the
>> > last response takes close to 50 seconds (QTime).
>> > I am using the latest 4.0 release. What is the reason for this delay? Is
>> > there a way to prevent this?
>> > Thanks and regards,
>> > Rohit
>>
>
>

Re: Solr 4.0 simultaneous query problem

Posted by Rohit Harchandani <rh...@gmail.com>.
So is it a better approach to query for fewer rows, say 500, and keep
increasing the start parameter? Wouldn't that be slower, since the start
parameter keeps growing and I will also be sorting by the same field in
each of the queries made to the multiple shards?

Also, does it make sense to have all these documents in the same shard? I
went for this approach because the shard which is queried the most is small
and gives a lot of benefit in terms of time taken for all the stats
queries. This shard is only about 5 gb whereas the entire index will be
about 50 gb.

Thanks for the help,
Rohit
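
The concern about a growing start parameter can be put into rough numbers. The sketch below assumes the usual behaviour of a sorted distributed query, where each shard has to return up to start + rows id/sort-value entries for the merge. The function names are made up, and the figures are upper bounds, since a shard with fewer matches returns fewer entries.

```python
def per_shard_rows(start, rows):
    # Each shard must supply its top start + rows entries, because the
    # coordinator cannot know in advance which shard owns the window.
    return start + rows

def entries_merged(start, rows, num_shards):
    # Upper bound on id/sort-value entries merged for one request.
    return num_shards * per_shard_rows(start, rows)

# Ten pages of 500 rows over 3 shards, as in this thread.
costs = [(start, entries_merged(start, 500, 3)) for start in range(0, 5000, 500)]
total_paged = sum(cost for _, cost in costs)

# One request for all 5000 rows.
total_single = entries_merged(0, 5000, 3)

for start, cost in costs:
    print(f"start={start:5d} rows=500 -> up to {cost} entries merged")
print("all pages:", total_paged, "single request:", total_single)
```

Under these assumptions, the later pages do cost more in the first pass than the early ones, and the ten pages together merge more entries than one big request. What paging buys is that each request fetches stored fields for only 500 ids rather than 5000.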

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood <wu...@wunderwood.org> wrote:

> Don't query for 5000 documents. That is going to be slow no matter how it
> is implemented.
>
> wunder

Re: Solr 4.0 simultaneous query problem

Posted by Walter Underwood <wu...@wunderwood.org>.
Don't query for 5000 documents. That is going to be slow no matter how it is implemented.

wunder

--
Walter Underwood
wunder@wunderwood.org
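
As a rough illustration of asking for fewer documents per request, the sketch below builds the request URLs for fetching 5000 results in pages of 500 using the q, sort, start and rows parameters. The base URL, the match-all query and the "price asc" sort are placeholders, not values taken from this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real host and core.
BASE_URL = "http://localhost:8983/solr/select"

def page_urls(query, sort, total, page_size):
    """Build one request URL per page so that the pages together cover
    `total` results without any single request asking for all of them."""
    urls = []
    for start in range(0, total, page_size):
        rows = min(page_size, total - start)  # the last page may be shorter
        params = {"q": query, "sort": sort, "start": start, "rows": rows}
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls

urls = page_urls("*:*", "price asc", total=5000, page_size=500)
```

Each URL can then be fetched in turn, stopping early if the application turns out not to need the later pages.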