Posted to solr-user@lucene.apache.org by Elran Dvir <el...@checkpoint.com> on 2013/11/13 08:52:44 UTC

RE: distributed search is significantly slower than direct search

Erick, Thanks for your response.

We are upgrading our system using Solr.
We need to preserve old functionality. Our client displays 5K documents and groups them.

Is there a way to refactor code in order to improve distributed document fetching?

Thanks. 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, October 30, 2013 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: distributed search is significantly slower than direct search

You can't. There will inevitably be some overhead in the distributed case. That said, 7 seconds is quite long.

5,000 rows is excessive, and probably where your issue is. You're having to go out and fetch the docs across the wire. Perhaps there is some batching that could be done there; I don't know whether this is one document per request or not.

Why 5K docs?

Best,
Erick


On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <el...@checkpoint.com> wrote:

> Hi all,
>
> I am using Solr 4.4 with multi cores. One core (called template) is my 
> "routing" core.
>
> When I run
> http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
> it consistently takes about 7s.
> When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it 
> consistently takes about 40ms.
>
> I profiled the distributed query.
> This is the distributed query process (I hope the terms are accurate):
> When Solr identifies a distributed query, it sends the query to the
> shard and gets the matching shard docs.
> Then it sends another query to the shard to get the Solr documents.
> Most time is spent in the last stage in the function "process" of 
> "QueryComponent" in:
>
> for (int i=0; i<idArr.size(); i++) {
>         // one term-dictionary lookup per requested external id
>         int id = req.getSearcher().getFirstMatch(
>                 new Term(idField.getName(),
>                         idField.getType().toInternal(idArr.get(i))));
>
> How can I make my distributed query as fast as the direct one?
>
> Thanks.
>
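For context on the snippet above: each getFirstMatch() call is essentially a term-dictionary seek followed by a postings read, one per requested row. A minimal sketch of the equivalent lookup against Lucene 4.x APIs (the class and helper names are illustrative, and "id" stands in for the uniqueKey field):

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.index.*;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;

    class IdLookupSketch {
        // Resolve each external id to its internal Lucene doc id. Every
        // seekExact() is a terms-dictionary seek that may load a terms block
        // from disk -- the loadBlock hot spot reported later in this thread.
        int[] lookupInternalIds(IndexReader reader, List<String> idArr) throws IOException {
            int[] internal = new int[idArr.size()];
            Terms terms = MultiFields.getTerms(reader, "id");
            TermsEnum te = terms.iterator(null);              // reused across lookups
            Bits liveDocs = MultiFields.getLiveDocs(reader);
            DocsEnum docs = null;
            for (int i = 0; i < idArr.size(); i++) {
                if (te.seekExact(new BytesRef(idArr.get(i)))) {
                    docs = te.docs(liveDocs, docs, DocsEnum.FLAG_NONE);
                    internal[i] = docs.nextDoc();             // unique key: first posting is the match
                }
            }
            return internal;
        }
    }

With rows=5000 this loop runs 5000 times per shard request, so the cost scales with the row count rather than with the match count.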



Re: distributed search is significantly slower than direct search

Posted by Manuel Le Normand <ma...@gmail.com>.
https://issues.apache.org/jira/browse/SOLR-5478

There it goes


Re: distributed search is significantly slower than direct search

Posted by Manuel Le Normand <ma...@gmail.com>.
Sure, I am out of office till the end of the week. I'll reply after I upload the patch.

Re: distributed search is significantly slower than direct search

Posted by Yuval Dotan <yu...@gmail.com>.
Hi
Thanks very much for your answers :)
Manuel, if you have a patch I will be glad to test its performance
Yuval




Re: distributed search is significantly slower than direct search

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Manuel, that sounds very interesting. Would you be willing to
contribute this back to the community?




-- 
Regards,
Shalin Shekhar Mangar.

Re: distributed search is significantly slower than direct search

Posted by Manuel Le Normand <ma...@gmail.com>.
In order to accelerate BinaryResponseWriter.write, we extended this writer
class to implement the docid-to-id transformation via docValues (in memory),
with no need to access the stored field to read the id, nor to lazy-load
other fields, which also has a cost. That should improve the read rate, as
docValues are sequential and should avoid disk IO. This docValues
implementation is accessed during both query stages (as mentioned above) in
case you ask for ids only, or only once, during the distributed search
stage, in case you intend to ask for stored fields other than id.

We just started testing it for performance. I would love to hear any
opinions or test results for this implementation

Manu
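As a rough illustration of the idea (not the actual patch, which is tracked in SOLR-5478 elsewhere in this thread), the docid-to-id read via docValues might look like this on Lucene 4.4-era APIs, assuming the uniqueKey field "id" has docValues enabled:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiDocValues;
    import org.apache.lucene.index.SortedDocValues;
    import org.apache.lucene.util.BytesRef;

    class DocValuesIdSketch {
        // Resolve the external id for a hit without touching stored fields:
        // an in-memory columnar read instead of a disk seek plus decompression.
        String externalId(IndexReader reader, int docId) throws IOException {
            SortedDocValues ids = MultiDocValues.getSortedValues(reader, "id");
            BytesRef scratch = new BytesRef();
            ids.get(docId, scratch);          // 4.x-era API: fills the scratch ref
            return scratch.utf8ToString();
        }
    }

In a real response writer you would fetch the SortedDocValues once per request and reuse it for every hit, not re-resolve it per document as this compressed sketch does.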

Re: distributed search is significantly slower than direct search

Posted by Mark Miller <ma...@gmail.com>.
You are asking for 5000 docs, right? And that's forcing us to look up 5000 external-to-internal ids. I think this always had a cost, but it's obviously worse if you ask for a ton of results. I don't think the single-node case has to do this? And if we had something like Searcher leases (we will eventually), I think we could avoid it and just use internal ids.

- Mark



Re: distributed search is significantly slower than direct search

Posted by Yuval Dotan <yu...@gmail.com>.
Hi Tomás
This is just a test environment meant only to reproduce the issue I am
currently investigating.
The number of documents should grow substantially (billions of docs).




Re: distributed search is significantly slower than direct search

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Yuval, quick question. You say that your core has 750k docs and around
400mb? Is this some kind of test dataset and you expect it to grow
significantly? For an index of this size, I wouldn't use distributed
search; a single shard should be fine.


Tomás



Re: distributed search is significantly slower than direct search

Posted by Yuval Dotan <yu...@gmail.com>.
Hi,

I isolated the case

Installed on a new machine (2 x Xeon E5410 2.33GHz)

I have an environment with 12Gb of memory.

I assigned 6gb of memory to Solr and I’m not running any other memory
consuming process so no memory issues should arise.

Removed all indexes apart from two:

emptyCore – empty – used for routing

core1 – holds the stored data – has ~750,000 docs and size of 400Mb

Again this is a single machine that holds both indexes.

The query
http://localhost:8210/solr/emptyCore/select?rows=5000&q=*:*&shards=127.0.0.1:8210/solr/core1&wt=json
takes ~3 seconds (QTime),

and the direct query
http://localhost:8210/solr/core1/select?rows=5000&q=*:*&wt=json
takes ~15 ms (QTime) - a difference of two orders of magnitude.

I ran the long query several times and got an improvement of about a sec
(33%) but that’s it.

I need to better understand why this is happening.

I tried looking at Solr code and debugging the issue but with no success.

The one thing I did notice is that the getFirstMatch method - which takes
the unique id, searches the term dictionary and returns the internal doc
id - takes most of the time for some reason.

I am pretty stuck and would appreciate any ideas.

My only solution for the moment is to bypass the distributed query and
implement code in my own app that directly queries the relevant cores and
handles the sorting etc.

Thanks
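A minimal sketch of that bypass with the SolrJ 4.x client, assuming illustrative core URLs and a simple score-descending merge (a real version would parallelize the per-core requests and support sort fields other than score):

    import java.util.*;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    class ScatterGatherSketch {
        List<SolrDocument> query(List<String> coreUrls, String q, int rows) throws Exception {
            List<SolrDocument> all = new ArrayList<SolrDocument>();
            for (String url : coreUrls) {              // serial here; parallelize in practice
                HttpSolrServer core = new HttpSolrServer(url);
                SolrQuery query = new SolrQuery(q);
                query.setRows(rows);
                query.setFields("*", "score");         // score is needed for the merge
                all.addAll(core.query(query).getResults());
            }
            // keep the global top `rows` by score, mimicking the distributed merge
            Collections.sort(all, new Comparator<SolrDocument>() {
                public int compare(SolrDocument a, SolrDocument b) {
                    return ((Float) b.getFieldValue("score"))
                            .compareTo((Float) a.getFieldValue("score"));
                }
            });
            return all.subList(0, Math.min(rows, all.size()));
        }
    }

This trades Solr's two-phase id fetch for a single full fetch per core, which is exactly the saving being discussed, at the cost of reimplementing the merge semantics yourself.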





Re: distributed search is significantly slower than direct search

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Did you say what the memory profile of your machine is?  How much 
memory, and how large are the shards? This is just a random guess, but 
it might be that if you are memory-constrained, there is a lot of 
thrashing caused by paging (swapping?) in and out the sharded indexes 
while a single index can be scanned linearly, even if it does need to be 
paged in.

-Mike

On 11/14/2013 8:10 AM, Elran Dvir wrote:
> Hi,
>
> We tried returning just the id field and got exactly the same performance.
> Our system is distributed but all shards are on a single machine, so network issues are not a factor.
> The code we found where Solr is spending its time is on the shard and not on the routing core; again, all shards are local.
> We investigated the getFirstMatch() method and noticed that MultiTermEnum.reset (inside MultiTerm.iterator) and MultiTerm.seekExact take 99% of the time.
> Inside these methods, the call to BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock takes most of the time.
> Out of the 7-second run, these methods take ~5 seconds and BinaryResponseWriter.write takes the rest (~2 seconds).
>
> We tried increasing cache sizes and got hits, but it only improved the query time by about a second (to ~6s), so no major effect.
> We are not indexing during our tests; the performance is similar.
> (How do we measure doc size? Is it important given that the performance is the same when returning only the id field?)
>
> We still don't completely understand why the query takes this much longer although the cores are on the same machine.
>
> Is there a way to improve the performance (code, configuration, query)?
>
> -----Original Message-----
> From: idokissos@gmail.com [mailto:idokissos@gmail.com] On Behalf Of Manuel Le Normand
> Sent: Thursday, November 14, 2013 1:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct search
>
> It's surprising such a query takes a long time; I would assume that after trying q=*:* consistently you should be getting cache hits and times should be faster. Check in the admin UI how your query/doc caches perform.
> Moreover, the query in itself is just asking for the first 5000 docs that were indexed (returning the first [docid]), so it seems all this time is wasted on transfer. Out of these 7 secs, how much is spent on the above method? What do you return by default? How big is every doc you display in your results?
> Might it be that both collections work on the same resources? Try elaborating your use-case.
>
> Anyway, it seems like you just made a test to see what the performance hit would be in a distributed environment, so I'll try to explain some things we encountered in our benchmarks, with a case that is at least similar in the number of docs fetched.
>
> We reclaim 2000 docs every query, running over 40 shards. This means every shard is actually transferring 2000 docs to our frontend on every document-match request (the first stage you were referring to). Even if lazily loaded, reading 2000 ids (on 40 servers) and lazy loading the fields is a tough job. Waiting for the slowest shard to respond, then sorting the docs and reloading (lazy or not) the top 2000 docs might take a long time.
>
> Our times are 4-8 secs, though it's not really possible to compare cases. We've taken a few steps that improved it along the way, steps that led to others.
> These were our starters:
>
>     1. Profile these queries from different servers and solr instances; try
>     to put your finger on which collection is working hard and why. Check if
>     you're stuck on components that don't have an added value for you but are
>     used by default.
>     2. Consider eliminating the doc cache. It loads lots of (partly) lazy
>     documents whose probability of secondary usage is low. There's no such
>     thing as "popular docs" when requesting so many docs. You may be using
>     your memory in a better way.
>     3. Bottleneck check - server metrics such as cpu user / iowait, packets
>     transferred over the network, page faults etc. are excellent for
>     understanding whether the disk/network/cpu is slowing you down. Then
>     upgrade hardware in one of the shards to check if it helps by comparing
>     the upgraded shard's qTime to the others.
>     4. Warm up the index after committing - benchmark how queries perform
>     before and after some warm-up, say a few hundred queries (from your
>     previous system), in order to warm up the OS cache (assuming you're
>     using NRTDirectoryFactory); a warm-up sketch follows at the end of this message.
>
>
> Good luck,
> Manu
>
>
> On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <er...@gmail.com> wrote:
>
>> One thing you can try, and this is more diagnostic than a cure, is
>> return just the id field (and ensure that lazy field loading is true).
>> That'll tell you whether the issue is actually fetching the document
>> off disk and decompressing, although frankly that's unlikely since you
>> can get your 5,000 rows from a single machine quickly.
>>
>> The code you found where Solr is spending its time, is that on the
>> "routing" core or on the shards? I actually have a hard time
>> understanding how that code could take a long time, doesn't seem
>> right.
>>
>> You are transferring 5,000 docs across the network, so it's possible
>> that your network is just slow, that's certainly a difference between
>> the local and remote case, but that's a stab in the dark.
>>
>> Not much help I know,
>> Erick
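Following up on item 4 in Manuel's list quoted above: a minimal warm-up sketch, assuming a SolrJ 4.x client and a plain-text file of queries captured from the previous system (file name and core URL are illustrative):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    class WarmupSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer core = new HttpSolrServer("http://localhost:8210/solr/core1");
            // replay a few hundred representative queries; results are discarded,
            // the point is to heat the OS page cache and Solr's own caches
            for (String q : Files.readAllLines(Paths.get("warmup-queries.txt"),
                                               StandardCharsets.UTF_8)) {
                core.query(new SolrQuery(q));
            }
            core.shutdown();
        }
    }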
>>
>>
>>
>> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <el...@checkpoint.com> wrote:
>>
>>> Erick, Thanks for your response.
>>>
>>> We are upgrading our system using Solr.
>>> We need to preserve old functionality.  Our client displays 5K
>>> document and groups them.
>>>
>>> Is there a way to refactor code in order to improve distributed
>>> documents fetching?
>>>
>>> Thanks.
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Sent: Wednesday, October 30, 2013 3:17 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: distributed search is significantly slower than direct
>> search
>>> You can't. There will inevitably be some overhead in the distributed
>> case.
>>> That said, 7 seconds is quite long.
>>>
>>> 5,000 rows is excessive, and probably where your issue is. You're
>>> having to go out and fetch the docs across the wire. Perhaps there
>>> is some batching that could be done there, I don't know whether this
>>> is one document per request or not.
>>>
>>> Why 5K docs?
>>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <el...@checkpoint.com>
>> wrote:
>>>> Hi all,
>>>>
>>>> I am using Solr 4.4 with multi cores. One core (called template)
>>>> is my "routing" core.
>>>>
>>>> When I run
>>>> http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
>>>> 0.0.1:8983/solr/core1,
>>>> it consistently takes about 7s.
>>>> When I run
>>>> http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it consistently takes about 40ms.
>>>>
>>>> I profiled the distributed query.
>>>> This is the distributed query process (I hope the terms are accurate):
>>>> When solr identifies a distributed query, it sends the query to
>>>> the shard and get matched shard docs.
>>>> Then it sends another query to the shard to get the Solr documents.
>>>> Most time is spent in the last stage in the function "process" of
>>>> "QueryComponent" in:
>>>>
>>>> for (int i=0; i<idArr.size(); i++) {
>>>>          int id = req.getSearcher().getFirstMatch(
>>>>                  new Term(idField.getName(),
>>>> idField.getType().toInternal(idArr.get(i))));
>>>>
>>>> How can I make my distributed query as fast as the direct one?
>>>>
>>>> Thanks.
>>>>


RE: distributed search is significantly slower than direct search

Posted by Elran Dvir <el...@checkpoint.com>.
Hi,

We tried returning just the id field and got exactly the same performance.
Our system is distributed, but all shards are on a single machine, so network issues are not a factor.
The code we found where Solr is spending its time is on the shard and not on the routing core; again, all shards are local.
We investigated the getFirstMatch() method and noticed that MultiTermsEnum.reset (inside MultiTerms.iterator) and MultiTermsEnum.seekExact take 99% of the time.
Inside these methods, the call to BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock takes most of the time.
Out of the 7-second run, these methods take ~5 seconds and BinaryResponseWriter.write takes the rest (~2 seconds).

We tried increasing the cache sizes and got hits, but that only improved the query time by about a second (to ~6s), so no major effect.
We are not indexing during our tests; the performance is similar either way.
(How do we measure doc size? Is it important, given that the performance is the same when returning only the id field?)

We still don't completely understand why the query takes so much longer even though the cores are on the same machine.

Is there a way to improve the performance (code, configuration, query)?
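
Based on the profiling, one direction we are considering is doing the id-to-docid lookup per segment, to avoid the MultiTermsEnum overhead. Below is a rough, untested sketch against the Lucene 4.x API - the wiring into QueryComponent is omitted and the class/method names here are illustrative, not an actual patch:

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class PerSegmentIdLookup {
  // Resolves a unique-key value to a global docid by seeking each segment's
  // TermsEnum directly, instead of going through the composite reader's
  // MultiTermsEnum (whose reset/seekExact dominate our profiles).
  public static int getFirstMatchPerSegment(IndexReader reader, String idField,
                                            BytesRef idValue) throws IOException {
    for (AtomicReaderContext leaf : reader.leaves()) {
      Terms terms = leaf.reader().terms(idField);
      if (terms == null) continue;
      TermsEnum te = terms.iterator(null);
      if (!te.seekExact(idValue)) continue;      // id not present in this segment
      DocsEnum docs = te.docs(leaf.reader().getLiveDocs(), null, DocsEnum.FLAG_NONE);
      int doc = docs.nextDoc();
      if (doc != DocIdSetIterator.NO_MORE_DOCS) {
        return doc + leaf.docBase;               // convert segment docid to global
      }
    }
    return -1;                                   // not found
  }
}

Whether this actually beats the composite-reader path on our index is something we would have to benchmark.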

-----Original Message-----
From: idokissos@gmail.com [mailto:idokissos@gmail.com] On Behalf Of Manuel Le Normand
Sent: Thursday, November 14, 2013 1:30 AM
To: solr-user@lucene.apache.org
Subject: Re: distributed search is significantly slower than direct search

It's surprising that such a query takes this long; I would assume that after consistently trying q=*:* you should be getting cache hits and times should be faster. Try to see in the admin UI how your query/doc caches perform.
Moreover, the query in itself just asks for the first 5000 docs that were indexed (returning the first [docid]), so it seems all this time is wasted on transfer. Out of these 7 secs, how much is spent in the above method? What do you return by default? How big is every doc you display in your results?
It might be that both collections are working on the same resources. Try elaborating on your use case.

Anyway, it seems like you just ran a test to see what the performance hit would be in a distributed environment, so I'll try to explain some things we encountered in our benchmarks, with a case that is at least similar in the number of docs fetched.

We retrieve 2000 docs for every query, running over 40 shards. This means every shard is actually transferring 2000 docs to our frontend for every document-match request (the first one you were referring to). Even if lazily loaded, reading 2000 ids (on 40 servers) and lazy-loading the fields is a tough job. Waiting for the slowest shard to respond, then sorting the docs and reloading (lazy or not) the top 2000 docs can take a long time.

Our times are 4-8 secs, but it's still not possible to compare the cases directly. We've taken a few steps that improved things along the way, steps that led to others.
These were our starters:

   1. Profile these queries from different servers and Solr instances; try
   to put your finger on which collection is working hard and why. Check
   whether you're stuck on components that have no added value for you but
   are used by default.
   2. Consider eliminating the doc cache. It loads lots of (partly) lazy
   documents whose probability of secondary usage is low. There's no such
   thing as "popular docs" when requesting so many docs. You may be able to
   use your memory in a better way.
   3. Bottleneck check - internal server metrics such as cpu user / iowait,
   packets transferred over the network, page faults etc. are excellent for
   understanding whether the disk/network/cpu is slowing you down. Then
   upgrade the hardware in one of the shards to check if it helps, by
   comparing the upgraded shard's qTime to the others.
   4. Warm up the index after committing - try to benchmark how queries
   perform before and after some warm-up, say a few hundred queries (from
   your previous system) in order to warm up the OS cache (assuming you're
   using NRTCachingDirectoryFactory)


Good luck,
Manu


On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <er...@gmail.com>wrote:

> One thing you can try, and this is more diagnostic than a cure, is to
> return just the id field (and ensure that lazy field loading is true).
> That'll tell you whether the issue is actually fetching the document 
> off disk and decompressing, although frankly that's unlikely since you 
> can get your 5,000 rows from a single machine quickly.
>
> The code you found where Solr is spending its time - is that on the
> "routing" core or on the shards? I actually have a hard time
> understanding how that code could take a long time; it doesn't seem
> right.
>
> You are transferring 5,000 docs across the network, so it's possible
> that your network is just slow; that's certainly a difference between
> the local and remote cases, but that's a stab in the dark.
>
> Not much help I know,
> Erick
>
>
>
> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <el...@checkpoint.com> wrote:
>
> > Erick, Thanks for your response.
> >
> > We are upgrading our system using Solr.
> > We need to preserve old functionality. Our client displays 5K
> > documents and groups them.
> >
> > Is there a way to refactor the code in order to improve distributed
> > document fetching?
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Wednesday, October 30, 2013 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: distributed search is significantly slower than direct
> > search
> >
> > You can't. There will inevitably be some overhead in the distributed
> > case.
> > That said, 7 seconds is quite long.
> >
> > 5,000 rows is excessive, and that's probably where your issue is. You're
> > having to go out and fetch the docs across the wire. Perhaps there
> > is some batching that could be done there; I don't know whether this
> > is one document per request or not.
> >
> > Why 5K docs?
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <el...@checkpoint.com>
> wrote:
> >
> > > Hi all,
> > >
> > > I am using Solr 4.4 with multiple cores. One core (called template)
> > > is my "routing" core.
> > >
> > > When I run
> > > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
> > > 0.0.1:8983/solr/core1,
> > > it consistently takes about 7s.
> > > When I run 
> > > http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it consistently takes about 40ms.
> > >
> > > I profiled the distributed query.
> > > This is the distributed query process (I hope the terms are accurate):
> > > When Solr identifies a distributed query, it sends the query to
> > > the shard and gets the matching shard docs.
> > > Then it sends another query to the shard to get the Solr documents.
> > > Most of the time is spent in the last stage, in the "process" function
> > > of "QueryComponent", in:
> > >
> > > for (int i=0; i<idArr.size(); i++) {
> > >         int id = req.getSearcher().getFirstMatch(
> > >                 new Term(idField.getName(), 
> > > idField.getType().toInternal(idArr.get(i))));
> > >
> > > How can I make my distributed query as fast as the direct one?
> > >
> > > Thanks.
> > >
> >
>



Re: distributed search is significantly slower than direct search

Posted by Manuel Le Normand <ma...@gmail.com>.
It's surprising that such a query takes this long; I would assume that after
consistently trying q=*:* you should be getting cache hits and times should
be faster. Try to see in the admin UI how your query/doc caches perform.
Moreover, the query in itself just asks for the first 5000 docs that were
indexed (returning the first [docid]), so it seems all this time is wasted
on transfer. Out of these 7 secs, how much is spent in the above method?
What do you return by default? How big is every doc you display in your
results? It might be that both collections are working on the same
resources. Try elaborating on your use case.
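
(In the 4.x admin UI that's under the core's Plugins / Stats page, CACHE
category; over HTTP - assuming the default /admin/mbeans handler is
registered - something like
http://127.0.0.1:8983/solr/core1/admin/mbeans?stats=true&cat=CACHE
should dump the hit ratios of the queryResultCache / documentCache.)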

Anyway, it seems like you just ran a test to see what the performance hit
would be in a distributed environment, so I'll try to explain some things
we encountered in our benchmarks, with a case that is at least similar in
the number of docs fetched.

We retrieve 2000 docs for every query, running over 40 shards. This means
every shard is actually transferring 2000 docs to our frontend for every
document-match request (the first one you were referring to). Even if lazily
loaded, reading 2000 ids (on 40 servers) and lazy-loading the fields is a
tough job. Waiting for the slowest shard to respond, then sorting the docs
and reloading (lazy or not) the top 2000 docs can take a long time.

Our times are 4-8 secs, but it's still not possible to compare the cases
directly. We've taken a few steps that improved things along the way, steps
that led to others.
These were our starters:

   1. Profile these queries from different servers and Solr instances; try
   to put your finger on which collection is working hard and why. Check
   whether you're stuck on components that have no added value for you but
   are used by default.
   2. Consider eliminating the doc cache. It loads lots of (partly) lazy
   documents whose probability of secondary usage is low. There's no such
   thing as "popular docs" when requesting so many docs. You may be able to
   use your memory in a better way.
   3. Bottleneck check - internal server metrics such as cpu user / iowait,
   packets transferred over the network, page faults etc. are excellent for
   understanding whether the disk/network/cpu is slowing you down. Then
   upgrade the hardware in one of the shards to check if it helps, by
   comparing the upgraded shard's qTime to the others.
   4. Warm up the index after committing - try to benchmark how queries
   perform before and after some warm-up, say a few hundred queries (from
   your previous system) in order to warm up the OS cache (assuming you're
   using NRTCachingDirectoryFactory); see the sketch right after this list.
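
For point 4, the warm-up can be as simple as replaying some logged queries
with SolrJ. A minimal sketch, assuming SolrJ 4.x and a hypothetical
warmup-queries.txt file with one query string per line:

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Replays logged queries after a commit so the OS cache is hot before real
// traffic arrives. The responses are discarded; only the I/O matters.
public class IndexWarmer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://127.0.0.1:8983/solr/core1");
    BufferedReader in = new BufferedReader(new FileReader("warmup-queries.txt"));
    String line;
    while ((line = in.readLine()) != null) {
      SolrQuery q = new SolrQuery(line);  // one query string per line
      q.setRows(10);                      // keep responses small; warming is the goal
      solr.query(q);
    }
    in.close();
    solr.shutdown();
  }
}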


Good luck,
Manu


On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <er...@gmail.com>wrote:

> One thing you can try, and this is more diagnostic than a cure, is to
> return just the id field (and ensure that lazy field loading is true).
> That'll tell you whether the issue is actually fetching the document off
> disk and decompressing, although frankly that's unlikely since you can get
> your 5,000 rows from a single machine quickly.
>
> The code you found where Solr is spending its time - is that on the
> "routing" core or on the shards? I actually have a hard time understanding
> how that code could take a long time; it doesn't seem right.
>
> You are transferring 5,000 docs across the network, so it's possible that
> your network is just slow; that's certainly a difference between the local
> and remote cases, but that's a stab in the dark.
>
> Not much help I know,
> Erick
>
>
>
> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <el...@checkpoint.com> wrote:
>
> > Erick, Thanks for your response.
> >
> > We are upgrading our system using Solr.
> > We need to preserve old functionality. Our client displays 5K documents
> > and groups them.
> >
> > Is there a way to refactor the code in order to improve distributed
> > document fetching?
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Wednesday, October 30, 2013 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: distributed search is significantly slower than direct
> > search
> >
> > You can't. There will inevitably be some overhead in the distributed
> > case.
> > That said, 7 seconds is quite long.
> >
> > 5,000 rows is excessive, and that's probably where your issue is. You're
> > having to go out and fetch the docs across the wire. Perhaps there is
> > some batching that could be done there; I don't know whether this is one
> > document per request or not.
> >
> > Why 5K docs?
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <el...@checkpoint.com>
> wrote:
> >
> > > Hi all,
> > >
> > > I am using Solr 4.4 with multiple cores. One core (called template) is
> > > my "routing" core.
> > >
> > > When I run
> > > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
> > > 0.0.1:8983/solr/core1,
> > > it consistently takes about 7s.
> > > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > > consistently takes about 40ms.
> > >
> > > I profiled the distributed query.
> > > This is the distributed query process (I hope the terms are accurate):
> > > When Solr identifies a distributed query, it sends the query to the
> > > shard and gets the matching shard docs.
> > > Then it sends another query to the shard to get the Solr documents.
> > > Most of the time is spent in the last stage, in the "process" function
> > > of "QueryComponent", in:
> > >
> > > for (int i=0; i<idArr.size(); i++) {
> > >         int id = req.getSearcher().getFirstMatch(
> > >                 new Term(idField.getName(),
> > > idField.getType().toInternal(idArr.get(i))));
> > >
> > > How can I make my distributed query as fast as the direct one?
> > >
> > > Thanks.
> > >
> >
>

Re: distributed search is significantly slower than direct search

Posted by Erick Erickson <er...@gmail.com>.
One thing you can try, and this is more diagnostic than a cure, is to return
just the id field (and ensure that lazy field loading is true). That'll tell
you whether the issue is actually fetching the document off disk and
decompressing, although frankly that's unlikely since you can get your 5,000
rows from a single machine quickly.
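
If it helps, this is roughly what that check looks like from SolrJ - a
sketch only; I'm assuming your unique key field is called "id" and reusing
the URLs from your mail:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Runs the distributed query but fetches only the id field, separating the
// cost of stored-field fetching/decompression from the distributed overhead.
public class IdOnlyCheck {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://127.0.0.1:8983/solr/template");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(5000);
    q.setFields("id");                             // skip all other stored fields
    q.set("shards", "127.0.0.1:8983/solr/core1");
    QueryResponse rsp = solr.query(q);
    System.out.println("QTime=" + rsp.getQTime() + "ms, elapsed="
        + rsp.getElapsedTime() + "ms");
    solr.shutdown();
  }
}

If this still takes ~7s, document fetching isn't the culprit.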

The code you found where Solr is spending its time - is that on the
"routing" core or on the shards? I actually have a hard time understanding
how that code could take a long time; it doesn't seem right.

You are transferring 5,000 docs across the network, so it's possible that
your network is just slow; that's certainly a difference between the local
and remote cases, but that's a stab in the dark.

Not much help I know,
Erick



On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <el...@checkpoint.com> wrote:

> Erick, Thanks for your response.
>
> We are upgrading our system using Solr.
> We need to preserve old functionality. Our client displays 5K documents
> and groups them.
>
> Is there a way to refactor the code in order to improve distributed
> document fetching?
>
> Thanks.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, October 30, 2013 3:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct search
>
> You can't. There will inevitably be some overhead in the distributed case.
> That said, 7 seconds is quite long.
>
> 5,000 rows is excessive, and that's probably where your issue is. You're
> having to go out and fetch the docs across the wire. Perhaps there is some
> batching that could be done there; I don't know whether this is one
> document per request or not.
>
> Why 5K docs?
>
> Best,
> Erick
>
>
> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <el...@checkpoint.com> wrote:
>
> > Hi all,
> >
> > I am using Solr 4.4 with multiple cores. One core (called template) is
> > my "routing" core.
> >
> > When I run
> > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
> > 0.0.1:8983/solr/core1,
> > it consistently takes about 7s.
> > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > consistently takes about 40ms.
> >
> > I profiled the distributed query.
> > This is the distributed query process (I hope the terms are accurate):
> > When Solr identifies a distributed query, it sends the query to the
> > shard and gets the matching shard docs.
> > Then it sends another query to the shard to get the Solr documents.
> > Most of the time is spent in the last stage, in the "process" function
> > of "QueryComponent", in:
> >
> > for (int i=0; i<idArr.size(); i++) {
> >         int id = req.getSearcher().getFirstMatch(
> >                 new Term(idField.getName(),
> > idField.getType().toInternal(idArr.get(i))));
> >
> > How can I make my distributed query as fast as the direct one?
> >
> > Thanks.
> >