Posted to solr-dev@lucene.apache.org by mike anderson <sa...@gmail.com> on 2010/02/06 16:35:16 UTC

priority queue in query component

I have a need to favor documents from one shard over another when duplicates
occur. I found this code in the query component:

          String prevShard = uniqueDoc.put(id, srsp.getShard());
          if (prevShard != null) {
            // duplicate detected
            numFound--;

            // For now, just always use the first encountered since we can't currently
            // remove the previous one added to the priority queue.  If we switched
            // to the Java5 PriorityQueue, this would be easier.
            continue;
            // make which duplicate is used deterministic based on shard
            // if (prevShard.compareTo(srsp.shard) >= 0) {
            //  TODO: remove previous from priority queue
            //  continue;
            // }
          }
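What it might take, sketched under assumptions: java.util.PriorityQueue (the "Java5 PriorityQueue" the comment mentions) has remove(Object), which deletes one element in linear time and leaves the heap valid. The sketch below uses it so that a duplicate from a more preferred shard replaces the copy already queued. Hit, shardRank and PreferredShardMerge are hypothetical names, not Solr's ShardDoc or QueryComponent code, and the queue is left unbounded for brevity (a real merge that trims to the requested rows would also have to forget trimmed ids).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Solr's QueryComponent: on a duplicate id, the copy from the
// more preferred shard replaces the one already queued, via PriorityQueue.remove(Object).
final class PreferredShardMerge {

  // Hypothetical stand-in for one hit returned by one shard (not Solr's ShardDoc).
  static final class Hit {
    final String id;
    final String shard;
    final float score;

    Hit(String id, String shard, float score) {
      this.id = id;
      this.shard = shard;
      this.score = score;
    }
  }

  private static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble((Hit h) -> h.score);

  private final PriorityQueue<Hit> queue = new PriorityQueue<>(BY_SCORE); // lowest score at the head
  private final Map<String, Hit> queuedById = new HashMap<>();
  private final Map<String, Integer> shardRank; // hypothetical preference table: lower = preferred

  PreferredShardMerge(Map<String, Integer> shardRank) {
    this.shardRank = shardRank;
  }

  private int rank(String shard) {
    return shardRank.getOrDefault(shard, Integer.MAX_VALUE); // unknown shards are least preferred
  }

  // Adds one hit; for a duplicate id, keeps whichever copy comes from the preferred shard.
  void add(Hit hit) {
    Hit prev = queuedById.get(hit.id);
    if (prev != null) {
      if (rank(prev.shard) <= rank(hit.shard)) {
        return; // the queued copy is from an equally or more preferred shard
      }
      queue.remove(prev); // linear-time scan, but the heap stays valid afterwards
    }
    queuedById.put(hit.id, hit);
    queue.add(hit);
  }

  // Returns the merged hits, highest score first.
  List<Hit> results() {
    List<Hit> out = new ArrayList<>(queue);
    out.sort(BY_SCORE.reversed());
    return out;
  }
}
```

The linear remove is the cost of this approach; with a handful of duplicates per page it is usually negligible next to the network round trips.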


Is there a ticket open for this issue? What would it take to fix?

Thanks,
Mike

Re: priority queue in query component

Posted by Ted Dunning <te...@gmail.com>.
Katta has a very flexible and usable option for this even in the absence of
replicas.

The idea is that a shard may report results, report a failure, report late,
never report at all, or hit a transport-layer issue.  All of these behaviors
should be handled.

In Katta, each search has a deadline and a partial-results policy.  At any
time, if all results have been received, a complete
set of results is returned.  If a deadline is reached, then the policy is
interrogated with the results so far.  The policy has the option to return a
failure, partial results (with timeouts reported on missing shards) or to
set a new deadline and possibly a new policy (so that the number of missing
results gets more relaxed as time passes).  The policy is also called each
time a new result is received or failure is noted.
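A minimal sketch of the deadline-plus-policy idea, assuming a policy is a callback consulted with simple counts. Policy, Decision and relaxing() are illustrative names, not Katta's actual API; the example policy lowers the number of shards it requires each time a deadline passes.

```java
// Hypothetical sketch of a deadline-plus-policy decision; names are illustrative, not Katta's API.
final class PartialResultsPolicy {

  enum Kind { WAIT, EXTEND, RETURN_PARTIAL, FAIL }

  static final class Decision {
    static final Decision WAIT = new Decision(Kind.WAIT, 0L, null);
    static final Decision RETURN_PARTIAL = new Decision(Kind.RETURN_PARTIAL, 0L, null);
    static final Decision FAIL = new Decision(Kind.FAIL, 0L, null);

    final Kind kind;
    final long deadlineMillis; // new deadline, only meaningful for EXTEND
    final Policy next;         // policy to use until the new deadline, only for EXTEND

    private Decision(Kind kind, long deadlineMillis, Policy next) {
      this.kind = kind;
      this.deadlineMillis = deadlineMillis;
      this.next = next;
    }

    static Decision extend(long deadlineMillis, Policy next) {
      return new Decision(Kind.EXTEND, deadlineMillis, next);
    }
  }

  // Consulted on every arrival, every failure, and when the current deadline passes.
  interface Policy {
    Decision consult(int received, int failed, int total, boolean deadlinePassed, long nowMillis);
  }

  // Example policy: at a deadline, accept partial results if at least `required` shards answered;
  // otherwise extend by `extendMillis` with a policy that requires `step` fewer shards.
  static Policy relaxing(int required, int step, long extendMillis) {
    return (received, failed, total, deadlinePassed, nowMillis) -> {
      if (!deadlinePassed) {
        return Decision.WAIT; // the framework returns complete results itself once all shards answer
      }
      if (received >= required) {
        return Decision.RETURN_PARTIAL; // missing shards would be reported as timeouts
      }
      int nextRequired = required - step;
      if (nextRequired <= 0) {
        return Decision.FAIL;
      }
      return Decision.extend(nowMillis + extendMillis, relaxing(nextRequired, step, extendMillis));
    };
  }
}
```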

Transport layer issues and explicit error returns are handled by the
framework.  Any time one of these is encountered, the search is immediately
dispatched to a replica of the shard if one exists.  In that case, that
query may have a late start and may not return by the deadline, depending on
policy.  If every replica of the shard has already been queried, an error
result is recorded for that shard.
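The fail-over step might look roughly like this, assuming a synchronous send function that throws on a transport error or an explicit error reply. ReplicaFailover and ShardOutcome are hypothetical names; a real dispatcher would issue requests asynchronously, which is why a re-dispatched query can start late.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of per-shard failover: try the shard's replicas in turn until one answers;
// if no replica is left to try, record an error for the shard. Sequential for clarity.
final class ReplicaFailover {

  static final class ShardOutcome<R> {
    R response;                                    // stays null if every replica failed
    final List<String> errors = new ArrayList<>(); // one entry per failed attempt
  }

  // `send` stands in for the transport call; it throws on a transport or explicit error.
  static <R> ShardOutcome<R> query(List<String> replicas, Function<String, R> send) {
    ShardOutcome<R> outcome = new ShardOutcome<>();
    for (String replica : replicas) {
      try {
        outcome.response = send.apply(replica);
        return outcome;
      } catch (RuntimeException e) {
        outcome.errors.add(replica + ": " + e.getMessage()); // move straight on to the next replica
      }
    }
    return outcome; // no un-queried replica remains: the shard counts as failed
  }
}
```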

Note that Katta even supports fail-fast in this scenario since the partial
result policy can return a new deadline for all partial results that have no
hard failures and can return a failure if it notes any shard failures.
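The fail-fast case reduces to a small decision rule. This is an illustrative sketch with hypothetical names, assuming the framework calls it only while results are still incomplete.

```java
// Hypothetical fail-fast policy as a pure decision function (illustrative names, not Katta's API).
final class FailFastPolicy {

  enum Decision { WAIT, EXTEND_DEADLINE, FAIL }

  // Called while results are incomplete, on each arrival, failure, or deadline expiry.
  // `hardFailures` counts shards that returned an error and have no replica left to try.
  static Decision decide(int hardFailures, boolean deadlinePassed) {
    if (hardFailures > 0) {
      return Decision.FAIL; // fail fast: any hard shard failure ends the search immediately
    }
    // Only slow shards so far: keep waiting, granting a new deadline whenever the current one passes.
    return deadlinePassed ? Decision.EXTEND_DEADLINE : Decision.WAIT;
  }
}
```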

On Tue, Feb 9, 2010 at 5:25 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> The SolrCloud branch now has load balancing and fail-over amongst
> shard replicas.
> Partial results aren't available yet (if there are no up replicas for
> a shard), but that is planned.
>
> -Yonik
> http://www.lucidimagination.com



-- 
Ted Dunning, CTO
DeepDyve

Re: priority queue in query component

Posted by Yonik Seeley <yo...@lucidimagination.com>.
The SolrCloud branch now has load balancing and fail-over amongst
shard replicas.
Partial results aren't available yet (if there are no up replicas for
a shard), but that is planned.

-Yonik
http://www.lucidimagination.com


On Tue, Feb 9, 2010 at 8:21 AM, Jan Høydahl / Cominvent
<ja...@cominvent.com> wrote:
> Isn't that OK as long as there is the option of allowing partial results if you really want?
> Keeping the logic simple has its benefits. Let the client be responsible for the query resubmit strategy, and let the load balancer (or shard manager) be responsible for marking a node/shard as dead/unresponsive and choosing another for the next query.
>
> --
> Jan Høydahl  - search architect
> Cominvent AS - www.cominvent.com

Re: priority queue in query component

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Isn't that OK as long as there is the option of allowing partial results if you really want?
Keeping the logic simple has its benefits. Let the client be responsible for the query resubmit strategy, and let the load balancer (or shard manager) be responsible for marking a node/shard as dead/unresponsive and choosing another for the next query.
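The load-balancer half of that split could be as small as the sketch below: round-robin over a shard's replicas, skipping nodes that have been marked dead after a failed query, while resubmitting the query is left to the client. ReplicaBalancer, pick(), markDead() and markAlive() are hypothetical names, not a Solr API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: round-robin over replicas, skipping nodes marked dead after a failed query.
final class ReplicaBalancer {
  private final List<String> replicas;
  private final Set<String> dead = new HashSet<>();
  private int next = 0;

  ReplicaBalancer(List<String> replicas) {
    this.replicas = new ArrayList<>(replicas);
  }

  // Picks the next live replica in round-robin order, or null if all are marked dead.
  synchronized String pick() {
    for (int i = 0; i < replicas.size(); i++) {
      String candidate = replicas.get(next);
      next = (next + 1) % replicas.size();
      if (!dead.contains(candidate)) {
        return candidate;
      }
    }
    return null;
  }

  // Called when a query to this node failed or timed out; later queries avoid it.
  synchronized void markDead(String replica) {
    dead.add(replica);
  }

  // Called when, e.g., a health check succeeds again.
  synchronized void markAlive(String replica) {
    dead.remove(replica);
  }
}
```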

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 04.36, Lance Norskog wrote:

> At this point, Distributed Search does not support any recovery when
> one or more shards fail. If any shard fails or times out, the whole query
> fails.
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com


Re: priority queue in query component

Posted by Lance Norskog <go...@gmail.com>.
At this point, Distributed Search does not support any recovery when
one or more shards fail. If any shard fails or times out, the whole query
fails.
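The all-or-nothing behavior described here can be illustrated with java.util.concurrent: ExecutorService.invokeAll with a timeout cancels any shard request still running when time is up, and Future.get() then throws for a cancelled or failed request, so the whole gather fails. This is an illustration only, not Solr's distributed search code; AllOrNothingGather is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustration only (hypothetical name, not Solr's code): fan a query out to every shard and
// fail the whole request if any shard fails or misses the timeout.
final class AllOrNothingGather {

  static <R> List<R> gather(List<Callable<R>> shardRequests, long timeoutMillis)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shardRequests.size()));
    try {
      // Waits up to the timeout, then cancels every request that has not finished.
      List<Future<R>> futures = pool.invokeAll(shardRequests, timeoutMillis, TimeUnit.MILLISECONDS);
      List<R> results = new ArrayList<>();
      for (Future<R> f : futures) {
        // Throws CancellationException for a timed-out shard, ExecutionException for a failed one.
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdownNow();
    }
  }
}
```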

On Sat, Feb 6, 2010 at 9:34 AM, mike anderson <sa...@gmail.com> wrote:
> "so if we received the response from shard2 before shard1, we would just
> queue it up and wait for the response to shard1."
>
> This crossed my mind, but my concern was how to handle the case when shard1
> never responds. Is this something I need to worry about?
>
> -mike



-- 
Lance Norskog
goksron@gmail.com

Re: priority queue in query component

Posted by mike anderson <sa...@gmail.com>.
"so if we received the response from shard2 before shard1, we would just
queue it up and wait for the response to shard1."

This crossed my mind, but my concern was how to handle the case when shard1
never responds. Is this something I need to worry about?
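One way to bound the wait, sketched under assumptions: buffer responses that arrive ahead of their turn, and when a deadline passes, merge whatever has arrived in preference order and report the missing shards as timed out. OrderedReleaseWithDeadline is a hypothetical name, and something external (for example a ScheduledExecutorService) is assumed to call onDeadline().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: hand shard responses to the merge in preference order, but stop waiting
// for missing shards once a deadline passes. Scheduling onDeadline() is assumed, not shown.
final class OrderedReleaseWithDeadline<R> {
  private final List<String> preferredOrder;               // earlier entries are preferred
  private final BiConsumer<String, R> merge;               // e.g. adds a shard's docs to the queue
  private final Map<String, R> waiting = new HashMap<>();  // arrived, but not yet their turn
  private int nextIndex = 0;
  private boolean expired = false;

  OrderedReleaseWithDeadline(List<String> preferredOrder, BiConsumer<String, R> merge) {
    this.preferredOrder = new ArrayList<>(preferredOrder);
    this.merge = merge;
  }

  synchronized void onResponse(String shard, R response) {
    if (expired) {
      return; // too late: this shard was already reported as timed out
    }
    waiting.put(shard, response);
    // Release every response whose turn has come, stopping at the first shard still missing.
    while (nextIndex < preferredOrder.size() && waiting.containsKey(preferredOrder.get(nextIndex))) {
      String s = preferredOrder.get(nextIndex++);
      merge.accept(s, waiting.remove(s));
    }
  }

  // Stops waiting: merges whatever has arrived, in preference order, and returns the missing shards.
  synchronized List<String> onDeadline() {
    List<String> missing = new ArrayList<>();
    for (; nextIndex < preferredOrder.size(); nextIndex++) {
      String s = preferredOrder.get(nextIndex);
      if (waiting.containsKey(s)) {
        merge.accept(s, waiting.remove(s));
      } else {
        missing.add(s);
      }
    }
    expired = true;
    return missing;
  }
}
```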

-mike

On Sat, Feb 6, 2010 at 11:33 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> It seems like changing an element in a priority queue breaks the
> invariants, and hence it's not doable with a priority queue and with
> the current strategy of adding sub-responses as they are received.
>
> One way to continue using a priority queue would be to add
> sub-responses to the queue in the preferred order... so if we received
> the response from shard2 before shard1, we would just queue it up and
> wait for the response to shard1.
>
> -Yonik
> http://www.lucidimagination.com

Re: priority queue in query component

Posted by Ted Dunning <te...@gmail.com>.
It also seems like it should be possible to include the shard number in the
ordering of responses.
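A sketch of folding the shard into the ordering, under stated assumptions: hits are ordered by score and then by a hypothetical shard rank, and duplicate ids are dropped while draining the queue, so the copy kept no longer depends on arrival order. Shard preference only decides between copies with equal scores; if the copies score differently, the higher score still wins. Hit, shardRank and ShardRankOrdering are illustrative names, not Solr classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: order hits by score, then by shard rank, and drop duplicate ids while draining.
final class ShardRankOrdering {

  static final class Hit {
    final String id;
    final String shard;
    final float score;

    Hit(String id, String shard, float score) {
      this.id = id;
      this.shard = shard;
      this.score = score;
    }
  }

  // Best hit first: higher score, then the more preferred (lower-ranked) shard.
  static Comparator<Hit> order(Map<String, Integer> shardRank) {
    Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
    return byScoreDesc.thenComparingInt(h -> shardRank.getOrDefault(h.shard, Integer.MAX_VALUE));
  }

  // Merges hits received in any order; the first copy of each id drained from the queue wins.
  static List<Hit> merge(List<Hit> hits, Map<String, Integer> shardRank, int rows) {
    PriorityQueue<Hit> queue = new PriorityQueue<>(order(shardRank));
    queue.addAll(hits);
    Set<String> seen = new HashSet<>();
    List<Hit> out = new ArrayList<>();
    while (!queue.isEmpty() && out.size() < rows) {
      Hit h = queue.poll();
      if (seen.add(h.id)) {
        out.add(h); // later copies of the same id are skipped
      }
    }
    return out;
  }
}
```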

On Sat, Feb 6, 2010 at 8:33 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> It seems like changing an element in a priority queue breaks the
> invariants, and hence it's not doable with a priority queue and with
> the current strategy of adding sub-responses as they are received.
>
> One way to continue using a priority queue would be to add
> sub-responses to the queue in the preferred order... so if we received
> the response from shard2 before shard1, we would just queue it up and
> wait for the response to shard1.
>
> -Yonik
> http://www.lucidimagination.com



-- 
Ted Dunning, CTO
DeepDyve

Re: priority queue in query component

Posted by Yonik Seeley <yo...@lucidimagination.com>.
It seems like changing an element in a priority queue breaks the
invariants, and hence it's not doable with a priority queue and with
the current strategy of adding sub-responses as they are received.

One way to continue using a priority queue would be to add
sub-responses to the queue in the preferred order... so if we received
the response from shard2 before shard1, we would just queue it up and
wait for the response to shard1.
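A minimal sketch of adding sub-responses in the preferred order, assuming the merge keeps the first copy of a duplicate id as the quoted code does: a response that arrives early is held until every more preferred shard has been merged. PreferenceOrderedRelease and its merge callback are hypothetical names, and this version waits indefinitely if a preferred shard never responds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: release shard responses to the merge step in a fixed preference order.
final class PreferenceOrderedRelease<R> {
  private final List<String> preferredOrder;           // e.g. [shard1, shard2]: earlier = preferred
  private final BiConsumer<String, R> merge;           // e.g. adds the shard's docs to the priority queue
  private final Map<String, R> early = new HashMap<>(); // responses that arrived ahead of their turn
  private int nextIndex = 0;

  PreferenceOrderedRelease(List<String> preferredOrder, BiConsumer<String, R> merge) {
    this.preferredOrder = new ArrayList<>(preferredOrder);
    this.merge = merge;
  }

  // Called as each shard response arrives, in any order.
  synchronized void onResponse(String shard, R response) {
    early.put(shard, response);
    // Release every response whose turn has come, stopping at the first shard still missing.
    while (nextIndex < preferredOrder.size() && early.containsKey(preferredOrder.get(nextIndex))) {
      String s = preferredOrder.get(nextIndex++);
      merge.accept(s, early.remove(s));
    }
  }
}
```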

-Yonik
http://www.lucidimagination.com


On Sat, Feb 6, 2010 at 10:35 AM, mike anderson <sa...@gmail.com> wrote:
> I have a need to favor documents from one shard over another when duplicates
> occur. I found this code in the query component:
>
>          String prevShard = uniqueDoc.put(id, srsp.getShard());
>          if (prevShard != null) {
>            // duplicate detected
>            numFound--;
>
>            // For now, just always use the first encountered since we can't currently
>            // remove the previous one added to the priority queue.  If we switched
>            // to the Java5 PriorityQueue, this would be easier.
>            continue;
>            // make which duplicate is used deterministic based on shard
>            // if (prevShard.compareTo(srsp.shard) >= 0) {
>            //  TODO: remove previous from priority queue
>            //  continue;
>            // }
>          }
>
>
> Is there a ticket open for this issue? What would it take to fix?
>
> Thanks,
> Mike
>