Posted to solr-dev@lucene.apache.org by "Nicolas Dessaigne (JIRA)" <ji...@apache.org> on 2009/05/04 12:06:31 UTC

[jira] Created: (SOLR-1143) Return partial results when a connection to a shard is refused

Return partial results when a connection to a shard is refused
--------------------------------------------------------------

                 Key: SOLR-1143
                 URL: https://issues.apache.org/jira/browse/SOLR-1143
             Project: Solr
          Issue Type: Improvement
          Components: search
            Reporter: Nicolas Dessaigne


If any shard is down in a distributed search, a ConnectException is thrown.

Here's a little patch that changes this behaviour: if we can't connect to a shard (ConnectException), we get partial results from the active shards. As with the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we set the parameter "partialResults" to true.
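
For readers following the thread, the behaviour described above can be sketched roughly as follows. This is not the attached patch and does not use Solr classes: ShardClient, PartialResultsSketch and Result are hypothetical names introduced only for illustration, and the shards are queried sequentially to keep the example short. Only the ConnectException handling and the "partialResults" flag come from the description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a request to a single shard; not a Solr API.
interface ShardClient {
    List<String> query(String shardUrl, String q) throws IOException;
}

final class PartialResultsSketch {
    /** Merged outcome of querying every shard. */
    static final class Result {
        final List<String> docs = new ArrayList<>();
        boolean partialResults = false; // mirrors the "partialResults" flag from the issue
    }

    static Result search(List<String> shards, String q, ShardClient client) {
        Result result = new Result();
        for (String shard : shards) {
            try {
                result.docs.addAll(client.query(shard, q));
            } catch (ConnectException e) {
                // Shard is unreachable: skip it and mark the response as partial.
                result.partialResults = true;
            } catch (IOException e) {
                // Any other I/O failure still fails the whole search.
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake client: "shard2" refuses the connection, the others return one document.
        ShardClient fake = (shard, q) -> {
            if (shard.equals("shard2")) {
                throw new ConnectException("Connection refused");
            }
            return Arrays.asList(shard + ":doc1");
        };
        Result r = search(Arrays.asList("shard1", "shard2", "shard3"), "*:*", fake);
        // prints: [shard1:doc1, shard3:doc1] partialResults=true
        System.out.println(r.docs + " partialResults=" + r.partialResults);
    }
}
```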

This patch also addresses a problem raised on the mailing list about a year ago (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html).

We have a use case that needs this behaviour, and we would like to know your thoughts about it. Should it be the default behaviour for distributed search?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753918#action_12753918 ] 

Jason Rutherglen commented on SOLR-1143:
----------------------------------------

The particular case not solved today that I'm running into is a
Solr server that simply takes too long and slows down the entire
distributed query. Maybe we need a patch to timeout an
individual distributed shard request and return partial results
and/or indicate which server is taking too long?
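
One way to picture the idea above (a per-shard timeout that returns partial results and names the slow server) is the sketch below. It is an illustration built on plain java.util.concurrent, not a proposed patch: ShardTimeoutSketch, Outcome and queryAll are hypothetical names, and each shard request is modelled as a Callable that returns a list of document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ShardTimeoutSketch {
    static final class Outcome {
        final List<String> docs = new ArrayList<>();
        final List<String> slowShards = new ArrayList<>(); // shards that missed the deadline
    }

    static Outcome queryAll(Map<String, Callable<List<String>>> shardCalls, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Outcome out = new Outcome();
        try {
            // Send every shard request up front so they run in parallel.
            Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<List<String>>> e : shardCalls.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            // One shared deadline: every shard gets whatever time is left.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                long remaining = Math.max(deadline - System.nanoTime(), 0);
                try {
                    out.docs.addAll(e.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException te) {
                    e.getValue().cancel(true);      // stop waiting for this shard
                    out.slowShards.add(e.getKey()); // and report which one was slow
                } catch (ExecutionException ee) {
                    throw new IllegalStateException("shard failed: " + e.getKey(), ee.getCause());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for shards", ie);
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Callable<List<String>>> calls = new LinkedHashMap<>();
        calls.put("fast-shard", () -> Arrays.asList("doc1"));
        calls.put("slow-shard", () -> { Thread.sleep(5000); return Arrays.asList("doc2"); });
        Outcome o = queryAll(calls, 500);
        System.out.println(o.docs + " slow=" + o.slowShards);
    }
}
```

The shared deadline bounds the total wait by the timeout rather than by the timeout multiplied by the number of shards; whether a real implementation would want that or a separate budget per shard is an open design choice.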

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747462#action_12747462 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

Small FYI on patch submission: there is no need to name them XXXX-1, XXXX-2, etc. JIRA will version them automatically and gray out all but the most current one. Reusing the same name makes it easier to see which patch is current w/o reading every one.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1143:
-------------------------------

    Fix Version/s: 1.4

Seems like this is something we should consider for 1.4

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143.patch


[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-1143:
----------------------------------

    Fix Version/s:     (was: 1.4)
                   1.5

Given that there is likely going to be a whole lot more work on distributed search in 1.5 (see ZooKeeper, Hadoop, etc.), I think it makes sense to defer this to 1.5.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747567#action_12747567 ] 

Martijn van Groningen commented on SOLR-1143:
---------------------------------------------

Hi Grant, thanks for mentioning that; I did not realize that.

I have two other ideas about returning a partial result that might be usable in this patch:
1) Currently, when a partial result is returned, the response does not tell you which shard failed; it only tells you that it is a partial result. Wouldn't it be handy to include in the response the hostnames or IP addresses of the shards that had a connection timeout?

2) A partial result is only returned when a connection exception occurs. Is it practical to return a partial result when another type of exception occurs? Let's say one shard has a corrupted index, so only that shard throws an exception while searching. I can imagine that in such a situation it is also useful to return a partial result instead of returning an error for the complete search.
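
The two ideas above could look roughly like the sketch below: the response records which shards failed, and any exception from a shard (not only a connection exception) leads to a partial result. The names FailedShardReportSketch, Response and failedShards are hypothetical and are not taken from the patch or from Solr; each shard request is again modelled as a plain Callable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

final class FailedShardReportSketch {
    static final class Response {
        final List<String> docs = new ArrayList<>();
        // shard address -> short failure description (hypothetical response field)
        final Map<String, String> failedShards = new LinkedHashMap<>();

        boolean isPartial() {
            return !failedShards.isEmpty();
        }
    }

    static Response collect(Map<String, Callable<List<String>>> shardCalls) {
        Response r = new Response();
        for (Map.Entry<String, Callable<List<String>>> e : shardCalls.entrySet()) {
            try {
                r.docs.addAll(e.getValue().call());
            } catch (Exception ex) {
                // Idea 2: tolerate any shard failure, not only connection errors.
                // Idea 1: remember which shard failed and why.
                r.failedShards.put(e.getKey(), ex.getClass().getSimpleName() + ": " + ex.getMessage());
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, Callable<List<String>>> calls = new LinkedHashMap<>();
        calls.put("10.0.0.1:8983", () -> Arrays.asList("doc1"));
        calls.put("10.0.0.2:8983", () -> { throw new IllegalStateException("corrupt index"); });
        Response r = collect(calls);
        // prints: [doc1] partial=true failed={10.0.0.2:8983=IllegalStateException: corrupt index}
        System.out.println(r.docs + " partial=" + r.isPartial() + " failed=" + r.failedShards);
    }
}
```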

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Issue Comment Edited: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747567#action_12747567 ] 

Martijn van Groningen edited comment on SOLR-1143 at 8/25/09 11:55 AM:
-----------------------------------------------------------------------

Hi Grant, thanks for mentioning that; I did not realize that.

I have two other ideas about returning a partial result that might be usable in this patch:
1) Currently, when a partial result is returned, the response does not tell you which shard failed; it only tells you that it is a partial result. Wouldn't it be handy to include in the response the hostnames or IP addresses of the shards that had a connection timeout?

2) A partial result is only returned when a connection exception occurs. Is it practical to return a partial result when another type of exception occurs? Let's say one shard has a corrupted index, so only that shard throws an exception while searching. I can imagine that in such a situation it is also useful to return a partial result instead of returning an error for the complete search.

      was (Author: martijn):
    Hi Grant, thanks for mentioning that; I did not realize that.

I have two other ideas about returning a partial result that might be usable in this patch:
1) Currently, when a partial result is returned, the response does not tell you which shard failed; it only tells you that it is a partial result. Wouldn't it be handy to include in the response the hostnames or IP addresses of the shards that had a connection timeout?

2) A partial result is only returned when a connection exception occurs. Is it practical to return a partial result when another type of exception occurs? Let's say one shard has a corrupted index, so only that shard throws an exception while searching. I can imagine that in such a situation it is also useful to return a partial result instead of returning an error for the complete search. a
  
> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743481#action_12743481 ] 

Jason Rutherglen commented on SOLR-1143:
----------------------------------------

What happens today when a query times out?  

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Artem Russakovskii (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743478#action_12743478 ] 

Artem Russakovskii commented on SOLR-1143:
------------------------------------------

+1 for importance of this feature. If I have 10 shards, I should be able to handle 1 of them going down without returning 0 results to the user.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Mike Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772681#action_12772681 ] 

Mike Anderson commented on SOLR-1143:
-------------------------------------

What's the current state of this use case? I have a shard that is slower than all the others, and I'd rather just get partial (or no) results back from the slow shard instead of slowing down the whole operation. 

I've looked over SOLR-1143, SOLR-502, and SOLR-850 but I'm not exactly sure how they all tie together and what's available from trunk today. 

I tried setting timeAllowed to something really small like 5, but I still got back all of the results I got when timeAllowed wasn't set (I would have expected no results). 

-mike

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747933#action_12747933 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

I don't follow.  The only difference between the two methods is that takeOrError returns immediately if there was an error and doesn't put it in the response list, which is what you are checking for in your loop anyway.  From what I can tell the while loop isn't going to break until all pending are accounted for, either by error or by valid results.  I don't see how it is beneficial to examine every shard response every time and I don't see why that would prevent you from losing responses as it is independent of the request sent.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833440#action_12833440 ] 

David Bowen commented on SOLR-1143:
-----------------------------------

I've found this patch very useful.  I recommend extending it to check for instanceof IOException rather than just java.net.ConnectException.  This is useful in order to catch org.apache.commons.httpclient.ConnectTimeoutException and java.net.SocketTimeoutException.
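
A minimal sketch of the broader check suggested above, using only JDK exception types. java.net.ConnectException and java.net.SocketTimeoutException are both subclasses of java.io.IOException, so a single instanceof test covers them. Walking the cause chain is an extra assumption on my part (client libraries often wrap the underlying exception); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

final class ShardFailureCheck {
    /** True if the throwable, or anything in its cause chain, is an IOException. */
    static boolean isIoFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Connection refused: an IOException subclass.
        System.out.println(isIoFailure(new ConnectException("Connection refused")));
        // Read timeout wrapped by another exception: found through the cause chain.
        System.out.println(isIoFailure(new RuntimeException(new SocketTimeoutException("Read timed out"))));
        // Not an I/O problem at all.
        System.out.println(isIoFailure(new IllegalArgumentException("bad query")));
    }
}
```

Note that IOException is a broad category, so this check treats every I/O error from a shard as a reason to return partial results, not only refused connections and timeouts.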

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738678#action_12738678 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

This needs tests.  

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143.patch


[jira] Assigned: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-1143:
-------------------------------------

    Assignee: Grant Ingersoll

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743775#action_12743775 ] 

Lance Norskog commented on SOLR-1143:
-------------------------------------

If the search subsystem has a problem, the ops team wants to know about it and fix it. This just hides problems.

An example: 2 servers with the same shard are behind a load balancer. One server fails. The load balancer notices this and directs all traffic to the other server.

This is a production network which serves an outside API, where everything is supposed to work >from the viewpoint of the outside API<. When the load balancer gets a failure it usually returns an error on that one request, then marks the server down. So that one search request eventually returns with a "temporary error" condition.

These search requests come from an app server which serves the API. The app server then has the option of retrying one or two times, or returning "service not happy" to the outside calling app.

When I have a problem in my system, I want to find it and fix it.  Ignoring shard errors is ok as an option, and should be there. But, please do not make it the default.  Hiding failures should never be the default.
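
To make the "option, not default" position concrete, here is a small sketch in which shard failures are only tolerated when the request asks for it. The parameter name "tolerateShardFailures" and the class TolerantOptionSketch are hypothetical, chosen for this example only; they do not come from the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class TolerantOptionSketch {
    // Hypothetical request parameter; absent means "fail loudly".
    static final String PARAM = "tolerateShardFailures";

    static boolean tolerant(Map<String, String> params) {
        // Boolean.parseBoolean(null) is false, so the default is to surface the error.
        return Boolean.parseBoolean(params.get(PARAM));
    }

    static String onShardFailure(Map<String, String> params, String shard, RuntimeException failure) {
        if (!tolerant(params)) {
            throw failure; // default: the caller (and the ops team) sees the problem
        }
        return "partial results; skipped " + shard;
    }

    public static void main(String[] args) {
        Map<String, String> optIn = new HashMap<>();
        optIn.put(PARAM, "true");
        // prints: partial results; skipped shard2
        System.out.println(onShardFailure(optIn, "shard2", new IllegalStateException("shard2 is down")));

        try {
            onShardFailure(new HashMap<>(), "shard2", new IllegalStateException("shard2 is down"));
        } catch (IllegalStateException e) {
            // prints: error: shard2 is down
            System.out.println("error: " + e.getMessage());
        }
    }
}
```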

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143.patch


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Artem Russakovskii (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744327#action_12744327 ] 

Artem Russakovskii commented on SOLR-1143:
------------------------------------------

Any idea when this will be approved for pushing into trunk?

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747641#action_12747641 ] 

Martijn van Groningen commented on SOLR-1143:
---------------------------------------------

Actually, your approach makes more sense, because it is more efficient. But the takeCompletedOrError() method must then not return immediately when a shard failure occurs, because otherwise you might lose the responses from the other shards. I initially tried to change take() to takeCompletedOrError(), but then I noticed this problem.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744518#action_12744518 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

I'm not sure about the need for the return-partial-results static boolean.  This could just be handled through the RequestHandler defaults, right?

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747619#action_12747619 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

I'm not sure I'm following the changes in SearchHandler.

AIUI, before, we check for ShardResponses via comm.takeCompletedOrError() and then process the error and check for an exception.  If there is an exception, we throw it, essentially.

In the new code, it is replaced by just take() which returns the response, null or an exception.  We then iterate over the whole set of responses every time we enter the while (rb.outgoing...) loop.

However, why wouldn't you just keep the existing takeCompletedOrError, check to see if that shard response is an error, and handle it?  At the end of the loop it should be easy to determine whether the number of requests sent equals the number received and then add the partial results indicator, and, potentially, indicate which shards failed.

What am I not understanding?  Basically, I don't get the need for:
{code}
for (ShardResponse shardRsp : srsp.getShardRequest().responses)
...
{code}
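
The bookkeeping suggested above (compare requests sent with good responses received, then flag partial results and name the failed shards) might look roughly like the following. This is only a sketch under assumptions: ShardOutcome and PartialResultCheck are hypothetical stand-ins invented for illustration, not classes from Solr or from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one shard's outcome; NOT Solr's ShardResponse.
final class ShardOutcome {
    final String shard;
    final Exception error; // null when the shard answered normally

    ShardOutcome(String shard, Exception error) {
        this.shard = shard;
        this.error = error;
    }
}

// Hypothetical helper: decides after the receive loop whether the result is partial.
final class PartialResultCheck {
    /** Returns the names of the shards whose outcome carries an exception. */
    static List<String> failedShards(List<ShardOutcome> outcomes) {
        List<String> failed = new ArrayList<>();
        for (ShardOutcome o : outcomes) {
            if (o.error != null) {
                failed.add(o.shard);
            }
        }
        return failed;
    }

    /** True when fewer error-free responses arrived than requests were sent. */
    static boolean isPartial(int requestsSent, List<ShardOutcome> outcomes) {
        int ok = outcomes.size() - failedShards(outcomes).size();
        return ok < requestsSent;
    }
}
```

With two requests sent and one ConnectException recorded, isPartial(...) reports true and failedShards(...) names the shard that refused the connection.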

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754118#action_12754118 ] 

Martijn van Groningen commented on SOLR-1143:
---------------------------------------------

Sure, that is a good idea. I think other types of exceptions should also result in a partial result (currently only a connection timeout does). I think this behaviour should be enabled with a parameter in the request, something like _shards.requestTimeout=1500_ (time in ms).
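
One way such a per-request time budget could be enforced is with CompletionService.poll(timeout, unit) from java.util.concurrent instead of the blocking take(). The sketch below is an assumption-laden illustration, not code from the patch: TimedShardGather, its gather method, and the use of plain String responses are all hypothetical, and the parameter name shards.requestTimeout is only the proposal above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: collects whatever shard responses arrive within a time budget.
final class TimedShardGather {
    static List<String> gather(List<Callable<String>> shardCalls, long timeoutMs) {
        ExecutorService pool = Executors.newCachedThreadPool();
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        List<Future<String>> pending = new ArrayList<>();
        for (Callable<String> call : shardCalls) {
            pending.add(cs.submit(call));
        }

        List<String> results = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (!pending.isEmpty()) {
                long remainingNanos = deadline - System.nanoTime();
                // poll() returns null once the remaining budget is used up
                Future<String> done = cs.poll(Math.max(0L, remainingNanos), TimeUnit.NANOSECONDS);
                if (done == null) {
                    break;
                }
                pending.remove(done);
                try {
                    results.add(done.get());
                } catch (ExecutionException e) {
                    // this shard failed; keep waiting for the others
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            for (Future<String> f : pending) {
                f.cancel(true); // give up on shards that were too slow
            }
            pool.shutdownNow();
        }
        return results;
    }
}
```

A caller would treat the result as partial whenever fewer responses come back than shard calls were submitted.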

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748163#action_12748163 ] 

Grant Ingersoll commented on SOLR-1143:
---------------------------------------

But take() or takeOrError() isn't the thing that cancels the other two responses; comm.cancelAll is, AIUI, and that is not called in the take methods.  Also, take() deals with the Future callbacks, each of which is executed in a separate thread via the Executor.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Nicolas Dessaigne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Dessaigne updated SOLR-1143:
------------------------------------

    Attachment: SOLR-1143.patch

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>         Attachments: SOLR-1143.patch
>
>


[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-1143:
----------------------------------------

    Attachment: SOLR-1143-3.patch

You are right Grant. I guess I forgot about the request handler defaults when I was creating this patch...
I have removed the possibility to configure partial results via the _return-partial-results_ property in the latest patch. The request handler defaults can perfectly configure partial results to be enabled by default.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-1143:
----------------------------------------

    Attachment: SOLR-1143-2.patch

I have added a test to the _TestDistributedSearch_ class. This test sets up a cluster of shards, then kills one shard, and expects the search request as a whole to continue. The _TestDistributedSearch_ class in general tests distributed search by having a non-distributed instance and a cluster of shards that both hold the same documents. All results from the cluster are compared with results from the non-distributed instance. Some things in the test I added, like facets and maxScore, could not be tested because one shard in the cluster is down (so part of the corpus is missing). Only the documents that are returned from the shards are compared against the documents in the non-distributed instance.

I have also included the option to disable / enable partial results as Lance described. I agree with Lance that ignoring a shard failure should *not* be enabled by default: if you do not know about this feature, finding the cause of the actual problem might be difficult.

In this patch you get a partial result when a shard has failed by setting _partialResults_ to _true_ in the request or, if you want it for all requests, you can add _<bool name="return-partial-results">true</bool>_ to your search handler in your solrconfig.xml. If neither is specified, partial results are disabled. Currently the _partialResults_ parameter overrides the _return-partial-results_ property in the search handler.
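
The precedence described above can be summarised in a few lines. This is a minimal sketch, assuming both settings have already been parsed into nullable Booleans; PartialResultsSetting and resolve are hypothetical names for illustration and do not appear in the patch.

```java
// Hypothetical helper illustrating the precedence described above.
final class PartialResultsSetting {
    /**
     * @param requestParam   value of partialResults in the request, or null if absent
     * @param handlerDefault value of return-partial-results on the search handler, or null if absent
     * @return whether partial results should be returned for this request
     */
    static boolean resolve(Boolean requestParam, Boolean handlerDefault) {
        if (requestParam != null) {
            return requestParam; // the request parameter overrides the handler property
        }
        if (handlerDefault != null) {
            return handlerDefault;
        }
        return false; // neither specified: partial results stay disabled
    }
}
```

For example, a request carrying partialResults=false still disables the feature even when the handler property is true.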

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143.patch
>
>


[jira] Issue Comment Edited: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744518#action_12744518 ] 

Grant Ingersoll edited comment on SOLR-1143 at 8/18/09 7:19 AM:
----------------------------------------------------------------

{quote}
In this patch you get a partial result when a shard has failed by setting partialResults to true in the request or, if you want it for all requests, you can add <bool name="return-partial-results">true</bool> to your search handler in your solrconfig.xml. If neither is specified, partial results are disabled. Currently the partialResults parameter overrides the return-partial-results property in the search handler.
{quote}

I'm not sure about the need for the return-partial-results static boolean.  This could just be handled through the RequestHandler defaults, right?

      was (Author: gsingers):
    I'm not sure about the need for the return-partial-results static boolean.  This could just be handled through the RequestHandler defaults, right?
  
> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749310#action_12749310 ] 

Martijn van Groningen commented on SOLR-1143:
---------------------------------------------

I think we need to take a bird's-eye view of the partial results solution, so that we can hook in the partial results behaviour at the right place. This is quite a long comment: first I will describe how I think distributed search works, and then I will propose a solution. I think that this solution is better than the current one in the patch.

From my understanding, distributed search in the trunk currently works as follows:
1) When it has been determined that a search request is a multi-shard request, an instance of HttpCommComponent is created and the outgoing and finished lists are initialised. Also, nextStage is set to zero.
2) The ResponseBuilder's stage is set to nextStage, and nextStage is set to stage done. The distributedProcess(...) method is invoked on each search component. Each search component can add ShardRequests to the outgoing list in the ResponseBuilder. Besides adding ShardRequests, a search component also returns a stage. The lowest stage returned by all search components becomes the next stage.
{code:java}
// call all components
for( SearchComponent c : components ) {
	// the next stage is the minimum of what all components report
	nextStage = Math.min(nextStage, c.distributedProcess(rb));
}
{code}
3a) The next step is to send all the ShardRequests from the ResponseBuilder's outgoing list to the shards. First a ShardRequest is taken and removed from the outgoing list, then the actual shards are determined for the current ShardRequest.
{code:java}ShardRequest sreq = rb.outgoing.remove(0);{code}
It checks whether the ShardRequest targets specific shards and, if so, uses those. Otherwise (ShardRequest.ALL_SHARDS) the shards of the overall search request, rb.shards, become the actual shards.
{code:java}
sreq.actualShards = sreq.shards;
if (sreq.actualShards==ShardRequest.ALL_SHARDS) {
	sreq.actualShards = rb.shards;
}
{code}
3b) Now that the actual shards are known, a request can be sent to each individual shard. The actual sending is done by the HttpCommComponent.submit(...) method. Before the request is sent, a new SolrParams is constructed from the overall search request parameters, with some parameters removed and some added. The SolrParams is then passed to HttpCommComponent.submit(...) as an argument and is used to create a QueryRequest. In HttpCommComponent.submit(...) a Callable is instantiated that sends the request to a shard and receives the response asynchronously.

In the Callable's call() method the actual request (QueryRequest) that will be sent to a shard is created. The response is also received in this method, and if an exception occurred, it is set on the shard response. The Callable is then passed to the completionService's submit method, which returns a Future that is added to a set of futures named pending. From my understanding, this pending set is only used to keep track of how many requests were sent and to cancel requests when an exception occurs.

4) Once the requests have been sent for a stage, the next step is to receive the response for each shard request. comm.takeCompletedOrError() returns a shard response. The handler first checks whether an exception was set on the response; if so, the search is aborted and the exception is re-thrown. If all went well, the request of the shard response is added to a list of successful requests named finished. After that, each SearchComponent's handleResponses(...) method is invoked, which allows the search components to inspect the shard response and perhaps do something with it. This is repeated until comm.takeCompletedOrError() returns null, which means that all responses for the current stage have been retrieved.

comm.takeCompletedOrError() handles each response from the shards individually (sub ShardRequest). It uses the completionService's take() method to get a future and removes that same instance from the pending set. Then get() is invoked on the future to obtain the response. If the response contains an exception, it is returned immediately. Otherwise it is added to the responses of the ShardRequest. When the number of responses in the ShardRequest equals the number of shards, the last response obtained from the Future's get() method is returned (it contains the ShardRequest that holds all the responses).

5) When all requests have been sent and all responses received, the finishStage(...) method is executed on each search component. This allows components to run custom logic that is only possible once all shard responses have been collected. After that, it checks whether the current stage is equal to stage done. If not, it repeats steps 2 to 5 until stage done is the current stage. That indicates that the distributed search is finished and the response can be written to the client.

I think the best way to handle shard failures is to stop sending requests to a shard that has failed. I would implement that as follows:
1) Currently ShardRequest has a property actualShards that is a string array of shard host names. Let's say we create a Shard data type that has a string hostname and a boolean failed as properties. The actualShards property will be changed to use this Shard data type.
2) In phase 4, when we discover that a ShardRequest failed, we need to mark the shard as failed. Therefore take() or takeCompletedOrError() needs to store the shard hostname with the exception. In handleRequestBody we then check whether one or more exceptions / hostnames were set; if so, we mark those hostnames in the ShardRequest as failed.
3) In phase 3b we only invoke HttpCommComponent.submit(...) on the shards that are not marked as failed.
Something like this:
{code:java}
for (Shard shard : sreq.actualShards) {
	if (shard.hasFailed()) {
		continue;
	}
	ModifiableSolrParams params = new ModifiableSolrParams(sreq.params);
	params.remove(ShardParams.SHARDS);      // not a top-level request
	params.remove("indent");
	params.remove(CommonParams.HEADER_ECHO_PARAMS);
	params.set(ShardParams.IS_SHARD, true);  // a sub (shard) request
	String shardHandler = req.getParams().get(ShardParams.SHARDS_QT);
	if (shardHandler == null) {
		params.remove(CommonParams.QT);
	} else {
		params.set(CommonParams.QT, shardHandler);
	}
	comm.submit(sreq, shard.getHostname(), params);
}
{code}
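
The proposed Shard data type itself could be as small as the following. This is a sketch of the idea only: the class does not exist in Solr trunk, ShardSelection and its methods are hypothetical helpers added for illustration, and it assumes that shards are marked as failed from the request-handling thread (no extra synchronisation is shown).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Shard type from the proposal above: a hostname plus a failed flag.
final class Shard {
    private final String hostname;
    private boolean failed;

    Shard(String hostname) {
        this.hostname = hostname;
    }

    String getHostname() {
        return hostname;
    }

    boolean hasFailed() {
        return failed;
    }

    void markFailed() {
        failed = true;
    }
}

// Hypothetical helpers for step 2 (marking) and step 3 (skipping failed shards).
final class ShardSelection {
    /** Marks every shard with the given hostname as failed. */
    static void markFailed(List<Shard> shards, String hostname) {
        for (Shard s : shards) {
            if (s.getHostname().equals(hostname)) {
                s.markFailed();
            }
        }
    }

    /** Returns the hostnames that should still receive requests in later stages. */
    static List<String> hostsToQuery(List<Shard> shards) {
        List<String> hosts = new ArrayList<>();
        for (Shard s : shards) {
            if (!s.hasFailed()) {
                hosts.add(s.getHostname());
            }
        }
        return hosts;
    }
}
```

Because the flag lives on the Shard instance held by the request, a shard that failed in one stage is skipped in every later stage of the same search, but is tried again on the next search.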

I think this approach is much more efficient than the current one, because no request is sent to the failed shard and thus HttpClient does not try to connect to a shard that would not respond properly anyway. I think implementing this solution is not that much work. What are your thoughts about this approach?


> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>


[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748128#action_12748128 ] 

Martijn van Groningen commented on SOLR-1143:
---------------------------------------------

Sorry for my confusing comment. I meant to say that takeOrError() does return immediately when an exception occurs. To avoid more confusion, I will sketch a situation, based on what I currently understand from the code, to show that takeOrError() should not be used when returning partial results.

For each stage a number of requests may be sent to the shards and a number of responses may be returned from the shards for further processing.
Let's say we have three shards and, in a certain stage, we send a shard request to all three. If the first response contains an error, the current behaviour is to return that response immediately, without adding the two other responses (which did return without an error). Because of this, the so-called partial result might contain less data or even nothing.  Therefore I think take() should be used there. I think takeOrError() is only suitable when not using partial results.

{code:java}
ShardResponse takeCompletedOrError() {
    while (pending.size() > 0) {
      try {
        Future<ShardResponse> future = completionService.take();
        pending.remove(future);
        ShardResponse rsp = future.get();
        if (rsp.getException() != null) return rsp; // now we return and if there are more pending results, we lose them
        ...............
        rsp.getShardRequest().responses.add(rsp);
        if (rsp.getShardRequest().responses.size() == rsp.getShardRequest().actualShards.length) {
          return rsp;
        }
      } catch (InterruptedException e) {
      ......
    }
    return null;
  }
{code}

Again, this is what I understand from the code. What do you think about this?
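
For comparison, a gather loop that does not return on the first error, but records it and keeps draining the pending futures, might look like this. It is a self-contained sketch built only on java.util.concurrent; DrainAllShards and its use of plain String responses are assumptions for illustration and are not the HttpCommComponent code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: drain every pending future, keeping successes and failures apart.
final class DrainAllShards {
    final List<String> responses = new ArrayList<>();
    final List<Throwable> errors = new ArrayList<>();

    static DrainAllShards gather(List<Callable<String>> shardCalls) {
        ExecutorService pool = Executors.newCachedThreadPool();
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        Set<Future<String>> pending = new HashSet<>();
        for (Callable<String> call : shardCalls) {
            pending.add(cs.submit(call));
        }

        DrainAllShards out = new DrainAllShards();
        try {
            while (!pending.isEmpty()) {
                Future<String> future = cs.take(); // blocks until some shard call finishes
                pending.remove(future);
                try {
                    out.responses.add(future.get());
                } catch (ExecutionException e) {
                    out.errors.add(e.getCause()); // record the failure, do not return early
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdownNow();
        }
        return out;
    }
}
```

In the three-shard situation above, one failing call and two successful ones yield two entries in responses and one in errors, regardless of which call completes first.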

I also did some more thinking about how to improve the handling of shard failures. Currently, if a shard fails in an early stage of the distributed search, we keep sending requests to that shard, although we noticed in a previous stage that it was not responding. Do you think it is a good idea to mark a shard as failed, so that it is no longer used for the currently running search?

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>