Posted to solr-dev@lucene.apache.org by "Brian Whitman (JIRA)" <ji...@apache.org> on 2008/08/15 19:09:44 UTC

[jira] Created: (SOLR-705) Distributed search should optionally return docID->shard map

Distributed search should optionally return docID->shard map
------------------------------------------------------------

                 Key: SOLR-705
                 URL: https://issues.apache.org/jira/browse/SOLR-705
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 1.3
         Environment: all
            Reporter: Brian Whitman
             Fix For: 1.4


SOLR-303 queries with the &shards parameter set need to return the docID->shard mapping in the response. Without it, updating or deleting documents when the number of shards is variable is hard. We currently handle this with a special requestHandler that filters /update and inserts the shard as a field in the index, but it would be better if the shard location came back in the query response, outside of the index.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740984#action_12740984 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

I go back and forth on the "meta" thing...

On one hand, if one is looking at the output, it makes perfect sense to have a separate meta section per document.
However, when one looks at it from a client API perspective (how one asks for the value of a particular metadata value) having two different ways to access values ("real" fields vs "meta" fields) doesn't seem desirable.

From a client coding perspective, consistency is nice:
  sdoc.get("id")
  sdoc.get("_shard")

After all, many of the stored fields of a document are actually just metadata too.  So an alternative is a simple convention... metadata fields start with an underscore, and no more work needs to be done at the client side.
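
A minimal sketch of what that convention could look like from client code. FlatDoc, put, get and isMeta are hypothetical names made up for illustration (this is not SolrJ's document class); the only thing taken from the comment above is that metadata fields start with an underscore and are read the same way as stored fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a client-side document; not a SolrJ class.
class FlatDoc {
    private final Map<String, Object> values = new LinkedHashMap<>();

    // Stored fields and computed metadata share one namespace.
    void put(String name, Object value) {
        values.put(name, value);
    }

    // One accessor for everything: get("id") and get("_shard") look alike.
    Object get(String name) {
        return values.get(name);
    }

    // By convention only, a leading underscore marks a metadata field.
    static boolean isMeta(String name) {
        return name.startsWith("_");
    }
}
```

The trade-off is that nothing but the naming convention separates the two kinds of values, so a schema that already uses underscore-prefixed field names could collide with them.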

But I'm really not convinced either way ;-)

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Ryan McKinley
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with the &shards parameter set need to return the docID->shard mapping in the response. Without it, updating or deleting documents when the number of shards is variable is hard. We currently handle this with a special requestHandler that filters /update and inserts the shard as a field in the index, but it would be better if the shard location came back in the query response, outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741502#action_12741502 ] 

Hoss Man commented on SOLR-705:
-------------------------------

bq. On one hand, if one is looking at the output, it makes perfect sense to have a separate meta section per document. However, when one looks at it from a client API perspective (how one asks for the value of a particular metadata value) having two different ways to access values ("real" fields vs "meta" fields) doesn't seem desirable.

I'm trying to look at it from an internal data structure perspective, and a client code perspective.  From an internals perspective, keeping the data isolated makes sense -- one set comes straight from the Documents in the index, the other is computed -- so they should be separate data structures internally in Solr, one hanging off the other.

Then the response writer can decide how it wants to deal with that data -- for response writers where nested data structures are easy (most of them) this data can be represented cleanly ... or we could add options to flatten the data (using some prefix type option to say all metadata properties should become fields with the same name prefixed by "_") so that the client can't tell the difference between them ... if we make the internal representation a flattened list of pairs, then we tie the hands of the response writer.

Ditto for the client library -- the more structure we maintain as the data makes its way to the client, the more options we have as to if/when we flatten it.  Preserving structure allows code to flatten at any time if it wants to, so let's go with the option that has the most flexibility and get the best of both worlds.
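
A rough sketch of the "keep structure, flatten late" idea, not taken from any of the attached patches. StructuredDoc and flatten are hypothetical names, and the collision rule (a stored field wins over a metadata entry with the same flattened name) is an assumption made only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: stored fields and computed metadata are kept apart.
class StructuredDoc {
    final Map<String, Object> fields = new LinkedHashMap<>(); // straight from the index
    final Map<String, Object> meta = new LinkedHashMap<>();   // computed per request

    // Flatten on demand: metadata keys are copied in under the given prefix.
    // Assumed rule: on a name collision the stored field is kept.
    Map<String, Object> flatten(String prefix) {
        Map<String, Object> flat = new LinkedHashMap<>(fields);
        for (Map.Entry<String, Object> e : meta.entrySet()) {
            flat.putIfAbsent(prefix + e.getKey(), e.getValue());
        }
        return flat;
    }
}
```

Because flatten returns a copy, a response writer or client that prefers nested output can still read fields and meta separately after another consumer has flattened the same document.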





[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623666#action_12623666 ] 

Erik Hatcher commented on SOLR-705:
-----------------------------------

What about putting the docid->shard mapping elsewhere in the response rather than actually on the document?   Like explain and highlight info, for example.
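
For illustration only, here is what reading such a separate section might look like on the client. ShardSection, record and shardFor are hypothetical names; the sketch assumes documents carry their unique key in a field named "id", and that the extra section is keyed by that id in the same way highlighting results are.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical response section: document id -> shard, kept apart from the docs.
class ShardSection {
    private final Map<String, String> shardById = new LinkedHashMap<>();

    void record(String id, String shard) {
        shardById.put(id, shard);
    }

    // The client correlates by id: it reads the doc's id, then looks it up here.
    // Assumes the unique key field is called "id". Returns null if unknown.
    String shardFor(Map<String, Object> doc) {
        return shardById.get(String.valueOf(doc.get("id")));
    }
}
```

The documents themselves stay untouched in this layout; the cost is the extra lookup step for every document whose shard the client needs.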



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739246#action_12739246 ] 

Hoss Man commented on SOLR-705:
-------------------------------

Ryan: might be worthwhile to split the jira issue ... create a new issue for an internal API to add per doc metadata and use of this metadata in at least 2 response writers; then make SOLR-705 (shard mapping) and SOLR-1298 (function query results) dependent on the new issue and sanity test of the new internal APIs.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Here is an updated patch that:
 1. Calculates the Set of returned fields once per request (rather than for each document)
 2. Attaches the meta return field list to the SolrQueryResponse
 3. Implements a ReturnFieldList that will match "*" to return any meta field (NOTE: we should consider http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams)
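
The patch itself is not reproduced in this thread, so the following is only a guess at the shape of the first and third points: parse the requested names once per request, then answer per-document membership checks, with "*" matching everything. The class body, the comma-separated input format and the includes method are assumptions for illustration; only the name ReturnFieldList comes from the comment above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a per-request return-field list; the real patch may differ.
class ReturnFieldList {
    private final Set<String> names = new HashSet<>();
    private final boolean matchAll;

    // Parse the comma-separated list once, when the request arrives.
    ReturnFieldList(String commaSeparated) {
        for (String raw : commaSeparated.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        matchAll = names.contains("*");
    }

    // Called for each document; no re-parsing happens here.
    boolean includes(String field) {
        return matchAll || names.contains(field);
    }
}
```

Usage would be something like building one new ReturnFieldList("shard,score") per request and calling includes("shard") while writing each document.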

-----

I like that this gives us an extensible place to augment each document without having to map the ID.

I don't like the potential name conflict using "meta" (or any string) in the field list.  Since you have to explicitly turn it on in the query, I guess it is ok; BUT it makes things difficult for parsers.  Is the "meta" field the special key or a field?

Since Solr does not support NamedList as a field, we could assume that any <lst> is the special meta field, but that seems kinda ugly.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739076#action_12739076 ] 

Ryan McKinley commented on SOLR-705:
------------------------------------

I'll take this one on for 1.4...

I'll incorporate Hoss' suggestion and then we can see how we like it.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-705:
-------------------------------

    Fix Version/s:     (was: 1.4)
                   1.5

Moving this issue to 1.5 so that the details could be worked out with less haste.

http://www.lucidimagination.com/search/document/1f2e56f58162679d/response_writers_and_doclists



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623729#action_12623729 ] 

Lance Norskog commented on SOLR-705:
------------------------------------

When you add new fabricated fields to the document return, please use a non-standard naming convention, like "&shard" and "&score". Adding simple alpha words as fields will clash with someone's schema.





[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737018#action_12737018 ] 

Hoss Man commented on SOLR-705:
-------------------------------

I wasn't suggesting that the binary format couldn't handle it ... I was just suggesting that *if* the client fails to specify a "meta" fieldname for the response writer to use when including the metadata in each doc, the behavior can be implementation (i.e., response writer) dependent.

I'm speaking to what the "spec" for metadata fields *could* look like, not what the implementations *should* look like.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623657#action_12623657 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

I don't think we need to worry about returning an array of all shards... they are supposed to be disjoint, but we handle it gracefully if an id is repeated (instead of blowing up).

Instead of "shard", maybe we want to pick something that won't clash with a real field name quite so easily?
_shard?  (yes it's uglier).



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623684#action_12623684 ] 

Brian Whitman commented on SOLR-705:
------------------------------------

Well, can you filter/query/sort by the contents of this "shard" field? If not, it doesn't belong in the doc block, IMO



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Attaching new patch which, inspired by Yonik's suggestion, returns something like:
{noformat}
<doc>
  <lst name="meta">
    <str name="shard">localhost:8983/solr</str>
  </lst>
  <str name="id">MA147LL/A</str>
  <str name="name">Apple 60 GB iPod with Video Playback Black</str>
</doc>
{noformat}

The parameter format changed as well: to get the document -> shard mapping, add &metadata=shard to the request. The patch adds a new field for metadata to SolrDocument. This should probably also be used for score, and split out into a separate issue if that general direction is ok.
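
As a sketch of how a client might pull the shard out of one such doc element, here is a small reader built on the JDK's DOM parser. MetaShardReader, shardOf and child are hypothetical helper names, and the code assumes the element layout shown in the sample above (a lst named "meta" directly under doc, containing a str named "shard"); it is not part of the patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; assumes the <doc> layout from the sample response.
class MetaShardReader {
    // Returns the text of <lst name="meta"><str name="shard">, or null if absent.
    static String shardOf(String docXml) {
        Element doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(docXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse doc XML", e);
        }
        Element meta = child(doc, "lst", "meta");
        if (meta == null) {
            return null;
        }
        Element shard = child(meta, "str", "shard");
        return shard == null ? null : shard.getTextContent();
    }

    // Finds a direct child element with the given tag and name attribute.
    private static Element child(Element parent, String tag, String name) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                Element e = (Element) n;
                if (tag.equals(e.getTagName()) && name.equals(e.getAttribute("name"))) {
                    return e;
                }
            }
        }
        return null;
    }
}
```

Only direct children are inspected, so a stored field that happens to be called "shard" at the doc level is not mistaken for the metadata entry.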

Issues with the current implementation:
* Only works with XML responses and lists of SolrDocuments.
* The document found in multiple shards issue needs to be clarified.
* Tests.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623738#action_12623738 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

bq. Well, can you filter/query/sort by the contents of this "shard" field? If not, it doesn't belong in the doc block, IMO

You can't do any of those with stored (non-indexed) fields either.




[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740986#action_12740986 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

If we do go with "meta", I'm also not concerned with the hypothetical field-name collision... this is a one time thing, and hard-coding it to "meta" makes things simpler and more predictable.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Attaching new patch which compiles again.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Syncing patch with trunk.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703286#action_12703286 ] 

Ryan McKinley commented on SOLR-705:
------------------------------------

I like the concept -- it could help for many things, and make some client code a bit easier.

The simple use case is to _augment_ each document with fields that do not exist in the schema.  This could be things like the shard, the score, or a calculation (perhaps the output of a function query).

This may also be someplace to consider putting other information that currently requires correlation-by-id.  Consider highlighting, MLT, some debug info...





[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671512#action_12671512 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

Sooooo....what do people think of "meta"?
It seems like it could make things easier on clients - removing the need for correlation-by-id.


> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737012#action_12737012 ] 

Hoss Man commented on SOLR-705:
-------------------------------

I haven't looked at any of the patches here, but the simplest way to avoid field name collision problems would be to make the client specify the name of the "meta" field when asking for it.

examples...

|{{?q=solr&meta=myMetaFieldData}}|empty NamedList named 'myMetaFieldData' in each doc|
|{{?q=solr&meta=foo&shardMapping=true}}|NamedList named 'foo' in each doc, each containing a single "shard" key/val|
|{{?q=solr&shardMapping=true}}|shard mapping data is computed, but response writer has no instructions on how to display it; behavior can be implementation dependent (xml might be implemented as a <lst> with no name, binary might decide to leave it out completely)|
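
The three rows above can be read as a small decision rule. The following is a rough sketch of that rule in plain Java, not code from any of the attached patches. The class MetaParamSketch and the method metaFor are hypothetical, request parameters are modeled as a simple map, and the check for the literal value "true" follows the examples in the table. For the third row, where the behavior is left implementation dependent, this sketch simply attaches nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the table above; not taken from any SOLR-705 patch.
class MetaParamSketch {
    /**
     * Returns the extra named containers to attach to one document.
     * The outer key is the client-chosen container name from the "meta"
     * parameter; the inner map holds the metadata entries.
     */
    static Map<String, Map<String, Object>> metaFor(Map<String, String> params, String shard) {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        String metaName = params.get("meta");
        if (metaName == null) {
            // Row 3: no name was given. The table leaves this implementation
            // dependent; this sketch chooses to attach nothing.
            return result;
        }
        Map<String, Object> meta = new LinkedHashMap<>();
        if ("true".equals(params.get("shardMapping"))) {
            // Row 2: the container holds a single "shard" key/value.
            meta.put("shard", shard);
        }
        // Row 1: with no shardMapping the container is present but empty.
        result.put(metaName, meta);
        return result;
    }
}
```

Because the client picks the container name, a schema that already has a field called "meta" can avoid the collision by requesting a different name.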


> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737015#action_12737015 ] 

Noble Paul commented on SOLR-705:
---------------------------------

bq.binary might decide to leave it out completely

The behavior should be consistent irrespective of the output format we choose. The binary format should have no problem handling this use case the same way the XML format handles it.

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623669#action_12623669 ] 

Yonik Seeley commented on SOLR-705:
-----------------------------------

bq. What about putting the docid->shard mapping elsewhere in the response rather than actually on the document?

I've never been sure what the right answer is here.  Putting it in a different place sometimes seems cleaner, but sometimes seems like it just makes responses harder to read, and forces users to do their own id based correlation.

I've also thought about a "meta" part to a document that contains other information specific to the document besides stored fields.

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Issue Comment Edited: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671175#action_12671175 ] 

bwhitman edited comment on SOLR-705 at 2/6/09 7:59 AM:
------------------------------------------------------------

The latest patch doesn't seem to compile anymore, I get:

{{

compile-solrj:
    [javac] Compiling 1 source file to /Users/bwhitman/outside/solr-trunk/build/solrj
    [javac] /Users/bwhitman/outside/solr-trunk/src/common/org/apache/solr/common/SolrDocument.java:50: cannot assign a value to final variable _fields
    [javac]     _fields = new LinkedHashMap<String,Object>();
    [javac]     ^

}}



      was (Author: bwhitman):
    The latest patch doesn't seem to compile anymore, I get:

{{{

compile-solrj:
    [javac] Compiling 1 source file to /Users/bwhitman/outside/solr-trunk/build/solrj
    [javac] /Users/bwhitman/outside/solr-trunk/src/common/org/apache/solr/common/SolrDocument.java:50: cannot assign a value to final variable _fields
    [javac]     _fields = new LinkedHashMap<String,Object>();
    [javac]     ^

}}}


  
> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741046#action_12741046 ] 

Noble Paul commented on SOLR-705:
---------------------------------

Let us have a special field named something like "_meta_" (to minimize conflict) which can be used for any future metadata. So, the client code would look like:
{code:java}
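// Regular stored fields are read straight from the document.
Object id = sdoc.get("id");
// Metadata sits under the proposed "_meta_" key; get() returns Object, so cast.
NamedList meta = (NamedList) sdoc.get("_meta_");
Object shard = meta.get("shard");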
{code}


> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Ryan McKinley
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671175#action_12671175 ] 

Brian Whitman commented on SOLR-705:
------------------------------------

The latest patch doesn't seem to compile anymore, I get:

{{{

compile-solrj:
    [javac] Compiling 1 source file to /Users/bwhitman/outside/solr-trunk/build/solrj
    [javac] /Users/bwhitman/outside/solr-trunk/src/common/org/apache/solr/common/SolrDocument.java:50: cannot assign a value to final variable _fields
    [javac]     _fields = new LinkedHashMap<String,Object>();
    [javac]     ^

}}}



> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683774#action_12683774 ] 

Shalin Shekhar Mangar commented on SOLR-705:
--------------------------------------------

bq. Sooooo....what do people think of "meta"? It seems like it could make things easier on clients - removing the need for correlation-by-id.

I'm +1 on the idea. Let's choose a better name than 'meta' with less likelihood of collision, say '_meta_'?

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Assigned: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley reassigned SOLR-705:
----------------------------------

    Assignee: Ryan McKinley

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Ryan McKinley
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "ian connor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656298#action_12656298 ] 

ian connor commented on SOLR-705:
---------------------------------

I would be keen to test this using Ruby - can you add this for the Ruby Request as well as XML in the patch?

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623862#action_12623862 ] 

Lars Kotthoff commented on SOLR-705:
------------------------------------

bq. I don't think we need to worry about returning an array of all shards... they are supposed to be disjoint

I think as the idea behind the issue is to use this information to programmatically update/delete documents, we should return an array of shards. Consider the scenario where 2 sets of shards index the same information for redundancy purposes. For normal queries, you would send requests to one set of shards. If you want to delete a document, it would be nice to be able to send one request to both sets of shards and get all the required information with a single request instead of having to query each set individually.

bq. What about putting the docid->shard mapping elsewhere in the response rather than actually on the document?

That assumes that there's a unique key which we can use to link the two pieces of information. That's probably a reasonable assumption, but in my opinion we shouldn't impose this restriction unless it's really necessary.

bq. I've also thought about a "meta" part to a document that contains other information specific to the document besides stored fields.

Ah, I do like that idea.

bq. Well, can you filter/query/sort by the contents of this "shard" field? If not, it doesn't belong in the doc block, IMO

It's the same thing for the score field though.
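
As a rough illustration of the delete scenario described above, the sketch below assumes the response has already been turned into a docID -> list-of-shards map and inverts it, so that a client can send one delete request per shard. It is plain Java with no SolrJ calls; the class DeletePlanSketch and the method idsByShard are hypothetical names, and actually sending the deletes is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: plans deletes from a docID -> shards mapping.
class DeletePlanSketch {
    /**
     * Inverts docId -> shards into shard -> docIds, so that each shard
     * (including redundant copies) receives the ids it must delete.
     */
    static Map<String, List<String>> idsByShard(Map<String, List<String>> shardsByDocId) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : shardsByDocId.entrySet()) {
            for (String shard : entry.getValue()) {
                plan.computeIfAbsent(shard, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return plan;
    }
}
```

With only a single shard per document in the response, the redundant copy in the second set of shards would be missing from the plan, which is the case the array-of-shards suggestion is meant to cover.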

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Starting implementation: setting the "shardMapping" parameter to any value in the request adds the field "shard" to each response document, containing the shard as specified in the request.

* Currently only implemented for XML responses.
* No tests.
* When a document is found in multiple shards, the last one sets the value and the others are lost. Returning an array of all the shards would probably be better.

This code is almost untested.
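
The difference between the current behavior and the array alternative mentioned in the last bullet can be sketched as follows. This is not the code from the patch: it is a standalone plain-Java illustration in which each shard's response is reduced to a list of document ids, and the names ShardMergeSketch, lastWins, and allShards are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two merge strategies; not the patch itself.
class ShardMergeSketch {
    /** Last writer wins: a later shard overwrites the value set by an earlier one. */
    static Map<String, String> lastWins(Map<String, List<String>> idsPerShard) {
        Map<String, String> shardByDocId = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : idsPerShard.entrySet()) {
            for (String docId : entry.getValue()) {
                shardByDocId.put(docId, entry.getKey());
            }
        }
        return shardByDocId;
    }

    /** Alternative: keep every shard that returned the document, in request order. */
    static Map<String, List<String>> allShards(Map<String, List<String>> idsPerShard) {
        Map<String, List<String>> shardsByDocId = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : idsPerShard.entrySet()) {
            for (String docId : entry.getValue()) {
                shardsByDocId.computeIfAbsent(docId, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return shardsByDocId;
    }
}
```

The input map should preserve the order in which shards were processed (for example a LinkedHashMap), since that order decides which shard "wins" in the first method.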

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.



[jira] Updated: (SOLR-705) Distributed search should optionally return docID->shard map

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-705:
-------------------------------

    Attachment: SOLR-705.patch

Attaching new patch that actually applies again (sorry, this issue had escaped my attention before) and adds metadata output for Ruby and JSON.

> Distributed search should optionally return docID->shard map
> ------------------------------------------------------------
>
>                 Key: SOLR-705
>                 URL: https://issues.apache.org/jira/browse/SOLR-705
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: all
>            Reporter: Brian Whitman
>             Fix For: 1.4
>
>         Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the dociD->shard mapping in the response. Without it, updating/deleting documents when the # of shards is variable is hard. We currently set this with a special requestHandler that filters /update and inserts the shard as a field in the index but it would be better if the shard location came back in the query response outside of the index.
