Posted to solr-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/02/11 23:51:31 UTC
SolrJ and Unique Doc ID
What's the best way to retrieve the unique key field from SolrJ? From
what I can tell, it seems like I would need to retrieve the schema and
then parse it and get it from there, or am I missing something?
Thanks,
Grant
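Going the "retrieve the schema and parse it" route mostly comes down to reading the uniqueKey element of schema.xml. Below is a minimal sketch that uses only the JDK's DOM parser. It assumes the client has already fetched the schema text somehow (that part is left out), and the class and method names are made up for illustration. They are not part of SolrJ.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: extracts the uniqueKey field name from schema.xml text.
public class SchemaUniqueKey {

    /** Returns the trimmed text of the first <uniqueKey> element, or null if none is declared. */
    public static String uniqueKeyFrom(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            NodeList keys = doc.getElementsByTagName("uniqueKey");
            return keys.getLength() == 0 ? null : keys.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
    }

    public static void main(String[] args) {
        String schema = "<schema name=\"example\">"
                + "<fields><field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/></fields>"
                + "<uniqueKey>id</uniqueKey>"
                + "</schema>";
        System.out.println(uniqueKeyFrom(schema)); // prints id
    }
}
```

A schema without a uniqueKey element yields null here rather than an exception, since Solr does not require one.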
Re: SolrJ and Unique Doc ID
Posted by Chris Hostetter <ho...@fucit.org>.
: > How do I generate URLs to retrieve a document against any given Solr
: > instance that I happen to be pointing at without knowing which field is the
: > document id?
:
: One cool technique, not instead of your change to Luke RH (a needed change
: IMO) but another way to go about it - we have a DocumentRequestHandler that
: takes a uniqueKey parameter that would retrieve and return that single
: document without having to specify the field name explicitly.
Erik's idea eliminates the need to know what the "name" of the uniqueKey
field is when formulating the query to "fetch one", but it doesn't solve
the crux of Grant's question: when looking at a list of results (with a
partial "fl" for example), how can you know which value to use to later
query on and get back just that document (with the full "fl" for example)?
My point was that while knowing the uniqueKey field solves the problem,
the person setting up the index may not want clients to know it ... the
client has to have *some* pre-existing knowledge about the structure of
the index ... Grant's Luke patch solves this by letting the client get
this information from Luke, but in the general case a Solr admin may not
want to expose that info to his clients (e.g., the customerId vs SSN example
from my previous mail) ... so a general purpose client should probably
have a more general way to configure the "what field do i treat as unique"
info without requiring that the LukeHandler be available.
-Hoss
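One way to read Hoss's suggestion: the client's own configuration names the field it treats as unique, and asking the server is only a fallback. A minimal sketch under that assumption; the property name "search.uniqueField" and the lookup supplier are hypothetical, not Solr or SolrJ settings.

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical client-side resolver for "what field do i treat as unique".
public class UniqueFieldResolver {

    // Made-up property name for this sketch; it is not a Solr setting.
    public static final String PROPERTY = "search.uniqueField";

    /**
     * Prefers the field named in the client's own configuration and only falls back
     * to asking the server (e.g. via Luke, if the admin exposes it) when none is set.
     */
    public static String resolve(Properties config, Supplier<String> serverLookup) {
        String configured = config.getProperty(PROPERTY);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return serverLookup.get();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(PROPERTY, "ssn");
        System.out.println(resolve(config, () -> "customerId"));           // prints ssn
        System.out.println(resolve(new Properties(), () -> "customerId")); // prints customerId
    }
}
```

In the SSN example the admin would configure "ssn" for clients, while the schema's actual uniqueKey ("customerId") never leaves the server.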
Re: SolrJ and Unique Doc ID
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 12, 2008, at 3:44 PM, Grant Ingersoll wrote:
> On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote:
>
>> : > Honestly: i can't think of a single use case where client code would care
>> : > about what the uniqueKey field is, unless it already *knew* what the
>> : > uniqueKey field is.
>> :
>> : :-) Abstractions allow one to use different implementations. My
>> : client/display doesn't know about Solr, it just knows it can search and the
>> : Solr implementation part of it can be pointed at any Solr instance (or other
>> : search engines as well), thus it needs to be able to "reflect" on Solr. The
>> : unique key is a pretty generally useful thing across implementations.
>>
>> but why does your client/display care which field is the uniqueKey field?
>> knowing which fields it might query or ask for in the fl list sure -- but
>> why need to know about the uniqueKey field specifically?
>
> How do I generate URLs to retrieve a document against any given
> Solr instance that I happen to be pointing at without knowing which
> field is the document id?
One cool technique, not instead of your change to Luke RH (a needed
change IMO) but another way to go about it - we have a
DocumentRequestHandler that takes a uniqueKey parameter that would
retrieve and return that single document without having to specify
the field name explicitly.
Erik
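From the client's side, a handler like the DocumentRequestHandler Erik describes only needs the key value, not the field name. A minimal sketch of building such a request URL; the handler path ("/document") and the parameter name ("uniqueKey") are assumptions for illustration, since that handler is not part of stock Solr. The Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the client side of a "fetch one document by key" handler.
// The handler path and the "uniqueKey" parameter name are hypothetical.
public class FetchOneUrl {

    /** Builds a URL that passes only the key value; the server decides which field it lives in. */
    public static String build(String solrBase, String handlerPath, String keyValue) {
        return solrBase + handlerPath + "?uniqueKey="
                + URLEncoder.encode(keyValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "/document", "doc 42"));
        // prints http://localhost:8983/solr/document?uniqueKey=doc+42
    }
}
```

The client still has to know which value to send, which is the gap Hoss points out in his reply.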
Re: SolrJ and Unique Doc ID
Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote:
> : > Honestly: i can't think of a single use case where client code would care
> : > about what the uniqueKey field is, unless it already *knew* what the
> : > uniqueKey field is.
> :
> : :-) Abstractions allow one to use different implementations. My
> : client/display doesn't know about Solr, it just knows it can search and the
> : Solr implementation part of it can be pointed at any Solr instance (or other
> : search engines as well), thus it needs to be able to "reflect" on Solr. The
> : unique key is a pretty generally useful thing across implementations.
>
> but why does your client/display care which field is the uniqueKey field?
> knowing which fields it might query or ask for in the fl list sure -- but
> why need to know about the uniqueKey field specifically?
How do I generate URLs to retrieve a document against any given Solr
instance that I happen to be pointing at without knowing which field
is the document id? At any rate, the problem is solved in SOLR-478
in less than 10 lines of code and doesn't introduce back-compat
issues. I invoke this on instantiation of my client, get the field
and then keep it around for use later.
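The "look it up once on instantiation, then keep it around" pattern can be sketched as follows. The lookup itself (for example, a call to the Luke handler after SOLR-478) is passed in as a Supplier and is not shown; the class name is hypothetical, and the URL targets the standard /select handler with a q=field:"value" query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical client that looks up the uniqueKey field once and reuses it.
public class CachingSearchClient {
    private final String solrBase;
    private final String uniqueKeyField; // resolved once, at construction time

    public CachingSearchClient(String solrBase, Supplier<String> uniqueKeyLookup) {
        this.solrBase = solrBase;
        this.uniqueKeyField = uniqueKeyLookup.get();
    }

    public String uniqueKeyField() {
        return uniqueKeyField;
    }

    /** URL for the standard /select handler that matches a single key value. */
    public String documentUrl(String keyValue) {
        // Quote the value as a phrase so most query-syntax characters in it are not
        // interpreted; backslashes and double quotes inside the value are escaped first.
        String quoted = "\"" + keyValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        String q = uniqueKeyField + ":" + quoted;
        return solrBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CachingSearchClient client = new CachingSearchClient("http://localhost:8983/solr", () -> "id");
        System.out.println(client.documentUrl("doc42"));
        // prints http://localhost:8983/solr/select?q=id%3A%22doc42%22
    }
}
```

Resolving the field in the constructor means one extra request per client instance instead of one per message, which is the trade-off Grant weighs between the LukeRH and responseHeader options.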
>
>
> I could have an index of people where i document that the SSN field is
> unique, and never even tell you that it's not the 'uniqueKey' field --
> that could be some completely unrelated field i don't want you to know
> about called "customerId" -- but that doesn't affect you as a client, you
> can still query on whatever you want, get back whatever docs you want,
> etc... the only thing you can't do is "delete by id" (since you can't be
> sure which field is the uniqueKey) but you can always delete by query.
>
> : In fact, I wish all the ReqHandlers had an introspection option, where one
> : could see what params are supported as well.
>
> you and me both -- but the introspection shouldn't be intrinsic to the
> RequestHandler - as the Solr admin i may not want to expose all of those
> options to my clients...
>
> http://wiki.apache.org/solr/MakeSolrMoreSelfService
+1
Re: SolrJ and Unique Doc ID
Posted by Chris Hostetter <ho...@fucit.org>.
: > Honestly: i can't think of a single use case where client code would care
: > about what the uniqueKey field is, unless it already *knew* what the
: > uniqueKey field is.
:
: :-) Abstractions allow one to use different implementations. My
: client/display doesn't know about Solr, it just knows it can search and the
: Solr implementation part of it can be pointed at any Solr instance (or other
: search engines as well), thus it needs to be able to "reflect" on Solr. The
: unique key is a pretty generally useful thing across implementations.
but why does your client/display care which field is the uniqueKey field?
knowing which fields it might query or ask for in the fl list sure -- but
why need to know about the uniqueKey field specifically?
I could have an index of people where i document that the SSN field is
unique, and never even tell you that it's not the 'uniqueKey' field --
that could be some completely unrelated field i don't want you to know
about called "customerId" -- but that doesn't affect you as a client, you
can still query on whatever you want, get back whatever docs you want,
etc... the only thing you can't do is "delete by id" (since you can't be
sure which field is the uniqueKey) but you can always delete by query.
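Delete by query needs only the field the client was told is unique (SSN in this example), not the schema's uniqueKey. Below is a minimal sketch of building the XML update message (a delete element wrapping a query element) that Solr's XML update handler accepts. Actually POSTing it to /update is left out, and the class name is made up.

```java
// Sketch: building the XML update message for Solr's delete-by-query,
// which needs no knowledge of the uniqueKey field.
public class DeleteByQuery {

    /** Escapes XML markup characters in text content. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Deletes every document whose field matches the (phrase-quoted) value. */
    public static String message(String field, String value) {
        String quoted = "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return "<delete><query>" + xmlEscape(field + ":" + quoted) + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(message("ssn", "123-45-6789"));
        // prints <delete><query>ssn:"123-45-6789"</query></delete>
    }
}
```

If the field really holds unique values, as the admin documents for SSN, this removes exactly one document, the same effect as delete by id.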
: In fact, I wish all the ReqHandlers had an introspection option, where one
: could see what params are supported as well.
you and me both -- but the introspection shouldn't be intrinsic to the
RequestHandler - as the Solr admin i may not want to expose all of those
options to my clients...
http://wiki.apache.org/solr/MakeSolrMoreSelfService
-Hoss
Re: SolrJ and Unique Doc ID
Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 11, 2008, at 11:24 PM, Chris Hostetter wrote:
> : Another option is to add it to the responseHeader? Or it could be a quick
> : add to the LukeRH. The former has the advantage that we wouldn't have to make
>
> adding the info to LukeRequestHandler makes sense.
>
> Honestly: i can't think of a single use case where client code would care
> about what the uniqueKey field is, unless it already *knew* what the
> uniqueKey field is.
:-) Abstractions allow one to use different implementations. My
client/display doesn't know about Solr, it just knows it can search
and the Solr implementation part of it can be pointed at any Solr
instance (or other search engines as well), thus it needs to be able
to "reflect" on Solr. The unique key is a pretty generally useful
thing across implementations.
In fact, I wish all the ReqHandlers had an introspection option, where
one could see what params are supported as well.
>
>
> : Of course, it probably would be useful to be able to request the schema from
> : the server and build an IndexSchema object on the client side. This could be
> : added to the LukeRH as well.
>
> somebody was working on that at some point ... but i may be thinking of
> the Ruby client ... no.... i'm pretty sure i remember it coming up in the
> context of Java because i remember discussion that a full "IndexSchema"
> was too much because it required the client to have the class files for
> all of the analysis chain and fieldtype classes.
It may be reasonable, as a compromise, to just have metadata about
these things. Sort of like BeanInfo provides.
-Grant
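As a rough illustration of "just metadata about these things", a client could hold plain field descriptions (names and fieldType names as strings) instead of a full IndexSchema. Nothing has to load analyzer or FieldType classes. This is a sketch under that assumption; the class names are hypothetical, and it reads only field elements and uniqueKey from schema.xml text the client has already fetched.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical "BeanInfo-like" view of a schema: names only, no analyzer or FieldType classes.
public class SchemaMetadata {

    public static final class FieldMeta {
        public final String name;
        public final String type;       // the fieldType name as written in schema.xml
        public final boolean uniqueKey; // true if this is the schema's <uniqueKey> field

        FieldMeta(String name, String type, boolean uniqueKey) {
            this.name = name;
            this.type = type;
            this.uniqueKey = uniqueKey;
        }
    }

    /** Reads the <field> declarations and the <uniqueKey> element from schema.xml text. */
    public static List<FieldMeta> read(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            NodeList keys = doc.getElementsByTagName("uniqueKey");
            String key = keys.getLength() == 0 ? null : keys.item(0).getTextContent().trim();
            NodeList fields = doc.getElementsByTagName("field");
            List<FieldMeta> result = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                String name = field.getAttribute("name");
                result.add(new FieldMeta(name, field.getAttribute("type"), name.equals(key)));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
    }

    public static void main(String[] args) {
        String schema = "<schema><fields>"
                + "<field name=\"id\" type=\"string\"/>"
                + "<field name=\"title\" type=\"text\"/>"
                + "</fields><uniqueKey>id</uniqueKey></schema>";
        for (FieldMeta f : read(schema)) {
            System.out.println(f.name + " " + f.type + (f.uniqueKey ? " (uniqueKey)" : ""));
        }
        // prints:
        // id string (uniqueKey)
        // title text
    }
}
```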
Re: SolrJ and Unique Doc ID
Posted by Chris Hostetter <ho...@fucit.org>.
: Another option is to add it to the responseHeader? Or it could be a quick
: add to the LukeRH. The former has the advantage that we wouldn't have to make
adding the info to LukeRequestHandler makes sense.
Honestly: i can't think of a single use case where client code would care
about what the uniqueKey field is, unless it already *knew* what the
uniqueKey field is.
: Of course, it probably would be useful to be able to request the schema from
: the server and build an IndexSchema object on the client side. This could be
: added to the LukeRH as well.
somebody was working on that at some point ... but i may be thinking of
the Ruby client ... no.... i'm pretty sure i remember it coming up in the
context of Java because i remember discussion that a full "IndexSchema"
was too much because it required the client to have the class files for
all of the analysis chain and fieldtype classes.
-Hoss
Re: SolrJ and Unique Doc ID
Posted by Grant Ingersoll <gs...@apache.org>.
Another option is to add it to the responseHeader? Or it could be
a quick add to the LukeRH. The former has the advantage that we
wouldn't have to make extra calls at the cost of sending an extra
string w/ every message. The latter would work by asking for it up
front and then saving it aside. Any preference? Or, we could add it
to both, making the responseHeader one optional.
Of course, it probably would be useful to be able to request the
schema from the server and build an IndexSchema object on the client
side. This could be added to the LukeRH as well.
Hindsight is 20/20...
On Feb 11, 2008, at 6:51 PM, Ryan McKinley wrote:
> thoughts on requiring that for solrj? perhaps in 2.0? Not
> suggesting it is a good idea (yet)... but we may want to consider it.
>
>
> Yonik Seeley wrote:
>> Hmmm, I should have just mandated that the id field be called "id"
>> from the start :-)
>> On Feb 11, 2008 5:51 PM, Grant Ingersoll <gs...@apache.org> wrote:
>>> What's the best way to retrieve the unique key field from SolrJ? From
>>> what I can tell, it seems like I would need to retrieve the schema and
>>> then parse it and get it from there, or am I missing something?
>>>
>>> Thanks,
>>> Grant
>>>
>
Re: SolrJ and Unique Doc ID
Posted by Ryan McKinley <ry...@gmail.com>.
thoughts on requiring that for solrj? perhaps in 2.0? Not suggesting
it is a good idea (yet)... but we may want to consider it.
Yonik Seeley wrote:
> Hmmm, I should have just mandated that the id field be called "id"
> from the start :-)
>
> On Feb 11, 2008 5:51 PM, Grant Ingersoll <gs...@apache.org> wrote:
>> What's the best way to retrieve the unique key field from SolrJ? From
>> what I can tell, it seems like I would need to retrieve the schema and
>> then parse it and get it from there, or am I missing something?
>>
>> Thanks,
>> Grant
>>
>
Re: SolrJ and Unique Doc ID
Posted by Yonik Seeley <yo...@apache.org>.
Hmmm, I should have just mandated that the id field be called "id"
from the start :-)
On Feb 11, 2008 5:51 PM, Grant Ingersoll <gs...@apache.org> wrote:
> What's the best way to retrieve the unique key field from SolrJ? From
> what I can tell, it seems like I would need to retrieve the schema and
> then parse it and get it from there, or am I missing something?
>
> Thanks,
> Grant
>
Re: SolrJ and Unique Doc ID
Posted by Ryan McKinley <ry...@gmail.com>.
right now you need to know the unique key name to get it...
I don't think we have any easy way to get that besides parsing the
schema....
With debugQuery=true, the uniqueKey is added to the 'explain' info:
<lst name="explain">
<str name="id=YOURID,internal_docid=0">
...
this gets parsed into the QueryResults _explainMap and _docIdMap but i'm
not sure that is useful in the general sense...
ryan
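If a client does go the debugQuery route, the field name and value can be recovered from the explain entry names. A minimal sketch that assumes the "field=value,internal_docid=N" form shown above; that format is taken from this post and may differ between Solr versions, and debugQuery=true adds cost to every request. The class name is hypothetical.

```java
// Sketch: recovering the uniqueKey field name and value from an explain entry name
// of the form "id=YOURID,internal_docid=0" (format assumed from debugQuery output).
public class ExplainKey {

    private static final String MARKER = ",internal_docid=";

    public final String field;
    public final String value;
    public final int internalDocId;

    private ExplainKey(String field, String value, int internalDocId) {
        this.field = field;
        this.value = value;
        this.internalDocId = internalDocId;
    }

    /** Splits at the first '=' (field name) and the last ",internal_docid=" marker. */
    public static ExplainKey parse(String name) {
        int eq = name.indexOf('=');
        int cut = name.lastIndexOf(MARKER);
        if (eq <= 0 || cut < eq) {
            throw new IllegalArgumentException("unexpected explain entry name: " + name);
        }
        return new ExplainKey(name.substring(0, eq),
                name.substring(eq + 1, cut),
                Integer.parseInt(name.substring(cut + MARKER.length())));
    }

    public static void main(String[] args) {
        ExplainKey k = parse("id=YOURID,internal_docid=0");
        System.out.println(k.field + " / " + k.value + " / " + k.internalDocId);
        // prints id / YOURID / 0
    }
}
```

Splitting at the last marker keeps values that themselves contain "=" or commas intact.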
Grant Ingersoll wrote:
> What's the best way to retrieve the unique key field from SolrJ? From
> what I can tell, it seems like I would need to retrieve the schema and
> then parse it and get it from there, or am I missing something?
>
> Thanks,
> Grant
>