Posted to solr-dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/08/18 00:00:49 UTC
Response Writers and DocLists
I'm looking a little bit at https://issues.apache.org/jira/browse/SOLR-1298
and some of the other "pseudo-field" capabilities and am curious how
the various Response Writers are handling writing out the Docs. The
XMLWriter seems to have a very different approach from the others when
it comes to dealing with multi-valued fields (it sorts first, the
others don't.) Does anyone know the history here?
Also, I'm thinking about a really simple interface: when materializing
the Fields, one could pass in something like a DocumentModifier, which
would get the document right before it is returned (possibly inside the
SolrIndexReader, but maybe this even belongs at the Lucene level,
similar to the FieldSelector, although it is likely too late for 2.9).
Through this DocumentModifier, one could easily add fields, etc.
Part of what I think needs to be addressed here is that currently, in
order to add fields (LocalSolr does this, for instance), one needs to
iterate over the DocList (or SolrDocList) multiple times.
SolrPluginUtils.docListtoSolrDocList attempts to help, but it still
requires a double loop. The tricky part is that one often needs
context when modifying the Document that the Response Writers
simply do not have, so you end up writing a SearchComponent to do it
and thus iterating multiple times.
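To make the callback idea concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: DocumentModifier is a hypothetical interface named after the proposal above (it is not an existing Solr or Lucene type), a Map stands in for the materialized document, and loadStoredFields is a stand-in for whatever the searcher would really do. The point is only that the modifier runs inside the same loop that materializes each document, so no second pass is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback named after the proposal in this thread;
// not an existing Solr or Lucene interface.
interface DocumentModifier {
    void modify(int docId, Map<String, Object> doc);
}

class SinglePassConverter {
    // Stand-in for loading stored fields; a real implementation
    // would ask the searcher for the document.
    static Map<String, Object> loadStoredFields(int docId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-" + docId);
        return doc;
    }

    // Materializes each document once and lets the modifier add
    // fields inside the same loop.
    static List<Map<String, Object>> convert(int[] docIds, DocumentModifier modifier) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (int docId : docIds) {
            Map<String, Object> doc = loadStoredFields(docId);
            if (modifier != null) {
                modifier.modify(docId, doc);
            }
            out.add(doc);
        }
        return out;
    }
}
```

A caller that has the needed context (say, a component that knows distances) could pass a lambda such as (id, doc) -> doc.put("distance", ...) and get the extra field without iterating again.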
I know this is a bit stream of consciousness, but thought I would get it
out there a little bit to see what others thought.
-Grant
Re: Response Writers and DocLists
Posted by Erik Hatcher <er...@gmail.com>.
On Aug 17, 2009, at 6:59 PM, Ryan McKinley wrote:
> Also with adding a "meta" field, I'm not sure I like that it is a
> double object like:
> doc.get( "_meta_" ).get( "distance")
It'd be more like: doc.getMeta().get("distance"), at least. And
doc.get("distance") could be made to check the main document first and,
if the field is not found there, look in the metadata.
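A minimal sketch of that lookup order, in plain Java. ResponseDoc and its methods are hypothetical names for illustration only; this is not the SolrDocument API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical response document; not the SolrDocument API.
class ResponseDoc {
    private final Map<String, Object> stored = new HashMap<>();
    private final Map<String, Object> meta = new HashMap<>();

    void setField(String name, Object value) {
        stored.put(name, value);
    }

    Map<String, Object> getMeta() {
        return meta;
    }

    // Stored fields win; metadata is consulted only when no stored
    // field has the requested name.
    Object get(String name) {
        if (stored.containsKey(name)) {
            return stored.get(name);
        }
        return meta.get(name);
    }
}
```

With this ordering a metadata entry can never shadow a real field of the same name, which is one way to handle the key collisions mentioned further down.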
> It would be nicer if the user does not have any idea if it is a
> pseudo-field or "real" field. (by "user" I mean how you consume the
> response, not how you construct the URL)
I'm kinda ok with the direction this is heading, with the response
"document" having a pluggable way to add "fields". My main reluctance
really comes from a Lucene-legacy way of thinking: that the stored
values from the actual Document object are all that should be allowed there.
Things get trickier as we want meta-meta data... like title field,
title highlighted, and then some more like this for each document, and
allowing for "namespaces" or some kind of way to keep different values
that may have the same key from colliding.
> The SQL "as" command comes to mind:
> SELECT name, count(xxx) as cnt
Hmmm, that's an idea.
fl=title, highlighted(title) as highlighted_title,
some_function(popularity) as scaled_popularity
Erik
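For illustration, a rough sketch of how an fl value with "as" aliases could be split into output name and source expression. The syntax is only the idea floated above, not something Solr's fl parameter is known to accept here, and FieldListParser is a hypothetical helper. It splits on commas outside parentheses, so a function call with several arguments stays in one piece.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "expr as alias" idea; not Solr code.
class FieldListParser {
    // Returns output name -> source expression, in request order.
    static Map<String, String> parse(String fl) {
        Map<String, String> out = new LinkedHashMap<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i <= fl.length(); i++) {
            boolean atEnd = i == fl.length();
            char c = atEnd ? ',' : fl.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            // Split only on commas that are not inside parentheses.
            if (atEnd || (c == ',' && depth == 0)) {
                addEntry(fl.substring(start, i).trim(), out);
                start = i + 1;
            }
        }
        return out;
    }

    private static void addEntry(String entry, Map<String, String> out) {
        if (entry.isEmpty()) {
            return;
        }
        String[] parts = entry.split("\\s+as\\s+", 2);
        String expr = parts[0].trim();
        // Without an alias, the expression itself is the output name.
        String name = parts.length == 2 ? parts[1].trim() : expr;
        out.put(name, expr);
    }
}
```

Parsing the example above would give three entries: title mapped to itself, highlighted_title mapped to highlighted(title), and scaled_popularity mapped to some_function(popularity). The consumer of the response then sees only the output names and cannot tell which fields are stored and which are computed.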
Re: Response Writers and DocLists
Posted by Ryan McKinley <ry...@gmail.com>.
Ya, I like this idea.
Adding a "meta" field is OK, but it may just be kicking the can. Also,
implementation-wise, it works well when you have a SolrDocument, but
when directly using DocList, it gets a bit messy.
https://issues.apache.org/jira/browse/SOLR-705
Also with adding a "meta" field, I'm not sure I like that it is a
double object like:
doc.get( "_meta_" ).get( "distance")
It would be nicer if the user does not have any idea if it is a
pseudo-field or "real" field. (by "user" I mean how you consume the
response, not how you construct the URL)
The SQL "as" command comes to mind:
SELECT name, count(xxx) as cnt
ryan
On Aug 17, 2009, at 6:00 PM, Grant Ingersoll wrote:
> I'm looking a little bit at https://issues.apache.org/jira/browse/SOLR-1298
> and some of the other "pseudo-field" capabilities and am curious
> how the various Response Writers are handling writing out the Docs.
> The XMLWriter seems to have a very different approach from the
> others when it comes to dealing with multi-valued fields (it sorts
> first, the others don't.) Does anyone know the history here?
>
> Also, I'm thinking about having a real simple interface that would
> allow for, when materializing the Fields, to pass in something like
> a DocumentModifier, which would basically get the document right
> before it is to be returned (possibly inside the SolrIndexReader,
> but maybe this even belongs at the Lucene level similar to the
> FieldSelector, although it is likely too late for 2.9.) Through
> this DocModifier, one could easily add fields, etc.
>
> Part of what I think needs to be addressed here is that currently,
> in order to add fields (LocalSolr does this, for instance), one needs
> to iterate over the DocList (or SolrDocList) multiple times.
> SolrPluginUtils.docListtoSolrDocList attempts to help, but it still
> requires a double loop. The tricky part here is that one often
> needs to have context when modifying the Document that the Response
> Writers simply do not have, so you end up writing a SearchComponent
> to do it and thus iterating multiple times.
>
> I know this is a bit stream of consciousness, but thought I would get
> it out there a little bit to see what others thought.
>
> -Grant
Re: Response Writers and DocLists
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 18, 2009, at 8:25 AM, Grant Ingersoll wrote:
>
> On Aug 17, 2009, at 10:15 PM, Yonik Seeley wrote:
>
>>
>> Too high level for Lucene I think, and nothing is currently needed for
>> Lucene - a user calls doc() to get the document and then they can
>> modify or add fields however they want.
>>
>> An interface could be useful for Solr... but getting 1.4 out the door
>> is top priority.
>
> Agreed, I think I may make a simple mod to the SolrPluginUtils for
> now to allow callback there. That would at least save one extra
> loop and it wouldn't require any lower-level changes.
See SOLR-1367 for the SolrPluginUtils change.
Re: Response Writers and DocLists
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 17, 2009, at 10:15 PM, Yonik Seeley wrote:
>
> Too high level for Lucene I think, and nothing is currently needed for
> Lucene - a user calls doc() to get the document and then they can
> modify or add fields however they want.
>
> An interface could be useful for Solr... but getting 1.4 out the door
> is top priority.
Agreed, I think I may make a simple mod to the SolrPluginUtils for now
to allow a callback there. That would at least save one extra loop and
it wouldn't require any lower-level changes.
Re: Response Writers and DocLists
Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Aug 18, 2009 at 11:20 AM, Ryan McKinley<ry...@gmail.com> wrote:
> dooh, i mean pushing it out of 1.4 (or into 1.5)
+1 - the feature will undoubtedly be very important in the future, but
probably shouldn't be a blocker for 1.4
-Yonik
Re: Response Writers and DocLists
Posted by Ryan McKinley <ry...@gmail.com>.
On Aug 18, 2009, at 10:01 AM, Grant Ingersoll wrote:
>
> On Aug 18, 2009, at 9:49 AM, Ryan McKinley wrote:
>
>>>
>>>
>>>> Also, I'm thinking about having a real simple interface that would allow
>>>> for, when materializing the Fields, to pass in something like a
>>>> DocumentModifier, which would basically get the document right before it is
>>>> to be returned (possibly inside the SolrIndexReader, but maybe this even
>>>> belongs at the Lucene level similar to the FieldSelector, although it is
>>>> likely too late for 2.9.) Through this DocModifier, one could easily add
>>>> fields, etc.
>>>
>>> Too high level for Lucene I think, and nothing is currently needed for
>>> Lucene - a user calls doc() to get the document and then they can
>>> modify or add fields however they want.
>>>
>>> An interface could be useful for Solr... but getting 1.4 out the door
>>> is top priority.
>>>
>>
>> Agreed... i am wondering if pushing:
>> https://issues.apache.org/jira/browse/SOLR-705
>> to 1.4 makes sense...
>
> It already is marked for 1.4 and has been for a while.
D'oh, I meant pushing it out of 1.4 (or into 1.5)
ryan
Re: Response Writers and DocLists
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 18, 2009, at 9:49 AM, Ryan McKinley wrote:
>>
>>
>>> Also, I'm thinking about having a real simple interface that would allow
>>> for, when materializing the Fields, to pass in something like a
>>> DocumentModifier, which would basically get the document right before it is
>>> to be returned (possibly inside the SolrIndexReader, but maybe this even
>>> belongs at the Lucene level similar to the FieldSelector, although it is
>>> likely too late for 2.9.) Through this DocModifier, one could easily add
>>> fields, etc.
>>
>> Too high level for Lucene I think, and nothing is currently needed for
>> Lucene - a user calls doc() to get the document and then they can
>> modify or add fields however they want.
>>
>> An interface could be useful for Solr... but getting 1.4 out the door
>> is top priority.
>>
>
> Agreed... i am wondering if pushing:
> https://issues.apache.org/jira/browse/SOLR-705
> to 1.4 makes sense...
It already is marked for 1.4 and has been for a while.
Re: Response Writers and DocLists
Posted by Ryan McKinley <ry...@gmail.com>.
>
>
>> Also, I'm thinking about having a real simple interface that would allow
>> for, when materializing the Fields, to pass in something like a
>> DocumentModifier, which would basically get the document right before it is
>> to be returned (possibly inside the SolrIndexReader, but maybe this even
>> belongs at the Lucene level similar to the FieldSelector, although it is
>> likely too late for 2.9.) Through this DocModifier, one could easily add
>> fields, etc.
>
> Too high level for Lucene I think, and nothing is currently needed for
> Lucene - a user calls doc() to get the document and then they can
> modify or add fields however they want.
>
> An interface could be useful for Solr... but getting 1.4 out the door
> is top priority.
>
Agreed... I am wondering if pushing:
https://issues.apache.org/jira/browse/SOLR-705
to 1.4 makes sense... since it should probably use the same
interface/strategy
ryan
Re: Response Writers and DocLists
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Aug 17, 2009 at 6:00 PM, Grant Ingersoll<gs...@apache.org> wrote:
> I'm looking a little bit at
> https://issues.apache.org/jira/browse/SOLR-1298 and some of the other
> "pseudo-field" capabilities and am curious how the various Response Writers
> are handling writing out the Docs. The XMLWriter seems to have a very
> different approach from the others when it comes to dealing with
> multi-valued fields (it sorts first, the others don't.) Does anyone know
> the history here?
The first version of Solr didn't know whether a field was multiValued
or not. The Lucene Document does not aggregate multiple values for the
same field; each value is a separate entry. Sorting was used to group
the fields and detect if there were multiple values for any of them.
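A minimal sketch of that sort-then-group technique in plain Java. It is not the XMLWriter code; FieldGrouper is a hypothetical helper, and the flat list of {name, value} pairs is an assumed stand-in for the way a Lucene Document holds one entry per value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating sort-then-group; not Solr code.
class FieldGrouper {
    // Input: flat {name, value} pairs, one entry per value.
    static Map<String, List<String>> group(List<String[]> fields) {
        List<String[]> sorted = new ArrayList<>(fields);
        // List.sort is stable, so values of one field keep their order.
        sorted.sort(Comparator.comparing((String[] f) -> f[0]));

        Map<String, List<String>> grouped = new LinkedHashMap<>();
        String current = null;
        List<String> values = null;
        for (String[] f : sorted) {
            // After sorting, equal names are adjacent, so a name change
            // marks the start of the next field.
            if (!f[0].equals(current)) {
                current = f[0];
                values = new ArrayList<>();
                grouped.put(current, values);
            }
            values.add(f[1]);
        }
        return grouped;
    }
}
```

Any field whose list ends up with more than one element had multiple values in that document, which is the detection described above. A side effect is that fields come out in name order rather than stored order.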
> Also, I'm thinking about having a real simple interface that would allow
> for, when materializing the Fields, to pass in something like a
> DocumentModifier, which would basically get the document right before it is
> to be returned (possibly inside the SolrIndexReader, but maybe this even
> belongs at the Lucene level similar to the FieldSelector, although it is
> likely too late for 2.9.) Through this DocModifier, one could easily add
> fields, etc.
Too high level for Lucene I think, and nothing is currently needed for
Lucene - a user calls doc() to get the document and then they can
modify or add fields however they want.
An interface could be useful for Solr... but getting 1.4 out the door
is top priority.
-Yonik
http://www.lucidimagination.com