Posted to solr-dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/08/18 00:00:49 UTC

Response Writers and DocLists

I'm looking a little bit at https://issues.apache.org/jira/browse/SOLR-1298
and some of the other "pseudo-field" capabilities and am curious how
the various Response Writers are handling writing out the Docs.  The
XMLWriter seems to have a very different approach from the others when
it comes to dealing with multi-valued fields (it sorts first; the
others don't).  Does anyone know the history here?

Also, I'm thinking about having a really simple interface that would
allow passing in something like a DocumentModifier when materializing
the Fields.  It would get the document right before it is returned
(possibly inside the SolrIndexReader; maybe this even belongs at the
Lucene level, similar to the FieldSelector, although it is likely too
late for 2.9).  Through this DocModifier, one could easily add fields,
etc.
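To make the proposal concrete, here is a minimal sketch under stated assumptions. DocumentModifier is only the name floated above; StoredFieldsSource and materialize() are hypothetical stand-ins for wherever Solr materializes stored fields, and a document is modeled as a plain Map rather than a Lucene Document. The modifier sees each document once, just before it is returned.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback (name from the proposal, shape assumed): receives the
// materialized fields of one document and may add or change fields in place.
interface DocumentModifier {
    void modify(int docId, Map<String, Object> fields);
}

// Hypothetical stand-in for the place where stored fields are materialized;
// this is not a real Solr or Lucene class.
class StoredFieldsSource {
    private final List<Map<String, Object>> stored;

    StoredFieldsSource(List<Map<String, Object>> stored) {
        this.stored = stored;
    }

    Map<String, Object> materialize(int docId, DocumentModifier modifier) {
        // Copy first so the modifier never touches the underlying stored values.
        Map<String, Object> fields = new LinkedHashMap<>(stored.get(docId));
        if (modifier != null) {
            modifier.modify(docId, fields); // last chance to add pseudo-fields
        }
        return fields;
    }
}
```

Passing null keeps today's behavior, so callers that don't need pseudo-fields pay nothing extra.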

Part of what I think needs to be addressed here is that currently, in
order to add fields (LocalSolr does this, for instance), one needs to
iterate over the DocList (or SolrDocList) multiple times.
SolrPluginUtils.docListToSolrDocList attempts to help, but it still
requires a double loop.  The tricky part is that modifying the Document
often requires context that the Response Writers simply do not have,
so you end up writing a SearchComponent to do it and thus iterating
multiple times.
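The double loop described above can be sketched roughly as follows. The types are simplified, hypothetical stand-ins (an int[] of doc ids for a DocList, a list of maps for a SolrDocList), and the "distance" pseudo-field with its query point is an assumed example of context that only a SearchComponent would hold.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two-pass pattern; not Solr's real types.
class TwoPassExample {
    static int passes = 0; // counts how many times the result list is walked

    // Pass 1: turn internal doc ids into response documents.
    static List<Map<String, Object>> toResponseDocs(int[] docIds, List<Map<String, Object>> stored) {
        passes++;
        List<Map<String, Object>> out = new ArrayList<>();
        for (int id : docIds) {
            out.add(new LinkedHashMap<>(stored.get(id)));
        }
        return out;
    }

    // Pass 2: a SearchComponent-style step that knows the query point (context
    // the writer lacks) and walks the list again to add a pseudo-field.
    static void addDistance(List<Map<String, Object>> docs, double qx, double qy) {
        passes++;
        for (Map<String, Object> doc : docs) {
            double dx = (Double) doc.get("x") - qx;
            double dy = (Double) doc.get("y") - qy;
            doc.put("distance", Math.sqrt(dx * dx + dy * dy));
        }
    }
}
```

Each additional component that wants to decorate documents this way adds another full pass over the results.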

I know this is a bit stream of consciousness, but I thought I would get
it out there to see what others thought.

-Grant

Re: Response Writers and DocLists

Posted by Erik Hatcher <er...@gmail.com>.
On Aug 17, 2009, at 6:59 PM, Ryan McKinley wrote:
> Also with adding a "meta" field, I'm not sure I like that it is a  
> double object like:
> doc.get( "_meta_" ).get( "distance")

It'd be more like doc.getMeta().get("distance"), at least.  And
doc.get("distance") could be made to look in the main document first
and, if the field isn't found there, fall back to the metadata.
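A minimal sketch of that lookup order, assuming a hypothetical ResponseDoc class (not SolrDocument) that keeps stored fields and metadata in separate maps. getMeta() exposes the metadata directly, while get() checks the stored fields first and only then falls back to the metadata, so a consumer asking for "distance" doesn't need to know which map it came from.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical response document: stored fields plus a separate metadata map.
class ResponseDoc {
    private final Map<String, Object> fields = new HashMap<>();
    private final Map<String, Object> meta = new HashMap<>();

    Map<String, Object> getMeta() {
        return meta;
    }

    void setField(String name, Object value) {
        fields.put(name, value);
    }

    // Stored fields win; the metadata is consulted only when the name is absent.
    Object get(String name) {
        return fields.containsKey(name) ? fields.get(name) : meta.get(name);
    }
}
```

One consequence of this design: a metadata value whose key matches a stored field is silently shadowed, which is the collision problem raised below.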

> It would be nicer if the user does not have any idea if it is a  
> pseudo-field or "real" field.  (by "user" I mean how you consume the  
> response, not how you construct the URL)

I'm kinda ok with the direction this is heading, with the response
"document" having a pluggable way to add "fields".  My main reluctance
really comes from a Lucene-legacy way of thinking: that the stored
values from the actual Document object are all that should be allowed
there.

Things get trickier when we want meta-meta data, like the title field,
the highlighted title, and then some MoreLikeThis results for each
document, while allowing for "namespaces" or some other way to keep
different values that share the same key from colliding.
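One possible shape for the "namespaces" idea, as a sketch only: NamespacedDoc and the namespace names ("fields", "highlight", "mlt") are hypothetical. Each producer writes under its own namespace, so the stored title and the highlighted title can both use the key "title" without overwriting each other.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-document store: each producer (stored fields, highlighting,
// more-like-this, ...) writes under its own namespace, so equal keys coexist.
class NamespacedDoc {
    private final Map<String, Map<String, Object>> byNamespace = new LinkedHashMap<>();

    void put(String namespace, String key, Object value) {
        byNamespace.computeIfAbsent(namespace, ns -> new LinkedHashMap<>()).put(key, value);
    }

    // Returns null when either the namespace or the key is absent.
    Object get(String namespace, String key) {
        Map<String, Object> values = byNamespace.get(namespace);
        return values == null ? null : values.get(key);
    }
}
```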

> The SQL "as" command comes to mind:
> SELECT name, count(xxx) as cnt

Hmmm, that's an idea.

   fl=title, highlighted(title) as highlighted_title, some_function(popularity) as scaled_popularity
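To see whether the proposed syntax is easy to parse, here is a rough sketch. The "expr as alias" form was only an idea in this thread, not an existing fl format, and FieldListParser is a hypothetical name. Commas inside parentheses are ignored so function arguments stay intact, and an entry without "as" uses itself as its alias.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the proposed "expr as alias" entries in fl.
class FieldListParser {
    static final class Entry {
        final String expr;
        final String alias;
        Entry(String expr, String alias) { this.expr = expr; this.alias = alias; }
    }

    // "<expression> as <alias>", case-insensitive on the keyword.
    private static final Pattern AS = Pattern.compile("(?i)^(.+?)\\s+as\\s+(\\w+)$");

    static List<Entry> parse(String fl) {
        List<Entry> out = new ArrayList<>();
        int depth = 0, start = 0;
        // Walk one position past the end, treating it as a final comma.
        for (int i = 0; i <= fl.length(); i++) {
            char c = i < fl.length() ? fl.charAt(i) : ',';
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) { // split only on top-level commas
                String item = fl.substring(start, i).trim();
                if (!item.isEmpty()) {
                    Matcher m = AS.matcher(item);
                    out.add(m.matches() ? new Entry(m.group(1), m.group(2)) : new Entry(item, item));
                }
                start = i + 1;
            }
        }
        return out;
    }
}
```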

	Erik


Re: Response Writers and DocLists

Posted by Ryan McKinley <ry...@gmail.com>.
Ya, I like this idea.

Adding a "meta" field is OK, but it may just be kicking the can down the
road.  Also, implementation-wise, it works well when you have a
SolrDocument, but it gets a bit messy when you are using a DocList
directly.
https://issues.apache.org/jira/browse/SOLR-705

Also with adding a "meta" field, I'm not sure I like that it is a  
double object like:
  doc.get( "_meta_" ).get( "distance")

It would be nicer if the user does not have any idea if it is a
pseudo-field or "real" field.  (by "user" I mean how you consume the
response, not how you construct the URL)

The SQL "as" command comes to mind:
  SELECT name, count(xxx) as cnt

ryan





Re: Response Writers and DocLists

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 18, 2009, at 8:25 AM, Grant Ingersoll wrote:

>
> On Aug 17, 2009, at 10:15 PM, Yonik Seeley wrote:
>
>>
>> Too high level for Lucene I think, and nothing is currently needed for
>> Lucene - a user calls doc() to get the document and then they can
>> modify or add fields however they want.
>>
>> An interface could be useful for Solr... but getting 1.4 out the door
>> is top priority.
>
> Agreed.  I think I may make a simple mod to SolrPluginUtils for now
> to allow a callback there.  That would at least save one extra loop,
> and it wouldn't require any lower-level changes.

See SOLR-1367 for the SolrPluginUtils change.

Re: Response Writers and DocLists

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 17, 2009, at 10:15 PM, Yonik Seeley wrote:

>
> Too high level for Lucene I think, and nothing is currently needed for
> Lucene - a user calls doc() to get the document and then they can
> modify or add fields however they want.
>
> An interface could be useful for Solr... but getting 1.4 out the door
> is top priority.

Agreed.  I think I may make a simple mod to SolrPluginUtils for now to
allow a callback there.  That would at least save one extra loop, and
it wouldn't require any lower-level changes.
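Roughly, a callback folded into the conversion could look like the sketch below. CallbackConversion and its signature are hypothetical and are not the actual SolrPluginUtils method or the SOLR-1367 patch; the point is that whatever context the caller has (a query point, say) is captured in the callback, so pseudo-fields are added during the single conversion loop instead of in a second pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical conversion helper with a per-document callback (assumed
// signature; simplified types instead of DocList/SolrDocList).
class CallbackConversion {
    static List<Map<String, Object>> toResponseDocs(
            int[] docIds,
            List<Map<String, Object>> stored,
            BiConsumer<Integer, Map<String, Object>> callback) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (int id : docIds) {
            Map<String, Object> doc = new LinkedHashMap<>(stored.get(id));
            if (callback != null) {
                callback.accept(id, doc); // caller's context lives in the lambda
            }
            out.add(doc);
        }
        return out;
    }
}
```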

Re: Response Writers and DocLists

Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Aug 18, 2009 at 11:20 AM, Ryan McKinley<ry...@gmail.com> wrote:
> dooh, I mean pushing it out of 1.4 (or into 1.5)

+1 - the feature will undoubtedly be very important in the future, but
probably shouldn't be a blocker for 1.4

-Yonik

Re: Response Writers and DocLists

Posted by Ryan McKinley <ry...@gmail.com>.
On Aug 18, 2009, at 10:01 AM, Grant Ingersoll wrote:

>
> On Aug 18, 2009, at 9:49 AM, Ryan McKinley wrote:
>
>>>
>>>
>>>> Also, I'm thinking about having a real simple interface that would
>>>> allow for, when materializing the Fields, to pass in something like a
>>>> DocumentModifier, which would basically get the document right before
>>>> it is to be returned (possibly inside the SolrIndexReader, but maybe
>>>> this even belongs at the Lucene level similar to the FieldSelector,
>>>> although it is likely too late for 2.9.)  Through this DocModifier,
>>>> one could easily add fields, etc.
>>>
>>> Too high level for Lucene I think, and nothing is currently needed for
>>> Lucene - a user calls doc() to get the document and then they can
>>> modify or add fields however they want.
>>>
>>> An interface could be useful for Solr... but getting 1.4 out the door
>>> is top priority.
>>>
>>
>> Agreed... I am wondering if pushing:
>> https://issues.apache.org/jira/browse/SOLR-705
>> to 1.4 makes sense...
>
> It already is marked for 1.4 and has been for a while.

dooh, I mean pushing it out of 1.4 (or into 1.5)

ryan

Re: Response Writers and DocLists

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 18, 2009, at 9:49 AM, Ryan McKinley wrote:

>>
>>
>>> Also, I'm thinking about having a real simple interface that would
>>> allow for, when materializing the Fields, to pass in something like a
>>> DocumentModifier, which would basically get the document right before
>>> it is to be returned (possibly inside the SolrIndexReader, but maybe
>>> this even belongs at the Lucene level similar to the FieldSelector,
>>> although it is likely too late for 2.9.)  Through this DocModifier,
>>> one could easily add fields, etc.
>>
>> Too high level for Lucene I think, and nothing is currently needed for
>> Lucene - a user calls doc() to get the document and then they can
>> modify or add fields however they want.
>>
>> An interface could be useful for Solr... but getting 1.4 out the door
>> is top priority.
>>
>
> Agreed... I am wondering if pushing:
> https://issues.apache.org/jira/browse/SOLR-705
> to 1.4 makes sense...

It already is marked for 1.4 and has been for a while.

Re: Response Writers and DocLists

Posted by Ryan McKinley <ry...@gmail.com>.
>
>
>> Also, I'm thinking about having a real simple interface that would
>> allow for, when materializing the Fields, to pass in something like a
>> DocumentModifier, which would basically get the document right before
>> it is to be returned (possibly inside the SolrIndexReader, but maybe
>> this even belongs at the Lucene level similar to the FieldSelector,
>> although it is likely too late for 2.9.)  Through this DocModifier,
>> one could easily add fields, etc.
>
> Too high level for Lucene I think, and nothing is currently needed for
> Lucene - a user calls doc() to get the document and then they can
> modify or add fields however they want.
>
> An interface could be useful for Solr... but getting 1.4 out the door
> is top priority.
>

Agreed... I am wondering if pushing:
https://issues.apache.org/jira/browse/SOLR-705
to 1.4 makes sense... since it should probably use the same
interface/strategy.

ryan


Re: Response Writers and DocLists

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Aug 17, 2009 at 6:00 PM, Grant Ingersoll<gs...@apache.org> wrote:
> I'm looking a little bit at
> https://issues.apache.org/jira/browse/SOLR-1298 and some of the other
> "pseudo-field" capabilities and am curious how the various Response Writers
> are handling writing out the Docs.  The XMLWriter seems to have a very
> different approach from the others when it comes to dealing with
> multi-valued fields (it sorts first, the others don't.)  Does anyone know
> the history here?

The first version of Solr didn't know whether a field was multiValued
or not.  The Lucene Document does not aggregate multiple values for the
same field, so sorting was used to group the fields and detect whether
any of them had multiple values.
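The grouping trick described above can be sketched as follows. This is a simplified illustration, not the actual XMLWriter code: a stored document is modeled as a flat list of hypothetical (name, value) pairs in which a multi-valued field simply repeats its name. A stable sort by name puts the repeats next to each other, so one scan can tell single values (emitted as-is) from multiple values (emitted as a list), while preserving value order within each field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in types; not Lucene's Document/Fieldable.
class FieldGrouping {
    static final class StoredField {
        final String name;
        final String value;
        StoredField(String name, String value) { this.name = name; this.value = value; }
    }

    static Map<String, Object> group(List<StoredField> flat) {
        List<StoredField> sorted = new ArrayList<>(flat);
        // List.sort is stable, so values keep their original order per name.
        sorted.sort(Comparator.comparing((StoredField f) -> f.name));
        Map<String, Object> out = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i;
            while (j < sorted.size() && sorted.get(j).name.equals(sorted.get(i).name)) j++;
            if (j - i == 1) {
                out.put(sorted.get(i).name, sorted.get(i).value); // single value
            } else {
                List<String> values = new ArrayList<>();
                for (int k = i; k < j; k++) values.add(sorted.get(k).value);
                out.put(sorted.get(i).name, values); // repeated name -> list
            }
            i = j;
        }
        return out;
    }
}
```

Knowing from the schema which fields are multiValued removes the need for the sort, which is presumably why the other writers skip it.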

> Also, I'm thinking about having a real simple interface that would allow
> for, when materializing the Fields, to pass in something like a
> DocumentModifier, which would basically get the document right before it is
> to be returned (possibly inside the SolrIndexReader, but maybe this even
> belongs at the Lucene level similar to the FieldSelector, although it is
> likely too late for 2.9.)  Through this DocModifier, one could easily add
> fields, etc.

Too high level for Lucene I think, and nothing is currently needed for
Lucene - a user calls doc() to get the document and then they can
modify or add fields however they want.

An interface could be useful for Solr... but getting 1.4 out the door
is top priority.

-Yonik
http://www.lucidimagination.com