Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2011/07/15 21:55:35 UTC

Re: Extending Solr Highlighter to pull information from external source

Boy it's been a long time since I first wrote this, sorry for the delay....

I think I have this working as I expect with a test implementation.  I
created the following interface:

public interface SolrExternalFieldProvider extends NamedListInitializedPlugin {
    public String[] getFieldContent(String key, SchemaField field,
            SolrQueryRequest request);
}
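
For anyone wanting to try the interface above, here is a minimal sketch of what a provider could look like.  It is only an illustration: ExternalFieldProviderSketch and InMemoryFieldProvider are made-up names, and the method takes a plain field name instead of a SchemaField and SolrQueryRequest so that it compiles without Solr on the classpath.  A real provider would implement SolrExternalFieldProvider and would presumably read from a database or file store rather than a map.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the interface above, reduced to plain Java types
// (assumption: the real method also receives a SchemaField and a SolrQueryRequest).
interface ExternalFieldProviderSketch {
    String[] getFieldContent(String key, String fieldName);
}

// Hypothetical provider backed by an in-memory map keyed by "docKey/fieldName".
class InMemoryFieldProvider implements ExternalFieldProviderSketch {
    private final Map<String, String[]> store = new HashMap<>();

    // Register the text for one field of one document.
    void put(String key, String fieldName, String... values) {
        store.put(key + "/" + fieldName, values);
    }

    @Override
    public String[] getFieldContent(String key, String fieldName) {
        String[] values = store.get(key + "/" + fieldName);
        // Return an empty array rather than null so the caller can iterate safely.
        return values != null ? values : new String[0];
    }
}
```

Returning an empty array for an unknown key matches the fallback the highlighter change uses when no provider is configured.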

I then added the following to DefaultSolrHighlighter.

In init():

SolrExternalFieldProvider defaultProvider = solrCore.initPlugins(
        info.getChildren("externalFieldProvider"),
        externalFieldProviders, SolrExternalFieldProvider.class, null);
if (defaultProvider != null) {
    // Register the default under both "" and null so lookups without a name still resolve
    externalFieldProviders.put("", defaultProvider);
    externalFieldProviders.put(null, defaultProvider);
}

Then in doHighlightingByHighlighter I added the following:

if (schemaField != null && !schemaField.stored()) {
    // The field is not stored, so try to fetch its text from the external provider.
    // Default to no text when there is no provider or no usable key.
    docTexts = new String[]{};
    SolrExternalFieldProvider externalFieldProvider =
            this.getExternalFieldProvider(fieldName, params);
    if (externalFieldProvider != null) {
        SchemaField keyField = schema.getUniqueKeyField();
        // I know this field exists and is not multivalued
        String key = doc.getValues(keyField.getName())[0];
        if (key != null && key.length() > 0) {
            docTexts = externalFieldProvider.getFieldContent(key, schemaField, req);
        }
    }
} else {
    docTexts = doc.getValues(fieldName);
}


This worked for me.  I needed to pass in the req because there are
some additional things I need from it, and I figure other folks will
probably need it as well.  I tried to follow the pattern used for the
other highlighter pieces, in that you can have a different
externalFieldProvider for each field.  I'm more than happy to share
the actual classes with the community or add them to one of the JIRA
issues mentioned below; I haven't done so yet because I don't know how
to build patches.
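
To illustrate the per-field lookup described above, here is a rough sketch of how a getExternalFieldProvider-style lookup might resolve a provider.  The actual method is not shown in this message, so everything here is an assumption: ProviderLookupSketch is a made-up class, plain strings stand in for provider objects, and a map of field name to provider name stands in for per-field request parameters.

```java
import java.util.HashMap;
import java.util.Map;

class ProviderLookupSketch {
    // provider name -> provider (a String stands in for the provider object)
    private final Map<String, String> providers = new HashMap<>();
    // field name -> configured provider name (stand-in for per-field params)
    private final Map<String, String> fieldSettings = new HashMap<>();

    void registerProvider(String name, String provider) {
        providers.put(name, provider);
    }

    // Mirrors the init() step: the default is stored under both "" and null.
    void registerDefault(String provider) {
        providers.put("", provider);
        providers.put(null, provider);
    }

    void configureField(String fieldName, String providerName) {
        fieldSettings.put(fieldName, providerName);
    }

    // Returns the provider configured for the field, the default when the field
    // has no setting, or null when the configured name is unknown.
    String providerFor(String fieldName) {
        String name = fieldSettings.get(fieldName); // null when nothing is configured
        return providers.get(name);                 // HashMap permits a null key
    }
}
```

Registering the default under the null key is what lets a field with no explicit setting fall through to the default provider without any extra branching.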

On Mon, Jun 20, 2011 at 11:47 PM, Michael Sokolov <so...@ifactory.com> wrote:
> I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not
> much going on there
>
> LUCENE-1522 <https://issues.apache.org/jira/browse/LUCENE-1522> has a lot of
> fascinating discussion on this topic though
>
>
>> There are a couple of long-lived issues in JIRA for this (I'd like to
>> search for them, but I can't access JIRA right now).
>>
>> FVH would need to be modified at the Lucene level to use external data.
>>
>> koji
>
> Koji - is that really so?  It appears to me that one could extend
> BaseFragmentsBuilder and override
>
> createFragments(IndexReader reader, int docId,
>      String fieldName, FieldFragList fieldFragList, int maxNumFragments,
>      String[] preTags, String[] postTags, Encoder encoder )
>
> providing a version that retrieves text from some external source rather
> than from Lucene fields.
>
> It sounds to me like a really useful modification in Lucene core would be to
> retain match points that have already been computed during scoring so the
> highlighter doesn't have to attempt to reinvent all that logic!  This has
> all been discussed at length in LUCENE-1522 already, but is there any
> recent activity?
>
> My hope is that since (at least in my test) search code seems to spend 80%
> of its time highlighting, folks will take up this banner and do the plumbing
> needed to improve it - should lead to huge speed-ups for searching!  I'm
> continuing to read, but not really capable of making a meaningful
> contribution at this point.
>
> -Mike
>

Re: Extending Solr Highlighter to pull information from external source

Posted by Jamie Johnson <je...@gmail.com>.
I haven't seen any interest in this, but for anyone following, I
updated the alternateField logic to support pulling from the external
field if available.  It would be useful to know how to get Solr to use
this external field provider in general, so we wouldn't have to modify
the highlighter at all, just whatever builds the document.


Re: Extending Solr Highlighter to pull information from external source

Posted by Jamie Johnson <je...@gmail.com>.
I tried the patch at SOLR-1397 but it didn't work as I'd expect.

<lst name="highlighting">
    <lst name="1">
        <arr name="subject_phonetic">
            <str><em>Test</em> subject message</str>
        </arr>
        <arr name="subject_phonetic_startPos"><int>0</int></arr>
        <arr name="subject_phonetic_endPos"><int>29</int></arr>
    </lst>
</lst>
The start position is right, but the end position seems to be the
length of the field.
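
As a quick check on the numbers above, the sketch below works out the offsets for that fragment.  It is only arithmetic on the string shown in the response; HighlightOffsetsSketch and its methods are made up for illustration, and what the SOLR-1397 patch intends endPos to report is not established here.

```java
class HighlightOffsetsSketch {
    // Remove the <em>...</em> markers to recover the plain fragment text.
    static String stripTags(String highlighted) {
        return highlighted.replace("<em>", "").replace("</em>", "");
    }

    // Start offset of the first highlighted term in the plain text, or -1 if none.
    static int matchStart(String highlighted) {
        return highlighted.indexOf("<em>");
    }

    // End offset (exclusive) of the first highlighted term in the plain text, or -1 if none.
    static int matchEnd(String highlighted) {
        int close = highlighted.indexOf("</em>");
        return close < 0 ? -1 : close - "<em>".length();
    }

    public static void main(String[] args) {
        String fragment = "<em>Test</em> subject message";
        System.out.println(fragment.length());            // 29
        System.out.println(stripTags(fragment).length()); // 20
        System.out.println(matchStart(fragment));         // 0
        System.out.println(matchEnd(fragment));           // 4
    }
}
```

So for this fragment the matched term spans 0 to 4, the plain text is 20 characters long, and 29 is the length of the fragment including the <em> tags.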



Re: Extending Solr Highlighter to pull information from external source

Posted by Jamie Johnson <je...@gmail.com>.
I added the highlighting code I am using to this JIRA
(https://issues.apache.org/jira/browse/SOLR-1397).  Afterwards I
noticed this JIRA (https://issues.apache.org/jira/browse/SOLR-1954),
which talks about another solution.  I think David's patch would have
worked equally well for my problem; it would just require doing the
highlighting later on the client's end.  I'll have to give this a whirl
over the weekend.
