
Posted to solr-user@lucene.apache.org by Nathaniel Grove <nd...@virginia.edu> on 2010/07/27 21:43:57 UTC

Difficulties with Highlighting

I'm a relative beginner at Solr, indexing and searching Unicode Tibetan 
texts. I am trying to use the highlighter, but it just returns empty 
elements, such as:

    <lst name="highlighting">
        <lst name="kt-d-0103-text-v4p262a"/>
    </lst>

What am I doing wrong?

The query that generated that is:

http://www.thlib.org:8080/thdl-solr/thdl-texts/select?indent=on&version=2.2&q=%E0%BD%91%E0%BD%84%E0%BD%B4%E0%BD%A3%E0%BC%8B%E0%BD%98%E0%BD%81%E0%BD%93%E0%BC%8B+AND+type%3Atext&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&hl=true&hl.fl=pg_bo&hl.snippets=50
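
The percent-encoded q parameter is hard to read by eye. The sketch below uses only Python's standard library and is not part of the original setup. Decoding the query string shows that q is the Tibetan term དངུལ་མཁན་ followed by AND type:text, and that highlighting is requested only for pg_bo. The helper name decode_params is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL quoted above, split across string literals for readability.
URL = (
    "http://www.thlib.org:8080/thdl-solr/thdl-texts/select?indent=on&version=2.2"
    "&q=%E0%BD%91%E0%BD%84%E0%BD%B4%E0%BD%A3%E0%BC%8B%E0%BD%98%E0%BD%81%E0%BD%93%E0%BC%8B"
    "+AND+type%3Atext&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard"
    "&hl=true&hl.fl=pg_bo&hl.snippets=50"
)

def decode_params(url):
    """Return each query-string parameter with its first value, fully decoded."""
    # parse_qs percent-decodes the values (as UTF-8) and turns '+' into a space.
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}

params = decode_params(URL)
term = params["q"].split(" AND ")[0]

# Print code points rather than raw Tibetan so the output is readable in any terminal.
print("q term :", " ".join(f"U+{ord(ch):04X}" for ch in term))
print("q rest :", params["q"][len(term):].strip())
print("hl.fl  :", params["hl.fl"])
print("fl     :", params["fl"])
```

The term decodes to U+0F51 U+0F44 U+0F74 U+0F63 U+0F0B U+0F58 U+0F41 U+0F53 U+0F0B, where U+0F0B is the Tibetan tsheg (syllable separator). The Tibetan term has no field prefix, so the standard query parser searches it against the schema's default search field, and the message does not say which field that is. The type:text clause is the one whose match appears when hl.fl=*.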

The hit is in the multivalued field named "pg_bo" and in the doc with that 
id. I've looked at the various highlighting parameters (not that I 
fully understand them) and tried fiddling with those, but nothing helped. 
I did notice that if you change hl.fl to *, then you get the type field 
highlighted:

<lst name="highlighting">
    <lst name="kt-d-0103-text-v4p262a">
       <arr name="type">
            <str><em>text</em></str>
        </arr>
    </lst>
</lst>

But that's not much help. We are using a custom Tibetan tokenizer for 
the Unicode Tibetan text fields. Would this have something to do with it?
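
To check what came back without reading the raw XML, a small sketch can map each document id in the highlighting section to the fields that produced snippets. It uses Python's xml.etree.ElementTree, and the function name snippets_by_doc is hypothetical. It assumes you have already extracted the <lst name="highlighting"> element from the response. Run on the two fragments above, it shows that the first request returned the document but produced no snippet for pg_bo, while hl.fl=* produced a snippet only for type.

```python
import xml.etree.ElementTree as ET

# The two highlighting sections quoted in the message.
PG_BO_ONLY = """<lst name="highlighting">
    <lst name="kt-d-0103-text-v4p262a"/>
</lst>"""

ALL_FIELDS = """<lst name="highlighting">
    <lst name="kt-d-0103-text-v4p262a">
       <arr name="type">
            <str><em>text</em></str>
        </arr>
    </lst>
</lst>"""

def snippets_by_doc(fragment):
    """Map each document id to {field name: [snippet text, ...]}."""
    root = ET.fromstring(fragment)
    result = {}
    for doc in root.findall("lst"):          # one <lst> per returned document
        fields = {}
        for arr in doc.findall("arr"):       # one <arr> per field that yielded snippets
            # itertext() joins all text inside <str>, including text nested in <em>.
            fields[arr.get("name")] = ["".join(s.itertext()) for s in arr.findall("str")]
        result[doc.get("name")] = fields
    return result

print(snippets_by_doc(PG_BO_ONLY))   # {'kt-d-0103-text-v4p262a': {}}
print(snippets_by_doc(ALL_FIELDS))   # {'kt-d-0103-text-v4p262a': {'type': ['text']}}
```

In a raw XML response, the <em> markers arrive escaped inside the <str> text, so in that case they stay in the returned snippet string.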

Any suggestions would be appreciated!

Thanks for your help,

Than Grove

-- 
Nathaniel Grove
Research Associate & Technical Director
Tibetan & Himalayan Library
University of Virginia
http://www.thlib.org

Re: Difficulties with Highlighting

Posted by Nathaniel Grove <nd...@virginia.edu>.

Erik,

You're right on both counts. I'll upgrade and then check whether 
our tokenizer is working properly.

Thanks,

Than

Erik Hatcher wrote:
> Than -
>
> Looks like maybe your text_bo field type isn't analyzing how you'd 
> like?   Though that's just a hunch.  I pasted the value of that field 
> returned in the link you provided into your analysis.jsp page and it 
> chunked tokens by whitespace.  Though I could be experiencing a 
> copy/paste/i18n issue.
>
> Also looks like you're on Solr 1.3 - so it's likely quite worth 
> upgrading to 1.4.1 (don't know if that directly affects this 
> highlighting issue, just a general recommendation).
>
>     Erik

Re: Difficulties with Highlighting

Posted by Erik Hatcher <er...@gmail.com>.

Than -

Looks like maybe your text_bo field type isn't analyzing how you'd
like? Though that's just a hunch. I pasted the value of that field
returned in the link you provided into your analysis.jsp page and it
chunked tokens by whitespace. Though I could be experiencing a
copy/paste/i18n issue.
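
If the field really is split only on whitespace, that could explain the empty result. Tibetan separates syllables with the tsheg (U+0F0B) and uses spaces sparingly, so a whitespace split turns a long run of syllables into one token that a short query term will rarely equal, leaving the highlighter no matching term to mark. The sketch below is not the THL custom tokenizer, whose behavior isn't shown in this thread. It compares a whitespace split with a hypothetical tsheg/shad syllable split on a made-up field value containing the query term. The placeholder syllables and the function names are illustrative only.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan shad (clause delimiter)

def whitespace_tokens(text):
    """Split on whitespace only, as the analysis page appeared to do."""
    return text.split()

def syllable_tokens(text):
    """Hypothetical alternative: treat tsheg and shad as separators as well."""
    for mark in (TSHEG, SHAD):
        text = text.replace(mark, " ")
    return text.split()

def shared_terms(field_text, query_text, tokenize):
    """Tokens the query and the field have in common under the same tokenizer."""
    return sorted(set(tokenize(query_text)) & set(tokenize(field_text)))

# Query term from the thread: U+0F51 U+0F44 U+0F74 U+0F63 tsheg U+0F58 U+0F41 U+0F53 tsheg.
QUERY = "\u0f51\u0f44\u0f74\u0f63\u0f0b\u0f58\u0f41\u0f53\u0f0b"
# Made-up field value: placeholder syllables around the query term, one space after a shad.
FIELD = "\u0f40\u0f0b\u0f41\u0f0b" + QUERY + "\u0f42\u0f0b\u0f44\u0f0d \u0f40\u0f0d"

print("whitespace:", len(shared_terms(FIELD, QUERY, whitespace_tokens)))  # whitespace: 0
print("syllables:", len(shared_terms(FIELD, QUERY, syllable_tokens)))     # syllables: 2
```

Syllable splitting is only the simplest option. Tibetan words often span several syllables, so a production analyzer may need more than this.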

Also looks like you're on Solr 1.3 - so it's likely quite worth  
upgrading to 1.4.1 (don't know if that directly affects this  
highlighting issue, just a general recommendation).

	Erik
