Posted to solr-user@lucene.apache.org by Saïd Radhouani <r....@gmail.com> on 2010/03/23 17:11:42 UTC

Issue w/ highlighting a String field

I am having trouble highlighting a field of type "string". It looks like
highlighting only works with tokenized fields; for instance, it worked with
text and with another type I defined. Is this true, or am I making a mistake
that prevents highlighting from working on string fields?

Thanks for your help.

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks a lot, Ahmet. Now I'm going to learn something new: how to apply a
patch :)

Cheers.

2010/3/24 Ahmet Arslan <io...@yahoo.com>

> > Yes, that's what I was expecting. Actually, I'd like
> > to highlight phrases
> > containing stopwords, like <em>Terrain à sehloul</em>
>
> Lucene's FastVectorHighlighter[1] can do that kind of phrase highlighting.
> It seems that solr integration [2] has finished. You need to apply
> SOLR-1268 patch.
>
> [1]
> http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html
>
> [2]http://issues.apache.org/jira/browse/SOLR-1268
>
>
>
>

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> Yes, that's what I was expecting. Actually, I'd like
> to highlight phrases
> containing stopwords, like <em>Terrain à sehloul</em>

Lucene's FastVectorHighlighter [1] can do that kind of phrase highlighting.
It seems that the Solr integration [2] is finished. You need to apply the SOLR-1268 patch.

[1]http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html

[2]http://issues.apache.org/jira/browse/SOLR-1268


      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
2010/3/24 Ahmet Arslan <io...@yahoo.com>

>
> > Thanks a lot, Ahmet. In addition, I want to highlight phrases
> > containing stop words. I guess that the best way is to use a
> > tokenized type without a stopword filter. Do you agree that I should
> > define a new type for this purpose?
>
> I am not sure about that. Maybe solr.CommonGramsFilterFactory can do the
> job. I personally do not perform stop-word removal.
>
> > By the way, I wanted to highlight a phrase using a tokenized field
> > type, but I got the wrong result; I tried 2 cases (q=Terrain\ sehloul
> > and q="Terrain sehloul"), and I got the following:
> > <em>Terrain</em> <em>sehloul</em>
>
> This is okay. Were you expecting this: <em>Terrain sehloul</em>?

Yes, that's what I was expecting. Actually, I'd like to highlight phrases
containing stopwords, like <em>Terrain à sehloul</em>

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> Thanks a lot, Ahmet. In addition, I want to highlight phrases
> containing stop words. I guess that the best way is to use a
> tokenized type without a stopword filter. Do you agree that I should
> define a new type for this purpose?

I am not sure about that. Maybe solr.CommonGramsFilterFactory can do the job. I personally do not perform stop-word removal.
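
To illustrate the common-grams idea mentioned above, here is a rough Python sketch. It is a toy model, not Solr's implementation: the function name common_grams, the underscore separator, and the one-word stop list are assumptions made for the example. Stop words are kept, and each one is also joined to its neighbours so that a phrase containing a stop word can be matched as a unit.

```python
def common_grams(tokens, common_words):
    """Toy common-grams filter: keep every token and also emit a joined
    bigram whenever either token of an adjacent pair is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                # Separator "_" is an assumption for this illustration.
                out.append(f"{tok}_{nxt}")
    return out


tokens = "terrain à sehloul".split()
print(common_grams(tokens, {"à"}))
```

With the stop word "à" in the middle, the sketch emits the bigrams terrain_à and à_sehloul alongside the single tokens, so the stop word still takes part in phrase matching.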

> By the way, I wanted to highlight a phrase using a tokenized field
> type, but I got the wrong result; I tried 2 cases (q=Terrain\ sehloul
> and q="Terrain sehloul"), and I got the following:
> <em>Terrain</em> <em>sehloul</em>

This is okay. Were you expecting this: <em>Terrain sehloul</em>?
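
The difference between the two outputs can be sketched in Python. This is only an illustration of per-term versus whole-phrase wrapping using plain regular expressions; the function names highlight_terms and highlight_phrase are hypothetical and do not correspond to any Solr or Lucene API.

```python
import re


def highlight_terms(text, terms):
    """Wrap every matching term separately (per-term highlighting)."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", text)


def highlight_phrase(text, phrase):
    """Wrap the whole phrase in a single tag (phrase highlighting)."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", text)


title = "Terrain sehloul"
print(highlight_terms(title, ["Terrain", "sehloul"]))
print(highlight_phrase(title, "Terrain sehloul"))
```

The first call reproduces the per-term markup reported in this thread; the second produces the single-span markup that was expected.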


      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
2010/3/24 Ahmet Arslan <io...@yahoo.com>

> > With this configuration, the title field is highlighted
> > only when there's a
> > perfect match, i.e., the quoted query equals the title
> > content (f.i.,
> > q="Terrain sehloul" allows highlighting the entire title
> > containing "Terrain
> > sehloul",
>
> Exactly. There should be a *perfect* match for string typed fields to
> return snippets.
>
> > but q=Terrain sehloul doesn't enable to highlight
> > this title. Is
> > there a solution to this problem?
>
> Escaping (using backslash) whitespace can solve this problem.
> q=Terrain\ sehloul
>
> Now I understand you clearly. You have a title field containing 'Terrain
> sehloul' and you want to get highlighting with the query Terrain. You cannot
> do that with type="string". You need a tokenized field type in your case.
>


Thanks a lot, Ahmet. In addition, I want to highlight phrases containing stop
words. I guess that the best way is to use a tokenized type without a
stopword filter. Do you agree that I should define a new type for this purpose?

By the way, I wanted to highlight a phrase using a tokenized field type, but
I got the wrong result; I tried 2 cases (q=Terrain\ sehloul and q="Terrain
sehloul"), and I got the following: <em>Terrain</em> <em>sehloul</em>

Any ideas?
Thanks

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> With this configuration, the title field is highlighted
> only when there's a
> perfect match, i.e., the quoted query equals the title
> content (f.i.,
> q="Terrain sehloul" allows highlighting the entire title
> containing "Terrain
> sehloul", 

Exactly. A string-typed field returns snippets only when there is a *perfect* match.

> but q=Terrain sehloul doesn't enable to highlight
> this title. Is
> there a solution to this problem?

Escaping the whitespace with a backslash can solve this problem:
q=Terrain\ sehloul

Now I understand you clearly. You have a title field containing 'Terrain sehloul' and you want to get highlighting with the query Terrain. You cannot do that with type="string"; you need a tokenized field type in your case.
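
A small Python model of the point above, as a sketch only. It assumes that a string field is indexed as one single term and that a tokenized field is split into lowercase word tokens; the function names are hypothetical and nothing here calls Solr.

```python
import re


def matches_string_field(query_term, field_value):
    """Untokenized (string) field: the whole value is one term, so only
    an exact, complete match succeeds."""
    return query_term == field_value


def matches_tokenized_field(query_term, field_value):
    """Tokenized field (assumed: word split + lowercasing): any single
    token of the value can match the query term."""
    tokens = re.findall(r"\w+", field_value.lower())
    return query_term.lower() in tokens


title = "Terrain sehloul"
print(matches_string_field("Terrain", title))          # no hit, so nothing to highlight
print(matches_string_field("Terrain sehloul", title))  # exact value, hit
print(matches_tokenized_field("Terrain", title))       # one token matches, hit
```

Since highlighting needs a hit on the highlighted field, only the last two cases could produce a snippet.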


      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
> I didn't know that you were using dismax. There is no title field in your
> query fields list. The match is probably coming from title_tokenized, and when
> you request highlighting from title (hl.fl=title) it returns empty snippets.
> If that's the case, it is expected, because string-typed fields are not
> analyzed; there are no partial matches on string fields. If your title
> contains "Terrain something", q=Terrain won't match this document.
> What are the title fields of the returned documents?
>
>
You are right, the match is coming from title_tokenized. I also added the
title field to the qf clause, but it is still not working.


> We should rewrite this URL (to query only the title field) according to
> dismax: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title
>
>
/?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title does not return any
results, perhaps because title is not tokenized. I even tried quoted phrases,
but it's still not working. On the other hand, I got highlighting *working* by
adding the following to the above URL: &qf=title_tokenized.

With this configuration, the title field is highlighted only when there's a
perfect match, i.e., when the quoted query equals the title content. For
instance, q="Terrain sehloul" highlights the entire title "Terrain sehloul",
but q=Terrain sehloul does not highlight this title. Is there a solution to
this problem?

Thanks a lot.

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> I don't have a defaultSearchField; instead, I have the following qf clause,
> where title_tokenized is a tokenized version of title:
>        <str name="qf"> title_tokenized^3 text_description_tokenized phonetic_text^0.5</str>


I didn't know that you were using dismax. There is no title field in your query fields list. The match is probably coming from title_tokenized, and when you request highlighting from title (hl.fl=title) it returns empty snippets. If that's the case, it is expected, because string-typed fields are not analyzed; there are no partial matches on string fields. If your title contains "Terrain something", q=Terrain won't match this document.
What are the title fields of the returned documents?


> > /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title
> >
> > if it is zero, then it means that your match comes from your
> > defaultSearchField (not from title field).
> >
> > if it is not zero, highlighting should work. can you confirm this?
>
> this URL gives zero answer.  Again, I don't have defaultSearchField, the
> result is coming from the "qf" clause.

We should rewrite this URL (to query only the title field) according to dismax: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title
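
For anyone scripting these requests, here is a minimal Python sketch that assembles the same query string with the standard library. It assumes the request handler is already configured for dismax (as in this thread) and reuses the localhost base URL from the example request elsewhere in this thread; the helper name build_highlight_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the example request elsewhere in this thread.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def build_highlight_url(query, highlight_field, query_fields):
    """Build a select URL that searches the given qf fields and asks for
    highlighting on highlight_field."""
    params = {
        "q": query,
        "debugQuery": "on",
        "hl": "true",
        "hl.fl": highlight_field,
        "qf": " ".join(query_fields),  # dismax takes a space-separated field list
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(build_highlight_url("Terrain", "title", ["title"]))
# Backslash-escaped whitespace, searching both title fields:
print(build_highlight_url("Terrain\\ sehloul", "title", ["title", "title_tokenized"]))
```

urlencode percent-encodes the backslash and turns spaces into "+", so the escaped query survives the trip through the URL.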


      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
2010/3/24 Ahmet Arslan <io...@yahoo.com>

> > There's a match between the query and
> > the content of field I want to
> > highlight on. Solr is giving me the id of the document
> > matching my query,
> > but it's not displaying the field I want to highlight on.
> >
> > Here's the definition of the field I want to highlight on:
> >        <field name="title" type="string" indexed="false" stored="true" />
> >
> > And here's part of my URL:
> > /?q=Terrain&debugQuery=on&hl=true&hl.fl=title
>
> With &q=Terrain you are querying your defaultSearchField and requesting
> highlighting from title field.
>

I don't have a defaultSearchField; instead, I have the following qf clause,
where title_tokenized is a tokenized version of title:

        <str name="qf"> title_tokenized^3 text_description_tokenized phonetic_text^0.5</str>


>
> What is numFound when you hit this URL? Does highlighting appear?
>

numFound is not zero; I get results, and the highlighting section lists the
ids of the docs that matched my query.


> /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title


> if it is zero, then it means that your match comes from your
> defaultSearchField (not from title field).
>
> if it is not zero, highlighting should work. can you confirm this?
>
>
This URL returns zero results. Again, I don't have a defaultSearchField; the
result is coming from the "qf" clause.

What do you think?

Thanks.

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> There's a match between the query and
> the content of field I want to
> highlight on. Solr is giving me the id of the document
> matching my query,
> but it's not displaying the field I want to highlight on.
> 
> Here's the definition of the field I want to highlight on:
>        <field name="title" type="string" indexed="false" stored="true" />
> 
> And here's part of my URL:
> /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

With &q=Terrain you are querying your defaultSearchField and requesting highlighting from the title field.

What is numFound when you hit this URL? Does highlighting appear?

/?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title

If it is zero, then your match comes from your defaultSearchField (not from the title field).

If it is not zero, highlighting should work. Can you confirm this?




      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
There's a match between the query and the content of the field I want to
highlight on. Solr is giving me the id of the document matching my query,
but it's not displaying the field I want to highlight on.

Here's the definition of the field I want to highlight on:

        <field name="title" type="string" indexed="false" stored="true" />

And here's part of my URL: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

If I change the type to "text" instead of "string", the highlighting works
well!

Thanks for your help.
-S.



2010/3/23 Ahmet Arslan <io...@yahoo.com>

> > Thanks Erick. Actually, I restarted and reindexed a number of times,
> > but it's still not working.
>
> Highlighting on string-typed fields works perfectly. See the output of:
>
> http://localhost:8983/solr/select/?q=id%3ASOLR1000&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=id
>
> But there must be a match/hit to get highlighting. What is your query, and
> what is the content of the field that you want to highlight?
>
>
>
>

Re: Issue w/ highlighting a String field

Posted by Ahmet Arslan <io...@yahoo.com>.
> Thanks Erick. Actually, I restarted and reindexed a number of times,
> but it's still not working.

Highlighting on string-typed fields works perfectly. See the output of:

http://localhost:8983/solr/select/?q=id%3ASOLR1000&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=id

But there must be a match/hit to get highlighting. What is your query, and what is the content of the field that you want to highlight?


      

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks Erick. Actually, I restarted and reindexed a number of times, but it's
still not working.

Regarding your question, I intend to use this field for automatic phrase
boosting; is that OK?

        <str name="pf"> title_sort </str>

Thanks.

2010/3/23 Erick Erickson <er...@gmail.com>

> Did you restart Solr and reindex? Just changing the field definition
> won't help you without reindexing...
>
> One thing worries me about your fragment: you call it text_Sort.
> If you really intend to sort by this field, it must NOT be tokenized;
> you'll probably have to use copyField....
>
> HTH
> Erick

Re: Issue w/ highlighting a String field

Posted by Erick Erickson <er...@gmail.com>.
Did you restart Solr and reindex? Just changing the field definition
won't help you without reindexing...

One thing worries me about your fragment: you call it text_Sort.
If you really intend to sort by this field, it must NOT be tokenized;
you'll probably have to use copyField....

HTH
Erick

On Tue, Mar 23, 2010 at 12:45 PM, Saïd Radhouani <r....@gmail.com> wrote:

> Thanks Markus. It says that a tokenizer must be defined for the field.
> Here is the fieldType I'm using and the field I want to highlight on. As you
> can see, I defined a tokenizer, but it's still not working. Any idea?
>
> In the schema:
>
>        <fieldType name="text_Sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>            <analyzer>
>                <tokenizer class="solr.KeywordTokenizerFactory" />
>                <filter class="solr.LowerCaseFilterFactory" />
>                <filter class="solr.TrimFilterFactory" />
>            </analyzer>
>        </fieldType>
>
>        <field name="title_sort" type="text_Sort" indexed="true" stored="true" multiValued="false" />
>
> In solrconfig.xml:
>         <str name="hl.fl">title_sort text_description </str>
>
> At the same time, I wanted to highlight phrases (including stop words), but
> it's not working. I use "" and as you can see in my fieldType, I don't have
> a stopword filter. Any idea?
>
> Thanks a lot,
> -S.
>
>
> Thanks
>

Re: Issue w/ highlighting a String field

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks Markus. It says that a tokenizer must be defined for the field. Here
is the fieldType I'm using and the field I want to highlight on. As you can
see, I defined a tokenizer, but it's still not working. Any idea?

In the schema:

        <fieldType name="text_Sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
            </analyzer>
        </fieldType>

        <field name="title_sort" type="text_Sort" indexed="true" stored="true" multiValued="false" />

In solrconfig.xml:
         <str name="hl.fl">title_sort text_description </str>

At the same time, I want to highlight phrases (including stop words), but
it's not working. I quote the phrase with "", and as you can see in my
fieldType, I don't have a stopword filter. Any idea?

Thanks a lot,
-S.


Thanks


2010/3/23 Markus Jelsma <ma...@buyways.nl>

> Hello,
>
>
> Check out the wiki [1] on what options to use for highlighting and other
> components.
>
>
> [1]: http://wiki.apache.org/solr/FieldOptionsByUseCase
>
>
> Cheers,
>
>
>
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>

Re: Issue w/ highlighting a String field

Posted by Markus Jelsma <ma...@buyways.nl>.
Hello,


Check out the wiki [1] on what options to use for highlighting and other 
components.


[1]: http://wiki.apache.org/solr/FieldOptionsByUseCase


Cheers,



On Tuesday 23 March 2010 17:11:42 Saïd Radhouani wrote:
> I have trouble with highlighting field of type "string". It looks like
> highlighting is only working with tokenized fields, f.i., it worked with
> text and another type I defined. Is this true, or I'm making a mistake that
> is preventing me to have the highlighting option working on string?
> 
> Thanks for your help.
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350