Posted to java-user@lucene.apache.org by Wulf Berschin <be...@dosco.de> on 2011/01/25 18:40:24 UTC

Highlight Wildcard Queries

Hi,

I'm just migrating our small search customization from Lucene version 
2.3 to the current version (3.0.3) and wonder why, in contrast to the 
old version, wildcard queries (which are our default, since we surround 
the search string with asterisks) are no longer highlighted.

We're using the "traditional" Highlighter and don't store the documents' 
content (in total about 100MB) in the index.

As far as I understand, in the earlier version a WildcardQuery was 
rewritten (expanded) into many TermQuery clauses before highlighting (in 
practice we never exceeded the maximum clause count, since we only 
wrapped strings longer than 3 characters).

Now, however, QueryParser returns a BooleanQuery containing a 
ConstantScoreQuery, which has an empty extractTerms method and thus 
doesn't contribute any terms to the highlighter.
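
A toy sketch of the expansion described above, in plain Java rather than
Lucene code: a wildcard pattern is matched against a list of index terms and
each matching term becomes one clause, with a cap on the number of clauses.
The class name, the term list and the cap value are made up for illustration;
Lucene's actual rewrite logic is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class WildcardExpansionSketch {

    /** Converts a wildcard pattern (* = any run of characters, ? = one character) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /**
     * Expands the pattern into the matching index terms, one "clause" per term.
     * Throws once more than maxClauses terms match (hypothetical cap).
     */
    static List<String> expand(String wildcard, List<String> indexTerms, int maxClauses) {
        Pattern pattern = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms) {
            if (pattern.matcher(term).matches()) {
                if (clauses.size() == maxClauses) {
                    throw new IllegalStateException("too many clauses: more than " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Illustrative term list; a real index holds far more terms.
        List<String> terms = Arrays.asList("kaffee", "kaffeemaschine", "maschine", "tasse");
        System.out.println(expand("*maschine*", terms, 1024)); // [kaffeemaschine, maschine]
    }
}
```

The expanded terms are what a term-based highlighter can work with; a
constant-score wrapper that exposes no terms leaves it with nothing to mark.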

What do I have to do to get the wildcard query results highlighted?
Use the old rewrite methods (if possible)?
Go back to Lucene 2.3?

Wulf


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Highlight Wildcard Queries

Posted by Dawn Zoë Raison <da...@digitorial.co.uk>.
Removing redundant calls to rewrite was the key when I had this issue 
moving from 2.3.x to 3.0.x...

Dawn

On 25/01/2011 20:04, Uwe Schindler wrote:
> And: you don't need to rewrite queries before highlighting, highlighter does
> this automatically internally if needed.
>
> -----
> Uwe Schindler





RE: Highlight Wildcard Queries: Scores

Posted by Uwe Schindler <uw...@thetaphi.de>.
You should still not rewrite the query yourself; always let Lucene do that.
Once you rewrite, Lucene can no longer detect the correct rewrite mode, as
it only sees a ConstantScoreQuery. Rewrite should only be called by internal
Lucene APIs (e.g. IndexSearcher does it before executing the query), and
(most importantly) the highlighter does it more efficiently by using only the
one-document index. If you rewrite the query against your IndexSearcher
before highlighting, it is rewritten against the complete index, which may
cost a lot of time for wildcards or other MultiTermQueries. If you let the
highlighter rewrite it, it is only rewritten against the one-document
MemoryIndex built for the text that needs to be highlighted.
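
To illustrate the point about the one-document index, here is a toy model in
plain Java (not Lucene's Highlighter or MemoryIndex): the wildcard is expanded
only against the words of the single text being highlighted, so the work
depends on that text and not on the whole index. All names and the <B> markup
are assumptions made for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OneDocumentHighlightSketch {

    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Collects the distinct lowercased words of one text (the toy "one-document index"). */
    static Set<String> termsOf(String text) {
        Set<String> terms = new LinkedHashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            terms.add(m.group().toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    /** Expands an infix wildcard (*infix*) against the given terms only. */
    static Set<String> expandInfix(String infix, Set<String> terms) {
        String needle = infix.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>();
        for (String term : terms) {
            if (term.contains(needle)) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Wraps every word whose lowercased form is among the hits in <B>...</B>. */
    static String highlight(String text, Set<String> hits) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            boolean hit = hits.contains(word.toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(hit ? "<B>" + word + "</B>" : word));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "Die Kaffeemaschine steht neben der Maschine.";
        Set<String> hits = expandInfix("maschine", termsOf(doc));
        System.out.println(highlight(doc, hits));
        // Die <B>Kaffeemaschine</B> steht neben der <B>Maschine</B>.
    }
}
```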

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: Highlight Wildcard Queries: Scores

Posted by Wulf Berschin <be...@dosco.de>.
Sorry for the bother, that was my fault: in my subclass of QueryParser, 
which wraps * around the terms, I had not yet taken the new 
multiTermRewriteMethod into account. After adding this, scoring seems to 
work and even the rewrite is possible again.

Wulf



-- 

Kind regards,

Wulf Berschin

--

<!-- *****************************************************************
* Wulf Berschin                            Telefon: +49 6221 1486 16 *
* DOSCO Document Systems Consulting GmbH   Telefax: +49 6221 1486 19 *
* Mannheimer Strasse 1                     E-Mail: berschin@dosco.de *
* 69115 Heidelberg, Germany                http://www.dosco.de       *
* Handelsregister: Heidelberg HRB 335122                             *
* Geschäftsführung: Robert Erfle                                     *
****************************************************************** -->




RE: Highlight Wildcard Queries: Scores

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi again,

Sorry, the TokenFilter for decompounding also keeps the original token in
its output, so for Donaudampfschifffahrtskapitän it will produce the
following tokens:

Donaudampfschifffahrtskapitän, donau, dampf, schiff, fahrts, kapitän

Given that, you would use the decompounder only during indexing, and on the
query side you would leave this TokenFilter out (use two different
analyzers, as Solr does with its "index" and "query" analyzers). No need
for separate fields. Since no decompounding is done on the query side, you
can enter any of the above terms and get a hit.

Stemming should be done after decompounding.
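
A minimal sketch of the idea in plain Java, assuming a hypothetical two-word
dictionary and a naive substring scan (this is not Lucene's compound-word
TokenFilter): at index time each token is emitted together with its dictionary
parts, while the query term is left as it is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DecompoundSketch {

    // Shortest part considered; an assumption for this sketch.
    static final int MIN_PART_LENGTH = 4;

    /** Index side: returns the lowercased token followed by every dictionary word found inside it. */
    static List<String> decompound(String token, Set<String> dictionary) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        tokens.add(lower); // the original token is kept
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + MIN_PART_LENGTH; end <= lower.length(); end++) {
                String part = lower.substring(start, end);
                if (!part.equals(lower) && dictionary.contains(part)) {
                    tokens.add(part);
                }
            }
        }
        return tokens;
    }

    /** Query side: no decompounding, the term is only lowercased and looked up. */
    static boolean matches(String queryTerm, List<String> indexedTokens) {
        return indexedTokens.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("kaffee", "maschine"));
        List<String> indexed = decompound("Kaffeemaschine", dictionary);
        System.out.println(indexed);                      // [kaffeemaschine, kaffee, maschine]
        System.out.println(matches("Maschine", indexed)); // true
        System.out.println(matches("Tasse", indexed));    // false
    }
}
```

Because the whole word and its parts end up in the same token list, a single
field can serve both the compound and the part queries in this toy model.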

Sorry for the misinformation before; sometimes you have to read the
documentation!
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: Highlight Wildcard Queries: Scores

Posted by Uwe Schindler <uw...@thetaphi.de>.
You can always decompose, because QueryParser will also decompose and will
do the right thing (internally using a PhraseQuery - don't hurt me, Robert).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: Highlight Wildcard Queries: Scores

Posted by Wulf Berschin <be...@dosco.de>.
Hello Uwe,

yes, thanks for the hint, that sounds good, but it seems to me I would 
then need more fields to cover all our search modes:

Now we have the fields "contents" (without stop words and with stemming) 
and "contents-unstemmed" (without stemming).

The search options are:
- whole word (search "contents", no asterisks are added before the search)
- exact match (search "contents-unstemmed", implies whole word)

When decomposition comes into play I will need a third field, 
"contents-undecomposed" (sorry), to perform the whole-word search. 
Furthermore, contents-unstemmed should not be decomposed either.

Would you still prefer this approach?

Best regards from Heidelberg
Wulf








RE: Highlight Wildcard Queries: Scores

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Wulf,

You should consider decompounding! There are dictionary-based filters
that support decompounding German words. Such a filter is a TokenFilter that
you put into your analysis chain.
There is a simple Lucene rule: whenever you need wildcards, think about your
analysis; you probably did something wrong :-) Add stemming, decompounding,
synonyms, ...

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: Highlight Wildcard Queries: Scores

Posted by Wulf Berschin <be...@dosco.de>.
Hi Erick,

good points, but:

our index is fed with German text. In German (in contrast to English), 
nouns are simply concatenated to form new words, e.g.

Kaffee
Kaffeemaschine
Kaffeemaschinensatzbehälter

In our scenario a standard full-text search for "Maschine" should return 
all of these nouns. That's why we add * before and after each term.

Of course we provide an option "full words only" which finds none of these.

Since we do not wrap * around words shorter than 4 characters, we have not 
yet run into the "too many clauses" exception.
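
The wrapping rule described above could be sketched like this in plain Java.
It works on the raw search string rather than inside a QueryParser subclass,
and the class, method and constant names are hypothetical.

```java
final class WildcardWrapSketch {

    // Terms shorter than this are searched as-is (4 characters, as stated above).
    static final int MIN_WRAP_LENGTH = 4;

    /**
     * Wraps each whitespace-separated term in asterisks unless the term is
     * too short or the "full words only" option is switched on.
     */
    static String wrapTerms(String userInput, boolean fullWordsOnly) {
        StringBuilder out = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input produces a single empty string
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            boolean wrap = !fullWordsOnly && term.length() >= MIN_WRAP_LENGTH;
            out.append(wrap ? "*" + term + "*" : term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapTerms("Maschine der Tee", false)); // *Maschine* der Tee
        System.out.println(wrapTerms("Kaffee", true));            // Kaffee
    }
}
```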

Greetings
Wulf



Re: Highlight Wildcard Queries: Scores

Posted by Erick Erickson <er...@gmail.com>.
It is, I think, a legitimate question to ask whether scoring is worthwhile
on wildcards. That is, does it really improve the user experience? The
BooleanQuery max clause count gets tripped pretty quickly if you add the
terms back in, so you'd have to deal with that.

Would your users be satisfied if, when they search only on wildcard terms,
the results were sorted by, say, dates or titles or subjects or ...?

Then just let the scoring work on fields that aren't wildcarded...

This may not be reasonable in your particular case, but it's worth
considering before assuming that scoring is necessary on wildcard terms.

Best
Erick


Re: Highlight Wildcard Queries: Scores

Posted by Wulf Berschin <be...@dosco.de>.
Now I have the highlighted wildcards, but obviously the scoring is lost. 
I see that a rewrite of the wildcard query produces a constant score 
query. I added

setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

to my QueryParser instance, but it had no effect. What's to be done now?

Wulf





RE: Highlight Wildcard Queries

Posted by Wulf Berschin <be...@dosco.de>.
Thank you, Alexander and Uwe, for your help.

I read Mark's explanation, but it seems to me that his changes are not 
contained in Lucene 3.0.3.

So I commented out the rewrite, changed QueryTermScorer back to 
QueryScorer, and now the wildcard queries are highlighted again.

Wulf




RE: Highlight Wildcard Queries

Posted by Uwe Schindler <uw...@thetaphi.de>.
And: you don't need to rewrite queries before highlighting; the highlighter
does this automatically internally if needed.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: Highlight Wildcard Queries

Posted by Uwe Schindler <uw...@thetaphi.de>.
You can call setExpandMultiTermQuery(true) on the QueryScorer (or
WeightedSpanTermExtractor) that you pass to the Highlighter constructor;
that does exactly what you want with the standard highlighter.
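
To illustrate why the expansion matters, here is a small library-free
sketch in plain Java. It is not Lucene code and does not reproduce how
QueryScorer works internally; the class and method names
(WildcardHighlightSketch, expand, highlight) and the <B> markup are
made up for this example. It only shows the idea discussed in this
thread: a wildcard pattern has to be expanded into the concrete terms
that occur in the text before a term-based highlighter has anything to
mark. With an empty term set the text comes back unchanged, which is
the symptom reported above.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in Lucene.
class WildcardHighlightSketch {

    // Translate a wildcard pattern ('*' = any run of characters,
    // '?' = exactly one character) into an equivalent regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // "Expand" the wildcard: collect the distinct lowercased tokens of
    // the text that match the pattern, in order of first appearance.
    static Set<String> expand(String wildcard, String text) {
        Pattern p = toRegex(wildcard.toLowerCase(Locale.ROOT));
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && p.matcher(token).matches()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Wrap every word whose lowercased form is in 'terms' in <B>...</B>.
    static String highlight(String text, Set<String> terms) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<B>").append(word).append("</B>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Highlighting wildcard queries with the Highlighter";
        Set<String> terms = expand("*light*", text);
        System.out.println(terms);
        System.out.println(highlight(text, terms));
    }
}
```

In Lucene itself you would not write any of this by hand; the
setExpandMultiTermQuery(true) call described above asks the scorer to
do the expansion for you.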

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Alexander Kanarsky [mailto:kanarsky2010@gmail.com]
> Sent: Tuesday, January 25, 2011 8:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: Highlight Wildcard Queries
> 
> Hi Wulf,
> 
> Check http://www.lucidimagination.com/blog/2009/06/08/bringing-the-highlighter-back-to-wildcard-queries-in-solr-14/
> this may help. I do not know which of Mark's changes are in Lucene 3.x, but
> most likely you will just need to set a proper RewriteMethod for the
> MultiTermQuery somewhere.
> 
> -Alexander


Re: Highlight Wildcard Queries

Posted by Alexander Kanarsky <ka...@gmail.com>.
Hi Wulf,

Check http://www.lucidimagination.com/blog/2009/06/08/bringing-the-highlighter-back-to-wildcard-queries-in-solr-14/
this may help. I do not know which of Mark's changes are in Lucene 3.x,
but most likely you will just need to set a proper RewriteMethod for the
MultiTermQuery somewhere.

-Alexander

