You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Dmitry Goldenberg <dm...@weblayers.com> on 2005/12/27 19:55:07 UTC

Wildcard and Fuzzy queries - no best fragments generated - ??

Hello,
 
While testing my code that integrates the Highlighter class from org.apache.lucene.search.highlight, I found out that for wildcard and fuzzy queries, it generates no best fragments.
 
Any particular reason why that is the case?  Shouldn't the highlighter be able to work just like with any other query and highlight any matching token sequences?  E.g. if I'm searching for lava~, I'd expect it to highlight words like lava, java, etc.  This is the whole point of highlighting, is it not?
 
Thanks,
- Dmitry


Re: Wildcard and Fuzzy queries - no best fragments generated - ??

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 27, 2005, at 5:51 PM, Dmitry Goldenberg wrote:
> This is great.  Now, just so I understand how rewrite applies.  Are  
> there specific cases when the rewrite method should or should not  
> be invoked on the query before it is passed on to the search  
> method?  Where I'm going with this is, if I have a UI which passes  
> the user query to the back end, can the back end always call  
> rewrite or not?
> By looking at the code, it seems that things just get reinterpreted  
> as BooleanQuery's and such.  I just want to know if there are any  
> pitfalls to watch out for.

It is safe, and wise, to always call Query.rewrite before  
highlighting.  No worries or pitfalls that I know of.

	Erik



>
> Thanks again,
> - Dmitry
>
> ________________________________
>
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Tue 12/27/2005 12:13 PM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard and Fuzzy queries - no best fragments  
> generated - ??
>
>
>
>
> On Dec 27, 2005, at 2:34 PM, Dmitry Goldenberg wrote:
>> What do you mean by _rewriting_ the query?  I checked all the
>> classes in the highlighter package and did not see any mention of
>> having to rewrite.
>
>  From Highlighter's package.html in it's javadocs:
>
> <pre>
>         IndexSearcher searcher = new IndexSearcher(ramDir);
>         Query query = QueryParser.parse("Kenne*", FIELD_NAME,  
> analyzer);
>         query = query.rewrite(reader); //required to expand search  
> terms
>         Hits hits = searcher.search(query);
>
>         Highlighter highlighter = new Highlighter(this, new  
> QueryScorer
> (query));
>         for (int i = 0; i &lt; hits.length(); i++)
>         {
>                 String text = hits.doc(i).get(FIELD_NAME);
>                 TokenStream tokenStream = analyzer.tokenStream 
> (FIELD_NAME, new
> StringReader(text));
>                 // Get 3 best fragments and seperate with a "..."
>                 String result = highlighter.getBestFragments 
> (tokenStream, text, 3,
> "...");
>                 System.out.println(result);
>         }
> </pre>
>
>
>
>
>
>> ________________________________
>>
>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>> Sent: Tue 12/27/2005 11:03 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Wildcard and Fuzzy queries - no best fragments
>> generated - ??
>>
>>
>>
>> You have to _rewrite_ the Query for this to work.  This, I believe,
>> is mentioned in the javadocs.
>>
>> I think you are hijacking a thread with your recent postings.  Please
>> create a new message rather than reply to one and change the
>> subject.  Thanks.
>>
>>         Erik
>>
>>
>> On Dec 27, 2005, at 1:55 PM, Dmitry Goldenberg wrote:
>>
>>> Hello,
>>>
>>> While testing my code that integrates the Highlighter class from
>>> org.apache.lucene.search.highlight, I found out that for wildcard
>>> and fuzzy queries, it generates no best fragments.
>>>
>>> Any particular reason why that is the case?  Shouldn't the
>>> highlighter be able to work just like with any other query and
>>> highlight any matching token sequences?  E.g. if I'm searching for
>>> lava~, I'd expect it to highlight words like lava, java, etc.  This
>>> is the whole point of highlighting, is it not?
>>>
>>> Thanks,
>>> - Dmitry
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Wildcard and Fuzzy queries - no best fragments generated - ??

Posted by Dmitry Goldenberg <dm...@weblayers.com>.
Erik,
This is great.  Now, just so I understand how rewrite applies.  Are there specific cases when the rewrite method should or should not be invoked on the query before it is passed on to the search method?  Where I'm going with this is, if I have a UI which passes the user query to the back end, can the back end always call rewrite or not?
 
By looking at the code, it seems that things just get reinterpreted as BooleanQuery's and such.  I just want to know if there are any pitfalls to watch out for.
 
Thanks again,
- Dmitry

________________________________

From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Tue 12/27/2005 12:13 PM
To: java-user@lucene.apache.org
Subject: Re: Wildcard and Fuzzy queries - no best fragments generated - ??




On Dec 27, 2005, at 2:34 PM, Dmitry Goldenberg wrote:
> What do you mean by _rewriting_ the query?  I checked all the 
> classes in the highlighter package and did not see any mention of 
> having to rewrite.

 From Highlighter's package.html in it's javadocs:

<pre>
        IndexSearcher searcher = new IndexSearcher(ramDir);
        Query query = QueryParser.parse("Kenne*", FIELD_NAME, analyzer);
        query = query.rewrite(reader); //required to expand search terms
        Hits hits = searcher.search(query);

        Highlighter highlighter = new Highlighter(this, new QueryScorer
(query));
        for (int i = 0; i &lt; hits.length(); i++)
        {
                String text = hits.doc(i).get(FIELD_NAME);
                TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new 
StringReader(text));
                // Get 3 best fragments and seperate with a "..."
                String result = highlighter.getBestFragments(tokenStream, text, 3, 
"...");
                System.out.println(result);
        }
</pre>





> ________________________________
>
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Tue 12/27/2005 11:03 AM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard and Fuzzy queries - no best fragments 
> generated - ??
>
>
>
> You have to _rewrite_ the Query for this to work.  This, I believe,
> is mentioned in the javadocs.
>
> I think you are hijacking a thread with your recent postings.  Please
> create a new message rather than reply to one and change the
> subject.  Thanks.
>
>         Erik
>
>
> On Dec 27, 2005, at 1:55 PM, Dmitry Goldenberg wrote:
>
>> Hello,
>>
>> While testing my code that integrates the Highlighter class from
>> org.apache.lucene.search.highlight, I found out that for wildcard
>> and fuzzy queries, it generates no best fragments.
>>
>> Any particular reason why that is the case?  Shouldn't the
>> highlighter be able to work just like with any other query and
>> highlight any matching token sequences?  E.g. if I'm searching for
>> lava~, I'd expect it to highlight words like lava, java, etc.  This
>> is the whole point of highlighting, is it not?
>>
>> Thanks,
>> - Dmitry
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





Re: Wildcard and Fuzzy queries - no best fragments generated - ??

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 27, 2005, at 2:34 PM, Dmitry Goldenberg wrote:
> What do you mean by _rewriting_ the query?  I checked all the  
> classes in the highlighter package and did not see any mention of  
> having to rewrite.

 From Highlighter's package.html in it's javadocs:

<pre>
	IndexSearcher searcher = new IndexSearcher(ramDir);
	Query query = QueryParser.parse("Kenne*", FIELD_NAME, analyzer);
	query = query.rewrite(reader); //required to expand search terms
	Hits hits = searcher.search(query);

	Highlighter highlighter = new Highlighter(this, new QueryScorer 
(query));
	for (int i = 0; i &lt; hits.length(); i++)
	{
		String text = hits.doc(i).get(FIELD_NAME);
		TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new  
StringReader(text));
		// Get 3 best fragments and seperate with a "..."
		String result = highlighter.getBestFragments(tokenStream, text, 3,  
"...");
		System.out.println(result);
	}
</pre>





> ________________________________
>
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Tue 12/27/2005 11:03 AM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard and Fuzzy queries - no best fragments  
> generated - ??
>
>
>
> You have to _rewrite_ the Query for this to work.  This, I believe,
> is mentioned in the javadocs.
>
> I think you are hijacking a thread with your recent postings.  Please
> create a new message rather than reply to one and change the
> subject.  Thanks.
>
>         Erik
>
>
> On Dec 27, 2005, at 1:55 PM, Dmitry Goldenberg wrote:
>
>> Hello,
>>
>> While testing my code that integrates the Highlighter class from
>> org.apache.lucene.search.highlight, I found out that for wildcard
>> and fuzzy queries, it generates no best fragments.
>>
>> Any particular reason why that is the case?  Shouldn't the
>> highlighter be able to work just like with any other query and
>> highlight any matching token sequences?  E.g. if I'm searching for
>> lava~, I'd expect it to highlight words like lava, java, etc.  This
>> is the whole point of highlighting, is it not?
>>
>> Thanks,
>> - Dmitry
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Wildcard and Fuzzy queries - no best fragments generated - ??

Posted by Dmitry Goldenberg <dm...@weblayers.com>.
Erik,
What do you mean by _rewriting_ the query?  I checked all the classes in the highlighter package and did not see any mention of having to rewrite.
 
Sorry for the highjacking, didn't mean to be a terrorist :)
- Dmitry

________________________________

From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Tue 12/27/2005 11:03 AM
To: java-user@lucene.apache.org
Subject: Re: Wildcard and Fuzzy queries - no best fragments generated - ??



You have to _rewrite_ the Query for this to work.  This, I believe, 
is mentioned in the javadocs.

I think you are hijacking a thread with your recent postings.  Please 
create a new message rather than reply to one and change the 
subject.  Thanks.

        Erik


On Dec 27, 2005, at 1:55 PM, Dmitry Goldenberg wrote:

> Hello,
>
> While testing my code that integrates the Highlighter class from 
> org.apache.lucene.search.highlight, I found out that for wildcard 
> and fuzzy queries, it generates no best fragments.
>
> Any particular reason why that is the case?  Shouldn't the 
> highlighter be able to work just like with any other query and 
> highlight any matching token sequences?  E.g. if I'm searching for 
> lava~, I'd expect it to highlight words like lava, java, etc.  This 
> is the whole point of highlighting, is it not?
>
> Thanks,
> - Dmitry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





Re: Wildcard and Fuzzy queries - no best fragments generated - ??

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
You have to _rewrite_ the Query for this to work.  This, I believe,  
is mentioned in the javadocs.

I think you are hijacking a thread with your recent postings.  Please  
create a new message rather than reply to one and change the  
subject.  Thanks.

	Erik


On Dec 27, 2005, at 1:55 PM, Dmitry Goldenberg wrote:

> Hello,
>
> While testing my code that integrates the Highlighter class from  
> org.apache.lucene.search.highlight, I found out that for wildcard  
> and fuzzy queries, it generates no best fragments.
>
> Any particular reason why that is the case?  Shouldn't the  
> highlighter be able to work just like with any other query and  
> highlight any matching token sequences?  E.g. if I'm searching for  
> lava~, I'd expect it to highlight words like lava, java, etc.  This  
> is the whole point of highlighting, is it not?
>
> Thanks,
> - Dmitry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org