Posted to java-user@lucene.apache.org by Ard Schrijvers <a....@hippo.nl> on 2007/08/08 10:28:06 UTC

Fastest way to perform 'like' searches

Hello,

I need a search that can also match on substrings, for example:

*oo bar the qu*

should find documents that contain 'foo bar the quux' or 'foo bar the qux'. Should I also index the text as UN_TOKENIZED and run a WildcardQuery on that field? Then every text blob is added as a single term in Lucene, which clearly doesn't scale, and searching becomes very slow.

Does anybody know a more efficient way? Might a PhraseQuery get me somewhere? Does PhraseQuery allow wildcards in the phrase? Also, since a phrase is analyzed by some analyzer, 'the' might be stripped as a stopword, implying that *oo bar qu* would also match, right?

I know the requirement is a little strange, but it is part of the JSR-170 specification (SQL 'like', or XPath 'jcr:like', which mimics the SQL LIKE of a database).

Thanks for any pointers 

Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
-------------------------------------------------------------- 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: term location in doc

Posted by Chris Hostetter <ho...@fucit.org>.
: In-Reply-To: <20...@danielnaber.de>

http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss




Re: term location in doc

Posted by Grant Ingersoll <gs...@apache.org>.
If you like term vectors, are using the latest trunk version of Lucene,
and are willing to be a guinea pig :-), see
https://issues.apache.org/jira/browse/LUCENE-975

Of course, you can always reanalyze the document as well and keep
track of the positions as you go.  Maybe take a look at how
contrib/highlighter does things.
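
A minimal sketch of the "reanalyze and keep track of positions" idea, in
plain Java with no Lucene dependency. The class and method names are
hypothetical, and the lowercase-and-split tokenizer is only a stand-in: for
the positions to line up with the index, the text would have to go through
the same analyzer that was used at index time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: re-tokenizes the stored text and
// remembers which term ended up at which position.
class PositionLookupSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    // ASSUMPTION: this stands in for whatever analyzer built the index.
    static List<String> termsByPosition(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token); // list index == term position
            }
        }
        return terms;
    }

    // Returns the term at the given position, or null if out of range.
    static String termAt(String text, int position) {
        List<String> terms = termsByPosition(text);
        return (position >= 0 && position < terms.size()) ? terms.get(position) : null;
    }
}
```

For example, PositionLookupSketch.termAt("Foo bar, the quux!", 3) looks up
the fourth token of the re-tokenized text. Reanalyzing on every lookup costs
time proportional to the document length, so caching the list per document
may be worthwhile if many positions are queried.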

Or maybe I am misunderstanding things...

-Grant

On Aug 8, 2007, at 6:16 PM, Kevin Chen wrote:

> I can see that termpositions gives an enum with all positions of  
> term in document. I want to do the opposite. Given a position , can  
> I query the document for term at that position in document?

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





term location in doc

Posted by Kevin Chen <ch...@yahoo.com>.
I can see that TermPositions gives an enumeration of all positions of a term in a document. I want to do the opposite: given a position, can I query the document for the term at that position?



       

Re: Fastest way to perform 'like' searches

Posted by Daniel Naber <lu...@danielnaber.de>.
On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote:

> Does anybody know a more efficient way? A PhraseQuery might get me
> somewhere, isn't? 

No, you need to use MultiPhraseQuery, and you will first need to expand the 
terms containing the "*" yourself (e.g. using term enumeration).
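
A rough sketch of that expand-then-match idea, in plain Java with no Lucene
dependency. Everything here is hypothetical: the class name, the in-memory
term dictionary (standing in for a term enumeration over the index), and the
whitespace tokenizer. Only a leading and/or trailing "*" per word is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: expand each wildcard word into the set of
// dictionary terms it matches, then look for a run of consecutive tokens
// where token i is one of the alternatives for slot i.
class LikePhraseSketch {

    // Expand one pattern word against the term dictionary.
    static Set<String> expand(String word, Set<String> dictionary) {
        if (word.equals("*")) {
            return new TreeSet<>(dictionary); // bare wildcard matches every term
        }
        boolean leading = word.startsWith("*");
        boolean trailing = word.endsWith("*");
        String core = word.substring(leading ? 1 : 0, word.length() - (trailing ? 1 : 0));
        Set<String> out = new TreeSet<>();
        for (String term : dictionary) { // a leading "*" forces a full scan
            boolean hit;
            if (leading && trailing) {
                hit = term.contains(core);
            } else if (leading) {
                hit = term.endsWith(core);
            } else if (trailing) {
                hit = term.startsWith(core);
            } else {
                hit = term.equals(core);
            }
            if (hit) {
                out.add(term);
            }
        }
        return out;
    }

    // True if the document contains consecutive tokens matching every slot.
    static boolean matches(String pattern, String document, Set<String> dictionary) {
        List<Set<String>> slots = new ArrayList<>();
        for (String word : pattern.toLowerCase().trim().split("\\s+")) {
            slots.add(expand(word, dictionary));
        }
        String[] tokens = document.toLowerCase().trim().split("\\s+");
        for (int start = 0; start + slots.size() <= tokens.length; start++) {
            boolean ok = true;
            for (int j = 0; j < slots.size() && ok; j++) {
                ok = slots.get(j).contains(tokens[start + j]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

With a dictionary containing foo, zoo, bar, the, quux and qux, the word
"*oo" expands to the alternatives foo and zoo, and "qu*" expands to quux and
qux. In Lucene itself, each such set of alternatives would presumably be
handed to MultiPhraseQuery as one array of terms per phrase position.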

> as a phrase is analyzed according some analyzer it might strip the 'the'
> as a stopword, implying that *oo bar qu* would also match, right?

Stopwords need to be removed everywhere, i.e. also from phrases; that way, 
phrases generally work as expected.
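
A small plain-Java sketch of what "removed everywhere" means: the same
stopword filter is applied to the indexed text and to the query phrase. The
class name, the stopword list and the whitespace tokenizer are assumptions
made for illustration, and the sketch assumes that removed stopwords leave no
gap in the positions, which may or may not hold for a given analyzer setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene code.
class StopwordPhraseSketch {

    // ASSUMPTION: a tiny stopword list chosen only for this example.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "an"));

    // Lowercase, split on whitespace, and drop stopwords.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The phrase matches if its analyzed tokens occur consecutively
    // in the analyzed document.
    static boolean phraseMatches(String phrase, String document) {
        List<String> p = analyze(phrase);
        List<String> d = analyze(document);
        return !p.isEmpty() && Collections.indexOfSubList(d, p) >= 0;
    }
}
```

Under these assumptions, both "bar the quux" and "bar quux" match a document
containing "foo bar the quux", which is the side effect raised in the
question.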

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: configuration of lucene with jsp

Posted by Chris Hostetter <ho...@fucit.org>.
: Message-ID:
:     <81...@EVSMAIL.Evalueserve.com>
: In-Reply-To: <A9...@hai01.hippo.local>

http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking





-Hoss




configuration of lucene with jsp

Posted by Neha Modi <ne...@evalueserve.com>.
Hi.

I am new to JSP and I have to integrate Lucene with a JSP web
application.  I am having problems configuring Lucene in my web
application. Can someone please provide me with the correct
configuration and installation steps?

Regards, 


Neha Modi


 

