Posted to java-user@lucene.apache.org by Ard Schrijvers <a....@hippo.nl> on 2007/08/08 10:28:06 UTC
Fastest way to perform 'like' searches
Hello,
I need to do a search that can also match on substrings, for example:
*oo bar the qu*
should find documents containing either 'foo bar the quux' or 'foo bar the qux'. Should I also index the text as UN_TOKENIZED and run a WildcardQuery on that field? Then every blob of text is added to Lucene as a single term, which clearly doesn't scale: searching becomes very slow.
Does anybody know a more efficient way? A PhraseQuery might get me somewhere, right? Does PhraseQuery allow wildcards in the phrase? But since a phrase is analyzed by some analyzer, it might strip 'the' as a stopword, implying that *oo bar qu* would also match, right?
I know the requirement is a little strange, but it is part of the JSR-170 specification (SQL 'like', or XPath 'jcr:like', which mimics SQL LIKE in a database).
Thanks for any pointers
Ard
--
Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
--------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: term location in doc
Posted by Chris Hostetter <ho...@fucit.org>.
: In-Reply-To: <20...@danielnaber.de>
http://people.apache.org/~hossman/#threadhijack
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
See Also: http://en.wikipedia.org/wiki/Thread_hijacking
-Hoss
Re: term location in doc
Posted by Grant Ingersoll <gs...@apache.org>.
If you'd like to use term vectors, try the latest trunk version of Lucene,
provided you are willing to be a guinea pig. :-) See
https://issues.apache.org/jira/browse/LUCENE-975
Of course, you can always reanalyze the document as well and keep
track of the positions as you go. Maybe take a look at how
contrib/highlighter does things.
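Either route Grant mentions ends in the same data structure: a map from position to term. A minimal sketch in plain Java, assuming you already have each term's positions within one document (from TermPositions, from a term vector stored with positions, or from your own reanalysis): invert term -> positions into position -> terms, after which "which term is at position N?" is a map lookup. The class name and the hand-built input are hypothetical, and this does not use the LUCENE-975 code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: inverts per-term position lists for ONE document
 * into a position -> terms map.
 */
public class PositionToTerm {

    static TreeMap<Integer, List<String>> invert(Map<String, int[]> termPositions) {
        TreeMap<Integer, List<String>> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int pos : e.getValue()) {
                // A list per position, since several tokens can share one
                // position (e.g. synonyms injected at the same offset).
                byPosition.computeIfAbsent(pos, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Hand-built positions, standing in for what TermPositions would report.
        Map<String, int[]> positions = new LinkedHashMap<>();
        positions.put("foo", new int[] {0, 4});
        positions.put("bar", new int[] {1});
        positions.put("quux", new int[] {3});

        TreeMap<Integer, List<String>> byPosition = invert(positions);
        System.out.println(byPosition.get(4)); // [foo]
        System.out.println(byPosition.get(2)); // null: nothing indexed at that position
    }
}
```

A TreeMap keeps positions ordered, so neighbouring positions (for building a snippet, say) can be read with subMap().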
Or maybe I am misunderstanding things...
-Grant
On Aug 8, 2007, at 6:16 PM, Kevin Chen wrote:
> I can see that TermPositions gives an enum with all positions of a
> term in a document. I want to do the opposite. Given a position, can
> I query the document for the term at that position?
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
term location in doc
Posted by Kevin Chen <ch...@yahoo.com>.
I can see that TermPositions gives an enum with all positions of a term in a document. I want to do the opposite. Given a position, can I query the document for the term at that position?
Re: Fastest way to perform 'like' searches
Posted by Daniel Naber <lu...@danielnaber.de>.
On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote:
> Does anybody know a more efficient way? A PhraseQuery might get me
> somewhere, isn't?
No, you need to use MultiPhraseQuery, and you will first need to expand
the terms containing "*" yourself (e.g. using term enumeration).
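A minimal sketch of that expansion step in plain Java, with no Lucene dependency: a sorted set stands in for the field's term dictionary, each pattern in the phrase is expanded into the list of terms it matches, and a token sequence matches if consecutive tokens hit every list in order. In Lucene 2.x the enumeration would go through IndexReader.terms(Term) and each expanded list would be passed as a Term[] to MultiPhraseQuery.add(Term[]); the class and method names below are hypothetical. A leading "*" offers no prefix to seek to, so it forces a scan of every term in the field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical model of expanding "*" patterns into per-position term lists. */
public class WildcardPhraseExpansion {

    /** Expands one pattern that may have a leading and/or trailing '*'. */
    static List<String> expand(SortedSet<String> dictionary, String pattern) {
        boolean leading = pattern.startsWith("*");
        boolean trailing = pattern.endsWith("*") && pattern.length() > 1;
        String core = pattern.substring(leading ? 1 : 0,
                pattern.length() - (trailing ? 1 : 0));
        List<String> matches = new ArrayList<>();
        if (!leading) {
            // Prefix or exact term: seek to the prefix and walk forward while
            // terms still share it, much like a TermEnum positioned at a Term.
            for (String term : dictionary.tailSet(core)) {
                if (!term.startsWith(core)) break;
                if (trailing || term.equals(core)) matches.add(term);
            }
        } else {
            // Leading wildcard: nothing to seek to, so inspect every term.
            for (String term : dictionary) {
                boolean ok = trailing ? term.contains(core) : term.endsWith(core);
                if (ok) matches.add(term);
            }
        }
        return matches;
    }

    /** One list of candidate terms per phrase position (MultiPhraseQuery's shape). */
    static List<List<String>> expandPhrase(SortedSet<String> dictionary, String phrase) {
        List<List<String>> slots = new ArrayList<>();
        for (String pattern : phrase.trim().split("\\s+")) {
            slots.add(expand(dictionary, pattern));
        }
        return slots;
    }

    /** Naive phrase check: some window of consecutive tokens hits every slot. */
    static boolean matches(List<String> tokens, List<List<String>> slots) {
        for (int start = 0; start + slots.size() <= tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < slots.size() && all; i++) {
                all = slots.get(i).contains(tokens.get(start + i));
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "bar", "foo", "quux", "qux", "the", "zoo"));
        List<List<String>> slots = expandPhrase(dictionary, "*oo bar the qu*");
        System.out.println(slots);
        System.out.println(matches(Arrays.asList("foo", "bar", "the", "quux"), slots));
        System.out.println(matches(Arrays.asList("foo", "bar", "the", "qux"), slots));
    }
}
```

Running main prints [[foo, zoo], [bar], [the], [quux, qux]] followed by true twice. The expanded lists can get very long for short patterns, which is the cost of doing this at query time.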
> as a phrase is analyzed according some analyzer it might strip the 'the'
> as a stopword, implying that *oo bar qu* would also match, right?
Stopwords need to be removed everywhere, i.e. also from phrases; that way
they generally work as expected.
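To make that concrete: if the same analysis runs over both the document and the query phrase, "the" disappears from both, so "*oo bar the qu*" and "*oo bar qu*" reduce to the same pattern sequence, which confirms that the latter would also match. The sketch below uses a toy lowercase/whitespace/stopword analyzer with a hypothetical three-word stopword list, not a Lucene analyzer, and it ignores the position gaps that real Lucene stop filters may or may not record depending on version and settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer applied identically at index time and query time. */
public class StopwordConsistency {
    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    /** Lowercases, splits on whitespace and drops stopwords. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("foo bar the quux"));  // [foo, bar, quux]
        System.out.println(analyze("*oo bar the qu*"));   // [*oo, bar, qu*]
        // Both query phrases collapse to the same pattern sequence.
        System.out.println(analyze("*oo bar the qu*").equals(analyze("*oo bar qu*"))); // true
    }
}
```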
Regards
Daniel
--
http://www.danielnaber.de
Re: configuration of lucene with jsp
Posted by Chris Hostetter <ho...@fucit.org>.
: Message-ID:
: <81...@EVSMAIL.Evalueserve.com>
: In-Reply-To: <A9...@hai01.hippo.local>
http://people.apache.org/~hossman/#threadhijack
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
See Also: http://en.wikipedia.org/wiki/Thread_hijacking
-Hoss
configuration of lucene with jsp
Posted by Neha Modi <ne...@evalueserve.com>.
Hi.
I am new to JSP and I have to integrate Lucene with a JSP web
application. I am having problems configuring Lucene in my web
application. Can someone please provide the correct configuration
and installation steps?
Regards,
Neha Modi