Posted to java-user@lucene.apache.org by "naveen.a" <na...@gmail.com> on 2008/11/26 10:48:14 UTC

how to search for starts with multiple words in lucene

Hi,

Below is a document in lucene
---------------------------------------------
Field   Value
---------------------------------------------
ID:1
110_a:library and information
---------------------------------------------
I need "starts with" search logic. Below are the search cases for the
above document:

------------------------------------------------------------------------------
Query                         Result
------------------------------------------------------------------------------
110_a:l*                      ID - 1
110_a:library*                ID - 1
110_a:library *               No Results
110_a:library a*              No Results
110_a:"library a*"            No Results
------------------------------------------------------------------------------
Here, a "starts with" search on a single word finds the document,
but as soon as I add a space after the first word, nothing is found.

So how should I write the query to do a "starts with" search across multiple words?
-- 
View this message in context: http://www.nabble.com/how-to-search-for-starts-with-multiple-words-in-lucene-tp20697741p20697741.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: how to search for starts with multiple words in lucene

Posted by "naveen.a" <na...@gmail.com>.

Hi,

Thanks for your replies,

Please see this thread for the actual problem:

http://www.nabble.com/SpanFirstQuery-is-not-taking-wildcard-characters-(like-*)-as-a-logical-operator-for-the-preffix-td20719556.html#a20719556


Erick Erickson wrote:
> 
> Your problem here is probably tokenization at query time.
> 
> Queries like 110_a:library a* would search field 110_a for
> library and your default field for a*. You might try
> +110_a:library +110_a:a*, but I doubt that's really
> what you want since there's no guarantee that the terms
> will be next to each other.
> 
> Note that phrase queries don't go through the wildcard
> parsers, so searching for "library a*" (in quotes) won't do
> what you want.
> 
> You might want to look at the SpanQuery family. It's unclear
> whether you'd expect a hit on something that started in the
> middle, but you can get around this by adding a synthetic
> token at the start of each field you index then adding that to
> each query. Something like:
> doc.add("field", "$ <original text>", blah blah)
> at index time and then add the "$" (or whatever) at query time
> if you require that the match never hit in the middle.
> 
> Another possibility would be to index using something like
> KeywordAnalyzer, but this assumes that you never want
> to search for anything in that field that starts in the middle.
> 
> Best
> Erick
> 


Re: how to search for starts with multiple words in lucene

Posted by Erick Erickson <er...@gmail.com>.

Your problem here is probably tokenization at query time.

Queries like 110_a:library a* would search field 110_a for
library and your default field for a*. You might try
+110_a:library +110_a:a*, but I doubt that's really
what you want since there's no guarantee that the terms
will be next to each other.
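
To make the parsing point concrete, here is a minimal plain-Java sketch of the rule. It is not Lucene's QueryParser: the class ClauseSplitSketch, its methods, and the default field name "contents" are all made up for illustration. It only shows that a field prefix sticks to the term it is attached to, while a term after a space falls back to the default field.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene's QueryParser.
class ClauseSplitSketch {

    /** One parsed clause: the field it targets and the term text. */
    static final class Clause {
        final String field;
        final String text;

        Clause(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /**
     * Splits an unquoted query on whitespace. A "field:" prefix applies only
     * to the term it is attached to; any other term gets the default field.
     * A leading '+' (required clause) is dropped for simplicity.
     */
    static List<Clause> split(String query, String defaultField) {
        List<Clause> clauses = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String part = raw.startsWith("+") ? raw.substring(1) : raw;
            int colon = part.indexOf(':');
            if (colon > 0) {
                clauses.add(new Clause(part.substring(0, colon), part.substring(colon + 1)));
            } else {
                clauses.add(new Clause(defaultField, part));
            }
        }
        return clauses;
    }
}
```

With this sketch, split("110_a:library a*", "contents") yields the clauses 110_a:library and contents:a*, whereas split("+110_a:library +110_a:a*", "contents") keeps both terms in 110_a.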

Note that phrase queries don't go through the wildcard
parsers, so searching for "library a*" (in quotes) won't do
what you want.

You might want to look at the SpanQuery family. It's unclear
whether you'd expect a hit on something that started in the
middle, but you can get around this by adding a synthetic
token at the start of each field you index then adding that to
each query. Something like:
doc.add("field", "$ <original text>", blah blah)
at index time and then add the "$" (or whatever) at query time
if you require that the match never hit in the middle.
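
The synthetic start token idea can be sketched in plain Java as follows. This is only a model of the matching rule, not the Lucene SpanQuery API: the class StartMarkerSketch, its methods, and the whitespace tokenizer are assumptions made for illustration. The sketch prepends the same marker to the field text and to the query words, so a position-based match that could otherwise begin anywhere is forced to begin at the first token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the matching rule; NOT the Lucene SpanQuery API.
class StartMarkerSketch {

    /** Hypothetical marker token added in front of every field value. */
    static final String START = "$";

    /** Lowercases and splits on whitespace, a stand-in for a real analyzer. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** All words but the last must equal their token; the last is a prefix test. */
    static boolean matchesAt(List<String> tokens, List<String> words, int start) {
        int last = words.size() - 1;
        for (int i = 0; i < last; i++) {
            if (!tokens.get(start + i).equals(words.get(i))) {
                return false;
            }
        }
        return tokens.get(start + last).startsWith(words.get(last));
    }

    /** Unanchored match: the word sequence may begin at any token position. */
    static boolean matchesAnywhere(String fieldValue, String queryWords) {
        List<String> tokens = tokenize(fieldValue);
        List<String> words = tokenize(queryWords);
        if (words.isEmpty()) {
            return false;
        }
        for (int start = 0; start + words.size() <= tokens.size(); start++) {
            if (matchesAt(tokens, words, start)) {
                return true;
            }
        }
        return false;
    }

    /** Anchored match: the marker on both sides pins the match to position 0. */
    static boolean startsWith(String fieldValue, String queryWords) {
        if (queryWords.trim().isEmpty()) {
            return false;
        }
        return matchesAnywhere(START + " " + fieldValue, START + " " + queryWords);
    }
}
```

Here startsWith("library and information", "library a") is true, while startsWith("library and information", "and inf") is false even though matchesAnywhere accepts it. In a real index the marker would have to survive analysis; an analyzer that strips punctuation may drop "$", so a different marker might be needed.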

Another possibility would be to index using something like
KeywordAnalyzer, but this assumes that you never want
to search for anything in that field that starts in the middle.
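
The trade-off can be illustrated with a small plain-Java comparison. Again this is a sketch and not Lucene code: SingleTokenSketch and both methods are hypothetical, and text is compared verbatim with no case normalization. One method treats the whole field value as a single term, the other tests the prefix against each word separately.

```java
// Plain-Java comparison of the two indexing styles; NOT Lucene code.
class SingleTokenSketch {

    /** Whole value kept as one term: the prefix may contain spaces. */
    static boolean wholeValuePrefix(String fieldValue, String prefix) {
        return fieldValue.startsWith(prefix);
    }

    /** Value split into words: the prefix is tested against each word alone. */
    static boolean perWordPrefix(String fieldValue, String prefix) {
        for (String word : fieldValue.trim().split("\\s+")) {
            if (!word.isEmpty() && word.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

For the value "library and information", wholeValuePrefix accepts "library a" but rejects "and", while perWordPrefix accepts "and" but rejects "library a".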

Best
Erick


Re: how to search for starts with multiple words in lucene

Posted by AlexElba <ra...@yahoo.com>.

Hi,
I think you can achieve your goal by using StandardAnalyzer for both indexing
and searching, and a WildcardQuery for the query. I think it will work!


