Posted to java-user@lucene.apache.org by vanshi <ni...@gmail.com> on 2009/06/01 17:39:23 UTC

Re: No hits while searching!

Thanks Erick, I was able to get this to work. As you said, Luke is a great
tool for looking into what gets stored in the index, though in my case I was
searching before the indexes were created, so I was getting zero hits.

On a side note, I'm getting strange output with the prefix query: it only works
when I have 3 or more letters in the first name/last name. Any idea
what is going on here? Please see the log output below.

02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in
PhysicianQuerybuilder with exactName=true
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query,
First name: ang
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query,
Last name: john
02:05:21,012 INFO  [LuceneIndexService] the query is:
+(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
02:05:21,012 INFO  [LuceneIndexService] Result Size: 1

02:06:03,578 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in
PhysicianQuerybuilder with exactName=true
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, First
name: a
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, Last
name: johns
02:06:03,578 INFO  [LuceneIndexService] the query is: +()
+(LAST_NAME_EXACT:johns*)
02:06:03,578 INFO  [LuceneIndexService] Result Size: 0

02:08:01,548 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in
PhysicianQuerybuilder with exactName=true
02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, First
name: an
02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, Last
name: johns
02:08:01,548 INFO  [LuceneIndexService] the query is: +()
+(LAST_NAME_EXACT:johns*)
02:08:01,580 INFO  [LuceneIndexService] Result Size: 0

As one can see, the query works with first name=ang but not with first name=a
or an.
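
The empty +() clause in the second and third runs is consistent with stop-word removal: "a" and "an" are on the usual English stop list, so if the prefix text passes through a StopAnalyzer before the query is built (which the later replies in this thread confirm was the cause), nothing is left to query on. Below is a plain-Java sketch of that effect. It does not use Lucene; the class name, the helper names, and the abbreviated stop list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a stop-word analyzer; it does not use Lucene.
class StopWordPrefixDemo {

    // Abbreviated, assumed subset of an English stop list.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "the", "of", "in", "is", "to"));

    // Lowercases the input, splits on non-letters, and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Builds a clause in the style of the log output, e.g. +(FIELD:ang*).
    static String prefixClause(String field, String text) {
        StringBuilder clause = new StringBuilder("+(");
        List<String> tokens = analyze(text);
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                clause.append(' ');
            }
            clause.append(field).append(':').append(tokens.get(i)).append('*');
        }
        return clause.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixClause("FIRST_NAME_EXACT", "ang")); // +(FIRST_NAME_EXACT:ang*)
        System.out.println(prefixClause("FIRST_NAME_EXACT", "an"));  // +()
        System.out.println(prefixClause("FIRST_NAME_EXACT", "a"));   // +()
    }
}
```

With "ang" the token survives and a prefix clause is produced; with "a" or "an" the token list is empty, which matches the +() seen in the log.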

Appreciate all your inputs.

Vanshi

Erick Erickson wrote:
> 
> The most common issue with this kind of thing is that UN_TOKENIZED implies
> no case folding. So if your case differs you won't get a match.
> 
> That aside, the very first thing I'd do is get a copy of Luke (google
> Lucene Luke) and examine the index to see if what's in your index is what
> you *think* is in there.
> 
> 
> The second thing I'd do is look at query.toString() to see what the actual
> query is. You can even paste the output of toString() into Luke and see
> what happens.
> 
> I'm not sure what buildMultiTermPrefixQuery is all about, but I assume
> you have a good reason for using that. But the other strategy I use for
> this kind of "what happened?" question is to peel back to simpler cases
> until I get what I expect, then build back up until it breaks.....
> 
> But really, get a copy of Luke; it's a wonderful tool that'll give you lots
> of insight about what's *really* going on...
> 
> Best
> Erick
> 
> On Wed, May 27, 2009 at 12:43 AM, vanshi <ni...@gmail.com> wrote:
> 
>>
>> In my web application, I need search functionality on first name and last
>> name in 2 different ways: one search must be based on a 'Metaphone Analyzer',
>> giving all similar-sounding names as results, and another search should be an
>> exact match on either first name or last name. The sounds-like name search
>> has already been coded, and I need to add the exact-match search to the
>> application. For this, I have a Lucene index based on fields from database
>> tables, which already has the name fields indexed with the metaphone
>> analyzer. I added 2 more fields to the existing document, which index
>> first name/last name as UN_TOKENIZED. While searching for an exact match, I
>> create a term query to look into the newly created UN_TOKENIZED fields, as
>> shown in the code snippets. However, this is not getting any hits. I
>> would like to know: is there anything wrong conceptually?
>>
>> //creating fields for the document
>> FIRST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>> FIRST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>> LAST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>> LAST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>
>> //name sounds like analyzer class....used while Indexing and searching
>> public class NameSoundsLikeAnalyzer extends Analyzer {
>>        PerFieldAnalyzerWrapper wrapper;
>>
>>        public NameSoundsLikeAnalyzer() {
>>                // StopAnalyzer is the default for every field not registered below
>>                wrapper = new PerFieldAnalyzerWrapper(new StopAnalyzer());
>>                wrapper.addAnalyzer(
>>                        PhysicianDocumentBuilder.PhysicianFieldInfo.FIRST_NAME.toString(),
>>                        new MetaphoneReplacementAnalyzer());
>>                wrapper.addAnalyzer(
>>                        PhysicianDocumentBuilder.PhysicianFieldInfo.LAST_NAME.toString(),
>>                        new MetaphoneReplacementAnalyzer());
>>        }
>>
>>        /**
>>         * @see PerFieldAnalyzerWrapper#tokenStream(String, Reader)
>>         */
>>        @Override
>>        public TokenStream tokenStream(String fieldName, Reader reader) {
>>                return wrapper.tokenStream(fieldName, reader);
>>        }
>>
>> }
>>
>> //lastly the query builder
>> if (physicianQuery.getExactNameSearch()) {
>>        //exact search: look up the UN_TOKENIZED fields directly
>>        if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>                TermQuery term = new TermQuery(new Term(FIRST_NAME_EXACT.toString(),
>>                                physicianQuery.getFirstNameStartsWith()));
>>                query.add(term, MUST);
>>        }
>>
>>        if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>                TermQuery term = new TermQuery(new Term(LAST_NAME_EXACT.toString(),
>>                                physicianQuery.getLastNameStartsWith()));
>>                query.add(term, MUST);
>>        }
>> } else {
>>        //we want metaphone search
>>        if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>                query.add(buildMultiTermPrefixQuery(FIRST_NAME.toString(),
>>                                physicianQuery.getFirstNameStartsWith()), MUST);
>>        }
>>
>>        if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>                query.add(buildMultiTermPrefixQuery(LAST_NAME.toString(),
>>                                physicianQuery.getLastNameStartsWith()), MUST);
>>        }
>> }
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/No-hits-while-searching%21-tp23735920p23735920.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/No-hits-while-searching%21-tp23735920p23817012.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: No hits while searching!

Posted by Matthew Hall <mh...@informatics.jax.org>.

Just build your own.

Here's exactly what you are looking for:

(Mind you I just whipped this out, and didn't compile it... so there 
could be minor syntax errors here.)

You will also obviously have to make your own package declaration, and 
your own imports.

So anyhow, the really neat thing about Lucene is being able to do
exactly what we just did here: you can chain these tokenizers and
filters together in almost any way you want, and create custom analyzers
out of them.

It's a good thing to become familiar with, because I can nearly promise
you that this analyzer here will ALSO probably be insufficient for your
needs.

Anyhow, hope this helps.

Matt

/**
 * Custom Lowercase Analyzer
 *
 * This analyzer tokenizes on whitespace, and then lowercases each token.
 *
 * @author mhall
 */
public class LowerCaseAnalyzer extends Analyzer {

    public LowerCaseAnalyzer() {
        super();
    }

    /**
     * Worker for this Analyzer.
     *
     * Specifically, this analyzer chains WhitespaceTokenizer ->
     * LowerCaseFilter to form customized Tokens.
     */
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

}
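
To make the chaining idea concrete without needing Lucene on the classpath, here is a plain-Java sketch in which each stage is an ordinary function and the "analyzer" is just their composition. The class and constant names are hypothetical, and this only imitates the whitespace-then-lowercase behaviour described above; it is not how Lucene implements its token streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Plain-Java imitation of a tokenizer -> filter chain; no Lucene involved.
class AnalyzerChainDemo {

    // Stage 1: split on runs of whitespace, leaving punctuation untouched.
    static final Function<String, List<String>> WHITESPACE_TOKENIZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    };

    // Stage 2: lowercase every token produced by the previous stage.
    static final Function<List<String>, List<String>> LOWER_CASE_FILTER = tokens -> {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    };

    // The "analyzer" is simply the two stages chained in order.
    static final Function<String, List<String>> ANALYZER =
            WHITESPACE_TOKENIZER.andThen(LOWER_CASE_FILTER);

    public static void main(String[] args) {
        System.out.println(ANALYZER.apply("Mary-Ann  O'Brien")); // [mary-ann, o'brien]
    }
}
```

Because no stop filter is in the chain, short inputs such as "a" or "an" pass through unchanged, and hyphens and apostrophes stay inside the token.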

vanshi wrote:
> Thanks Matt & Sithu. Yes, it was due to the stop word analyzer. Now I'm using a
> simple analyzer temporarily, as I know even the simple analyzer cannot handle
> quotes in names. However, can somebody please point me towards how to handle
> quotes in a name in a query using a lowercase analyzer?
>
> thanks,
> Vanshi


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012




Re: No hits while searching!

Posted by vanshi <ni...@gmail.com>.

Thanks Matt & Sithu. Yes, it was due to the stop word analyzer. Now I'm using a
simple analyzer temporarily, as I know even the simple analyzer cannot handle
quotes in names. However, can somebody please point me towards how to handle
quotes in a name in a query using a lowercase analyzer?

thanks,
Vanshi
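
On the quotes question, the difference comes down to where the tokenizer breaks: SimpleAnalyzer divides text at every non-letter, so a name is cut at the apostrophe, whereas a whitespace-based chain keeps it whole. The sketch below imitates both behaviours in plain Java; it does not call Lucene, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of two tokenization styles; no Lucene involved.
class LetterVsWhitespaceDemo {

    // Letter-based: any non-letter ends the current token.
    static List<String> letterTokens(String text) {
        return split(text, true);
    }

    // Whitespace-based: only whitespace ends the current token.
    static List<String> whitespaceTokens(String text) {
        return split(text, false);
    }

    // Shared scanner; lowercases each kept character as it goes.
    private static List<String> split(String text, boolean lettersOnly) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean keep = lettersOnly ? Character.isLetter(c) : !Character.isWhitespace(c);
            if (keep) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("O'Brien"));     // [o, brien]
        System.out.println(whitespaceTokens("O'Brien")); // [o'brien]
    }
}
```

If the exact-name field holds the single term o'brien, query text that has been split into o and brien cannot match it, so it is probably safest to run the same whitespace-plus-lowercase treatment on both the indexed value and the query text.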

Matthew Hall-7 wrote:
> 
> Yeah, he's gotta be.
> 
> You might be better off using something like a lowercase analyzer here,
> since punctuation in a name is likely important.
> 
> Matt
> 
> Sudarsan, Sithu D. wrote:
>>  
>>
>> Do you use stopword filtering?
>>
>> Sincerely,
>> Sithu D Sudarsan

-- 
View this message in context: http://www.nabble.com/No-hits-while-searching%21-tp23735920p23818803.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: No hits while searching!

Posted by Matthew Hall <mh...@informatics.jax.org>.

Yeah, he's gotta be.

You might be better off using something like a lowercase analyzer here,
since punctuation in a name is likely important.

Matt

Sudarsan, Sithu D. wrote:
>  
>
> Do you use stopword filtering?
>
> Sincerely,
> Sithu D Sudarsan
>
> -----Original Message-----
> From: vanshi [mailto:nilu.thakur@gmail.com] 
> Sent: Monday, June 01, 2009 11:39 AM
> To: java-user@lucene.apache.org
> Subject: Re: No hits while searching!
>
>
> Thanks Erick, I was able to get this work...as you said ..Luke is a
> great
> tool to look in to what gets stored as indexes though in my case I was
> searching before the indexes were created so i was getting zero hits.
>
> On side note, I'm running a strange output with prefix query...it only
> works
> when i have 3 or more than 3 letters in the first name/last name. Any
> idea
> what is going on here? Please see the output from log here.
>
> 02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms
> in
> PhysicianQuerybuilder with exactName=true
> 02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query,
> First name: ang
> 02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query,
> Last name: john
> 02:05:21,012 INFO  [LuceneIndexService] the query is:
> +(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
> 02:05:21,012 INFO  [LuceneIndexService] Result Size: 1
>
> 02:06:03,578 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms
> in
> PhysicianQuerybuilder with exactName=true
> 02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query,
> First
> name: a
> 02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query,
> Last
> name: johns
> 02:06:03,578 INFO  [LuceneIndexService] the query is: +()
> +(LAST_NAME_EXACT:johns*)
> 02:06:03,578 INFO  [LuceneIndexService] Result Size: 0
>
> 02:08:01,548 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms
> in
> PhysicianQuerybuilder with exactName=true
> 02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query,
> First
> name: an
> 02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query,
> Last
> name: johns
> 02:08:01,548 INFO  [LuceneIndexService] the query is: +()
> +(LAST_NAME_EXACT:johns*)
> 02:08:01,580 INFO  [LuceneIndexService] Result Size: 0
>
> As one can see the query works with first name=ang but not with first
> name=a
> or an.
>
> Appreciate all your inputs.
>
> Vanshi
>
> Erick Erickson wrote:
>   
>> The most common issue with this kind of thing is that
>>     
> UN_TOKENIZEDimplies
>   
>> no
>> case folding. So if your case differs you won't get a match.
>>
>> That aside, the very first thing I'd do is get a copy of Luke (google
>> Lucene
>> Luke)
>> and examine the index to see if what's in your index is what you
>>     
> *think*
>   
>> is
>> in there.
>>
>>
>> The second thing I'd do is look at query.toString() to see what the
>>     
> actual
>   
>>> query is. You can even paste the output of toString() into Luke and
>>>       
> see
>   
>>> what happens.
>>>       
>> I'm not sure what buildMultiTermPrefixQuery is all about, but I assume
>> you have a good reason for using that. But the other strategy I use
>>     
> for
>   
>> this kind of "what happened?" question is to peel back to simpler
>>     
> cases
>   
>> until I get what I expect, then build back up until it breaks.....
>>
>> But really get a copy of Luke, it's a wonderful tool that'll give you
>>     
> lots
>   
>> of
>> insight about what's *really* going on...
>>
>> Best
>> Erick
>>
>> On Wed, May 27, 2009 at 12:43 AM, vanshi <ni...@gmail.com>
>>     
> wrote:
>   
>>> In my web application, I need search functionality on first name and
>>> last name in 2 different ways: one search must be based on 'Metaphone
>>> Analyzer', giving all similar-sounding names as results, and another
>>> search should be an exact match on either first name or last name. The
>>> sounds-like name search has already been coded, and I need to add the
>>> exact match search to the application. For this, I have a Lucene index
>>> built from fields in database tables, which already had the name
>>> fields indexed with the metaphone analyzer. I added 2 more fields to
>>> the existing document, which index first name/last name as
>>> UN_TOKENIZED. While searching for an exact match, I create a term
>>> query to look into the newly created UN_TOKENIZED fields as shown in
>>> the code snippets......however this is not getting any hits. I would
>>> like to know, is there anything wrong conceptually?
>>>
>>> //creating fields for the document
>>> FIRST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>>> FIRST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>> LAST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>>> LAST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>>
>>> //name sounds like analyzer class....used while Indexing and searching
>>> public class NameSoundsLikeAnalyzer extends Analyzer {
>>>     PerFieldAnalyzerWrapper wrapper;
>>>
>>>     /**
>>>      *
>>>      */
>>>     public NameSoundsLikeAnalyzer() {
>>>         wrapper = new PerFieldAnalyzerWrapper(new StopAnalyzer());
>>>         wrapper.addAnalyzer(
>>>                 PhysicianDocumentBuilder.PhysicianFieldInfo.FIRST_NAME.toString(),
>>>                 new MetaphoneReplacementAnalyzer());
>>>         wrapper.addAnalyzer(
>>>                 PhysicianDocumentBuilder.PhysicianFieldInfo.LAST_NAME.toString(),
>>>                 new MetaphoneReplacementAnalyzer());
>>>     }
>>>
>>>     /**
>>>      * @see PerFieldAnalyzerWrapper#tokenStream(String, Reader)
>>>      */
>>>     @Override
>>>     public TokenStream tokenStream(String fieldName, Reader reader) {
>>>         return wrapper.tokenStream(fieldName, reader);
>>>     }
>>> }
>>>
>>> //lastly the query builder
>>> if (physicianQuery.getExactNameSearch()) {
>>>     if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>>         TermQuery term = new TermQuery(new Term(FIRST_NAME_EXACT.toString(),
>>>                 physicianQuery.getFirstNameStartsWith()));
>>>         query.add(term, MUST);
>>>     }
>>>     if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>>         TermQuery term = new TermQuery(new Term(LAST_NAME_EXACT.toString(),
>>>                 physicianQuery.getLastNameStartsWith()));
>>>         query.add(term, MUST);
>>>     }
>>> } else {
>>>     //we want metaphone search
>>>     if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>>         query.add(buildMultiTermPrefixQuery(FIRST_NAME.toString(),
>>>                 physicianQuery.getFirstNameStartsWith()), MUST);
>>>     }
>>>     if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>>         query.add(buildMultiTermPrefixQuery(LAST_NAME.toString(),
>>>                 physicianQuery.getLastNameStartsWith()), MUST);
>>>     }
>>> }
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/No-hits-while-searching%21-tp23735920p23735920.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>>     
>
>   


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: No hits while searching!

Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.

 

Do you use stopword filtering?

Sincerely,
Sithu D Sudarsan
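
A minimal sketch of why the stop-word question matters here, in plain Java
with no Lucene dependency. It assumes (the thread does not show this) that
the prefix-query helper runs the user's input through an analyzer that drops
stop words before building clauses. The class name, the abbreviated stop
list, and both helper methods are hypothetical stand-ins, not Lucene APIs;
the only Lucene fact relied on is that "a" and "an" are in its default
English stop set, which StopAnalyzer uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordPrefixDemo {
    // Hypothetical, abbreviated stop list for illustration only.
    // "a" and "an" are also present in Lucene's default English stop set.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "to"));

    // Stand-in for an analyzer: lowercase, split on non-letters, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds a query string in the same shape as the log output:
    // one prefix clause per token that survives analysis.
    static String prefixClause(String field, String input) {
        StringBuilder sb = new StringBuilder("+(");
        List<String> tokens = analyze(input);
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(tokens.get(i)).append('*');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ang", "an", "a"}) {
            System.out.println(name + " -> " + prefixClause("FIRST_NAME_EXACT", name));
        }
    }
}
```

Under that assumption, "a" and "an" are discarded before any clause is
built, which matches the empty +() in the log, while "ang" survives. If this
is the cause, building the prefix term from the raw (lowercased) input for
the UN_TOKENIZED fields, instead of analyzing it, would avoid the problem.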

-----Original Message-----
From: vanshi [mailto:nilu.thakur@gmail.com] 
Sent: Monday, June 01, 2009 11:39 AM
To: java-user@lucene.apache.org
Subject: Re: No hits while searching!


Thanks Erick, I was able to get this to work. As you said, Luke is a
great tool for looking into what gets stored in the index, though in my
case I was searching before the index was created, so I was getting zero
hits.

On a side note, I'm seeing strange output with a prefix query: it only
works when I have 3 or more letters in the first name/last name. Any
idea what is going on here? Please see the log output below.

02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query, First name: ang
02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query, Last name: john
02:05:21,012 INFO  [LuceneIndexService] the query is: +(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
02:05:21,012 INFO  [LuceneIndexService] Result Size: 1

02:06:03,578 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, First name: a
02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, Last name: johns
02:06:03,578 INFO  [LuceneIndexService] the query is: +() +(LAST_NAME_EXACT:johns*)
02:06:03,578 INFO  [LuceneIndexService] Result Size: 0

02:08:01,548 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, First name: an
02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, Last name: johns
02:08:01,548 INFO  [LuceneIndexService] the query is: +() +(LAST_NAME_EXACT:johns*)
02:08:01,580 INFO  [LuceneIndexService] Result Size: 0

As one can see, the query works with first name=ang but not with first
name=a or an.

Appreciate all your inputs.

Vanshi

Erick Erickson wrote:
> 
> The most common issue with this kind of thing is that UN_TOKENIZED
> implies no case folding. So if your case differs you won't get a match.
>
> That aside, the very first thing I'd do is get a copy of Luke (google
> Lucene Luke) and examine the index to see if what's in your index is
> what you *think* is in there.
>
> The second thing I'd do is look at query.toString() to see what the
> actual query is. You can even paste the output of toString() into Luke
> and see what happens.
>
> I'm not sure what buildMultiTermPrefixQuery is all about, but I assume
> you have a good reason for using that. But the other strategy I use for
> this kind of "what happened?" question is to peel back to simpler cases
> until I get what I expect, then build back up until it breaks.....
>
> But really get a copy of Luke, it's a wonderful tool that'll give you
> lots of insight about what's *really* going on...
> 
> Best
> Erick
> 
> On Wed, May 27, 2009 at 12:43 AM, vanshi <ni...@gmail.com> wrote:
> 
>>
>> In my web application, I need search functionality on first name and
>> last name in 2 different ways: one search must be based on 'Metaphone
>> Analyzer', giving all similar-sounding names as results, and another
>> search should be an exact match on either first name or last name. The
>> sounds-like name search has already been coded, and I need to add the
>> exact match search to the application. For this, I have a Lucene index
>> built from fields in database tables, which already had the name
>> fields indexed with the metaphone analyzer. I added 2 more fields to
>> the existing document, which index first name/last name as
>> UN_TOKENIZED. While searching for an exact match, I create a term
>> query to look into the newly created UN_TOKENIZED fields as shown in
>> the code snippets......however this is not getting any hits. I would
>> like to know, is there anything wrong conceptually?
>>
>> //creating fields for the document
>> FIRST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>> FIRST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>> LAST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>> LAST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>
>> //name sounds like analyzer class....used while Indexing and searching
>> public class NameSoundsLikeAnalyzer extends Analyzer {
>>     PerFieldAnalyzerWrapper wrapper;
>>
>>     /**
>>      *
>>      */
>>     public NameSoundsLikeAnalyzer() {
>>         wrapper = new PerFieldAnalyzerWrapper(new StopAnalyzer());
>>         wrapper.addAnalyzer(
>>                 PhysicianDocumentBuilder.PhysicianFieldInfo.FIRST_NAME.toString(),
>>                 new MetaphoneReplacementAnalyzer());
>>         wrapper.addAnalyzer(
>>                 PhysicianDocumentBuilder.PhysicianFieldInfo.LAST_NAME.toString(),
>>                 new MetaphoneReplacementAnalyzer());
>>     }
>>
>>     /**
>>      * @see PerFieldAnalyzerWrapper#tokenStream(String, Reader)
>>      */
>>     @Override
>>     public TokenStream tokenStream(String fieldName, Reader reader) {
>>         return wrapper.tokenStream(fieldName, reader);
>>     }
>> }
>>
>> //lastly the query builder
>> if (physicianQuery.getExactNameSearch()) {
>>     if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>         TermQuery term = new TermQuery(new Term(FIRST_NAME_EXACT.toString(),
>>                 physicianQuery.getFirstNameStartsWith()));
>>         query.add(term, MUST);
>>     }
>>     if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>         TermQuery term = new TermQuery(new Term(LAST_NAME_EXACT.toString(),
>>                 physicianQuery.getLastNameStartsWith()));
>>         query.add(term, MUST);
>>     }
>> } else {
>>     //we want metaphone search
>>     if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>         query.add(buildMultiTermPrefixQuery(FIRST_NAME.toString(),
>>                 physicianQuery.getFirstNameStartsWith()), MUST);
>>     }
>>     if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>         query.add(buildMultiTermPrefixQuery(LAST_NAME.toString(),
>>                 physicianQuery.getLastNameStartsWith()), MUST);
>>     }
>> }
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/No-hits-while-searching%21-tp23735920p23735920.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context:
http://www.nabble.com/No-hits-while-searching%21-tp23735920p23817012.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

