Posted to java-user@lucene.apache.org by Lasse L <la...@gmail.com> on 2005/08/12 21:11:26 UTC

Query that doesn't match a term more than once

I am using the queryparser to search for names.

If I search for: john j*
I'd expect to get everybody called john j-something: john johnson,
john joe doe, etc.
Instead I just get all johns and joes. In many of the hits there is no
second j-word.

Is there a way to get Lucene to be "satisfied" after matching "john"
and then require a second j-word to return a match for j*?
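To pin down the semantics being asked for, here is a minimal plain-Java sketch. It does not use Lucene, and the class name is hypothetical. Each query clause must be satisfied by a different token of the name, so once "john" has been consumed, "j*" needs a second j-word. Lowercasing and whitespace splitting stand in for whatever the real analyzer does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (plain Java, not Lucene): every query clause must be
// satisfied by a different token, so "john" and "j*" cannot share one word.
class DistinctTokenMatcher {

    // A clause ending in '*' is treated as a prefix; otherwise it must match exactly.
    static boolean clauseMatches(String clause, String token) {
        if (clause.endsWith("*")) {
            return token.startsWith(clause.substring(0, clause.length() - 1));
        }
        return token.equals(clause);
    }

    // True if each clause can be assigned its own token (small backtracking search).
    static boolean matches(String query, String name) {
        List<String> clauses = Arrays.asList(query.toLowerCase().trim().split("\\s+"));
        List<String> tokens = Arrays.asList(name.toLowerCase().trim().split("\\s+"));
        return assign(clauses, 0, tokens, new boolean[tokens.size()]);
    }

    private static boolean assign(List<String> clauses, int i,
                                  List<String> tokens, boolean[] used) {
        if (i == clauses.size()) {
            return true; // every clause has consumed a distinct token
        }
        for (int t = 0; t < tokens.size(); t++) {
            if (!used[t] && clauseMatches(clauses.get(i), tokens.get(t))) {
                used[t] = true;
                if (assign(clauses, i + 1, tokens, used)) {
                    return true;
                }
                used[t] = false; // undo and try the next candidate token
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("john j*", "john johnson"));     // true
        System.out.println(matches("john j*", "john thompson"));    // false
        System.out.println(matches("john j*", "john doe johnson")); // true
    }
}
```

Backtracking, rather than letting each clause grab the first token that fits, matters for queries such as "jo* joh*" against "john joe". Note also that if "john carlson" and "john doe carlson" are both added to the same field, "john" occurs twice, so even this stricter rule accepts "john j*" for that person, while "john joh* jo*" is still rejected.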

The error might be on my end, though.
If the same person is registered as both "john carlson" and "john doe carlson",
I add both values to the name field. Could that be it? If that is the cause,
I wouldn't expect Lucene to return the hit for a search like "john
joh* jo*", would I?

At least I'd like to boost the ranking of the results that match
better. Right now "john j*" gives me john thompson as the top hit even
though there are heaps of john johnsons.

I hope you can point me in the right direction.
Thanks

/Lasse

My query parser code looks like this:
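    // Lucene 1.4-era API. field, _analyzer, value, query (a BooleanQuery)
    // and required are defined elsewhere in the surrounding code.
    QueryParser parser = new QueryParser(field, _analyzer);
    parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND); // every clause required
    parser.setLocale(new Locale("da", "DK"));             // Danish locale
    Query q = parser.parse(value);                         // throws ParseException
    query.add(q, required, false);                         // args: query, required, prohibited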

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Query that doesn't match a term more than once

Posted by Lasse L <la...@gmail.com>.
However leaving the name as a single term would make me miss a hit
like "john doe johnson" -- which is unacceptable.

Is there a way I could boost the query results that match better?
Will the score rise if a query matches more than once, e.g. "j*"
matching both words in "john johnson"?

The extra matches wouldn't matter much as long as the best ones
come out on top.
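One way to get the better matches on top is sketched below as a toy ranking in plain Java, not Lucene's actual scoring; the class name and the bonus weight are arbitrary assumptions. Every loose "john AND j*" hit is kept, but names where "john" is directly followed by a j-word earn a bonus. In Lucene this idea roughly corresponds to adding an optional, boosted positional clause next to the required ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy ranking for the query "john j*" (not Lucene's scoring): keep every loose
// AND hit, and add an arbitrary bonus when "john" is directly followed by a j-word.
class ToyBoostRanking {

    static final double ADJACENT_BONUS = 2.0; // arbitrary weight, for illustration only

    // Loose match: some token is "john" and some token starts with "j" (can be the same one).
    static boolean looseMatch(String[] tokens) {
        boolean hasJohn = false;
        boolean hasJ = false;
        for (String t : tokens) {
            hasJohn |= t.equals("john");
            hasJ |= t.startsWith("j");
        }
        return hasJohn && hasJ;
    }

    // Strict match: "john" immediately followed by a token starting with "j".
    static boolean adjacentMatch(String[] tokens) {
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals("john") && tokens[i + 1].startsWith("j")) {
                return true;
            }
        }
        return false;
    }

    // 0 for non-hits, 1 for loose hits, 1 + bonus for adjacent hits.
    static double score(String name) {
        String[] tokens = name.toLowerCase().trim().split("\\s+");
        if (!looseMatch(tokens)) {
            return 0.0;
        }
        return 1.0 + (adjacentMatch(tokens) ? ADJACENT_BONUS : 0.0);
    }

    // Returns the hits ordered best first; equal scores keep their input order.
    static List<String> rank(List<String> names) {
        List<String> hits = new ArrayList<>();
        for (String n : names) {
            if (score(n) > 0.0) {
                hits.add(n);
            }
        }
        hits.sort((a, b) -> Double.compare(score(b), score(a)));
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("john thompson", "john johnson", "jane doe", "john jacobsen");
        System.out.println(rank(names)); // [john johnson, john jacobsen, john thompson]
    }
}
```

Ties keep their input order because List.sort is stable, so "john johnson" stays ahead of "john jacobsen" here.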

/Lasse


Re: Query that doesn't match a term more than once

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 12, 2005, at 9:22 PM, Chris Hostetter wrote:
> : Programmatically you can use PhrasePrefixQuery - see Lucene's test
> : case code for examples of how it's used, but it is a bit of work to
> : set up.
>
> That's only needed if you assume that the field contains tokenized text.
> If the whole name is indexed as a single Term, then a regular prefix query
> should work -- in fact, telling QueryParser to use an Analyzer that
> doesn't do any tokenizing should work too.

That's a good point - thanks for mentioning that. However, you'd still
have to deal with escaping the space. I haven't tried it, but would
a query of john\ j* using the KeywordAnalyzer do the trick? This
requires that "john jacob... schmidt" be left as a single token
during indexing.
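If the whole name is a single token, the user's input has to be turned into that escaped form before parsing. A minimal sketch of that step, with a hypothetical helper name: it only escapes spaces (other QueryParser special characters are left alone), and, as noted above, whether QueryParser with the KeywordAnalyzer then builds the intended prefix query has not been verified here.

```java
// Hypothetical helper: backslash-escape spaces so that user input such as
// "john j*" becomes the single query term john\ j* (trailing '*' left as is).
// Other QueryParser special characters are deliberately not handled here.
class EscapeSpaces {

    static String escapeSpaces(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (char c : userInput.trim().toCharArray()) {
            if (c == ' ') {
                sb.append('\\'); // escape the space
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSpaces("john j*")); // john\ j*
    }
}
```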

     Erik


Re: Query that doesn't match a term more than once

Posted by Chris Hostetter <ho...@fucit.org>.
: Programmatically you can use PhrasePrefixQuery - see Lucene's test
: case code for examples of how it's used, but it is a bit of work to
: set up.

That's only needed if you assume that the field contains tokenized text.
If the whole name is indexed as a single Term, then a regular prefix query
should work -- in fact, telling QueryParser to use an Analyzer that
doesn't do any tokenizing should work too.
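A plain-Java sketch of the single-term approach, assuming the whole name is indexed as one lowercased term (the class name is hypothetical). The query then reduces to one prefix test on the full string, which also shows why "john doe johnson", raised in Lasse's follow-up, is missed.

```java
// Hypothetical model: the whole name is indexed as a single lowercased term,
// so a query such as john\ j* boils down to one prefix test on the full string.
class WholeNamePrefixModel {

    static boolean matches(String prefix, String wholeName) {
        return wholeName.toLowerCase().startsWith(prefix.toLowerCase());
    }

    public static void main(String[] args) {
        String prefix = "john j"; // john\ j* with the escape and trailing '*' removed
        System.out.println(matches(prefix, "John Johnson"));     // true
        System.out.println(matches(prefix, "John Thompson"));    // false
        System.out.println(matches(prefix, "John Doe Johnson")); // false
    }
}
```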

-Hoss


Re: Query that doesn't match a term more than once

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 12, 2005, at 3:11 PM, Lasse L wrote:
> I am using the queryparser to search for names.
>
> If I search for: john j*
> I'd expect to get everybody called john j-something: john johnson,
> john joe doe, etc.
> Instead I just get all johns and joes. In many of the hits there is no
> second j-word.

That is an invalid assumption, given how QueryParser forms queries.
And it actually is a tricky sort of query to do even programmatically.

The query john j* (with default operator AND) searches for john AND  
j* regardless of position.
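This behaviour can be reproduced with a small plain-Java model (the class is hypothetical, not Lucene code). The two clauses are tested independently against the terms of the field, and since "john" itself starts with "j", a name like "john thompson" satisfies both +john and +j*.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of "+john +j*": each clause is checked on its own against
// the set of terms in the field, ignoring position; one term may satisfy several clauses.
class AndOfClausesModel {

    static boolean anyTermMatches(String clause, Set<String> terms) {
        for (String term : terms) {
            boolean ok = clause.endsWith("*")
                    ? term.startsWith(clause.substring(0, clause.length() - 1))
                    : term.equals(clause);
            if (ok) {
                return true;
            }
        }
        return false;
    }

    static boolean matches(String query, String fieldText) {
        Set<String> terms = new HashSet<>(Arrays.asList(fieldText.toLowerCase().trim().split("\\s+")));
        for (String clause : query.toLowerCase().trim().split("\\s+")) {
            if (!anyTermMatches(clause, terms)) {
                return false; // a required clause found no term
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "john" satisfies both "john" and "j*", so john thompson is a hit.
        System.out.println(matches("john j*", "john thompson")); // true
        System.out.println(matches("john j*", "jane thompson")); // false
    }
}
```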

Programmatically you can use PhrasePrefixQuery - see Lucene's test
case code for examples of how it's used, but it is a bit of work to
set up.
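The positional requirement that PhrasePrefixQuery adds can be modelled the same way. This is a plain-Java sketch, not the Lucene API, and the class name is hypothetical: the name matches only if "john" is immediately followed by a token starting with "j". In Lucene itself the prefix position has to be filled with concrete index terms that start with "j", which is presumably part of the set-up work mentioned above; that step is not shown.

```java
// Hypothetical model (not the Lucene API) of a phrase-prefix match: the phrase
// elements must occupy consecutive token positions; an element ending in '*'
// is a prefix, any other element must match exactly.
class PhrasePrefixModel {

    static boolean elementMatches(String element, String token) {
        return element.endsWith("*")
                ? token.startsWith(element.substring(0, element.length() - 1))
                : token.equals(element);
    }

    static boolean matches(String phrase, String fieldText) {
        String[] want = phrase.toLowerCase().trim().split("\\s+");
        String[] tokens = fieldText.toLowerCase().trim().split("\\s+");
        // Try every start position at which the whole phrase still fits.
        for (int start = 0; start + want.length <= tokens.length; start++) {
            boolean all = true;
            for (int k = 0; k < want.length && all; k++) {
                all = elementMatches(want[k], tokens[start + k]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("john j*", "john johnson"));     // true
        System.out.println(matches("john j*", "john thompson"));    // false
        System.out.println(matches("john j*", "john doe johnson")); // false: "doe" sits in between
    }
}
```

Because the model requires adjacent positions, "john doe johnson" is not matched by it either.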

> Is there a way to get Lucene to be "satisfied" after matching "john"
> and then require a second j-word to return a match for j*?

In short, not easily, but it is possible with PhrasePrefixQuery
programmatically. QueryParser does not support this at all currently.


     Erik
