Posted to java-user@lucene.apache.org by Lasse L <la...@gmail.com> on 2005/08/12 21:11:26 UTC
Query that doesn't match a term more than once
I am using the queryparser to search for names.
If I search for: john j*
I'd expect to get everybody called john j-something: john johnson,
john joe doe, etc.
Instead I just get all the johns and joes. In many of the hits there is no
second j-word.
Is there a way to get Lucene to be "satisfied" after matching "john"
and then require a second j-word to return a match for j*?
The error might be on my end though.
If the same person is registered as both "john carlson" and "john doe carlson",
I add both values to the name field -- could that be it? If that is so,
I wouldn't expect Lucene to return the hit if I search for e.g. "john
joh* jo*"?
At least I'd like to boost the ranking of the results that match
better. Right now "john j*" gives me john thompson as the top hit even
though there are heaps of john johnsons.
I hope you can point me in the right direction.
Thanks
/Lasse
My query parser code looks like this:
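// Parse the user's input against one field, ANDing all terms by default.
QueryParser parser = new QueryParser(field, _analyzer);
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
// Locale takes the language code first, then the country code.
parser.setLocale(new Locale("da", "DK"));
Query q = parser.parse(value);
// Add the parsed query as a clause: (query, required, prohibited).
query.add(q, required, false);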
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Query that doesn't match a term more than once
Posted by Lasse L <la...@gmail.com>.
However leaving the name as a single term would make me miss a hit
like "john doe johnson" -- which is unacceptable.
Is there a way I could boost the query results that match better?
Will the score rise if a query term is matched more than once, e.g. "j*" in
"john johnson"?
It wouldn't matter that much that I get the extra matches if only I'd
get the best ones on top.
/Lasse
Re: Query that doesn't match a term more than once
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 12, 2005, at 9:22 PM, Chris Hostetter wrote:
> : Programmatically you can use PhrasePrefixQuery - see Lucene's test
> : case code for examples of how it's used, but it is a bit of work to
> : set up.
>
> That's only needed if you assume that the field contains tokenized
> text. If the whole name is indexed as a single Term, then a regular
> prefix query should work -- in fact, telling QueryParser to use an
> Analyzer that doesn't do any tokenizing should work too.
That's a good point - thanks for mentioning that. However, you'd still
have to deal with escaping the space. I haven't tried it, but would
a query of john\ j* using the KeywordAnalyzer do the trick? This
requires that "john jacob... schmidt" be left as a single token
during indexing.
Erik
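To make the trade-off of the single-token approach concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class and method names are hypothetical, and String.startsWith merely stands in for a prefix query run against a name that was indexed as one untokenized term.

```java
public class WholeNamePrefix {
    // Hypothetical stand-in for a prefix query against an untokenized name:
    // the whole indexed value must begin with the query prefix.
    public static boolean matchesWholeNamePrefix(String indexedName, String queryPrefix) {
        return indexedName.toLowerCase().startsWith(queryPrefix.toLowerCase());
    }

    public static void main(String[] args) {
        // "john j" plays the role of the escaped query john\ j*
        System.out.println(matchesWholeNamePrefix("john johnson", "john j"));       // true
        System.out.println(matchesWholeNamePrefix("john jacob schmidt", "john j")); // true
        System.out.println(matchesWholeNamePrefix("john thompson", "john j"));      // false
        System.out.println(matchesWholeNamePrefix("john doe johnson", "john j"));   // false
    }
}
```

The last case is the one raised earlier in the thread: with the name kept as a single term, "john doe johnson" is no longer found by john j*.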
Re: Query that doesn't match a term more than once
Posted by Chris Hostetter <ho...@fucit.org>.
: Programmatically you can use PhrasePrefixQuery - see Lucene's test
: case code for examples of how it's used, but it is a bit of work to
: set up.
That's only needed if you assume that the field contains tokenized text.
If the whole name is indexed as a single Term, then a regular prefix query
should work -- in fact, telling QueryParser to use an Analyzer that
doesn't do any tokenizing should work too.
-Hoss
Re: Query that doesn't match a term more than once
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 12, 2005, at 3:11 PM, Lasse L wrote:
> I am using the queryparser to search for names.
>
> If I search for: john j*
> I'd expect to get everybody called john j-something: john johnson,
> john joe doe, etc.
> Instead I just get all the johns and joes. In many of the hits there is no
> second j-word.
That assumption does not hold for QueryParser and how it forms
queries. And it is actually a tricky sort of query to do even
programmatically.
The query john j* (with default operator AND) searches for john AND
j* regardless of position.
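As an illustration of why "john thompson" can satisfy john AND j*, here is a small plain-Java model of position-independent AND matching. It is only a sketch and not Lucene code: the class and method names are made up, and whitespace splitting stands in for whatever the analyzer actually does.

```java
import java.util.Arrays;
import java.util.List;

public class UnorderedAndMatch {
    // True if some token satisfies the query term. A trailing '*' makes the
    // term a prefix; otherwise the token must equal the term exactly.
    public static boolean termMatches(List<String> tokens, String term) {
        if (term.endsWith("*")) {
            String prefix = term.substring(0, term.length() - 1);
            return tokens.stream().anyMatch(t -> t.startsWith(prefix));
        }
        return tokens.contains(term);
    }

    // AND semantics: every term must match somewhere, in any position.
    // Nothing stops one token from satisfying several terms.
    public static boolean matchesAll(String name, String... terms) {
        List<String> tokens = Arrays.asList(name.toLowerCase().split("\\s+"));
        return Arrays.stream(terms).allMatch(term -> termMatches(tokens, term));
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("john thompson", "john", "j*")); // true
        System.out.println(matchesAll("john johnson", "john", "j*"));  // true
        System.out.println(matchesAll("joe doe", "john", "j*"));       // false
    }
}
```

In this model the token "john" satisfies both john and j*, so a second j-word is never required.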
Programmatically you can use PhrasePrefixQuery - see Lucene's test
case code for examples of how it's used, but it is a bit of work to
set up.
> Is there a way to get lucene to get "satisfied" after matching "john"
> and then require a second j-word to return a match for j*
In short, not easily, but it is possible with PhrasePrefixQuery
programmatically. QueryParser does not support this at all currently.
Erik
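For comparison, the positional behaviour being asked for ("john" followed by a j-word) can be modelled in plain Java as below. This is a conceptual sketch only: it does not call PhrasePrefixQuery or any other Lucene API, the names are hypothetical, and it assumes strict adjacency between the two tokens.

```java
public class PhrasePrefixSketch {
    // True if a token equal to `exact` is immediately followed by a token
    // that starts with `prefix`. Both arguments are assumed to be lowercase.
    public static boolean matchesPhrasePrefix(String name, String exact, String prefix) {
        String[] tokens = name.toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals(exact) && tokens[i + 1].startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrasePrefix("john johnson", "john", "j"));     // true
        System.out.println(matchesPhrasePrefix("john joe doe", "john", "j"));     // true
        System.out.println(matchesPhrasePrefix("john thompson", "john", "j"));    // false
        System.out.println(matchesPhrasePrefix("john doe johnson", "john", "j")); // false
    }
}
```

Under this strict-adjacency assumption "john doe johnson" is not a hit; accepting it would need a looser check that allows a gap between the two tokens.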