Posted to java-user@lucene.apache.org by Yura Smolsky <in...@altervision.biz> on 2005/04/05 11:44:10 UTC

Re[2]: exact match

Hello, Erik.


EH> On Apr 4, 2005, at 4:34 PM, Yura Smolsky wrote:
>> Hello, java-user.
>>
>> I have documents with a tokenized, indexed and stored field. This field
>> usually contains one or two words. I need to be able to search for exact
>> matches of those words.
>> For example, a search for "John" should return documents whose field
>> contains "John" only, not "John Doe" or "John Foo".
>>
>> Any ideas?
EH> Use an untokenized field to search on in the case of finding an exact
EH> match.

Are there no other ways to achieve this?

Yura Smolsky.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re[2]: exact match

Posted by Chris Hostetter <ho...@fucit.org>.
: >> I have documents with a tokenized, indexed and stored field. This field
: >> usually contains one or two words. I need to be able to search for exact
: >> matches of those words.
: >> For example, a search for "John" should return documents whose field
: >> contains "John" only, not "John Doe" or "John Foo".
: >>
: >> Any ideas?
: EH> Use an untokenized field to search on in the case of finding an exact
: EH> match.
:
: Are there no other ways to achieve this?

are there any cases in which you ever want to search the field for
tokenized values?

if not, then you can just use an analyzer that knows about this special
field and "tokenizes" any value it gets into a single token that is an
exact match.
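
A minimal sketch of that idea, in plain Java rather than against the Lucene
API: the class and method names below are hypothetical, and the field name
"author" is only an assumption for illustration. It shows a field-aware
analysis step that keeps the special field as one token and splits every
other field on whitespace. In Lucene itself this would be done with an
Analyzer chosen per field; which classes are available for that depends on
your version.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this does not use the Lucene API.
final class SingleTokenSketch {

    // Conventional analysis: split the value on runs of whitespace.
    static List<String> whitespaceTokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Keyword-style analysis: the whole value becomes exactly one token.
    static List<String> singleToken(String value) {
        return Collections.singletonList(value);
    }

    // Field-aware analysis: "author" (an assumed field name) is treated as
    // the special exact-match field, everything else is tokenized normally.
    static List<String> analyze(String field, String value) {
        return "author".equals(field) ? singleToken(value) : whitespaceTokens(value);
    }
}
```

With this, a term lookup for "John" in the special field can only hit a
document whose whole value is "John", because "John Doe" was indexed as the
single token "John Doe".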

if you sometimes need exact matches, and sometimes need "word" matches (for
the sake of argument let's assume your tokens are simple whitespace
separated words) then you're going to need some way of knowing which case
you want when you parse the query -- the easy way to go is with a separate
field like Erik described.  if you have some other use case, then you can
index the field using an analyzer that generates a single unparsed "token"
for the whole string followed by some marker token that isn't likely to
appear in your data (ie: the token "_BOOYA!_" would probably work),
followed by the more conventional tokens -- giving the first of those
conventional tokens a very high positionIncrement.

Then if you want an exact match, your custom query parsing code could
generate either a Phrase or Span query containing a single Term for your
input, followed by the marker Term (ie: "_BOOYA!_").  a "regular" token
based search would work just as it did before.
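
The scheme above can be sketched as a toy positional index in plain Java.
This is not Lucene code: the class, the Token type, and the gap of 1000 are
assumptions made for illustration, and only the marker token comes from the
message. The large gap plays the role of the high position increment, so
that a phrase match on "whole value, then marker" cannot accidentally
involve the conventional tokens that follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; plain Java, not the Lucene API.
final class MarkerTokenSketch {
    static final String MARKER = "_BOOYA!_"; // marker token from the message
    static final int GAP = 1000;             // assumed "very high" position increment

    // One indexed token: its text plus its absolute position in the field.
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    // Emits the whole value at position 0, the marker at position 1, and
    // then the whitespace-separated words starting GAP positions later.
    static List<Token> analyze(String value) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(value, 0));
        tokens.add(new Token(MARKER, 1));
        int position = 1 + GAP;
        for (String word : value.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            tokens.add(new Token(word, position++));
        }
        return tokens;
    }

    // Phrase-style check: true if `first` is immediately followed by `second`.
    static boolean hasPhrase(List<Token> tokens, String first, String second) {
        for (Token a : tokens) {
            if (!a.text.equals(first)) {
                continue;
            }
            for (Token b : tokens) {
                if (b.text.equals(second) && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    // Exact match: the input must be the whole-value token right before the marker.
    static boolean exactMatch(List<Token> tokens, String input) {
        return hasPhrase(tokens, input, MARKER);
    }

    // "Regular" search: any token with this text matches.
    static boolean termMatch(List<Token> tokens, String word) {
        for (Token t : tokens) {
            if (t.text.equals(word)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, an exact match for "John" succeeds on a field holding "John"
and fails on one holding "John Doe", while a regular term search for "John"
still finds both.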





-Hoss




Re[4]: exact match

Posted by Yura Smolsky <in...@altervision.biz>.
Hello, Erik.

I have a very large index (200 GB) with a large number of documents.
I have a field "author", which stores a name, and this field is tokenized,
indexed and stored.

This field contains values like the following:
"John"
"John Doe"
"Bill"
"Bill Gates"

I do not want to reindex all the documents, and I want to search this
field for exact matches. If I search for "John", it should return
documents whose "author" field contains the phrase "John" only,
and not "John Doe". Right now it returns every combination containing the
word John, and I just want the documents with the word "John" only.

The solution might not be simple, but I can manage that. I cannot manage
reindexing all the documents again. :)

Is that clear?



EH> On Apr 5, 2005, at 5:44 AM, Yura Smolsky wrote:
>> EH> On Apr 4, 2005, at 4:34 PM, Yura Smolsky wrote:
>>>> Hello, java-user.
>>>>
>>>> I have documents with a tokenized, indexed and stored field. This field
>>>> usually contains one or two words. I need to be able to search for exact
>>>> matches of those words.
>>>> For example, a search for "John" should return documents whose field
>>>> contains "John" only, not "John Doe" or "John Foo".
>>>>
>>>> Any ideas?
>> EH> Use an untokenized field to search on in the case of finding an exact
>> EH> match.
>>
>> Are there no other ways to achieve this?

EH> Not that I know of.  Could you give us a more concrete example of what
EH> you're trying to achieve?

EH>         Erik







Yura Smolsky.





Re: Re[2]: exact match

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 5, 2005, at 5:44 AM, Yura Smolsky wrote:
> EH> On Apr 4, 2005, at 4:34 PM, Yura Smolsky wrote:
>>> Hello, java-user.
>>>
>>> I have documents with a tokenized, indexed and stored field. This field
>>> usually contains one or two words. I need to be able to search for exact
>>> matches of those words.
>>> For example, a search for "John" should return documents whose field
>>> contains "John" only, not "John Doe" or "John Foo".
>>>
>>> Any ideas?
> EH> Use an untokenized field to search on in the case of finding an exact
> EH> match.
>
> Are there no other ways to achieve this?

Not that I know of.  Could you give us a more concrete example of what 
you're trying to achieve?

	Erik

