Posted to java-user@lucene.apache.org by Anand Kishore <an...@gmail.com> on 2005/09/27 07:58:45 UTC

Is analyzing same as tokenizing???

Hi,

Is 'Analyzing' the same as 'Tokenizing'?
When we say the Keyword field is not analyzed, but indexed and stored, does
it indicate it is not tokenized as well? That means in order to find a query
match against a keyword there has to be an exact match (case sensitive).

--
- Andy

Re: Is analyzing same as tokenizing???

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 27, 2005, at 9:01 AM, Anand Kishore wrote:

>> That is correct. A Keyword field is taken exact case as-is as a
>> single term.
>>
>
> For example: If I have a keyword field named "sender" which has the value
> "The Motely Fool", should a search for any of the query terms "Fool",
> "fool" or "Motely" on the "sender" field match the documents containing
> that field, or does the query have to be exactly "The Motely Fool"?

So use Field.Text(), not Field.Keyword(), in this case, and make sure you
know what your analyzer is doing to your text; perhaps use Luke to see it
first-hand.
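A minimal, dependency-free Java sketch of why Field.Text() helps with the "sender" example. It does not use Lucene classes: SenderFieldDemo, analyze(), matchesKeyword() and matchesText() are hypothetical names, and analyze() is only a rough stand-in for a letter-tokenizing, lowercasing analyzer (a real analyzer such as StandardAnalyzer may also drop stop words like "the"). The assumption is that the query is analyzed the same way as the indexed text, and that every query token must be present (AND semantics, chosen here for simplicity).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT Lucene's API: keyword-style vs analyzed field.
final class SenderFieldDemo {

    // Rough stand-in for an analyzer: split on runs of non-letters and
    // lowercase each piece.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) { // split can yield a leading empty string
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keyword-style field: no analysis; the whole value is one term, case kept,
    // so only the identical string matches.
    static boolean matchesKeyword(String fieldValue, String queryTerm) {
        return fieldValue.equals(queryTerm);
    }

    // Text-style field: value analyzed at index time, query analyzed the same
    // way; every query token must appear among the indexed tokens.
    static boolean matchesText(String fieldValue, String query) {
        List<String> indexed = analyze(fieldValue);
        List<String> wanted = analyze(query);
        return !wanted.isEmpty() && indexed.containsAll(wanted);
    }

    public static void main(String[] args) {
        String sender = "The Motely Fool";
        for (String q : new String[] {"Fool", "fool", "Motely", "The Motely Fool"}) {
            System.out.printf("%-16s keyword=%-5b text=%b%n",
                    q, matchesKeyword(sender, q), matchesText(sender, q));
        }
    }
}
```

In this sketch only the full, exact-case string matches the keyword-style field, while "Fool", "fool" and "Motely" all match the analyzed field because both sides end up as lowercase tokens.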

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Is analyzing same as tokenizing???

Posted by Anand Kishore <an...@gmail.com>.
> That is correct. A Keyword field is taken exact case as-is as a
> single term.

For example: If I have a keyword field named "sender" which has the value
"The Motely Fool", should a search for any of the query terms "Fool",
"fool" or "Motely" on the "sender" field match the documents containing
that field, or does the query have to be exactly "The Motely Fool"?

--
- Andy

Re: Is analyzing same as tokenizing???

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 27, 2005, at 1:58 AM, Anand Kishore wrote:
> Is 'Analyzing' the same as 'Tokenizing'?

Yes, in Lucene terminology these two are the same.

> When we say the Keyword field is not analyzed, but indexed and stored, does
> it indicate it is not tokenized as well? That means in order to find a query
> match against a keyword there has to be an exact match (case sensitive).

That is correct. A Keyword field is taken as-is, in its exact case, as a
single term. A token is one piece of the output of the analysis process.
A term is what tokens are called once they are indexed.
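To make the token/term distinction concrete, here is a hypothetical Java sketch (TokenTermDemo and its nested Token class are illustrative names, not Lucene's classes). It assumes a token carries its text plus start/end offsets into the original value, and that indexing keeps only the field name and token text, written here as "field:text". A keyword-style field skips analysis and yields one token spanning the whole value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT Lucene's classes: token (analysis output) vs term
// (field name + token text, as indexed).
final class TokenTermDemo {

    static final class Token {
        final String text;
        final int startOffset; // inclusive index into the original value
        final int endOffset;   // exclusive index into the original value

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Analysis: one lowercased token per run of letters, remembering offsets.
    static List<Token> analyze(String value) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            if (Character.isLetter(value.charAt(i))) {
                int start = i;
                while (i < value.length() && Character.isLetter(value.charAt(i))) {
                    i++;
                }
                String text = value.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            } else {
                i++;
            }
        }
        return tokens;
    }

    // Keyword-style field: no analysis, a single token covering the whole value.
    static List<Token> keyword(String value) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(value, 0, value.length()));
        return tokens;
    }

    // Indexing: each token's text becomes a term qualified by the field name.
    static List<String> toTerms(String field, List<Token> tokens) {
        List<String> terms = new ArrayList<>();
        for (Token t : tokens) {
            terms.add(field + ":" + t.text);
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = "The Motely Fool";
        System.out.println(toTerms("sender", analyze(value)));
        // prints [sender:the, sender:motely, sender:fool]
        System.out.println(toTerms("sender", keyword(value)));
        // prints [sender:The Motely Fool]
    }
}
```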

Saying that it has to be an exact match is a bit too simplistic,
though: it is possible to find terms inexactly with fuzzy, prefix, and
wildcard queries, for example.
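A rough, hypothetical Java sketch of how prefix, wildcard and fuzzy matching can still reach a single untokenized keyword term. InexactTermMatch and its methods are illustrative names, not Lucene's PrefixQuery/WildcardQuery/FuzzyQuery implementations. Assumptions: '*' matches any run of characters and '?' exactly one, and fuzzy matching uses a fixed Levenshtein edit budget, which is a simplification of Lucene's similarity threshold. Matching stays case-sensitive, as with the keyword term itself.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Lucene code: inexact ways to match one indexed term.
final class InexactTermMatch {

    // Prefix idea: the indexed term starts with the given prefix.
    static boolean prefixMatches(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // Wildcard idea: translate '*' and '?' to a regex, quoting everything else.
    static boolean wildcardMatches(String term, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    // Classic two-row Levenshtein distance (insert, delete, substitute = 1 each).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy idea: accept terms within a fixed number of edits of the query.
    static boolean fuzzyMatches(String term, String query, int maxEdits) {
        return editDistance(term, query) <= maxEdits;
    }

    public static void main(String[] args) {
        String term = "The Motely Fool"; // indexed as-is by a keyword-style field
        System.out.println(prefixMatches(term, "The Mot"));           // true
        System.out.println(wildcardMatches(term, "The M?tely *"));    // true
        System.out.println(fuzzyMatches(term, "The Motley Fool", 2)); // true
        System.out.println(wildcardMatches(term, "*fool"));           // false: case-sensitive
    }
}
```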

     Erik
