You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by karl wettin <ka...@gmail.com> on 2007/02/01 01:48:55 UTC

Re: Field methods and usage

31 jan 2007 kl. 12.25 skrev Christoph Pächter:
>
> I was wondering, if there is anywhere a table (similar to Table 1.2  
> An overview
> of different field types, their characteristics, and their usage in  
> Lucene in
> Action), listing the possible methods and their usage.

Implementations will differ, for example:

>
> Store   |TermVector              |Index          |reasonable |Usage
> YES     |NO                      |NO             |1          |URLs
>                                                              | 
> telephone number

You never have to store anything in the index, perhaps that  
information is persistent somewhere else?

If you use a term vector or not depends very little on what kind of  
information you store in there, it is up to what analysis you plan to  
include the documents in. Highlighting? More like this? Neural networks?

Some are more than happy with one large token. Other people might  
want to tokenize the exact same information.

An URL in [protocol://host:port/path], a phone number in country-,  
area, and district parts.

It really up to each and every implementer to decide what settings is  
best for them.

Also, a Lucene index is not made up of static rows and columns the  
way a relational database is. The spoon does not exists. You can bend  
it any way you want. Documents in a corpus can share field that share  
names but not settings. Perhaps you only want to index phone numbers  
in a specific area code.

-- 
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org