Posted to dev@nutch.apache.org by praveen pathiyil <pa...@gmail.com> on 2005/08/14 03:28:48 UTC

Re: [Nutch-dev] Field.Text vs Field.UnStored

Hi,

You have four different options for field types:

Field method/type                  Tokenized   Indexed   Stored

Field.Keyword(String, String)      No          Yes       Yes
Field.UnIndexed(String, String)    No          No        Yes
Field.UnStored(String, String)     Yes         Yes       No
Field.Text(String, String)         Yes         Yes       Yes
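
To make the three columns concrete, here is a rough sketch in plain Java. It is a toy model, not Lucene's own classes: FieldType, ToyIndex, the "segment" field and the example values are all made up for illustration, and the whitespace split stands in for a real analyzer. It only tries to show how a value can be searchable (its terms go into the index) without being stored (the original string is not kept for retrieval).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldTypesSketch {

    /** The four field types from the table, modeled as three flags (hypothetical enum). */
    enum FieldType {
        KEYWORD(false, true, true),
        UNINDEXED(false, false, true),
        UNSTORED(true, true, false),
        TEXT(true, true, true);

        final boolean tokenized;
        final boolean indexed;
        final boolean stored;

        FieldType(boolean tokenized, boolean indexed, boolean stored) {
            this.tokenized = tokenized;
            this.indexed = indexed;
            this.stored = stored;
        }
    }

    /** Toy index: searchable terms and stored original values are kept separately. */
    static class ToyIndex {
        final Map<String, Set<Integer>> postings = new HashMap<>();
        final Map<String, String> storedValues = new HashMap<>();

        void add(int docId, String field, String value, FieldType type) {
            if (type.indexed) {
                // Tokenized fields are split into lowercase words (a simplification);
                // untokenized fields are indexed as one exact term.
                String[] terms = type.tokenized
                        ? value.trim().toLowerCase().split("\\s+")
                        : new String[] { value };
                for (String term : terms) {
                    postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
                }
            }
            if (type.stored) {
                // Only stored fields keep the original string for display in results.
                storedValues.put(docId + "/" + field, value);
            }
        }

        /** Returns the ids of documents whose field contains the exact term. */
        Set<Integer> search(String field, String term) {
            return postings.getOrDefault(field + ":" + term, Collections.emptySet());
        }

        /** Returns the original value, or null if the field was not stored. */
        String storedValue(int docId, String field) {
            return storedValues.get(docId + "/" + field);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "url", "http://example.com/page", FieldType.TEXT);
        index.add(1, "content", "Hello Nutch world", FieldType.UNSTORED);
        index.add(1, "segment", "20050814", FieldType.UNINDEXED);

        System.out.println(index.search("content", "nutch"));    // [1]
        System.out.println(index.storedValue(1, "content"));     // null
        System.out.println(index.search("segment", "20050814")); // []
        System.out.println(index.storedValue(1, "segment"));     // 20050814
    }
}
```

In this model the "content" field behaves the way the BasicIndexingFilter comment describes: a search for a word in it finds the document, but the text itself cannot be read back out of the index.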
 
Check out Otis' introductory article for background on this:
http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=1

Regards,
Praveen.



On 8/12/05, EM <em...@cpuedge.com> wrote:
> I need some help figuring out the following:
> 
> I was looking at: BasicIndexingFilter.java where it's stated:
> 
> // url is both stored and indexed, so it's both searchable and returned
> doc.add(Field.Text("url", url));
> 
> // content is indexed, so that it's searchable, but not stored in index
> doc.add(Field.UnStored("content", parse.getText()));
> 
> I'm stuck on what replacement can be made here. I'm assuming doc.add is the
> object that would add tokens to the index? How can a token (word, phrase) be
> "searchable but not stored in the index"?
> 
> I'm basically trying to do the following, given two pages A and B:
> A is written in an eastern alphabet.
> B is written in the Latin alphabet.
> I would like to index page B as it is, and page A as it is with the content
> of page A translated to Latin added to it.
> 
> Would I have to add something like:
> String content = parse.getText();
> content +=" ";
> content += myTranslationFunctionToLatin(content);
> doc.add(Field.Text("content", content));
> 
> Or would the last line be:
> doc.add(Field.UnStored("content", content));
> 
> What's the difference with regard to the Field.* object?
> 
> 
> Regards,
> EM
> 
> 
> 