Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/07/29 17:45:50 UTC

[Nutch Wiki] Update of "IndexStructure" by SebastianNagel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "IndexStructure" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=20&rev2=21

Comment:
Add field 'id' (cf. NUTCH-1708)

  
  The index structure formed after indexing is shown below:
  
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''|| '''version'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''||<-2> '''version'''||
  || || || || || || '''1.x''' || '''2.x''' ||
+ ||      id      ||      YES	||	Indexed, Un-Tokenized	|| [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/indexer/IndexerMapReduce.html|IndexerMapReduce]]/[[http://nutch.apache.org/apidocs/apidocs-2.2.1/org/apache/nutch/indexer/IndexUtil.html|IndexUtil]]  || '''URL''' used as '''ID''' to update and delete documents || X || X ||
  || 	boost 	 ||	YES 	|| 	Not Indexed 	|| various scoring plugins || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. || ?  || ? ||
  || 	digest 	||	YES 	||	Not Indexed 	|| org.apache.nutch.indexer.IndexerMapReduce.java || Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||  ?  || ? ||
  || 	lang 	||	YES 	||	Un-Tokenized 	||	language-identifier || Adds a '''lang''' (language) field to a document.||  ?  || ? ||
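The 'id' row says the URL serves as the document ID for updates and deletes. Below is a minimal sketch of that idea. It uses a hypothetical in-memory map (UrlKeyedIndex, upsert, delete are illustrative names, not Nutch, Solr or Elasticsearch APIs). In Nutch the field itself is set by IndexerMapReduce (1.x) or IndexUtil (2.x), and the backend performs the actual update or delete.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for a search index (not a Nutch or Solr API).
 * Documents are keyed by their 'id' field, which holds the page URL, so
 * re-indexing the same URL replaces the old document instead of duplicating it.
 */
public class UrlKeyedIndex {

    // id (URL) -> stored document fields
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    /** Adds a document, or replaces an existing one that has the same URL. */
    public void upsert(String url, Map<String, String> fields) {
        Map<String, String> doc = new HashMap<>(fields);
        doc.put("id", url); // the URL doubles as the unique key
        docs.put(url, doc);
    }

    /** Removes the document for this URL; returns false if none was indexed. */
    public boolean delete(String url) {
        return docs.remove(url) != null;
    }

    public Map<String, String> get(String url) {
        return docs.get(url);
    }

    public int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UrlKeyedIndex index = new UrlKeyedIndex();
        String url = "http://example.com/";
        index.upsert(url, Map.of("title", "Old title"));
        index.upsert(url, Map.of("title", "New title")); // same id: replaced, not duplicated
        System.out.println(index.size() + " " + index.get(url).get("title")); // prints "1 New title"
        index.delete(url); // e.g. when the page is no longer available
        System.out.println(index.size()); // prints "0"
    }
}
```

Because the key is derived from the URL, a second crawl of the same page overwrites the first. Deleting a page needs only its URL, not any other stored field.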
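The 'digest' row mentions an MD5 over the content as one option. Below is a minimal sketch of that case using only the JDK's MessageDigest. ContentDigest and md5Hex are hypothetical names, not Nutch classes. Rendering the hash as a lowercase hex string is an assumption made here for readability. The sketch does not cover the text-profile variant or hashing of headers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not a Nutch class): computes an MD5 digest over raw
 * page content and renders it as a lowercase hex string, similar in spirit
 * to a value stored in the 'digest' field.
 */
public class ContentDigest {

    /** Returns the MD5 of the given bytes as 32 lowercase hex characters. */
    public static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(content);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value and pad to two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] page = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(page)); // prints "900150983cd24fb0d6963f7d28e17f72"
    }
}
```

Identical content always yields the same digest. This is what makes the field useful for spotting duplicate or unchanged pages.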