Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/07/29 17:45:50 UTC
[Nutch Wiki] Update of "IndexStructure" by SebastianNagel
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IndexStructure" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=20&rev2=21
Comment:
Add field 'id' (cf. NUTCH-1708)
The index structure formed after indexing is shown below:
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''|| '''version'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''||<-2> '''version'''||
|| || || || || || '''1.x''' || '''2.x''' ||
+ || id || YES || Indexed, Un-Tokenized || [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/indexer/IndexerMapReduce.html|IndexerMapReduce]]/[[http://nutch.apache.org/apidocs/apidocs-2.2.1/org/apache/nutch/indexer/IndexUtil.html|IndexUtil]] || '''URL''' used as '''ID''' to update and delete documents || X || X ||
|| boost || YES || Not Indexed || various scoring plugins || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. || ? || ? ||
|| digest || YES || Not Indexed || org.apache.nutch.indexer.IndexerMapReduce.java || Adds a '''message digest''' field to a document. It can be an MD5 hash over the content and headers, or a more sophisticated text profile of the content. || ? || ? ||
|| lang || YES || Un-Tokenized || language-identifier || Adds a '''lang''' (language) field to a document. || ? || ? ||
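
The rows above can be illustrated with a minimal sketch of one indexed document. This is not Nutch code: the helper `build_index_doc`, the in-memory `index` dict, and the sample URL are hypothetical. The digest here is an MD5 over the content only, which is one of the options the table mentions.

```python
import hashlib


def build_index_doc(url: str, content: bytes, score: float, lang: str) -> dict:
    """Hypothetical helper (not a Nutch API): assemble the fields listed in the table."""
    return {
        "id": url,                                   # URL doubles as the document ID
        "boost": score,                              # score from a scoring plugin (stored, not indexed)
        "digest": hashlib.md5(content).hexdigest(),  # assumption: MD5 over content only
        "lang": lang,                                # language code from a language identifier
    }


# A plain dict stands in for the index, keyed by the 'id' field.
index = {}

doc = build_index_doc("http://example.org/", b"<html>hello</html>", 1.0, "en")
index[doc["id"]] = doc

# Re-indexing the same URL replaces the earlier entry instead of adding a duplicate.
updated = build_index_doc("http://example.org/", b"<html>hello again</html>", 1.5, "en")
index[updated["id"]] = updated

# Deleting by ID removes the document.
removed = index.pop("http://example.org/")
```

Keying the store on `id` is what makes update and delete possible without searching on other fields, which matches the comment on the `id` row.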