Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2019/01/20 22:57:52 UTC

[Nutch Wiki] Update of "IndexWriters" by RoannelFernandez

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "IndexWriters" page has been changed by RoannelFernandez:
https://wiki.apache.org/nutch/IndexWriters?action=diff&rev1=16&rev2=17

Comment:
CSV index writer documentation

  ||indexer-elastic ||Indexer for an Elasticsearch server ||
  ||indexer-elastic-rest ||Indexer for Elasticsearch, but using [[https://github.com/searchbox-io/Jest|Jest]] to connect with the REST API provided by Elasticsearch ||
  ||indexer-cloudsearch ||Indexer for Amazon <<GetText(CloudSearch)>> ||
+ ||indexer-csv ||Indexer for writing documents to a CSV file ||
  
  = Structure of index-writers.xml =
  
@@ -44, +45 @@

  ||indexer-elastic ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.html|org.apache.nutch.indexwriter.elastic.ElasticIndexWriter]] ||
  ||indexer-elastic-rest ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.html|org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter]] ||
  ||indexer-cloudsearch ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.html|org.apache.nutch.indexwriter.cloudsearch.CloudSearchIndexWriter]] ||
+ ||indexer-csv ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/csv/CSVIndexWriter.html|org.apache.nutch.indexwriter.csv.CSVIndexWriter]] ||
  
  Each `<writer>` element contains two child elements: `<mapping>` and `<parameters>`
  
@@ -213, +215 @@

  || batch.dump || '''true''' to send documents to a local file. || false ||
  || batch.maxSize || Maximum number of documents to send as a batch to <<GetText(CloudSearch)>>. || -1 ||
  
+ === CSV indexer properties ===
+ 
+ ||'''Parameter Name''' ||'''Description''' ||'''Default value''' ||
+ || fields || Ordered list of fields (columns) in the CSV file || id,title,content ||
+ || charset || Encoding of CSV file || UTF-8 ||
+ || separator || Separator between fields (columns) || , ||
+ || valuesep || Separator between multiple values of one field || | ||
+ || quotechar || Quote character used to quote fields containing separators or quotes || " ||
+ || escapechar || Escape character used to escape a quote character || " ||
+ || maxfieldlength || Max. length of a single field value in characters || 4096 ||
+ || maxfieldvalues || Max. number of values of one field, useful for, e.g., the anchor texts field || 12 ||
+ || header || Write CSV column headers || true ||
+ || outpath || Output path / directory || csvindexwriter ||
+
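
For illustration only, the CSV indexer properties listed in the diff could be set in index-writers.xml roughly as follows. This is a sketch rather than text from the wiki page: the writer id `indexer_csv_1`, the `<param name="..." value="..."/>` form, and the empty `<copy/>`, `<rename/>` and `<remove/>` mapping children are assumptions based on the usual layout of that file. The class name, parameter names and values are the defaults from the tables above.

```xml
<!-- Sketch of a CSV writer entry; the id and the <param> attribute form are assumptions. -->
<writer id="indexer_csv_1" class="org.apache.nutch.indexwriter.csv.CSVIndexWriter">
  <parameters>
    <!-- Ordered list of columns written to the CSV file -->
    <param name="fields" value="id,title,content"/>
    <param name="charset" value="UTF-8"/>
    <!-- Column separator and separator between multiple values of one field -->
    <param name="separator" value=","/>
    <param name="valuesep" value="|"/>
    <!-- A literal double quote must be written as &quot; inside an XML attribute -->
    <param name="quotechar" value="&quot;"/>
    <param name="escapechar" value="&quot;"/>
    <!-- Limits on value length (characters) and on values per field -->
    <param name="maxfieldlength" value="4096"/>
    <param name="maxfieldvalues" value="12"/>
    <param name="header" value="true"/>
    <param name="outpath" value="csvindexwriter"/>
  </parameters>
  <mapping>
    <!-- Assumed empty mapping: no fields copied, renamed or removed -->
    <copy/>
    <rename/>
    <remove/>
  </mapping>
</writer>
```

With the defaults shown, the quote character and the escape character are the same, so a quote inside a field value would presumably be written doubled, as is common in CSV files.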