Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2019/01/20 22:57:52 UTC
[Nutch Wiki] Update of "IndexWriters" by RoannelFernandez
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IndexWriters" page has been changed by RoannelFernandez:
https://wiki.apache.org/nutch/IndexWriters?action=diff&rev1=16&rev2=17
Comment:
CSV index writer documentation
||indexer-elastic ||Indexer for an Elasticsearch server ||
||indexer-elastic-rest ||Indexer for Elasticsearch, but using [[https://github.com/searchbox-io/Jest|Jest]] to connect with the REST API provided by Elasticsearch ||
||indexer-cloudsearch ||Indexer for Amazon <<GetText(CloudSearch)>> ||
+ ||indexer-csv ||Indexer for writing documents to a CSV file ||
= Structure of index-writers.xml =
@@ -44, +45 @@
||indexer-elastic ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.html|org.apache.nutch.indexwriter.elastic.ElasticIndexWriter]] ||
||indexer-elastic-rest ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.html|org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter]] ||
||indexer-cloudsearch ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.html|org.apache.nutch.indexwriter.cloudsearch.CloudSearchIndexWriter]] ||
+ ||indexer-csv ||[[https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/indexwriter/csv/CSVIndexWriter.html|org.apache.nutch.indexwriter.csv.CSVIndexWriter]] ||
Each `<writer>` element contains two child elements: `<mapping>` and `<parameters>`
@@ -213, +215 @@
|| batch.dump || '''true''' to send documents to a local file. || false ||
|| batch.maxSize || Maximum number of documents to send as a batch to <<GetText(CloudSearch)>>. || -1 ||
+ === CSV indexer properties ===
+
+ ||'''Parameter Name''' ||'''Description''' ||'''Default value''' ||
+ || fields || Ordered list of fields (columns) in the CSV file || id,title,content ||
+ || charset || Encoding of CSV file || UTF-8 ||
+ || separator || Separator between fields (columns) || , ||
+ || valuesep || Separator between multiple values of one field || | ||
+ || quotechar || Quote character used to quote fields containing separators or quotes || " ||
+ || escapechar || Escape character used to escape a quote character || " ||
+ || maxfieldlength || Maximum length of a single field value, in characters || 4096 ||
+ || maxfieldvalues || Maximum number of values of one field; useful for multi-valued fields such as anchor texts || 12 ||
+ || header || Write CSV column headers || true ||
+ || outpath || Output path / directory || csvindexwriter ||
+
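For orientation, the parameters in the table above could be set in index-writers.xml roughly as follows. This is a minimal sketch, not text from the wiki page: the writer `id` value is only illustrative, every parameter value is simply the default listed in the table, the `<mapping>` children are left empty, and the enclosing root element of index-writers.xml is omitted.

```xml
<!-- Sketch of one <writer> entry for the CSV index writer.
     The id is an illustrative assumption; the class name is the one linked above. -->
<writer id="indexer_csv_1" class="org.apache.nutch.indexwriter.csv.CSVIndexWriter">
  <parameters>
    <!-- Each value below is the default from the parameter table. -->
    <param name="fields" value="id,title,content"/>
    <param name="charset" value="UTF-8"/>
    <param name="separator" value=","/>
    <param name="valuesep" value="|"/>
    <!-- A literal double quote must be written as &quot; inside an XML attribute. -->
    <param name="quotechar" value="&quot;"/>
    <param name="escapechar" value="&quot;"/>
    <param name="maxfieldlength" value="4096"/>
    <param name="maxfieldvalues" value="12"/>
    <param name="header" value="true"/>
    <param name="outpath" value="csvindexwriter"/>
  </parameters>
  <mapping>
    <!-- No field copies, renames, or removals in this sketch. -->
    <copy/>
    <rename/>
    <remove/>
  </mapping>
</writer>
```

Because `quotechar` and `escapechar` share the same default, a quote inside a quoted field would be escaped by doubling it, which is the usual CSV convention.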