Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/10/28 00:37:12 UTC
[Solr Wiki] Update of "ExtractingRequestHandler" by PeterWolanin
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "ExtractingRequestHandler" page has been changed by PeterWolanin.
http://wiki.apache.org/solr/ExtractingRequestHandler?action=diff&rev1=49&rev2=50
--------------------------------------------------
* xpath=<XPath expression> - When extracting, only return Tika XHTML content that satisfies the XPath expression. See http://lucene.apache.org/tika/documentation.html for details on the format of Tika XHTML. See also TikaExtractOnlyExampleOutput.
* lowernames=true|false - Map all field names to lowercase with underscores. For example, Content-Type would be mapped to content_type.
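The lowernames mapping can be sketched as follows. This is an illustration only, not the handler's actual code: the helper name lower_name is hypothetical, and the rule that every non-alphanumeric character becomes an underscore is an assumption generalized from the single Content-Type example above.

```python
def lower_name(field_name):
    """Lowercase a field name, replacing non-alphanumeric characters with '_'.

    Assumption: any character that is not a letter or digit maps to an
    underscore. The wiki text only documents the Content-Type case.
    """
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in field_name)


print(lower_name("Content-Type"))  # content_type
```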
+ Additional input parameters when extractOnly is true:
+
+ * extractFormat=xml|text - Default is xml. Controls the serialization format of the extracted content. The xml format is actually XHTML, the same output as passing the -x option to the Tika command-line application, while text corresponds to the -t option. See [[https://issues.apache.org/jira/browse/SOLR-1274|SOLR-1274]].
+
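As a rough illustration of how extractOnly and extractFormat combine in a request, the sketch below only builds the query URL and does not contact a server. The host, port, and the /update/extract handler path are assumptions about a local setup, and build_extract_only_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_extract_only_url(handler_url, extract_format="xml"):
    """Build an extract-only request URL for the given handler URL.

    extract_format must be "xml" (XHTML output, the default) or "text".
    """
    if extract_format not in ("xml", "text"):
        raise ValueError("extractFormat must be 'xml' or 'text'")
    params = {"extractOnly": "true", "extractFormat": extract_format}
    return handler_url + "?" + urlencode(params)


# Assumed local handler location; adjust to match your solrconfig.xml.
url = build_extract_only_url("http://localhost:8983/solr/update/extract", "text")
print(url)
```

The document to extract would still have to be sent as the request body (for example as a file upload); that part is omitted here.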
== Order of field operations ==
1. fields are generated by Tika or passed in as literals via {{{literal.fieldname=value}}}