Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/10/28 00:37:12 UTC

[Solr Wiki] Update of "ExtractingRequestHandler" by PeterWolanin

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ExtractingRequestHandler" page has been changed by PeterWolanin.
http://wiki.apache.org/solr/ExtractingRequestHandler?action=diff&rev1=49&rev2=50

--------------------------------------------------

   * xpath=<XPath expression> - When extracting, only return Tika XHTML content that satisfies the XPath expression.  See http://lucene.apache.org/tika/documentation.html for details on the format of Tika XHTML.  See also TikaExtractOnlyExampleOutput.
   * lowernames=true|false - Map all field names to lowercase with underscores.  For example, Content-Type would be mapped to content_type.
  
+ Additional input parameters that apply when extractOnly is true:
+ 
+  * extractFormat=xml|text - Default is xml.  Controls the serialization format of the extracted content.  The xml format is actually XHTML, the same output as passing the -x option to the Tika command-line application, while text corresponds to the -t option.  See [[https://issues.apache.org/jira/browse/SOLR-1274|SOLR-1274]].
+ 
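As a rough illustration of how the extractOnly and extractFormat parameters described above combine, the following Python sketch builds the query string for an extract-only request. The helper name build_extract_only_url and the handler URL http://localhost:8983/solr/update/extract are assumptions made for this example and are not taken from the page; substitute the path where the handler is registered in your own setup. The sketch only assembles the URL and does not send a document.

```python
from urllib.parse import urlencode


def build_extract_only_url(base_url, extract_format="xml", xpath=None):
    """Build a query URL for an extract-only request (hypothetical helper).

    base_url is assumed to point at the extracting handler; the value used
    in the example below is an assumption, not a documented default.
    """
    # The page lists xml and text as the only extractFormat values.
    if extract_format not in ("xml", "text"):
        raise ValueError("extractFormat must be 'xml' or 'text'")

    params = {"extractOnly": "true", "extractFormat": extract_format}
    if xpath is not None:
        # Optional: restrict the returned Tika XHTML to matching content.
        params["xpath"] = xpath

    return base_url + "?" + urlencode(params)


url = build_extract_only_url("http://localhost:8983/solr/update/extract", "text")
print(url)
```

The document itself would still have to be sent as the body of the request; that step is omitted here because it depends on the HTTP client in use.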
  
  == Order of field operations ==
   1. fields are generated by Tika or passed in as literals via {{{literal.fieldname=value}}}