Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/12/07 16:47:55 UTC

[Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ExtractingRequestHandler" page has been changed by GrantIngersoll.
http://wiki.apache.org/solr/ExtractingRequestHandler?action=diff&rev1=50&rev2=51

--------------------------------------------------

  
  The tika.config entry points to a file containing a Tika configuration.  You would only need this if you have customized your own Tika configuration.  The Tika config contains info about parsers, mime types, etc.
  
- You may also need to adjust the {{{multipartUploadLimitInKB}}} attribute as follows if you are submitting very large documents. The {{{enableRemoteStreaming}}} is not used by the !ExtractingRequestHandler.
+ You may also need to adjust the {{{multipartUploadLimitInKB}}} attribute as follows if you are submitting very large documents. The {{{enableRemoteStreaming}}} attribute can be used by the !ExtractingRequestHandler.
+ In your solrconfig.xml, you must turn it on:
  {{{
    <requestDispatcher handleSelect="true" >
-     <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="20480" />
+     <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
      ....
  }}}
+ 
+ See ContentStreams for more information.  For example, with remote streaming enabled, you can index a file that resides on the Solr server's filesystem:
+ {{{
+  curl "http://localhost:8983/solr/update/extract?stream.file=/path/to/file/StatesLeftToVisit.doc&stream.contentType=application/msword&literal.id=states.doc"
+ }}}
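
The curl request above works as written because the example path has no spaces or reserved characters. As a minimal sketch, assuming a file name that does contain spaces, the same URL could be built with Python's standard library so the parameters are percent-encoded. The helper name build_extract_url and the file path are hypothetical and used for illustration only; the parameter names (stream.file, stream.contentType, literal.id) are the ones from the curl example.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, file_path, content_type, doc_id):
    """Build a remote-streaming extract URL (hypothetical helper).

    file_path is interpreted by Solr, so it must exist on the Solr
    server's filesystem, and enableRemoteStreaming must be "true".
    """
    params = {
        "stream.file": file_path,
        "stream.contentType": content_type,
        "literal.id": doc_id,
    }
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return base_url + "?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr/update/extract",
    "/path/to/file/States Left To Visit.doc",  # hypothetical path with spaces
    "application/msword",
    "states.doc",
)
print(url)
```

The resulting URL can be passed to curl in quotes, exactly as in the example above.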
+ 
+ 
  
  Lastly, date.formats lets you specify the java.text.SimpleDateFormat patterns used to convert extracted input to a Date.  Solr comes configured with the following date formats (see the DateUtil class in Solr):
  {{{