Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/12/07 16:47:55 UTC
[Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "ExtractingRequestHandler" page has been changed by GrantIngersoll.
http://wiki.apache.org/solr/ExtractingRequestHandler?action=diff&rev1=50&rev2=51
--------------------------------------------------
The tika.config entry points to a file containing a Tika configuration. You would only need this if you have customized your own Tika configuration. The Tika config contains info about parsers, mime types, etc.
- You may also need to adjust the {{{multipartUploadLimitInKB}}} attribute as follows if you are submitting very large documents. The {{{enableRemoteStreaming}}} is not used by the !ExtractingRequestHandler.
+ You may also need to adjust the {{{multipartUploadLimitInKB}}} attribute as follows if you are submitting very large documents. The {{{enableRemoteStreaming}}} attribute can also be used by the !ExtractingRequestHandler.
+ To use remote streaming, you must turn it on in your solrconfig.xml:
{{{
<requestDispatcher handleSelect="true" >
- <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="20480" />
+ <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
....
}}}
+
+ See ContentStreams for more information. For example, to index a file using remote streaming:
+ {{{
+ curl "http://localhost:8983/solr/update/extract?stream.file=/path/to/file/StatesLeftToVisit.doc&stream.contentType=application/msword&literal.id=states.doc"
+ }}}
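The request above can also be assembled from its parts, which makes it easier to see what each parameter carries. This is only a minimal sketch: the helper name build_extract_url is hypothetical, the host and port are assumed to be the defaults used in the example, and no URL encoding is performed, so paths or ids containing spaces or other reserved characters would need to be encoded first.

```shell
#!/bin/sh
# Assumed base URL of the Solr instance (same as the wiki example).
SOLR_URL="http://localhost:8983/solr"

# Hypothetical helper: prints an /update/extract URL that uses remote streaming.
#   $1 = path of the file as seen by the Solr server (stream.file)
#   $2 = MIME type of the file (stream.contentType)
#   $3 = value for the id field (literal.id)
# Note: arguments are inserted as-is, with no URL encoding.
build_extract_url() {
  printf '%s/update/extract?stream.file=%s&stream.contentType=%s&literal.id=%s\n' \
    "$SOLR_URL" "$1" "$2" "$3"
}

url=$(build_extract_url "/path/to/file/StatesLeftToVisit.doc" "application/msword" "states.doc")
echo "$url"

# To send the request, a running Solr with enableRemoteStreaming="true" is required:
# curl "$url"
```

Because stream.file is resolved on the Solr server, the path must be readable by the Solr process, not by the machine running curl.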
+
+
Lastly, the date.formats entry lets you specify java.text.SimpleDateFormat patterns used to transform extracted input into a Date. Solr comes configured with the following date formats (see the DateUtil class in Solr):
{{{