Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/10 15:54:50 UTC
[Solr Wiki] Update of "ExtractingRequestHandler" by YonikSeeley
The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/ExtractingRequestHandler
The comment on the change is:
move the TODO out of the finished top part
------------------------------------------------------------------------------
And then query via http://localhost:8983/solr/select?q=attr_content:tutorial
-
- // TODO: move this somewhere else to a more in-depth discussion of different ways to send the data to Solr (prob with remoteStreaming discussion)
- * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text --data-binary @tutorial.html -H 'Content-type:text/html'
- <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.
-
= Input Parameters =
* map.<source_field>=<target_field> - Maps (moves) one field name to another. Example: {{{map.content=text}}} will cause the content field normally generated by Tika to be moved to the "text" field.
@@ -186, +181 @@
See TikaExtractOnlyExampleOutput.
+ = Sending documents to Solr =
+
+ // TODO: describe the different ways to send the documents to Solr (POST body, form encoded, remoteStreaming)
+ * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text --data-binary @tutorial.html -H 'Content-type:text/html'
+ <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.
+
== Additional Resources ==
* [http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#example.source Lucid Imagination article]
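
The curl command in the "Sending documents to Solr" section streams the file as the raw POST body. A rough Python equivalent is sketched below. It reuses only the URL, the two `ext.*` parameters, and the `text/html` content type from that command. The function names `build_extract_request` and `post_file` are hypothetical helpers introduced for illustration and are not part of Solr. As with curl, the file name is not passed to Solr, because only the bytes are sent.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example; adjust host and port for your setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(body, content_type="text/html",
                          base_url=SOLR_EXTRACT_URL):
    """Build (but do not send) a POST request carrying the raw document bytes.

    Hypothetical helper: mirrors the query parameters of the curl example.
    """
    params = {"ext.idx.attr": "true", "ext.def.fl": "text"}
    url = base_url + "?" + urlencode(params)
    return Request(url, data=body,
                   headers={"Content-Type": content_type},
                   method="POST")


def post_file(path, content_type="text/html"):
    """Read a local file and stream its bytes to Solr as the POST body.

    Only the file contents are sent, so Solr learns nothing about the name.
    """
    with open(path, "rb") as f:
        body = f.read()
    with urlopen(build_extract_request(body, content_type)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Assuming a Solr instance is running locally, `post_file("tutorial.html")` would correspond to the curl invocation. Building the request separately from sending it makes it possible to inspect the URL and headers without a server.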