Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/10 15:54:50 UTC

[Solr Wiki] Update of "ExtractingRequestHandler" by YonikSeeley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/ExtractingRequestHandler

The comment on the change is:
move the TODO out of the finished top part

------------------------------------------------------------------------------
  
   
  And then query via http://localhost:8983/solr/select?q=attr_content:tutorial
- 
- // TODO: move this somewhere else to a more in-depth discussion of different ways to send the data to Solr (prob with remoteStreaming discussion)
-  * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text  --data-binary @tutorial.html  -H 'Content-type:text/html'  
-        <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.
- 
  
  = Input Parameters =
   * map.<source_field>=<target_field> - Maps (moves) one field name to another.  Example: {{{map.content=text}}} will cause the content field normally generated by Tika to be moved to the "text" field.
@@ -186, +181 @@

  
  See TikaExtractOnlyExampleOutput.
  
+ = Sending documents to Solr =
+ 
+ // TODO: describe the different ways to send documents to Solr (POST body, form-encoded, remoteStreaming)
+  * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text  --data-binary @tutorial.html  -H 'Content-type:text/html'  
+        <!> NOTE: this streams the raw file contents as the request body, so Solr receives no information about the name of the file.
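
The curl command above can be wrapped in a small shell helper, sketched here under a few assumptions: the names `SOLR_BASE`, `build_extract_url`, and `post_file` are hypothetical and not part of Solr, and the parameters are simply the ones shown on this page. Quoting the URL also removes the need to escape `&` as `\&`.

```shell
#!/bin/sh
# Assumption: Solr runs at the address used in the example above.
SOLR_BASE="http://localhost:8983/solr"

# Hypothetical helper: print the extract URL for a given default field.
#   $1 = value for ext.def.fl (field that receives the extracted content)
build_extract_url() {
    printf '%s/update/extract?ext.idx.attr=true&ext.def.fl=%s\n' "$SOLR_BASE" "$1"
}

# Hypothetical helper: stream a file to Solr as the raw POST body.
#   $1 = path to the file, $2 = MIME type, $3 = default field
# As noted above, the file name itself is not sent to Solr this way.
post_file() {
    curl "$(build_extract_url "$3")" --data-binary "@$1" -H "Content-type:$2"
}

# Usage (needs a running Solr instance, so it is left commented out):
#   post_file tutorial.html text/html text
```

Calling `post_file tutorial.html text/html text` should issue the same request as the one-line curl command; only the URL construction is factored out so it can be checked without a server.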
+ 
  
  == Additional Resources ==
  * [http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#example.source Lucid Imagination article]