Posted to solr-user@lucene.apache.org by Michael Kelleher <mj...@gmail.com> on 2011/12/12 14:06:51 UTC

ExtractingRequestHandler and HTML

I am submitting HTML documents to Solr using the
ExtractingRequestHandler (ERH).  Is it possible to store the full
contents of a document (including all markup) in a field?  Using
fmap.content (I am assuming this comes from Tika) stores the extracted
text of the document in a field, but not the markup.  I want the whole
unaltered document.

Is this possible?

Thanks

--mike