Posted to solr-user@lucene.apache.org by Michael Kelleher <mj...@gmail.com> on 2011/12/12 14:06:51 UTC
ExtractingRequestHandler and HTML
I am submitting HTML documents to Solr using the ExtractingRequestHandler
(ERH). Is it possible to store the full contents of a document, including
all markup, in a field?
Using fmap.content (I am assuming the content comes from Tika) stores the
extracted text of the document in a field, but not the markup. I want
the whole unaltered document.
Is this possible?
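A minimal sketch of the kind of request described above, to make the fmap.content mapping concrete. The host, port, handler path (/update/extract), document id, and the field name body_text are all assumptions for illustration, not details from my actual setup. It only builds the URL; the HTML file itself would be sent as the request body.

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Solr with the extracting handler at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, content_field):
    """Build an extract request URL that maps Tika's extracted "content"
    onto the given Solr field (field name is hypothetical)."""
    params = {
        "literal.id": doc_id,           # sets the document's id field directly
        "fmap.content": content_field,  # routes extracted text into this field
        "commit": "true",
    }
    return SOLR_EXTRACT_URL + "?" + urlencode(params)


url = build_extract_url("doc1", "body_text")
print(url)
```

With a request like this, body_text ends up holding only the extracted text, which is the behavior I am asking about.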
thanks
--mike