Posted to solr-user@lucene.apache.org by "Dan Hertz (Insight 49, LLC)" <in...@gmail.com> on 2010/03/03 21:54:22 UTC

Escaping options for tika/solr cell extract-only output

Looking at http://wiki.apache.org/solr/ExtractingRequestHandler:

Extract Only
"the output includes XML generated by Tika (and is hence further escaped 
by Solr's XML)"

...is there an option to NOT have the resulting TIKA output escaped?

so &lt;head&gt; would come back as <head>

If not, what would need to be done to enable this option? Looked into 
SOLR-1274.patch, but didn't see a parameter for such a thing.

Thanks,

Dan

Re: Escaping options for tika/solr cell extract-only output

Posted by Chris Hostetter <ho...@fucit.org>.
: You can return it with any of the other writers, like JSON or PHP.

The key being that the output from Tika is content -- that content just so 
happens to be a string containing XML -- which is then formatted by a 
response writer.
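
To illustrate the point, here is a rough Python sketch (not Solr code) of
one string being embedded by an XML-style writer and by a JSON-style writer.
The sample markup, the "demo.pdf" key and the response layout are made up
for the example; Solr's real writers may differ in detail. Either way, a
client that parses the response gets the original string back.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical Tika output: to the response writer this is just a string.
tika_output = '<html><head><title>Demo</title></head><body/></html>'

# An XML writer has to escape the string to embed it as element text.
xml_response = (
    '<response><str name="demo.pdf">' + escape(tika_output) + '</str></response>'
)

# A JSON writer embeds the same string as a JSON string value instead.
json_response = json.dumps({"demo.pdf": tika_output})

# Parsing either response recovers the identical content.
from_xml = ET.fromstring(xml_response).find("str").text
from_json = json.loads(json_response)["demo.pdf"]

print(xml_response)
print(json_response)
```

So the escaping is a property of the XML serialization, not of the content.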

(although given that ExtractingRequestHandler has an extract only mode, 
it could probably be modified pretty easily so that it would play nicely 
with the RawResponseWriter if asked)

-Hoss


Re: Escaping options for tika/solr cell extract-only output

Posted by Lance Norskog <go...@gmail.com>.
You can return it with any of the other writers, like JSON or PHP.
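
For example, something along these lines should build an extract-only
request that asks for the JSON writer. The host, port and handler path
below are assumptions about a default setup; adjust them to wherever your
ExtractingRequestHandler is mapped.

```python
from urllib.parse import urlencode

# Assumed location of the extracting handler on a default local install.
base_url = "http://localhost:8983/solr/update/extract"

# extractOnly=true returns the Tika output without indexing it;
# wt=json selects the JSON response writer instead of the XML one.
params = {"extractOnly": "true", "wt": "json"}

url = base_url + "?" + urlencode(params)
print(url)
```

The document itself would then be POSTed to that URL as usual.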

The alternative design decision for the XML output writer would be to
emit using CDATA instead of escaping.
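
A small Python sketch of the two options, to show that a parsing client
sees the same string either way. This is not Solr code; to_cdata is a
hypothetical helper and the element layout is invented for the example.
Note that CDATA needs its own special case: the content must not contain
"]]>", so that sequence has to be split across two sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' so the section stays valid."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical Tika output, including an entity and a ']]>' sequence.
tika_output = '<html><head/><body>a &amp; b ]]> c</body></html>'

# Option 1: escape the markup characters (what the XML writer does today).
escaped_form = '<str name="doc">' + escape(tika_output) + '</str>'

# Option 2: emit the string inside a CDATA section.
cdata_form = '<str name="doc">' + to_cdata(tika_output) + '</str>'

# Both forms parse back to exactly the same string.
from_escaped = ET.fromstring(escaped_form).text
from_cdata = ET.fromstring(cdata_form).text
print(from_escaped == from_cdata == tika_output)
```

CDATA is easier on the eyes when reading the raw response, but it does not
change what an XML parser hands back.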

On Wed, Mar 3, 2010 at 12:54 PM, Dan Hertz (Insight 49, LLC)
<in...@gmail.com> wrote:
> Looking at http://wiki.apache.org/solr/ExtractingRequestHandler:
>
> Extract Only
> "the output includes XML generated by Tika (and is hence further escaped by
> Solr's XML)"
>
> ...is there an option to NOT have the resulting TIKA output escaped?
>
> so &lt;head&gt; would come back as <head>
>
> If not, what would need to be done to enable this option? Looked into
> SOLR-1274.patch, but didn't see a parameter for such a thing.
>
> Thanks,
>
> Dan
>



-- 
Lance Norskog
goksron@gmail.com