Posted to user@tika.apache.org by Florent André <fl...@4sengines.com> on 2010/01/25 15:50:07 UTC
Remove headers from the parser
Hello,
I use AutoDetectParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler, Metadata metadata).
I call the parse function many times with the same ContentHandler.
My problem is:
- on each parse, Tika sends the XML declaration
(<?xml version="1.0" encoding="UTF-8"?>) to the ContentHandler.
This is a problem for me, because it prevents me from processing the
ContentHandler output with a SAX component (a Cocoon transformer).
For example, after using Tika, my output is:
<root>
<documentparse id="1" <?xml version="1.0" encoding="UTF-8"?>>
<html>
... content from tika
</html>
<documentparse id="2" <?xml version="1.0" encoding="UTF-8"?>>
<html>
... content from tika
</html>
</documentparse>
Is there a way to deactivate the sending of the XML header?
Thanks in advance,
++
Re: Remove headers from the parser
Posted by Florent André <fl...@4sengines.com>.
Thanks, it works like a charm.
HAND
On Mon, 25 Jan 2010 20:40:59 +0100, Jukka Zitting <ju...@gmail.com>
wrote:
> Hi,
>
> On Mon, Jan 25, 2010 at 3:50 PM, Florent André
> <fl...@4sengines.com> wrote:
>> I use the parse function many times with the same ContentHandler.
>> [...]
>> Is there a way to deactivate the sending of the XML header?
>
> Check out the EmbeddedContentHandler [1] wrapper that's designed for
> this purpose.
>
> [1] http://lucene.apache.org/tika/0.5/api/org/apache/tika/sax/EmbeddedContentHandler.html
>
> BR,
>
> Jukka Zitting
Re: Remove headers from the parser
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Jan 25, 2010 at 3:50 PM, Florent André
<fl...@4sengines.com> wrote:
> I use the parse function many times with the same ContentHandler.
> [...]
> Is there a way to deactivate the sending of the XML header?
Check out the EmbeddedContentHandler [1] wrapper that's designed for
this purpose.
[1] http://lucene.apache.org/tika/0.5/api/org/apache/tika/sax/EmbeddedContentHandler.html
BR,
Jukka Zitting
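
For reference, here is a minimal sketch of the idea behind such a wrapper, using only JDK classes so it runs without Tika. The class NoDocumentEvents is a hypothetical stand-in for EmbeddedContentHandler: it forwards every SAX event except startDocument() and endDocument(), so several parses can be embedded in one target document. The element names root and documentparse come from the example above; the JDK SAX parser stands in for AutoDetectParser, and the class and method names are assumptions made for illustration. With Tika itself, the equivalent would presumably be to pass new EmbeddedContentHandler(handler) to AutoDetectParser.parse(...) instead of the raw handler.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class EmbeddedSketch {

    // Hypothetical stand-in for Tika's EmbeddedContentHandler:
    // forwards all SAX events to the target except the document
    // start/end events, which are silently dropped.
    static class NoDocumentEvents extends XMLFilterImpl {
        NoDocumentEvents(ContentHandler target) {
            setContentHandler(target);
        }

        @Override
        public void startDocument() {
            // dropped: the outer document has already been started
        }

        @Override
        public void endDocument() {
            // dropped: the outer document is ended by the caller
        }
    }

    // Parses each input document and embeds its events inside
    // <root><documentparse id="n">...</documentparse></root>.
    static String combine(String... documents) throws Exception {
        StringWriter out = new StringWriter();
        SAXTransformerFactory tf =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.setResult(new StreamResult(out));

        ContentHandler embedded = new NoDocumentEvents(serializer);
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);

        // The outer document is started and ended exactly once, by us.
        serializer.startDocument();
        serializer.startElement("", "root", "root", new AttributesImpl());
        int id = 1;
        for (String document : documents) {
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "id", "id", "CDATA", String.valueOf(id++));
            serializer.startElement("", "documentparse", "documentparse", attrs);

            // The JDK parser plays the role of the Tika parser here;
            // it always fires startDocument()/endDocument().
            XMLReader reader = pf.newSAXParser().getXMLReader();
            reader.setContentHandler(embedded);
            reader.parse(new InputSource(new StringReader(document)));

            serializer.endElement("", "documentparse", "documentparse");
        }
        serializer.endElement("", "root", "root");
        serializer.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(combine(
                "<html><body>one</body></html>",
                "<html><body>two</body></html>"));
    }
}
```

The target handler sees a single document start and a single document end, so a serializer behind it has only one opportunity to write the XML declaration, regardless of how many documents are parsed into it.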