Posted to user@tika.apache.org by Donald Mennerich <do...@nypl.org> on 2013/06/12 16:10:53 UTC
Streaming structured text
Hello,
Can someone point me in the right direction for streaming the structured
XHTML output from a Tika Parser? The closest I am getting is using a
BodyContentHandler, as below.
Parser parser = tika.getParser();
ParseContext context = new ParseContext();
context.set(Locale.class, Locale.ENGLISH);
PrintStream printer = new PrintStream(System.out);
ContentHandler handler = new BodyContentHandler(printer);
Metadata mtdt = new Metadata();
parser.parse(new FileInputStream(f), handler, mtdt, context);
printer.close();
Is there a ContentHandler that can do this easily? I apologize that my
comprehension of the SAX API is minimal at best.
Thanks,
Don
Re: Streaming structured text
Posted by Donald Mennerich <do...@nypl.org>.
Two minutes after I had given up, I got it working:
ContentHandler handler = new XMLWriter(printer);
Sorry for the post!
Don
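For anyone who finds this thread later: the message above does not say which library its XMLWriter comes from, so here is a rough sketch of the same idea using only the JDK. It swaps in a JAXP TransformerHandler, which is itself a SAX ContentHandler and writes each event to the output stream as it arrives. The class name StreamingXhtmlDemo and both method names are made up for illustration, and the hand-fired SAX events only stand in for what a Tika parser would emit. With Tika on the classpath, the handler would presumably be passed as the second argument to parser.parse(...), in place of the BodyContentHandler from the original post.

```java
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; not part of Tika.
public class StreamingXhtmlDemo {

    // Builds a ContentHandler that serializes incoming SAX events to 'out' as XML.
    public static TransformerHandler newStreamingHandler(OutputStream out)
            throws TransformerConfigurationException {
        // Assumption: the default JDK TransformerFactory supports SAX (it does in OpenJDK).
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Stand-in for a parser: fires the SAX events for a tiny XHTML-like document.
    public static void emitSampleDocument(ContentHandler handler) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        char[] text = "Hello".toCharArray();
        handler.startDocument();
        handler.startElement("", "html", "html", noAttributes);
        handler.startElement("", "body", "body", noAttributes);
        handler.startElement("", "p", "p", noAttributes);
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        // System.out is deliberately left open so later output is not lost.
        emitSampleDocument(newStreamingHandler(System.out));
        System.out.println();
    }
}
```

Unlike BodyContentHandler, which writes only the character content of the body, a serializing handler like this keeps the element structure in the output.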