Posted to user@tika.apache.org by Donald Mennerich <do...@nypl.org> on 2013/06/12 16:10:53 UTC

Streaming structured text

Hello,

Can someone point me in the right direction for streaming the structured
XHTML output from a Tika Parser? The closest I have come is using a
BodyContentHandler, as below.

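        // Assumes `tika` is an org.apache.tika.Tika instance and `f` a java.io.File
        Parser parser = tika.getParser();
        ParseContext context = new ParseContext();
        context.set(Locale.class, Locale.ENGLISH);
        PrintStream printer = new PrintStream(System.out);
        // BodyContentHandler passes on only the body text, so the XHTML tags are lost
        ContentHandler handler = new BodyContentHandler(printer);
        Metadata mtdt = new Metadata();
        // try-with-resources closes the file stream even if parsing fails
        try (InputStream stream = new FileInputStream(f)) {
            parser.parse(stream, handler, mtdt, context);
        }
        printer.close();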

Is there a ContentHandler that can do this easily? I apologize that my
comprehension of the SAX API is minimal at best.

Thanks,

Don
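With an output stream as its target, BodyContentHandler writes out only the character content of the XHTML body, which is why the element structure never reaches the PrintStream. The JDK-only sketch below imitates that visible effect with a hypothetical `TextOnlyHandler` that keeps `characters()` events and ignores element events. The class names `TextOnlyDemo` and `TextOnlyHandler` are made up for illustration, the sketch does not reproduce how Tika actually implements BodyContentHandler, and it replays a hand-written `<p>Hello</p>` event stream instead of running a real parser.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TextOnlyDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical handler: keeps character data and ignores every element event.
    // It imitates what BodyContentHandler's text output looks like, not Tika's code.
    static class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String getText() {
            return text.toString();
        }
    }

    // Replays a hand-written XHTML-like event stream in place of a real parser.
    static void emitSample(ContentHandler h) throws SAXException {
        char[] hello = "Hello".toCharArray();
        h.startDocument();
        h.startElement(XHTML, "p", "p", new AttributesImpl());
        h.characters(hello, 0, hello.length);
        h.endElement(XHTML, "p", "p");
        h.endDocument();
    }

    public static String render() {
        TextOnlyHandler handler = new TextOnlyHandler();
        try {
            emitSample(handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }

    public static void main(String[] args) {
        System.out.println(render()); // prints Hello: the <p> tags are dropped
    }
}
```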

Re: Streaming structured text

Posted by Donald Mennerich <do...@nypl.org>.
Two minutes after I had given up, I got it working:

        ContentHandler handler = new XMLWriter(printer);

Sorry for the post!

Don
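The fix works because the handler serializes every SAX event, tags included, instead of keeping only the text. A comparable result can be had with nothing but the JDK by using a `javax.xml.transform.sax.TransformerHandler` (an identity transform) as the ContentHandler; this is a swapped-in technique, not the `XMLWriter` from the reply. In the minimal sketch below, `XmlSerializerDemo`, `xmlHandler` and `emitSample` are hypothetical names, and a hand-written `<p>Hello</p>` event stream stands in for a Tika parser.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class XmlSerializerDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: returns a ContentHandler that serializes the SAX
    // event stream to `out` as XML, so element tags are kept along with the text.
    static TransformerHandler xmlHandler(Writer out) throws TransformerConfigurationException {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Replays a hand-written XHTML-like event stream in place of a real parser.
    static void emitSample(ContentHandler h) throws SAXException {
        char[] hello = "Hello".toCharArray();
        h.startDocument();
        h.startPrefixMapping("", XHTML);
        h.startElement(XHTML, "p", "p", new AttributesImpl());
        h.characters(hello, 0, hello.length);
        h.endElement(XHTML, "p", "p");
        h.endPrefixMapping("");
        h.endDocument();
    }

    public static String render() {
        StringWriter out = new StringWriter();
        try {
            emitSample(xmlHandler(out));
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render()); // the <p> element survives, unlike the text-only case
    }
}
```

In the code from the question, such a handler could presumably be given a `StreamResult` wrapping `System.out` and passed to `parser.parse(...)` in place of the BodyContentHandler. Tika also ships `org.apache.tika.sax.ToXMLContentHandler`, which may serve the same purpose; check the Javadoc for the Tika version in use.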
