Posted to dev@tika.apache.org by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2012/03/07 22:30:58 UTC

[jira] [Updated] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

     [ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-870:
------------------------------------

    Attachment: TIKA-870.patch

Patch, with the sample code plus a test case.

The test case failed at first! I.e., the returned string was longer than the specified limit. I dug and discovered that WriteOutContentHandler wasn't overriding ignorableWhitespace, so whitespace delivered through that callback was never counted against the limit. I added that override and now the test passes.
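
To illustrate the kind of fix described above, here is a minimal sketch of a SAX handler that counts both characters and ignorableWhitespace against a length limit. It is not the code from the patch: the class name LimitedTextHandler, the exception message, and the assumption of a non-negative limit are all hypothetical, and it only uses the JDK's org.xml.sax classes rather than Tika itself.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-in for a length-limited content handler.
// Assumes limit >= 0; this is an illustration, not Tika's implementation.
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int limit;

    LimitedTextHandler(int limit) {
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        append(ch, start, length);
    }

    // Without this override, whitespace reported through this callback
    // would not go through the limit check in append().
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        append(ch, start, length);
    }

    // Appends as much as still fits, then signals that the limit was hit.
    private void append(char[] ch, int start, int length) throws SAXException {
        int remaining = limit - text.length();
        if (length > remaining) {
            text.append(ch, start, remaining);
            throw new SAXException("write limit reached");
        }
        text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

Routing both callbacks through one private method keeps the limit check in a single place, so any text-bearing SAX event is counted the same way.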

I think it's ready...
                
> Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-870
>                 URL: https://issues.apache.org/jira/browse/TIKA-870
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Shay Banon
>            Assignee: Michael McCandless
>         Attachments: TIKA-870.patch
>
>
> It would be great to be able to call parseToString with an additional maxStringLength parameter, instead of having to set it on the Tika instance. This would allow the limit to be set per parse call. Sample code:
> {code}
> public String parseToString(InputStream stream, Metadata metadata, int maxStringLength)
>         throws IOException, TikaException {
>     WriteOutContentHandler handler =
>         new WriteOutContentHandler(maxStringLength);
>     try {
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>         parser.parse(
>                 stream, new BodyContentHandler(handler), metadata, context);
>     } catch (SAXException e) {
>         if (!handler.isWriteLimitReached(e)) {
>             // This should never happen with BodyContentHandler...
>             throw new TikaException("Unexpected SAX processing failure", e);
>         }
>     } finally {
>         stream.close();
>     }
>     return handler.toString();
> }
> {code}
