Posted to dev@tika.apache.org by "Shay Banon (Created) (JIRA)" <ji...@apache.org> on 2012/03/07 20:44:57 UTC

[jira] [Created] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call
-------------------------------------------------------------------------------------------------------------

                 Key: TIKA-870
                 URL: https://issues.apache.org/jira/browse/TIKA-870
             Project: Tika
          Issue Type: Improvement
            Reporter: Shay Banon


It would be great to be able to call parseToString with an additional maxStringLength parameter, instead of having to set it on the Tika instance. This would allow the limit to be set per parse call. Sample code:

{code}
// Proposed overload for the Tika facade; "parser" is the facade's parser field.
public String parseToString(InputStream stream, Metadata metadata, int maxStringLength)
        throws IOException, TikaException {
    // Collects the extracted text, up to maxStringLength characters.
    WriteOutContentHandler handler =
        new WriteOutContentHandler(maxStringLength);
    try {
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);
        parser.parse(
                stream, new BodyContentHandler(handler), metadata, context);
    } catch (SAXException e) {
        // Reaching the write limit is expected; the truncated text is still returned.
        if (!handler.isWriteLimitReached(e)) {
            // This should never happen with BodyContentHandler...
            throw new TikaException("Unexpected SAX processing failure", e);
        }
    } finally {
        stream.close();
    }
    return handler.toString();
}
{code}
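
The per-call idea can also be sketched without Tika. The following is only an illustration of the overload pattern under stated assumptions: TextExtractor, extract, and the default value are hypothetical names chosen for this sketch, not Tika's API. The instance-level method delegates to an overload that takes the limit as a parameter, so callers sharing one instance can pick different limits without changing its state.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical facade, used only to illustrate the overload pattern.
class TextExtractor {
    // Instance-wide default limit (the value here is an assumption).
    private int maxStringLength = 100 * 1000;

    void setMaxStringLength(int maxStringLength) {
        this.maxStringLength = maxStringLength;
    }

    // Instance-level style: the limit comes from instance state.
    String extract(Reader in) throws IOException {
        return extract(in, maxStringLength);
    }

    // Per-call style: the limit is a parameter of the call itself.
    String extract(Reader in, int maxLength) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        try {
            while (out.length() < maxLength) {
                // Never read more than the remaining room under the limit.
                int n = in.read(buf, 0, Math.min(buf.length, maxLength - out.length()));
                if (n == -1) {
                    break; // end of input
                }
                out.append(buf, 0, n);
            }
        } finally {
            in.close(); // mirrors the sample: the input is always closed
        }
        return out.toString();
    }
}

class TextExtractorDemo {
    public static void main(String[] args) throws IOException {
        TextExtractor extractor = new TextExtractor();
        // Per-call limit of 5 characters.
        System.out.println(extractor.extract(new StringReader("hello world"), 5)); // prints "hello"
        // Falls back to the instance default, which is larger than the input.
        System.out.println(extractor.extract(new StringReader("hello world")));    // prints "hello world"
    }
}
```

Keeping the instance-level method as a thin wrapper means existing callers keep their behavior while new callers can pass a limit directly.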

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

Posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned TIKA-870:
---------------------------------------

    Assignee: Michael McCandless
    


        

[jira] [Commented] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224643#comment-13224643 ] 

Michael McCandless commented on TIKA-870:
-----------------------------------------

I think this makes sense.
                


        

[jira] [Resolved] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

Posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved TIKA-870.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2

Thanks Shay!
                


        

[jira] [Updated] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-870:
------------------------------------

    Attachment: TIKA-870.patch

Patch, with the sample code plus a test case.

The test case failed at first!  I.e., the returned string was over the specified limit. I dug in and discovered that WriteOutContentHandler wasn't overriding (and so wasn't counting) ignorableWhitespace, so I added that override and now the test passes.

I think it's ready...
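
A minimal sketch of the kind of fix described above, using only the SAX classes that ship with the JDK. LimitedTextHandler is a hypothetical stand-in written for this illustration, not Tika's WriteOutContentHandler, and it assumes a non-negative limit. The point it shows is that characters and ignorableWhitespace both go through the same counted write, so whitespace cannot push the output past the limit.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical write-limited handler; not Tika's WriteOutContentHandler.
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final int limit; // maximum number of characters to keep
    private boolean limitReached = false;

    LimitedTextHandler(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(ch, start, length);
    }

    // Counted exactly like characters(); a handler that emits this
    // whitespace without counting it can exceed the limit.
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        write(ch, start, length);
    }

    private void write(char[] ch, int start, int length) throws SAXException {
        int room = limit - out.length();
        if (length > room) {
            // Keep what still fits, then signal that the limit was hit.
            out.append(ch, start, room);
            limitReached = true;
            throw new SAXException("write limit of " + limit + " characters reached");
        }
        out.append(ch, start, length);
    }

    boolean isLimitReached() {
        return limitReached;
    }

    @Override
    public String toString() {
        return out.toString();
    }
}

class LimitedTextHandlerDemo {
    public static void main(String[] args) {
        LimitedTextHandler handler = new LimitedTextHandler(5);
        char[] text = "abc".toCharArray();
        char[] spaces = "    ".toCharArray();
        try {
            handler.characters(text, 0, text.length);
            handler.ignorableWhitespace(spaces, 0, spaces.length);
        } catch (SAXException e) {
            // Expected once the limit is hit; the text gathered so far is kept.
        }
        System.out.println(handler.toString().length()); // prints 5
    }
}
```

A test along these lines (feed text plus whitespace past the limit, then check the result length) is the sort of check that would catch an uncounted callback.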
                
