Posted to dev@tika.apache.org by "Torsten Krah (Created) (JIRA)" <ji...@apache.org> on 2011/10/21 16:06:32 UTC

[jira] [Created] (TIKA-760) NPE XHTMLContentHandler in characters Method

NPE XHTMLContentHandler in characters Method
--------------------------------------------

                 Key: TIKA-760
                 URL: https://issues.apache.org/jira/browse/TIKA-760
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.10
         Environment: JDK 1.6, Linux
            Reporter: Torsten Krah


The method:

    public void characters(String characters) throws SAXException {
        characters(characters.toCharArray(), 0, characters.length());
    }

does not check for null values.
Many call sites check for null before calling this method. However, others, e.g. HSLFExtractor, pass values without checking them:

xhtml.characters( comment.getAuthor() );

which may be null.

The simplest fix would be to check for null in the handler and, if the value is null, either treat the call as a no-op or insert the Unicode "replacement character" (U+FFFD) so that the user can decide how to handle it.
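
As an illustration of the two options proposed above, here is a minimal sketch. It is not the actual Tika code: the class NullSafeHandlerSketch, its replaceNulls switch and getText() are hypothetical names invented for this example, and the class extends the JDK's DefaultHandler only so that it compiles on its own.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a handler with a String overload; not Tika's class. */
class NullSafeHandlerSketch extends DefaultHandler {
    /** U+FFFD, the Unicode replacement character. */
    private static final String REPLACEMENT = "\uFFFD";

    /** Assumed switch: true inserts U+FFFD for null, false ignores null. */
    private final boolean replaceNulls;
    private final StringBuilder text = new StringBuilder();

    NullSafeHandlerSketch(boolean replaceNulls) {
        this.replaceNulls = replaceNulls;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Collect the text so the effect of each call can be inspected.
        text.append(ch, start, length);
    }

    /** Convenience overload that tolerates null instead of throwing an NPE. */
    public void characters(String characters) throws SAXException {
        if (characters == null) {
            if (!replaceNulls) {
                return; // option 1: treat null as a no-op
            }
            characters = REPLACEMENT; // option 2: make the gap visible
        }
        characters(characters.toCharArray(), 0, characters.length());
    }

    String getText() {
        return text.toString();
    }

    public static void main(String[] args) throws SAXException {
        NullSafeHandlerSketch handler = new NullSafeHandlerSketch(false);
        handler.characters("Author: ");
        handler.characters((String) null); // would throw an NPE without the check
        handler.characters("unknown");
        System.out.println(handler.getText()); // prints "Author: unknown"
    }
}
```

With the no-op variant, callers such as the getAuthor() example no longer need their own null check; the replacement-character variant trades that silence for a visible marker in the output.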


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-760) NPE XHTMLContentHandler in characters Method

Posted by "Pablo Queixalos (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133880#comment-13133880 ] 

Pablo Queixalos commented on TIKA-760:
--------------------------------------

Concerning the HSLFExtractor, this is already fixed in trunk. getAuthor() is checked before calling xhtml.characters( comment.getAuthor() );
                


[jira] [Resolved] (TIKA-760) NPE XHTMLContentHandler in characters Method

Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-760.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
    


[jira] [Commented] (TIKA-760) NPE XHTMLContentHandler in characters Method

Posted by "Torsten Krah (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133898#comment-13133898 ] 

Torsten Krah commented on TIKA-760:
-----------------------------------

Yeah, but there are more unchecked calls to this method, not only the author call. So IMHO the best option here is to be defensive: check for null in the handler, treat it as a no-op, and log a warning or error. At the very least it should not fail here.
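
A minimal sketch of that defensive variant, assuming java.util.logging as the logging facility (the thread does not say which one would be used). The class LoggingNullSafeHandlerSketch, its skippedNulls counter and the log message are hypothetical and only illustrate the idea; this is not the change that was committed.

```java
import java.util.logging.Logger;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that ignores null text but logs a warning. */
class LoggingNullSafeHandlerSketch extends DefaultHandler {
    private static final Logger LOG =
            Logger.getLogger(LoggingNullSafeHandlerSketch.class.getName());

    private final StringBuilder text = new StringBuilder();
    private int skippedNulls = 0; // assumed counter, handy for diagnostics

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);
    }

    /** Null input is logged and skipped instead of failing the whole parse. */
    public void characters(String characters) throws SAXException {
        if (characters == null) {
            skippedNulls++;
            LOG.warning("characters(String) called with null; ignoring");
            return;
        }
        characters(characters.toCharArray(), 0, characters.length());
    }

    String getText() {
        return text.toString();
    }

    int getSkippedNulls() {
        return skippedNulls;
    }
}
```

The warning keeps the unchecked call sites discoverable, so they can still be fixed one by one without a single null value aborting extraction.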

                


[jira] [Commented] (TIKA-760) NPE XHTMLContentHandler in characters Method

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192189#comment-13192189 ] 

Nick Burch commented on TIKA-760:
---------------------------------

NPE check added in r1235284.
                
