Posted to dev@tika.apache.org by "Daniel Bonniot de Ruisselet (Created) (JIRA)" <ji...@apache.org> on 2011/12/20 10:01:33 UTC

[jira] [Created] (TIKA-820) Locator is unset for HTML parser

Locator is unset for HTML parser
--------------------------------

                 Key: TIKA-820
                 URL: https://issues.apache.org/jira/browse/TIKA-820
             Project: Tika
          Issue Type: Bug
          Components: general, parser
            Reporter: Daniel Bonniot de Ruisselet
         Attachments: text-locator.patch

The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.

Patch and unit test attached.
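
For readers unfamiliar with the SAX contract being discussed: a parser that supports location tracking is expected to hand the content handler a Locator via setDocumentLocator before startDocument, and the handler can then query it during later events. The following is a minimal sketch of that contract only. It uses the JDK's built-in XML SAX parser rather than Tika's HtmlParser, and the names LocatorDemo and LocatorRecordingHandler are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** Hypothetical handler that remembers the Locator the parser gives it. */
    public static class LocatorRecordingHandler extends DefaultHandler {
        public Locator locator;            // stays null if the parser never supplies one
        public int firstElementLine = -1;  // line reported at the first startElement event

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Only query the locator if the parser actually supplied one.
            if (locator != null && firstElementLine < 0) {
                firstElementLine = locator.getLineNumber();
            }
        }
    }

    /** Parses the given XML with the JDK SAX parser (an assumption; not Tika). */
    public static LocatorRecordingHandler parse(String xml) {
        LocatorRecordingHandler handler = new LocatorRecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        LocatorRecordingHandler h = parse("<html>\n<body>text</body>\n</html>");
        System.out.println("locator set: " + (h.locator != null));
    }
}
```

A handler written this way sees a null locator field when the parser, or any handler wrapped around it, never forwards the call, which is the symptom this issue describes.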

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

Posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-820:
-----------------------------------

    Fix Version/s:     (was: 1.1)
                   1.2

- push out to 1.2
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>              Labels: patch
>             Fix For: 1.2
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Commented] (TIKA-820) Locator is unset for HTML parser

Posted by "Daniel Bonniot de Ruisselet (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173046#comment-13173046 ] 

Daniel Bonniot de Ruisselet commented on TIKA-820:
--------------------------------------------------

Note that the reported line/column values do not seem entirely accurate, but that's a separate issue.
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>            Reporter: Daniel Bonniot de Ruisselet
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

Posted by "Daniel Bonniot de Ruisselet (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Bonniot de Ruisselet updated TIKA-820:
---------------------------------------------

        Fix Version/s: 1.1
    Affects Version/s: 1.0
    
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>              Labels: patch
>             Fix For: 1.1
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Commented] (TIKA-820) Locator is unset for HTML parser

Posted by "Daniel Bonniot de Ruisselet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460573#comment-13460573 ] 

Daniel Bonniot de Ruisselet commented on TIKA-820:
--------------------------------------------------

Hi Ken - Thanks for looking at the patch. I have no idea whether this is the only missing delegating call; it just seemed wrong to me that TextContentHandler did not do it.
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>            Assignee: Ken Krugler
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-820:
-----------------------------------

    Fix Version/s:     (was: 1.2)
                   1.3

- push to 1.3
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

Posted by "Daniel Bonniot de Ruisselet (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Bonniot de Ruisselet updated TIKA-820:
---------------------------------------------

    Attachment: text-locator.patch

Fix+test patch.
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>            Reporter: Daniel Bonniot de Ruisselet
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Commented] (TIKA-820) Locator is unset for HTML parser

Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432246#comment-13432246 ] 

Ken Krugler commented on TIKA-820:
----------------------------------

Hi Daniel - I took a quick look at your patch, and had a question. It looks like the change was for TextContentHandler to call setDocumentLocator on its delegate; is this the only case in Tika where a ContentHandler wasn't delegating the method call properly? Thanks!
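
To illustrate the delegation pattern in question: org.xml.sax.helpers.DefaultHandler implements setDocumentLocator as a no-op, so a wrapper that extends it and overrides only some events silently drops the locator unless it forwards the call explicitly. The sketch below shows such forwarding. It is not Tika's TextContentHandler or the attached patch; ForwardingHandler, RecordingHandler and locatorReachesDelegate are hypothetical names used only for this example.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

public class DelegationDemo {

    /** Hypothetical wrapper that passes text events and the locator to a delegate. */
    public static class ForwardingHandler extends DefaultHandler {
        private final ContentHandler delegate;

        public ForwardingHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void setDocumentLocator(Locator locator) {
            // Without this override, DefaultHandler's no-op would swallow the locator.
            delegate.setDocumentLocator(locator);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            delegate.characters(ch, start, length);
        }
    }

    /** Hypothetical stand-in for the user's content handler. */
    public static class RecordingHandler extends DefaultHandler {
        public Locator seen;

        @Override
        public void setDocumentLocator(Locator locator) {
            seen = locator;
        }
    }

    /** Returns true when the locator passed to the wrapper reaches the wrapped handler. */
    public static boolean locatorReachesDelegate() {
        RecordingHandler user = new RecordingHandler();
        LocatorImpl locator = new LocatorImpl();
        locator.setLineNumber(3);
        new ForwardingHandler(user).setDocumentLocator(locator);
        return user.seen == locator;
    }

    public static void main(String[] args) {
        System.out.println("locator forwarded: " + locatorReachesDelegate());
    }
}
```

A check along these lines could be repeated for each wrapping handler to answer whether other handlers also fail to delegate the call.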
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-820:
-----------------------------------


- push to 1.3
                
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.


[jira] [Assigned] (TIKA-820) Locator is unset for HTML parser

Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler reassigned TIKA-820:
--------------------------------

    Assignee: Ken Krugler
    
> Locator is unset for HTML parser
> --------------------------------
>
>                 Key: TIKA-820
>                 URL: https://issues.apache.org/jira/browse/TIKA-820
>             Project: Tika
>          Issue Type: Bug
>          Components: general, parser
>    Affects Versions: 1.0
>            Reporter: Daniel Bonniot de Ruisselet
>            Assignee: Ken Krugler
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: text-locator.patch
>
>
> The HtmlParser does not call setDocumentLocator(Locator locator) on the user's content handler.
> Patch and unit test attached.
