Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/12/12 20:19:44 UTC

[jira] Created: (TIKA-182) Allow clients to listen to the raw SAX events if available

Allow clients to listen to the raw SAX events if available
----------------------------------------------------------

                 Key: TIKA-182
                 URL: https://issues.apache.org/jira/browse/TIKA-182
             Project: Tika
          Issue Type: New Feature
          Components: parser
            Reporter: Jukka Zitting
            Priority: Minor


As discussed on the mailing list (http://markmail.org/message/gojiffbhlcuifnzd), it would be nice to allow clients to listen to the raw SAX events of an underlying XML-based (or XML-like) document.

There's a proposed patch for the HTML parser at http://markmail.org/message/l72v6ybf4jjrcp7p
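
To make the request concrete, here is a minimal sketch of what "listening to the raw SAX events" could look like. It is not the proposed patch and does not use any Tika API: the class name RawSaxTee and the helper collectRawEvents are hypothetical, and the JDK's built-in XML parser stands in for the underlying parser library. The idea is only that a tee handler forwards each low-level event to a client-supplied listener as well as to the handler that does the regular processing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical tee: forwards raw SAX events to a client listener and a primary handler. */
public class RawSaxTee extends DefaultHandler {
    private final ContentHandler primary;      // handler doing the regular processing
    private final ContentHandler rawListener;  // client interested in the raw events

    public RawSaxTee(ContentHandler primary, ContentHandler rawListener) {
        this.primary = primary;
        this.rawListener = rawListener;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        rawListener.startElement(uri, localName, qName, atts);
        primary.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        rawListener.endElement(uri, localName, qName);
        primary.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        rawListener.characters(ch, start, length);
        primary.characters(ch, start, length);
    }

    /** Parses the given XML and returns the element events seen by the raw listener. */
    public static List<String> collectRawEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        ContentHandler listener = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // The no-op DefaultHandler stands in for the regular content handler.
        reader.setContentHandler(new RawSaxTee(new DefaultHandler(), listener));
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectRawEvents("<html><body><p>Hi</p></body></html>"));
    }
}
```

A client written against such a listener sees whatever element and text events the underlying parser happens to emit, so its behaviour depends on that parser.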

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TIKA-182) Allow clients to listen to the raw SAX events if available

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-182.
--------------------------------

    Resolution: Won't Fix

Resolving as Won't Fix based on the reasoning in my comment on this issue.




[jira] Commented: (TIKA-182) Allow clients to listen to the raw SAX events if available

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656484#action_12656484 ] 

Jukka Zitting commented on TIKA-182:
------------------------------------

After thinking about this a bit more, I find myself reluctant to apply this patch. Adding such a low-level extension point essentially prevents us from switching to some other parser library that doesn't generate those low-level SAX events. For example, I wouldn't rule out the possibility that at some point we'd want to replace NekoHTML with a higher-level HTML parser that better reflects how the HTML content gets presented to the user.

