Posted to dev@any23.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/04/11 19:06:20 UTC

[jira] [Commented] (ANY23-137) RDFa parser implementation proposal

    [ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966791#comment-13966791 ] 

Lewis John McGibbney commented on ANY23-137:
--------------------------------------------

Regarding the first question above: it all looks good. The changes in {noformat}Any23Test.testExtractionParameters{noformat} appear to be purely aesthetic reformatting rather than functional changes.

I do not think there is any _standard_ way of catching a SAXException. In the past (ANY23-115), for example, when we discovered that empty spans break extraction of some documents, we decided to simply replace empty spans with the String "null". This way the parse and extraction of the entire page is not lost or failed. I would support similar measures for cases where we encounter a SAXException as well.
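As a rough illustration of the approach described above, the following Java sketch fills empty span elements with the literal "null" before parsing (in the spirit of ANY23-115) and wraps parsing so that a SAXException is recorded as an issue instead of aborting the whole extraction. The class and method names (LenientParseSketch, fillEmptySpans, parseLeniently) are hypothetical and are not part of the Any23 API. The sketch uses only the JDK's DOM/SAX classes and assumes XML-style, well-formed input, which is a simplification compared with real-world HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LenientParseSketch {

    // Hypothetical pre-processing step in the spirit of ANY23-115: give every
    // empty <span> (optionally with attributes) the text "null" so it is no
    // longer empty when the document reaches the extractor.
    public static String fillEmptySpans(String markup) {
        return markup.replaceAll("<span((?:\\s[^>]*)?)>\\s*</span>", "<span$1>null</span>");
    }

    // Hypothetical lenient wrapper: a SAXException is recorded in 'issues'
    // and null is returned, instead of the exception aborting the caller.
    public static Document parseLeniently(String markup, List<String> issues) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores recoverable ones,
            // which also stops the parser from printing errors to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(fillEmptySpans(markup))));
        } catch (SAXException e) {
            issues.add("SAX error: " + e.getMessage());
            return null;
        } catch (Exception e) { // ParserConfigurationException, IOException
            issues.add("Parser error: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> issues = new ArrayList<>();
        String[] pages = { "<div><span></span></div>", "<div><span>" };
        for (String page : pages) {
            Document doc = parseLeniently(page, issues);
            System.out.println(page + " -> " + (doc == null ? "skipped" : "parsed"));
        }
        System.out.println("Issues recorded: " + issues);
    }
}
```

The point of the design is the one made in the comment: a malformed page yields a recorded issue and no result for that page, while processing of other pages continues. A real extractor would more likely report the problem through whatever issue-reporting mechanism it already uses rather than a plain List.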

> RDFa parser implementation proposal
> -----------------------------------
>
>                 Key: ANY23-137
>                 URL: https://issues.apache.org/jira/browse/ANY23-137
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Lev Khomich
>            Assignee: Peter Ansell
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow-up to discussion [1].
> I've implemented another RDFa extractor for Any23 (0.7.1).
> Proposed code depends on semargl project [2].
> Pull request located at [3].
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org
> [3] https://github.com/apache/any23/pull/2



--
This message was sent by Atlassian JIRA
(v6.2#6252)