Posted to dev@jena.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/11 18:34:00 UTC

[jira] [Commented] (JENA-1462) RDF/XML parsing fails on newer/provisional/private URI schemes in base URI

    [ https://issues.apache.org/jira/browse/JENA-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322713#comment-16322713 ] 

ASF GitHub Bot commented on JENA-1462:
--------------------------------------

GitHub user stain opened a pull request:

    https://github.com/apache/jena/pull/341

    JENA-1462: Tests RDF/XML parsing newer URI schemes

    Tests for [JENA-1462](https://issues.apache.org/jira/browse/JENA-1462)
    
    RIOT fails to parse RDF/XML when the base URI uses a scheme other than http, https or file, such as `ssh://example.com/nested/`.
    
    Note: this PR does not fix JENA-1462; it only adds the unit tests and test files.
    
    These tests also highlight a bug in parsing URIs like `file://example.com/etc/passwd`, as described in [JENA-1463](https://issues.apache.org/jira/browse/JENA-1463).
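For reference, here is a minimal sketch (not part of the PR) of the behaviour the report argues for: under generic URI syntax, resolving a relative reference such as `foo.txt` depends only on the structure of the base URI, not on whether its scheme is registered with IANA. It uses `java.net.URI` as an assumed stand-in for a generic resolver; that class follows RFC 2396 rather than RFC 3986, but for these simple bases the result is the same. The class name `SchemeAgnosticResolve` is hypothetical.

```java
import java.net.URI;

// Hypothetical demo class: resolves a relative reference against bases
// with registered, provisional and private schemes. java.net.URI never
// checks the scheme name against any registry; it only parses syntax.
final class SchemeAgnosticResolve {

    // Resolve 'relative' against 'base' using generic URI syntax only.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String[] bases = {
            "http://example.com/nested/",
            "ssh://example.com/nested/",
            "example://example.com/nested/",
            "app://example.com/nested/",
            "org.apache.jena.test://example.com/nested/"
        };
        for (String base : bases) {
            // Each result is the base followed by foo.txt,
            // e.g. ssh://example.com/nested/foo.txt
            System.out.println(resolve(base, "foo.txt"));
        }
    }
}
```

Every base resolves the same way, which is the outcome the new tests expect from the RDF/XML parser as well.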

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/stain/jena JENA-1462

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #341
    
----
commit 6ecd48af6967ca48f985850393ac3b16df31a314
Author: Stian Soiland-Reyes <st...@...>
Date:   2018-01-11T18:12:33Z

    JENA-1462: Tests RDF/XML parsing newer URI schemes
    
    RIOT parsing RDF/XML with a base URI different from http/https/file,
    such as ssh://, fails.
    
    Note as JENA-1462 is not fixed, this only adds the unit tests.

----


> RDF/XML parsing fails on newer/provisional/private URI schemes in base URI
> --------------------------------------------------------------------------
>
>                 Key: JENA-1462
>                 URL: https://issues.apache.org/jira/browse/JENA-1462
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ, RDF/XML
>    Affects Versions: Jena 3.3.0, Jena 3.4.0, Jena 3.5.0, Jena 3.6.0
>         Environment: Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_151, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "linux", version: "4.10.0-42-generic", arch: "amd64", family: "unix"
> Distributor ID:	Ubuntu
> Description:	Ubuntu 16.04.3 LTS
> Release:	16.04
> Codename:	xenial
>            Reporter: Stian Soiland-Reyes
>
> RIOT fails to parse RDF/XML when the base URI uses a scheme other than http, https or file, such as ssh://.
> See https://github.com/stain/jena-test-unregistered-iana for some tests I came up with.
> The tests fail both when the base is set with xml:base and when it is passed to RDFDataMgr, but not when the URIs inside the RDF/XML are absolute.
> {code}
> org.apache.jena.riot.RiotException: [line: 5, col: 40] {E214} Resolving against bad URI <ssh://example.com/nested/>: <foo.txt>
> 	at org.apache.jena.riot.TestParseURISchemeBases.sshBaseRDF(TestParseURISchemeBases.java:336)
> {code}
> This error message comes from ERR_RESOLVING_AGAINST_MALFORMED_BASE. For some reason the warning is escalated to an error, apparently because the IRI Factory used to create the base IRI within the RDF/XML parser is too strict.
> However, I could not find anything in the specifications:
> * https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/
> * https://www.w3.org/TR/2009/REC-xmlbase-20090128/
> * https://www.ietf.org/rfc/rfc3986
> that says "foreign" URI schemes should not be permitted. In any case, Jena's IANA list is probably out of date, as my tests show.
> This was initially detected in TAVERNA-1027, which tries to parse an RDF/XML document with the [app:// URI scheme|https://www.w3.org/TR/app-uri/], which is *not* registered with IANA https://www.iana.org/assignments/uri-schemes according to https://tools.ietf.org/html/bcp35
> However, testing Jena with permanent and provisional schemes from the registry, such as example:// and ssh://, or with a conformant private scheme using a domain-based name, org.apache.jena.test://, gives the same error.
> IMHO they should all be handled the same way as in the Turtle examples, which parse without error.
> I could trace this back to Jena 3.3.0, so I suspect it was introduced with JENA-1306. With earlier versions, all my tests *) pass.
> I'll raise a pull request with the JUnit tests, but I have not been able to find a good way to fix it.
> _*) There is a separate issue, going back to Jena 3.0.1, where hostnames in file://example.com/etc/passwd style URIs also seem to be misparsed in RDF/XML as file:///example.com/etc/passwd; I'll report it separately._
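The difference described in the footnote can be sketched with a generic URI parser. This assumes `java.net.URI` as a stand-in; it is not the IRI parser inside Jena's RDF/XML reader. With two slashes, `example.com` is the authority (host); with three slashes the authority is empty and `example.com` becomes the first path segment. The class name `FileHostParse` is hypothetical.

```java
import java.net.URI;

// Hypothetical demo class: shows how generic URI syntax splits
// file:// URIs into host and path components.
final class FileHostParse {

    // Returns "host=<host> path=<path>"; getHost() is null when the
    // authority component is empty, as in file:///...
    static String describe(String uri) {
        URI u = URI.create(uri);
        return "host=" + u.getHost() + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Two slashes: example.com is the host, /etc/passwd is the path.
        System.out.println(describe("file://example.com/etc/passwd"));
        // Three slashes: no host, so example.com moves into the path.
        System.out.println(describe("file:///example.com/etc/passwd"));
    }
}
```

The first form is the one the tests use; the misparse reported in JENA-1463 turns it into the second form.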



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)