Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2015/07/22 16:27:13 UTC

[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

    [ https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637005#comment-14637005 ] 

Andy Seaborne commented on JENA-996:
------------------------------------

If the file passed "riot --validate" (with WARN and no ERROR) and the error then shows up on loading, it's more likely an issue with TDB.

It's not clear that the two items above, the ERROR and the WARN, are related. riot streams, so scale is not a factor. I suspect that the TDB node table is involved.

This also explains getting "ERROR [] Broken token (newline): ...", because Jena uses jsonld-java for JSON-LD parsing and so errors don't necessarily come out in RIOT format. In fact, it does not look like a URI error but like a literal containing "Public consulting services for smaller comme". Is that in the data file? Could you extract that part, make it self-contained JSON-LD, and put it here?
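
To check whether that text is in the data file, something along these lines might do. It is a rough Python sketch, not anything from Jena; the function name find_fragment is made up for illustration, and the file name in the usage comment is the one given in the report.

```python
def find_fragment(path, fragment, encoding="utf-8"):
    """Return (line_number, line) pairs for every line containing `fragment`.

    The file is read line by line, so a large file does not need to fit
    in memory. Line numbers are 1-based.
    """
    hits = []
    # errors="replace" keeps the scan going if the file has bad bytes.
    with open(path, encoding=encoding, errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if fragment in line:
                hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical usage against the file named in the report:
# for number, line in find_fragment(
#         "econis_0037.json",
#         "Public consulting services for smaller comme"):
#     print(number, line[:200])
```

The lines it reports, plus enough of the surrounding object and the @context, would be the part to turn into a self-contained JSON-LD example.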

I can't explain where the line number comes from. Did you pipe the logging output to stdout at any stage? (A wild guess, but if the "ERROR-WARN" is one message, it's because the WARN got into the data stream.)

Is there a line 31572453 in the file?
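
A quick way to answer that without opening the file in an editor is a small Python sketch like the following. The helper name get_line is only illustrative, and Python's idea of a line break may not match exactly how the parser counts lines, so treat the result as approximate.

```python
from itertools import islice


def get_line(path, line_number, encoding="utf-8"):
    """Return line `line_number` (1-based), or None if the file is shorter.

    islice skips ahead without holding the earlier lines in memory.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        return next(islice(f, line_number - 1, line_number), None)


# Hypothetical usage with the file name from the report:
# line = get_line("econis_0037.json", 31572453)
# print("no such line" if line is None else line[:200])
```

If it returns None, the file has fewer lines than that, and the line number in the ERROR must refer to some other stream of data.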

> riot should recognize invalid URIs in large jsonld files
> --------------------------------------------------------
>
>                 Key: JENA-996
>                 URL: https://issues.apache.org/jira/browse/JENA-996
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: RIOT
>    Affects Versions: Jena 2.13.0
>            Reporter: Joachim Neubert
>
> With riot --validate, URIs containing whitespace are not flagged in large jsonld files; the files seem to be valid.
> However, when loading such a file with Fuseki's tdbloader, loading aborts with
> ERROR [line: 31572453, col: 1 ] Broken token (newline): Public consulting services for smaller comme03:06:01 WARN  riot                 :: Bad IRI: <http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
> Unfortunately, it seems that the error cannot be reproduced on small files. The isolated sequence:
> {code}
> {
>   "@context":
>   {
>     "eb": "http://zbw.eu/beta/resource/title/",
>     "doi": "http://dx.doi.org/",
>     "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
>   },
>   "@graph":
> [
> {
>    "@id" : "eb:10003656538",
>    "identifier_doi" : [
>       "doi:DOI 10.2767/59617"
>    ]
> }
> ]
> }
> {code}
> creates a message:
> 14:13:52 WARN  riot                 :: Bad IRI: <http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
> I can make the large file (econis_0037.json, 59Mb) available for download.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)