Posted to jira@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2023/04/16 09:12:00 UTC
[jira] [Commented] (JENA-2351) Newline (U+000A) in IRIs not escaped during NT/TTL/NQ/TRIG serialization
[ https://issues.apache.org/jira/browse/JENA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712761#comment-17712761 ]
Andy Seaborne commented on JENA-2351:
-------------------------------------
Newlines in IRIs are not legal. How are you encountering them?
The grammar rule (N-Triples, Turtle etc) is
{noformat}
IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
{noformat}
In addition, an IRI is required to conform to the grammar of RFC 3986/3987.
The {{IRIREF}} rule does not allow any raw control characters. The readers are lax, but an application should at least get a warning when a UCHAR escape is used to get a bad character in. (The lax reading is partly legacy and partly "be generous in what you read"; there is a fair amount of bad data around.)
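As a rough illustration of the token rule above, here is a minimal sketch in plain Java that checks the text between '<' and '>' against {{IRIREF}}. The class and method names ({{IriRefCheck}}, {{matchesIriRefBody}}) are made up for this example and are not part of Jena. It only covers the token grammar; it does not check RFC 3986/3987 conformance.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jena.
final class IriRefCheck {
    // Body of the IRIREF token, i.e. the text between '<' and '>':
    //   ([^#x00-#x20<>"{}|^`\] | UCHAR)*
    // UCHAR is a backslash followed by 'u' and 4 hex digits,
    // or a backslash followed by 'U' and 8 hex digits.
    private static final Pattern IRIREF_BODY = Pattern.compile(
        "(?:[^\\x00-\\x20<>\"{}|^`\\\\]"
        + "|\\\\u[0-9A-Fa-f]{4}"
        + "|\\\\U[0-9A-Fa-f]{8})*");

    // True if the text could appear between '<' and '>' under the token rule.
    static boolean matchesIriRefBody(String text) {
        return IRIREF_BODY.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Ordinary IRI text.
        System.out.println(matchesIriRefBody("http://example.org/aaa/bbb"));
        // Raw newline (U+000A) inside the IRI.
        System.out.println(matchesIriRefBody("http://example.org/aaa/\nbbb"));
        // The same character written as a UCHAR escape in the file text.
        System.out.println(matchesIriRefBody("http://example.org/aaa/\\u000Abbb"));
    }
}
```

The raw newline fails the rule. The UCHAR form passes the token rule, but the IRI it denotes still contains a newline, which is why a warning is the least an application should get.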
It is not correct to change the URI to contain {{%0A}}. Percent-encoding is not an escape mechanism: it really does put the three characters %, 0, A into the URI, making it a different URI.
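To make the point about percent-encoding concrete, a small sketch in plain Java follows. RDF compares IRIs character by character, so {{String.equals}} is used here as a stand-in for IRI equality. The class name {{PercentEncodingDemo}} and its members are invented for this example.

```java
// Hypothetical demo class; plain Java strings stand in for IRIs.
final class PercentEncodingDemo {
    // The IRI from the report, with a raw newline (U+000A).
    static final String WITH_NEWLINE = "http://example.org/aaa/\nbbb";
    // The "percent-encoded" variant: the three characters '%', '0', 'A'.
    static final String WITH_PERCENT = "http://example.org/aaa/%0Abbb";

    // RDF IRI equality is simple string comparison, modelled by String.equals.
    static boolean sameIri(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // The two strings name different resources.
        System.out.println(sameIri(WITH_NEWLINE, WITH_PERCENT));
        // One character was replaced by three, so the lengths differ.
        System.out.println(WITH_PERCENT.length() - WITH_NEWLINE.length());
    }
}
```

A writer that silently substituted {{%0A}} would therefore change which resource the triples talk about, rather than merely change how the IRI is spelled in the file.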
N-Triples, N-Quads, Turtle and TriG all use a {{NodeFormatter}}, which uses {{QuotedURI}} for any URI that isn't made into a prefixed name.
The RDF/XML writer is different. It blows up here trying for a base URI conversion on output.
> Newline (U+000A) in IRIs not escaped during NT/TTL/NQ/TRIG serialization
> -------------------------------------------------------------------------
>
> Key: JENA-2351
> URL: https://issues.apache.org/jira/browse/JENA-2351
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT
> Affects Versions: Jena 4.7.0
> Reporter: Jan Martin Keil
> Priority: Major
>
> [Newline characters (U+000A) in IRIs|https://github.com/dbpedia/extraction-framework/issues/748] are not escaped when a model or dataset is serialized into a format of the Turtle family. This results in invalid files that Jena can no longer read. Please note the following tests:
> {code:java}
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.rdf.model.*;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFDataMgr;
> import org.junit.jupiter.api.Test;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> public class Example {
>     @Test
>     public void rdfXml() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nRDF/XML:\n");
>         model.write(System.out,"RDF/XML");
>         // test write and read
>         File file = File.createTempFile("example",".rdf");
>         model.write(new FileOutputStream(file),"RDF/XML");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","RDF/XML");
>     }
>     @Test
>     public void ttl() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nTTL:\n");
>         model.write(System.out,"TTL");
>         // test write and read
>         File file = File.createTempFile("example",".ttl");
>         model.write(new FileOutputStream(file),"TTL");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","TTL");
>     }
>     @Test
>     public void nTriples() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nN-TRIPLE:\n");
>         model.write(System.out,"N-TRIPLE");
>         // test write and read
>         File file = File.createTempFile("example",".nt");
>         model.write(new FileOutputStream(file),"N-TRIPLE");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","N-TRIPLE");
>     }
>     @Test
>     public void nq() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph",model2);
>         System.out.println("\nNQ:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.NQ) ;
>         // test write and read
>         File file = File.createTempFile("example", ".nq");
>         RDFDataMgr.write(new FileOutputStream(file), dataset, Lang.NQ) ;
>         RDFDataMgr.read(DatasetFactory.createGeneral(), new FileInputStream(file), Lang.NQ) ;
>     }
>     @Test
>     public void trig() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph",model2);
>         System.out.println("\nTRIG:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.TRIG) ;
>         // test write and read
>         File file = File.createTempFile("example", ".trig");
>         RDFDataMgr.write(new FileOutputStream(file), dataset, Lang.TRIG) ;
>         RDFDataMgr.read(DatasetFactory.createGeneral(), new FileInputStream(file), Lang.TRIG) ;
>     }
> }
> {code}
> Outputs (stack traces truncated):
> {code:java}
> N-TRIPLE:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SCHWERWIEGEND: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> RDF/XML:
> <rdf:RDF
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:j.0="http://example.org/">
> <rdf:Description rdf:about="http://example.org/aaa/
> bbb">
> <j.0:property>a string</j.0:property>
> </rdf:Description>
> </rdf:RDF>
> {code}
> {code:java}
> NQ:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" <http://example.org/namedGraph> .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SCHWERWIEGEND: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> TTL:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SCHWERWIEGEND: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> TRIG:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> <http://example.org/namedGraph> {
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> }
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SCHWERWIEGEND: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}