Posted to jira@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2023/04/16 09:12:00 UTC

[jira] [Commented] (JENA-2351) Newline (U+000A) in IRIs not escaped during NT/TTL/NQ/TRIG serialization

    [ https://issues.apache.org/jira/browse/JENA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712761#comment-17712761 ] 

Andy Seaborne commented on JENA-2351:
-------------------------------------

Newlines in IRIs are not legal. How are you encountering them?

The grammar rule (N-Triples, Turtle, etc.) is

{noformat}
	IRIREF	::=	'<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
{noformat}

In addition, the IRI is required to conform to the grammar of RFC 3986/3987.

The {{IRIREF}} rule does not allow any raw control characters. The readers are lax, but an app should at least get a warning when a UCHAR escape is used to get a bad character in. (The lax reading is partly legacy and partly "be generous in what you read", and there is a fair amount of bad data around.)
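
As an illustration of the character class in the rule above, here is a minimal sketch in plain Java that finds the first character not allowed raw between {{<}} and {{>}}. The class and method names ({{IriRefCheck}}, {{firstIllegalChar}}) are made up for this example and are not part of Jena; the sketch does not handle UCHAR escapes and does not check the RFC 3986/3987 grammar.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not a Jena class: checks only the IRIREF character class. */
final class IriRefCheck {
    // Characters excluded by IRIREF in addition to the range U+0000..U+0020.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** Index of the first character that may not appear raw in an IRIREF, or empty if there is none. */
    static OptionalInt firstIllegalChar(String iri) {
        for (int i = 0; i < iri.length(); i++) {
            char ch = iri.charAt(i);
            if (ch <= 0x20 || FORBIDDEN.indexOf(ch) >= 0) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // The IRI from the report contains a raw newline after "aaa/".
        System.out.println(firstIllegalChar("http://example.org/aaa/bbb").isPresent());
        System.out.println(firstIllegalChar("http://example.org/aaa/\nbbb").isPresent());
    }
}
```

An application could run a check of this kind before creating resources, so the bad data is caught where it enters rather than at write time.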

It is not correct to change the URI to have {{%0A}}. Percent-encoding is not an escape mechanism: it really does put the three characters %, 0, A into the URI, making it a different URI.
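
To make the point concrete, a small sketch in plain Java comparing the two strings. It uses {{java.net.URI}} only as a convenient stand-in syntax checker; that class follows older RFC 2396-style rules and is not what Jena uses for IRI checking. The class name {{PercentEncodingDemo}} is made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical demo class: the percent-encoded form is a different, longer URI string. */
final class PercentEncodingDemo {
    static final String RAW = "http://example.org/aaa/\nbbb";      // contains U+000A
    static final String ENCODED = "http://example.org/aaa/%0Abbb"; // contains '%', '0', 'A'

    /** True if java.net.URI accepts the string (an illustration only, not Jena's check). */
    static boolean parsesAsJavaUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One character (newline) versus three characters (%0A): the strings differ.
        System.out.println(RAW.length() + " vs " + ENCODED.length());
        System.out.println("same string: " + RAW.equals(ENCODED));
        System.out.println("raw parses: " + parsesAsJavaUri(RAW));
        System.out.println("encoded parses: " + parsesAsJavaUri(ENCODED));
    }
}
```

The encoded form is a legal URI, but it names a different resource from the one the data originally (and illegally) used, so a writer cannot substitute it silently.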

NT, NQ, TTL and TriG all use a {{NodeFormatter}} using {{QuotedURI}} for a URI that isn't made into a prefixed name.

The RDF/XML writer is different. It blows up here trying for a base URI conversion on output.


> Newline (U+000A) in IRIs not escaped during NT/TTL/NQ/TRIG serialization 
> -------------------------------------------------------------------------
>
>                 Key: JENA-2351
>                 URL: https://issues.apache.org/jira/browse/JENA-2351
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: RIOT
>    Affects Versions: Jena 4.7.0
>            Reporter: Jan Martin Keil
>            Priority: Major
>
> [Newline characters (U+000A) in IRIs|https://github.com/dbpedia/extraction-framework/issues/748] are not escaped during the serialization of a model or dataset into a format of the Turtle family. This results in invalid files, which Jena is then unable to read. Please note the following tests:
> {code:java}
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.rdf.model.*;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFDataMgr;
> import org.junit.jupiter.api.Test;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> public class Example {
>     @Test
>     public void rdfXml() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nRDF/XML:\n");
>         model.write(System.out,"RDF/XML");
>         // test write and read
>         File file = File.createTempFile("example",".rdf");
>         model.write(new FileOutputStream(file),"RDF/XML");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","RDF/XML");
>     }
>     @Test
>     public void ttl() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nTTL:\n");
>         model.write(System.out,"TTL");
>         // test write and read
>         File file = File.createTempFile("example",".ttl");
>         model.write(new FileOutputStream(file),"TTL");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","TTL");
>     }
>     @Test
>     public void nTriples() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         System.out.println("\nN-TRIPLE:\n");
>         model.write(System.out,"N-TRIPLE");
>         // test write and read
>         File file = File.createTempFile("example",".nt");
>         model.write(new FileOutputStream(file),"N-TRIPLE");
>         ModelFactory.createDefaultModel().read(new FileInputStream(file),"","N-TRIPLE");
>     }
>     @Test
>     public void nq() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph",model2);
>         System.out.println("\nNQ:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.NQ) ;
>         // test write and read
>         File file = File.createTempFile("example", ".nq");
>         RDFDataMgr.write(new FileOutputStream(file), dataset, Lang.NQ) ;
>         RDFDataMgr.read(DatasetFactory.createGeneral(), new FileInputStream(file), Lang.NQ) ;
>     }
>     @Test
>     public void trig() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty,"a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph",model2);
>         System.out.println("\nTRIG:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.TRIG) ;
>         // test write and read
>         File file = File.createTempFile("example", ".trig");
>         RDFDataMgr.write(new FileOutputStream(file), dataset, Lang.TRIG) ;
>         RDFDataMgr.read(DatasetFactory.createGeneral(), new FileInputStream(file), Lang.TRIG) ;
>     }
> }
> {code}
> Outputs (stack traces truncated):
> {code:java}
> N-TRIPLE:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> 	...
> {code}
> {code:java}
> RDF/XML:
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:j.0="http://example.org/">
>   <rdf:Description rdf:about="http://example.org/aaa/&#xA;bbb">
>     <j.0:property>a string</j.0:property>
>   </rdf:Description>
> </rdf:RDF>
> {code}
> {code:java}
> NQ:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" <http://example.org/namedGraph> .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> 	...
> {code}
> {code:java}
> TTL:
> <http://example.org/aaa/
> bbb>    <http://example.org/property>  "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> 	...
> {code}
> {code:java}
> TRIG:
> <http://example.org/aaa/
> bbb>    <http://example.org/property>  "a string" .
> <http://example.org/namedGraph> {
>     <http://example.org/aaa/
>     bbb>    <http://example.org/property>  "a string" .
> }
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> 	...
> {code}


