Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/03/21 10:38:54 UTC

\n in literals and N-Triples|N-Quads|Turtle files...

Hi,
I am sorry if this is a silly question, but I need some clarity (or another coffee already).

The following are Java strings, therefore \n is the new line character...

Java strings                 Turtle literals   N-Triples literals
-----------------------------------------------------------------
\"\"\"Hello \n World\"\"\"   legal             illegal
\"Hello \n World\"           illegal           illegal
\"\"\"Hello \\n World\"\"\"  legal             legal
\"Hello \\n World\"          legal             legal
\"Hello \u0010 World\"       legal             legal
-----------------------------------------------------------------

If someone tries to parse a Turtle | N-Triples file with a "..." literal that contains a raw newline character, we get a RiotException:

org.openjena.riot.RiotException: [line: 1, col: 68] Broken token (newline): Hello
	at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:125)
	at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
	at org.openjena.riot.lang.LangEngine.nextToken(LangEngine.java:116)
	at org.openjena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:307)
	at org.openjena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:289)
	at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:280)
	at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:219)
	at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
	at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
	at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
	at org.openjena.riot.RiotLoader.datasetFromString(RiotLoader.java:79)
	at dev.Run2.main(Run2.java:47)

Example here:
https://raw.github.com/castagna/jena-examples/master/src/main/java/dev/Run2.java

I think this is the right behavior, since the new line character is not legal in a string literal in N-Triples | N-Quads files.
It must be escaped as '\n' (in a Java string, "\\n" or "\\u000A").

Right?

Paolo
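
A minimal sketch of the escaping rule discussed above, using only the JDK. The class and method names (NTriplesEscape, escapeLiteral) are illustrative and are not part of Jena or RIOT; the sketch handles only backslash, double quote, newline, carriage return and tab, and assumes the result is placed between single-line "..." quotes.

```java
/** Hypothetical helper, not a Jena API: escapes a Java string for a "..." N-Triples literal. */
final class NTriplesEscape {
    private NTriplesEscape() {}

    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash becomes two backslashes
                case '"':  sb.append("\\\""); break; // quote becomes backslash + quote
                case '\n': sb.append("\\n");  break; // raw newline becomes backslash + n
                case '\r': sb.append("\\r");  break; // raw carriage return becomes backslash + r
                case '\t': sb.append("\\t");  break; // raw tab becomes backslash + t
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "Hello \n World"; // contains a raw newline character
        // Prints the whole triple on one line, with the newline written as an escape.
        System.out.println("<s> <p> \"" + escapeLiteral(raw) + "\" .");
    }
}
```

Escaping at serialization time keeps the in-memory string unchanged; only the text written to the file carries the backslash form.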

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Paolo Castagna <ca...@googlemail.com>.
Sam,
why don't you open a JIRA issue and we can move the discussion there?
I assume the majority of people are not interested in the Unicode pains
we are experiencing. ;-)

Thanks,
Paolo

Sam Tunnicliffe wrote:
> It seems that NodeLib, or rather NodecSSE it uses under the covers,
> has problems round tripping certain strings. How the data gets into
> the larger system is still an issue but somewhat orthogonal here.
> 
> String s = "Hello \uDAE0 World";
> Node literal = Node.createLiteral(s);
> ByteBuffer bb = NodeLib.encode(literal);
> NodeLib.decode(bb);
> 
> blows up during the decode - looking at the stacktraces, this seems to
> be what causes the problems in committing the transaction. Should
> we expect NodeLib's encode() & decode() to be symmetrical?
> 
> Cheers,
> Sam
> 
> 
> On 21 March 2012 12:37, Paolo Castagna <ca...@googlemail.com> wrote:
>> Hi Andy
>>
>> Andy Seaborne wrote:
>>> On 21/03/12 12:01, Paolo Castagna wrote:
>>>> Paolo Castagna wrote:
>>>>> Sam contributed this test case:
>>>>> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
>>>>>
>>>>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
>>>>>
>>>>> Looking at this, right now.
>>>> Getting closer... do we have a bug here?
>>>>
>>>> Test case is here:
>>>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/TestTDBUnicode.java
>>>>
>>>>
>>>> This is the data (i.e. a single triple with the Unicode character \uDAE0
>>>> in a literal value): <s> <p> "Hello \uDAE0 World" .
>>> You need
>>>
>>>     private static final String str_literal = "Hello \\uDAE0 World";
>>>
>>> else it is a Java \u
>> Ack, silly mistake.
>>
>> I still don't understand why with "Hello \\uDAE0 World" test_03() fails.
>>
>>>
>>> Using
>>>
>>> Dataset dataset = TDBFactory.createDataset ( ) ;
>>>
>>> is way easier for testing.
>> Added test_04() which fails with the same exception:
>> org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello
>>
>> I was using TDBFactory.createDataset ( location ) because I noticed that if
>> FileOps.clearDirectory( path ) is commented out and the test is executed
>> twice there is another exception:
>>
>> 12:35:00 WARN  NodeTableTrans            :: Txn[1]/W journalStartOffset not zero: 109/0x6D
>> ************* UNEXPECTED [1]
>>
>>
>> Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
>> label = nodes
>> txn = Transaction: 1 : Mode=WRITE : State=PREPARING : /opt/workspaces/jena/jena-examples/target/tdb/
>> offset = 109
>> journalStartOffset = 109
>> journal = nodes.dat-jrnl
>>
>> com.hp.hpl.jena.tdb.TDBException: Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
>>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.inconsistent(NodeTableTrans.java:212)
>>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:200)
>>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
>>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
>>        at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
>>        at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
>>        at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
>>        at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
>>        at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
>>        at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
>>        at dev.TestTDBUnicode.test_03(TestTDBUnicode.java:66)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>>        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>>        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>>        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>>        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>>        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
>>        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
>>        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>>        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>>        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>>        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>>        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>>        at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
>>        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
>>        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
>>
>> But, I'd like to understand the first problem first.
>>
>> Paolo
>>
>>>     Andy

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 14:42, Andy Seaborne wrote:
> On 21/03/12 14:27, Andy Seaborne wrote:
>> On 21/03/12 14:20, Sam Tunnicliffe wrote:
>>> It seems that NodeLib, or rather NodecSSE it uses under the covers,
>>> has problems round tripping certain strings. How the data gets into
>>> the larger system is still an issue but somewhat orthogonal here.
>>>
>>> String s = "Hello \uDAE0 World";

String s = "a\uDAE0                ";

>>> Node literal = Node.createLiteral(s);
>>> ByteBuffer bb = NodeLib.encode(literal);

System.out.println(bb) ;

shows a short byte buffer.  Suggests an encoding issue.

	Andy
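
A short buffer is consistent with how the JDK's strict UTF-8 encoder treats an unpaired surrogate: it stops at the offending character and reports an error. The sketch below uses only java.nio; the class and method names are made up for illustration, and whether NodecSSE drives the encoder in this way is an assumption, not something confirmed in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: shows where a reporting UTF-8 encoder stops on an unpaired surrogate. */
final class ShortBufferDemo {
    private ShortBufferDemo() {}

    /** Encodes s as UTF-8 and returns the number of bytes written before the encoder stopped. */
    static int bytesWrittenBeforeStop(String s) {
        // A fresh encoder reports malformed input instead of replacing it.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(4 * s.length() + 4);
        encoder.encode(CharBuffer.wrap(s), out, true);
        return out.position();
    }

    /** True if the whole string can be encoded as UTF-8 without error. */
    static boolean encodesCleanly(String s) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String s = "a\uDAE0                ";
        System.out.println(bytesWrittenBeforeStop(s) + " byte(s) written for " + s.length() + " chars");
        System.out.println("encodes cleanly: " + encodesCleanly(s));
    }
}
```

A check such as encodesCleanly could run before a literal is handed to the store, so that the failure surfaces at the point of entry rather than at commit.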

>>> NodeLib.decode(bb);
>>>
>>> blows up during the decode - looking at the stacktraces, this seems to
>>> be what causes the problems in committing the transaction. Should
>>> we expect NodeLib's encode() & decode() to be symmetrical?
>>
>> Yes.
>>
>> What range of unicode codepoints does it fail on?
>>
>> (D is high bit set in 16 bits).
>
> It is part of a surrogate pair but not a complete pair.
>
> Surrogate pairs come in (high, low) pairs. This is a high surrogate but
> there is no low surrogate. If I add a low surrogate, it seems to work
> for me.
>
> String s = "\uDAE0\uDC00";
>
> Unicode: chapter 3, section 3.8.
>
> I think the exception occurs because the UTF-8 decoder (from the Java
> library) aborts and says "end of file"
>
> Andy
>
>
>>
>> Andy
>>
>>>
>>> Cheers,
>>> Sam
>


Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 14:27, Andy Seaborne wrote:
> On 21/03/12 14:20, Sam Tunnicliffe wrote:
>> It seems that NodeLib, or rather NodecSSE it uses under the covers,
>> has problems round tripping certain strings. How the data gets into
>> the larger system is still an issue but somewhat orthogonal here.
>>
>> String s = "Hello \uDAE0 World";
>> Node literal = Node.createLiteral(s);
>> ByteBuffer bb = NodeLib.encode(literal);
>> NodeLib.decode(bb);
>>
>> blows up during the decode - looking at the stacktraces, this seems to
>> be what causes the problems in committing the transaction. Should
>> we expect NodeLib's encode() & decode() to be symmetrical?
>
> Yes.
>
> What range of unicode codepoints does it fail on?
>
> (D is high bit set in 16 bits).

It is part of a surrogate pair but not a complete pair.

Surrogate pairs come in (high, low) pairs.  This is a high surrogate but 
there is no low surrogate.  If I add a low surrogate, it seems to work 
for me.

String s = "\uDAE0\uDC00";

Unicode: chapter 3, section 3.8.

I think the exception occurs because the UTF-8 decoder (from the Java 
library) aborts and says "end of file"

	Andy
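
The pairing rule described above can be checked with the JDK's Character methods. This is a sketch only; SurrogateCheck and firstUnpairedSurrogate are hypothetical names, not Jena APIs.

```java
/** Hypothetical pre-check, not a Jena API: finds the first unpaired UTF-16 surrogate. */
final class SurrogateCheck {
    private SurrogateCheck() {}

    /** Returns the index of the first unpaired surrogate in s, or -1 if there is none. */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnpairedSurrogate("Hello \uDAE0 World"));
        System.out.println(firstUnpairedSurrogate("\uDAE0\uDC00"));
    }
}
```

For the two strings from this thread, the first call reports index 6 (the lone high surrogate) and the second reports -1 (a complete pair).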


>
> Andy
>
>>
>> Cheers,
>> Sam


Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 14:20, Sam Tunnicliffe wrote:
> It seems that NodeLib, or rather NodecSSE it uses under the covers,
> has problems round tripping certain strings. How the data gets into
> the larger system is still an issue but somewhat orthogonal here.
>
> String s = "Hello \uDAE0 World";
> Node literal = Node.createLiteral(s);
> ByteBuffer bb = NodeLib.encode(literal);
> NodeLib.decode(bb);
>
> blows up during the decode - looking at the stacktraces, this seems to
> be what causes the problems in committing the transaction. Should
> we expect NodeLib's encode() & decode() to be symmetrical?

Yes.

What range of unicode codepoints does it fail on?

(D is high bit set in 16 bits).

	Andy

>
> Cheers,
> Sam

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Sam Tunnicliffe <li...@beobal.com>.
It seems that NodeLib, or rather NodecSSE it uses under the covers,
has problems round tripping certain strings. How the data gets into
the larger system is still an issue but somewhat orthogonal here.

String s = "Hello \uDAE0 World";
Node literal = Node.createLiteral(s);
ByteBuffer bb = NodeLib.encode(literal);
NodeLib.decode(bb);

blows up during the decode - looking at the stacktraces, this seems to
be what causes the problems in committing the transaction. Should
we expect NodeLib's encode() & decode() to be symmetrical?

Cheers,
Sam
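
For comparison, the same string does not round-trip through the plain JDK UTF-8 conversion either. The sketch below does not call NodeLib or NodecSSE, and it is only an assumption that their failure has the same root cause; Utf8RoundTrip is a made-up name.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: checks whether a string survives a UTF-8 encode/decode cycle in the JDK. */
final class Utf8RoundTrip {
    private Utf8RoundTrip() {}

    static String roundTrip(String s) {
        // getBytes substitutes a replacement byte for input it cannot encode.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static boolean survivesRoundTrip(String s) {
        return s.equals(roundTrip(s));
    }

    public static void main(String[] args) {
        System.out.println(survivesRoundTrip("Hello \uDAE0 World"));
        System.out.println(survivesRoundTrip("\uDAE0\uDC00"));
    }
}
```

Here the conversion does not throw: the unpaired surrogate is silently replaced, so the decoded string differs from the input. A strict encoder would instead report an error.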


On 21 March 2012 12:37, Paolo Castagna <ca...@googlemail.com> wrote:
>
> Hi Andy
>
> Andy Seaborne wrote:
> > On 21/03/12 12:01, Paolo Castagna wrote:
> >> Paolo Castagna wrote:
> >>> Sam contributed this test case:
> >>> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
> >>>
> >>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
> >>>
> >>> Looking at this, right now.
> >>
> >> Getting closer... do we have a bug here?
> >>
> >> Test case is here:
> >> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/TestTDBUnicode.java
> >>
> >>
> >> This is the data (i.e. a single triple with the Unicode character \uDAE0
> >> in a literal value): <s> <p> "Hello \uDAE0 World" .
> >
> > You need
> >
> >     private static final String str_literal = "Hello \\uDAE0 World";
> >
> > else it is a Java \u
>
> Ack, silly mistake.
>
> I still don't understand why with "Hello \\uDAE0 World" test_03() fails.
>
> >
> >
> > Using
> >
> > Dataset dataset = TDBFactory.createDataset ( ) ;
> >
> > is way easier for testing.
>
> Added test_04() which fails with the same exception:
> org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello
>
> I was using TDBFactory.createDataset ( location ) because I noticed that if
> FileOps.clearDirectory( path ) is commented out and the test is executed
> twice there is another exception:
>
> 12:35:00 WARN  NodeTableTrans            :: Txn[1]/W journalStartOffset not zero: 109/0x6D
> ************* UNEXPECTED [1]
>
>
> Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
> >>>>>>>>>>
> label = nodes
> txn = Transaction: 1 : Mode=WRITE : State=PREPARING : /opt/workspaces/jena/jena-examples/target/tdb/
> offset = 109
> journalStartOffset = 109
> journal = nodes.dat-jrnl
>
> com.hp.hpl.jena.tdb.TDBException: Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.inconsistent(NodeTableTrans.java:212)
>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:200)
>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
>        at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
>        at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
>        at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
>        at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
>        at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
>        at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
>        at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
>        at dev.TestTDBUnicode.test_03(TestTDBUnicode.java:66)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
>        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
>        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>        at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
>        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
>        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
>
> But, I'd like to understand the first problem first.
>
> Paolo
>
> >
> >     Andy

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Andy

Andy Seaborne wrote:
> On 21/03/12 12:01, Paolo Castagna wrote:
>> Paolo Castagna wrote:
>>> Sam contributed this test case:
>>> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
>>>
>>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
>>>
>>> Looking at this, right now.
>>
>> Getting closer... do we have a bug here?
>>
>> Test case is here:
>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/TestTDBUnicode.java
>>
>>
>> This is the data (i.e. a single triple with the Unicode character \uDAE0
>> in a literal value): <s> <p> "Hello \uDAE0 World" .
> 
> You need
> 
>     private static final String str_literal = "Hello \\uDAE0 World";
> 
> else it is a Java \u

Ack, silly mistake.

I still don't understand why with "Hello \\uDAE0 World" test_03() fails.

> 
> 
> Using
> 
> Dataset dataset = TDBFactory.createDataset ( ) ;
> 
> is way easier for testing.

Added test_04() which fails with the same exception:
org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello

I was using TDBFactory.createDataset ( location ) because I noticed that if
FileOps.clearDirectory( path ) is commented out and the test is executed
twice there is another exception:

12:35:00 WARN  NodeTableTrans            :: Txn[1]/W journalStartOffset not zero: 109/0x6D
************* UNEXPECTED [1]


Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
>>>>>>>>>>
label = nodes
txn = Transaction: 1 : Mode=WRITE : State=PREPARING : /opt/workspaces/jena/jena-examples/target/tdb/
offset = 109
journalStartOffset = 109
journal = nodes.dat-jrnl

com.hp.hpl.jena.tdb.TDBException: Different ids for file:///opt/workspaces/jena/jena-examples/s: allocated: expected [000000000000006D], got [0000000000000000]
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.inconsistent(NodeTableTrans.java:212)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:200)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
	at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
	at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
	at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
	at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
	at dev.TestTDBUnicode.test_03(TestTDBUnicode.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

But, I'd like to understand the first problem first.

Paolo

> 
>     Andy

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 12:01, Paolo Castagna wrote:
> Paolo Castagna wrote:
>> Sam contributed this test case:
>> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
>> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
>> Looking at this, right now.
>
> Getting closer... do we have a bug here?
>
> Test case is here:
> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/TestTDBUnicode.java
>
> This is the data (i.e. a single triple with the Unicode character \uDAE0
> in a literal value): <s> <p> "Hello \uDAE0 World" .

You need

     private static final String str_literal = "Hello \\uDAE0 World";

else it is a Java \u


Using

Dataset dataset = TDBFactory.createDataset ( ) ;

is way easier for testing.

	Andy
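
The difference between the two spellings can be seen with plain JDK calls. This is an illustrative sketch; EscapeLayers and its constants are made-up names.

```java
/** Illustrative only: Java-level versus N-Triples-level Unicode escapes. */
final class EscapeLayers {
    private EscapeLayers() {}

    // The Java compiler resolves this escape: the string holds one UTF-16 code unit, U+DAE0.
    static final String JAVA_LEVEL = "Hello \uDAE0 World";

    // The doubled backslash keeps the escape as six literal characters,
    // leaving it for the N-Triples or Turtle parser to resolve.
    static final String PARSER_LEVEL = "Hello \\uDAE0 World";

    public static void main(String[] args) {
        System.out.println(JAVA_LEVEL.length());
        System.out.println(PARSER_LEVEL.length());
        System.out.println(Character.isHighSurrogate(JAVA_LEVEL.charAt(6)));
    }
}
```

The first string has 13 characters and the second has 18, so the parser sees an escape sequence only in the second case.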

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Paolo Castagna <ca...@googlemail.com>.
Paolo Castagna wrote:
> Sam contributed this test case:
> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
> Looking at this, right now.

Getting closer... do we have a bug here?

Test case is here:
https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/TestTDBUnicode.java

This is the data (i.e. a single triple with the Unicode character \uDAE0
in a literal value): <s> <p> "Hello \uDAE0 World" .

Is \uDAE0 legal in a literal value? I would think so.

RiotLoader.datasetFromString(...) parses that triple with no problems:
test_01 and test_02 do not throw any exceptions.

However, if we try to add the same triple to a TDB store within a
transaction (i.e. test_03) we have an exception when we commit the
transaction:

org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello
	at org.openjena.riot.tokens.TokenizerText.exception(TokenizerText.java:1209)
	at org.openjena.riot.tokens.TokenizerText.readString(TokenizerText.java:620)
	at org.openjena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:248)
	at org.openjena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:112)
	at com.hp.hpl.jena.tdb.nodetable.NodecSSE.decode(NodecSSE.java:105)
	at com.hp.hpl.jena.tdb.lib.NodeLib.decode(NodeLib.java:93)
	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:234)
	at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:228)
	at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:188)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
	at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
	at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
	at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
	at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
	at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
	at dev.TestTDBUnicode.test_03(TestTDBUnicode.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

If we think this is a bug, I am going to raise a JIRA issue and
continue to investigate to find out where the problem is.

Thanks,
Paolo



Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 10:45, Paolo Castagna wrote:
>
> Sam contributed this test case:
> https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt

Not a bad triple to RIOT.

(The old parsers don't handle \U)

	Andy

> https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
> Looking at this, right now.
>
> Cheers,
> Paolo
>
>>
>>>
>>> Paolo
>>


Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Andy

Andy Seaborne wrote:
> On 21/03/12 09:38, Paolo Castagna wrote:
>> Hi,
>> I am sorry if this is a silly question, but I need some clarity (or
>> another coffee already).
> 
> Have you had that coffee yet?

:-)

>>
>> The following are Java strings, therefore \n is the new line character...
>>
>> Java strings                 Turtle literals   N-Triples literals
>> -----------------------------------------------------------------
>> \"\"\"Hello \n World\"\"\"   legal             illegal
> 
> Yes - a triple quoted string can contain a raw newline.
> 
>> \"Hello \n World\"           illegal           illegal
>> \"\"\"Hello \\n World\"\"\"  legal             legal
>> \"Hello \\n World\"          legal             legal
>> \"Hello \u0010 World\"       legal             legal
>> -----------------------------------------------------------------
> 
> Yes - it's layering.
> 
> Don't forget about using ' for " (for this exact reason).

Yep. Thanks.

>> If someone tries to parse a Turtle | N-Triples file with a "..." literal
>> that contains a raw newline character, we get a RiotException:
>>
>> org.openjena.riot.RiotException: [line: 1, col: 68] Broken token (newline): Hello
>>     at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:125)
>>     at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>     at org.openjena.riot.lang.LangEngine.nextToken(LangEngine.java:116)
>>     at org.openjena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:307)
>>     at org.openjena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:289)
>>     at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:280)
>>     at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:219)
>>     at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
>>     at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
>>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>     at org.openjena.riot.RiotLoader.datasetFromString(RiotLoader.java:79)
>>     at dev.Run2.main(Run2.java:47)
>>
>> Example here:
>> https://raw.github.com/castagna/jena-examples/master/src/main/java/dev/Run2.java
>>
>>
>> I think this is the right behavior, since the new line character is
>> not legal in a string literal in N-Triples | N-Quads files.
>> It must be escaped as '\n' (in a Java string, "\\n" or "\\u000A").
>>
>> Right?
> 
> Looks right to me.

Ok, thanks for the sanity check.

Investigation continues...

We have some data coming in as N-Triples and/or Turtle, and there must be
something weird with it. Data goes between different "systems" and, as usual, people
use all sorts of tools to generate the data.
Something must be wrong with the data, but it is passing our checks and causing
problems further on (when we assume we have legal Turtle or N-Triples in our hands).
So, I am trying to understand if there is a problem somewhere... or, simply, the data
is illegal.

Sam contributed this test case:
https://github.com/castagna/jena-examples/blob/master/src/main/resources/data/single-bad-triple.nt
https://github.com/castagna/jena-examples/blob/master/src/main/java/dev/Run3.java
Looking at this, right now.

Cheers,
Paolo

> 
>>
>> Paolo
> 

Re: \n in literals and N-Triples|N-Quads|Turtle files...

Posted by Andy Seaborne <an...@apache.org>.
On 21/03/12 09:38, Paolo Castagna wrote:
> Hi,
> I am sorry if this is a silly question, but I need some clarity (or another coffee already).

Have you had that coffee yet?

>
> The following are Java strings, therefore \n is the new line character...
>
> Java strings                 Turtle literals   N-Triples literals
> -----------------------------------------------------------------
> \"\"\"Hello \n World\"\"\"   legal             illegal

Yes - a triple quoted string can contain a raw newline.

> \"Hello \n World\"           illegal           illegal
> \"\"\"Hello \\n World\"\"\"  legal             legal
> \"Hello \\n World\"          legal             legal
> \"Hello \u0010 World\"       legal             legal
> -----------------------------------------------------------------

Yes - it's layering.

Don't forget about using ' for " (for this exact reason).

> If someone tries to parse a Turtle | N-Triples file with a "..." literal that contains a raw newline character, we get a RiotException:
>
> org.openjena.riot.RiotException: [line: 1, col: 68] Broken token (newline): Hello
> 	at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:125)
> 	at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
> 	at org.openjena.riot.lang.LangEngine.nextToken(LangEngine.java:116)
> 	at org.openjena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:307)
> 	at org.openjena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:289)
> 	at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:280)
> 	at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:219)
> 	at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
> 	at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
> 	at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
> 	at org.openjena.riot.RiotLoader.datasetFromString(RiotLoader.java:79)
> 	at dev.Run2.main(Run2.java:47)
>
> Example here:
> https://raw.github.com/castagna/jena-examples/master/src/main/java/dev/Run2.java
>
> I think this is the right behavior, since the new line character is not legal in a string literal in N-Triples | N-Quads files.
> It must be escaped as '\n' (in a Java string, "\\n" or "\\u000A").
>
> Right?

Looks right to me.

>
> Paolo