Posted to users@jena.apache.org by "Shaw, Ryan" <ry...@unc.edu> on 2021/09/09 16:08:41 UTC

Using Xerces2 2.12.1 with Jena

I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the Jena XML parser, specifically to gain support for XSD 1.1 datatypes. 

I see at [1] that “any XML parser can be used with Jena … through the usual mechanism for adding to the application.” 

But I don’t know what that usual mechanism is. The Xerces docs say something about the Java Endorsed Standards Override Mechanism, but elsewhere I see that this has been deprecated.

What’s the recommended way to do this for Jena?

Thanks,
Ryan

[1] https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591

Re: Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)

Posted by Andy Seaborne <an...@apache.org>.

On 10/09/2021 19:23, Shaw, Ryan wrote:
> 
>>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>>>
>>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
> 
>> On Sep 10, 2021, at 6:25 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>> Command line riot?
> 
> I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.
> 
>> It is just a warning; the triple and its object literal are still output by the parser. From the command line, "--nocheck" turns off the checking.
> 
> The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.
> 
Workaround:

Filtering stderr with "grep -v" will remove the warning.

>> Now logged as
>> https://issues.apache.org/jira/browse/JENA-2158
> 
> Thanks, I will watch this issue.
> 
>> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.
> 
> Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?

The constant is in org.apache.jena.ext.xerces.impl.Constants

(org.apache.jena.ext.xerces.jaxp.datatype.XMLGregorianCalendarImpl
also needs to change)

There's a PR, #1069, in progress.

This does not mean that all arithmetic involving 0000, or BCE dates in
general, will work. Xerces does not support the XSD 1.1 year "0000" in
its arithmetic, and in my testing neither does the JDK.

(And to everyone who points to java.time.*: it is useful for parsing to
TemporalAccessors, but it has a different concept of duration.)
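
One way to see the "different concept of duration" is to compare how the JDK's own classes treat a single xsd:duration lexical form. The sketch below is an illustration only, using JDK classes rather than Jena's repackaged Xerces code; the class name DurationModels and its helper methods are made up for the example. java.time splits durations into a date-based Period and a time-based Duration, while javax.xml.datatype.Duration keeps all six fields together as XSD does.

```java
import java.time.format.DateTimeParseException;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical demo class: contrasts java.time.Duration with the
// XSD-style javax.xml.datatype.Duration shipped in the JDK.
final class DurationModels {

    /** True if java.time.Duration can parse the given lexical form. */
    static boolean javaTimeDurationAccepts(String lexical) {
        try {
            java.time.Duration.parse(lexical);
            return true;
        } catch (DateTimeParseException e) {
            // java.time.Duration is time-based: no year or month fields.
            return false;
        }
    }

    /** Parses the lexical form as an XSD-style duration (years to seconds). */
    static javax.xml.datatype.Duration xsdDuration(String lexical) {
        try {
            return DatatypeFactory.newInstance().newDuration(lexical);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No JAXP DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        String lexical = "P1Y2M3DT4H";
        System.out.println("java.time.Duration accepts " + lexical + ": "
                + javaTimeDurationAccepts(lexical));
        javax.xml.datatype.Duration d = xsdDuration(lexical);
        System.out.println("javax.xml.datatype.Duration years=" + d.getYears()
                + " months=" + d.getMonths() + " days=" + d.getDays()
                + " hours=" + d.getHours());
    }
}
```

A mixed year-month and day-time value therefore has no single java.time type to map onto, which is one reason java.time is not a drop-in replacement here.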


> I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:
> 
>> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes.

That text is saying that XSD IRIs can't be redefined. The section above 
says  "Any datatype definition that conforms to this abstraction MAY be 
used in RDF" -- so not a requirement.

     Andy

> 
> Thanks,
> Ryan
> 
> [1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes
> 
> 

Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)

Posted by "Shaw, Ryan" <ry...@unc.edu>.
>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>> 
>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

> On Sep 10, 2021, at 6:25 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> Command line riot?

I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.

> It is just a warning; the triple and its object literal are still output by the parser. From the command line, "--nocheck" turns off the checking.

The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.

> Now logged as
> https://issues.apache.org/jira/browse/JENA-2158

Thanks, I will watch this issue.

> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.

Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?

I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:

> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes. 

Thanks,
Ryan

[1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes



Re: Using Xerces2 2.12.1 with Jena

Posted by Andy Seaborne <an...@apache.org>.
Hi Ryan,



On 09/09/2021 23:32, Shaw, Ryan wrote:
> 
>> On Sep 9, 2021, at 4:00 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>> What is your usage scenario?
> 
> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

Command line riot?

> 
>> Jena supports the XSD datatypes relevant to RDF independently of XML.
> 
> I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?

I'm afraid it won't.

This warning happens for any input, not just RDF/XML. The checking
happens after parsing.

In Java code: RDFParser.checking(false).

It is just a warning; the triple and its object literal are still output
by the parser. From the command line, "--nocheck" turns off the checking.

Now logged as
https://issues.apache.org/jira/browse/JENA-2158

Jena has its own XSD parsing code, which is copied from the internal
implementation in Xerces 2.11.0, not going through public APIs. It was
repackaged into the Jena code base so that any XML parser can be used,
normally the JDK one.

There is a constant to turn on XSD 1.1 schema mode for checking. It 
affects year 0000, including the value of negative years, and some 
duration detection. It does not seem to affect other parts of the Xerces 
subsystem though.

There are slight differences between the JDK and Xerces code (the JDK
does not, or at least did not, handle "T24:00:00", which is a legal time
in XSD), so there is some checking to do.
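
To probe what a particular JDK accepts, a small check along these lines may help. It is a sketch under assumptions: it exercises the JDK's javax.xml.datatype parser, not Jena's repackaged Xerces code, and the class name JdkLexicalCheck and method accepts are invented for the example. Results for "0000" and the "24:00:00" forms are deliberately not stated here, since they can differ between JDK versions.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical demo class: asks the JDK's XMLGregorianCalendar parser
// whether it accepts a given date/time lexical form.
final class JdkLexicalCheck {

    /** True if the JDK parser accepts the lexical form, false if it rejects it. */
    static boolean accepts(String lexical) {
        try {
            DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the lexical form is not a valid XMLGregorianCalendar.
            return false;
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No JAXP DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        // Forms discussed in this thread, plus an ordinary year for comparison.
        String[] samples = {
            "2021", "0000", "-0001", "24:00:00", "2021-09-10T24:00:00"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (accepts(s) ? "accepted" : "rejected"));
        }
    }
}
```

Running it on the JDK in use gives a quick view of where that JDK and XSD 1.1 disagree, independently of any Jena setting.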

     Andy

> 
> Thanks,
> Ryan
> 

Re: Using Xerces2 2.12.1 with Jena

Posted by "Shaw, Ryan" <ry...@unc.edu>.
> On Sep 9, 2021, at 4:00 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> What is your usage scenario?

riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

> Jena supports the XSD datatypes relevant to RDF independently of XML.

I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?

Thanks,
Ryan


Re: Using Xerces2 2.12.1 with Jena

Posted by Andy Seaborne <an...@apache.org>.
Hi Ryan,

On 09/09/2021 17:08, Shaw, Ryan wrote:
> I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the Jena XML parser, specifically to gain support for XSD 1.1 datatypes.

XML 1.1 != XSD 1.1

Jena supports the XSD datatypes relevant to RDF independently of XML.

What is your usage scenario?

> I see at [1] that “any XML parser can be used with Jena … through the usual mechanism for adding to the application.”
> 
> But I don’t know what that usual mechanism is.

For Java:

https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/parsers/DocumentBuilderFactory.html#newInstance()
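
As a rough illustration of that mechanism, the sketch below prints which factory JAXP actually selects. According to the Javadoc linked above, newInstance() checks the system property javax.xml.parsers.DocumentBuilderFactory, then a jaxp.properties file, then service providers found on the classpath, and finally falls back to the JDK default. The class name JaxpParserLookup is made up for the example.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical demo class: reports which XML parser implementation
// the JAXP lookup resolves to in the current application.
final class JaxpParserLookup {

    /** Class name of the DocumentBuilderFactory chosen by the JAXP lookup. */
    static String selectedFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        System.out.println("factory: " + factory.getClass().getName());
        System.out.println("builder: " + builder.getClass().getName());
    }
}
```

With a Xerces jar on the classpath, or with the system property set to the Xerces factory class (org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, as far as I know), the printed names should change from the JDK-internal implementation to the Xerces one. Note that this selects the XML parser only; it does not change how Jena checks XSD datatype lexical forms.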


> The Xerces docs say something about the Java Endorsed Standards Override Mechanism, but elsewhere I see that this has been deprecated.
> 
> What’s the recommended way to do this for Jena?
> 
> Thanks,
> Ryan
> 
> [1] https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591
>