Posted to users@jena.apache.org by "Shaw, Ryan" <ry...@unc.edu> on 2021/09/09 16:08:41 UTC
Using Xerces2 2.12.1 with Jena
I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the Jena XML parser, specifically to gain support for XSD 1.1 datatypes.
I see at [1] that “any XML parser can be used with Jena … through the usual mechanism for adding to the application.”
But I don’t know what that usual mechanism is. The Xerces docs say something about the Java Endorsed Standards Override Mechanism, but elsewhere I see that this has been deprecated.
What’s the recommended way to do this for Jena?
Thanks,
Ryan
[1] https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591
Re: Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)
Posted by Andy Seaborne <an...@apache.org>.
On 10/09/2021 19:23, Shaw, Ryan wrote:
>
>>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>>>
>>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
>
>> On Sep 10, 2021, at 6:25 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>> Command line riot?
>
> I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.
>
>> It is just a warning; the triple and its object literal are still output from the parser. From the command line, "--nocheck" turns off the checking.
>
> The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.
>
Workaround: filtering stderr with "grep -v" will remove the warning.
>> Now logged as
>> https://issues.apache.org/jira/browse/JENA-2158
>
> Thanks, I will watch this issue.
>
>> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.
>
> Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?
The constant is in org.apache.jena.ext.xerces.impl.Constants
(you also need to change
org.apache.jena.ext.xerces.jaxp.datatype/XMLGregorianCalendarImpl.java).
PR #1069 is in progress.
Turning it on does not mean that all arithmetic involving year 0000, or
indeed BCE dates in general, will work. Xerces does not support the
XSD 1.1 "0000" year in its arithmetic, and in my testing neither does
the JDK.
(And to everyone who points to java.time.*: it is useful for parsing to
TemporalAccessors, but it has a different concept of duration.)
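To illustrate the java.time point, here is a minimal JDK-only sketch (not Jena code). The class name JavaTimeVsXsd and its helper methods are made up for this example. It shows that java.time parses "0000" as proleptic year 0, and that java.time splits what XSD calls a duration into Period (date part) and Duration (time part), so a mixed value such as "P1MT1H" fits neither, while javax.xml.datatype accepts it.

```java
import java.time.Duration;
import java.time.Period;
import java.time.Year;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

public class JavaTimeVsXsd {

    /** Year-of-era that java.time assigns to an ISO year string such as "0000". */
    static int yearOfEra(String isoYear) {
        return Year.parse(isoYear).get(ChronoField.YEAR_OF_ERA);
    }

    /** True if java.time.Duration (time-based amounts only) accepts the text. */
    static boolean parsesAsJavaDuration(String text) {
        try {
            Duration.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** True if java.time.Period (date-based amounts only) accepts the text. */
    static boolean parsesAsJavaPeriod(String text) {
        try {
            Period.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** True if the JDK's javax.xml.datatype duration parser accepts the text. */
    static boolean parsesAsXmlDuration(String text) {
        try {
            DatatypeFactory.newInstance().newDuration(text);
            return true;
        } catch (IllegalArgumentException | DatatypeConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.time uses proleptic ISO years, where year 0 exists.
        Year zero = Year.parse("0000");
        System.out.println("year value: " + zero.getValue()
                + ", year-of-era: " + yearOfEra("0000")
                + ", era: " + zero.get(ChronoField.ERA));

        // An XSD-style duration mixing months and hours.
        String mixed = "P1MT1H";
        System.out.println("java.time.Duration accepts " + mixed + ": " + parsesAsJavaDuration(mixed));
        System.out.println("java.time.Period accepts " + mixed + ": " + parsesAsJavaPeriod(mixed));
        System.out.println("javax.xml.datatype accepts " + mixed + ": " + parsesAsXmlDuration(mixed));
    }
}
```

This only demonstrates the mismatch in data models; it says nothing about how Jena maps XSD durations internally.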
> I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:
>
>> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes.
That text is saying that XSD IRIs can't be redefined. The section above
says "Any datatype definition that conforms to this abstraction MAY be
used in RDF" -- so not a requirement.
Andy
>
> Thanks,
> Ryan
>
> [1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes
>
>
Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)
Posted by "Shaw, Ryan" <ry...@unc.edu>.
>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>>
>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
> On Sep 10, 2021, at 6:25 AM, Andy Seaborne <an...@apache.org> wrote:
>
> Command line riot?
I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.
> It is just a warning; the triple and its object literal are still output from the parser. From the command line, "--nocheck" turns off the checking.
The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.
> Now logged as
> https://issues.apache.org/jira/browse/JENA-2158
Thanks, I will watch this issue.
> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.
Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?
I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:
> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes.
Thanks,
Ryan
[1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes
Re: Using Xerces2 2.12.1 with Jena
Posted by Andy Seaborne <an...@apache.org>.
Hi Ryan,
On 09/09/2021 23:32, Shaw, Ryan wrote:
>
>> On Sep 9, 2021, at 4:00 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>> What is your usage scenario?
>
> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
Command line riot?
>
>> Jena supports the XSD datatypes relevant to RDF independently of XML.
>
> I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?
I'm afraid it won't.
This warning happens for any input, not just RDF/XML. The checking
happens after parsing.
In Java code: RDFParser.checking(false).
It is just a warning; the triple and its object literal are still output
from the parser. From the command line, "--nocheck" turns off the checking.
Now logged as
https://issues.apache.org/jira/browse/JENA-2158
Jena has its own XSD parsing code, which is copied from the internal
implementation in Xerces 2.11.0 and does not go through public APIs. It
was repackaged into the Jena code base so that any XML parser can be
used, normally the JDK one.
There is a constant to turn on XSD 1.1 schema mode for checking. It
affects year 0000, including the value of negative years, and some
duration detection. It does not seem to affect other parts of the Xerces
subsystem though.
Between the JDK and Xerces code there are slight differences (the JDK
does not, or at least did not, handle "T24:00:00", which is a legal time
in XSD). So there is some checking to do.
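One way to check what a given JDK does with these lexical forms is a small probe against javax.xml.datatype. This is a sketch using only the JDK, not Jena's repackaged Xerces code; the class name XsdLexicalProbe and the sample dateTime value are chosen for illustration. The results can differ between JDK versions, so the probe reports what happens rather than assuming an outcome.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class XsdLexicalProbe {

    /**
     * Tries to parse a lexical form with the JDK's XMLGregorianCalendar
     * implementation and describes the outcome as text.
     */
    static String probe(String lexical) {
        try {
            XMLGregorianCalendar cal =
                    DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            // getXMLSchemaType() reports which XSD type the fields correspond to.
            return "accepted as " + cal.getXMLSchemaType().getLocalPart()
                    + " -> " + cal.toXMLFormat();
        } catch (RuntimeException | DatatypeConfigurationException e) {
            // Any failure to parse or classify is reported as a rejection.
            return "rejected: " + e;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "2021",                 // ordinary gYear
            "0000",                 // gYear allowed by XSD 1.1 (1 BCE)
            "2021-09-10T24:00:00"   // end-of-day time, legal in XSD
        };
        for (String s : samples) {
            System.out.println(s + " : " + probe(s));
        }
    }
}
```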
Andy
>
> Thanks,
> Ryan
>
Re: Using Xerces2 2.12.1 with Jena
Posted by "Shaw, Ryan" <ry...@unc.edu>.
> On Sep 9, 2021, at 4:00 PM, Andy Seaborne <an...@apache.org> wrote:
>
> What is your usage scenario?
riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
> Jena supports the XSD datatypes relevant to RDF independently of XML.
I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?
Thanks,
Ryan
Re: Using Xerces2 2.12.1 with Jena
Posted by Andy Seaborne <an...@apache.org>.
Hi Ryan,
On 09/09/2021 17:08, Shaw, Ryan wrote:
> I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the Jena XML parser, specifically to gain support for XSD 1.1 datatypes.
XML 1.1 != XSD 1.1
Jena supports the XSD datatypes relevant to RDF independently of XML.
What is your usage scenario?
> I see at [1] that “any XML parser can be used with Jena … through the usual mechanism for adding to the application.”
>
> But I don’t know what that usual mechanism is.
For Java:
https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/parsers/DocumentBuilderFactory.html#newInstance()
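As a minimal sketch of what that link describes (this code is not from the thread): JAXP factories are located at run time, so a different parser is normally selected by putting a jar that registers a provider on the classpath, or by setting a system property, rather than through the endorsed-standards override. The class name WhichXmlParser is made up for illustration. Which JAXP factory Jena itself calls is not stated here, so the sketch prints both the DOM and the SAX factory.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class WhichXmlParser {

    /** Class name of the DOM factory that the JAXP lookup selects. */
    static String documentBuilderFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Class name of the SAX factory that the JAXP lookup selects. */
    static String saxParserFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With no extra jars or properties this prints the JDK's built-in
        // implementation; with another parser on the classpath it may differ.
        System.out.println("DOM factory: " + documentBuilderFactoryClass());
        System.out.println("SAX factory: " + saxParserFactoryClass());
    }
}
```

Running this with and without the Xerces jar on the classpath should show whether the lookup picks it up. The javax.xml.parsers.DocumentBuilderFactory system property is the documented way to force a specific factory class.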
> The Xerces docs say something about the Java Endorsed Standards Override Mechanism, but elsewhere I see that this has been deprecated.
>
> What’s the recommended way to do this for Jena?
>
> Thanks,
> Ryan
>
> [1] https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591
>