Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2018/05/01 11:05:31 UTC

Re: Un-depending on Apache Xerces

FYI:

Xerces 2.12.0 is out (as of April 21), though it has not made it to Maven 
Central yet.

One thing of interest (to me) is whether it has a bug-fixed version of 
Duration (JENA-1402).

I still think we should un-depend on Xerces.

     Andy

On 28/04/18 20:38, Andy Seaborne wrote:
> JENA-1537
> 
> While the JDK does have a Xerces-derived parser (it split off long 
> before 2.11.0 and evolved separately), it is encapsulated inside the 
> Java 9 module "java.xml".
> 
> Jena uses Xerces 2.11.0 in two ways: for the datatypes (oaj.datatypes) 
> and for XML parsing (oaj.rdfxml.xmlinput, also known as ARP).  Both make 
> internal use of Xerces.
> 
> The datatypes code uses Xerces to provide XSD datatypes, including 
> validation.
> 
> RDFXMLParser uses the Xerces SAXParser and, in a minor way, some other 
> classes that aren't in the standard SAX API (org.xml.sax).
> 
> I've had a prototype-hack go at removing Xerces from Jena:
> https://github.com/afs/jena-xerces
> 
> Datatypes:
> 
> * One feature omitted: XSDDatatype.loadUserDefined.
> 
> These functions parse XSD schema datatype definitions. The 
> implementation calls into the internal XML parsing, which would not be 
> legal under Java 9 modules if using the JDK built-in parser. It seems to 
> need a fairly complete XML parser engine.
> 
> We should consider dropping this feature.
> 
> XML Parsing:
> 
> * Loses the check on whether an InputStreamReader or FileReader has the 
> right encoding for the XML document. The check hooks into an interface 
> call that does not seem to be available in a standard SAX parser. 
> (Applications shouldn't be using Readers anyway!)
> 
>      Andy
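
A minimal sketch of why Readers are a problem for XML input, assuming only the JDK's standard JAXP/SAX API. The class and method names (ReaderEncodingSketch, textOf, parseBytes, parseWithReader) are hypothetical and are not taken from Jena or ARP. When the parser is given bytes, it can honour the encoding named in the XML declaration; when it is given a Reader, the bytes have already been decoded and the declaration is not consulted, so a mismatched charset goes unnoticed unless something like the ARP check catches it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Jena or ARP.
final class ReaderEncodingSketch {

    // Parse with whichever JAXP SAX parser is configured and return the character data.
    static String textOf(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        StringBuilder text = new StringBuilder();
        factory.newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    // Byte input: the parser decodes the bytes itself, using the XML declaration.
    static String parseBytes(byte[] xml) throws Exception {
        return textOf(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Character input: the caller picks the charset before the parser sees anything.
    static String parseWithReader(byte[] xml, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        return textOf(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><v>caf\u00e9</v>";
        byte[] bytes = doc.getBytes(StandardCharsets.ISO_8859_1);

        // Does each route recover the original text?
        System.out.println("stream ok: " + parseBytes(bytes).equals("caf\u00e9"));
        System.out.println("reader ok: "
                + parseWithReader(bytes, StandardCharsets.UTF_8).equals("caf\u00e9"));
    }
}
```

With the JDK's built-in parser I would expect the stream route to recover the text and the Reader route (decoded as UTF-8 against an ISO-8859-1 document) to silently corrupt it rather than fail.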

Re: Un-depending on Apache Xerces

Posted by Andy Seaborne <an...@apache.org>.
I hadn't realised how intrusive the use of Xerces was. It affects all XML 
parsing in the app, not just RDF/XML in Jena, because Xerces wires itself 
into the app and also replaces the XML datatypes factory (its factory is a 
subclass of the standard one that adds different Duration and 
XMLGregorianCalendar implementations).
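
A small sketch for checking this in an application, assuming only the standard JAXP API (the class name JaxpProviderCheck and its methods are hypothetical). It reports which SAXParserFactory and DatatypeFactory implementations the JAXP lookup selects, next to the JDK's built-in ones obtained with newDefaultInstance() (available from Java 9).

```java
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical illustration class; not part of Jena.
final class JaxpProviderCheck {

    // Implementation chosen by the JAXP lookup (system property, service provider, default).
    static String configuredSaxFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // The JDK's built-in implementation, regardless of what is on the classpath (Java 9+).
    static String builtInSaxFactory() {
        return SAXParserFactory.newDefaultInstance().getClass().getName();
    }

    static String configuredDatatypeFactory() throws Exception {
        return DatatypeFactory.newInstance().getClass().getName();
    }

    static String builtInDatatypeFactory() throws Exception {
        return DatatypeFactory.newDefaultInstance().getClass().getName();
    }

    // The concrete Duration class comes from whichever DatatypeFactory was selected.
    static String configuredDurationClass() throws Exception {
        Duration oneDay = DatatypeFactory.newInstance().newDuration("P1D");
        return oneDay.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX factory (configured):      " + configuredSaxFactory());
        System.out.println("SAX factory (JDK built-in):    " + builtInSaxFactory());
        System.out.println("Datatype factory (configured): " + configuredDatatypeFactory());
        System.out.println("Datatype factory (built-in):   " + builtInDatatypeFactory());
        System.out.println("Duration class (configured):   " + configuredDurationClass());
    }
}
```

If the configured and built-in names differ when the Xerces jar is on the classpath, that would be the wiring described above; the exact class names depend on the JDK and Xerces versions in use.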

The Duration bug is not fixed in Xerces 2.12.0. The new release is 
mostly about XML Schema 1.1.

I've now updated the Jena top-level NOTICE, taking the relevant text from 
Xerces's NOTICE.  The jena-core NOTICE is updated as well; we 
could/should roll up all the sub NOTICE/LICENSE files and just have the 
top-level one for source and the ones for binaries (download, Fuseki's).

It is now ready for integration.

     Andy


Re: Un-depending on Apache Xerces

Posted by Claude Warren <cl...@xenei.com>.
I think un-depending on Xerces is a good idea as well.  With lots of other,
faster parsers to choose from, it seems like we should not be forcing apps
to include Xerces too.

Claude
