Posted to c-dev@xerces.apache.org by John Snelson <jo...@oracle.com> on 2008/04/11 17:57:26 UTC

Xerces-C API changes for XQilla

Hi everyone,

Boris Kolpackov and I were just having a discussion over on the XQilla 
mailing list. He suggested that now was the time to make a few changes 
in the Xerces-C API to benefit XQilla - so I've written a proposal for 
what I'd like to change.

I'll give as much help as I can in making these changes, but at the 
moment I'm going through the process of getting permission to sign the 
contributor agreement.

Please let me know your opinions on making these changes.

John

---------------------------------------------------------

1) Problem:

XPath 2.0 is just different to XPath 1.0. We've therefore got our own 
version of DOMXPathResult (XPath2Result) which makes more sense in this 
context:

http://xqilla.sourceforge.net/docs/dom3-api/classXPath2Result.html

Solution:

It's probably simple enough to either extend DOMXPathResult to include 
the extra functionality in XPath2Result, or to include it as a new class 
called DOMXPath2Result.

---------------------------------------------------------

2) Problem:

It's necessary to get access to DOMDocumentImpl, which isn't in the 
public API, in order to implement the DOM3 XPath API. Needing access to 
the Xerces-C source code to compile XQilla is a big problem for our 
maintainers. We need DOMDocumentImpl for a number of reasons:

a) In order to implement DOMXPathNamespace, we need to allocate using 
the DOMDocumentImpl's memory manager and string pool.
b) We need to access DOMDocumentImpl::changes() to invalidate the xpath 
result iterator if the DOMDocument is changed 
(DOMXPathResult::getInvalidIteratorState()).
c) XQuery allows a document node to contain multiple document elements. 
We derive from DOMDocumentImpl to allow that to happen.
d) In order to implement the DOMXPathEvaluator interface on the 
DOMDocument object we need to derive from DOMDocumentImpl.
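
To make point (b) concrete, here is a minimal sketch of the change-counter idea. It does not use Xerces-C: Document, ResultIterator and their members are hypothetical stand-ins. The only thing taken from the description above is that an iterator remembers the document's change count and compares it later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a document that counts its modifications,
// in the spirit of the changes() accessor described above.
class Document {
public:
    void appendChild(const std::string& name) {
        children_.push_back(name);
        ++changes_;  // every mutation bumps the counter
    }
    std::size_t changes() const { return changes_; }

private:
    std::vector<std::string> children_;
    std::size_t changes_ = 0;
};

// Hypothetical result iterator: it records the change count at creation
// and reports itself invalid once the document has been modified.
class ResultIterator {
public:
    explicit ResultIterator(const Document& doc)
        : doc_(doc), changesAtCreation_(doc.changes()) {}

    bool getInvalidIteratorState() const {
        return doc_.changes() != changesAtCreation_;
    }

private:
    const Document& doc_;
    std::size_t changesAtCreation_;
};
```

Without a public way to read the change count, an external library cannot implement this check, which is the reason for the request.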

Solution:

Put DOMDocumentImpl in the public API.

---------------------------------------------------------

3) Problem:

We need access to DOMWriterImpl in order to override it to know how to 
write namespace nodes. DOMWriterImpl isn't in the public API.

Solution:

Put DOMWriterImpl in the public API, or implement namespace node 
handling in it.

---------------------------------------------------------

4) Problem:

XQilla can construct typed DOMDocuments, so we need a way to set the 
type information on these nodes. Currently we use DOMTypeInfoImpl, 
DOMAttrImpl and DOMElementNSImpl to do this, which aren't in the public API.

Solution:

Implement a method of setting the type information on an element or 
attribute, or put DOMTypeInfoImpl, DOMAttrImpl and DOMElementNSImpl in 
the public API.

---------------------------------------------------------

5) Problem:

RegularExpression is neither thread safe nor consistent in its use of 
MemoryManager. It's also not quite flexible enough to implement XSLT 
2.0's analyze-string, and it has bugs in the replace() methods.

http://www.w3.org/TR/xslt20/#analyze-string
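
For readers unfamiliar with analyze-string: it splits the input into alternating matching and non-matching substrings, so the regex engine has to expose match positions, not just a yes/no answer. The sketch below shows that behaviour using std::regex from the C++ standard library, not the Xerces-C RegularExpression class. Segment and analyzeString are hypothetical names, and std::regex uses ECMAScript syntax rather than the XPath regex dialect. Empty matches are skipped for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One piece of the input: either a regex match or the text between matches.
struct Segment {
    bool matched;
    std::string text;
};

// Split `input` into alternating non-matching and matching substrings.
std::vector<Segment> analyzeString(const std::string& input,
                                   const std::regex& re) {
    std::vector<Segment> out;
    std::size_t pos = 0;  // end of the previous match
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m.length(0) == 0) continue;  // ignore empty matches in this sketch
        std::size_t start = static_cast<std::size_t>(m.position(0));
        if (start > pos)
            out.push_back({false, input.substr(pos, start - pos)});
        out.push_back({true, m.str(0)});
        pos = start + static_cast<std::size_t>(m.length(0));
    }
    if (pos < input.size())
        out.push_back({false, input.substr(pos)});  // trailing non-match
    return out;
}
```

Calling analyzeString("ab12cd3", std::regex("[0-9]+")) yields four segments: "ab" (non-matching), "12" (matching), "cd" (non-matching) and "3" (matching).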

Solution:

I have a patch that fixes all of this in Xerces-C 2.8, and I can update 
it to apply to 3.0. I'm in the process of getting permission to sign the 
contributor agreement.

---------------------------------------------------------

6) Problem:

The socket and WinSock HTTP InputStream implementations have fixed 
buffers which can result in buffer overflow. They needlessly duplicate a 
whole load of code that could be shared. In addition, a lot of 
algorithms need access to the HTTP "Content-Type" header, to decide how 
to parse a file, or what encoding it is in - for instance see XSLT 2.0's 
unparsed-text() function:

http://www.w3.org/TR/xslt20/#unparsed-text

Solution:

I have a patch that implements this functionality for 
UnixHTTPURLInputStream and BinHTTPURLInputStream (WinSock) in Xerces-C 
2.8. I added BinInputStream::getContentType() to get access to the 
"Content-Type" header. I can update this code for Xerces-C 3.0.
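
As an illustration of why callers want the header, the sketch below pulls the charset parameter out of a Content-Type value so that a caller could choose a decoder. It is plain standard C++ and independent of the patch. charsetFromContentType is a hypothetical helper, and the parsing is deliberately simplified rather than a full implementation of the HTTP header grammar.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the charset parameter in lower case, or an empty string if the
// header value carries no charset. Simplified: it searches for the first
// "charset=" substring and stops at the next ';'.
std::string charsetFromContentType(const std::string& contentType) {
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return "";
    pos += key.size();

    std::size_t end = lower.find(';', pos);
    std::string value = lower.substr(
        pos, end == std::string::npos ? std::string::npos : end - pos);

    // Trim surrounding whitespace and optional double quotes.
    auto isJunk = [](unsigned char c) {
        return std::isspace(c) != 0 || c == '"';
    };
    while (!value.empty() && isJunk(value.front())) value.erase(value.begin());
    while (!value.empty() && isJunk(value.back())) value.pop_back();
    return value;
}
```

An empty result tells the caller to fall back to some other way of detecting the encoding.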

---------------------------------------------------------

7) Problem:

GrammarResolver has a bug where it fails to initialize its XSModel if 
the XMLGrammarPool it is created with is locked.

Solution:

We hack this at the moment, but it would be great if this could be fixed.

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: Xerces-C API changes for XQilla

Posted by Gareth Reakes <ga...@we7.com>.
I have no problem with these changes John.

Gareth


--
Gareth Reakes, CTO                                            WE7
+44-20-7117-0809                    http://www.we7.com







Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> Have you got a list of the Xerces-C API changes in 3.0 anywhere?

No, unfortunately, there does not seem to be such a list. I can
try to come up with one for the DOM API if that's helpful.

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Boris Kolpackov wrote:
>> I think that some headers related to DOMDocumentImpl are also needed. An
>> exhaustive list of the ones that XQilla includes are these:
>>
>> [...]
>>
>> I imagine that you also need to include the headers that these files
>> include themselves.
> 
> All headers in dom/impl/ are now installed. I also checked and none
> of them include anything from internal/ so I think there won't be any
> problems. Though it is always better to check that we are not missing
> anything on the real use-case. Are you planning to start working on
> the 3.0.0 port of XQilla before the release?

I was just thinking about that - I think that would be the best way to 
go. I took a look at what would be needed a while back and I don't think 
it's much. Have you got a list of the Xerces-C API changes in 3.0 anywhere?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by Gareth Reakes <ga...@we7.com>.

John Snelson wrote:
> Boris Kolpackov wrote:
>> John Snelson <jo...@oracle.com> writes:
>>
>>> I'm not so sure about that. The existing DOMXPathResult is an
>>> implementation of a W3C published DOM interface, and I think there's
>>> value in keeping it that way. There's all sorts of improvements I'd like
>>> to make to W3C DOM otherwise ;-).
>>
>> Here are some of the reasons why I believe we should merge the two
>> interfaces:
>>
>> 1. The XPath 2 interface supports requirements of XPath 1.
>>
>> 2. It is my understanding (from talking to various people and reading
>>    the W3C mailing lists) that the XPath 1 DOM interface is commonly
>>    believed to be a half-baked work that is full of holes and omissions
>>    mainly because it was done in a hurry to get it into the DOM spec.
>>    To me personally the fact that the official interface of NSResolver
>>    does not allow the user to specify custom namespace-prefix mappings
>>    makes it clear that the authors of the spec had no real-world
>>    experience in this area.
>>
>> 3. There won't be any revisions to the DOM spec so there is no hope
>>    of the official support for XPath 2.
>>
>> So I think, as far as the XPath part of the DOM spec is concerned, we
>> should try to make the interface sensible rather than trying to conform
>> to the spec. I chatted to Alberto and he also thinks that we should
>> rather generalize the interface. Any thoughts?
> 
> That makes sense, and the fact that the W3C Web API group took control 
> of the spec makes me think that the W3C might change it anyway. I guess 
> I'm agnostic about what to do - does anyone else have any opinions?
> 

We normally err on the side of supporting interfaces for the standard 
reasons. In this case I don't really mind either way.

Gareth


-- 
Gareth Reakes, CTO                                 WE7
+44-20-7117-0809                    http://www.we7.com



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Boris Kolpackov wrote:
> John Snelson <jo...@oracle.com> writes:
> 
>> I'm not so sure about that. The existing DOMXPathResult is an
>> implementation of a W3C published DOM interface, and I think there's
>> value in keeping it that way. There's all sorts of improvements I'd like
>> to make to W3C DOM otherwise ;-).
> 
> Here are some of the reasons why I believe we should merge the two
> interfaces:
> 
> 1. The XPath 2 interface supports requirements of XPath 1.
> 
> 2. It is my understanding (from talking to various people and reading
>    the W3C mailing lists) that the XPath 1 DOM interface is commonly
>    believed to be a half-baked work that is full of holes and omissions
>    mainly because it was done in a hurry to get it into the DOM spec.
>    To me personally the fact that the official interface of NSResolver
>    does not allow the user to specify custom namespace-prefix mappings
>    makes it clear that the authors of the spec had no real-world
>    experience in this area.
> 
> 3. There won't be any revisions to the DOM spec so there is no hope
>    of the official support for XPath 2.
> 
> So I think, as far as the XPath part of the DOM spec is concerned, we
> should try to make the interface sensible rather than trying to conform
> to the spec. I chatted to Alberto and he also thinks that we should
> rather generalize the interface. Any thoughts?

That makes sense, and the fact that the W3C Web API group took control 
of the spec makes me think that the W3C might change it anyway. I guess 
I'm agnostic about what to do - does anyone else have any opinions?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Boris Kolpackov <bo...@codesynthesis.com> wrote on 05/06/2008 03:49:23 PM:

> 3. There won't be any revisions to the DOM spec so there is no hope
>    of the official support for XPath 2.

Maybe. The Web API working group [1] took ownership of the DOM Level 3
XPath spec, so there's still some hope it will be revised and perhaps at
some point become a W3C Recommendation.

[1] http://www.w3.org/2006/webapi/

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> I'm not so sure about that. The existing DOMXPathResult is an
> implementation of a W3C published DOM interface, and I think there's
> value in keeping it that way. There's all sorts of improvements I'd like
> to make to W3C DOM otherwise ;-).

Here are some of the reasons why I believe we should merge the two
interfaces:

1. The XPath 2 interface supports requirements of XPath 1.

2. It is my understanding (from talking to various people and reading
   the W3C mailing lists) that the XPath 1 DOM interface is commonly
   believed to be a half-baked work that is full of holes and omissions
   mainly because it was done in a hurry to get it into the DOM spec.
   To me personally the fact that the official interface of NSResolver
   does not allow the user to specify custom namespace-prefix mappings
   makes it clear that the authors of the spec had no real-world
   experience in this area.

3. There won't be any revisions to the DOM spec so there is no hope
   of the official support for XPath 2.

So I think, as far as the XPath part of the DOM spec is concerned, we
should try to make the interface sensible rather than trying to conform
to the spec. I chatted to Alberto and he also thinks that we should
rather generalize the interface. Any thoughts?

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Boris Kolpackov wrote:
> John Snelson <jo...@oracle.com> writes:
> 
>> I've also changed my mind (again) about the DOMXPathResult object. The
>> problem is this - DOMXPathResult::iterateNext() returns DOMNode, because
>> in XPath 1 you can only get single values or a list of nodes. XQilla's
>> XPath2Result::iterateNext() returns a bool, and simply moves the
>> iterator to look at the next item in the sequence, which in XPath 2.0
>> can be a heterogeneous sequence of nodes and values. The same is true
>> for snapshotItem(). I've come to the conclusion that it would be best
>> for XQilla to stick with the XPath2Result API, and it would be good if
>> Xerces-C incorporated it into its API.
> 
> Since the XPath 2 approach is more generic and can support XPath
> 1 requirements, I think we should just change the DOMXPathResult
> interface to return bool from iterateNext() and snapshotItem().

I'm not so sure about that. The existing DOMXPathResult is an 
implementation of a W3C published DOM interface, and I think there's 
value in keeping it that way. There's all sorts of improvements I'd like 
to make to W3C DOM otherwise ;-).

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> I've also changed my mind (again) about the DOMXPathResult object. The
> problem is this - DOMXPathResult::iterateNext() returns DOMNode, because
> in XPath 1 you can only get single values or a list of nodes. XQilla's
> XPath2Result::iterateNext() returns a bool, and simply moves the
> iterator to look at the next item in the sequence, which in XPath 2.0
> can be a heterogeneous sequence of nodes and values. The same is true
> for snapshotItem(). I've come to the conclusion that it would be best
> for XQilla to stick with the XPath2Result API, and it would be good if
> Xerces-C incorporated it into its API.

Since the XPath 2 approach is more generic and can support XPath
1 requirements, I think we should just change the DOMXPathResult
interface to return bool from iterateNext() and snapshotItem().

Alberto, do you have any objections to this change?

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
John Snelson wrote:
> Boris Kolpackov wrote:
>>> Having given it some thought, I think that merging the two would be the
>>> best idea - it would be a breaking change for XQilla users, but it would
>>> be a move to a more standard API.
>>
>> Agree. If you can come up with a patch, I will review and commit it.
> 
> I've attached a patch against the SVN trunk for the XPath API changes 
> we've discussed.

I think I sent that out a bit quick. This new patch implements the new 
API in the existing implementation objects, so that Xerces-C will 
compile with the patch applied.

I've also changed my mind (again) about the DOMXPathResult object. The 
problem is this - DOMXPathResult::iterateNext() returns DOMNode, because 
in XPath 1 you can only get single values or a list of nodes. XQilla's 
XPath2Result::iterateNext() returns a bool, and simply moves the 
iterator to look at the next item in the sequence, which in XPath 2.0 
can be a heterogeneous sequence of nodes and values. The same is true 
for snapshotItem(). I've come to the conclusion that it would be best 
for XQilla to stick with the XPath2Result API, and it would be good if 
Xerces-C incorporated it into its API.
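
To illustrate the cursor style, here is a rough self-contained sketch. It is not the XQilla or Xerces-C code: Item, SequenceResult and the accessor names are hypothetical, and a node is represented only by its name. What it shows is iterateNext() advancing and returning a bool, with typed accessors then reading the current item.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sequence item: either a node (by name) or an integer value.
struct Item {
    enum Kind { Node, Integer } kind;
    std::string nodeName;  // meaningful when kind == Node
    int intValue;          // meaningful when kind == Integer
};

// Cursor-style result: iterateNext() only moves the cursor and says
// whether there is a current item; accessors read that item.
class SequenceResult {
public:
    explicit SequenceResult(std::vector<Item> items)
        : items_(std::move(items)), pos_(0), started_(false) {}

    bool iterateNext() {
        if (!started_) {
            started_ = true;           // first call positions on item 0
        } else if (pos_ < items_.size()) {
            ++pos_;                    // later calls advance by one
        }
        return pos_ < items_.size();
    }

    bool isNode() const { return current().kind == Item::Node; }
    const std::string& asNodeName() const { return current().nodeName; }
    int asInt() const { return current().intValue; }

private:
    const Item& current() const {
        assert(started_ && pos_ < items_.size());  // cursor must be on an item
        return items_[pos_];
    }

    std::vector<Item> items_;
    std::size_t pos_;
    bool started_;
};
```

A caller loops with while (result.iterateNext()) and checks isNode() before choosing an accessor, which works for a mixed sequence in a way that a DOMNode return value cannot.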

Any questions or opinions?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Boris Kolpackov wrote:
>> Having given it some thought, I think that merging the two would be the
>> best idea - it would be a breaking change for XQilla users, but it would
>> be a move to a more standard API.
> 
> Agree. If you can come up with a patch, I will review and commit it.

I've attached a patch against the SVN trunk for the XPath API changes 
we've discussed.

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> Having given it some thought, I think that merging the two would be the
> best idea - it would be a breaking change for XQilla users, but it would
> be a move to a more standard API.

Agree. If you can come up with a patch, I will review and commit it.


> I think that some headers related to DOMDocumentImpl are also needed. An
> exhaustive list of the ones that XQilla includes are these:
>
> [...]
>
> I imagine that you also need to include the headers that these files
> include themselves.

All headers in dom/impl/ are now installed. I also checked and none
of them include anything from internal/ so I think there won't be any
problems. Though it is always better to check that we are not missing
anything on the real use-case. Are you planning to start working on
the 3.0.0 port of XQilla before the release?


> I've got permission now - hurrah! In the next couple of days when I find
> some time I'll port the patches over to Xerces-C 3.0.

That's great news!


> I looked into the MacOS net accessors, and it seemed to be impossible to
> support this API with them. However, recent email on the list suggests
> that these implementations have other problems (a problem with fork?).
>
> Since you won't get any content type information back for file URLs
> either, I'd suggest that a null return should be the standard response
> when the content type is unavailable.

I agree. If there is no easy way to support this with non-mainstream
net accessors then we should simply return NULL.
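
The convention could look roughly like the sketch below. The classes are hypothetical stand-ins rather than the actual Xerces-C stream classes: the base class returns null by default, only streams that know the header override it, and callers test for null before using the value.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: a null return means "content type unavailable".
class InputStream {
public:
    virtual ~InputStream() {}
    virtual const char* getContentType() const { return nullptr; }
};

// A stream that saw an HTTP response can report the header value.
class HttpStream : public InputStream {
public:
    explicit HttpStream(const std::string& header) : contentType_(header) {}
    const char* getContentType() const override {
        return contentType_.c_str();
    }

private:
    std::string contentType_;
};

// File URLs (and net accessors that cannot support the call) simply
// inherit the null default.
class FileStream : public InputStream {};

// Caller-side helper: use the reported type, or fall back if it is null.
std::string contentTypeOr(const InputStream& in, const std::string& fallback) {
    const char* ct = in.getContentType();
    return ct ? std::string(ct) : fallback;
}
```

With this shape, adding the method does not force every net accessor to implement it.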

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
I've made enquiries through Oracle about whether they have signed a CCLA 
- but it may be quicker to ask someone in Apache the question, if Apache 
has that information available.

As for an ICLA, sure no problem - I'll try and mail that off today.

The question still remains - do you guys just trust that I've done what 
I said I'll do, or does someone need to check up on me? :-)

John

Gareth Reakes wrote:
> Have you got Oracle to sign a CCLA? For good measure you could sign an 
> ICLA as well.
> 
> Gareth
> 
> John Snelson wrote:
>> John Snelson wrote:
>>> I've got permission now - hurrah! In the next couple of days when I 
>>> find some time I'll port the patches over to Xerces-C 3.0.
>>
>> So as I said, Oracle have OKed me to contribute code to Xerces-C. To 
>> pre-empt any delays, what steps should I take to make sure that you 
>> guys are happy? Is there anything more I have to sign? Who can I 
>> contact about this?
>>
>> John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by Gareth Reakes <ga...@embracemobile.com>.

Neil Graham wrote:
> Hi all.  I'm not totally up-to-date these days with Apache's contribution 
> requirements, but I had been under the impression that the ICLA was 
> required in all cases.  The CCLA is a good idea (probably required) too.
> 
> Cheers,
> Neil
> Neil Graham
> Manager, XML Transformation and Query Development
> IBM Toronto Lab
> Phone:  905-413-3519, T/L 413-3519
> E-mail:  neilg@ca.ibm.com
> 
> 
> 
> 
> 
> From: Gareth Reakes <ga...@we7.com>
> Date: 05/02/08 08:39 AM
> Please respond to: c-dev@xerces.apache.org
> To: c-dev@xerces.apache.org
> cc:
> Subject: Re: Xerces-C API changes for XQilla
> 
> Have you got Oracle to sign a CCLA? For good measure you could sign an 
> ICLA as well.
> 
> Gareth
> 
> John Snelson wrote:
>> John Snelson wrote:
>>> I've got permission now - hurrah! In the next couple of days when I 
>>> find some time I'll port the patches over to Xerces-C 3.0.
>> So as I said, Oracle have OKed me to contribute code to Xerces-C. To 
>> pre-empt any delays, what steps should I take to make sure that you guys 
>> are happy? Is there anything more I have to sign? Who can I contact 
>> about this?
>>
>> John
>>
> 



Re: Xerces-C API changes for XQilla

Posted by Neil Graham <ne...@ca.ibm.com>.
Hi all.  I'm not totally up-to-date these days with Apache's contribution 
requirements, but I had been under the impression that the ICLA was 
required in all cases.  The CCLA is a good idea (probably required) too.

Cheers,
Neil
Neil Graham
Manager, XML Transformation and Query Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 413-3519
E-mail:  neilg@ca.ibm.com





From: Gareth Reakes <ga...@we7.com>
Date: 05/02/08 08:39 AM
Please respond to: c-dev@xerces.apache.org
To: c-dev@xerces.apache.org
cc:
Subject: Re: Xerces-C API changes for XQilla

Have you got Oracle to sign a CCLA? For good measure you could sign an 
ICLA as well.

Gareth

John Snelson wrote:
> John Snelson wrote:
>> I've got permission now - hurrah! In the next couple of days when I 
>> find some time I'll port the patches over to Xerces-C 3.0.
> 
> So as I said, Oracle have OKed me to contribute code to Xerces-C. To 
> pre-empt any delays, what steps should I take to make sure that you guys 
> are happy? Is there anything more I have to sign? Who can I contact 
> about this?
> 
> John
> 

-- 
Gareth Reakes, CTO                                 WE7
+44-20-7117-0809                    http://www.we7.com



Re: Xerces-C API changes for XQilla

Posted by Gareth Reakes <ga...@we7.com>.

Have you got Oracle to sign a CCLA? For good measure you could sign an 
ICLA as well.

Gareth

John Snelson wrote:
> John Snelson wrote:
>> I've got permission now - hurrah! In the next couple of days when I 
>> find some time I'll port the patches over to Xerces-C 3.0.
> 
> So as I said, Oracle have OKed me to contribute code to Xerces-C. To 
> pre-empt any delays, what steps should I take to make sure that you guys 
> are happy? Is there anything more I have to sign? Who can I contact 
> about this?
> 
> John
> 

-- 
Gareth Reakes, CTO                                 WE7
+44-20-7117-0809                    http://www.we7.com



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
John Snelson wrote:
> I've got permission now - hurrah! In the next couple of days when I find 
> some time I'll port the patches over to Xerces-C 3.0.

So as I said, Oracle have OKed me to contribute code to Xerces-C. To 
pre-empt any delays, what steps should I take to make sure that you guys 
are happy? Is there anything more I have to sign? Who can I contact 
about this?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Hi Boris,

Boris Kolpackov wrote:
> John Snelson <jo...@oracle.com> writes:
> 
>> 1) Problem:
>>
>> XPath 2.0 is just different to XPath 1.0. We've therefore got our own
>> version of DOMXPathResult (XPath2Result) which makes more sense in this
>> context:
>>
>> http://xqilla.sourceforge.net/docs/dom3-api/classXPath2Result.html
>>
>> Solution:
>>
>> It's probably simple enough to either extend DOMXPathResult to include
>> the extra functionality in XPath2Result, or to include it as a new class
>> called DOMXPath2Result.
> 
> I did a quick check and it appears that the DOMXPathResult is very
> similar to DOMXPath2Result. I would therefore suggest that we try
> to add the missing functionality to DOMXPathResult as non-standard
> extensions (though we should try to use names that will likely be
> used in the next version of DOM3 when it is updated to include
> support for XPath 2, for example getIntegerValue instead of asInt).
> What is your feeling on this approach? Also did you base your
> DOMXPath2Result on any draft spec (e.g., where do the asDouble,
> asInt, etc., names come from)?

XPath2Result wasn't based on any draft spec IIRC. Gareth probably 
remembers more about its design, since I wasn't heavily involved in 
Pathan at the time.

Having given it some thought, I think that merging the two would be the 
best idea - it would be a breaking change for XQilla users, but it would 
be a move to a more standard API.

>> 2) Problem:
>>
>> It's necessary to get access to DOMDocumentImpl, which isn't in the
>> public API, in order to implement the DOM3 XPath API. Needing access to
>> the Xerces-C source code to compile XQilla is a big problem for our
>> maintainers. We need DOMDocumentImpl for a number of reasons:
>>
>> [...]
>>
>> Solution:
>>
>> Put DOMDocumentImpl in the public API.
> 
> The DOMDocumentImpl.hpp is now installed with the rest of the headers.
> I've also changed all private data members and functions to be protected
> in all DOM*Impl classes. Is there anything else we need to do?

I think that some headers related to DOMDocumentImpl are also needed. The 
exhaustive list of the ones that XQilla includes is:

xercesc/dom/impl/DOMAttrImpl.hpp
xercesc/dom/impl/DOMCasts.hpp
xercesc/dom/impl/DOMDocumentImpl.hpp
xercesc/dom/impl/DOMDocumentTypeImpl.hpp
xercesc/dom/impl/DOMElementNSImpl.hpp
xercesc/dom/impl/DOMNodeImpl.hpp
xercesc/dom/impl/DOMRangeImpl.hpp
xercesc/dom/impl/DOMTypeInfoImpl.hpp
xercesc/dom/impl/DOMWriterImpl.hpp

I imagine that you also need to include the headers that these files 
include themselves.

>> 5) Problem:
>>
>> RegularExpression is not thread safe or consistent with it's use of
>> MemoryManager. It's also not quite flexible enough to implement XSLT
>> 2.0's analyze-string, and it has bugs in the replace() methods.
>>
>> http://www.w3.org/TR/xslt20/#analyze-string
>>
>> Solution:
>>
>> I have a patch that fixes all of this in Xerces-C 2.8, and I can update
>> it to apply to 3.0. I'm in the process of getting permission to sign the
>> contributor agreement.
> 
> Sounds good.

I've got permission now - hurrah! In the next couple of days when I find 
some time I'll port the patches over to Xerces-C 3.0.

>> 6) Problem:
>>
>> The socket and WinSock HTTP InputStream implementations have fixed
>> buffers which can result in buffer overflow. They needlessly duplicate a
>> whole load of code that could be shared. In addition, a lot of
>> algorithms need access to the HTTP "Content-Type" header, to decide how
>> to parse a file, or what encoding it is in - for instance see XSLT 2.0's
>> unparsed-text() function:
>>
>> http://www.w3.org/TR/xslt20/#unparsed-text
>>
>> Solution:
>>
>> I have a patch that implements this functionality for
>> UnixHTTPURLInputStream and BinHTTPURLInputStream (WinSock) in Xerces-C
>> 2.8. I added BinInputStream::getContentType() to get access to the
>> "Content-Type" header. I can update this code for Xerces-C 3.0.
> 
> Sounds good. There are also Curl, MacOS, and libWWW net accessors.
> Hopefully it will be easy to implement getContentType() for them.

For Curl the option to use seems to be CURLOPT_HEADERFUNCTION, although 
I haven't investigated more than that.

http://curl.rtin.bz/libcurl/c/curl_easy_setopt.html

For libWWW I haven't looked further, but I imagine it should be possible. 
I looked into the MacOS net accessors, and it seemed to be impossible to 
support this API with them. However, recent email on the list suggests 
that these implementations have other problems (a problem with fork?).

Since you won't get any content type information back for file URLs 
either, I'd suggest that a null return should be the standard response 
when the content type is unavailable.
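
As a rough illustration of the Curl route, here is a minimal sketch of a 
header callback that picks out "Content-Type" and reports null when it was 
never seen. It is not taken from any patch. The callback uses the signature 
that libcurl documents for CURLOPT_HEADERFUNCTION, but ContentTypeCapture 
and captureContentType are hypothetical names, and the sketch deliberately 
avoids including the libcurl headers so that it compiles on its own.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the captured header value; not part of
// Xerces-C or libcurl.
struct ContentTypeCapture {
    bool found = false;
    std::string value;

    // Follows the convention suggested above: null when no content
    // type is available.
    const char* get() const { return found ? value.c_str() : nullptr; }
};

// Has the shape libcurl expects for CURLOPT_HEADERFUNCTION: it is called
// once per header line, and the line is not guaranteed to be
// null-terminated.
size_t captureContentType(char* buffer, size_t size, size_t nitems,
                          void* userdata)
{
    const size_t len = size * nitems;
    ContentTypeCapture* out = static_cast<ContentTypeCapture*>(userdata);
    static const char name[] = "content-type:";
    const size_t nameLen = sizeof(name) - 1;

    if (out != nullptr && len >= nameLen) {
        // Header names are case-insensitive, so compare in lower case.
        bool match = true;
        for (size_t i = 0; i < nameLen && match; ++i)
            match = std::tolower(static_cast<unsigned char>(buffer[i]))
                    == name[i];

        if (match) {
            size_t begin = nameLen;
            size_t end = len;
            // Trim leading blanks and the trailing CRLF or blanks.
            while (begin < end &&
                   (buffer[begin] == ' ' || buffer[begin] == '\t'))
                ++begin;
            while (end > begin &&
                   (buffer[end - 1] == '\r' || buffer[end - 1] == '\n' ||
                    buffer[end - 1] == ' ' || buffer[end - 1] == '\t'))
                --end;
            out->value.assign(buffer + begin, end - begin);
            out->found = true;
        }
    }
    // Returning the full length tells the caller the line was consumed.
    return len;
}
```

With libcurl this would presumably be registered through curl_easy_setopt() 
using CURLOPT_HEADERFUNCTION, with a ContentTypeCapture passed as 
CURLOPT_HEADERDATA. A getContentType() implementation could then return 
get() directly, which gives the null result for responses that carry no 
"Content-Type" header.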

>> 7) Problem:
>>
>> GrammarResolver has a bug where it fails to initialize it's XSModel if
>> the XMLGrammarPool it is created with is locked.
>>
>> Solution:
>>
>> We hack this at the moment, but it would be great if this could be fixed.
> 
> Would you be willing to work on a patch? Also I hit a bug in this area
> once that may be related. This code works:
> 
>     auto_ptr<GrammarResolver> gr (new GrammarResolver (0));
> 
>     // load some schemas into gr
> 
>     XMLGrammarPool* gp = gr->getGrammarPool ();
>     gp->lockPool ();
>     XSModel* xsm = gp->getXSModel ();
> 
> While if I remove lockPool(), the returned XSModel is invalid. Or maybe
> this is how it is supposed to work.

I don't know if your problem is related. I'll try to work out a fix for 
the problem I'm seeing, at least.

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Boris Kolpackov wrote:
> John Snelson <jo...@oracle.com> writes:
>> 5) Problem:
>>
>> RegularExpression is not thread safe or consistent with it's use of
>> MemoryManager. It's also not quite flexible enough to implement XSLT
>> 2.0's analyze-string, and it has bugs in the replace() methods.
>>
>> http://www.w3.org/TR/xslt20/#analyze-string
>>
>> Solution:
>>
>> I have a patch that fixes all of this in Xerces-C 2.8, and I can update
>> it to apply to 3.0. I'm in the process of getting permission to sign the
>> contributor agreement.
> 
> Sounds good.

I've updated and attached my RegularExpression patch for Xerces-C 3.0.

Is it best to post these patches to the mailing list, or would it be 
better to open JIRA items for them?

John

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net

Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> I've updated and attached my RegularExpression patch for Xerces-C 3.0.
>
> Is it best to post these patches to the mailing list, or would it be
> better to open JIRA items for them?

For any significant amount of code it is better to create a bug and
attach the patch to it. While attaching the patch, you are presented
with an option to assign the rights to the patch to the ASF, which you
need to select.

When creating the bug, add 3.0.0 to the Fix Version list. This will
add it to the list of issues that need to be resolved before 3.0.0
can be released.

Thanks,
Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

John Snelson <jo...@oracle.com> writes:

> 1) Problem:
>
> XPath 2.0 is just different to XPath 1.0. We've therefore got our own
> version of DOMXPathResult (XPath2Result) which makes more sense in this
> context:
>
> http://xqilla.sourceforge.net/docs/dom3-api/classXPath2Result.html
>
> Solution:
>
> It's probably simple enough to either extend DOMXPathResult to include
> the extra functionality in XPath2Result, or to include it as a new class
> called DOMXPath2Result.

I did a quick check and it appears that the DOMXPathResult is very
similar to DOMXPath2Result. I would therefore suggest that we try
to add the missing functionality to DOMXPathResult as non-standard
extensions (though we should try to use names that will likely be
used in the next version of DOM3 when it is updated to include
support for XPath 2, for example getIntegerValue instead of asInt).
What is your feeling on this approach? Also did you base your
DOMXPath2Result on any draft spec (e.g., where do the asDouble,
asInt, etc., names come from)?


> 2) Problem:
>
> It's necessary to get access to DOMDocumentImpl, which isn't in the
> public API, in order to implement the DOM3 XPath API. Needing access to
> the Xerces-C source code to compile XQilla is a big problem for our
> maintainers. We need DOMDocumentImpl for a number of reasons:
>
> [...]
>
> Solution:
>
> Put DOMDocumentImpl in the public API.

The DOMDocumentImpl.hpp is now installed with the rest of the headers.
I've also changed all private data members and functions to be protected
in all DOM*Impl classes. Is there anything else we need to do?


> 3) Problem:
>
> We need access to DOMWriterImpl in order to override it to know how to
> write namespace nodes. DOMWriterImpl isn't in the public API.
>
> Solution:
>
> Put DOMWriterImpl in the public API, or implement namespace node
> handling in it.

The same situation as with DOMDocumentImpl.hpp. I tend to prefer to
leave the namespace node implementation in XQilla for now since it
is XPath-specific and is not used by the limited built-in XPath
support in Xerces-C++.


> 4) Problem:
>
> XQilla can construct typed DOMDocuments, so we need a way to set the
> type information on these nodes. Currently we use DOMTypeInfoImpl,
> DOMAttrImpl and DOMElementNSImpl to do this, which aren't in the public API.
>
> Solution:
>
> Implement a method of setting the type information on an element or
> attribute, or put DOMTypeInfoImpl, DOMAttrImpl and DOMElementNSImpl in
> the public API.

I don't think an end user would ever need this functionality since the
type information is tied to the grammar being used and can only be set
by a parser that can control both. The *Impl headers are available for
XQilla to use.


> 5) Problem:
>
> RegularExpression is not thread safe or consistent with it's use of
> MemoryManager. It's also not quite flexible enough to implement XSLT
> 2.0's analyze-string, and it has bugs in the replace() methods.
>
> http://www.w3.org/TR/xslt20/#analyze-string
>
> Solution:
>
> I have a patch that fixes all of this in Xerces-C 2.8, and I can update
> it to apply to 3.0. I'm in the process of getting permission to sign the
> contributor agreement.

Sounds good.


> 6) Problem:
>
> The socket and WinSock HTTP InputStream implementations have fixed
> buffers which can result in buffer overflow. They needlessly duplicate a
> whole load of code that could be shared. In addition, a lot of
> algorithms need access to the HTTP "Content-Type" header, to decide how
> to parse a file, or what encoding it is in - for instance see XSLT 2.0's
> unparsed-text() function:
>
> http://www.w3.org/TR/xslt20/#unparsed-text
>
> Solution:
>
> I have a patch that implements this functionality for
> UnixHTTPURLInputStream and BinHTTPURLInputStream (WinSock) in Xerces-C
> 2.8. I added BinInputStream::getContentType() to get access to the
> "Content-Type" header. I can update this code for Xerces-C 3.0.

Sounds good. There are also Curl, MacOS, and libWWW net accessors.
Hopefully it will be easy to implement getContentType() for them.


> 7) Problem:
>
> GrammarResolver has a bug where it fails to initialize it's XSModel if
> the XMLGrammarPool it is created with is locked.
>
> Solution:
>
> We hack this at the moment, but it would be great if this could be fixed.

Would you be willing to work on a patch? Also I hit a bug in this area
once that may be related. This code works:

    auto_ptr<GrammarResolver> gr (new GrammarResolver (0));

    // load some schemas into gr

    XMLGrammarPool* gp = gr->getGrammarPool ();
    gp->lockPool ();
    XSModel* xsm = gp->getXSModel ();

While if I remove lockPool(), the returned XSModel is invalid. Or maybe
this is how it is supposed to work.

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde



Re: Xerces-C API changes for XQilla

Posted by John Snelson <jo...@oracle.com>.
Hi Boris,

That sounds perfectly reasonable. I think that XQilla already does all 
of those things, but it would be good to get it into Xerces-C too.

John

Boris Kolpackov wrote:
> Hi John,
> 
> Another area where we can improve the interface alignment
> is DOMXPathNSResolver. I just had a quick chat with Alberto
> and here are the changes that we both agree would make sense
> (I hope the motivation is self-evident, if not -- let me know):
> 
> 1. Add the ability to provide custom namespace-prefix mappings
>    in the DOMXPathNSResolver interface. Something along these
>    lines:
> 
>    virtual void
>    setNamespacePrefix (const XMLCh* prefix, const XMLCh* uri);
> 
>    The new mapping will override any existing one. If NULL or
>    empty string is passed as uri then the existing mapping, if
>    any, is removed.
> 
> 2. Change the createNSResolver() function in DOMXPathEvaluator
>    to return read-write DOMXPathNSResolver (const right now).
> 
> 3. Allow NULL to be passed to createNSResolver() in which case
>    empty DOMXPathNSResolver will be created (can then be populated
>    with setNamespacePrefix).
> 
> 
> Let me know what you think.
> 
> Boris
> 


-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:            http://oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net



Re: Xerces-C API changes for XQilla

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi John,

Another area where we can improve the interface alignment
is DOMXPathNSResolver. I just had a quick chat with Alberto
and here are the changes that we both agree would make sense
(I hope the motivation is self-evident, if not -- let me know):

1. Add the ability to provide custom namespace-prefix mappings
   in the DOMXPathNSResolver interface. Something along these
   lines:

   virtual void
   setNamespacePrefix (const XMLCh* prefix, const XMLCh* uri);

   The new mapping will override any existing one. If NULL or
   empty string is passed as uri then the existing mapping, if
   any, is removed.

2. Change the createNSResolver() function in DOMXPathEvaluator
   to return read-write DOMXPathNSResolver (const right now).

3. Allow NULL to be passed to createNSResolver() in which case
   empty DOMXPathNSResolver will be created (can then be populated
   with setNamespacePrefix).
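
To make the override and removal semantics in point 1 concrete, here is a
minimal sketch. It is not Xerces-C code: SimpleNSResolver is a hypothetical
stand-in, std::u16string plays the role of XMLCh strings so the example
compiles without Xerces-C, and lookupNamespaceURI is only named after the
corresponding DOM Level 3 XPath method.

```cpp
#include <map>
#include <string>

// Hypothetical read-write resolver illustrating the proposed semantics.
class SimpleNSResolver {
public:
    // A new mapping overrides any existing one for the prefix. A null
    // or empty uri removes the existing mapping, if any.
    void setNamespacePrefix(const char16_t* prefix, const char16_t* uri)
    {
        const std::u16string key(prefix ? prefix : u"");
        if (uri == nullptr || *uri == 0)
            bindings_.erase(key);
        else
            bindings_[key] = uri;
    }

    // Returns the URI bound to the prefix, or null when it is unbound.
    const char16_t* lookupNamespaceURI(const char16_t* prefix) const
    {
        const std::u16string key(prefix ? prefix : u"");
        auto it = bindings_.find(key);
        return it == bindings_.end() ? nullptr : it->second.c_str();
    }

private:
    std::map<std::u16string, std::u16string> bindings_;
};
```

A default-constructed SimpleNSResolver starts empty, which corresponds to
what point 3 describes for createNSResolver() called with NULL. A real
implementation would presumably also fall back to the namespaces in scope
on the node the resolver was created from when no explicit mapping exists.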


Let me know what you think.

Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde
