Posted to j-dev@xerces.apache.org by Neil Graham <ne...@ca.ibm.com> on 2003/03/20 17:04:01 UTC

[PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Hi all,

For some reason, this issue still doesn't seem to be generating much
feedback; with the exception of Neeraj, everyone's been silent.  So I think
it's time to move towards a concrete proposal.

After thinking about the 4 options I originally posted, I've concluded that
option 1 seems best.  It's completely consistent with what we've done
everywhere else in XNI with respect to encodings, and it makes the
startExternalSubset call symmetric with the startDocument call.  So to
review, I'm proposing that we change

      public void startExternalSubset(XMLResourceIdentifier identifier,
                                      Augmentations augs)

on the XMLDTDHandler interface so that it looks like

      public void startExternalSubset(XMLResourceIdentifier identifier,
                                      String encoding, Augmentations augs)
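
For concreteness, here is a minimal sketch of what a handler might look like
under option 1.  The types below are cut-down stand-ins invented for this
example (the real org.apache.xerces.xni interfaces have more methods and
parameters), and EncodingRecorder and the sample system id are hypothetical.

```java
// Cut-down stand-ins for illustration; NOT the real org.apache.xerces.xni types.
interface XMLResourceIdentifier { String getExpandedSystemId(); }
interface Augmentations { Object getItem(String key); }

// Stand-in for XMLDTDHandler carrying the signature proposed as option 1.
interface ProposedDTDHandler {
    void startExternalSubset(XMLResourceIdentifier identifier,
                             String encoding, Augmentations augs);
}

// Hypothetical handler that simply remembers the encoding it was handed.
class EncodingRecorder implements ProposedDTDHandler {
    private String externalSubsetEncoding;

    @Override
    public void startExternalSubset(XMLResourceIdentifier identifier,
                                    String encoding, Augmentations augs) {
        externalSubsetEncoding = encoding;
    }

    String getExternalSubsetEncoding() { return externalSubsetEncoding; }
}

class Option1Demo {
    public static void main(String[] args) {
        EncodingRecorder handler = new EncodingRecorder();
        XMLResourceIdentifier id = () -> "file:///example.dtd"; // made-up id
        // A scanner would pass whatever encoding it autodetected.
        handler.startExternalSubset(id, "UTF-16", null);
        System.out.println(handler.getExternalSubsetEncoding()); // prints UTF-16
    }
}
```

The handler gets the encoding in the same call that announces the external
subset, which is the symmetry with startDocument argued for above.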

My reasons for disliking the other options:

> 2.  We could add a new callback to the XMLDTDHandler interface, something
> like:
>      public void externalSubsetEncoding(String encoding)

This is perhaps slightly more backward compatible, but it would stick out
like a sore thumb when the API is considered as a whole.

> 3.  We could use the Augmentations parameter of the startExternalSubset
> callback.

This is possible, but would also be inconsistent with the way encodings are
handled everywhere else; if a user actually wanted to acquire this
information directly from XNI, I'd submit that forcing use of Augmentations
here would be counterintuitive.  That said, if people are absolutely opposed
to modifying XNI signatures at this stage, I could live with this.

> 4.  We could amend the XMLLocator interface by adding a method like
>      public String getEncoding()

I think this is what Neeraj proposed.  This would mean that, in most cases,
encoding information could be obtained in two places: from the callbacks
and from the XMLLocator.  We'd also have to make sure to update the locator
implementation at every entity change, which could impact performance
slightly.  Finally, I'd submit that it's trivial to implement the SAX
Locator2 functionality without this change.  If we're going to do something
really ugly in XNI, like duplicating the means of accessing a particular
piece of information, I really hope we'd require some solid use case that
simply couldn't be met with the existing framework.

It's true that we've declared XNI to be "golden", but we always said it
could still change if we found a sufficiently significant problem.  I think
this problem is sufficiently significant.

Anyway, those are my thoughts; I really hope this engenders some discussion
since if we're to make a change like this at all, we'd better make it
sooner rather than later.

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com


----- Forwarded by Neil Graham/Toronto/IBM on 03/20/2003 10:47 AM -----
From:     Neil Graham/Toronto/IBM@IBMCA
Date:     03/10/2003 06:05 PM
Please respond to: xerces-j-user
To:       xerces-j-dev@xml.apache.org
cc:       xerces-j-user@xml.apache.org
Subject:  determining the encoding of an external subset via XNI



Hi all,

In an attempt to generate some more discussion surrounding the issue I
raised in the message below, here are some ways by which we might move
forward.  For those who didn't see the previous thread, the Cole's Notes
version of the problem is that, as XNI is currently designed, there doesn't
seem to be any way of determining what the parser autodetected the encoding
of the DTD external subset to be--or any way of determining anything about
that encoding at all if the external subset doesn't happen to contain a
text decl.

Here are all the options that I've thought of:

1.  We could modify the XMLDTDHandler#startExternalSubset callback so that,
instead of looking like

      public void startExternalSubset(XMLResourceIdentifier identifier,
                                      Augmentations augs)

it looks like

      public void startExternalSubset(XMLResourceIdentifier identifier,
                                      String encoding, Augmentations augs)

This would make that callback much more symmetric to the startDocument
callback of the XMLDocumentHandler interface; unfortunately it has the
tremendous drawback of not being terribly backwards compatible.

2.  We could add a new callback to the XMLDTDHandler interface, something
like:

      public void externalSubsetEncoding(String encoding)

which we would advertise as occurring after the startExternalSubset
callback and before the textDecl call. While this would be far more
backward compatible, there's no precedent for anything like it in XNI;
also, the callback would only be useful for external subsets, since in all
other contexts we already have methods for conveying encoding information.

3.  We could use the Augmentations parameter of the startExternalSubset
callback.  This would preserve backward compatibility, but certainly
couldn't be accused of being beautiful; also, it would mark the first time
we've used Augmentations in Xerces for something at the level of a scanner.
So far, we've only employed that functionality in the context of schema
validation.

4.  We could amend the XMLLocator interface by adding a method like

      public String getEncoding()

on the lines of the SAX Locator2 interface.  This again would only be
really useful in this single context, since XNI goes out of its way
everywhere else to explicitly make provision for the passage of encoding
information; i.e., it doesn't seem to accord well with the overall design
of the API.
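
To show how options 3 and 4 would differ from a handler's point of view,
here is a rough sketch.  Everything in it is an assumption made for
illustration: the interfaces are cut-down stand-ins rather than the real XNI
types, and the ENCODING_KEY constant is hypothetical (Xerces defines no such
key).

```java
import java.util.HashMap;
import java.util.Map;

// Cut-down stand-ins for illustration only; NOT the real XNI interfaces.
interface Augmentations {
    Object getItem(String key);
    Object putItem(String key, Object item);
}

class MapAugmentations implements Augmentations {
    private final Map<String, Object> items = new HashMap<>();
    public Object getItem(String key) { return items.get(key); }
    public Object putItem(String key, Object item) { return items.put(key, item); }
}

// Option 4: a locator that can also report the current entity's encoding.
interface LocatorWithEncoding {
    String getEncoding();
}

class OptionsThreeAndFour {
    // Hypothetical key; no such constant is defined by Xerces.
    static final String ENCODING_KEY = "EXTERNAL_SUBSET_ENCODING";

    // Option 3: the handler digs the encoding out of the Augmentations.
    static String viaAugmentations(Augmentations augs) {
        return augs == null ? null : (String) augs.getItem(ENCODING_KEY);
    }

    // Option 4: the handler asks the locator it was given earlier.
    static String viaLocator(LocatorWithEncoding locator) {
        return locator == null ? null : locator.getEncoding();
    }

    public static void main(String[] args) {
        Augmentations augs = new MapAugmentations();
        augs.putItem(ENCODING_KEY, "ISO-8859-1");
        System.out.println(viaAugmentations(augs));          // prints ISO-8859-1
        System.out.println(viaLocator(() -> "ISO-8859-1"));  // prints ISO-8859-1
    }
}
```

Option 3 needs a key that both sides agree on and a cast; option 4 needs the
handler to keep hold of the locator.  Neither changes an existing method
signature.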

I'll readily admit that none of these solutions is particularly attractive.
Thoughts, preferences, or more appealing solutions are thus even more than
usually welcome!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com


----- Forwarded by Neil Graham/Toronto/IBM on 03/10/2003 06:03 PM -----
From:     Neil Graham/Toronto/IBM@IBMCA
Date:     03/04/2003 11:13 PM
To:       xerces-j-dev@xml.apache.org
cc:
Subject:  another encoding issue




Hi all,

How does one determine the autodetected encoding of a DTD external subset?

Right now, our DTD scanner takes this information from the entity manager
in a (non-XNI) startEntity(name, resourceIdentifier, encoding) call but
drops the encoding information on the floor for entities whose names are
[dtd].
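
The shape of the problem can be sketched as follows.  This is not the actual
Xerces scanner code; DTDEvents and SketchDTDScanner are hypothetical,
heavily simplified stand-ins, and the entity names and system ids are made
up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for the DTD handler callbacks.
interface DTDEvents {
    void startExternalSubset(String systemId); // note: no encoding parameter
    void startParameterEntity(String name, String systemId, String encoding);
}

// Sketch of a scanner that receives the encoding but loses it for "[dtd]".
class SketchDTDScanner {
    private final DTDEvents handler;

    SketchDTDScanner(DTDEvents handler) { this.handler = handler; }

    // Mirrors the shape of the non-XNI startEntity(name, id, encoding) call.
    void startEntity(String name, String systemId, String encoding) {
        if ("[dtd]".equals(name)) {
            handler.startExternalSubset(systemId); // encoding has nowhere to go
        } else {
            handler.startParameterEntity(name, systemId, encoding);
        }
    }
}

class DroppedEncodingDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SketchDTDScanner scanner = new SketchDTDScanner(new DTDEvents() {
            public void startExternalSubset(String systemId) {
                log.add("externalSubset " + systemId);
            }
            public void startParameterEntity(String name, String systemId,
                                             String encoding) {
                log.add("parameterEntity " + name + " " + encoding);
            }
        });
        scanner.startEntity("[dtd]", "example.dtd", "UTF-16");
        scanner.startEntity("%common", "common.ent", "UTF-8");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch the handler learns the encoding of the parameter entity but
never sees "UTF-16" for the external subset.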

It sure would have been handy if
XMLDTDHandler#startExternalSubset(XMLResourceIdentifier, Augmentations) had
also included an encoding parameter...

Thoughts?

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: [PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Posted by Joseph Kesselman <ke...@us.ibm.com>.
If there is any chance we might wind up deciding that breaking XNI 
compatibility is worth the cost, better to do it now than later. (I'd
rather not recode now, but I'd *REALLY* rather have the best possible 
solution in the long run.)

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more. 
"may'ron DaroQbe'chugh vaj bIrIQbej"  ("Put down the squeezebox and nobody 
gets hurt.")




Re: [PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Posted by Neeraj Bajaj <ne...@sun.com>.
Andy Clark wrote:

> Neil Graham wrote:
>
>> After thinking about the 4 options I originally posted, I've concluded
>> that option 1 seems best.  It's completely consistent with what we've
>> done everywhere else in XNI viz. encodings, and makes the
>> startExternalSubset call symmetric with the startDocument call.  So to
>> review, I'm proposing that we change
>
> I don't like option 1 because it's a breaking API change.
> Anything that changes an existing method's prototype or
> removes an existing method is destructive and causes a
> lot of distress for application writers using XNI.

Hi Neil,

    I understand your point, and you explained the pros and cons of each
approach really well. As you mentioned, the downside of this approach is
that it breaks compatibility, and IMO that's a big thing which should be
avoided. I agree with Andy that breaking compatibility is a pain for all
the developers who have built their applications on XNI. So I am also
not in favor of this change.

>
>>> 4.  We could amend the XMLLocator interface by adding a method like
>>>     public String getEncoding()
>>
>
> I don't mind this addition because I figure that we'll
> probably end up adding something like this anyway. And
> this change does not break existing applications unless
> those applications implement their own XMLLocator class.
> And even then, adding this method in their code still
> allows it to be used by people using older versions of
> Xerces.
>
>> from the XMLLocator.  We'd also have to make sure to update the locator
>> implementation at every entity change, which could impact performance
>> slightly.  Finally, I'd submit that it's trivial to implement the SAX
>
>
> I don't think so.
>
> The XMLEntityManager.ScannedEntity class already keeps
> track of the encoding of the entity. So there is no
> "updating" required as entities are popped off of the
> entity manager's scanned-entity stack.

Right. Encoding information was always available and it just needs to be 
passed.


thanks,
Neeraj





RE: [PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Posted by Joseph Kesselman <ke...@us.ibm.com>.
>Option 4 solves the pb without breaking
>backward compatibility and only requires an extra call

Hmmm. I need to take another look at this, but I think you're right...


______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more. 
"may'ron DaroQbe'chugh vaj bIrIQbej"  ("Put down the squeezebox and nobody 
gets hurt.")




RE: [PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Posted by Arnaud Le Hors <le...@us.ibm.com>.
I have to agree with Andy on this. Option 4 solves the problem without
breaking backward compatibility and only requires an extra call for the few
(as shown by the lack of interest in this) who care.
--
Arnaud  Le Hors - IBM, XML Standards Strategy Group / W3C AC Rep.




Re: [PROPOSAL: XNI CHANGE]: determining the encoding of an external subset via XNI

Posted by Andy Clark <an...@apache.org>.
Neil Graham wrote:
> For some reason, this issue still doesn't seem to be generating much
> feedback; with the exception of Neeraj, everyone's been silent.  So I think
> it's time to move towards a concrete proposal.

I've been quiet on this issue for a few reasons: 1) I've
been busy with other things; and 2) the SAX2 extensions
that are the cause of this discussion are only in the
beta stage at the moment. (Or were there other reasons
for this that I'm forgetting?)

> After thinking about the 4 options I originally posted, I've concluded that
> option 1 seems best.  It's completely consistent with what we've done
> everywhere else in XNI viz. encodings, and makes the startExternalSubset
> call symmetric with the startDocument call.  So to review, I'm proposing
> that we change

I don't like option 1 because it's a breaking API change.
Anything that changes an existing method's prototype or
removes an existing method is destructive and causes a
lot of distress for application writers using XNI.

>> 2.  We could add a new callback to the XMLDTDHandler interface, something
>> like:
>>      public void externalSubsetEncoding(String encoding)

Ugh. I *really* don't like this.

>> 3.  We could use the Augmentations parameter of the startExternalSubset
>> callback.

Works but not desirable.

>> 4.  We could amend the XMLLocator interface by adding a method like
>>      public String getEncoding()

I don't mind this addition because I figure that we'll
probably end up adding something like this anyway. And
this change does not break existing applications unless
those applications implement their own XMLLocator class.
And even then, adding this method in their code still
allows it to be used by people using older versions of
Xerces.

> from the XMLLocator.  We'd also have to make sure to update the locator
> implementation at every entity change, which could impact performance
> slightly.  Finally, I'd submit that it's trivial to implement the SAX

I don't think so.

The XMLEntityManager.ScannedEntity class already keeps
track of the encoding of the entity. So there is no
"updating" required as entities are popped off of the
entity manager's scanned-entity stack.
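
A rough sketch of that idea follows.  The class names are made up for
illustration and this is not the actual XMLEntityManager code; it only
assumes, as described above, that each scanned entity on the stack already
carries its own encoding.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a scanned entity; only the fields needed here.
class ScannedEntitySketch {
    final String name;
    final String encoding;

    ScannedEntitySketch(String name, String encoding) {
        this.name = name;
        this.encoding = encoding;
    }
}

// Hypothetical stand-in for the entity manager's scanned-entity stack.
class EntityStackSketch {
    private final Deque<ScannedEntitySketch> stack = new ArrayDeque<>();

    void startEntity(String name, String encoding) {
        stack.push(new ScannedEntitySketch(name, encoding));
    }

    void endEntity() { stack.pop(); }

    // Returns the entity currently being scanned, or null if there is none.
    ScannedEntitySketch current() { return stack.peek(); }
}

// Locator view: reads the encoding from whichever entity is on top,
// so nothing has to be copied or updated when entities change.
class StackBackedLocator {
    private final EntityStackSketch entities;

    StackBackedLocator(EntityStackSketch entities) { this.entities = entities; }

    public String getEncoding() {
        ScannedEntitySketch entity = entities.current();
        return entity == null ? null : entity.encoding;
    }
}

class LocatorDemo {
    public static void main(String[] args) {
        EntityStackSketch entities = new EntityStackSketch();
        StackBackedLocator locator = new StackBackedLocator(entities);

        entities.startEntity("[xml]", "UTF-8");
        System.out.println(locator.getEncoding()); // prints UTF-8
        entities.startEntity("[dtd]", "UTF-16");
        System.out.println(locator.getEncoding()); // prints UTF-16
        entities.endEntity();
        System.out.println(locator.getEncoding()); // prints UTF-8
    }
}
```

Because the locator only delegates to the top of the stack, pushing or
popping an entity is all it takes for getEncoding() to reflect the change.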

> locator2 functionality without this change.  If we're going to do something
> really ugly in XNI like duplicate means for accessing a particular piece of
> information, I really hope we'd require some solid use-case that simply
> couldn't be met with the existing framework.

Before we make changes to XNI, I'd like to see more
cases than just the SAX extensions (1.1 beta1) to
justify it.

> It's true that we've declared XNI to be "golden", but we always said it
> could still change if we found a sufficiently significant problem.  I think
> this problem is sufficiently significant.

I'm still unconvinced that it's significant enough
to break the API.

-- 
Andy Clark * andyc@apache.org

