
Posted to j-dev@xerces.apache.org by Petr Kuzel <Pe...@netbeans.com> on 2000/12/19 16:43:02 UTC

Xerces2 features - compile time configuration

Hello Xerces2 team,

  I see that there are requests from many sides for new Xerces2 features.
(We have also posted a request for a new feature.) Most of them are really
simple to implement, but as Arnaud Le Hors stated, there are two kinds of XML
parsers:
  1) consuming applications - require data cooking related features
  2) transforming applications - require round trip related features

  Another truth is that the latter parser just produces more information, which
the former can ignore. Simply put, the same parser engine can serve both cases.
Arnaud also stated that the Xerces2 engine is designed to be of the first type
for performance reasons.
  But there are still requests for features of the second type. I do not think
that ignoring them or suggesting a new branch is the right solution.

  Consequently, I propose introducing a compile-time configuration, so that
either a fast or a heavyweight parser could be built. The configuration could
be done, for example, by a class:

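  public class XercesConfig {
     // compile-time constants: code guarded by a false flag is left out
     public static final boolean FULL_NOTIFICATIONS = true;
     // enabled on its own (left operand) or implied by FULL_NOTIFICATIONS
     public static final boolean MANY_NOTIFICATIONS = true || FULL_NOTIFICATIONS;
  }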

  or by a fine-grained feature bit map. The flag values can be substituted at
build time with the Ant <replace ../> task. Conditional blocks like the following
then appear in the source:

  ...

  if (XercesConfig.FULL_NOTIFICATIONS) {
       fDocumentHandler.notifyCharacterEntity(...);
  }  //!!! no else at all for improved readability

  fDocumentHandler.characters(...);

  ...

  The compiler should then include the block unconditionally when the static
final flag is true, or omit it entirely when the flag is false.
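A minimal runnable sketch of the scheme above, assuming a hypothetical `NotificationHandler` whose `notifyCharacterEntity` callback stands in for the proposed extra notification (it is not a Xerces interface). Because `FULL_NOTIFICATIONS` is a `static final` constant, the guarded `if` is decided at compile time: the JLS explicitly allows `if` on a constant as Java's conditional-compilation idiom, and javac omits the block when the flag is false. One caveat: such constants are copied into every class that reads them, so flipping a flag means recompiling all dependent classes, not just `XercesConfig`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical build-time switches modeled on the XercesConfig class above.
final class XercesConfig {
    // Constant variable: javac inlines its value into every class that reads it.
    static final boolean FULL_NOTIFICATIONS = false;
    // Enabled on its own (left operand) or implied by FULL_NOTIFICATIONS.
    static final boolean MANY_NOTIFICATIONS = true || FULL_NOTIFICATIONS;

    private XercesConfig() {}
}

// Hypothetical handler; notifyCharacterEntity is illustrative, not a Xerces method.
interface NotificationHandler {
    void notifyCharacterEntity(String name);
    void characters(String text);
}

final class ConfiguredScanner {
    private final NotificationHandler fDocumentHandler;

    ConfiguredScanner(NotificationHandler handler) {
        this.fDocumentHandler = handler;
    }

    void scanCharacterReference(String name, String expansion) {
        if (XercesConfig.FULL_NOTIFICATIONS) {
            // With the flag false, javac emits no bytecode for this block.
            fDocumentHandler.notifyCharacterEntity(name);
        }
        // no else: the ordinary notification always follows
        fDocumentHandler.characters(expansion);
    }
}

// Records events so the effect of the flag is visible.
final class RecordingHandler implements NotificationHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void notifyCharacterEntity(String name) {
        events.add("entity:" + name);
    }

    @Override
    public void characters(String text) {
        events.add("characters:" + text);
    }
}
```

With `FULL_NOTIFICATIONS = false`, a recording handler sees only the `characters` event; setting the flag to `true` and rebuilding adds the entity event before it.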

  I know this may reduce code readability, but the configurability should fully
compensate for it.

  Is this an acceptable way to add new features while preserving high performance?
If a new feature required an interface change, it would be provided through a new
property to avoid modifying the core interfaces.
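A sketch of the property route, under stated assumptions: `CoreHandler` stands in for an unchanged core interface, while `EntityNotificationHandler`, `ParserSettings` and the property URI are hypothetical names invented for illustration. The scanner looks the extra handler up once and, if nobody registered one, simply skips the extra call, so existing implementations of the core interface keep compiling unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a core interface that must not change.
interface CoreHandler {
    void characters(String text);
}

// Hypothetical extension interface, supplied through a property instead of
// being added to CoreHandler.
interface EntityNotificationHandler {
    void notifyCharacterEntity(String name);
}

// Hypothetical property store; the URI is illustrative, not a Xerces property.
final class ParserSettings {
    static final String ENTITY_HANDLER =
        "http://example.org/sketch/properties/entity-handler";

    private final Map<String, Object> properties = new HashMap<>();

    void setProperty(String id, Object value) {
        properties.put(id, value);
    }

    Object getProperty(String id) {
        return properties.get(id);
    }
}

final class PropertyScanner {
    private final CoreHandler handler;
    private final EntityNotificationHandler entityHandler; // null when the property is unset

    PropertyScanner(ParserSettings settings, CoreHandler handler) {
        this.handler = handler;
        Object value = settings.getProperty(ParserSettings.ENTITY_HANDLER);
        this.entityHandler = value instanceof EntityNotificationHandler
                ? (EntityNotificationHandler) value
                : null;
    }

    void scanCharacterReference(String name, String expansion) {
        if (entityHandler != null) {
            entityHandler.notifyCharacterEntity(name);
        }
        handler.characters(expansion);
    }
}
```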

  Does this make enough sense to be considered?
  Could we start a thread collecting new (orthogonal) feature requests?

  Have a nice day
  Ccc

-- 
<address>
<a href="mailto:pkuzel@netbeans.com">Petr Kuzel</a>, Sun Microsystems
: <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
: XML and <a href="http://jini.netbeans.org/">Jini</a> modules</address>

Re: Xerces2 features - compile time configuration

Posted by Libor Kramolis <li...@netbeans.com>.

Hello.

I hope we are not the only ones who want to get all possible information
about a parsed document from the parser. This would be useful for XML editor
writers. We want to parse a document and, when we store it unchanged to
another file, the two files must be identical (at most, the indentation may
change). All entity references used in the source must be preserved in the
result. Don't you think?

I would like to encourage the Xerces 2 developers. It looks very good, and I
hope Xerces 2 can be used in XML editors. I believe that only Xerces 2 can do
that because it is perfectly designed.   :-)

So I would like to vote for the ability to get information about all
parts of a parsed document.

Thanks,
Libor

Re: Xerces2 features - compile time configuration

Posted by Petr Kuzel <Pe...@netbeans.com>.

Petr Kuzel wrote:
> 
> Mark Diekhans wrote:
> >
> > I have been using Xerces in what would best be classified as a transforming
> > application and am overall quite happy with it.  The only restriction that
> > has actually prevented supporting some functionality is the inability to
> > recover entity references in attribute values.  By only promising to produce
> > functionally equivalent XML, many problems go away.  Attribute order, quote
> > character, etc, might change, but the data expressed in the document is the
> > same.
> >
> > Guess this is a hand raise for being able to get attribute values with
> > entity references expanded.
> 
>   Yes. More generally: to be informed about the start and end of every
> (entity, parameter and character) reference, regardless of document context.

  I have to correct myself. After further analysis, I realized that it is not
really a good idea to pollute SAX with such features. It is better to switch
to XNI, which reports them through the XMLAttributes interface.

  In this respect, the only information still missing at the XNI level is
about character references.

  Sorry for the confusion; the new context is XNI in Xerces2.

  Ccc

Re: Xerces2 features - compile time configuration

Posted by Libor Kramolis <li...@netbeans.com>.

Petr Kuzel wrote:

> Mark Diekhans wrote:
> 
>> I have been using Xerces in what would best be classified as a transforming
>> application and am overall quite happy with it.  The only restriction that
>> has actually prevented supporting some functionality is the inability to
>> recover entity references in attribute values.  By only promising to produce
>> functionally equivalent XML, many problems go away.  Attribute order, quote
>> character, etc, might change, but the data expressed in the document is the
>> same.
>> 
>> Guess this is a hand raise for being able to get attribute values with
>> entity references expanded.
> 
> 
>   Yes. More generally: to be informed about the start and end of every
> (entity, parameter and character) reference, regardless of document context.
> 
>   Ccc
> 

I think that this is not the only "problem". I would like to list here the
problems I currently see with using Xerces in an XML editor.

The problem is that, as an editor writer, I would like to store a parsed
document with a minimum of changes.

Ideal state: source and result are the same. [ it is only a dream now ]
Small-problem state: source and result differ only in indentation. [ it
breaks the user's favorite indentation, but no structure is lost ]
Other states are not acceptable for an XML editor. [ no data is lost, but
the resulting structure is very different from the source ]

Problem parts:

* ignored conditional sections: the data inside them is not reported.

* attribute value delimiters: I don't know whether ' or " was used.

* attributes from the DTD: I don't know whether an attribute was specified
in the document or its default value was used.
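To make the list above concrete, a hypothetical record (no such class exists in Xerces) of the per-attribute details an editor would need in order to write an attribute back unchanged: the raw value with entity references intact, the quote character used, and whether the attribute was actually in the source or came from a DTD default.

```java
// Hypothetical per-attribute record; none of these names come from Xerces.
final class RoundTripAttribute {
    final String name;
    final String rawValue;    // value as written in the source, entity references unexpanded
    final char quote;         // delimiter used in the source: '\'' or '"'
    final boolean specified;  // false when the value was supplied as a DTD default

    RoundTripAttribute(String name, String rawValue, char quote, boolean specified) {
        if (quote != '\'' && quote != '"') {
            throw new IllegalArgumentException("quote must be ' or \"");
        }
        this.name = name;
        this.rawValue = rawValue;
        this.quote = quote;
        this.specified = specified;
    }

    // Writes the attribute back exactly as it appeared in the source; a DTD
    // default never appeared there, so it serializes to nothing.
    String serialize() {
        return specified ? name + "=" + quote + rawValue + quote : "";
    }
}
```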

Thanks,
Libor

Re: Xerces2 features - compile time configuration

Posted by Petr Kuzel <Pe...@netbeans.com>.

Mark Diekhans wrote:
> 
> I have been using Xerces in what would best be classified as a transforming
> application and am overall quite happy with it.  The only restriction that
> has actually prevented supporting some functionality is the inability to
> recover entity references in attribute values.  By only promising to produce
> functionally equivalent XML, many problems go away.  Attribute order, quote
> character, etc, might change, but the data expressed in the document is the
> same.
> 
> Guess this is a hand raise for being able to get attribute values with
> entity references expanded.

  Yes. More generally: to be informed about the start and end of every
(entity, parameter and character) reference, regardless of document context.
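A minimal sketch of why start/end notifications are enough for round-tripping, assuming a hypothetical `ReferenceListener` callback interface (neither SAX nor XNI defines it in this form): a writer that sees the start of a reference writes the reference itself and skips its replacement text until the matching end, so only the outermost reference reappears in the output.

```java
// Hypothetical callbacks; not a SAX or XNI interface.
interface ReferenceListener {
    enum Kind { GENERAL_ENTITY, PARAMETER_ENTITY, CHARACTER }

    void startReference(Kind kind, String name);

    void endReference(Kind kind, String name);

    void characters(String text);
}

// Rebuilds source text from the events: the outermost reference is written
// back as a reference, and its replacement text (including nested references)
// is skipped.
final class ReferencePreservingWriter implements ReferenceListener {
    private final StringBuilder out = new StringBuilder();
    private int depth; // number of references currently open

    @Override
    public void startReference(Kind kind, String name) {
        if (depth == 0) {
            switch (kind) {
                case GENERAL_ENTITY:   out.append('&').append(name).append(';'); break;
                case PARAMETER_ENTITY: out.append('%').append(name).append(';'); break;
                case CHARACTER:        out.append("&#").append(name).append(';'); break;
            }
        }
        depth++;
    }

    @Override
    public void endReference(Kind kind, String name) {
        depth--;
    }

    @Override
    public void characters(String text) {
        if (depth == 0) {
            out.append(text);
        }
    }

    String result() {
        return out.toString();
    }
}
```

For example, the event sequence for `AT&#38;T by &company;`, where `company` itself contains another entity reference, is written back as exactly that string.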

  Ccc

Re: Xerces2 features - compile time configuration

Posted by Mark Diekhans <ma...@lutris.com>.

I have been using Xerces in what would best be classified as a transforming
application and am overall quite happy with it.  The only restriction that
has actually prevented supporting some functionality is the inability to
recover entity references in attribute values.  By only promising to produce
functionally equivalent XML, many problems go away.  Attribute order, quote
character, etc, might change, but the data expressed in the document is the
same. 

Guess this is a hand raise for being able to get attribute values with
entity references expanded.

Mark

Re: Xerces2 features - compile time configuration

Posted by Petr Kuzel <Pe...@netbeans.com>.

Arnaud Le Hors wrote:
> First, I'd like it to be clear that I don't pretend to be deciding for
> everybody, so I'd like to hear what others think/want. Second, before we
> go to the compile option approach, which is a pain on several fronts
> (interoperability + maintenance), we should assess how many features we
> are talking about and what it would really cost to have them dynamic.

  Pardon me. I only intended to use your name as a pointer to another thread
about new features versus performance.

  Regarding the features: I would divide them into SAX, DOM and, the highest
category, XNI features. As we are developing a tool, we are interested in
neither the SAX data extraction API nor the DOM data manipulation API; XNI
seems to be the right API for tool applications.
  So we do not want to add new features and properties (handlers for such
features) at the SAX or DOM level; we would prefer to have them at the XNI
level. I understand SAX as a pure data extraction API, and enriching it with
new features and properties is, in many cases, just an attempt to solve the
problems of applications that are not about data extraction.
  Besides, is the idea that XNI-level features are better than SAX features
for tool applications correct? I think so. Many SAX extension features are
not needed at the XNI level because they are simply part of it.

  Ccc

Re: Xerces2 features - compile time configuration

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Petr Kuzel wrote:
> 
> Hello Xerces2 team,
> 
>   I see that there are requests from many sides for new Xerces2 features.
> (We have also posted a request for a new feature.) Most of them are really
> simple to implement, but as Arnaud Le Hors stated, there are two kinds of XML
> parsers:
>   1) consuming applications - require data cooking related features
>   2) transforming applications - require round trip related features
> 
>   Another truth is that the latter parser just produces more information, which
> the former can ignore. Simply put, the same parser engine can serve both cases.
> Arnaud also stated that the Xerces2 engine is designed to be of the first type
> for performance reasons.
>   But there are still requests for features of the second type. I do not think
> that ignoring them or suggesting a new branch is the right solution.

First, I'd like it to be clear that I don't pretend to be deciding for
everybody, so I'd like to hear what others think/want. Second, before we
go to the compile option approach, which is a pain on several fronts
(interoperability + maintenance), we should assess how many features we
are talking about and what it would really cost to have them dynamic.
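As input to the cost assessment suggested here, a minimal sketch of the dynamic alternative, using a hypothetical feature URI and scanner (not Xerces code): the feature is set at runtime, cached in a plain field, and checked on each character reference. The per-event price is one field read and one branch; whether that is measurable next to the rest of the scanning work is exactly what would need benchmarking, and this sketch does not settle it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scanner fragment showing a runtime (dynamic) feature switch.
final class DynamicScanner {
    // Hypothetical feature ID; not a real Xerces feature URI.
    static final String NOTIFY_CHAR_REFS =
        "http://example.org/sketch/features/notify-char-refs";

    private final Map<String, Boolean> features = new HashMap<>();
    // Cached copy of the feature, so the per-reference path reads one field
    // instead of doing a map lookup.
    private boolean fNotifyCharRefs;
    private final StringBuilder out = new StringBuilder();

    void setFeature(String featureId, boolean state) {
        features.put(featureId, state);
        fNotifyCharRefs = features.getOrDefault(NOTIFY_CHAR_REFS, false);
    }

    void scanCharacterReference(String digits, String expansion) {
        // Runtime check: one field read and one branch per character reference.
        if (fNotifyCharRefs) {
            out.append("[charref ").append(digits).append(']');
        }
        out.append(expansion);
    }

    String output() {
        return out.toString();
    }
}
```

Unlike the compile-time flags, the same binary serves both consuming and transforming applications; the trade-off is that the check stays in the hot path even when the feature is off.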
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group