Posted to dev@maven.apache.org by Aaron Digulla <di...@hepe.com> on 2008/07/29 22:21:19 UTC

POM rewriting with DecentXML

Hi guys,

It's official: I'm through with the W3C and the stupid XML parsers which
came in its wake. To make it possible to write XML filters and editors which
don't ruin the layout, I've started my own XML parser project "DecentXML".

The main goals are to provide a library to manipulate existing (small)
XML files with the least amount of disruption, plus good error handling
(like reporting line and column numbers) and an easy-to-use API.

And no, it's not W3C compliant. And it never will be. That's the whole
point :)

I've got some code ready for you to try:
http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz

It can read XML 1.0 including namespaces, manipulate it (not every
operation is supported yet, but all the common ones are) and write it back.

Time permitting, I'll get the test coverage above 90% tomorrow, and I'll
replace toXML(StringBuilder) with toXML(Writer).
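Why toXML(Writer) is the more general signature can be sketched in Java. This is a hypothetical node interface (Node, Raw and Container are illustrative names, not DecentXML's actual classes): writing to a java.io.Writer lets output go straight to a file or stream, while a StringWriter still covers the old "build a String" use case.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

public class WriterOutput {
    // Hypothetical node type, not DecentXML's real class hierarchy.
    interface Node {
        // Streams this node's text to any Writer (file, socket, StringWriter, ...).
        void toXML(Writer out) throws IOException;

        // Convenience wrapper that covers the old StringBuilder use case.
        default String toXML() {
            StringWriter sw = new StringWriter();
            try {
                toXML(sw);
            } catch (IOException e) {
                // StringWriter never throws, but real streams may, so the interface allows it.
                throw new UncheckedIOException(e);
            }
            return sw.toString();
        }
    }

    // A leaf that holds a slice of the original document verbatim.
    record Raw(String source) implements Node {
        public void toXML(Writer out) throws IOException {
            out.write(source);
        }
    }

    // A node that writes its children in order.
    record Container(List<Node> children) implements Node {
        public void toXML(Writer out) throws IOException {
            for (Node child : children) {
                child.toXML(out);
            }
        }
    }

    public static void main(String[] args) {
        Node doc = new Container(List.of(new Raw("<a >"), new Raw("x"), new Raw("</a>")));
        System.out.println(doc.toXML());
    }
}
```

Passing an OutputStreamWriter instead of a StringWriter writes the document without ever holding the complete text in one String.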

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://darkviews.blogspot.com/          http://www.pdark.de/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Jason van Zyl wrote:

>> It's official: I'm through with the W3C and the stupid XML parsers which
>> came in it's wake. To allow to write XML filters and editors which don't
>> ruin the layout, I've started my own XML parser project "DecentXML".
> There are a couple that are decent. XMLBeans does a good job of not
> completely mutilating the XML, and the code that's been created for
> m2eclipse using EMF is pretty good as well. Netbeans is still using some
> JDOM based code that is not bad either.

None of them can preserve whitespace inside tags (between attributes,
for example, or in the end tag).
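The whitespace loss described above can be reproduced with the JDK's own DOM parser and an identity Transformer. This is a minimal sketch using only standard javax.xml APIs, and it is a stand-in rather than a test of XMLBeans, EMF or JDOM (none of which ship with the JDK); the sample element is made up. The DOM keeps attribute names and values but has nowhere to store the line break and indentation between them, so the serializer cannot reproduce them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomRoundTrip {
    // Parses into a W3C DOM and serializes it straight back out, changing nothing.
    static String roundTrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Attributes on separate lines, plus a space before '>' in both tags.
        String xml = "<dependency groupId=\"g\"\n            artifactId=\"a\" ></dependency >";
        String result = roundTrip(xml);
        System.out.println(result);
        System.out.println("identical: " + result.equals(xml));
    }
}
```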

For XMLBeans, you need to define POJOs for any and all elements in the
XML that you want to process. Not suitable for a generic XML
search'n'replace tool.

EMF has a classpath that is longer than my attention span and which isn't
documented anywhere. I'm past guesswork in my job.

JDOM has the best XML API so far, but it's still based on Java 1.4 (no
generics), so the code is pretty verbose ... lots of "ceremony".

Regards,



Re: POM rewriting with DecentXML

Posted by Jason van Zyl <ja...@maven.org>.
There are a couple that are decent. XMLBeans does a good job of not
completely mutilating the XML, and the code that's been created for
m2eclipse using EMF is pretty good as well. NetBeans is still using
some JDOM-based code that is not bad either.

The JDOM code is so-so and is used in a few places, but the XMLBeans
code is good, and the EMF code is better but coupled to EMF/WTP right now.

On 29-Jul-08, at 1:21 PM, Aaron Digulla wrote:

> Hi guys,
>
> It's official: I'm through with the W3C and the stupid XML parsers which
> came in it's wake. To allow to write XML filters and editors which don't
> ruin the layout, I've started my own XML parser project "DecentXML".
>
> The main goals are to provide a library to manipulate exiting (small)
> XML files with the least amount of disruption plus good error handling
> (like telling line and column numbers) and an easy to use API.
>
> And no, it's not W3C compliant. And it never will be. That's the whole
> point :)
>
> I've got some code ready for you to try:
> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>
> It can read XML 1.0 including namespaces, manipulate it (not completely
> but all common operations are supported) and write it back.
>
> As time permits, I'll get the test coverage above 90% tomorrow plus I'll
> replace toXML(StringBuilder) with toXML(Writer).
>
> Regards,
>

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

happiness is like a butterfly: the more you chase it, the more it will
elude you, but if you turn your attention to other things, it will come
and sit softly on your shoulder ...

  -- Thoreau




Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Hi,

I've finished all the main features and released version 1.0 of
DecentXML on Google Code: http://code.google.com/p/decentxml/

Regards,



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting John Casey <jd...@commonjava.org>:

>>> I've got some code ready for you to try:
>>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>> wrong list? :)
> still, sounds cool. Are you parking this at SF or Codehaus or anywhere
> like that? :-)

Currently, I'm hosting it on my own server; this was a mad dash (I
wrote the whole thing in about ten hours), so I haven't had time yet to
plan how to release the project.

Regards,

PS: Guys, please learn to quote :)



Re: POM rewriting with DecentXML

Posted by John Casey <jd...@commonjava.org>.
still, sounds cool. Are you parking this at SF or Codehaus or anywhere 
like that? :-)

-j

Brett Porter wrote:
> wrong list? :)
> 
> On 30/07/2008, at 6:21 AM, Aaron Digulla wrote:
> 
>> Hi guys,
>>
>> It's official: I'm through with the W3C and the stupid XML parsers which
>> came in it's wake. To allow to write XML filters and editors which don't
>> ruin the layout, I've started my own XML parser project "DecentXML".
>>
>> The main goals are to provide a library to manipulate exiting (small)
>> XML files with the least amount of disruption plus good error handling
>> (like telling line and column numbers) and an easy to use API.
>>
>> And no, it's not W3C compliant. And it never will be. That's the whole
>> point :)
>>
>> I've got some code ready for you to try:
>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>>
>> It can read XML 1.0 including namespaces, manipulate it (not completely
>> but all common operations are supported) and write it back.
>>
>> As time permits, I'll get the test coverage above 90% tomorrow plus I'll
>> replace toXML(StringBuilder) with toXML(Writer).
>>
>> Regards,
>>
> 
> -- 
> Brett Porter
> brett@apache.org
> http://blogs.exist.com/bporter/
> 

-- 
John Casey
Developer, PMC Member - Apache Maven (http://maven.apache.org)
Blog: http://www.ejlife.net/blogs/buildchimp/



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Michael McCallum <gh...@apache.org>:

> On Tue, 05 Aug 2008 19:28:47 Aaron Digulla wrote:
>> I mean, there was *no* XML parser which can do 100%  
>> round-tripping before DecentXML. It's just a non-issue for the XML guys.
>
> xom using xerces 2.6.7 was supposed to be able to do a complete round trip,
> have you disproved that?

Yes. SAX parsers can't do 100% perfect round-tripping; they always
lose some information, for example whitespace inside tags (e.g. when
you put every attribute on its own line, and things like that).

Also, attribute values come with entities already expanded, etc.

There are more cases, but these are the simplest to explain :)
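Both effects can be observed with a plain SAX handler from the JDK. In this minimal sketch (the sample POM fragment is made up), startElement receives the attribute values with &amp; and &#65; already replaced, and no callback ever reports the line breaks that separated the attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxLoss {
    // Collects every attribute value exactly as SAX hands it to the application.
    static List<String> attributeValues(String xml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // No parameter here carries the whitespace between the attributes.
                for (int i = 0; i < atts.getLength(); i++) {
                    values.add(atts.getValue(i));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        // One attribute per line, using an entity reference and a character reference.
        String xml = "<dependency\n    note=\"a &amp; b\"\n    code=\"&#65;\"/>";
        System.out.println(attributeValues(xml));
    }
}
```

Once the handler has seen "a & b" and "A", there is no way to tell whether the author originally wrote &amp;, &#38; or &#x26;.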

Regards,



Re: POM rewriting with DecentXML

Posted by Michael McCallum <gh...@apache.org>.
On Tue, 05 Aug 2008 19:28:47 Aaron Digulla wrote:
> I mean, there was *no* XML parser which can do 100%  
> round-tripping before DecentXML. It's just a non-issue for the XML guys.

xom using xerces 2.6.7 was supposed to be able to do a complete round trip, 
have you disproved that?

-- 
Michael McCallum
Enterprise Engineer
mailto:gholam@apache.org



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Stephen Connolly <st...@gmail.com>:

>> You can fix StAX, we know the authors. Even if you added an extension
>> property that turned on better whitespace handling that would be fine. I'm
>> not keen on pulling in another XML parser to be honest.
> +1000...

*sigh*

Okay. Look at the last example in the tutorial  
(http://code.google.com/p/decentxml/wiki/Tutorial). If StAX can pass  
this test case, I'm willing to have a look.

Regards,



Re: POM rewriting with DecentXML

Posted by Stephen Connolly <st...@gmail.com>.
On Tue, Aug 5, 2008 at 2:44 PM, Jason van Zyl <ja...@maven.org> wrote:

>
> On 5-Aug-08, at 12:28 AM, Aaron Digulla wrote:
>
>> Quoting Jason van Zyl <ja...@maven.org>:
>>
>>> But I think looking at StAX and possibly trying to patch that to be
>>> smarter about formatting, if necessary, might be a better route for us.
>>>
>>
>> StAX can't preserve whitespace between attributes, between "<" and the
>> element name, whitespace after the last attribute and the ">", between "</"
>> and the end element name. Same goes for all pull parsers.
>>
>>
> Why not fix StAX?
>
>> Not sure about CDATA but I guess StAX can't preserve that, either. Lastly,
>> StAX is about *reading* XML. DecentXML is about *writing* XML *preserving*
>> the original format 100%, no compromises.
>>
>>
> Yes, but your preservation tactics and impetus for doing this is predicated
> on having *read* something first.
>
>> As for patching it: StAX is a standard API (JSR-173). How big are my
>> chances that the standard API is going to be extended to allow the features
>> I need? I mean, there was *no* XML parser which can do 100% round-tripping
>> before DecentXML. It's just a non-issue for the XML guys.
>>
>>
> We could get you into StaX in five minutes if you wanted to patch it.
>
>> Just looking at an XML gives you a visual clue: these guys couldn't care
>> less how it *looks* as long as their tools can read it.
>>
>>
> Dan, he uses StAX (but knows Tatu who wrote it) and said if it isn't
> possible now it would be easy to fix.
>
>> As I said: My parser is probably not so useful as a general purpose
>> replacement for POM *reading* in general. It ought to be used in the Maven
>> artifact plugin and any other code which *writes* POM files.
>>
>>
> If we've read in the model using the tools that we currently use which
> knows about everything about the whitespace, and then manipulate the model
> in memory how exactly would we integrate your writer?
>
> You can fix StAX, we know the authors. Even if you added an extension
> property that turned on better whitespace handling that would be fine. I'm
> not keen on pulling in another XML parser to be honest.
>

+1000...


>
>
>> Regards,
>>
> Thanks,
>
> Jason

Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Stuart McCulloch wrote:

>>> Why not fix StAX?
>> Because StAX is not meant to do this. I need to keep the original XML
>> source somewhere to be able to recreate anything you might have done. That
>> includes entities (and how you entered them originally) and all kind of
>> weird stuff that every XML parser out there throws away.
> this isn't necessarily true - you could take the XML source and map
> it into a model that's used by the application - later on you'd take the
> modified model and compare (diff) it against the original model and
> use this to rewrite sections of the XML source while keeping the rest
> undisturbed... (using indexing to improve performance/tracking)

Right. And I could have China paint the moon red so that I can write
"Coca Cola" on it. ;)

>>>> As I said: My parser is probably not so useful as a general purpose
>>>> replacement for POM *reading* in general. It ought to be used in the Maven
>>>> artifact plugin and any other code which *writes* POM files.
>>>>
>>> If we've read in the model using the tools that we currently use which
>>> knows about everything about the whitespace, and then manipulate the
>>> model in memory how exactly would we integrate your writer?
>>>
>> Same issue as above. My suggestion is to keep the model reader as it is. If
>> you write a plugin which wants to manipulate any kind of XML, you add a
>> dependency to DecentXML, read the XML, manipulate it and write it out.
> 
> which kind of sucks if you want to pass the model around collecting
> changes from different components - then everyone would have to
> use the DecentXML document, otherwise you'd lose the formatting.

That's one way. Another way would be to generate XML after each step and
pass that on.

Secondly, in the patch I've sent you, I directly modify the StringBuffer
after I've located the bit that should change. So you could hand each
component this buffer.

It would mean some overhead for parsing the document every time but the
documents are small and the parser is very fast.
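The "modify the buffer directly" approach can be illustrated with a small Java sketch that locates the parent's <version> by plain string search and splices in a new value with StringBuilder.replace. The class and method names are hypothetical, and the string search assumes the tags are written exactly as <parent> and <version> and do not appear inside comments or CDATA; a real tool would use the parser's token positions instead. Everything outside the replaced text, comments and indentation included, stays byte-for-byte identical.

```java
public class ParentVersion {
    // Replaces the text of <parent>...<version>TEXT</version>...</parent> in place.
    // Returns false (and leaves the buffer alone) if that structure isn't found.
    static boolean setParentVersion(StringBuilder pom, String newVersion) {
        int parentStart = pom.indexOf("<parent>");
        if (parentStart < 0) return false;
        int parentEnd = pom.indexOf("</parent>", parentStart);
        int open = pom.indexOf("<version>", parentStart);
        // The <version> we want must lie inside the <parent> element.
        if (parentEnd < 0 || open < 0 || open > parentEnd) return false;
        int textStart = open + "<version>".length();
        int textEnd = pom.indexOf("</version>", textStart);
        if (textEnd < 0 || textEnd > parentEnd) return false;
        pom.replace(textStart, textEnd, newVersion);
        return true;
    }

    public static void main(String[] args) {
        StringBuilder pom = new StringBuilder(
                "<project>\n  <parent>\n    <version>1.0</version> <!-- keep -->\n  </parent>\n</project>");
        System.out.println(setParentVersion(pom, "1.1"));
        System.out.println(pom);
    }
}
```

The same buffer could then be handed to the next component, as suggested above, at the cost of locating things again on each step.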

Thirdly, all components should preserve the formatting and DecentXML is
the only way to do that, so each component has to use it anyway.

>> My solution returns a complete XML document to begin with, so the setup is
>> just a single line of code and then you can start working on the document.
> 
> your solution is interesting, but I think you'd get more support if you
> stopped dissing everything else - there's been a lot of innovation in
> this area already and I expect there's still more to come.
> 
> at least have a serious look at StAX and see if it could be improved

Make StAX pass the test I've sent to the list and I'll have a look. From
what I know, it can't pass this test because it throws that information
away before anyone can see it.

Regards,



Re: POM rewriting with DecentXML

Posted by Stuart McCulloch <mc...@gmail.com>.
2008/8/5 Aaron Digulla <di...@hepe.com>

> Quoting Jason van Zyl <ja...@maven.org>:
>
>> Why not fix StAX?
>>
>
> Because StAX is not meant to do this. I need to keep the original XML
> source somewhere to be able to recreate anything you might have done. That
> includes entities (and how you entered them originally) and all kind of
> weird stuff that every XML parser out there throws away.
>

this isn't necessarily true - you could take the XML source and map
it into a model that's used by the application - later on you'd take the
modified model and compare (diff) it against the original model and
use this to rewrite sections of the XML source while keeping the rest
undisturbed... (using indexing to improve performance/tracking)

you could even track edits internally while the model is manipulated,
then you don't need the whole document kept in memory all the time
- only when writing out the actual changes.
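The edit-tracking idea can be sketched as a log of (start, end, replacement) ranges against the original source that is applied only when the document is written out. This is an illustrative Java sketch with hypothetical names (EditLog, Edit); it assumes the offsets were recorded while parsing and that edits never overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EditLog {
    // One recorded change: replace source[start, end) with `replacement`.
    record Edit(int start, int end, String replacement) {}

    private final List<Edit> edits = new ArrayList<>();

    void replace(int start, int end, String replacement) {
        edits.add(new Edit(start, end, replacement));
    }

    // Copies untouched regions verbatim and splices in the replacements in source order.
    String apply(String source) {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(Edit::start));
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        for (Edit e : sorted) {
            if (e.start() < pos) {
                throw new IllegalStateException("Overlapping edit at offset " + e.start());
            }
            out.append(source, pos, e.start()).append(e.replacement());
            pos = e.end();
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "<project>\n  <version>1.0</version> <!-- bumped by CI -->\n</project>";
        EditLog log = new EditLog();
        int v = src.indexOf("1.0");
        log.replace(v, v + 3, "1.1");
        System.out.println(log.apply(src));
    }
}
```

Because unchanged regions are copied verbatim, only the edited ranges can differ from the input; the hard part remains computing good offsets and deciding where new elements go.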

> In my code, I tokenize the XML source and then keep references to these
> tokens. Can StAX do that? Do I have full access to the unicode input stream?
> Can I patch the tokenizer?
>
> Later, in your POM reader, you turn the XML events into a Java object
> model. At this stage, all the information I've gathered is thrown away. So
> even if I could extend StAX to keep the necessary bits, you would still have
> to rewrite your POM readers to save the XML tokens somewhere and then,
> later, when we want to recreate the POM, you would have to collect that
> information from the various bits and pieces.
>

all that information is still in the original XML source, so you
just need to be able to translate model changes into minimal
edits to the XML source (not trivial, but not impossible - I've
done this for other file formats in the past)


> And even if that would all work ... how would you preserve the original
> order of XML elements from the Java version of the POM? I mean, it's nice
> and all that I can iterate over the dependencies but is the original order
> preserved?
>

using the "diff" approach any unchanged elements automatically
keep their original order (actually that should also be the case at
the moment, if the Java model uses the right collection classes)
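The collection-class point can be made concrete: if the in-memory model stores dependencies in a LinkedHashMap (or a List), iteration follows document order and updating an entry does not move it. A minimal sketch; OrderedModel, readDependencies and the "groupId:artifactId" key are hypothetical, and the parsed triples stand in for whatever the POM reader produces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedModel {
    // Hypothetical model: "groupId:artifactId" -> version, built from (groupId, artifactId, version) triples.
    static Map<String, String> readDependencies(List<String[]> parsed) {
        // LinkedHashMap iterates in insertion order, i.e. the order the POM listed them.
        // A plain HashMap makes no ordering promise at all.
        Map<String, String> deps = new LinkedHashMap<>();
        for (String[] d : parsed) {
            deps.put(d[0] + ":" + d[1], d[2]);
        }
        return deps;
    }

    public static void main(String[] args) {
        Map<String, String> deps = readDependencies(List.of(
                new String[] {"org.z", "last", "1.0"},
                new String[] {"org.a", "first", "2.0"}));
        // Re-putting an existing key keeps its original position.
        deps.put("org.a:first", "2.1");
        System.out.println(deps.keySet()); // prints [org.z:last, org.a:first]
    }
}
```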

the tricky part is usually deciding where to slot in new elements...


>
>>> As I said: My parser is probably not so useful as a general purpose
>>> replacement for POM *reading* in general. It ought to be used in the Maven
>>> artifact plugin and any other code which *writes* POM files.
>>>
>>
>> If we've read in the model using the tools that we currently use which
>> knows about everything about the whitespace, and then manipulate the
>> model in memory how exactly would we integrate your writer?
>>
>
> Same issue as above. My suggestion is to keep the model reader as it is. If
> you write a plugin which wants to manipulate any kind of XML, you add a
> dependency to DecentXML, read the XML, manipulate it and write it out.
>

which kind of sucks if you want to pass the model around collecting
changes from different components - then everyone would have to
use the DecentXML document, otherwise you'd lose the formatting.

> There is no way to read the XML with tool A and then write it out with tool B.
>

I think there is, you just have to be able to map model changes into
minimal XML changes - of course the more context you can stash in
the actual model the easier this is (DecentXML stashes everything)


>> I'm not keen on pulling in another XML parser to be honest.
>>
>
> I know that. I don't have a better solution because there probably isn't. I
> don't start forks just because of the fun of it. This is essential an
> unsolved problem in the XML space, it's been unsolved since XML was invented
> and it won't ever be solved because it's a corner case. I just happen to be
> in that corner very often, so I finally gave in and started on a solution.
>

I wouldn't say it's unsolved or unsolvable - there are many ways to
achieve different levels of round-tripping, and just because a parser
doesn't achieve 100% doesn't mean that it's useless - there's always
some trade-off (space, performance, etc.)


> My solution returns a complete XML document to begin with, so the setup is
> just a single line of code and then you can start working on the document.
>

your solution is interesting, but I think you'd get more support if you
stopped dissing everything else - there's been a lot of innovation in
this area already and I expect there's still more to come.

at least have a serious look at StAX and see if it could be improved

> Regards,
>

-- 
Cheers, Stuart

Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Jason van Zyl <ja...@maven.org>:

>> StAX can't preserve whitespace between attributes, between "<" and   
>> the element name, whitespace after the last attribute and the ">",   
>> between "</" and the end element name. Same goes for all pull   
>> parsers.
>
> Why not fix StAX?

Because StAX is not meant to do this. I need to keep the original XML
source somewhere to be able to recreate anything you might have done.
That includes entities (and how you entered them originally) and all
kinds of weird stuff that every XML parser out there throws away.

In my code, I tokenize the XML source and then keep references to
these tokens. Can StAX do that? Do I have full access to the Unicode
input stream? Can I patch the tokenizer?
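The "keep references to tokens" approach can be sketched in Java as a tokenizer whose tokens are nothing but offsets into the original text, so concatenating them reproduces the input exactly. This is a simplified, hypothetical illustration (the class and method names are made up, it validates nothing and does not handle DOCTYPE internal subsets), not DecentXML's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class LosslessTokenizer {
    enum Kind { TEXT, COMMENT, CDATA, TAG }

    // A token only stores offsets into the source, so nothing is copied or normalized.
    record Token(Kind kind, int start, int end) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            int start = pos;
            Kind kind;
            if (src.startsWith("<!--", pos)) {
                kind = Kind.COMMENT;
                pos = endAfter(src, "-->", pos + 4);
            } else if (src.startsWith("<![CDATA[", pos)) {
                kind = Kind.CDATA;
                pos = endAfter(src, "]]>", pos + 9);
            } else if (src.charAt(pos) == '<') {
                kind = Kind.TAG; // start/end tags and PIs, whitespace inside the tag included
                pos = endOfTag(src, pos);
            } else {
                kind = Kind.TEXT; // character data with entity references left as written
                int next = src.indexOf('<', pos);
                pos = next < 0 ? src.length() : next;
            }
            tokens.add(new Token(kind, start, pos));
        }
        return tokens;
    }

    private static int endAfter(String src, String terminator, int from) {
        int i = src.indexOf(terminator, from);
        if (i < 0) throw new IllegalArgumentException("Unterminated construct near offset " + from);
        return i + terminator.length();
    }

    // Finds the '>' that closes a tag, skipping any '>' inside quoted attribute values.
    private static int endOfTag(String src, int from) {
        char quote = 0;
        for (int i = from + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("Unterminated tag at offset " + from);
    }

    // Writing out is just concatenating the original slices.
    static String toXml(String src, List<Token> tokens) {
        StringBuilder out = new StringBuilder(src.length());
        for (Token t : tokens) out.append(src, t.start(), t.end());
        return out.toString();
    }

    public static void main(String[] args) {
        String pom = "<project  xmlns=\"http://maven.apache.org/POM/4.0.0\"\n         >\n"
                + "  <!-- why: see wiki -->\n  <name>a &amp; b</name >\n</project>\n";
        System.out.println(toXml(pom, tokenize(pom)).equals(pom));
    }
}
```

Manipulation then means replacing or inserting tokens while leaving the untouched ones pointing at the original text.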

Later, in your POM reader, you turn the XML events into a Java object  
model. At this stage, all the information I've gathered is thrown  
away. So even if I could extend StAX to keep the necessary bits, you  
would still have to rewrite your POM readers to save the XML tokens  
somewhere and then, later, when we want to recreate the POM, you would  
have to collect that information from the various bits and pieces.

And even if that would all work ... how would you preserve the  
original order of XML elements from the Java version of the POM? I  
mean, it's nice and all that I can iterate over the dependencies but  
is the original order preserved?

Sorry, Jason, your arguments only tell me that you haven't thought  
this through.

>> As I said: My parser is probably not so useful as a general purpose  
>>  replacement for POM *reading* in general. It ought to be used in   
>> the Maven artifact plugin and any other code which *writes* POM   
>> files.
>
> If we've read in the model using the tools that we currently use which
> knows about everything about the whitespace, and then manipulate the
> model in memory how exactly would we integrate your writer?

Same issue as above. My suggestion is to keep the model reader as it  
is. If you write a plugin which wants to manipulate any kind of XML,  
you add a dependency to DecentXML, read the XML, manipulate it and  
write it out.

There is no way to read the XML with tool A and then write it out with tool B.

> You can fix StAX, we know the authors. Even if you added an extension
> property that turned on better whitespace handling that would be fine.

StAX is just another XML parser. It might be better for round-tripping
than SAX and all the other crap, but so far you've failed to convince
me that you even understand what the issue is, so I can't trust your
trust in StAX :)

That said, how do you manipulate the result of what StAX gives you? I
mean, StAX is a streaming API, which means I would have to build a
model from the XML events returned by StAX. Only then could I
manipulate that XML document.
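The streaming alternative to building a model is to copy StAX events from an XMLEventReader to an XMLEventWriter and swap the ones that should change. This minimal sketch uses the standard javax.xml.stream API (the class name and sample POM are made up) and replaces the text of every <version> element; it shows that comments survive the copy, while whitespace inside tags (such as <version >) does not, because no event carries it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxCopy {
    // Copies every event, replacing the text of each <version> element (at any depth).
    static String setVersion(String xml, String newVersion) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Deliver each text run as a single Characters event.
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        boolean inVersion = false;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument() || event.isEndDocument()) {
                // Skipped so no XML declaration is invented (an existing one is lost, too).
                continue;
            }
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("version")) {
                inVersion = true;
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals("version")) {
                inVersion = false;
            } else if (inVersion && event.isCharacters()) {
                event = events.createCharacters(newVersion);
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String pom = "<project>\n  <version >1.0</version>\n  <!-- keep -->\n</project>";
        System.out.println(setVersion(pom, "2.0"));
    }
}
```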

> I'm not keen on pulling in another XML parser to be honest.

I know that. I don't have a better solution because there probably
isn't one. I don't start forks just for the fun of it. This is
essentially an unsolved problem in the XML space; it's been unsolved
since XML was invented and it won't ever be solved because it's a
corner case. I just happen to be in that corner very often, so I
finally gave in and started on a solution.

My solution returns a complete XML document to begin with, so the  
setup is just a single line of code and then you can start working on  
the document.

Regards,



Re: POM rewriting with DecentXML

Posted by Jason van Zyl <ja...@maven.org>.
On 5-Aug-08, at 12:28 AM, Aaron Digulla wrote:

> Quoting Jason van Zyl <ja...@maven.org>:
>
>> But I think looking at StAX and possibly trying to patch that to be
>> smarter about formatting, if necessary, might be a better route for us.
>
> StAX can't preserve whitespace between attributes, between "<" and  
> the element name, whitespace after the last attribute and the ">",  
> between "</" and the end element name. Same goes for all pull parsers.
>

Why not fix StAX?

> Not sure about CDATA but I guess StAX can't preserve that, either.  
> Lastly, StAX is about *reading* XML. DecentXML is about *writing*  
> XML *preserving* the original format 100%, no compromises.
>

Yes, but your preservation tactics and impetus for doing this are
predicated on having *read* something first.

> As for patching it: StAX is a standard API (JSR-173). How big are my  
> chances that the standard API is going to be extended to allow the  
> features I need? I mean, there was *no* XML parser which can do 100%  
> round-tripping before DecentXML. It's just a non-issue for the XML guys.
>

We could get you into StAX in five minutes if you wanted to patch it.

> Just looking at an XML gives you a visual clue: these guys couldn't  
> care less how it *looks* as long as their tools can read it.
>

Dan, who uses StAX (and knows Tatu, who wrote it), said that if it isn't
possible now, it would be easy to fix.

> As I said: My parser is probably not so useful as a general purpose  
> replacement for POM *reading* in general. It ought to be used in the  
> Maven artifact plugin and any other code which *writes* POM files.
>

If we've read in the model using the tools we currently use, which
know everything about the whitespace, and then manipulate the
model in memory, how exactly would we integrate your writer?

You can fix StAX, we know the authors. Even if you added an extension  
property that turned on better whitespace handling that would be fine.  
I'm not keen on pulling in another XML parser to be honest.

> Regards,
>

Thanks,

Jason



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Stuart McCulloch wrote:

>> For example, the first version of the m2eclipse POM editor would remove all
>> comments after "Add dependency". That's not cool. Those comments contained
>> references to web pages and explanations why some stuff was set up "oddly".
> 
> yeah, I hate it when an editor zaps all the intermediate comments and
> whitespace - that's why in the past I've used a modified version of the
> plexus XML parser that preserves comments and maintains consistent
> indentation levels:  (although it definitely doesn't preserve 100%)
> 
> http://www.ops4j.org/projects/pax/construct/maven-pax-plugin/xref/org/ops4j/pax/construct/util/RoundTripXml.html
> 
> bit hacky, but was good enough for my purposes at the time :)

That was the first time I thought about writing my own XML parser, but
only when I needed to replace the parent version in 200 POM files did I
actually start doing it.

Regards,



Re: POM rewriting with DecentXML

Posted by Stuart McCulloch <mc...@gmail.com>.
2008/8/5 Aaron Digulla <di...@hepe.com>

> Quoting Stuart McCulloch <mc...@gmail.com>:
>
>>> StAX can't preserve whitespace between attributes, between "<" and the
>>> element name, whitespace after the last attribute and the ">", between
>>> "</" and the end element name. Same goes for all pull parsers.
>>>
>>
>> personally speaking, I don't actually mind if </ foo > is changed to
>> </foo>; for me, preserving comments and the general layout, such as
>> indentation, is much more important than attribute spacing.
>>
>
> Same here, but how about newlines in the project element to keep the
> namespace declarations in view?
>

honestly, that's not usually a problem for me - my IDE wraps the line


>> as Milos mentioned: how do you decide where to slot new elements, like
>> dependencies if there weren't any dependencies in the original pom - are
>> they always appended? do they inherit the surrounding indentation?
>>
>
> DecentXML doesn't try to be smarter than you. It just gives you all the
> tools to get what you want. It doesn't throw information away which was in
> the original file, so you can examine it and make an educated guess what
> would probably look right.
>

ah, ok - so it's up to me as a user of DecentXML to detect and apply
the correct indentation for the inserted element (based on level, etc.)

> Even if it's not 100% correct, it will at least preserve all comments and
> processing instructions and other special XML data because the guy who wrote
> that XML probably had a reason to write it the way it is.
>
> For example, the first version of the m2eclipse POM editor would remove all
> comments after "Add dependency". That's not cool. Those comments contained
> references to web pages and explanations why some stuff was set up "oddly".
>

yeah, I hate it when an editor zaps all the intermediate comments and
whitespace - that's why in the past I've used a modified version of the
plexus XML parser that preserves comments and maintains consistent
indentation levels:  (although it definitely doesn't preserve 100%)


http://www.ops4j.org/projects/pax/construct/maven-pax-plugin/xref/org/ops4j/pax/construct/util/RoundTripXml.html

bit hacky, but was good enough for my purposes at the time :)


>>> Just looking at an XML gives you a visual clue: these guys couldn't care
>>> less how it *looks* as long as their tools can read it.
>>>
>>
>> usually I'm more concerned about getting it working than the *look*, in
>> fact often what I need is an XML formatting tool that I can apply to our
>> poms to make them consistent (ie. like formatting code in an IDE)
>>
>
> In that case, DecentXML is for you. With it, you can pick all the
> whitespace in the POM file (and *only* that) and replace it with something
> that is more correct. It will leave comments, etc. alone.
>
> Or you can also reorder the elements without messing with the indentation.
>
> I've been thinking about such a cleanup tool myself. All the POM creators
> create POMs which look different, which have a different indentation,
> different order of elements, etc. With DecentXML, all that could be fixed
> without messing with parts of the XML which we might want to preserve.
>

yep, I think such a configurable cleanup/formatting tool would be very
useful.

> Regards,
>
> --
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://www.pdark.de/
>
>

-- 
Cheers, Stuart

Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Stuart McCulloch <mc...@gmail.com>:

>> StAX can't preserve whitespace between attributes, between "<" and the
>> element name, whitespace after the last attribute and the ">", between "</"
>> and the end element name. Same goes for all pull parsers.
>
> personally speaking, I don't actually mind if </ foo > is changed to </foo>;
> for me, preserving comments and the general layout, such as indentation,
> is much more important than attribute spacing.

Same here, but how about newlines in the project element to keep the  
namespace declarations in view?

> as Milos mentioned: how do you decide where to slot new elements, like
> dependencies if there weren't any dependencies in the original pom - are
> they always appended? do they inherit the surrounding indentation?

DecentXML doesn't try to be smarter than you. It just gives you all
the tools to get what you want. It doesn't throw information away
which was in the original file, so you can examine it and make an
educated guess about what would probably look right.

Even if it's not 100% correct, it will at least preserve all comments  
and processing instructions and other special XML data because the guy  
who wrote that XML probably had a reason to write it the way it is.

For example, the first version of the m2eclipse POM editor would  
remove all comments after "Add dependency". That's not cool. Those  
comments contained references to web pages and explanations why some  
stuff was set up "oddly".

>> Just looking at an XML gives you a visual clue: these guys couldn't care
>> less how it *looks* as long as their tools can read it.
>
> usually I'm more concerned about getting it working than the *look*, in
> fact often what I need is an XML formatting tool that I can apply to our
> poms to make them consistent (ie. like formatting code in an IDE)

In that case, DecentXML is for you. With it, you can pick all the  
whitespace in the POM file (and *only* that) and replace it with  
something that is more correct. It will leave comments, etc. alone.

Or you can also reorder the elements without messing with the indentation.

I've been thinking about such a cleanup tool myself. All the POM  
creators create POMs which look different, which have a different  
indentation, different order of elements, etc. With DecentXML, all  
that could be fixed without messing with parts of the XML which we  
might want to preserve.

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Stuart McCulloch <mc...@gmail.com>.
2008/8/5 Aaron Digulla <di...@hepe.com>

> Quoting Jason van Zyl <ja...@maven.org>:
>
>> But I think looking at StAX and possibly trying to patch that to be
>> smarter about formatting, if necessary, might be a better route for us.
>>
>
> StAX can't preserve whitespace between attributes, between "<" and the
> element name, whitespace after the last attribute and the ">", between "</"
> and the end element name. Same goes for all pull parsers.
>

personally speaking, I don't actually mind if </ foo > is changed to </foo>;
for me, preserving comments and the general layout, such as indentation,
is much more important than attribute spacing.

as Milos mentioned: how do you decide where to slot new elements, like
dependencies if there weren't any dependencies in the original pom - are
they always appended? do they inherit the surrounding indentation?

> Not sure about CDATA but I guess StAX can't preserve that, either. Lastly,
> StAX is about *reading* XML. DecentXML is about *writing* XML *preserving*
> the original format 100%, no compromises.
>
> As for patching it: StAX is a standard API (JSR-173). How big are my
> chances that the standard API is going to be extended to allow the features
> I need? I mean, there was *no* XML parser which can do 100% round-tripping
> before DecentXML. It's just a non-issue for the XML guys.
>
> Just looking at an XML gives you a visual clue: these guys couldn't care
> less how it *looks* as long as their tools can read it.
>

usually I'm more concerned about getting it working than the *look*, in
fact often what I need is an XML formatting tool that I can apply to our
poms to make them consistent (ie. like formatting code in an IDE)

> As I said: My parser is probably not so useful as a general purpose
> replacement for POM *reading* in general. It ought to be used in the Maven
> artifact plugin and any other code which *writes* POM files.
>
> Regards,
>
> --
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://www.pdark.de/
>
>
>


-- 
Cheers, Stuart

Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Milos Kleint <mk...@gmail.com>:

> how do you deal with newly added content? just overwriting a value for
> existing elements is relatively easy.
> I mean if I add a new dependency to the pom file, how do I make sure
> it's properly indented? That's been the major issue for me now with
> the jdom modello writer (which I wrote). I don't really care if it
> screws up attribute spacing, but newly added content needs to fit in.

The simplest solution is to search for an existing dependency, copy it
together with the whitespace before it, and then overwrite the text
content of groupId, etc., in the copy.

If that doesn't work, search for an element at the same level as
"dependency", "dependencies", etc., check the whitespace before that
and duplicate it as necessary.
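A minimal sketch of the first approach, working on the raw POM text with java.util.regex rather than through DecentXML (whose API isn't shown in this thread). The class and method names are hypothetical; the sketch assumes "\n" line endings, values that need no XML escaping, and it leaves out the fallback for a POM that has no dependency yet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not DecentXML API): clones an existing <dependency>
 * element together with the newline and indentation in front of it, then
 * overwrites groupId, artifactId and version in the clone.
 */
public class IndentCopy {

    // Matches from the newline before the first <dependency> to its end tag,
    // so the match includes the element's leading indentation.
    private static final Pattern DEPENDENCY = Pattern.compile(
            "\\n[ \\t]*<dependency>.*?</dependency>", Pattern.DOTALL);

    public static String addDependency(String pom, String groupId,
                                       String artifactId, String version) {
        Matcher m = DEPENDENCY.matcher(pom);
        if (!m.find()) {
            // The fallback (copying a sibling's indentation) is not sketched here.
            throw new IllegalArgumentException("no <dependency> to use as a template");
        }
        String copy = m.group();
        copy = replaceText(copy, "groupId", groupId);
        copy = replaceText(copy, "artifactId", artifactId);
        copy = replaceText(copy, "version", version);
        // Insert the clone right after the template, so it reuses the exact
        // line breaks and indentation of the existing entry.
        return pom.substring(0, m.end()) + copy + pom.substring(m.end());
    }

    // Replaces the text content of the first <tag>...</tag> in the fragment;
    // values are assumed not to need XML escaping.
    private static String replaceText(String xml, String tag, String value) {
        String replacement = "<" + tag + ">" + value + "</" + tag + ">";
        return xml.replaceFirst("<" + tag + ">[^<]*</" + tag + ">",
                Matcher.quoteReplacement(replacement));
    }
}
```

Because the clone carries its own leading newline and indentation, a POM indented with tabs or three spaces keeps that style without the code having to know it.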

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Stuart McCulloch wrote:

>> Sorry. I've written code with just about every XML parser out there and none of
>> them would even get close to what DecentXML can do. In DecentXML, there are
>> no private fields or methods. Everything is meant to be extended or reused.
>> It's meant to be useful instead of limiting what you can do.
> 
> ouch, there's a reason to use private fields, etc. - if you open everything
> up then you don't know what people are using and can't be sure that you
> don't break them when you change or refactor your implementation

That's a non-issue for open-source software. If a user relied on something
from 1.1 that's gone in 1.2, he has three options:

1. He can choose *not* to upgrade. Nothing forces him to.

2. He can backport the changes he needs from 1.2.

3. He can modify his code to work with 1.2.

Of course, with closed-source software, you have no choice but #3.

That said, I of course try to avoid breaking changes, and the
information that I offer is basic. It can't be broken down further or
aggregated in any useful way with other information, so even if I have
to change some internal details, this information will still be around.

>> That said, I haven't thought about building Maven on DecentXML and getting rid
>> of StAX. Pull parsers are a step forward from what we had before (SAX&DOM) but
>> they still suck (any parser sucks, pull parsers just suck a bit less).
> 
> *any* parser sucks? does that include DecentXML? ;)

It sucks least. For me. :)

>> As of right now, there are two reasons not to use XML:
>>
>> 1. Your documents are huge (10MB and more)
> 
> kind of orthogonal to choosing a parser - besides I can write large
> documents in any format

Well, all parsers that build context for you (so you don't have to track
it yourself) need enough memory to hold the whole document.

That said, I haven't tried DecentXML with a big document yet. I've tried
to make it memory-conservative (I'm an old C64 guy; I hate to waste
bytes), so it won't allocate memory for a lot of things until it has to
(for example, if your elements don't have attributes, each element will
only need 4-8 bytes for the null pointer).
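To illustrate the lazy-allocation idea, here is a hypothetical element class (not DecentXML's actual code) whose attribute map stays null until the first attribute is set, so an element without attributes pays only for one reference field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical element with lazily allocated attributes (illustration only). */
public class LazyElement {
    private final String name;
    // Stays null (a single reference field) until an attribute is added.
    private Map<String, String> attributes;

    public LazyElement(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setAttribute(String key, String value) {
        if (attributes == null) {
            // Allocate on first use; LinkedHashMap keeps insertion order.
            attributes = new LinkedHashMap<>();
        }
        attributes.put(key, value);
    }

    public String getAttribute(String key) {
        return attributes == null ? null : attributes.get(key);
    }

    /** Exposed only so the allocation behaviour can be observed. */
    public boolean hasAllocatedAttributes() {
        return attributes != null;
    }
}
```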

>> 2. You must be compliant with some API.
> 
> I actually like APIs, because it gives me confidence I can swap in other
> implementations later on

You still believe that? ;) Okay ... have you ever tried to replace
Xerces with Crimson or the other way around? Both are SAX parsers and
both are SAX-compliant, and if you do, you will get the strangest
errors, plus some of your code probably won't work because you needed
that one feature which wasn't in the standard ...

Furthermore, there usually is one implementation which fits your needs, so
you will never ever do that, and any minute spent on making it compatible
is most likely wasted.

As for DecentXML, I've considered making it duck-typing compatible with
JDOM (so you could replace the JAR, recompile, and shouldn't get
many errors). I haven't had the time or the pressure yet.

I also considered making the API compatible with java.util.List to make
it more natural to work with child nodes, etc. But that would mean a
major breaking change in the API, and I'm not sure it's worth the effort.

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://darkviews.blogspot.com/          http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Stuart McCulloch <mc...@gmail.com>.
2008/8/5 Aaron Digulla <di...@hepe.com>

> Sorry. I've written code with just about every XML parser out there and none of
> them would even get close to what DecentXML can do. In DecentXML, there are
> no private fields or methods. Everything is meant to be extended or reused.
> It's meant to be useful instead of limiting what you can do.


ouch, there's a reason to use private fields, etc. - if you open everything
up then you don't know what people are using and can't be sure that you
don't break them when you change or refactor your implementation

> That said, I haven't thought about building Maven on DecentXML and getting rid
> of StAX. Pull parsers are a step forward from what we had before (SAX&DOM) but
> they still suck (any parser sucks, pull parsers just suck a bit less).
>

*any* parser sucks? does that include DecentXML? ;)


> As of right now, there are two reasons not to use XML:
>
> 1. Your documents are huge (10MB and more)
>

kind of orthogonal to choosing a parser - besides I can write large
documents in any format

> 2. You must be compliant with some API.
>

I actually like APIs, because it gives me confidence I can swap in other
implementations later on
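For reference, this is roughly what coding against the standard API looks like with JAXP: the sketch below only touches SAXParserFactory and the SAX interfaces, and the concrete parser is picked at runtime (for example through the javax.xml.parsers.SAXParserFactory system property). The class name is hypothetical, and swapping implementations can still expose behavioural differences, as Aaron argues about Xerces and Crimson.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PluggableSax {

    // Counts start tags with whichever SAX parser JAXP hands out.
    public static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            int[] count = {0};
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    count[0]++;
                }
            });
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Name of the factory class JAXP selected at runtime.
    public static String implementationName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }
}
```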


> Regards,
>
> --
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://www.pdark.de/
>
>
>
-- 
Cheers, Stuart

Re: POM rewriting with DecentXML

Posted by Stephen Connolly <st...@gmail.com>.
Aaron,

speaking for myself, it's more the aggressive tone of your replies  
that's getting in the way.

I agree that SAX and DOM are stinking piles of crap and completely  
unsuited to round-tripping preserving human formatting.

However I still think StAX has the potential to be made to do what's  
needed.

BTW I will put my time where my mouth is.



Sent from my iPod

On 5 Aug 2008, at 20:31, Aaron Digulla <di...@hepe.com> wrote:

> Jochen Wiedmann wrote:
>
>>>> particular mode, one could supply and accept additional events. In the
>>>> case of white space around attributes, you could offer an extension of
>>>> the Attribute interface that informs about whitespace to the left and
>>>> to the right.
>>> That's great. Can I also add new getters to the Attribute interface? ;)
>>
>> Aaron, I have known you quite well since the early Amiga days and have a
>> very high opinion of you. So believe me: such nonsense is way beneath
>> your abilities. Or do I really need to point out how you can extend an
>> API, exposing additional powers, without losing upward
>> compatibility, by extending interfaces or adding methods to
>> implementations?
>
> My point is that you can't extend an API which is part of a standard.
> Either the standard already contains means to save the information I
> need to add or there is no way to keep it.
>
> What drives me so mad is that no one here on the list takes the five
> minutes to even try to understand what I'm talking about.
>
>>> Okay. Look at the last example in the tutorial
>>> (http://code.google.com/p/decentxml/wiki/Tutorial).
>>> If StAX can pass this test case, I'm willing to have a look.
>>
>> I know how difficult it can be to keep XML syntax from a lot of
>> experience. Typical areas of trouble are internal DTD (which we
>> hopefully no longer need to bother with, thank heaven) and (as you
>> have pointed out) white space within opening and closing tags. But
>> that example is almost trivial. I think you are underestimating the
>> power of SAX, StAX, and friends quite a lot.
>
> Despite the example being so trivial, I haven't seen a single XML parser
> that can do this:
>
> - Create a document
> - Add a single root element
> - Add two attributes
> - Insert a newline between two attributes
> - Write the document to a file
>
> You can't do it. The XML standard allows it as input but there is no API
> to do that for the output. The space between attributes is a limbo. Same
> goes for comments before the root element in DOM. SAX can't do it, DOM
> certainly can't do it and StAX can't do it either.
>
> I challenge you to prove me wrong. Send me a piece of code which I can
> compile against any existing XML parser to prove me wrong (and it must
> be a parser. System.out or FileWriter isn't a parser; of course you can
> do that manually).
>
> Regards,
>
> -- 
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://darkviews.blogspot.com/          http://www.pdark.de/
>
>



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Jochen Wiedmann wrote:

>>> particular mode, one could supply and accept additional events. In the
>>> case of white space around attributes, you could offer an extension of
>>> the Attribute interface that informs about whitespace to the left and
>>> to the right.
>> That's great. Can I also add new getters to the Attribute interface? ;)
> 
> Aaron, I have known you quite well since the early Amiga days and have a
> very high opinion of you. So believe me: such nonsense is way beneath
> your abilities. Or do I really need to point out how you can extend an
> API, exposing additional powers, without losing upward
> compatibility, by extending interfaces or adding methods to
> implementations?

My point is that you can't extend an API which is part of a standard.
Either the standard already contains means to save the information I
need to add or there is no way to keep it.

What drives me so mad is that no one here on the list takes the five
minutes to even try to understand what I'm talking about.

>> Okay. Look at the last example in the tutorial (http://code.google.com/p/decentxml/wiki/Tutorial).
>> If StAX can pass this test case, I'm willing to have a look.
> 
> I know how difficult it can be to keep XML syntax from a lot of
> experience. Typical areas of trouble are internal DTD (which we
> hopefully no longer need to bother with, thank heaven) and (as you
> have pointed out) white space within opening and closing tags. But
> that example is almost trivial. I think you are underestimating the
> power of SAX, StAX, and friends quite a lot.

Despite the example being so trivial, I haven't seen a single XML parser
that can do this:

- Create a document
- Add a single root element
- Add two attributes
- Insert a newline between two attributes
- Write the document to a file

You can't do it. The XML standard allows it as input but there is no API
to do that for the output. The space between attributes is a limbo. Same
goes for comments before the root element in DOM. SAX can't do it, DOM
certainly can't do it and StAX can't do it either.

I challenge you to prove me wrong. Send me a piece of code which I can
compile against any existing XML parser to prove me wrong (and it must
be a parser. System.out or FileWriter isn't a parser; of course you can
do that manually).
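As a small illustration of the point about the standard writer API, the sketch below follows the listed steps with javax.xml.stream.XMLStreamWriter, writing to a StringWriter instead of a file. writeAttribute() chooses the separator itself, and the interface has no method for whitespace inside a start tag. This only demonstrates the JSR-173 writer interface, not every parser in existence; the class name is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class AttributeSpacing {

    /** Creates a document with one root element carrying two attributes. */
    public static String writeRoot() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeEmptyElement("root");
            w.writeAttribute("a", "1");
            // No call exists to put a newline here. writeCharacters() at this
            // point would close the start tag first, so the text would end up
            // outside it rather than between the attributes.
            w.writeAttribute("b", "2");
            w.writeEndDocument();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```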

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://darkviews.blogspot.com/          http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Jochen Wiedmann <jo...@gmail.com>.
On Tue, Aug 5, 2008 at 2:40 PM, Aaron Digulla <di...@hepe.com> wrote:

>> No API forbids extending.
>
> In Java, APIs are written by paranoid control freaks ;) Just try to add a
> line number to org.jdom.Element... It *ought* to be possible ... maybe with
> a little bit of reflection, setAccessible(true) and a smart classloader ...

>> particular mode, one could supply and accept additional events. In the
>> case of white space around attributes, you could offer an extension of
>> the Attribute interface that informs about whitespace to the left and
>> to the right.
>
> That's great. Can I also add new getters to the Attribute interface? ;)

Aaron, I have known you quite well since the early Amiga days and have a
very high opinion of you. So believe me: such nonsense is way beneath
your abilities. Or do I really need to point out how you can extend an
API, exposing additional powers, without losing upward
compatibility, by extending interfaces or adding methods to
implementations?


> Okay. Look at the last example in the tutorial (http://code.google.com/p/decentxml/wiki/Tutorial).
> If StAX can pass this test case, I'm willing to have a look.

I know how difficult it can be to keep XML syntax from a lot of
experience. Typical areas of trouble are internal DTD (which we
hopefully no longer need to bother with, thank heaven) and (as you
have pointed out) white space within opening and closing tags. But
that example is almost trivial. I think you are underestimating the
power of SAX, StAX, and friends quite a lot.

Jochen

-- 
Look, that's why there's rules, understand? So that you think before
you break 'em.

 -- (Terry Pratchett, Thief of Time)



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Jochen Wiedmann <jo...@gmail.com>:

>> As for patching it: StAX is a standard API (JSR-173). How big are my chances
>> that the standard API is going to be extended to allow the features I need?
>
> No API forbids extending.

In Java, APIs are written by paranoid control freaks ;) Just try to  
add a line number to org.jdom.Element... It *ought* to be possible ...  
maybe with a little bit of reflection, setAccessible(true) and a smart  
classloader ...

> In the case of a StAX parser, one could by
> default supply and accept the standard event types. However, in a
> particular mode, one could supply and accept additional events. In the
> case of white space around attributes, you could offer an extension of
> the Attribute interface that informs about whitespace to the left and
> to the right.

That's great. Can I also add new getters to the Attribute interface? ;)

Sorry. I've written code with just about every XML parser out there, and
none of them would even get close to what DecentXML can do. In DecentXML,
there are no private fields or methods. Everything is meant to be
extended or reused. It's meant to be useful instead of limiting what
you can do.

>> As I said: My parser is probably not so useful as a general purpose
>> replacement for POM *reading* in general. It ought to be used in the Maven
>> artifact plugin and any other code which *writes* POM files.
>
> Unfortunately, these can hardly be separated. At least from a software
> architect's point of view, I'd strongly argue against two completely
> different approaches for reading POM files.

You gave the answer to that yourself:

> Look, that's why there's rules, understand? So that you think before
> you break 'em.

As long as the current standard XML tools can't do what we need, we  
need a second tool.

That said, I haven't thought about building Maven on DecentXML and getting
rid of StAX. Pull parsers are a step forward from what we had before
(SAX&DOM) but they still suck (any parser sucks, pull parsers just
suck a bit less).

I haven't tried to write an XML->OO mapper with DecentXML yet, but
it'll probably be simpler to do than with anything else on the
market. One of the reasons is that you can extend and override methods
in XMLParser, so you could call the parser and get a Document back
whose root element is a Project object. Or have Project extend Document.

That would allow you to merge the XML and the OO API into one. Not
sure if that would be smart, but at least you have the choice.

Or you could use the XML tokenizer directly to slurp in the POM.
Another choice which might make sense in 1% of all situations.
Most people will never do that, of course, but that 1% will surely be
happy to be able to without having to reinvent the wheel.

AFAIK, DecentXML is also the only XML parser out there which allows  
you to access the class which turns a byte stream into unicode (see  
XMLInputStreamSource).

As of right now, there are two reasons not to use XML:

1. Your documents are huge (10MB and more)

2. You must be compliant with some API.

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Jochen Wiedmann <jo...@gmail.com>.
On Tue, Aug 5, 2008 at 9:28 AM, Aaron Digulla <di...@hepe.com> wrote:

> StAX can't preserve whitespace between attributes, between "<" and the element name, whitespace
> after the last attribute and the ">", between "</" and the end element name. Same goes for all pull parsers.

I must admit that I can live comfortably if these are changed in my POM ...

> Not sure about CDATA but I guess StAX can't preserve that, either.

Of course it does. There is an event type CDATA, as opposed to SPACE
or CHARACTERS. This event type is intended for both reading and
writing.
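A minimal sketch of both directions with the standard javax.xml.stream API: XMLStreamWriter.writeCData() emits a CDATA section, and an XMLStreamReader with IS_COALESCING switched off can report XMLStreamConstants.CDATA. Whether a given reader reports CDATA as its own event or folds it into CHARACTERS may depend on the implementation, so treat the second method as something to check against your parser. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class CdataRoundTrip {

    /** Writes <configuration> with a CDATA section as its content. */
    public static String writeCdata(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("configuration");
            w.writeCData(text);
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the reader delivered at least one CDATA event. */
    public static boolean sawCdataEvent(String xml) {
        try {
            XMLInputFactory f = XMLInputFactory.newInstance();
            // Coalescing would merge CDATA into adjacent character data.
            f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
            XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
            boolean seen = false;
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.CDATA) {
                    seen = true;
                }
            }
            r.close();
            return seen;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```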


> As for patching it: StAX is a standard API (JSR-173). How big are my chances
> that the standard API is going to be extended to allow the features I need?

No API forbids extending. In the case of a StAX parser, one could by
default supply and accept the standard event types. However, in a
particular mode, one could supply and accept additional events. In the
case of white space around attributes, you could offer an extension of
the Attribute interface that informs about whitespace to the left and
to the right.
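One way such an extension could look, as a sketch: a hypothetical sub-interface of javax.xml.stream.events.Attribute (nothing like it exists in JSR-173), plus a consumer that uses the extra information when present and falls back to a single space for plain Attribute objects, so existing parsers and code keep working. Only the leading whitespace is modeled here.

```java
import javax.xml.stream.events.Attribute;

public class AttributeExtension {

    /**
     * Hypothetical extension: a parser running in a special mode could hand
     * out attribute events implementing this interface.
     */
    public interface WhitespaceAwareAttribute extends Attribute {
        /** Whitespace between the previous token and the attribute name. */
        String getLeadingWhitespace();
    }

    /**
     * Serializes one attribute. Plain Attribute events still work because a
     * single space is assumed when no whitespace information is available.
     * Namespace prefixes and value escaping are omitted for brevity.
     */
    public static String render(Attribute attr) {
        String lead = (attr instanceof WhitespaceAwareAttribute)
                ? ((WhitespaceAwareAttribute) attr).getLeadingWhitespace()
                : " ";
        return lead + attr.getName().getLocalPart() + "=\"" + attr.getValue() + "\"";
    }
}
```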

But that would still miss the most important problem: Supplying this
information is important. But in practice, the POM file is read and
transformed into a model. And this purely syntactic information will
hardly enter the model. But if it doesn't, then it will also be missing
when the POM file is written, at which point it is lost.
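The effect is easy to reproduce with DOM standing in for the model: parse a start tag with unusual attribute spacing, write the tree back out with an identity Transformer, and the spacing is gone because the tree never recorded it. A minimal sketch using only JAXP classes that ship with the JDK; the class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ModelRoundTrip {

    /** Parses XML into a DOM tree and serializes the tree straight back. */
    public static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The tree holds attribute names and values only, so the
            // serializer falls back to its own spacing inside the tag.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```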


> Just looking at an XML gives you a visual clue: these guys couldn't care
> less how it *looks* as long as their tools can read it.

That's exactly the purpose of XML, isn't it? :-)


> As I said: My parser is probably not so useful as a general purpose
> replacement for POM *reading* in general. It ought to be used in the Maven
> artifact plugin and any other code which *writes* POM files.

Unfortunately, these can hardly be separated. At least from a software
architect's point of view, I'd strongly argue against two completely
different approaches for reading POM files.


Jochen


-- 
Look, that's why there's rules, understand? So that you think before
you break 'em.

 -- (Terry Pratchett, Thief of Time)



Re: POM rewriting with DecentXML

Posted by Milos Kleint <mk...@gmail.com>.
how do you deal with newly added content? just overwriting a value for
existing elements is relatively easy.
I mean if I add a new dependency to the pom file, how do I make sure
it's properly indented? That's been the major issue for me now with
the jdom modello writer (which I wrote). I don't really care if it
screws up attribute spacing, but newly added content needs to fit in.

Milos

On Tue, Aug 5, 2008 at 9:28 AM, Aaron Digulla <di...@hepe.com> wrote:
> Quoting Jason van Zyl <ja...@maven.org>:
>
>> But I think looking at StAX and possibly trying to patch that to be
>> smarter about formatting, if necessary, might be a better route for us.
>
> StAX can't preserve whitespace between attributes, between "<" and the
> element name, whitespace after the last attribute and the ">", between "</"
> and the end element name. Same goes for all pull parsers.
>
> Not sure about CDATA but I guess StAX can't preserve that, either. Lastly,
> StAX is about *reading* XML. DecentXML is about *writing* XML *preserving*
> the original format 100%, no compromises.
>
> As for patching it: StAX is a standard API (JSR-173). How big are my chances
> that the standard API is going to be extended to allow the features I need?
> I mean, there was *no* XML parser which can do 100% round-tripping before
> DecentXML. It's just a non-issue for the XML guys.
>
> Just looking at an XML gives you a visual clue: these guys couldn't care
> less how it *looks* as long as their tools can read it.
>
> As I said: My parser is probably not so useful as a general purpose
> replacement for POM *reading* in general. It ought to be used in the Maven
> artifact plugin and any other code which *writes* POM files.
>
> Regards,
>
> --
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://www.pdark.de/
>
>
>



Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Jason van Zyl <ja...@maven.org>:

> But I think looking at StAX and possibly trying to patch that to be
> smarter about formatting, if necessary, might be a better route for us.

StAX can't preserve the whitespace between attributes, between "<" and the
element name, between the last attribute and ">", or between "</" and the
end element name. The same goes for all pull parsers.

I'm not sure about CDATA, but I suspect StAX can't preserve that either.
Lastly, StAX is about *reading* XML. DecentXML is about *writing* XML
while *preserving* the original format 100%, no compromises.
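To illustrate the whitespace point with the JDK's own StAX implementation, the sketch below (class and method names are made up) rebuilds a start tag from the events an XMLStreamReader delivers. The reader hands back the element name and the attribute names and values, but nothing about the spacing around "=", the line break between attributes, or which quote character was used, so the rebuilt tag comes out normalized. (One could try to slice the raw input using Location offsets, but getCharacterOffset() may return -1 and is not a formatting API.)

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Rebuilds the first start tag from StAX events. The API exposes only the element name
// and attribute names/values, so the original spacing and quote characters are lost.
public class StaxStartTag {

    static String firstStartTag(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    StringBuilder sb = new StringBuilder("<").append(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        // Nothing here says how much whitespace surrounded "=" or which quote was used.
                        sb.append(' ').append(r.getAttributeLocalName(i))
                          .append("=\"").append(r.getAttributeValue(i)).append('"');
                    }
                    return sb.append('>').toString();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String original = "<dependency   scope = 'test'\n     optional=\"true\"  ></dependency >";
        System.out.println(firstStartTag(original));
    }
}
```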

As for patching it: StAX is a standard API (JSR-173). What are the
chances that the standard API will be extended to allow the features
I need? I mean, before DecentXML there was *no* XML parser that could
do 100% round-tripping. It's just a non-issue for the XML guys.
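The core of the 100% round-tripping idea can be shown with a toy tokenizer. This is a hypothetical sketch, not DecentXML's actual code: every token (tag, text, whitespace, comment) keeps its original characters, writing the document back simply concatenates them, and an edit replaces exactly one token. It is deliberately naive and breaks on a '>' inside attribute values, comments or CDATA sections.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lossless tokenizer: each token stores its raw characters verbatim,
// so joining the unmodified tokens reproduces the input exactly.
public class RawTokens {

    static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < xml.length()) {
            int lt = xml.indexOf('<', pos);
            if (lt == pos) {                       // markup: "<" ... ">"
                int gt = xml.indexOf('>', pos);
                if (gt < 0) throw new IllegalArgumentException("unterminated tag at " + pos);
                tokens.add(xml.substring(pos, gt + 1));
                pos = gt + 1;
            } else {                               // text (including whitespace) up to the next "<"
                int end = lt < 0 ? xml.length() : lt;
                tokens.add(xml.substring(pos, end));
                pos = end;
            }
        }
        return tokens;
    }

    static String toXml(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        String pom = "<project >\n  <version>1.0</version>  <!-- keep me -->\n</project>\n";
        List<String> tokens = tokenize(pom);
        System.out.println(toXml(tokens).equals(pom)); // true: nothing was normalized
        // Edit only the text token that follows <version>.
        int i = tokens.indexOf("<version>");
        tokens.set(i + 1, "1.1");
        System.out.print(toXml(tokens));
    }
}
```

A real parser still has to build a tree on top of such tokens, but as long as untouched nodes write back their stored text, the output differs from the input only where something was edited.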

Just looking at an XML file gives you a visual clue: these guys couldn't
care less how it *looks* as long as their tools can read it.

As I said: my parser is probably not very useful as a general-purpose
replacement for POM *reading*. It ought to be used in the Maven
artifact plugin and any other code that *writes* POM files.

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Jason van Zyl <ja...@maven.org>.
Aaron,

I talked to Dan, who knows how the StAX framework does its parsing, and
he says that it has fairly good whitespace control.

We are using StAX in the work that Shane has done. Maybe you could
evaluate whether it is just another framework that doesn't preserve
formatting, and if it is, maybe you could integrate your ideas into
StAX: we already have an investment in StAX, directly in the work
Shane has done and in the work Brett has done on the Modello side.

The other thing: do you think you could make your parser comply with
the pull parser API? Then we could try to plug it into the work we've
already done.

But I think looking at StAX and possibly trying to patch that to be  
smarter about formatting, if necessary, might be a better route for us.

On 31-Jul-08, at 12:19 PM, Aaron Digulla wrote:

> Michael McCallum wrote:
>> There is already XOM <http://www.xom.nu/>, which is worth considering...
>> its goal is correctness and round-tripability ;-)... it's mature and
>> stable... I have used it and been very happy.
>
> XOM doesn't preserve whitespace within element tags (no XML parser besides
> DecentXML can do that), and it doesn't preserve CDATA sections or entities.
>
> Also, you can create elements with special whitespace between the
> element attributes, for example.
>
> Regards,

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

Our achievements speak for themselves. What we have to keep track
of are our failures, discouragements and doubts. We tend to forget
the past difficulties, the many false starts, and the painful
groping. We see our past achievements as the end result of a
clean forward thrust, and our present difficulties as
signs of decline and decay.

  -- Eric Hoffer, Reflections on the Human Condition




Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Michael McCallum wrote:
> There is already XOM <http://www.xom.nu/>, which is worth considering... its goal is
> correctness and round-tripability ;-)... it's mature and stable... I have used
> it and been very happy.

XOM doesn't preserve whitespace within element tags (no XML parser besides
DecentXML can do that), and it doesn't preserve CDATA sections or entities.

Also, you can create elements with special whitespace between the
element attributes, for example.
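XOM is not part of the JDK, so the sketch below uses the JDK's DOM parser and identity Transformer as a stand-in tree API to show the kind of loss described above; it says nothing specific about XOM's own behaviour. A character reference (&#65;) is resolved to "A" during parsing, and the extra spacing and single quotes around the attribute are gone by the time the tree is serialized again. The exact serialized form may vary between JDK versions, so the checks only look for what must have disappeared.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Parse into a DOM tree and serialize it straight back: a "round trip" through a tree API.
public class DomRoundTrip {

    static String roundTrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String original = "<name  lang = 'en' >&#65;pache</name >";
        System.out.println(roundTrip(original)); // the spacing, quotes and &#65; are not kept
    }
}
```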

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://darkviews.blogspot.com/          http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Michael McCallum <gh...@apache.org>.
There is already XOM <http://www.xom.nu/>, which is worth considering... its goal is
correctness and round-tripability ;-)... it's mature and stable... I have used
it and been very happy.

On Wed, 30 Jul 2008 19:34:53 Stephen Connolly wrote:
> On Wed, Jul 30, 2008 at 8:33 AM, Stephen Connolly <stephen.alan.connolly@gmail.com> wrote:
> > On Wed, Jul 30, 2008 at 8:00 AM, Brett Porter <br...@apache.org> wrote:
> >> On 30/07/2008, at 4:52 PM, Aaron Digulla wrote:
> >>
> >>  Quoting Brett Porter <br...@apache.org>:
> >>>  I've got some code ready for you to try:
> >>>>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
> >>>>
> >>>> wrong list? :)
> >>>
> >>> I've seen three Maven plugins which use their own code to manipulate
> >>> POM and other XML files in projects, so I guess this is the right list
> >>> to announce such a project.
> >>
> >> Sorry, it was missing the context so I thought you'd intended to send it
> >> to someone else.
> >
> > Part of it is that he wants this to be available for MOJO-1178, as the
> > current POM rewriting is done through semi-smart string replacement (in
> > order to ensure that the formatting does not change).
>
> Which reminds me... can somebody have a look at MOJO-1178? I'd like it to
> be hosted on Mojo... if that is not something that people want, I'll go
> look elsewhere, but I'd appreciate knowing sooner rather than later.
>
> -Stephen
>
> >> Cheers,
> >> Brett
> >>



-- 
Michael McCallum
Enterprise Engineer
mailto:gholam@apache.org



Re: POM rewriting with DecentXML

Posted by Stephen Connolly <st...@gmail.com>.
On Wed, Jul 30, 2008 at 8:33 AM, Stephen Connolly <stephen.alan.connolly@gmail.com> wrote:

>
>
> On Wed, Jul 30, 2008 at 8:00 AM, Brett Porter <br...@apache.org> wrote:
>
>>
>> On 30/07/2008, at 4:52 PM, Aaron Digulla wrote:
>>
>>  Quoting Brett Porter <br...@apache.org>:
>>>
>>>  I've got some code ready for you to try:
>>>>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>>>>>
>>>> wrong list? :)
>>>>
>>>
>>> I've seen three Maven plugins which use their own code to manipulate POM
>>> and other XML files in projects, so I guess this is the right list to
>>> announce such a project.
>>>
>>
>> Sorry, it was missing the context so I thought you'd intended to send it
>> to someone else.
>>
>
> Part of it is that he wants this to be available for MOJO-1178, as the
> current POM rewriting is done through semi-smart string replacement (in
> order to ensure that the formatting does not change).
>

Which reminds me... can somebody have a look at MOJO-1178? I'd like it to
be hosted on Mojo... if that is not something that people want, I'll go
look elsewhere, but I'd appreciate knowing sooner rather than later.

-Stephen


>
>
>>
>> Cheers,
>> Brett
>>

Re: POM rewriting with DecentXML

Posted by Stephen Connolly <st...@gmail.com>.
On Wed, Jul 30, 2008 at 8:00 AM, Brett Porter <br...@apache.org> wrote:

>
> On 30/07/2008, at 4:52 PM, Aaron Digulla wrote:
>
>  Quoting Brett Porter <br...@apache.org>:
>>
>>  I've got some code ready for you to try:
>>>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>>>>
>>> wrong list? :)
>>>
>>
>> I've seen three Maven plugins which use their own code to manipulate POM
>> and other XML files in projects, so I guess this is the right list to
>> announce such a project.
>>
>
> Sorry, it was missing the context so I thought you'd intended to send it to
> someone else.
>

Part of it is that he wants this to be available for MOJO-1178, as the
current POM rewriting is done through semi-smart string replacement (in
order to ensure that the formatting does not change).
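For comparison, "semi-smart string replacement" can be as small as the sketch below: find the <version> that belongs to a given <artifactId> inside the same <dependency>, and splice in the new value, leaving every other character untouched. The method name and matching rules are assumptions for illustration, not the MOJO-1178 code. A regex like this does not understand comments, ${...} properties, <dependencyManagement> versus <dependencies>, or artifactIds that appear in exclusions, which is exactly why a formatting-preserving parser is attractive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "semi-smart" rewrite: change one dependency's <version> by plain string
// surgery, so every other character of the POM (comments, spacing, quoting) is kept.
public class PomVersionEdit {

    static String setDependencyVersion(String pom, String artifactId, String newVersion) {
        Pattern p = Pattern.compile(
                "<artifactId>\\s*" + Pattern.quote(artifactId) + "\\s*</artifactId>"
                + "(?:(?!</dependency>).)*?"          // do not run into the next <dependency>
                + "<version>\\s*([^<]*?)\\s*</version>",
                Pattern.DOTALL);
        Matcher m = p.matcher(pom);
        if (!m.find()) return pom;                    // artifact (or its version) not found
        // Splice the new value into the original text; nothing else is re-serialized.
        return pom.substring(0, m.start(1)) + newVersion + pom.substring(m.end(1));
    }

    public static void main(String[] args) {
        String pom = "<dependencies>\n"
                + "  <dependency>\n"
                + "    <artifactId>junit</artifactId>\n"
                + "    <version>3.8.1</version>\n"
                + "  </dependency>\n"
                + "  <dependency>\n"
                + "    <artifactId>log4j</artifactId>   <!-- odd spacing is kept -->\n"
                + "    <version>1.2.14</version>\n"
                + "  </dependency>\n"
                + "</dependencies>\n";
        System.out.print(setDependencyVersion(pom, "log4j", "1.2.15"));
    }
}
```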


>
> Cheers,
> Brett
>

Re: POM rewriting with DecentXML

Posted by Brett Porter <br...@apache.org>.
On 30/07/2008, at 4:52 PM, Aaron Digulla wrote:

> Quoting Brett Porter <br...@apache.org>:
>
>>> I've got some code ready for you to try:
>>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>> wrong list? :)
>
> I've seen three Maven plugins which use their own code to manipulate  
> POM and other XML files in projects, so I guess this is the right  
> list to announce such a project.

Sorry, it was missing the context so I thought you'd intended to send  
it to someone else.

Cheers,
Brett

--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/




Re: POM rewriting with DecentXML

Posted by Aaron Digulla <di...@hepe.com>.
Quoting Brett Porter <br...@apache.org>:

>> I've got some code ready for you to try:
>> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
> wrong list? :)

I've seen three Maven plugins which use their own code to manipulate  
POM and other XML files in projects, so I guess this is the right list  
to announce such a project.

Regards,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/



Re: POM rewriting with DecentXML

Posted by Brett Porter <br...@apache.org>.
wrong list? :)

On 30/07/2008, at 6:21 AM, Aaron Digulla wrote:

> Hi guys,
>
> It's official: I'm through with the W3C and the stupid XML parsers which
> came in its wake. To make it possible to write XML filters and editors that
> don't ruin the layout, I've started my own XML parser project "DecentXML".
>
> The main goals are to provide a library to manipulate existing (small)
> XML files with the least amount of disruption, plus good error handling
> (like reporting line and column numbers) and an easy-to-use API.
>
> And no, it's not W3C compliant. And it never will be. That's the whole
> point :)
>
> I've got some code ready for you to try:
> http://www.pdark.de/decentxml-1.0-SNAPSHOT-src.tar.gz
>
> It can read XML 1.0 including namespaces, manipulate it (not completely,
> but all common operations are supported) and write it back.
>
> As time permits, I'll get the test coverage above 90% tomorrow, plus I'll
> replace toXML(StringBuilder) with toXML(Writer).
>
> Regards,
>

--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/

