
Posted to dev@maven.apache.org by Jason van Zyl <ja...@maven.org> on 2007/06/21 23:39:11 UTC

Encoding issues for 2.0.8

It seems like there are many problems with encoding that could be
easily solved with a couple of tweaks to Modello, specifically the
reader and writer, so I've scheduled these for 2.0.8. There are some
patches for these, and hopefully Hervé will work his magic with his
suggested fix. I like the idea of borrowing the approach from the Rome IO
utils to find the right encoding by default. That could easily be
integrated into Modello. Hervé, if you need access to Modello we can
set you up.
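The Rome utility in question is its XmlReader class. The following is a rough sketch of the detection idea only, using nothing beyond the JDK. It first looks for a byte order mark, then for an encoding attribute in the XML declaration, and otherwise falls back to UTF-8, which is the XML default. The class name XmlEncodingSniffer is made up for illustration and is not Rome's API. The sketch also ignores some cases, such as UTF-16 documents without a byte order mark.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not Rome's API: guesses an XML document's encoding from its first bytes. */
final class XmlEncodingSniffer {

    // encoding="..." or encoding='...' inside a leading XML declaration (which cannot contain '>').
    private static final Pattern DECLARED_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private XmlEncodingSniffer() {
    }

    /** Returns a charset name for the given prefix of an XML document. */
    static String detect(byte[] head) {
        // 1. A byte order mark wins.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        // 2. The declaration itself is ASCII, so reading the bytes as ISO-8859-1 is enough to find it.
        Matcher m = DECLARED_ENCODING.matcher(new String(head, StandardCharsets.ISO_8859_1));
        if (m.find()) {
            return m.group(1);
        }
        // 3. No BOM and no declared encoding: XML defaults to UTF-8.
        return "UTF-8";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

To use it, a caller would read the first few hundred bytes, call detect, and then rewind the stream (or reopen the file) so it can decode the content with the returned charset. That rewind step is the hard part Kenney describes further down the thread.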

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder and PMC Chair, Apache Maven
jason at sonatype dot com
----------------------------------------------------------




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: Encoding issues for 2.0.8

Posted by Jason van Zyl <ja...@maven.org>.

On 21 Jun 07, at 10:26 PM, Hervé BOUTEMY wrote:

> On Thursday 21 June 2007, Jason van Zyl wrote:
>> It seems like there are many problems with encoding that could be
>> easily solved with a couple of tweaks to Modello, specifically the
>> reader and writer, so I've scheduled these for 2.0.8. There are some
>> patches for these, and hopefully Hervé will work his magic with his
>> suggested fix. I like the idea of borrowing the approach from the Rome IO
>> utils to find the right encoding by default. That could easily be
>> integrated into Modello. Hervé, if you need access to Modello we can
>> set you up.
> I'm interested in working on that. Do I need Modello access, or access to
> other components? I don't really know: these Modello things are parts I
> haven't really dived into yet.

We generate everything from Modello, so you would need access. Not a
problem. We can do this offline.

> The magic of the idea is that the encoding handling is done not by the
> parser but by the reader. So the code that has to change is the code
> creating the Reader from a File: it must be changed from
> "new FileReader(file)" to "new XmlReader(file)".
>
> We need to:
> 1. choose where we put the XmlReader so that any code can use it when
> necessary. Or take a dependency on Rome: but all of Rome for only 1
> class (even if this class is really great)...

No, just extract it from Rome, we don't want a dependency on Rome.

> 2. change all code that creates a Reader for XML parsing
>

We really only need to fix Modello and many of the problems will go  
away. Start with the Xpp3 generators but we will need to take care of  
this properly in the Stax and JDOM plugins (modello plugins) as well.
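Jason's first message names the writer as well as the reader. On the writer side the rule is the mirror image of reading: the bytes must be encoded with the charset that the XML declaration announces, not with the platform default. The following is a minimal sketch of that rule. ModelWriterSketch is a hypothetical class, and its one-element model only stands in for what a generated Xpp3, StAX or JDOM writer would emit. It is not Modello's actual generated code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical writer: one encoding value drives both the declaration and the byte encoding. */
final class ModelWriterSketch {

    private ModelWriterSketch() {
    }

    static void write(OutputStream out, String developerName, String encoding) {
        Charset charset = Charset.forName(encoding);
        try {
            // Encode with the charset the declaration announces, never the platform default.
            Writer writer = new OutputStreamWriter(out, charset);
            writer.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            writer.write("<developer><name>" + escape(developerName) + "</name></developer>\n");
            // Flush, but leave closing the stream to the caller who opened it.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal escaping of the characters that are special in element content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The design point is that the declaration and the OutputStreamWriter take their charset from the same variable, so the two cannot disagree.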

> WDYT?

Thanks,

Jason


Re: Encoding issues for 2.0.8

Posted by Hervé BOUTEMY <he...@free.fr>.

On Saturday 23 June 2007, Kenney Westerhof wrote:
> Hervé BOUTEMY wrote:
> > On Saturday 23 June 2007, Brett Porter wrote:
> >> We shouldn't permit relative entities in the POM - it would cause
> >> grief when deployed to the repository.
> >>
> >> - Brett
> >
> > ok, then the target API is read(InputStream)
> > I won't add read(URL).
>
> I think read(URL) is actually the only good API to add.
I agree that read(URL) is the most powerful option: it permits both encoding
resolution and relative entities. But I don't think it should be the only API,
since not every stream has a URL.
The real question, though, is: should relative entities be permitted?
I personally don't have a strong opinion; my interest is encoding, for the
French accents in my first name, for example :)

> I'm not sure how Rome does it - pushbackstream or whatever, but a URL has
> 2 advantages:
> - you can re-open the URL to start reading again
> - you have the location present to do relative resolution if necessary
> (outside of Modello).
FYI, Rome's XmlReader uses a pushback internally to detect the encoding.
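Kenney describes the technique further down the thread: keep the raw bytes, peek at the start, push them back, and then wrap them in a Reader with the right charset. This can be sketched with java.io.PushbackInputStream. The names PushbackXmlReaders, open and read are hypothetical. The sketch only examines a UTF-8 byte order mark and the declared encoding, so it illustrates the idea and is not a copy of Rome's XmlReader. The two read methods mirror the backward-compatible shape Hervé proposes in the quoted 22 June message: the old read(Reader) stays but is deprecated, and the new read(InputStream) resolves the encoding before delegating to it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: peek at the raw bytes, push them back, then decode with the right charset. */
final class PushbackXmlReaders {

    private static final int PEEK = 256; // assumed large enough to hold a typical XML declaration
    private static final Pattern DECLARED_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private PushbackXmlReaders() {
    }

    /** Wraps a raw byte stream in a Reader whose charset is taken from the document itself. */
    static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PEEK);
        byte[] head = new byte[PEEK];
        int n = 0;
        int r;
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < PEEK && (r = in.read(head, n, PEEK - n)) != -1) {
            n += r;
        }
        Charset charset = StandardCharsets.UTF_8; // the XML default
        int skip = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            skip = 3; // UTF-8 byte order mark: keep UTF-8 but do not hand the BOM to the parser
        } else {
            // The declaration itself is ASCII, so ISO-8859-1 is enough to read it.
            Matcher m = DECLARED_ENCODING.matcher(new String(head, 0, n, StandardCharsets.ISO_8859_1));
            if (m.find()) {
                charset = Charset.forName(m.group(1));
            }
        }
        // Push the peeked bytes back so the Reader starts at the document's first character.
        in.unread(head, skip, n - skip);
        return new InputStreamReader(in, charset);
    }

    /** Old entry point, kept for compatibility; a stand-in for the parser that just returns the text. */
    @Deprecated
    static String read(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    /** New entry point: resolve the encoding from the bytes, then reuse the old code path. */
    static String read(InputStream in) {
        try {
            return read(open(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```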

Hervé
>
> We could also just use the XML APIs for this, since they're designed for
> it: javax.xml.transform.Source and its implementations (StreamSource comes
> to mind).
>
> -- Kenney
>
> > regards,
> >
> > Hervé
> >
> >> On 23/06/2007, at 5:29 AM, Hervé BOUTEMY wrote:
> >>> On Friday 22 June 2007, Arnaud HERITIER wrote:
> >>>> Be careful, because when you read an XML file with a reader (or an
> >>>> input stream) instead of a path (or a URL), you can't use relative
> >>>> entities in XML (because the parser can't know where the main doc is).
> >>>
> >>> Yes, read(InputStream) is better than read(Reader) for encoding, but it
> >>> does not help for relative entities.
> >>> Do you want me to add read(URL) at the same time, and document it as the
> >>> preferred way of getting the model read? Initially, for compatibility
> >>> reasons, it would call read(InputStream) then read(Reader), but in the
> >>> long term it could be coded to permit relative entities.
> >>> WDYT?
> >>>
> >>> Hervé
> >>>
> >>>> This is not a problem because we discourage the use of XML entities
> >>>> in our XML documents, but we have to keep saying it!!!
> >>>>
> >>>> Arnaud
> >>>>
> >>>> On 22/06/07, Hervé BOUTEMY <he...@free.fr> wrote:
> >>>>> On Friday 22 June 2007, Kenney Westerhof wrote:
> >>>>>> Hi,
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>>> indeed, it's a case of doing new XXXInputStream( something, "encoding" ),
> >>>>>> or a reader. Some work has been done on this, IIRC.
> >>>>>>
> >>>>>> The problem is that you need to prescan the XML declaration, so you start
> >>>>>> parsing until you get the first XML language element that is not a comment
> >>>>>> (an XML element, in which case the encoding is UTF-8, or
> >>>>>>  a doctype declaration, in which case the encoding is UTF-8, or
> >>>>>>  a processing instruction: if it's the XML processing instruction, parse
> >>>>>>  the encoding attribute and use that, otherwise it's UTF-8).
> >>>>>>
> >>>>>> This isn't too hard to do, except that you need to restart reading the
> >>>>>> XML file from the start if the encoding is not UTF-8. The real problem is
> >>>>>> in the APIs: you cannot take a reader and restart it, since you cannot
> >>>>>> change the encoding on an instantiated reader, and you certainly don't
> >>>>>> want to wrap it. You'd need access to a raw input stream that doesn't
> >>>>>> apply encoding transformations to the bytes, wrap that in a Pushback
> >>>>>> something, and then rewrap it once you have found the encoding.
> >>>>>
> >>>>> Exactly, this is the job done by XmlReader in Rome:
> >>>>>
> >>>>> https://rome.dev.java.net/apidocs/0_5/com/sun/syndication/io/XmlReader.html
> >>>>>
> >>>>> I have the class, well written and tested by the Rome developers. My
> >>>>> first question is then: where should we put it, so that it can be used
> >>>>> in the many places where Readers are instantiated for XML streams?
> >>>>> plexus-utils? A dependency on Rome? Somewhere else?
> >>>>>
> >>>>>> I'm a bit fuzzy on all the java.io APIs, so we'll have to find the
> >>>>>> proper class to use in the API so we can do this; a File would work.
> >>>>>>
> >>>>>> Anyway, I once tried to fix this issue, but the API had to be changed
> >>>>>> and there were just too many changes across Plexus and Maven at the
> >>>>>> time to push this through.
> >>>>>
> >>>>> With this class available, the change to the Maven model can be
> >>>>> backward compatible:
> >>>>> - the old read(Reader) API remains for compatibility, but is deprecated
> >>>>> - a new read(InputStream) API is added, which calls
> >>>>>   read(new XmlReader(in))
> >>>>> The whole Maven code base can then slowly migrate from the deprecated
> >>>>> Reader API to the new InputStream one, or use XmlReader where it is too
> >>>>> hard to switch to InputStream.
> >>>>> The only change is a new dependency on this XmlReader class: I don't
> >>>>> know whether that is a real problem or not.
> >>>>>
> >>>>> I looked around a little: this new API could be added individually in
> >>>>> each .mdo file. But of course integrating it into Modello's code
> >>>>> generation mechanism would be a lot better: Jason, if your offer of
> >>>>> access to Modello is still valid, I'm interested.
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Hervé
> >>>>>
> >>>>>> -- Kenney

Re: Encoding issues for 2.0.8

Posted by Kenney Westerhof <ke...@apache.org>.


Hervé BOUTEMY wrote:
> On Saturday 23 June 2007, Brett Porter wrote:
>> We shouldn't permit relative entities in the POM - it would cause
>> grief when deployed to the repository.
>>
>> - Brett
> ok, then the target API is read(InputStream)
> I won't add read(URL).

I think read(URL) is actually the only good API to add.
I'm not sure how Rome does it - pushbackstream or whatever, but a URL has 2 advantages:
- you can re-open the URL to start reading again
- you have the location present to do relative resolution if necessary (outside of Modello).

We could also just use the XML APIs for this, since they're designed for it: javax.xml.transform.Source
and its implementations (StreamSource comes to mind).
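javax.xml.transform.stream.StreamSource covers both needs raised in the thread. It can wrap a raw InputStream, which lets the JDK's XML parser detect the encoding from the bytes. It can also carry a system ID, which serves as the base for any relative references. The following is a minimal sketch that uses only the JDK identity transformer. The class name StreamSourceSketch and the example system ID are assumptions made for illustration.

```java
import java.io.InputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

/** Hypothetical helper showing StreamSource as the "document bytes plus location" input type. */
final class StreamSourceSketch {

    private StreamSourceSketch() {
    }

    static Document parse(InputStream bytes, String systemId) {
        // Raw bytes: the parser reads the XML declaration and picks the charset itself.
        // systemId: the base against which relative references would be resolved.
        StreamSource source = new StreamSource(bytes, systemId);
        DOMResult result = new DOMResult();
        try {
            // An identity transform simply parses the source into a new DOM document.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(source, result);
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not parse " + systemId, e);
        }
        return (Document) result.getNode();
    }
}
```

A read(URL)-style API could pass url.toExternalForm() as the system ID. With a bare InputStream and no system ID, relative references have nothing to resolve against, which is the limitation Arnaud pointed out.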

-- Kenney


Re: Encoding issues for 2.0.8

Posted by Hervé BOUTEMY <he...@free.fr>.

On Saturday 23 June 2007, Brett Porter wrote:
> We shouldn't permit relative entities in the POM - it would cause
> grief when deployed to the repository.
>
> - Brett
ok, then the target API is read(InputStream)
I won't add read(URL).
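Why is read(InputStream) the better target than read(Reader)? A Reader has already committed to a charset before the XML declaration has been seen, and FileReader always decodes with the default charset. The hypothetical DecodingPitfall class below decodes the same ISO-8859-1 bytes in two ways. The first uses a charset fixed in advance (UTF-8 here), as a Reader opened without looking at the document would. The second uses the charset the document declares.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of why the charset must be chosen after looking at the bytes. */
final class DecodingPitfall {

    private DecodingPitfall() {
    }

    /** Like a Reader opened up front: the charset (UTF-8 here) is fixed before the document is seen. */
    static String decodeWithFixedCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** What a byte-based read(InputStream) allows: decode with the charset the document declares. */
    static String decodeAsDeclared(byte[] bytes, String declaredEncoding) {
        return new String(bytes, Charset.forName(declaredEncoding));
    }
}
```

With the fixed UTF-8 decoder, the single ISO-8859-1 byte for "é" is malformed, so it becomes the replacement character U+FFFD. This is exactly the damage accented names suffer.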

regards,

Hervé


Re: Encoding issues for 2.0.8

Posted by Brett Porter <br...@apache.org>.

We shouldn't permit relative entities in the POM - it would cause  
grief when deployed to the repository.
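A reader that follows this rule could refuse external entities outright instead of resolving them against the POM's location. One way to sketch this with the JDK's DOM parser is an EntityResolver that throws for every external entity it is asked to resolve. The class name NoExternalEntities is hypothetical, and this is not how Maven's model readers are implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical reader that rejects any document needing an external entity or DTD. */
final class NoExternalEntities {

    private NoExternalEntities() {
    }

    static Document parse(byte[] xml) throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The parser calls this whenever it needs an external entity or external DTD; refusing
        // here means a relative SYSTEM reference is never resolved against the file's location.
        builder.setEntityResolver((publicId, systemId) -> {
            throw new SAXException("External entity not permitted: " + systemId);
        });
        return builder.parse(new ByteArrayInputStream(xml));
    }
}
```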

- Brett

On 23/06/2007, at 5:29 AM, Hervé BOUTEMY wrote:

> Le vendredi 22 juin 2007, Arnaud HERITIER a écrit :
>> Be careful, because when you read an xml file with a reader (or an
>> inputstream) instead of a path (or an url) you can't use relative  
>> entities
>> in xml (because the parser can't know where the main doc is).
> yes, read(InputStream) is better than read(Reader) for encoding,  
> but does not
> help for relative entities.
> Do you want that I add read(URL) at the same time, and document it  
> as the
> preferred way of getting the model read? At the first time, for  
> compatibility
> reasons, it would call read(InputStream) then read(Reader), but in  
> the long
> term, it could be coded to permit relative entities.
> WDYT?
>
> Hervé
>
>> This is not a problem because we discourage the usage of xml  
>> entities in
>> our xml documents, but we have to continue to say it !!!
>>
>> Arnaud

Re: Encoding issues for 2.0.8

Posted by Hervé BOUTEMY <he...@free.fr>.

On Friday, June 22, 2007, Arnaud HERITIER wrote:
> Be careful, because when you read an xml file with a reader (or an
> inputstream) instead of a path (or an url) you can't use relative entities
> in xml (because the parser can't know where the main doc is).
yes, read(InputStream) is better than read(Reader) for encoding, but it does
not help with relative entities.
Do you want me to add read(URL) at the same time, and document it as the
preferred way to read the model? At first, for compatibility reasons, it
would call read(InputStream), which in turn calls read(Reader), but in the
long term, it could be coded to permit relative entities.
WDYT?
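A read(URL) entry point helps because the parser resolves relative entity references against the main document's location (its system ID), which a bare Reader or InputStream does not carry. Below is a minimal sketch using JAXP's DOM parser, not Modello-generated code; UrlModelReader and readModel are hypothetical names. It passes bytes, so the parser still detects the encoding. It also sets the system ID, so a declaration such as <!ENTITY inc SYSTEM "inc.xml"> is resolved next to the POM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical read(URL) entry point. */
class UrlModelReader {
    static Document readModel(URL url)
            throws IOException, ParserConfigurationException, SAXException {
        try (InputStream in = url.openStream()) {
            // Bytes, not chars: the parser detects the encoding from the XML declaration
            InputSource source = new InputSource(in);
            // The system ID is the base URI that relative entity references resolve against
            source.setSystemId(url.toExternalForm());
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        }
    }
}
```

Without the setSystemId call, the parser has no document location to resolve inc.xml against.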

Hervé

> This is not a problem because we discourage the usage of xml entities in
> our xml documents, but we have to continue to say it !!!
>
> Arnaud
>
> On 22/06/07, Hervé BOUTEMY <he...@free.fr> wrote:
> > Le vendredi 22 juin 2007, Kenney Westerhof a écrit :
> > > Hi,
> >
> > Hi,
> >
> > > indeed, it's a case of doing new XXXInputstream( something, "encoding"
> >
> > ),
> >
> > > or a reader. Some work has been done on this, IIRC.
> > >
> > > The problem is that you need to prescan the xml declaration, so you
> >
> > start
> >
> > > parsing until you get the first xml language element that is not a
> >
> > comment,
> >
> > > (an xml element, in which case encoding is utf8, or
> > >  a doctype declaration, encoding is utf8, or
> > >  a processing instruction, and if it's the xml processing instruction
> >
> > parse
> >
> > > the encoding attribute and use that, otherwise it's utf8).
> > >
> > > This isn't too hard to do, except you need to restart reading the xml
> >
> > file
> >
> > > from start, if the encoding is not utf-8. The real problem is in the
> >
> > API's;
> >
> > > you cannot take a reader and restart that, since you cannot change the
> > > encoding on an instantiated reader, and you certainly don't want to
> > > wrap it. You'd need access to a raw inputstream that doesn't apply
> > > encoding transformations to the bytes, and wrap that in a Pushback
> > > something and then rewrap it if you found the encoding.
> >
> > exactly, this is the job done by XmlReader in Rome:
> >
> > https://rome.dev.java.net/apidocs/0_5/com/sun/syndication/io/XmlReader.ht
> >ml
> >
> > I have the class, well written and tested by Rome developers. My first
> > question is then: where to put it, to be able to use it in a lot of
> > places where there are Readers instanciated for XML streams?
> > plexus-utils? or make a dependency on Rome? or another place?
> >
> > > I'm a bit fuzzy on all the java.io api's, so we'll have to find the
> >
> > proper
> >
> > > class to use in the API so we can do this; a File would work.
> > >
> > > Anyway, I once tried to fix this issue but the api had to be changed
> > > and there were just too many changes across plexus and maven at the
> > > time to push this through.
> >
> > With this class available, the change to Maven model can be backward
> > compatible:
> > - the old read(Reader) API remains for compatibility, but is deprecated
> > - a new read(InputStream) API is added, which calls read(new
> > XmlReader(in))
> > The whole Maven code can then slowly migrate from deprecated Reader API
> > to the
> > new InputStream one, or use XmlReader if it is too hard to switch to
> > InputStream.
> > The only change is that there is a new dependency to this XmlReader
> > class: I
> > don't know if it is a real problem or not.
> >
> > I searched a little bit, this new API addition could be done individually
> > in
> > each .mdo file. But of course integrating it the code generation
> > mechanism of
> > Modello would be a lot better: Jason, if your proposal to have access to
> > Modello is still valid, I'm interested.
> >
> > Regards
> >
> > Hervé
> >
> > > -- Kenney
> > >
> > > Hervé BOUTEMY wrote:
> > > > Le jeudi 21 juin 2007, Jason van Zyl a écrit :
> > > >> It seems like there are many problems with encoding that could be
> > > >> easily solved with a couple tweaks to modello, specifically the
> > > >> reader and writing so I've scheduled these for 2.0.8. There some
> > > >> patches for these and hopefully Herve will work his magic with his
> > > >> suggested fix. I like the idea of borrowing the idea from the Rome
> > > >> IO utils to find the right encoding by default. That could easily be
> > > >> integrated into modello. Herve if you need access to Modello we can
> > > >> set you up.
> > > >
> > > > I'm interested at working on that. Do I need Modello access, or other
> > > > components? I don't really know, these Modello things are the parts I
> > > > didn't really dive into for the moment.
> > > > The magic of the idea is that the encoding handling is not done by
> > > > the parser, but by the reader. Then, the code that has to change is
> > > > the
> >
> > code
> >
> > > > creating the Reader from a File: it must be changed from "new
> > > > FileReader(file)" to "new XmlReader(file)".
> > > >
> > > > We need to:
> > > > 1. choose where we put the XmlReader so that any code can use it when
> > > > necessary. Or have a dependency on Rome: but all Rome for only 1
> > > > class (even if this class is really great)...
> > > > 2. change every code that creates a Reader for XML parsing
> > > >
> > > > WDYT?
> > > >
> > > >> Thanks,
> > > >>
> > > >> Jason
> > > >>
> > > >> ----------------------------------------------------------
> > > >> Jason van Zyl
> > > >> Founder and PMC Chair, Apache Maven
> > > >> jason at sonatype dot com
> > > >> ----------------------------------------------------------
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --------------------------------------------------------------------
> > > >>- To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
> > > >> For additional commands, e-mail: dev-help@maven.apache.org
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
> > > > For additional commands, e-mail: dev-help@maven.apache.org
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
> > > For additional commands, e-mail: dev-help@maven.apache.org
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
> > For additional commands, e-mail: dev-help@maven.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: Encoding issues for 2.0.8

Posted by Arnaud HERITIER <ah...@gmail.com>.

Be careful, because when you read an xml file with a reader (or an
inputstream) instead of a path (or an url) you can't use relative entities
in xml (because the parser can't know where the main doc is).
This is not a problem because we discourage the usage of xml entities in our
xml documents, but we have to continue to say it !!!

Arnaud

-- 
..........................................................
Arnaud HERITIER
..........................................................
OCTO Technology - aheritier@octo.com
www.octo.com | blog.octo.com
..........................................................
ASF - aheritier@apache.org
www.apache.org | maven.apache.org
...........................................................

Re: Encoding issues for 2.0.8

Posted by Hervé BOUTEMY <he...@free.fr>.

On Friday, June 22, 2007, Kenney Westerhof wrote:
> Hi,
Hi,

>
> indeed, it's a case of doing new XXXInputstream( something, "encoding" ),
> or a reader. Some work has been done on this, IIRC.
>
> The problem is that you need to prescan the xml declaration, so you start
> parsing until you get the first xml language element that is not a comment,
> (an xml element, in which case encoding is utf8, or
>  a doctype declaration, encoding is utf8, or
>  a processing instruction, and if it's the xml processing instruction parse
> the encoding attribute and use that, otherwise it's utf8).
>
> This isn't too hard to do, except you need to restart reading the xml file
> from start, if the encoding is not utf-8. The real problem is in the APIs;
> you cannot take a reader and restart that, since you cannot change the
> encoding on an instantiated reader, and you certainly don't want to wrap
> it. You'd need access to a raw inputstream that doesn't apply encoding
> transformations to the bytes, and wrap that in a Pushback something and
> then rewrap it if you found the encoding.
exactly, this is the job done by XmlReader in Rome:
https://rome.dev.java.net/apidocs/0_5/com/sun/syndication/io/XmlReader.html

I have the class, well written and tested by the Rome developers. My first
question, then, is where to put it so that it can be used in the many places
where Readers are instantiated for XML streams: plexus-utils? A dependency on
Rome? Somewhere else?

>
> I'm a bit fuzzy on all the java.io APIs, so we'll have to find the proper
> class to use in the API so we can do this; a File would work.
>
> Anyway, I once tried to fix this issue but the api had to be changed and
> there were just too many changes across plexus and maven at the time to
> push this through.
With this class available, the change to the Maven model can be backward
compatible:
- the old read(Reader) API remains for compatibility, but is deprecated
- a new read(InputStream) API is added, which calls read(new XmlReader(in))
All Maven code can then gradually migrate from the deprecated Reader API to
the new InputStream one, or use XmlReader where switching to InputStream is
too hard.
The only change is a new dependency on this XmlReader class: I don't know
whether that is a real problem or not.
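The proposed API change could look roughly like this. It is a sketch under assumptions: ModelReaderSketch is a hypothetical stand-in for a Modello-generated reader, read(...) just returns the decoded text instead of building a Model, and declaredCharset is a much simpler stand-in for Rome's XmlReader. It only honours the encoding named in the XML declaration and defaults to UTF-8; there is no BOM or HTTP content-type handling.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical shape of a generated model reader; returns text instead of a Model. */
class ModelReaderSketch {
    private static final int PEEK = 1024;
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Old entry point, kept for compatibility: the caller already chose the encoding. */
    @Deprecated
    public String read(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        return text.toString();
    }

    /** New entry point: raw bytes in, encoding taken from the XML declaration. */
    public String read(InputStream in) throws IOException {
        BufferedInputStream bin = new BufferedInputStream(in);
        bin.mark(PEEK);
        byte[] head = new byte[PEEK];
        int n = bin.readNBytes(head, 0, PEEK);
        bin.reset(); // rewind so the Reader also sees the declaration
        return read(new InputStreamReader(bin, declaredCharset(head, n)));
    }

    /** Simplified stand-in for XmlReader: declared encoding, else UTF-8. */
    static Charset declaredCharset(byte[] head, int n) {
        // The declaration is ASCII, so ISO-8859-1 is enough to read it
        String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        int end = prolog.indexOf("?>");
        if (prolog.startsWith("<?xml") && end > 0) {
            Matcher m = ENCODING.matcher(prolog.substring(0, end));
            if (m.find()) {
                return Charset.forName(m.group(1));
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Existing callers of read(Reader) keep compiling (with a deprecation warning), while new callers hand over raw bytes and no longer have to guess the encoding.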

I looked into it a little: this new API could be added individually in each
.mdo file. But of course integrating it into Modello's code generation
mechanism would be a lot better: Jason, if your offer of access to Modello
is still valid, I'm interested.

Regards

Hervé

Re: Encoding issues for 2.0.8

Posted by Kenney Westerhof <ke...@apache.org>.

Hi,

indeed, it's a case of doing new XXXInputstream( something, "encoding" ),
or a reader. Some work has been done on this, IIRC.

The problem is that you need to prescan the xml declaration, so you start
parsing until you get the first xml language element that is not a comment,
(an xml element, in which case encoding is utf8, or
 a doctype declaration, encoding is utf8, or
 a processing instruction, and if it's the xml processing instruction parse the encoding
 attribute and use that, otherwise it's utf8).

This isn't too hard to do, except you need to restart reading the xml file from
start, if the encoding is not utf-8. The real problem is in the APIs; you cannot
take a reader and restart that, since you cannot change the encoding on an instantiated
reader, and you certainly don't want to wrap it. You'd need access to a raw
inputstream that doesn't apply encoding transformations to the bytes, and wrap that
in a Pushback something and then rewrap it if you found the encoding.
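The prescan-then-rewrap idea can be sketched with java.io.PushbackInputStream: peek at the first bytes, decide on a charset, push the bytes back, and only then wrap the stream in a Reader. XmlEncodingSniffer is a hypothetical helper, not Rome's XmlReader. It only handles byte order marks and an ASCII-compatible XML declaration, and otherwise defaults to UTF-8 as described above. Since the XML declaration, when present, must be the very first thing in the document, looking at the start of the stream is enough to find it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating prescan-then-rewrap (not Rome's XmlReader). */
class XmlEncodingSniffer {
    private static final int PEEK = 1024;
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Peeks at the raw bytes, pushes them back, and returns a Reader with the detected encoding. */
    static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PEEK);
        byte[] head = new byte[PEEK];
        int n = 0;
        // read() may return fewer bytes than requested, so keep going until PEEK bytes or EOF
        while (n < PEEK) {
            int r = in.read(head, n, PEEK - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        Charset cs = detect(head, n);
        // Don't hand a UTF-8 byte order mark to the parser as a U+FEFF character
        int skip = hasUtf8Bom(head, n) ? 3 : 0;
        in.unread(head, skip, n - skip); // the stream now starts at the beginning again
        return new InputStreamReader(in, cs);
    }

    static Charset detect(byte[] head, int n) {
        if (hasUtf8Bom(head, n)) {
            return StandardCharsets.UTF_8;
        }
        if (n >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return StandardCharsets.UTF_16; // this decoder reads the BOM to pick the byte order
            }
        }
        // The declaration itself is ASCII, so decoding the peek as ISO-8859-1 is enough to read it
        String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        int end = prolog.indexOf("?>");
        if (prolog.startsWith("<?xml") && end > 0) {
            Matcher m = ENCODING.matcher(prolog.substring(0, end));
            if (m.find()) {
                return Charset.forName(m.group(1));
            }
        }
        return StandardCharsets.UTF_8; // element, DOCTYPE, or declaration without encoding
    }

    private static boolean hasUtf8Bom(byte[] b, int n) {
        return n >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF;
    }
}
```

An unknown encoding name in the declaration surfaces as an UnsupportedCharsetException from Charset.forName, which a real implementation would want to report more helpfully.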

I'm a bit fuzzy on all the java.io APIs, so we'll have to find the proper
class to use in the API so we can do this; a File would work. 

Anyway, I once tried to fix this issue but the api had to be changed and there were
just too many changes across plexus and maven at the time to push this through.

-- Kenney

Re: Encoding issues for 2.0.8

Posted by Hervé BOUTEMY <he...@free.fr>.

On Thursday, June 21, 2007, Jason van Zyl wrote:
> It seems like there are many problems with encoding that could be
> easily solved with a couple tweaks to modello, specifically the
> reader and writing so I've scheduled these for 2.0.8. There some
> patches for these and hopefully Herve will work his magic with his
> suggested fix. I like the idea of borrowing the idea from the Rome IO
> utils to find the right encoding by default. That could easily be
> integrated into modello. Herve if you need access to Modello we can
> set you up.
I'm interested in working on that. Do I need Modello access, or access to
other components? I don't really know; Modello is a part I haven't really
dived into yet.
The magic of the idea is that the encoding handling is not done by the parser, 
but by the reader. Then, the code that has to change is the code creating the 
Reader from a File: it must be changed from "new FileReader(file)" to "new 
XmlReader(file)".
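Why new FileReader(file) is the culprit: it always decodes with the platform default charset, whatever the XML declaration says (before JDK 18 that default followed the OS locale). A minimal sketch that simulates a Latin-1 platform by choosing the charset explicitly; DefaultCharsetPitfall.readAll is a hypothetical helper.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class DefaultCharsetPitfall {
    /** Reads a whole file through a Reader that decodes with the given charset. */
    static String readAll(File file, Charset cs) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new FileInputStream(file), cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }
}
```

Reading a UTF-8 POM that contains <name>Hervé</name> with ISO-8859-1 yields <name>HervÃ©</name>; reading it with UTF-8 gives the text back intact.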

We need to:
1. choose where we put the XmlReader so that any code can use it when
necessary. Or have a dependency on Rome: but all of Rome for only one class
(even if this class is really great)...
2. change all code that creates a Reader for XML parsing

WDYT?

Re: Encoding issues for 2.0.8

Posted by Arnaud HERITIER <ah...@gmail.com>.

In maven 1.1 we are already using the stax reader/writer to support XML
entities, and it's working fine...
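For comparison, the writing half with the JDK's own StAX API (javax.xml.stream; this is not the Maven 1.1 generated writer, whose code isn't shown here). Creating the writer on an OutputStream with an explicit encoding, and passing the same name to writeStartDocument, keeps the declaration and the bytes consistent. StaxWriterDemo and the developer/name elements are illustrative.

```java
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriterDemo {
    /** Writes a tiny document whose declaration matches the bytes actually produced. */
    static void writeDeveloper(OutputStream out, String name) throws XMLStreamException {
        String encoding = "UTF-8";
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, encoding);
        w.writeStartDocument(encoding, "1.0"); // declared encoding == encoding used for the bytes
        w.writeStartElement("developer");
        w.writeStartElement("name");
        w.writeCharacters(name);
        w.writeEndElement();
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying OutputStream
    }
}
```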

Arnaud

Re: Encoding issues for 2.0.8

Posted by Brett Porter <br...@apache.org>.

On 22/06/2007, at 11:38 AM, Jason van Zyl wrote:

> Does it buy us anything? You've tried it so I don't know but the  
> source to the xpp3 stuff isn't that big.

As I understand it, the xpp3 guy works on the stax RI (not sure if  
he's completely moved on to doing that - I went to check the xpp3  
site for new releases but it's down).

Main benefits are that it's a standard, there are a couple of
implementations to choose from, and that it's already in the JDK from
v6 onwards. It's not at all bloated.

Probably a separate discussion - I doubt it's really required for  
this stuff, just thought it was worth a look.
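As an illustration of the JDK 6+ StAX API handling encoding on its own: when given an InputStream rather than a Reader, the parser reads the XML declaration and decodes accordingly. A minimal sketch; StaxEncodingDemo and firstElementText are hypothetical names, and SUPPORT_DTD is switched off only to keep DTDs and entities out of the example.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEncodingDemo {
    /** Returns the text of the first element called {@code name}, or null if there is none. */
    static String firstElementText(InputStream in, String name) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Keep DTDs (and therefore entity declarations) out of the picture
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        // Given raw bytes, the parser works out the encoding from the document itself
        XMLStreamReader r = factory.createXMLStreamReader(in);
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && name.equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close(); // XMLStreamReader is not AutoCloseable
        }
    }
}
```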

- Brett

Re: Encoding issues for 2.0.8

Posted by Jason van Zyl <ja...@maven.org>.

On 21 Jun 07, at 6:23 PM, Brett Porter wrote:

> Sounds good. The only caution I'd exercise is that I think one of
> them introduces POM elements - I'm not sure if we want them in
> 2.0.8.
>
> If we do add them, that's fine though I'm not entirely sure what  
> the current situation is with Maven reading POMs from the  
> repository with elements it doesn't understand (I think it safely  
> ignores them, just need to test).
>

I don't think we can do that, but I think what Herve suggested is a
good approach. I think most problems would be fixed just by obeying
the encoding as we should.

> One other thing to consider is whether it is easier to use the StAX  
> modello plugin instead of xpp3. Probably not until 2.1 since we'd  
> want to disentangle xpp3 from a number of other places, but just  
> throwing it out there.
>

Does it buy us anything? You've tried it so I don't know but the  
source to the xpp3 stuff isn't that big.

> Side note, I went through the issues in 2.0.8 and voted too, but I
> didn't have any 'must be in 2.0.8' issues that I felt compelled to raise.

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder and PMC Chair, Apache Maven
jason at sonatype dot com
----------------------------------------------------------

Re: Encoding issues for 2.0.8

Posted by Brett Porter <br...@apache.org>.

Sounds good. The only caution I'd exercise is that I think one of  
them introduces POM elements - I'm not sure we want them in 2.0.8.

If we do add them, that's fine, though I'm not entirely sure what the  
current situation is with Maven reading POMs from the repository with  
elements it doesn't understand (I think it safely ignores them, I just  
need to test).
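Whether unknown elements are ignored or rejected is a choice the reader makes. A hypothetical StAX sketch of both behaviours, where the class name, the strict flag and the small KNOWN set are assumptions for illustration and not what Maven 2.0.x actually does:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: strict vs. lenient handling of unknown children of <project>. */
class LenientPomScan {

    // Illustrative subset of the elements a given model version "knows".
    private static final Set<String> KNOWN = Set.of("modelVersion", "groupId", "artifactId", "version");

    /** Returns the names of unknown elements skipped; in strict mode they are errors instead. */
    static List<String> scan(String xml, boolean strict) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> skipped = new ArrayList<>();
        try {
            r.nextTag(); // move to the <project> start tag
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (KNOWN.contains(name)) {
                    r.getElementText(); // a real reader would store this value
                } else if (strict) {
                    throw new XMLStreamException("Unrecognised tag: '" + name + "'", r.getLocation());
                } else {
                    skipped.add(name);
                    skipSubtree(r);
                }
            }
        } finally {
            r.close();
        }
        return skipped;
    }

    /** Advances to the END_ELEMENT matching the current START_ELEMENT. */
    private static void skipSubtree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

In lenient mode an older reader can still consume a POM written for a newer model, at the cost of silently dropping whatever it does not recognise.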

One other thing to consider is whether it is easier to use the StAX  
modello plugin instead of xpp3. Probably not until 2.1 since we'd  
want to disentangle xpp3 from a number of other places, but just  
throwing it out there.

Side note, I went through the issues in 2.0.8 and voted too, but I  
didn't have any 'must be in 2.0.8' issues that I felt compelled to raise.

On 22/06/2007, at 7:39 AM, Jason van Zyl wrote:

> It seems like there are many problems with encoding that could be  
> easily solved with a couple of tweaks to modello, specifically the  
> reader and writer, so I've scheduled these for 2.0.8. There are some  
> patches for these and hopefully Herve will work his magic with his  
> suggested fix. I like the idea of borrowing the approach from the Rome  
> IO utils to find the right encoding by default. That could easily  
> be integrated into modello. Herve, if you need access to Modello we  
> can set you up.
>
> Thanks,
>
> Jason