Posted to user@ant.apache.org by Alex Egg <eg...@gmail.com> on 2007/08/01 18:54:36 UTC

Re: concat task on group of utf 8 files w/ BOM

I don't understand how to use filterchains, so I just wrote an Ant task
to remove the BOMs. Can you show me an example?
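
For context, the BOM-stripping step of such a task can be quite small. The
following is a minimal Java sketch, not the actual task from this thread: the
class name BomStrippingConcat and its methods are hypothetical, and it works
on in-memory byte arrays (a real task would presumably read each file of the
fileset first). It drops the three bytes EF BB BF only when they appear at the
very start of a chunk.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper, not part of Ant: concatenates byte chunks while
// dropping a leading UTF-8 BOM from each one.
final class BomStrippingConcat {
    // U+FEFF encoded as UTF-8.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data begins with the three UTF-8 BOM bytes.
    static boolean startsWithBom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    // Appends every chunk in order, skipping a leading BOM where present.
    static byte[] concat(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            int offset = startsWithBom(chunk) ? UTF8_BOM.length : 0;
            out.write(chunk, offset, chunk.length - offset);
        }
        return out.toByteArray();
    }
}
```

Working on raw bytes sidesteps the question of how the character decoder
treats the BOM, at the cost of assuming that every input really is UTF-8.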

On 7/31/07, Jan.Materne@rzf.fin-nrw.de <Ja...@rzf.fin-nrw.de> wrote:
>
> You could try a <filterchain> to get rid of the BOMs when <concat>enating
> the files.
>
> Jan
>
> >-----Original Message-----
> >From: Alex Egg [mailto:eggie5@gmail.com]
> >Sent: Monday, July 30, 2007 22:33
> >To: Ant Users List
> >Subject: concat task on group of utf 8 files w/ BOM
> >
> >Hi,
> >I'm using the concat task with a fileset that points to a bunch of files
> >with Unicode byte order marks. After the task is complete I wind up with
> >a file that's full of BOMs, which, down the road, crashes a program that
> >reads this file.
> >
> >Any good solution for dealing with the BOMs?
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
> For additional commands, e-mail: user-help@ant.apache.org
>
>

Re: concat task on group of utf 8 files w/ BOM

Posted by Dominique Devienne <dd...@gmail.com>.
On 8/1/07, Peter Reilly <pe...@gmail.com> wrote:
> On 8/1/07, Dominique Devienne <dd...@gmail.com> wrote:
> > On 8/1/07, Peter Reilly <pe...@gmail.com> wrote:
> > > I do not think that filter chains will help here as they
> > > operate on Readers and not on input streams.
> >
> > Actually, that may be why it "should" work. Java knows about optional
> > BOMs and does the right thing, as long as you tell it that the
> > encoding is UTF-16.
>
> The encoding is not UTF-16, it is UTF-8. Having a BOM in UTF-8
> makes little sense, as single bytes have no byte order. However, it
> is allowed by the Unicode standard as an optional feature.
> see: http://issues.apache.org/bugzilla/show_bug.cgi?id=28049 and
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

Hmmm, OK. I missed the part where Alex wrote that he was using UTF-8
with BOM, instead of UTF-16. I assumed the latter, since indeed I
didn't know one could put a BOM with UTF-8 data! (useless indeed).
--DD


Re: concat task on group of utf 8 files w/ BOM

Posted by Peter Reilly <pe...@gmail.com>.
On 8/1/07, Dominique Devienne <dd...@gmail.com> wrote:
> On 8/1/07, Peter Reilly <pe...@gmail.com> wrote:
> > I do not think that filter chains will help here as they
> > operate on Readers and not on input streams.
>
> Actually, that may be why it "should" work. Java knows about optional
> BOMs and does the right thing, as long as you tell it that the
> encoding is UTF-16.

The encoding is not UTF-16, it is UTF-8. Having a BOM in UTF-8
makes little sense, as single bytes have no byte order. However, it
is allowed by the Unicode standard as an optional feature.
see: http://issues.apache.org/bugzilla/show_bug.cgi?id=28049 and
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

Some Windows programs (notepad.exe, for example) use this byte sequence
to indicate that a text file is UTF-8 encoded, as opposed to using the
Windows default encoding.

XML readers in Java know about UTF-8 BOMs, but the standard Java stream
reader does not. Old versions of Java would throw an exception; newer
versions convert it to a ? (I think).
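
The difference between the two encodings can be checked directly. This is a
minimal sketch, assuming a reasonably recent JDK; the class and method names
are made up for illustration. Decoding as UTF-16 consumes the BOM, while the
JDK's UTF-8 decoder hands it through as the character U+FEFF, which is what
then ends up in the middle of a concatenated file.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the JDK decoders treat a leading BOM.
final class BomDecodingDemo {
    // Decodes with the UTF-8 charset; a leading EF BB BF survives as U+FEFF.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes with the UTF-16 charset; the BOM selects the byte order
    // and is not part of the decoded text.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }
}
```

For the UTF-8 bytes EF BB BF 61 the decoded string has two characters (U+FEFF
followed by "a"); for the UTF-16 bytes FE FF 00 61 it has just the one, "a".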

Peter

>
> Alex, try playing with the 'encoding' and 'outputencoding' attributes of
> <concat> to see if that gets rid of the BOMs. I suspect the BOMs will
> be "eaten up" by the char decoder and no longer appear in the
> streams. --DD

Re: concat task on group of utf 8 files w/ BOM

Posted by Dominique Devienne <dd...@gmail.com>.
On 8/1/07, Peter Reilly <pe...@gmail.com> wrote:
> I do not think that filter chains will help here as they
> operate on Readers and not on input streams.

Actually, that may be why it "should" work. Java knows about optional
BOMs and does the right thing, as long as you tell it that the
encoding is UTF-16.

Alex, try playing with the 'encoding' and 'outputencoding' attributes of
<concat> to see if that gets rid of the BOMs. I suspect the BOMs will
be "eaten up" by the char decoder and no longer appear in the
streams. --DD


Re: concat task on group of utf 8 files w/ BOM

Posted by Peter Reilly <pe...@gmail.com>.
I do not think that filter chains will help here as they
operate on Readers and not on input streams.
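
Whether a Reader-level filter can help depends on what the decoder does with
the BOM. If it reaches the Reader as the character U+FEFF (an assumption here,
though it should hold for the JDK's UTF-8 decoder), it can be dropped at that
level. A minimal Java sketch with hypothetical names, not an Ant filter:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper: removes one leading U+FEFF from a character stream.
final class BomSkippingReader {
    // Wraps the reader; the first char is consumed only if it is U+FEFF.
    static Reader skipLeadingBom(Reader in) {
        PushbackReader pushback = new PushbackReader(in, 1);
        try {
            int first = pushback.read();
            if (first != -1 && first != 0xFEFF) {
                pushback.unread(first); // not a BOM, put it back
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pushback;
    }

    // Reads the remaining characters into a String.
    static String readAll(Reader in) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

This only works once the bytes have been decoded with the right encoding; if
the BOM bytes are decoded under a different charset they will not show up as
U+FEFF and the check will miss them.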

Peter


On 8/1/07, Alex Egg <eg...@gmail.com> wrote:
> I don't understand how to use filterchains, so I just wrote an Ant task
> to remove the BOMs. Can you show me an example?
>
> On 7/31/07, Jan.Materne@rzf.fin-nrw.de <Ja...@rzf.fin-nrw.de> wrote:
> >
> > You could try a <filterchain> to get rid of the BOMs when <concat>enating
> > the files.
> >
> > Jan
> >
> > >-----Original Message-----
> > >From: Alex Egg [mailto:eggie5@gmail.com]
> > >Sent: Monday, July 30, 2007 22:33
> > >To: Ant Users List
> > >Subject: concat task on group of utf 8 files w/ BOM
> > >
> > >Hi,
> > >I'm using the concat task with a fileset that points to a bunch of files
> > >with Unicode byte order marks. After the task is complete I wind up with
> > >a file that's full of BOMs, which, down the road, crashes a program that
> > >reads this file.
> > >
> > >Any good solution for dealing with the BOMs?