Posted to users@groovy.apache.org by Keegan Witt <ke...@gmail.com> on 2015/06/08 17:53:03 UTC

UTF16 BOM in new PrintWriter() vs withPrintWriter()

I've always taken a perverse pleasure in character encoding problems.  I
was intrigued by this SO question
<http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
on UTF-16 BOMs in Java vs Groovy.

It appears that file.withPrintWriter(charset) writes a BOM whereas new
PrintWriter(file, charset) does not, as demonstrated here:

File file = new File("tmp.txt")
try {
    String text = " "
    String charset = "UTF-16LE"

    file.withPrintWriter(charset) { it << text }
    println "withPrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }

    PrintWriter w = new PrintWriter(file, charset)
    w.print(text)
    w.close()
    println "\n\nnew PrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }
} finally {
    file.delete()
}

Outputs

withPrintWriter
ff fe 20 00

new PrintWriter
20 00


Is this difference in behavior intentional?  It seems kinda odd to me.

-Keegan
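
For reference, the JDK half of this comparison can be reproduced without
Groovy or a temp file.  Here is a minimal Java sketch (the class and method
names are illustrative, not from any library) that encodes the same
single-space string with two charsets.  It assumes Java 7+ for
StandardCharsets.  The JDK's UTF-16LE encoder writes no BOM, while the
generic UTF-16 charset prepends a big-endian one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper class; the name is an assumption, not part of the thread.
final class BomProbe {

    // Encode text with the given charset and render the bytes as lowercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Explicit-endian charset: the JDK encoder writes no BOM.
        System.out.println("UTF-16LE: " + hex(" ", StandardCharsets.UTF_16LE)); // 20 00
        // Generic UTF-16: the JDK encoder prepends a big-endian BOM.
        System.out.println("UTF-16:   " + hex(" ", StandardCharsets.UTF_16));   // fe ff 00 20
    }
}
```

So the "20 00" result above is plain JDK behavior for UTF-16LE, and the
extra "ff fe" has to come from the Groovy side.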

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
For that point, perhaps it's a limitation of Java itself not recognizing
that alias?

2015-06-08 23:41 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> Another point of interest is that the current code doesn't respect
> aliases.  For example, the charset string "UTF_16LE" will not write the
> BOM, despite being an alias for "UTF-16LE"
>
> -Keegan
> On Jun 8, 2015 5:20 PM, "Keegan Witt" <ke...@gmail.com> wrote:
>
>> The code as-is today writes the BOM regardless of platform.  I just
>> tested in Linux with the same results.  I think there are 2 parts to the
>> question of "what's the correct behavior?"
>>
>> 1.  Should the BOM be written at all, particularly when the platform is
>> Windows?
>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>> difference is to be smarter) from the behavior of *new PrintWriter*?
>>
>> *Discussion*
>> 1.  Strictly speaking, yes, because RFC 2781
>> <http://tools.ietf.org/html/rfc2781> says in section 4.3 to assume
>> big-endian if there is no BOM.  However, in practice, many applications
>> disregard the RFC and assume little-endian because that's what Windows
>> does
>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>> Because of this, the behavior could be changed so that when writing
>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>> best practice to always write a BOM when working with UTF-16, and Java
>> should have done this in their implementation of their PrintWriter.
>>
>> 2.  This is a tough one.  Arguably, *withPrintWriter* has the
>> smarter, more correct behavior, but the typical user would assume it is
>> just a shorthand convenience for newing up a PrintWriter (I certainly
>> did).  So the question is, is it better to just document this difference in
>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>> latter, what breakages would that cause within Groovy itself?  Making that
>> change could break folks in production, because they could rely on that BOM
>> being there, in cases for example where the file is created on Windows, but
>> then processed on Linux or when working with a third party library that is
>> more picky about the presence of a BOM.
>>
>> -Keegan
>>
>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>> Now... whether that is what should be done is the real question to ask :-)
>>> Does Windows manage to open UTF-16 files without BOMs?
>>>
>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> I forgot to mention that.  Yes, I ran the test mentioned in Windows.
>>>>
>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <gl...@gmail.com>
>>>> wrote:
>>>>
>>>>> That's a good question.
>>>>> I guess this is happening on Windows? (I haven't tried here, since I'm
>>>>> on OS X)
>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
Another point of interest is that the current code doesn't respect
aliases.  For example, the charset string "UTF_16LE" will not write the
BOM, despite being an alias for "UTF-16LE"

-Keegan
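
To illustrate the alias point: comparing the raw charset string only matches
the exact spelling, whereas resolving it through Charset.forName() first
yields the canonical name.  This is a minimal Java sketch with made-up class
and method names, and it assumes the running JDK registers "UTF_16LE" as an
alias, as described above.

```java
import java.nio.charset.Charset;

// Illustrative helper class; names are assumptions, not Groovy's actual code.
final class CharsetAliasCheck {

    // Resolve a charset name or alias to the charset's canonical name.
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    // True when the name, or any alias of it, denotes UTF-16LE or UTF-16BE.
    static boolean isExplicitEndianUtf16(String charsetName) {
        String canonical = canonicalName(charsetName);
        return "UTF-16LE".equals(canonical) || "UTF-16BE".equals(canonical);
    }

    public static void main(String[] args) {
        String alias = "UTF_16LE";
        // Raw string comparison misses the alias.
        System.out.println("UTF-16LE".equals(alias));
        // Canonicalizing first recognizes it.
        System.out.println(isExplicitEndianUtf16(alias));
    }
}
```

Note that Charset.forName() throws for unknown or illegal names, so a real
implementation would have to decide how to handle that.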

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I opened a separate issue (GROOVY-7661
<https://issues.apache.org/jira/browse/GROOVY-7661>) for stripping BOM.  PR
176 <https://github.com/apache/incubator-groovy/pull/176> opened for this
issue.

-Keegan
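
For readers skimming the quoted discussion below, the "boolean parameter"
idea could look roughly like this in plain Java.  This is only a sketch under
my own assumptions: the class, the newWriter(OutputStream, String, boolean)
signature and the encode helper are hypothetical and are not the API from
the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical illustration; not Groovy's actual implementation.
final class OptionalBomWriter {

    // Wrap a stream in a Writer, writing a UTF-16 BOM first only when asked to.
    static Writer newWriter(OutputStream out, String charsetName, boolean writeBom) throws IOException {
        Charset charset = Charset.forName(charsetName);
        if (writeBom) {
            String canonical = charset.name(); // resolves aliases
            if ("UTF-16LE".equals(canonical)) {
                out.write(0xFF);
                out.write(0xFE);
            } else if ("UTF-16BE".equals(canonical)) {
                out.write(0xFE);
                out.write(0xFF);
            }
        }
        return new OutputStreamWriter(out, charset);
    }

    // Convenience for experiments: encode text in memory with or without a BOM.
    static byte[] encode(String text, String charsetName, boolean writeBom) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = newWriter(buffer, charsetName, writeBom)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(" ", "UTF-16LE", false).length); // 2 (20 00)
        System.out.println(encode(" ", "UTF-16LE", true).length);  // 4 (ff fe 20 00)
    }
}
```

With the flag defaulting to false, the result matches what new PrintWriter
produces; passing true gives the bytes withPrintWriter wrote in the original
test.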

On Thu, Oct 22, 2015 at 12:04 AM, Keegan Witt <ke...@gmail.com> wrote:

> Trying to wrap up 7465 this week.  Got one more question: What do you
> think of making *getText* and *getBytes* take an arg of whether to strip
> the BOM (if found)?
>
> -Keegan
>
> On Tue, Jul 7, 2015 at 2:44 PM, Guillaume Laforge <gl...@gmail.com>
> wrote:
>
>> Agreed on consistency too.
>>
>> 2015-07-07 20:38 GMT+02:00 Pascal Schumacher <pa...@gmx.net>:
>>
>>> I agree, the behavior should be consistent.
>>>
>>> On 06.07.2015 at 00:31, Keegan Witt wrote:
>>>
>>> I'm starting work on this.  Just to be clear (since we didn't really
>>> discuss this): Do we want to make only newPrintWriter() not default to
>>> writing a BOM?  Or also write() and append() methods not default to writing
>>> a BOM?  I was thinking we would change all 3 so their behavior is
>>> consistent.  What do you think?
>>>
>>> On Thu, Jun 11, 2015 at 9:19 AM, Keegan Witt <ke...@gmail.com>
>>> wrote:
>>>
>>>> I created GROOVY-7465
>>>> <https://issues.apache.org/jira/browse/GROOVY-7465> to track this.
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 4:04 PM, Keegan Witt <ke...@gmail.com> wrote:
>>>>
>>>>> I'd be OK with that.  I think having false by default is the *Right
>>>>> Thing™*, but true has a certain allure since it'd reduce the risk of
>>>>> breaking existing code (hard to guess how likely breakage is).  Tough
>>>>> choice.  Even if we defaulted to true, it's an improvement over current
>>>>> state since it gives users the flexibility, and calling it out as a
>>>>> parameter might elicit more thought and attention than just a JavaDoc
>>>>> comment.
>>>>>
>>>>> On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <gl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> So let's say, perhaps, we don't generate a BOM, unless asked
>>>>>> specifically... but not with new methods, but with new parameters to such
>>>>>> methods. In addition to specifying a charset, we could also pass a boolean
>>>>>> saying we want a BOM to be generated (false by default, needs to be
>>>>>> specified as true if BOM wanted) ?
>>>>>>
>>>>>> 2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>
>>>>>>> I get that -- and I wish JDK did the same.  But what bothers me most
>>>>>>> about the current state is that sometimes it's transparent, sometimes it's
>>>>>>> not -- depending on how it was invoked.  And while we could fix the new
>>>>>>> instance usage too with metaClass, that could lead to weird inconsistencies
>>>>>>> when Groovy is invoked from Java.
>>>>>>>
>>>>>>> I really think most users would not expect these two usages to
>>>>>>> behave differently.  I think most would expect the difference to be
>>>>>>> stylistic only.  So as much as it pains me to say this, I think it's better
>>>>>>> not to violate the principle of least surprise, and remain consistent
>>>>>>> across all styles of invocation with Java's poor life choices.
>>>>>>>
>>>>>>> But maybe the friendlier APIs can be moved into new methods, such as
>>>>>>> newBomAwareWriter() / withBomAwareWriter {}.  What do you think?  If
>>>>>>> we did that, I guess it'd be consistent to do the same for the readers as
>>>>>>> well.
>>>>>>>
>>>>>>> -Keegan
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>
>>>>>>>>> I created PR 37
>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/37> to correct
>>>>>>>>> the JavaDoc I mentioned (as well as to document the existing behavior for
>>>>>>>>> the non-NIO methods).
>>>>>>>>>
>>>>>>>>> Java doesn't eat the BOM, but this is a problem Java folks are
>>>>>>>>> used to dealing with, and why things like Apache Common-IO's
>>>>>>>>> BOMInputStream
>>>>>>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>>>>>>> exist.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That's also why I made Groovy eat the BOM too, so that it's
>>>>>>>> transparent to our users :-)
>>>>>>>> But that was a long time ago since I worked on those parts of the
>>>>>>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> So now, how to decide what's best? :-)
>>>>>>>>>>
>>>>>>>>>> Is a Java reader happy with the BOM?  Does it eat it transparently?
>>>>>>>>>> (I think in the past that wasn't the case but I may be wrong)
>>>>>>>>>>
>>>>>>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter
>>>>>>>>>>> claims (in the JavaDoc) it will write the BOM if needed, but it doesn't
>>>>>>>>>>> because it uses Java's implementation rather than Groovy's
>>>>>>>>>>> writeUTF16BomIfRequired.  None of the methods in
>>>>>>>>>>> NioGroovyMethods use writeUTF16BomIfRequired.
>>>>>>>>>>>
>>>>>>>>>>> Whichever we decide, we should be consistent.
>>>>>>>>>>>
>>>>>>>>>>> -Keegan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <pa...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm wondering if NioGroovyMethods that implement the write
>>>>>>>>>>>> methods for Path should do the same.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Paolo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also
>>>>>>>>>>>>> was thinking the Javadoc would be changed from
>>>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>>>>>>> to
>>>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR
>>>>>>>>>>>>>>> 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cool!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the
>>>>>>>>>>>>>>> other methods that use writeUTF16BomIfRequired (at least
>>>>>>>>>>>>>>> until we decide we're going to change the current behavior)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right, worth it!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still
>>>>>>>>>>>>>>>>> targeting 1.6 for the non-indy version?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You could also compare against the StandardCharsets constants, instead
>>>>>>>>>>>>>>>>>> of going through the name comparison:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
>>>>>>>>>>>>>>>>>>> We'll probably want to fix that regardless of what we decide on the
>>>>>>>>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> From Groovy's point of view (ie. when you're coding in
>>>>>>>>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I tend to think that having the BOM always is a good
>>>>>>>>>>>>>>>>>>>> thing (I even thought that was mandatory), but Groovy should guess the
>>>>>>>>>>>>>>>>>>>> endianness regardless anyway.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Happy to hear what others think too about all this
>>>>>>>>>>>>>>>>>>>> though.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
Trying to wrap up 7465 this week.  Got one more question: What do you think
of making *getText* and *getBytes* take an arg of whether to strip the BOM
(if found)?

-Keegan
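
For illustration only, the opt-in flag asked about above might look like the following Java sketch. The names BomStripSketch, readBytes, and stripBom are hypothetical and are not Groovy's existing getText/getBytes signatures. The sketch only recognises the two UTF-16 byte-order marks.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Groovy's API: shows an opt-in "strip the BOM" flag.
final class BomStripSketch {
    private BomStripSketch() {}

    /**
     * Returns the bytes unchanged, or without a leading UTF-16 BOM
     * (FE FF or FF FE) when stripBom is true and a BOM is present.
     */
    static byte[] readBytes(byte[] raw, boolean stripBom) {
        if (stripBom && raw.length >= 2) {
            int b0 = raw[0] & 0xFF;
            int b1 = raw[1] & 0xFF;
            boolean bigEndianBom = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndianBom = b0 == 0xFF && b1 == 0xFE;
            if (bigEndianBom || littleEndianBom) {
                // Drop only the two BOM bytes; the payload is left as-is.
                return Arrays.copyOfRange(raw, 2, raw.length);
            }
        }
        return raw;
    }
}
```

With stripBom set to false the bytes come back untouched, so existing callers would see no change. The ff fe 20 00 bytes from the original test would shrink to 20 00 only when the caller asks for it.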

On Tue, Jul 7, 2015 at 2:44 PM, Guillaume Laforge <gl...@gmail.com>
wrote:

> Agreed on consistency too.
>
> 2015-07-07 20:38 GMT+02:00 Pascal Schumacher <pa...@gmx.net>:
>
>> I agree, the behavior should be consistent.
>>
>> Am 06.07.2015 um 00:31 schrieb Keegan Witt:
>>
>> I'm starting work on this.  Just to be clear (since we didn't really
>> discuss this): Do we want to make only newPrintWriter() not default to
>> writing a BOM?  Or also write() and append() methods not default to writing
>> a BOM?  I was thinking we would change all 3 so their behavior is
>> consistent.  What do you think?
>>
>> On Thu, Jun 11, 2015 at 9:19 AM, Keegan Witt <ke...@gmail.com>
>> wrote:
>>
>>> I created GROOVY-7465
>>> <https://issues.apache.org/jira/browse/GROOVY-7465> to track this.
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 4:04 PM, Keegan Witt <ke...@gmail.com> wrote:
>>>
>>>> I'd be OK with that.  I think having false by default is the *Right
>>>> Thing™*, but true has a certain allure since it'd reduce the risk of
>>>> breaking existing code (hard to guess how likely breakage is).  Tough
>>>> choice.  Even if we defaulted to true, it's an improvement over current
>>>> state since it gives users the flexibility, and calling it out as a
>>>> parameter might elicit more thought and attention than just a JavaDoc
>>>> comment.
>>>>
>>>> On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <gl...@gmail.com>
>>>> wrote:
>>>>
>>>>> So let's say, perhaps, we don't generate a BOM, unless asked
>>>>> specifically... but not with new methods, but with new parameters to such
>>>>> methods. In addition to specifying a charset, we could also pass a boolean
>>>>> saying we want a BOM to be generated (false by default, needs to be
>>>>> specified as true if BOM wanted) ?
>>>>>
>>>>> 2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>
>>>>>> I get that -- and I wish JDK did the same.  But what bothers me most
>>>>>> about the current state is that sometimes it's transparent, sometimes it's
>>>>>> not -- depending on how it was invoked.  And while we could fix the new
>>>>>> instance usage too with metaClass, that could lead to weird inconsistencies
>>>>>> when Groovy is invoked from Java.
>>>>>>
>>>>>> I really think most users would not expect these two usages to behave
>>>>>> differently.  I think most would expect the difference to be stylistic
>>>>>> only.  So as much as it pains me to say this, I think it's better not to
>>>>>> violate the principle of least surprise, and remain consistent across all
>>>>>> styles of invocation with Java's poor life choices.
>>>>>>
>>>>>> But maybe the friendlier APIs can be moved into new methods, such as
>>>>>> newBomAwareWriter() / withBomAwareWriter {}.  What do you think?  If
>>>>>> we did that, I guess it'd be consistent to do the same for the readers as
>>>>>> well.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <
>>>>>> <gl...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>
>>>>>>>> I created PR 37
>>>>>>>> <https://github.com/apache/incubator-groovy/pull/37> to correct
>>>>>>>> the JavaDoc I mentioned (as well as to document the existing behavior for
>>>>>>>> the non-NIO methods).
>>>>>>>>
>>>>>>>> Java doesn't eat the BOM, but this is a problem Java folks are used
>>>>>>>> to dealing with, and why things like Apache Commons IO's
>>>>>>>> BOMInputStream
>>>>>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>>>>>> exist.
>>>>>>>>
>>>>>>>
>>>>>>> That's also why I made Groovy eat the BOM too, so that it's
>>>>>>> transparent to our users :-)
>>>>>>> But that was a long time ago since I worked on those parts of the
>>>>>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <
>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> So now, how to decide what's best? :-)
>>>>>>>>>
>>>>>>>>> Is a Java reader happy with the BOM? and eats it transparently? (I
>>>>>>>>> think in the past that wasn't the case but I may be wrong)
>>>>>>>>>
>>>>>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter
>>>>>>>>>> claims (in the JavaDoc) it will write the BOM if needed, but it doesn't
>>>>>>>>>> because it uses Java's implementation rather than Groovy's
>>>>>>>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods
>>>>>>>>>> use writeUTF16BomIfRequired.
>>>>>>>>>>
>>>>>>>>>> Whichever we decide, we should be consistent.
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>>>>>>>> <pa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm wondering if NioGroovyMethods that implement the write
>>>>>>>>>>> methods for Path should do the same.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Paolo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <
>>>>>>>>>>> <ke...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also
>>>>>>>>>>>> was thinking the Javadoc would be changed from
>>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>>>>>> to
>>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <
>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cool!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we
>>>>>>>>>>>>>> decide we're going to change the current behavior)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right, worth it!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still
>>>>>>>>>>>>>>>> targeting 1.6 for the non-indy version?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <
>>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You could also compare with the StandardCharsets constants, instead
>>>>>>>>>>>>>>>>> of going through the name comparison:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html>
>>>>>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
>>>>>>>>>>>>>>>>>> We'll probably want to fix that regardless of what we decide on the
>>>>>>>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in
>>>>>>>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I tend to think that having the BOM always is a good
>>>>>>>>>>>>>>>>>>> thing (I even thought that was mandatory), but Groovy should guess the
>>>>>>>>>>>>>>>>>>> endianness regardless anyway.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Happy to hear what others think too about all this
>>>>>>>>>>>>>>>>>>> though.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of
>>>>>>>>>>>>>>>>>>>> platform.  I just tested in Linux with the same results.  I think there are
>>>>>>>>>>>>>>>>>>>> 2 parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when
>>>>>>>>>>>>>>>>>>>> the platform is Windows?
>>>>>>>>>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ
>>>>>>>>>>>>>>>>>>>> (even if the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section
>>>>>>>>>>>>>>>>>>>> 4.3 to assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter*
>>>>>>>>>>>>>>>>>>>> is doing the smarter, more correct behavior, but the typical user would
>>>>>>>>>>>>>>>>>>>> assume this is just a shorthand convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>>>>>>>> certainly did).  So the question is, is it better to just document this
>>>>>>>>>>>>>>>>>>>> difference in the GroovyDoc?  Or to change the behavior to be closer to
>>>>>>>>>>>>>>>>>>>> Java?  And if the latter, what breakages would that cause within Groovy
>>>>>>>>>>>>>>>>>>>> itself?  Making that change could break folks in production, because they
>>>>>>>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where the file is
>>>>>>>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working with a
>>>>>>>>>>>>>>>>>>>> third party library that is more picky about the presence of a BOM.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Now... whether that is what should be done is the right
>>>>>>>>>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test
>>>>>>>>>>>>>>>>>>>>>> mentioned in Windows.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't
>>>>>>>>>>>>>>>>>>>>>>> tried here, since I'm on OS X)
>>>>>>>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>>>>>>>> encoding problems.  I was intrigued by this SO
>>>>>>>>>>>>>>>>>>>>>>>> question
>>>>>>>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces
>>>>>>>>>>>>>>>>>>>>>>>> a BOM whereas new PrintWriter(file, charset) does
>>>>>>>>>>>>>>>>>>>>>>>> not.  As demonstrated here:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It
>>>>>>>>>>>>>>>>>>>>>>>> seems kinda odd to me.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>>>>>>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
Agreed on consistency too.
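
As a rough sketch of what that consistent behaviour could look like (no BOM unless the caller asks for one, with charset aliases resolved before the comparison), assuming a hypothetical helper named BomWriteSketch.encode that is not part of Groovy or the JDK:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not Groovy's implementation: BOM is written only on request.
final class BomWriteSketch {
    private BomWriteSketch() {}

    /** Encodes text in the given charset, prefixing a UTF-16 BOM only if writeBom is true. */
    static byte[] encode(String text, String charsetName, boolean writeBom) {
        // Charset.forName resolves aliases such as "UTF_16LE" to the canonical charset.
        Charset charset = Charset.forName(charsetName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (writeBom) {
            String canonical = charset.name();
            if ("UTF-16BE".equals(canonical)) {
                out.write(0xFE);
                out.write(0xFF);
            } else if ("UTF-16LE".equals(canonical)) {
                out.write(0xFF);
                out.write(0xFE);
            }
            // Other charsets are left alone in this sketch.
        }
        byte[] body = text.getBytes(charset);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }
}
```

Called with writeBom set to false, this matches what new PrintWriter(file, charset) produces today. Passing true reproduces the current withPrintWriter output, so callers who rely on the BOM could keep it explicitly.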

2015-07-07 20:38 GMT+02:00 Pascal Schumacher <pa...@gmx.net>:

>  I agree, the behavior should be consistent.
>
> Am 06.07.2015 um 00:31 schrieb Keegan Witt:
>
> I'm starting work on this.  Just to be clear (since we didn't really
> discuss this): Do we want to make only newPrintWriter() not default to
> writing a BOM?  Or also write() and append() methods not default to writing
> a BOM?  I was thinking we would change all 3 so their behavior is
> consistent.  What do you think?
>
> On Thu, Jun 11, 2015 at 9:19 AM, Keegan Witt <ke...@gmail.com> wrote:
>
>> I created GROOVY-7465 <https://issues.apache.org/jira/browse/GROOVY-7465> to
>> track this.
>>
>>  -Keegan
>>
>> On Tue, Jun 9, 2015 at 4:04 PM, Keegan Witt <ke...@gmail.com> wrote:
>>
>>> I'd be OK with that.  I think having false by default is the *Right
>>> Thing™*, but true has a certain allure since it'd reduce the risk of
>>> breaking existing code (hard to guess how likely breakage is).  Tough
>>> choice.  Even if we defaulted to true, it's an improvement over current
>>> state since it gives users the flexibility, and calling it out as a
>>> parameter might elicit more thought and attention than just a JavaDoc
>>> comment.
>>>
>>> On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <gl...@gmail.com>
>>> wrote:
>>>
>>>> So let's say, perhaps, we don't generate a BOM, unless asked
>>>> specifically... but not with new methods, but with new parameters to such
>>>> methods. In addition to specifying a charset, we could also pass a boolean
>>>> saying we want a BOM to be generated (false by default, needs to be
>>>> specified as true if BOM wanted) ?
>>>>
>>>> 2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>
>>>>> I get that -- and I wish JDK did the same.  But what bothers me most
>>>>> about the current state is that sometimes it's transparent, sometimes it's
>>>>> not -- depending on how it was invoked.  And while we could fix the new
>>>>> instance usage too with metaClass, that could lead to weird inconsistencies
>>>>> when Groovy is invoked from Java.
>>>>>
>>>>>  I really think most users would not expect these two usages to
>>>>> behave differently.  I think most would expect the difference to be
>>>>> stylistic only.  So as much as it pains me to say this, I think it's better
>>>>> not to violate the principle of least surprise, and remain consistent
>>>>> across all styles of invocation with Java's poor life choices.
>>>>>
>>>>>  But maybe the friendlier APIs can be moved into new methods, such as
>>>>> newBomAwareWriter() / withBomAwareWriter {}.  What do you think?  If we
>>>>> did that, I guess it'd be consistent to do the same for the readers as well.
>>>>>
>>>>>  -Keegan
>>>>>
>>>>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <
>>>>> <gl...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>
>>>>>>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37>
>>>>>>> to correct the JavaDoc I mentioned (as well as to document the existing
>>>>>>> behavior for the non-NIO methods).
>>>>>>>
>>>>>>>  Java doesn't eat the BOM, but this is a problem Java folks are
>>>>>>> used to dealing with, and why things like Apache Commons IO's
>>>>>>> BOMInputStream
>>>>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>>>>> exist.
>>>>>>>
>>>>>>
>>>>>>  That's also why I made Groovy eat the BOM too, so that it's
>>>>>> transparent to our users :-)
>>>>>> But that was a long time ago since I worked on those parts of the
>>>>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>  -Keegan
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <
>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>
>>>>>>>> So now, how to decide what's best? :-)
>>>>>>>>
>>>>>>>>  Is a Java reader happy with the BOM? and eats it transparently?
>>>>>>>> (I think in the past that wasn't the case but I may be wrong)
>>>>>>>>
>>>>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>
>>>>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter
>>>>>>>>> claims (in the JavaDoc) it will write the BOM if needed, but it doesn't
>>>>>>>>> because it uses Java's implementation rather than Groovy's
>>>>>>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods
>>>>>>>>> use writeUTF16BomIfRequired.
>>>>>>>>>
>>>>>>>>>  Whichever we decide, we should be consistent.
>>>>>>>>>
>>>>>>>>>  -Keegan
>>>>>>>>>
>>>>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>>>>>>> <pa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'm wondering if NioGroovyMethods that implement the write
>>>>>>>>>> methods for Path should do the same.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Cheers,
>>>>>>>>>> Paolo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <
>>>>>>>>>> <ke...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also
>>>>>>>>>>> was thinking the Javadoc would be changed from
>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>>>>>  to
>>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>>>>
>>>>>>>>>>>  -Keegan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <
>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  2015-06-09 15:04 GMT+02:00 Keegan Witt <
>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Cool!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>  How would you feel about a PR to copy the Javadoc comment
>>>>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we
>>>>>>>>>>>>> decide we're going to change the current behavior)?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Right, worth it!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  -Keegan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still
>>>>>>>>>>>>>>> targeting 1.6 for the non-indy version?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>   On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <
>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You could also compare with the StandardCharsets constants, instead of
>>>>>>>>>>>>>>>> going through the name comparison:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html>
>>>>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  No, it's a Groovy bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  should be
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
>>>>>>>>>>>>>>>>> We'll probably want to fix that regardless of what we decide on the
>>>>>>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>>>>>>>>> <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in
>>>>>>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  I tend to think that having the BOM always is a good
>>>>>>>>>>>>>>>>>> thing (I even thought that was mandatory), but Groovy should guess the
>>>>>>>>>>>>>>>>>> endianness regardless anyway.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Happy to hear what others think too about all this
>>>>>>>>>>>>>>>>>> though.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Guillaume
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>> <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of
>>>>>>>>>>>>>>>>>>> platform.  I just tested in Linux with the same results.  I think there are
>>>>>>>>>>>>>>>>>>> 2 parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  1.  Should the BOM be written at all, particularly
>>>>>>>>>>>>>>>>>>> when the platform is Windows?
>>>>>>>>>>>>>>>>>>>  2.  Should the behavior of *withPrintWriter* differ
>>>>>>>>>>>>>>>>>>> (even if the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  *Discussion*
>>>>>>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section
>>>>>>>>>>>>>>>>>>> 4.3 to assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  2.  This is a tough one.  Arguably, *withPrintWriter*
>>>>>>>>>>>>>>>>>>> is doing the smarter, more correct behavior, but the typical user would
>>>>>>>>>>>>>>>>>>> assume this is just a shorthand convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>>>>>>> certainly did).  So the question is, is it better to just document this
>>>>>>>>>>>>>>>>>>> difference in the GroovyDoc?  Or to change the behavior to be closer to
>>>>>>>>>>>>>>>>>>> Java?  And if the latter, what breakages would that cause within Groovy
>>>>>>>>>>>>>>>>>>> itself?  Making that change could break folks in production, because they
>>>>>>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where the file is
>>>>>>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working with a
>>>>>>>>>>>>>>>>>>> third party library that is more picky about the presence of a BOM.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  -Keegan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Now... whether that is what should be done is the right
>>>>>>>>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test
>>>>>>>>>>>>>>>>>>>>> mentioned in Windows.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <gl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't
>>>>>>>>>>>>>>>>>>>>>> tried here, since I'm on OS X)
>>>>>>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>>>>>>> encoding problems.  I was intrigued by this SO
>>>>>>>>>>>>>>>>>>>>>>> question
>>>>>>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  It appears using withPrintWriter(charset) produces
>>>>>>>>>>>>>>>>>>>>>>> a BOM whereas new PrintWriter(file, charset) does
>>>>>>>>>>>>>>>>>>>>>>> not.  As demonstrated here:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  Is this difference in behavior intentional?  It
>>>>>>>>>>>>>>>>>>>>>>> seems kinda odd to me.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  -Keegan
>>>>>>>>>>>>>>>>>>>>>>>


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>
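
The read-side rule discussed in the quoted thread (honor a BOM if one is present, otherwise assume big-endian as RFC 2781 section 4.3 suggests) can be sketched in Java roughly as follows. Utf16BomSniffer and decodeUtf16 are hypothetical names used for illustration only; this is an assumption-laden sketch, not how Groovy's reader methods are actually implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Groovy or the JDK.
class Utf16BomSniffer {

    // Decodes UTF-16 bytes, consuming a leading BOM if there is one.
    // Without a BOM the bytes are treated as big-endian.
    static String decodeUtf16(byte[] data) {
        Charset charset = StandardCharsets.UTF_16BE; // default when no BOM is found
        int offset = 0;
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                charset = StandardCharsets.UTF_16LE; // little-endian BOM
                offset = 2;
            } else if (b0 == 0xFE && b1 == 0xFF) {
                offset = 2;                          // big-endian BOM
            }
        }
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args) {
        // The two byte sequences from the original test output.
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00};
        byte[] withoutBom = {0x20, 0x00};

        // With the BOM the space character is recovered.
        System.out.println(Integer.toHexString(decodeUtf16(withBom).charAt(0)));    // 20
        // Without it, the little-endian bytes are misread as big-endian U+2000.
        System.out.println(Integer.toHexString(decodeUtf16(withoutBom).charAt(0))); // 2000
    }
}
```

The second case is the practical risk raised in the thread: BOM-less UTF-16LE output is only read correctly by consumers that already know, or guess, the byte order.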

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Pascal Schumacher <pa...@gmx.net>.
I agree, the behavior should be consistent.
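
For reference, the opt-in idea quoted below (a boolean parameter, false by default, that asks for a BOM) might look something like this Java sketch. BomOptIn and writeText are hypothetical names, not Groovy API; the sketch assumes the charset name is resolved through Charset.forName so that aliases map to the canonical charset, and it uses StandardCharsets, which needs Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Groovy's actual implementation.
class BomOptIn {

    // Writes text in the given charset, prepending a UTF-16 BOM only when writeBom is true.
    static void writeText(OutputStream out, String text, String charsetName, boolean writeBom)
            throws IOException {
        // Charset.forName resolves aliases to the canonical charset.
        Charset charset = Charset.forName(charsetName);
        if (writeBom) {
            if (StandardCharsets.UTF_16LE.equals(charset)) {
                out.write(0xFF);
                out.write(0xFE);
            } else if (StandardCharsets.UTF_16BE.equals(charset)) {
                out.write(0xFE);
                out.write(0xFF);
            }
        }
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        writer.flush(); // flush the encoder but leave the caller's stream open
    }

    // Formats bytes as space-separated lowercase hex, like the test in the first message.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        writeText(plain, " ", "UTF-16LE", false);
        System.out.println(hex(plain.toByteArray()));   // 20 00

        ByteArrayOutputStream withBom = new ByteArrayOutputStream();
        writeText(withBom, " ", "UTF-16LE", true);
        System.out.println(hex(withBom.toByteArray())); // ff fe 20 00
    }
}
```

With the flag off, the output matches what new PrintWriter(file, charset) produces in the original test; with it on, it matches the current withPrintWriter output.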

On 06.07.2015 at 00:31, Keegan Witt wrote:
> I'm starting work on this.  Just to be clear (since we didn't really 
> discuss this): Do we want to make only newPrintWriter() not default to 
> writing a BOM?  Or also write() and append() methods not default to 
> writing a BOM?  I was thinking we would change all 3 so their behavior 
> is consistent. What do you think?
>
> On Thu, Jun 11, 2015 at 9:19 AM, Keegan Witt <keeganwitt@gmail.com> wrote:
>
>     I created GROOVY-7465
>     <https://issues.apache.org/jira/browse/GROOVY-7465> to track this.
>
>     -Keegan
>
>     On Tue, Jun 9, 2015 at 4:04 PM, Keegan Witt <keeganwitt@gmail.com> wrote:
>
>         I'd be OK with that.  I think having false by default is the
>         /Right Thing™/, but true has a certain allure since it'd
>         reduce the risk of breaking existing code (hard to guess how
>         likely breakage is).  Tough choice. Even if we defaulted to
>         true, it's an improvement over current state since it gives
>         users the flexibility, and calling it out as a parameter might
>         elicit more thought and attention than just a JavaDoc comment.
>
>         On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>             So let's say, perhaps, we don't generate a BOM, unless
>             asked specifically... but not with new methods, but with
>             new parameters to such methods. In addition to specifying
>             a charset, we could also pass a boolean saying we want a
>             BOM to be generated (false by default, needs to be
>             specified as true if BOM wanted) ?
>
>             2015-06-09 21:47 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                 I get that -- and I wish JDK did the same.  But what
>                 bothers me most about the current state is that
>                 sometimes it's transparent, sometimes it's not --
>                 depending on how it was invoked.  And while we could
>                 fix the new instance usage too with metaClass, that
>                 could lead to weird inconsistencies when Groovy is
>                 invoked from Java.
>
>                 I really think most users would not expect these two
>                 usages to behave differently.  I think most would
>                 expect the difference to be stylistic only.  So as
>                 much as it pains me to say this, I think it's better
>                 not to violate the principle of least surprise, and
>                 remain consistent across all styles of invocation with
>                 Java's poor life choices.
>
>                 But maybe the friendlier APIs can be moved into new
>                 methods, such as newBomAwareWriter() /
>                 withBomAwareWriter{}.  What do you think?  If we did
>                 that, I guess it'd be consistent to do the same for
>                 the readers as well.
>
>                 -Keegan
>
>                 On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>
>                     2015-06-09 18:57 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                         I created PR 37
>                         <https://github.com/apache/incubator-groovy/pull/37>
>                         to correct the JavaDoc I mentioned (as well as
>                         to document the existing behavior for the
>                         non-NIO methods).
>
>                         Java doesn't eat the BOM, but this is a
>                         problem Java folks are used to dealing with,
>                         and why things like Apache Commons IO's
>                         BOMInputStream
>                         <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>                         exist.
>
>
>                     That's also why I made Groovy eat the BOM too, so
>                     that it's transparent to our users :-)
>                     But it's been a long time since I worked on
>                     those parts of the codebase, and they have been
>                     refactored quite a bit (by Jim for example).
>
>
>                         -Keegan
>
>                         On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>                             So now, how to decide what's best? :-)
>
>                             Is a Java reader happy with the BOM, and
>                             does it eat it transparently? (I think in the
>                             past that wasn't the case, but I may be wrong.)
>
>                             2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                                 That's an excellent point, Paolo.
>                                 NioGroovyMethods.newWriter claims (in
>                                 the JavaDoc) it will write the BOM if
>                                 needed, but it doesn't because it uses
>                                 Java's implementation rather than Groovy's
>                                 writeUTF16BomIfRequired. None of the
>                                 methods in NioGroovyMethods use
>                                 writeUTF16BomIfRequired.
>
>                                 Whichever we decide, we should be
>                                 consistent.
>
>                                 -Keegan
>
>                                 On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <paolo.ditommaso@gmail.com> wrote:
>
>                                     I'm wondering if NioGroovyMethods
>                                     that implement the write methods
>                                     for Path should do the same.
>
>
>                                     Cheers,
>                                     Paolo
>
>                                     On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganwitt@gmail.com> wrote:
>
>                                         Cool. I'll wait for PR 36 to
>                                         be merged first, because I
>                                         also was thinking the Javadoc
>                                         would be changed from
>                                         is "UTF-16BE" or "UTF-16LE"
>                                         to
>                                         is "UTF-16BE" or "UTF-16LE"
>                                         (or an equivalent alias)
>
>                                         -Keegan
>
>
>                                         On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>
>                                             2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                                                 Created GROOVY-7461
>                                                 <https://issues.apache.org/jira/browse/GROOVY-7461>
>                                                 and PR 36
>                                                 <https://github.com/apache/incubator-groovy/pull/36>.
>
>
>                                             Cool!
>
>                                                 How would you feel about a PR to copy the Javadoc comment
>                                                 mentioning the UTF-16 BOM on File.newWriter to all the other
>                                                 methods that use writeUTF16BomIfRequired (at least until we
>                                                 decide we're going to change the current behavior)?
>
>
>                                             Right, worth it!
>
>
>                                                 -Keegan
>
>                                                 On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>                                                     Good point!
>
>                                                     2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                                                         That's only available in Java 7. Isn't Groovy still
>                                                         targeting 1.6 for the non-indy version?
>
>                                                         -Keegan
>
>                                                         On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glaforge@gmail.com> wrote:
>
>                                                             Well spotted!
>
>                                                             You could also compare with StandardCharsets, instead of
>                                                             going through the name comparison:
>                                                             http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>
>                                                             2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                                                                 No, it's a Groovy bug.
>
>                                                                 private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>                                                                     if ("UTF-16BE".equals(charset)) {
>                                                                         writeUtf16Bom(stream, true);
>                                                                     } else if ("UTF-16LE".equals(charset)) {
>                                                                         writeUtf16Bom(stream, false);
>                                                                     }
>                                                                 }
>
>                                                                 should be
>
>                                                                 private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>                                                                     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>                                                                         writeUtf16Bom(stream, true);
>                                                                     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>                                                                         writeUtf16Bom(stream, false);
>                                                                     }
>                                                                 }
>
>                                                                 in org.codehaus.groovy.runtime.ResourceGroovyMethods.
>                                                                 We'll probably want to fix that regardless of what we
>                                                                 decide on the /withPrintWriter/ question. I'll open a
>                                                                 Jira and a PR.
>
>                                                                 -Keegan
>
>
>
>                                                                 On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glaforge@gmail.com> wrote:
>
>                                                                     From Groovy's point of view (ie. when you're coding in
>                                                                     Groovy), the BOM is automatically discarded when you use
>                                                                     one of our reader methods (withReader, etc), so it's
>                                                                     transparent whether the BOM is here or not.
>
>                                                                     I tend to think that having the BOM always is a good
>                                                                     thing (I even thought that was mandatory), but Groovy
>                                                                     should guess the endianness regardless anyway.
>
>                                                                     Happy to hear what others think too about all this
>                                                                     though.
>
>                                                                     Guillaume
>
>
>                                                                     2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>                                                                         The code as-is today writes the BOM regardless of
>                                                                         platform.  I just tested in Linux with the same
>                                                                         results.  I think there are 2 parts to the question
>                                                                         of "what's the correct behavior?"
>
>
>                                                                         1. Should the BOM be written at all, particularly when the
>                                                                         platform is Windows?
>                                                                         2. Should the behavior of /withPrintWriter/ differ (even if
>                                                                         the difference is to be smarter) from the behavior of
>                                                                         /new PrintWriter/?
>
>                                                                         *Discussion*
>                                                                         1. Strictly speaking, yes. Because RFC 2781
>                                                                         <http://tools.ietf.org/html/rfc2781> states in section 4.3
>                                                                         to assume big endian if there is no BOM. However, in
>                                                                         practice, many applications disregard the RFC and assume
>                                                                         little-endian because that's what Windows does
>                                                                         <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>                                                                         Because of this, the behavior could be changed so that when
>                                                                         writing UTF-16LE on Windows, it doesn't write the BOM. But
>                                                                         in my opinion, it's best practice to always write a BOM when
>                                                                         working with UTF-16, and Java should have done this in their
>                                                                         implementation of their PrintWriter.
>
>                                                                         2. This is a tough one. Arguably, /withPrintWriter/ is doing
>                                                                         the smarter, more correct behavior, but the typical user
>                                                                         would assume this is just a shorthand convenience for newing
>                                                                         up a PrintWriter (I certainly did). So the question is, is
>                                                                         it better to just document this difference in the GroovyDoc?
>                                                                         Or to change the behavior to be closer to Java? And if the
>                                                                         latter, what breakages would that cause within Groovy itself?
>                                                                         Making that change could break folks in production, because
>                                                                         they could rely on that BOM being there, in cases for example
>                                                                         where the file is created on Windows, but then processed on
>                                                                         Linux or when working with a third party library that is more
>                                                                         picky about the presence of a BOM.
>
>                                                                         -Keegan
>
>                                                                         On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge
>                                                                         <glaforge@gmail.com <ma...@gmail.com>> wrote:
>
>                                                                             Now... is it what should be done or not is the good
>                                                                             question to ask :-)
>
>                                                                             Does Windows manages to open UTF-16 files without BOMs?
>
>                                                                             2015-06-08
>                                                                             22:17
>                                                                             GMT+02:00
>                                                                             Keegan
>                                                                             Witt
>                                                                             <keeganwitt@gmail.com
>                                                                             <ma...@gmail.com>>:
>
>                                                                                 I forgot to mention that. Yes, I ran the test
>                                                                                 mentioned in Windows.
>
>                                                                                 On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge
>                                                                                 <glaforge@gmail.com <ma...@gmail.com>> wrote:
>
>                                                                                     That's a good question.
>
>                                                                                     I guess this is happening on Windows? (I haven't
>                                                                                     tried here, since I'm on OS X)
>                                                                                     I think BOMs were mandatory in text files on
>                                                                                     Windows.
>
>                                                                                     2015-06-08
>                                                                                     17:53
>                                                                                     GMT+02:00
>                                                                                     Keegan
>                                                                                     Witt
>                                                                                     <keeganwitt@gmail.com
>                                                                                     <ma...@gmail.com>>:
>
>                                                                                         I've always taken a perverse pleasure in
>                                                                                         character encoding problems. I was intrigued
>                                                                                         by this SO question
>                                                                                         <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>                                                                                         on UTF 16 BOMs in Java vs Groovy.
>
>
>                                                                                         It
>                                                                                         appears
>                                                                                         using
>                                                                                         withPrintWriter(charset)
>                                                                                         produces
>                                                                                         a BOM
>                                                                                         whereas
>                                                                                         new
>                                                                                         PrintWriter(file,
>                                                                                         charset)
>                                                                                         does
>                                                                                         not. 
>                                                                                         As
>                                                                                         demonstrated
>                                                                                         here:
>
>                                                                                         |Filefile
>                                                                                         =newFile("tmp.txt")try{Stringtext
>                                                                                         ="
>                                                                                         "Stringcharset
>                                                                                         ="UTF-16LE"file.withPrintWriter(charset){it
>                                                                                         <<text
>                                                                                         }println
>                                                                                         "withPrintWriter"file.getBytes().each
>                                                                                         {System.out.format("%02x
>                                                                                         ",it)}PrintWriterw
>                                                                                         =newPrintWriter(file,charset)w.print(text)w.close()println
>                                                                                         "\n\nnew
>                                                                                         PrintWriter"file.getBytes().each
>                                                                                         {System.out.format("%02x
>                                                                                         ",it)}}finally{file.delete()}|
>
>                                                                                         Outputs
>
>                                                                                         withPrintWriter
>                                                                                         ff fe 20 00
>
>                                                                                         new PrintWriter
>                                                                                         20 00
>
>
>                                                                                         Is
>                                                                                         this
>                                                                                         difference
>                                                                                         in
>                                                                                         behavior
>                                                                                         intentional?
>                                                                                         It
>                                                                                         seems
>                                                                                         kinda
>                                                                                         odd
>                                                                                         to
>                                                                                         me.
>
>                                                                                         -Keegan
>
>
>
>
> --
>
> Guillaume Laforge
> Groovy Project Manager
> Product Ninja & Advocate at Restlet <http://restlet.com>
>
> Blog: http://glaforge.appspot.com/
> Social: @glaforge <http://twitter.com/glaforge> / Google+
> <https://plus.google.com/u/0/114130972232398734985/posts>


Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I'm starting work on this.  Just to be clear (since we didn't really
discuss this): should only newPrintWriter() stop writing a BOM by default,
or should the write() and append() methods stop as well?  I was thinking
we would change all three so their behavior is consistent.  What do you
think?
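
To make the option concrete, here is a minimal Java sketch of the "BOM only when asked" idea from the thread.  It is an illustration under stated assumptions, not the Groovy implementation: the class BomOptionSketch, the newPrintWriter(OutputStream, String, boolean) overload, and the hex/write helpers are all hypothetical names, and the sketch writes to an in-memory stream rather than a File.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical sketch: none of these names exist in Groovy or the JDK.
final class BomOptionSketch {

    // Writes a UTF-16 BOM before handing back the writer, but only when asked.
    static PrintWriter newPrintWriter(OutputStream out, String charsetName, boolean writeBom)
            throws IOException {
        Charset cs = Charset.forName(charsetName);
        if (writeBom) {
            // Compare against the canonical name so aliases are handled too.
            String canonical = cs.name();
            if ("UTF-16LE".equals(canonical)) {
                out.write(0xFF);
                out.write(0xFE);
            } else if ("UTF-16BE".equals(canonical)) {
                out.write(0xFE);
                out.write(0xFF);
            }
        }
        return new PrintWriter(new OutputStreamWriter(out, cs));
    }

    // Default overload: no BOM, matching what new PrintWriter(file, charset) does.
    static PrintWriter newPrintWriter(OutputStream out, String charsetName) throws IOException {
        return newPrintWriter(out, charsetName, false);
    }

    // Encodes the text into a byte array using the sketch writer.
    static byte[] write(String text, String charsetName, boolean writeBom) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            PrintWriter w = newPrintWriter(buf, charsetName, writeBom);
            w.print(text);
            w.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes the same way as the test script in the original report.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(write(" ", "UTF-16LE", true)));   // BOM requested
        System.out.println(hex(write(" ", "UTF-16LE", false)));  // default: no BOM
    }
}
```

With the flag set, the bytes should match the withPrintWriter output from the original report (ff fe 20 00); without it, they should match the new PrintWriter output (20 00).  Whether write() and append() take the same flag is exactly the open question above.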

On Thu, Jun 11, 2015 at 9:19 AM, Keegan Witt <ke...@gmail.com> wrote:

> I created GROOVY-7465 <https://issues.apache.org/jira/browse/GROOVY-7465> to
> track this.
>
> -Keegan
>
> On Tue, Jun 9, 2015 at 4:04 PM, Keegan Witt <ke...@gmail.com> wrote:
>
>> I'd be OK with that.  I think having false by default is the *Right
>> Thing™*, but true has a certain allure since it'd reduce the risk of
>> breaking existing code (hard to guess how likely breakage is).  Tough
>> choice.  Even if we defaulted to true, it's an improvement over current
>> state since it gives users the flexibility, and calling it out as a
>> parameter might elicit more thought and attention than just a JavaDoc
>> comment.
>>
>> On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>> So let's say, perhaps, we don't generate a BOM, unless asked
>>> specifically... but not with new methods, but with new parameters to such
>>> methods. In addition to specifying a charset, we could also pass a boolean
>>> saying we want a BOM to be generated (false by default, needs to be
>>> specified as true if BOM wanted) ?
>>>
>>> 2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> I get that -- and I wish JDK did the same.  But what bothers me most
>>>> about the current state is that sometimes it's transparent, sometimes it's
>>>> not -- depending on how it was invoked.  And while we could fix the new
>>>> instance usage too with metaClass, that could lead to weird inconsistencies
>>>> when Groovy is invoked from Java.
>>>>
>>>> I really think most users would not expect these two usages to behave
>>>> differently.  I think most would expect the difference to be stylistic
>>>> only.  So as much as it pains me to say this, I think it's better not to
>>>> violate the principle of least surprise, and remain consistent across all
>>>> styles of invocation with Java's poor life choices.
>>>>
>>>> But maybe the friendlier APIs can be moved into new methods, such as
>>>> newBomAwareWriter() / withBomAwareWriter {}.  What do you think?  If we
>>>> did that, I guess it'd be consistent to do the same for the readers as well.
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <gl...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>
>>>>>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37>
>>>>>> to correct the JavaDoc I mentioned (as well as to document the existing
>>>>>> behavior for the non-NIO methods).
>>>>>>
>>>>>> Java doesn't eat the BOM, but this is a problem Java folks are used
>>>>>> to dealing with, and why things like Apache Commons IO's
>>>>>> BOMInputStream
>>>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>>>> exist.
>>>>>>
>>>>>
>>>>> That's also why I made Groovy eat the BOM too, so that it's
>>>>> transparent to our users :-)
>>>>> But that was a long time ago since I worked on those parts of the
>>>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>>>
>>>>>
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <
>>>>>> glaforge@gmail.com> wrote:
>>>>>>
>>>>>>> So now, how to decide what's best? :-)
>>>>>>>
>>>>>>> Is a Java reader happy with the BOM? and eats it transparently? (I
>>>>>>> think in the past that wasn't the case but I may be wrong)
>>>>>>>
>>>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>
>>>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter
>>>>>>>> claims (in the JavaDoc) that it will write the BOM if needed, but it
>>>>>>>> doesn't, because it uses Java's implementation rather than Groovy's
>>>>>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods
>>>>>>>> use writeUTF16BomIfRequired.
>>>>>>>>
>>>>>>>> Whichever we decide, we should be consistent.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>>>>>> paolo.ditommaso@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm wondering if NioGroovyMethods that implement the write methods
>>>>>>>>> for Path should do the same.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Paolo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>>>>>>>>>> thinking the Javadoc would be changed from
>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>>>> to
>>>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cool!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we
>>>>>>>>>>>> decide we're going to change the current behavior)?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Right, worth it!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Good point!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still
>>>>>>>>>>>>>> targeting 1.6 for the non-indy version?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <
>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You could also compare with the StandardCharsets constants, instead of
>>>>>>>>>>>>>>> going through the name comparison:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
>>>>>>>>>>>>>>>> We'll probably want to fix that regardless of what we decide on the
>>>>>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From Groovy's point of view (ie. when you're coding in
>>>>>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I tend to think that having the BOM always is a good thing
>>>>>>>>>>>>>>>>> (I even thought that was mandatory), but Groovy should guess the endianness
>>>>>>>>>>>>>>>>> regardless anyway.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of
>>>>>>>>>>>>>>>>>> platform.  I just tested in Linux with the same results.  I think there are
>>>>>>>>>>>>>>>>>> 2 parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when
>>>>>>>>>>>>>>>>>> the platform is Windows?
>>>>>>>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ
>>>>>>>>>>>>>>>>>> (even if the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section
>>>>>>>>>>>>>>>>>> 4.3 to assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is
>>>>>>>>>>>>>>>>>> doing the smarter, more correct behavior, but the typical user would assume
>>>>>>>>>>>>>>>>>> this is just a shorthand convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>>>>>> certainly did).  So the question is, is it better to just document this
>>>>>>>>>>>>>>>>>> difference in the GroovyDoc?  Or to change the behavior to be closer to
>>>>>>>>>>>>>>>>>> Java?  And if the latter, what breakages would that cause within Groovy
>>>>>>>>>>>>>>>>>> itself?  Making that change could break folks in production, because they
>>>>>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where the file is
>>>>>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working with a
>>>>>>>>>>>>>>>>>> third party library that is more picky about the presence of a BOM.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now... whether that is what should be done is the real
>>>>>>>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test
>>>>>>>>>>>>>>>>>>>> mentioned in Windows.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried
>>>>>>>>>>>>>>>>>>>>> here, since I'm on OS X)
>>>>>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>>>>>> encoding problems.  I was intrigued by this SO
>>>>>>>>>>>>>>>>>>>>>> question
>>>>>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a
>>>>>>>>>>>>>>>>>>>>>> BOM whereas new PrintWriter(file, charset) does
>>>>>>>>>>>>>>>>>>>>>> not.  As demonstrated here:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems
>>>>>>>>>>>>>>>>>>>>>> kinda odd to me.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Laforge
>>>>>>> Groovy Project Manager
>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>
>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I created GROOVY-7465 <https://issues.apache.org/jira/browse/GROOVY-7465> to
track this.

-Keegan
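
For context on the inconsistency that issue tracks, the JDK half of it can be reproduced in plain Java. This is only a minimal sketch, not code from the thread: the class name JdkUtf16BomDemo and its helpers are made up for illustration, and it encodes into memory rather than a temp file. It shows that a PrintWriter over "UTF-16LE" or "UTF-16BE" writes no BOM, while the generic "UTF-16" charset writes a big-endian BOM.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Illustrative class name; not part of Groovy or the JDK.
class JdkUtf16BomDemo {

    // Encodes text through a plain JDK PrintWriter into a byte array.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter w = new PrintWriter(
                new OutputStreamWriter(out, Charset.forName(charsetName)));
        w.print(text);
        w.close();
        return out.toByteArray();
    }

    // Formats bytes the same way as the script in the original post.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16LE: " + hex(encode(" ", "UTF-16LE"))); // 20 00
        System.out.println("UTF-16BE: " + hex(encode(" ", "UTF-16BE"))); // 00 20
        System.out.println("UTF-16:   " + hex(encode(" ", "UTF-16")));   // fe ff 00 20
    }
}
```

The first line matches the "new PrintWriter" output from the original post; the BOM in the "withPrintWriter" output comes from Groovy, not from the JDK encoder.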


Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I'd be OK with that.  I think having false by default is the *Right Thing™*,
but true has a certain allure since it'd reduce the risk of breaking
existing code (hard to guess how likely breakage is).  Tough choice.  Even
if we defaulted to true, it's an improvement over the current state since it
gives users the flexibility, and calling it out as a parameter might elicit
more thought and attention than just a JavaDoc comment.
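
The opt-in flag discussed here could look roughly like the sketch below, written in Java. Everything in it is hypothetical: newPrintWriter, its writeBom parameter and the class BomFlagSketch are illustrative names, not existing Groovy or JDK API. It resolves the charset through Charset.forName so that aliases such as "UTF_16LE" are treated like "UTF-16LE", and it defaults nothing: the caller has to pass the flag explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Illustrative sketch only; none of these names exist in Groovy or the JDK.
class BomFlagSketch {

    // Opens a PrintWriter and writes a UTF-16 BOM first only when asked to.
    static PrintWriter newPrintWriter(OutputStream stream, String charsetName,
                                      boolean writeBom) throws IOException {
        Charset charset = Charset.forName(charsetName);
        if (writeBom) {
            // name() returns the canonical name, so aliases are handled too.
            String canonical = charset.name();
            if ("UTF-16LE".equals(canonical)) {
                stream.write(0xFF);
                stream.write(0xFE);
            } else if ("UTF-16BE".equals(canonical)) {
                stream.write(0xFE);
                stream.write(0xFF);
            }
            // Other charsets (including plain "UTF-16", whose JDK encoder
            // already emits a BOM) are left untouched in this sketch.
        }
        return new PrintWriter(new OutputStreamWriter(stream, charset));
    }

    // Convenience wrapper that writes into memory and returns the bytes.
    static byte[] write(String text, String charsetName, boolean writeBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            PrintWriter w = newPrintWriter(out, charsetName, writeBom);
            w.print(text);
            w.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(write(" ", "UTF-16LE", false))); // 20 00
        System.out.println(hex(write(" ", "UTF-16LE", true)));  // ff fe 20 00
        System.out.println(hex(write(" ", "UTF_16LE", true)));  // ff fe 20 00
    }
}
```

With the flag set to false the output matches what new PrintWriter(file, charset) produces; with true it matches what withPrintWriter(charset) produces today.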

On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <gl...@gmail.com>
wrote:

> So let's say, perhaps, we don't generate a BOM, unless asked
> specifically... but not with new methods, but with new parameters to such
> methods. In addition to specifying a charset, we could also pass a boolean
> saying we want a BOM to be generated (false by default, needs to be
> specified as true if BOM wanted) ?
>
> 2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>
>> I get that -- and I wish JDK did the same.  But what bothers me most
>> about the current state is that sometimes it's transparent, sometimes it's
>> not -- depending on how it was invoked.  And while we could fix the new
>> instance usage too with metaClass, that could lead to weird inconsistencies
>> when Groovy is invoked from Java.
>>
>> I really think most users would not expect these two usages to behave
>> differently.  I think most would expect the difference to be stylistic
>> only.  So as much as it pains me to say this, I think it's better not to
>> violate the principle of least surprise, and remain consistent across all
>> styles of invocation with Java's poor life choices.
>>
>> But maybe the friendlier APIs can be moved into new methods, such as new
>> BomAwareWriter() / WithBomAwareWriter{}  What do you think?  If we did
>> that, I guess it'd be consistent to do the same for the readers as well.
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>>
>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37>
>>>> to correct the JavaDoc I mentioned (as well as to document the existing
>>>> behavior for the non-NIO methods).
>>>>
>>>> Java doesn't eat the BOM, but this is a problem Java folks are used to
>>>> dealing with, and why things like Apache Commons IO's BOMInputStream
>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>> exist.
>>>>
>>>
>>> That's also why I made Groovy eat the BOM too, so that it's transparent
>>> to our users :-)
>>> But that was a long time ago since I worked on those parts of the
>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>
>>>
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <gl...@gmail.com>
>>>> wrote:
>>>>
>>>>> So now, how to decide what's best? :-)
>>>>>
>>>>> Is a Java reader happy with the BOM? and eats it transparently? (I
>>>>> think in the past that wasn't the case but I may be wrong)
>>>>>
>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>
>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims
>>>>>> (in the JavaDoc) it will write the BOM if needed, but it doesn't because it
>>>>>> uses Java's implementation rather than Groovy's
>>>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods
>>>>>> use writeUTF16BomIfRequired.
>>>>>>
>>>>>> Whichever we decide, we should be consistent.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>>>> paolo.ditommaso@gmail.com> wrote:
>>>>>>
>>>>>>> I'm wondering if NioGroovyMethods that implement the write methods
>>>>>>> for Path should do the same.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Paolo
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>>>>>>>> thinking the Javadoc would be changed from
>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>> to
>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cool!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we
>>>>>>>>>> decide we're going to change the current behavior)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right, worth it!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Good point!
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still targeting
>>>>>>>>>>>> 1.6 for the non-indy version?
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>
>>>>>>>>>>>>> You could also compare against the StandardCharsets constants, instead of
>>>>>>>>>>>>> going through the name comparison:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in
>>>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tend to think that having the BOM always is a good thing
>>>>>>>>>>>>>>> (I even thought that was mandatory), but Groovy should guess the endianness
>>>>>>>>>>>>>>> regardless anyway.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganwitt@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of
>>>>>>>>>>>>>>>> platform.  I just tested in Linux with the same results.  I think there are
>>>>>>>>>>>>>>>> 2 parts to the question of "what's the correct behavior?"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even
>>>>>>>>>>>>>>>> if the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3
>>>>>>>>>>>>>>>> to assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is
>>>>>>>>>>>>>>>> doing the smarter, more correct behavior, but the typical user would assume
>>>>>>>>>>>>>>>> this is just a shorthand convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>>>> certainly did).  So the question is, is it better to just document this
>>>>>>>>>>>>>>>> difference in the GroovyDoc?  Or to change the behavior to be closer to
>>>>>>>>>>>>>>>> Java?  And if the latter, what breakages would that cause within Groovy
>>>>>>>>>>>>>>>> itself?  Making that change could break folks in production, because they
>>>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where the file is
>>>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working with a
>>>>>>>>>>>>>>>> third party library that is more picky about the presence of a BOM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now, whether that's what should be done is the real
>>>>>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned
>>>>>>>>>>>>>>>>>> in Windows.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried
>>>>>>>>>>>>>>>>>>> here, since I'm on OS X)
>>>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>>>> encoding problems.  I was intrigued by this SO question
>>>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a
>>>>>>>>>>>>>>>>>>>> BOM whereas new PrintWriter(file, charset) does not.
>>>>>>>>>>>>>>>>>>>> As demonstrated here:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems
>>>>>>>>>>>>>>>>>>>> kinda odd to me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
So perhaps we don't generate a BOM unless asked for it specifically, and
we offer that not through new methods but through a new parameter on the
existing ones. In addition to specifying a charset, callers could pass a
boolean saying they want a BOM to be generated (false by default; it must
be set to true if a BOM is wanted)?
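
A minimal Java sketch of that idea, under stated assumptions: the class
BomOptInWriter, the method newPrintWriter and the writeBom parameter are
hypothetical names used only for illustration and exist in neither Groovy
nor the JDK. The BOM is written only when the caller asks for it, and only
for UTF-16LE or UTF-16BE, after resolving the charset name through
Charset.forName so that aliases are treated like the canonical name.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Groovy or the JDK.
final class BomOptInWriter {

    // Opens a PrintWriter on the file. A UTF-16 BOM is emitted only when
    // writeBom is true and the charset resolves to UTF-16LE or UTF-16BE.
    static PrintWriter newPrintWriter(File file, String charsetName, boolean writeBom)
            throws IOException {
        // Charset.forName resolves aliases, so name() is always canonical.
        Charset charset = Charset.forName(charsetName);
        OutputStream out = new FileOutputStream(file);
        try {
            if (writeBom) {
                String canonical = charset.name();
                if ("UTF-16LE".equals(canonical)) {
                    out.write(new byte[] {(byte) 0xFF, (byte) 0xFE});
                } else if ("UTF-16BE".equals(canonical)) {
                    out.write(new byte[] {(byte) 0xFE, (byte) 0xFF});
                }
            }
            return new PrintWriter(new OutputStreamWriter(out, charset));
        } catch (IOException | RuntimeException e) {
            // Do not leak the file handle if writing the BOM fails.
            out.close();
            throw e;
        }
    }
}
```

With writeBom left at false, this helper writes the same bytes as
new PrintWriter(file, charset), which is the point of the proposal: the
default matches Java, and the BOM becomes an explicit opt-in.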

2015-06-09 21:47 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> I get that, and I wish the JDK did the same.  But what bothers me most about
> the current state is that sometimes it's transparent and sometimes it's not,
> depending on how it was invoked.  And while we could fix the new-instance
> usage (new PrintWriter(...)) too with metaClass, that could lead to weird
> inconsistencies when Groovy is invoked from Java.
>
> I really think most users would not expect these two usages to behave
> differently.  I think most would expect the difference to be stylistic
> only.  So as much as it pains me to say this, I think it's better not to
> violate the principle of least surprise, and to stay consistent with Java's
> poor life choices across all styles of invocation.
>
> But maybe the friendlier APIs can be moved into new methods, such as new
> BomAwareWriter() / WithBomAwareWriter{}.  What do you think?  If we did
> that, I guess it'd be consistent to do the same for the readers as well.
>
> -Keegan


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I get that, and I wish the JDK did the same.  But what bothers me most about
the current state is that sometimes it's transparent and sometimes it's not,
depending on how it was invoked.  And while we could fix the new-instance
usage (new PrintWriter(...)) too with metaClass, that could lead to weird
inconsistencies when Groovy is invoked from Java.

I really think most users would not expect these two usages to behave
differently.  I think most would expect the difference to be stylistic
only.  So as much as it pains me to say this, I think it's better not to
violate the principle of least surprise, and to stay consistent with Java's
poor life choices across all styles of invocation.
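
For reference, the JDK behavior in question can be checked with a few
lines of Java. This is only an illustrative sketch (Utf16EncoderDemo and
hexOf are names made up here): it encodes a string with a given charset
and returns the bytes as hex, so the output of the explicit-endian
charsets can be compared with that of plain "UTF-16".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Illustrative only: shows which JDK UTF-16 encoders emit a BOM.
final class Utf16EncoderDemo {

    // Encodes text with the named charset and returns space-separated hex bytes.
    static String hexOf(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charsetName)) {
            w.write(text);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexOf(" ", "UTF-16LE")); // 20 00 (no BOM)
        System.out.println(hexOf(" ", "UTF-16BE")); // 00 20 (no BOM)
        System.out.println(hexOf(" ", "UTF-16"));   // fe ff 00 20 (big-endian BOM)
    }
}
```

So the JDK only writes a BOM for the endian-neutral "UTF-16" name, which
is why new PrintWriter(file, "UTF-16LE") produces 20 00 in the test above.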

But maybe the friendlier APIs can be moved into new methods, such as new
BomAwareWriter() / WithBomAwareWriter{}.  What do you think?  If we did
that, I guess it'd be consistent to do the same for the readers as well.
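
On the reader side, a rough Java sketch of what a BOM-aware reader could
look like, assuming hypothetical names (BomAwareReaderSketch and newReader
are not existing Groovy or JDK APIs). It relies on the fact that the JDK's
UTF-16LE and UTF-16BE decoders pass a leading BOM through as the character
U+FEFF, so the wrapper simply drops that first character when present.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical sketch; not the implementation used by Groovy's reader methods.
final class BomAwareReaderSketch {

    // Returns a Reader over the stream that skips one leading U+FEFF, if any.
    static Reader newReader(InputStream in, String charsetName) throws IOException {
        PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, Charset.forName(charsetName)), 1);
        int first = reader.read();
        // Push back anything that is not a BOM; end of stream needs no push-back.
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first);
        }
        return reader;
    }
}
```

A caller gets the same characters whether or not the file starts with a
BOM, which is the transparency discussed for Groovy's reader methods.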

-Keegan

On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <gl...@gmail.com>
wrote:

>
> 2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>
>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to
>> correct the Javadoc I mentioned (as well as to document the existing
>> behavior for the non-NIO methods).
>>
>> Java doesn't eat the BOM, but this is a problem Java folks are used to
>> dealing with, and it's why things like Apache Commons IO's BOMInputStream
>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>> exist.
>>
>
> That's also why I made Groovy eat the BOM too, so that it's transparent to
> our users :-)
> But it's been a long time since I worked on those parts of the
> codebase, and it's been refactored quite a bit (by Jim for example).
>
>
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>> So now, how to decide what's best? :-)
>>>
>>> Is a Java reader happy with the BOM? and eats it transparently? (I think
>>> in the past that wasn't the case but I may be wrong)
>>>
>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims
>>>> (in the Javadoc) that it will write the BOM if needed, but it doesn't,
>>>> because it uses Java's implementation rather than Groovy's
>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods use
>>>> writeUTF16BomIfRequired.
>>>>
>>>> Whichever we decide, we should be consistent.
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>> paolo.ditommaso@gmail.com> wrote:
>>>>
>>>>> I'm wondering if NioGroovyMethods that implement the write methods for
>>>>> Path should do the same.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Paolo
>>>>>
>>>>>
>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>>>>>> thinking the Javadoc would be changed from
>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>> to
>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glaforge@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>
>>>>>>>> Created GROOVY-7461
>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>
>>>>>>>
>>>>>>> Cool!
>>>>>>>
>>>>>>>
>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we decide
>>>>>>>> we're going to change the current behavior)?
>>>>>>>>
>>>>>>>
>>>>>>> Right, worth it!
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Good point!
>>>>>>>>>
>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still targeting
>>>>>>>>>> 1.6 for the non-indy version?
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well spotted!
>>>>>>>>>>>
>>>>>>>>>>> You could also compare against the StandardCharsets constants,
>>>>>>>>>>> instead of going through the name comparison:
>>>>>>>>>>>
>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>
>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> should be
>>>>>>>>>>>>
>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From Groovy's point of view (ie. when you're coding in
>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of our reader
>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM is here or
>>>>>>>>>>>>> not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tend to think that having the BOM always is a good thing (I
>>>>>>>>>>>>> even thought that was mandatory), but Groovy should guess the endianness
>>>>>>>>>>>>> regardless anyway.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of platform.
>>>>>>>>>>>>>> I just tested in Linux with the same results.  I think there are 2 parts to
>>>>>>>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if
>>>>>>>>>>>>>> the difference is to be smarter) from the behavior of *new
>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3
>>>>>>>>>>>>>> to assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>> does
>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is
>>>>>>>>>>>>>> doing the smarter, more correct behavior, but the typical user would assume
>>>>>>>>>>>>>> this is just a shorthand convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>> certainly did).  So the question is, is it better to just document this
>>>>>>>>>>>>>> difference in the GroovyDoc?  Or to change the behavior to be closer to
>>>>>>>>>>>>>> Java?  And if the latter, what breakages would that cause within Groovy
>>>>>>>>>>>>>> itself?  Making that change could break folks in production, because they
>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where the file is
>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working with a
>>>>>>>>>>>>>> third party library that is more picky about the presence of a BOM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now... whether that is what should be done is the good question
>>>>>>>>>>>>>>> to ask :-)
>>>>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned in
>>>>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried
>>>>>>>>>>>>>>>>> here, since I'm on OS X)
>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>> encoding problems.  I was intrigued by this SO question
>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.  As
>>>>>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems
>>>>>>>>>>>>>>>>>> kinda odd to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
2015-06-09 18:57 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to
> correct the JavaDoc I mentioned (as well as to document the existing
> behavior for the non-NIO methods).
>
> Java doesn't eat the BOM, but this is a problem Java folks are used to
> dealing with, and why things like Apache Commons IO's BOMInputStream
> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
> exist.
>

That's also why I made Groovy eat the BOM too, so that it's transparent to
our users :-)
But it has been a long time since I worked on those parts of the codebase,
and they have been refactored quite a bit since (by Jim, for example).
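
For anyone wondering what "eating the BOM" amounts to on the reading side,
here is a minimal Java sketch. It is not Groovy's actual implementation; the
class and method names (BomSkippingReader, open, readAll) are made up for
illustration, and it assumes the caller already knows the charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not Groovy's code).
public class BomSkippingReader {

    /** Wraps the stream in a reader and drops a leading U+FEFF if one is present. */
    public static Reader open(InputStream in, Charset charset) throws IOException {
        PushbackReader reader = new PushbackReader(new InputStreamReader(in, charset), 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM: put the character back
        }
        return reader;
    }

    /** Reads everything that is left in the reader into a String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same bytes as in the original post: with and without the UTF-16LE BOM.
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00};
        byte[] withoutBom = {0x20, 0x00};

        String a = readAll(open(new ByteArrayInputStream(withBom), StandardCharsets.UTF_16LE));
        String b = readAll(open(new ByteArrayInputStream(withoutBom), StandardCharsets.UTF_16LE));

        // Both inputs yield the same single space once the BOM is skipped.
        System.out.println(a.equals(b) + " " + a.length());
    }
}
```

With a wrapper like this, the caller sees the same text whether or not the
file starts with a BOM, which is the "transparent" behavior described above.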


>
> -Keegan
>


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to
correct the JavaDoc I mentioned (as well as to document the existing
behavior for the non-NIO methods).

Java doesn't eat the BOM, but this is a problem Java folks are used to
dealing with, and it's why things like Apache Commons IO's BOMInputStream
<https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
exist.
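
To make this concrete, here is a small Java sketch that decodes the
"ff fe 20 00" bytes from the original post. The class and method names
(Utf16BomDecodeDemo, decodeToHex) are invented for the example. It relies on
the behavior documented for java.nio.charset.Charset: the UTF-16LE decoder
keeps a leading BOM as the character U+FEFF, whereas the generic UTF-16
decoder uses the BOM to pick the byte order and consumes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class Utf16BomDecodeDemo {

    // "ff fe 20 00": the bytes withPrintWriter("UTF-16LE") produced in the original post.
    public static final byte[] BOM_AND_SPACE = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00};

    /** Decodes the bytes and renders each resulting char as four hex digits. */
    public static String decodeToHex(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        StringBuilder sb = new StringBuilder();
        for (char c : decoded.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Explicit little-endian charset: the BOM survives as U+FEFF.
        System.out.println("UTF-16LE: " + decodeToHex(BOM_AND_SPACE, StandardCharsets.UTF_16LE));
        // Generic UTF-16: the BOM selects the byte order and is dropped.
        System.out.println("UTF-16:   " + decodeToHex(BOM_AND_SPACE, StandardCharsets.UTF_16));
    }
}
```

So a Java reader opened with "UTF-16LE" hands the BOM to the caller as an
extra character, which is the case wrappers such as BOMInputStream are meant
to handle.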

-Keegan

On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <gl...@gmail.com>
wrote:

> So now, how to decide what's best? :-)
>
> Is a Java reader happy with the BOM? and eats it transparently? (I think
> in the past that wasn't the case but I may be wrong)

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
So now, how to decide what's best? :-)

Is a Java reader happy with the BOM, and does it eat it transparently? (I
think that wasn't the case in the past, but I may be wrong.)

2015-06-09 17:21 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims (in
> the JavaDoc) it will write the BOM if needed, but it doesn't, because it
> uses Java's implementation rather than Groovy's
> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods use
> writeUTF16BomIfRequired.
>
> Whichever we decide, we should be consistent.
>
> -Keegan
>
> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
> paolo.ditommaso@gmail.com> wrote:
>
>> I'm wondering if NioGroovyMethods that implement the write methods for
>> Path should do the same.
>>
>>
>> Cheers,
>> Paolo
>>
>>
>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com> wrote:
>>
>>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>>> thinking the Javadoc would be changed from
>>>     is "UTF-16BE" or "UTF-16LE"
>>> to
>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>
>>> -Keegan
>>>
>>>
>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <gl...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>
>>>>> Created GROOVY-7461
>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36
>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>
>>>>
>>>> Cool!
>>>>
>>>>
>>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>>> change the current behavior)?
>>>>>
>>>>
>>>> Right, worth it!
>>>>
>>>>
>>>>>
>>>>> -Keegan
>>>>>
>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Good point!
>>>>>>
>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>
>>>>>>> That's only available in Java 7.  Isn't Groovy still targeting 1.6
>>>>>>> for the non-indy version?
>>>>>>>
>>>>>>> -Keegan
>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Well spotted!
>>>>>>>>
>>>>>>>> You could also compare with the StandardCharset, instead of going
>>>>>>>> through the name comparison:
>>>>>>>>
>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>
>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>
>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>
>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> should be
>>>>>>>>>
>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy),
>>>>>>>>>> the BOM is automatically discarded when you use one of our reader methods
>>>>>>>>>> (withReader, etc), so it's transparent whether the BOM is here or not.
>>>>>>>>>>
>>>>>>>>>> I tend to think that having the BOM always is a good thing (I
>>>>>>>>>> even thought that was mandatory), but Groovy should guess the endianness
>>>>>>>>>> regardless anyway.
>>>>>>>>>>
>>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>>
>>>>>>>>>> Guillaume
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> The code as-is today writes the BOM regardless of platform.  I
>>>>>>>>>>> just tested in Linux with the same results.  I think there are 2 parts to
>>>>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>>>>
>>>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>>>> platform is Windows?
>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if
>>>>>>>>>>> the difference is to be smarter) from the behavior of *new
>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>
>>>>>>>>>>> *Discussion*
>>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>>>> assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>>> does
>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>>
>>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing
>>>>>>>>>>> the smarter, more correct behavior, but the typical user would assume this
>>>>>>>>>>> is just a shorthand convenience for newing up a PrintWriter (I certainly
>>>>>>>>>>> did).  So the question is, is it better to just document this difference in
>>>>>>>>>>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>>>>>>>>>>> latter, what breakages would that cause within Groovy itself?  Making that
>>>>>>>>>>> change could break folks in production, because they could rely on that BOM
>>>>>>>>>>> being there, in cases for example where the file is created on Windows, but
>>>>>>>>>>> then processed on Linux or when working with a third party library that is
>>>>>>>>>>> more picky about the presence of a BOM.
>>>>>>>>>>>
>>>>>>>>>>> -Keegan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Now... whether that is what should be done is the real question to
>>>>>>>>>>>> ask :-)
>>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned in
>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>
>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.  As
>>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda
>>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Keegan


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims (in
the Javadoc) that it will write the BOM if needed, but it doesn't, because it
uses Java's implementation rather than Groovy's writeUTF16BomIfRequired.
None of the methods in NioGroovyMethods use writeUTF16BomIfRequired.

Whichever we decide, we should be consistent.
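
As a rough illustration of the plain-Java behavior that the NIO-based
methods inherit, here is a minimal sketch.  The class and helper names are
made up for this example, and it assumes Java 8 or later (for
UncheckedIOException).  It writes a single space through
Files.newBufferedWriter and dumps the resulting bytes as hex:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Groovy or the JDK.
class NioBomCheck {

    // Writes `text` to a temp file with the given charset using only
    // java.nio, then returns the file's bytes as space-separated hex.
    static String hexOf(String charsetName, String text) {
        try {
            Path tmp = Files.createTempFile("bom-check", ".txt");
            try {
                try (Writer w = Files.newBufferedWriter(tmp, Charset.forName(charsetName))) {
                    w.write(text);
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : Files.readAllBytes(tmp)) {
                    hex.append(String.format("%02x ", b & 0xff));
                }
                return hex.toString().trim();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-16LE: " + hexOf("UTF-16LE", " ")); // 20 00 (no BOM)
        System.out.println("UTF-16BE: " + hexOf("UTF-16BE", " ")); // 00 20 (no BOM)
        System.out.println("UTF-16:   " + hexOf("UTF-16", " "));   // fe ff 00 20 (Java adds a big-endian BOM)
    }
}
```

So with the endian-specific charsets, Java itself never writes a BOM; only
the generic "UTF-16" encoder does.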

-Keegan

On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <paolo.ditommaso@gmail.com
> wrote:

> I'm wondering if NioGroovyMethods that implement the write methods for
> Path should do the same.
>
>
> Cheers,
> Paolo
>
>
> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com> wrote:
>
>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>> thinking the Javadoc would be changed from
>>     is "UTF-16BE" or "UTF-16LE"
>> to
>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>
>> -Keegan
>>
>>
>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>>
>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>>>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>>
>>>
>>> Cool!
>>>
>>>
>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>> change the current behavior)?
>>>>
>>>
>>> Right, worth it!
>>>
>>>
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Good point!
>>>>>
>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>
>>>>>> That's only available in Java 7.  Isn't Groovy still targeting 1.6
>>>>>> for the non-indy version?
>>>>>>
>>>>>> -Keegan
>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Well spotted!
>>>>>>>
>>>>>>> You could also compare with the StandardCharsets constants, instead of going
>>>>>>> through the name comparison:
>>>>>>>
>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>
>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>
>>>>>>>> No, it's a Groovy bug.
>>>>>>>>
>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> should be
>>>>>>>>
>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy),
>>>>>>>>> the BOM is automatically discarded when you use one of our reader methods
>>>>>>>>> (withReader, etc), so it's transparent whether the BOM is here or not.
>>>>>>>>>
>>>>>>>>> I tend to think that having the BOM always is a good thing (I even
>>>>>>>>> thought that was mandatory), but Groovy should guess the endianness
>>>>>>>>> regardless anyway.
>>>>>>>>>
>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>
>>>>>>>>> Guillaume
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> The code as-is today writes the BOM regardless of platform.  I
>>>>>>>>>> just tested in Linux with the same results.  I think there are 2 parts to
>>>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>>>
>>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>>> platform is Windows?
>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>>>> difference is to be smarter) from the behavior of *new
>>>>>>>>>> PrintWriter*?
>>>>>>>>>>
>>>>>>>>>> *Discussion*
>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>>> assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>>> does
>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>>
>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing
>>>>>>>>>> the smarter, more correct behavior, but the typical user would assume this
>>>>>>>>>> is just a shorthand convenience for newing up a PrintWriter (I certainly
>>>>>>>>>> did).  So the question is, is it better to just document this difference in
>>>>>>>>>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>>>>>>>>>> latter, what breakages would that cause within Groovy itself?  Making that
>>>>>>>>>> change could break folks in production, because they could rely on that BOM
>>>>>>>>>> being there, in cases for example where the file is created on Windows, but
>>>>>>>>>> then processed on Linux or when working with a third party library that is
>>>>>>>>>> more picky about the presence of a BOM.
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Now... whether that is what should be done is the real question to
>>>>>>>>>>> ask :-)
>>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned in
>>>>>>>>>>>> Windows.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.  As
>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>>> try {
>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda
>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Paolo Di Tommaso <pa...@gmail.com>.
I'm wondering if NioGroovyMethods that implement the write methods for Path
should do the same.
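
To make the question concrete, here is a rough sketch of what "doing the
same" for a Path might look like.  It is only an illustration: the class
and method names are hypothetical and not part of NioGroovyMethods, the
canonical-name check is borrowed from the fix discussed in this thread, and
it assumes Java 8 or later.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the actual Groovy implementation.
class PathBomSketch {

    // Writes a UTF-16 BOM first when the charset resolves to UTF-16LE or
    // UTF-16BE, then writes the encoded text to the same stream.
    static void writeWithBom(Path path, String charsetName, String text) {
        Charset cs = Charset.forName(charsetName);
        try (OutputStream out = Files.newOutputStream(path)) {
            if ("UTF-16LE".equals(cs.name())) {
                out.write(0xFF);
                out.write(0xFE);
            } else if ("UTF-16BE".equals(cs.name())) {
                out.write(0xFE);
                out.write(0xFF);
            }
            Writer w = new BufferedWriter(new OutputStreamWriter(out, cs));
            w.write(text);
            w.flush(); // push buffered characters out before the stream closes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a single space to a temp file and returns its bytes as hex.
    static String demo(String charsetName) {
        try {
            Path tmp = Files.createTempFile("path-bom", ".txt");
            try {
                writeWithBom(tmp, charsetName, " ");
                StringBuilder hex = new StringBuilder();
                for (byte b : Files.readAllBytes(tmp)) {
                    hex.append(String.format("%02x ", b & 0xff));
                }
                return hex.toString().trim();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("UTF-16LE")); // ff fe 20 00
        System.out.println(demo("UTF-16BE")); // fe ff 00 20
        System.out.println(demo("UTF-8"));    // 20
    }
}
```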


Cheers,
Paolo


On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <ke...@gmail.com> wrote:

> Cool.  I'll wait for PR 36 to be merged first, because I also was thinking
> the Javadoc would be changed from
>     is "UTF-16BE" or "UTF-16LE"
> to
>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>
> -Keegan
>
>
> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <gl...@gmail.com>
> wrote:
>
>>
>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>
>>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>
>>
>> Cool!
>>
>>
>>> How would you feel about a PR to copy the Javadoc comment mentioning the
>>> UTF-16 BOM on File.newWriter to all the other methods that use
>>> writeUTF16BomIfRequired (at least until we decide we're going to change
>>> the current behavior)?
>>>
>>
>> Right, worth it!
>>
>>
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com>
>>> wrote:
>>>
>>>> Good point!
>>>>
>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>
>>>>> That's only available in Java 7.  Isn't Groovy still targeting 1.6 for
>>>>> the non-indy version?
>>>>>
>>>>> -Keegan
>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Well spotted!
>>>>>>
>>>>>> You could also compare with the StandardCharsets constants, instead of going
>>>>>> through the name comparison:
>>>>>>
>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>
>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>
>>>>>>> No, it's a Groovy bug.
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> should be
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>
>>>>>>> -Keegan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>
>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy), the
>>>>>>>> BOM is automatically discarded when you use one of our reader methods
>>>>>>>> (withReader, etc), so it's transparent whether the BOM is here or not.
>>>>>>>>
>>>>>>>> I tend to think that having the BOM always is a good thing (I even
>>>>>>>> thought that was mandatory), but Groovy should guess the endianness
>>>>>>>> regardless anyway.
>>>>>>>>
>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>
>>>>>>>> Guillaume
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>
>>>>>>>>> The code as-is today writes the BOM regardless of platform.  I
>>>>>>>>> just tested in Linux with the same results.  I think there are 2 parts to
>>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>>
>>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>>> platform is Windows?
>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>>> difference is to be smarter) from the behavior of *new
>>>>>>>>> PrintWriter*?
>>>>>>>>>
>>>>>>>>> *Discussion*
>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>> assume big endian if there is no BOM.  However, in practice, many
>>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>>> does
>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>>
>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing
>>>>>>>>> the smarter, more correct behavior, but the typical user would assume this
>>>>>>>>> is just a shorthand convenience for newing up a PrintWriter (I certainly
>>>>>>>>> did).  So the question is, is it better to just document this difference in
>>>>>>>>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>>>>>>>>> latter, what breakages would that cause within Groovy itself?  Making that
>>>>>>>>> change could break folks in production, because they could rely on that BOM
>>>>>>>>> being there, in cases for example where the file is created on Windows, but
>>>>>>>>> then processed on Linux or when working with a third party library that is
>>>>>>>>> more picky about the presence of a BOM.
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Now... whether that is what should be done is the real question to
>>>>>>>>>> ask :-)
>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>
>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned in
>>>>>>>>>>> Windows.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.  As
>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>> try {
>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>
>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda
>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Keegan

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
Cool.  I'll wait for PR 36 to be merged first, because I also was thinking
the Javadoc would be changed from
    is "UTF-16BE" or "UTF-16LE"
to
    is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
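
For anyone wondering what "an equivalent alias" means in practice, here is
a small sketch.  The class and helper names are made up for illustration;
the helper just mirrors the canonical-name comparison proposed for
writeUTF16BomIfRequired.  Rather than hardcoding alias names, it asks the
running JVM which aliases it knows for UTF-16LE, since the exact set can
vary between JVMs:

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Groovy.
class AliasCheck {

    // Resolves the given name (or alias) to a Charset first, then compares
    // the canonical name instead of the raw string the caller passed in.
    static boolean isUtf16Le(String charsetName) {
        return "UTF-16LE".equals(Charset.forName(charsetName).name());
    }

    public static void main(String[] args) {
        // List every alias this JVM knows for UTF-16LE and show that a raw
        // string comparison misses them while the resolved one matches.
        for (String alias : Charset.forName("UTF-16LE").aliases()) {
            System.out.println(alias
                    + ": raw equals=" + "UTF-16LE".equals(alias)
                    + ", resolved=" + isUtf16Le(alias));
        }
    }
}
```

Note that Charset.forName throws an unchecked exception for names the JVM
does not recognize, so a caller that accepts arbitrary strings may want to
handle that case.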

-Keegan


On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <gl...@gmail.com>
wrote:

>
> 2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>
>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>
>
> Cool!
>
>
>> How would you feel about a PR to copy the Javadoc comment mentioning the
>> UTF-16 BOM on File.newWriter to all the other methods that use
>> writeUTF16BomIfRequired (at least until we decide we're going to change
>> the current behavior)?
>>
>
> Right, worth it!
>
>
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com>
>> wrote:
>>
>>> Good point!
>>>
>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>
>>>> That's only available in Java 7.  Isn't Groovy still targeting 1.6 for
>>>> the non-indy version?
>>>>
>>>> -Keegan
>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com> wrote:
>>>>
>>>>> Well spotted!
>>>>>
>>>>> You could also compare with the StandardCharsets constants, instead of going
>>>>> through the name comparison:
>>>>>
>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>
>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>
>>>>>> No, it's a Groovy bug.
>>>>>>
>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>         writeUtf16Bom(stream, true);
>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>         writeUtf16Bom(stream, false);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> should be
>>>>>>
>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>         writeUtf16Bom(stream, true);
>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>         writeUtf16Bom(stream, false);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll probably
>>>>>> want to fix that regardless of what we decide on the
>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glaforge@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> From Groovy's point of view (ie. when you're coding in Groovy), the
>>>>>>> BOM is automatically discarded when you use one of our reader methods
>>>>>>> (withReader, etc), so it's transparent whether the BOM is here or not.
>>>>>>>
>>>>>>> I tend to think that having the BOM always is a good thing (I even
>>>>>>> thought that was mandatory), but Groovy should guess the endianness
>>>>>>> regardless anyway.
>>>>>>>
>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>
>>>>>>> Guillaume
>>>>>>>
>>>>>>>
>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>
>>>>>>>> The code as-is today writes the BOM regardless of platform.  I just
>>>>>>>> tested in Linux with the same results.  I think there are 2 parts to the
>>>>>>>> question of "what's the correct behavior?"
>>>>>>>>
>>>>>>>> 1.  Should the BOM be written at all, particularly when the
>>>>>>>> platform is Windows?
>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>> difference is to be smarter) from the behavior of *new PrintWriter*
>>>>>>>> ?
>>>>>>>>
>>>>>>>> *Discussion*
>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>> assume big endian if there is no BOM.  However, in practice, many
>>>>>>>> applications disregard the RFC and assume little-endian because that's what Windows
>>>>>>>> does
>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>
>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing the
>>>>>>>> smarter, more correct behavior, but the typical user would assume this is
>>>>>>>> just a shorthand convenience for newing up a PrintWriter (I certainly
>>>>>>>> did).  So the question is, is it better to just document this difference in
>>>>>>>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>>>>>>>> latter, what breakages would that cause within Groovy itself?  Making that
>>>>>>>> change could break folks in production, because they could rely on that BOM
>>>>>>>> being there, in cases for example where the file is created on Windows, but
>>>>>>>> then processed on Linux or when working with a third party library that is
>>>>>>>> more picky about the presence of a BOM.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Now... is it what should be done or not is the good question to
>>>>>>>>> ask :-)
>>>>>>>>> Does Windows manages to open UTF-16 files without BOMs?
>>>>>>>>>
>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned in
>>>>>>>>>> Windows.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> That's a good question.
>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>
>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.  As
>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>
>>>>>>>>>>>> File file = new File("tmp.txt")try {
>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>
>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>
>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }} finally {
>>>>>>>>>>>>     file.delete()}
>>>>>>>>>>>>
>>>>>>>>>>>> Outputs
>>>>>>>>>>>>
>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>
>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>> 20 00
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda odd
>>>>>>>>>>>> to me.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Laforge
>>>>>>>>> Groovy Project Manager
>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>
>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Laforge
>>>>>>> Groovy Project Manager
>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>
>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>
>
> --
> Guillaume Laforge
> Groovy Project Manager
> Product Ninja & Advocate at Restlet <http://restlet.com>
>
> Blog: http://glaforge.appspot.com/
> Social: @glaforge <http://twitter.com/glaforge> / Google+
> <https://plus.google.com/u/0/114130972232398734985/posts>
>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
2015-06-09 15:04 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>

Cool!


> How would you feel about a PR to copy the Javadoc comment mentioning the
> UTF-16 BOM on File.newWriter to all the other methods that use
> writeUTF16BomIfRequired (at least until we decide we're going to change
> the current behavior)?
>

Right, worth it!




-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461> and PR
36 <https://github.com/apache/incubator-groovy/pull/36>.

How would you feel about a PR to copy the Javadoc comment mentioning the
UTF-16 BOM on File.newWriter to all the other methods that use
writeUTF16BomIfRequired (at least until we decide we're going to change the
current behavior)?

-Keegan
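
For readers following along, the BOM behavior under discussion can be reproduced in plain Java. This is only a sketch under stated assumptions: the class Utf16BomDemo and its utf16Bom, encode and hex helpers are hypothetical names made up for illustration, and nothing here is copied from ResourceGroovyMethods. It relies only on the fact that Java's UTF-16LE and UTF-16BE encoders do not emit a BOM themselves.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class Utf16BomDemo {

    // Hypothetical helper: the two bytes of U+FEFF in the requested byte order
    // (FE FF for big-endian, FF FE for little-endian).
    static byte[] utf16Bom(boolean bigEndian) {
        return bigEndian
                ? new byte[] {(byte) 0xFE, (byte) 0xFF}
                : new byte[] {(byte) 0xFF, (byte) 0xFE};
    }

    // Encodes text as UTF-16BE or UTF-16LE, optionally preceded by a BOM,
    // and returns the bytes as space-separated lowercase hex.
    static String encode(String text, String charsetName, boolean withBom) {
        Charset charset = Charset.forName(charsetName); // resolves aliases too
        String canonical = charset.name();
        if (!"UTF-16BE".equals(canonical) && !"UTF-16LE".equals(canonical)) {
            throw new IllegalArgumentException("expected UTF-16BE or UTF-16LE: " + charsetName);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (withBom) {
            byte[] mark = utf16Bom("UTF-16BE".equals(canonical));
            out.write(mark, 0, mark.length);
        }
        byte[] body = text.getBytes(charset); // the encoder itself adds no BOM
        out.write(body, 0, body.length);
        return hex(out.toByteArray());
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("with BOM:    " + encode(" ", "UTF-16LE", true));
        System.out.println("without BOM: " + encode(" ", "UTF-16LE", false));
    }
}
```

For a single space in UTF-16LE, encode(" ", "UTF-16LE", true) gives ff fe 20 00 and encode(" ", "UTF-16LE", false) gives 20 00, which matches the two outputs reported at the start of this thread for withPrintWriter and new PrintWriter.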

On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <gl...@gmail.com>
wrote:

> Good point!

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
Good point!

2015-06-09 14:11 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> That's only available in Java 7.  Isn't Groovy still targeting 1.6 for the
> non-indy version?
>
> -Keegan


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
StandardCharsets is only available from Java 7 onward.  Isn't Groovy still
targeting Java 1.6 for the non-indy version?

-Keegan
On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <gl...@gmail.com> wrote:

> Well spotted!
>
> You could also compare with the StandardCharsets constants, instead of
> going through the name comparison:
>
> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
Well spotted!

You could also compare with the StandardCharsets constants, instead of going
through the name comparison:
http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
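
A rough sketch of what that comparison might look like, assuming Java 7 or
later is available.  The class name BomDecision, the Bom enum and the bomFor
method are made up for illustration and are not part of Groovy:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDecision {
    /** Which byte-order mark, if any, would be written for a charset. */
    enum Bom { NONE, BIG_ENDIAN, LITTLE_ENDIAN }

    /**
     * Resolves the name (or alias) to a Charset and compares the object
     * against the StandardCharsets constants instead of comparing strings.
     */
    static Bom bomFor(String charsetName) {
        // Throws an unchecked exception for illegal or unsupported names.
        Charset cs = Charset.forName(charsetName);
        if (StandardCharsets.UTF_16BE.equals(cs)) {
            return Bom.BIG_ENDIAN;
        }
        if (StandardCharsets.UTF_16LE.equals(cs)) {
            return Bom.LITTLE_ENDIAN;
        }
        return Bom.NONE;
    }

    public static void main(String[] args) {
        System.out.println(bomFor("UTF-16LE"));
        System.out.println(bomFor("UTF_16LE")); // alias resolves to the same Charset
        System.out.println(bomFor("UTF-8"));
    }
}
```

Since Charset.forName resolves aliases before the comparison, an alias and
the canonical name should end up in the same branch.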

2015-06-09 13:49 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> No, it's a Groovy bug.
>
> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>     if ("UTF-16BE".equals(charset)) {
>         writeUtf16Bom(stream, true);
>     } else if ("UTF-16LE".equals(charset)) {
>         writeUtf16Bom(stream, false);
>     }
> }
>
> should be
>
> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>         writeUtf16Bom(stream, true);
>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>         writeUtf16Bom(stream, false);
>     }
> }
>
> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll probably want
> to fix that regardless of what we decide on the *withPrintWriter*
> question.  I'll open a Jira and a PR.
>
> -Keegan

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
No, it's a Groovy bug.

// Current code: compares the raw name, so an alias such as "UTF_16LE" never matches.
private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
    if ("UTF-16BE".equals(charset)) {
        writeUtf16Bom(stream, true);
    } else if ("UTF-16LE".equals(charset)) {
        writeUtf16Bom(stream, false);
    }
}

should be

// Proposed fix: resolve aliases to the canonical charset name before comparing.
private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
    final String canonicalName = Charset.forName(charset).name();
    if ("UTF-16BE".equals(canonicalName)) {
        writeUtf16Bom(stream, true);
    } else if ("UTF-16LE".equals(canonicalName)) {
        writeUtf16Bom(stream, false);
    }
}

in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll probably want
to fix that regardless of what we decide on the *withPrintWriter*
question.  I'll open a Jira and a PR.
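
To show why the alias slips through, here is a small standalone sketch.  The
class and method names are hypothetical and only Charset.forName and
Charset.name come from the JDK:

```java
import java.nio.charset.Charset;

public class CharsetAliasDemo {
    /** Resolves any registered alias to the charset's canonical name. */
    static String canonicalName(String charsetName) {
        // Throws an unchecked exception for illegal or unsupported names.
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        String alias = "UTF_16LE";
        // Plain string comparison: the alias does not match the canonical name.
        System.out.println("UTF-16LE".equals(alias));
        // After resolving through Charset.forName the comparison succeeds.
        System.out.println("UTF-16LE".equals(canonicalName(alias)));
    }
}
```

Charset names are also case-insensitive on lookup, so "utf-16le" would
presumably hit the same problem with a plain equals() check.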

-Keegan


On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <gl...@gmail.com>
wrote:

> From Groovy's point of view (ie. when you're coding in Groovy), the BOM is
> automatically discarded when you use one of our reader methods (withReader,
> etc), so it's transparent whether the BOM is here or not.
>
> I tend to think that having the BOM always is a good thing (I even thought
> that was mandatory), but Groovy should guess the endianness regardless
> anyway.
>
> Happy to hear what others think too about all this though.
>
> Guillaume

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
From Groovy's point of view (i.e. when you're coding in Groovy), the BOM is
automatically discarded when you use one of our reader methods (withReader,
etc.), so it's transparent whether the BOM is present or not.
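
For comparison, a sketch of how the plain JDK decoders treat the same bytes.
This is not Groovy's reader code, only an illustration of the stock charsets,
and the class and method names are made up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16ReadDemo {
    /** Decodes raw bytes with the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // LE BOM + space
        byte[] noBom = {0x20, 0x00};                             // space, LE, no BOM

        // "UTF-16" reads the BOM, picks little-endian and drops the mark.
        System.out.println(decode(withBom, StandardCharsets.UTF_16).length());
        // "UTF-16LE" keeps the BOM as a leading U+FEFF character.
        System.out.println(decode(withBom, StandardCharsets.UTF_16LE).length());
        // Without a BOM, "UTF-16" assumes big-endian, so 20 00 becomes U+2000.
        System.out.println((int) decode(noBom, StandardCharsets.UTF_16).charAt(0));
    }
}
```

So a reader that opens the file as UTF-16LE would need to skip a leading
U+FEFF itself, which is the kind of thing a helper method can hide.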

I tend to think that always having the BOM is a good thing (I even thought
it was mandatory), but Groovy should guess the endianness regardless
anyway.

Happy to hear what others think too about all this though.

Guillaume


2015-06-08 23:20 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> The code as-is today writes the BOM regardless of platform.  I just tested
> in Linux with the same results.  I think there are 2 parts to the question
> of "what's the correct behavior?"
>
> 1.  Should the BOM be written at all, particularly when the platform is
> Windows?
> 2.  Should the behavior of *withPrintWriter* differ (even if the
> difference is to be smarter) from the behavior of *new PrintWriter*?
>
> *Discussion*
> 1.  Strictly speaking, yes.  Because RFC 2781
> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to assume big
> endian if there is no BOM.  However, in practice, many applications
> disregard the RFC and assume little-endian because that's what Windows
> does
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
> Because of this, the behavior could be changed so that when writing
> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
> best practice to always write a BOM when working with UTF-16, and Java
> should have done this in their implementation of their PrintWriter.
>
> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing the
> smarter, more correct behavior, but the typical user would assume this is
> just a shorthand convenience for newing up a PrintWriter (I certainly
> did).  So the question is, is it better to just document this difference in
> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
> latter, what breakages would that cause within Groovy itself?  Making that
> change could break folks in production, because they could rely on that BOM
> being there, in cases for example where the file is created on Windows, but
> then processed on Linux or when working with a third party library that is
> more picky about the presence of a BOM.
>
> -Keegan

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
The code as-is today writes the BOM regardless of platform.  I just tested
in Linux with the same results.  I think there are 2 parts to the question
of "what's the correct behavior?"

1.  Should the BOM be written at all, particularly when the platform is
Windows?
2.  Should the behavior of *withPrintWriter* differ (even if the difference
is to be smarter) from the behavior of *new PrintWriter*?

*Discussion*
1.  Strictly speaking, yes, because RFC 2781
<http://tools.ietf.org/html/rfc2781> says in section 4.3 to assume
big-endian if there is no BOM.  However, in practice, many applications
disregard the RFC and assume little-endian because that's what Windows does
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
Because of this, the behavior could be changed so that when writing
UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
best practice to always write a BOM when working with UTF-16, and Java
should have done this in their implementation of their PrintWriter.

2.  This is a tough one.  Arguably, *withPrintWriter* is doing the smarter,
more correct behavior, but the typical user would assume this is just a
shorthand convenience for newing up a PrintWriter (I certainly did).  So
the question is, is it better to just document this difference in the
GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
latter, what breakages would that cause within Groovy itself?  Making that
change could break folks in production, because they could rely on that BOM
being there, in cases for example where the file is created on Windows, but
then processed on Linux or when working with a third party library that is
more picky about the presence of a BOM.
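
For anyone who wants to poke at the writing side outside of Groovy, here is a
minimal Java sketch.  The class and helper names are made up, and the
explicit BOM write only imitates what withPrintWriter appears to do; it is
not Groovy's code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class Utf16WriteDemo {
    /** Formats bytes as space-separated lowercase hex, e.g. "ff fe 20 00". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Writes text through a PrintWriter using the named charset. */
    static byte[] encode(String text, String charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter w = new PrintWriter(new OutputStreamWriter(out, charset));
        w.print(text);
        w.close();
        return out.toByteArray();
    }

    /** Writes a little-endian BOM (FF FE) by hand, then the UTF-16LE text. */
    static byte[] encodeLittleEndianWithBom(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFF);
        out.write(0xFE);
        out.write(encode(text, "UTF-16LE"));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(encode(" ", "UTF-16LE")));          // no BOM
        System.out.println(hex(encodeLittleEndianWithBom(" ")));   // BOM written by hand
        System.out.println(hex(encode(" ", "UTF-16")));            // JDK adds a big-endian BOM
    }
}
```

Note that the JDK's plain "UTF-16" charset does write a BOM (big-endian) on
its own; it is only the explicit LE/BE variants that leave it out.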

-Keegan

On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <gl...@gmail.com>
wrote:

> Now... is it what should be done or not is the good question to ask :-)
> Does Windows manages to open UTF-16 files without BOMs?

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
Now... whether that is what should be done is the real question :-)
Does Windows manage to open UTF-16 files without BOMs?

2015-06-08 22:17 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> I forgot to mention that.  Yes, I ran the test mentioned in Windows.
>
> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <gl...@gmail.com>
> wrote:
>
>> That's a good question.
>> I guess this is happening on Windows? (I haven't tried here, since I'm on
>> OS X)
>> I think BOMs were mandatory in text files on Windows.
>>
>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>>
>>> I've always taken a perverse pleasure in character encoding problems.  I
>>> was intrigued by this SO question
>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>> UTF-16 BOMs in Java vs Groovy.
>>>
>>> It appears using withPrintWriter(charset) produces a BOM whereas new
>>> PrintWriter(file, charset) does not.  As demonstrated here:
>>>
>>> File file = new File("tmp.txt")
>>> try {
>>>     String text = " "
>>>     String charset = "UTF-16LE"
>>>
>>>     file.withPrintWriter(charset) { it << text }
>>>     println "withPrintWriter"
>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>
>>>     PrintWriter w = new PrintWriter(file, charset)
>>>     w.print(text)
>>>     w.close()
>>>     println "\n\nnew PrintWriter"
>>>     file.getBytes().each { System.out.format("%02x ", it) }} finally {
>>>     file.delete()}
>>>
>>> Outputs
>>>
>>> withPrintWriter
>>> ff fe 20 00
>>>
>>> new PrintWriter
>>> 20 00
>>>
>>>
>>> Is this difference in behavior intentional?  It seems kinda odd to me.
>>>
>>> -Keegan
>>>


-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Keegan Witt <ke...@gmail.com>.
I forgot to mention that.  Yes, I ran the test mentioned on Windows.

On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <gl...@gmail.com>
wrote:

> That's a good question.
> I guess this is happening on Windows? (I haven't tried here, since I'm on
> OS X)
> I think BOMs were mandatory in text files on Windows.
>
> 2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:
>
>> I've always taken a perverse pleasure in character encoding problems.  I
>> was intrigued by this SO question
>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>> UTF 16 BOMs in Java vs Groovy.
>>
>> It appears using withPrintWriter(charset) produces a BOM whereas new
>> PrintWriter(file, charset) does not.  As demonstrated here:
>>
>> File file = new File("tmp.txt")
>> try {
>>     String text = " "
>>     String charset = "UTF-16LE"
>>
>>     file.withPrintWriter(charset) { it << text }
>>     println "withPrintWriter"
>>     file.getBytes().each { System.out.format("%02x ", it) }
>>
>>     PrintWriter w = new PrintWriter(file, charset)
>>     w.print(text)
>>     w.close()
>>     println "\n\nnew PrintWriter"
>>     file.getBytes().each { System.out.format("%02x ", it) }
>> } finally {
>>     file.delete()
>> }
>>
>> Outputs
>>
>> withPrintWriter
>> ff fe 20 00
>>
>> new PrintWriter
>> 20 00
>>
>>
>> Is this difference in behavior intentional?  It seems kinda odd to me.
>>
>> -Keegan
>>

Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()

Posted by Guillaume Laforge <gl...@gmail.com>.
That's a good question.
I guess this is happening on Windows? (I haven't tried here, since I'm on
OS X)
I think BOMs were mandatory in text files on Windows.
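
For reference, the Java half of the comparison can be reproduced without Groovy. The sketch below is only an illustration under my own assumptions: the helper name writeAndDump and the temp-file prefix are invented here, and the explicit '\uFEFF' write is one possible way to get a BOM out of a plain PrintWriter, not a description of what withPrintWriter does internally.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

final class Utf16BomWriteDemo {
    /** Writes text to a temp file through a PrintWriter and returns the file bytes as hex. */
    static String writeAndDump(String text, String charset, boolean explicitBom) throws IOException {
        File file = File.createTempFile("bom-demo", ".txt");
        try {
            try (PrintWriter w = new PrintWriter(file, charset)) {
                if (explicitBom) {
                    w.print('\uFEFF'); // encoded by the charset like any other character
                }
                w.print(text);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : Files.readAllBytes(file.toPath())) {
                if (hex.length() > 0) hex.append(' ');
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        // Plain Java, explicit byte order: no BOM is written.
        System.out.println(writeAndDump(" ", "UTF-16LE", false)); // 20 00
        // Writing U+FEFF first yields the same bytes the Groovy method produced.
        System.out.println(writeAndDump(" ", "UTF-16LE", true));  // ff fe 20 00
        // The generic "UTF-16" charset writes a big-endian BOM on its own.
        System.out.println(writeAndDump(" ", "UTF-16", false));   // fe ff 00 20
    }
}
```

The first line matches the "new PrintWriter" output from the original test and the second matches the "withPrintWriter" output, so the difference comes down to whether something writes U+FEFF ahead of the text.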

2015-06-08 17:53 GMT+02:00 Keegan Witt <ke...@gmail.com>:

> I've always taken a perverse pleasure in character encoding problems.  I
> was intrigued by this SO question
> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
> UTF 16 BOMs in Java vs Groovy.
>
> It appears using withPrintWriter(charset) produces a BOM whereas new
> PrintWriter(file, charset) does not.  As demonstrated here:
>
> File file = new File("tmp.txt")
> try {
>     String text = " "
>     String charset = "UTF-16LE"
>
>     file.withPrintWriter(charset) { it << text }
>     println "withPrintWriter"
>     file.getBytes().each { System.out.format("%02x ", it) }
>
>     PrintWriter w = new PrintWriter(file, charset)
>     w.print(text)
>     w.close()
>     println "\n\nnew PrintWriter"
>     file.getBytes().each { System.out.format("%02x ", it) }
> } finally {
>     file.delete()
> }
>
> Outputs
>
> withPrintWriter
> ff fe 20 00
>
> new PrintWriter
> 20 00
>
>
> Is this difference in behavior intentional?  It seems kinda odd to me.
>
> -Keegan
>



-- 
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+
<https://plus.google.com/u/0/114130972232398734985/posts>