Posted to dev@hc.apache.org by sebb <se...@gmail.com> on 2013/07/08 12:10:08 UTC

UnicodeLittleUnmarked or UTF-16LE in NTLM code?

The NTLM code uses the charset UnicodeLittleUnmarked a lot.

The official page:

http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html

says they are the same, but different APIs use a different canonical name.

I assume the methods will therefore take either.

Might be worth changing to the slightly shorter - but more obviously
16 bit - name?

In any case, extracting as a constant and documenting the choice would
be a good idea.
Especially since the code also uses US-ASCII or ASCII sometimes (why?)
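
A minimal sketch of the "extract as a constant" idea, for illustration only. The class name NtlmCharsets, the constant UNICODE_LE and the helper encode() are hypothetical and not taken from the NTLM code. It assumes the JDK resolves "UnicodeLittleUnmarked" as an alias of "UTF-16LE", which is what the encoding page linked above describes:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

final class NtlmCharsets {

    /**
     * Hypothetical shared constant: little-endian UTF-16 without a byte order mark.
     * "UTF-16LE" is the java.nio canonical name; "UnicodeLittleUnmarked" is the
     * older java.io / java.lang name for the same encoding.
     */
    static final String UNICODE_LE = "UTF-16LE";

    /** Encodes by charset name; this String#getBytes overload exists on Java 1.5. */
    static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Both names should resolve to the same Charset object identity-wise or by name.
        Charset legacy = Charset.forName("UnicodeLittleUnmarked");
        Charset modern = Charset.forName(UNICODE_LE);
        System.out.println("same charset: " + legacy.equals(modern));

        // Both names should also produce identical bytes.
        byte[] a = encode("NTLM", "UnicodeLittleUnmarked");
        byte[] b = encode("NTLM", UNICODE_LE);
        System.out.println("same bytes: " + Arrays.equals(a, b));
    }
}
```

With one documented constant, the choice of name is made in a single place and the two-bytes-per-character layout is stated next to it.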

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by sebb <se...@gmail.com>.

On 8 July 2013 11:10, sebb <se...@gmail.com> wrote:
> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
>
> The official page:
>
> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
>
> says they are the same, but different APIs use a different canonical name.
>
> I assume the methods will therefore take either.
>
> Might be worth changing to the slightly shorter - but more obviously
> 16 bit - name?
>
> In any case, extracting as a constant and documenting the choice would
> be a good idea.
> Especially since the code also uses US-ASCII or ASCII sometimes (why?)

I've just been looking at http://davenport.sourceforge.net/ntlm.html
and this says that certain fields always use OEM encoding.
This is documented as being " the local machine's native character set
(DOS codepage)", however the code seems to use ASCII (or US-ASCII) for
this.
That seems wrong - although ASCII is likely to be a subset of the
default encoding, this is not 100% guaranteed.
If the code does make this assumption, I think it should be documented.
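
To illustrate the concern, here is a small sketch of what happens when a non-ASCII value is pushed through US-ASCII. The class OemEncodingDemo, its helpers and the sample host name are made up for this example and do not come from the NTLM code:

```java
import java.io.UnsupportedEncodingException;

final class OemEncodingDemo {

    /** Encodes by charset name, wrapping the checked exception. */
    static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unsupported charset: " + charsetName, e);
        }
    }

    /** True if every character is in the 7-bit ASCII range. */
    static boolean isPureAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String host = "B\u00DCRO-PC"; // hypothetical workstation name containing U-umlaut

        // String#getBytes substitutes the charset's replacement byte ('?' for
        // US-ASCII) for characters it cannot map, so the original value is lost.
        byte[] ascii = encode(host, "US-ASCII");
        System.out.println("second byte as char: " + (char) ascii[1]);
        System.out.println("safe to send as ASCII: " + isPureAscii(host));
    }
}
```

A check like isPureAscii() is one way the assumption could be made explicit if the code keeps using ASCII for the OEM fields.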

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by Gary Gregory <ga...@gmail.com>.

On Jul 8, 2013, at 6:30, Oleg Kalnichevski <ol...@apache.org> wrote:

> On Mon, 2013-07-08 at 11:10 +0100, sebb wrote:
>> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
>>
>> The official page:
>>
>> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
>>
>> says they are the same, but different APIs use a different canonical name.
>>
>> I assume the methods will therefore take either.
>>
>> Might be worth changing to the slightly shorter - but more obviously
>> 16 bit - name?
>>
>> In any case, extracting as a constant and documenting the choice would
>> be a good idea.
>> Especially since the code also uses US-ASCII or ASCII sometimes (why?)
>
> Sebastian,
>
> I would like to propose to move to Java 1.6 at some point (rather sooner
> than later). One of the reasons to make this move is to be able to use
> the Charset variant of String#getBytes() method and to clean up the use of
> various charsets throughout the code base, not just NTLM code.
>
> Maintaining Java 1.5 compatibility has been getting increasingly
> difficult and increasingly pointless. The question is whether this is
> too late for 4.3 or not.

I would move to Java 6 now, a major release is as good a time as any
and gives us the opportunity for changes that are easier to make than
in a minor release. I would not want to be stuck with supporting Java
5 until the next major release.

Gary

>
> Oleg

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by Oleg Kalnichevski <ol...@apache.org>.

On Mon, 2013-07-08 at 11:35 +0100, sebb wrote:
> On 8 July 2013 11:29, Oleg Kalnichevski <ol...@apache.org> wrote:
> > On Mon, 2013-07-08 at 11:10 +0100, sebb wrote:
> >> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
> >>
> >> The official page:
> >>
> >> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
> >>
> >> says they are the same, but different APIs use a different canonical name.
> >>
> >> I assume the methods will therefore take either.
> >>
> >> Might be worth changing to the slightly shorter - but more obviously
> >> 16 bit - name?
> >>
> >> In any case, extracting as a constant and documenting the choice would
> >> be a good idea.
> >> Especially since the code also uses US-ASCII or ASCII sometimes (why?)
> >>
> >
> > Sebastian,
> >
> > I would like to propose to move to Java 1.6 at some point (rather sooner
> > than later). One of the reasons to make this move is to be able to use
> > the Charset variant of String#getBytes() method
> 
> Yes, that's definitely easier.
> 
> > and to clean up the use of
> > various charsets throughout the code base, not just NTLM code.
> 
> Not sure that requires Java 1.6.
> 
> > Maintaining Java 1.5 compatibility has been getting increasingly
> > difficult and increasingly pointless. The question is whether this is
> > too late for 4.3 or not.
> 
> There's still quite a lot of Java 1.5 out there, so I would suggest
> holding off requiring 1.6 until after 4.3.
> 
> There are a lot of useful fixes etc in 4.3, so why not make them
> available to people still stuck on Java 5?
> 

All right. Fair enough. Let's discuss the move post 4.3

Oleg

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by Gary Gregory <ga...@gmail.com>.

On Jul 8, 2013, at 6:36, sebb <se...@gmail.com> wrote:

> On 8 July 2013 11:29, Oleg Kalnichevski <ol...@apache.org> wrote:
>> On Mon, 2013-07-08 at 11:10 +0100, sebb wrote:
>>> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
>>>
>>> The official page:
>>>
>>> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
>>>
>>> says they are the same, but different APIs use a different canonical name.
>>>
>>> I assume the methods will therefore take either.
>>>
>>> Might be worth changing to the slightly shorter - but more obviously
>>> 16 bit - name?
>>>
>>> In any case, extracting as a constant and documenting the choice would
>>> be a good idea.
>>> Especially since the code also uses US-ASCII or ASCII sometimes (why?)
>>
>> Sebastian,
>>
>> I would like to propose to move to Java 1.6 at some point (rather sooner
>> than later). One of the reasons to make this move is to be able to use
>> the Charset variant of String#getBytes() method
>
> Yes, that's definitely easier.
>
>> and to clean up the use of
>> various charsets throughout the code base, not just NTLM code.
>
> Not sure that requires Java 1.6.
>
>> Maintaining Java 1.5 compatibility has been getting increasingly
>> difficult and increasingly pointless. The question is whether this is
>> too late for 4.3 or not.
>
> There's still quite a lot of Java 1.5 out there,

According to what metric?

Gary
> so I would suggest
> holding off requiring 1.6 until after 4.3.
>
> There are a lot of useful fixes etc in 4.3, so why not make them
> available to people still stuck on Java 5?
>
> Also conversion to Java 1.6 requires lots of changes to @Override.
> It's more work than might at first appear.
>
>> Oleg

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by sebb <se...@gmail.com>.

On 8 July 2013 11:29, Oleg Kalnichevski <ol...@apache.org> wrote:
> On Mon, 2013-07-08 at 11:10 +0100, sebb wrote:
>> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
>>
>> The official page:
>>
>> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
>>
>> says they are the same, but different APIs use a different canonical name.
>>
>> I assume the methods will therefore take either.
>>
>> Might be worth changing to the slightly shorter - but more obviously
>> 16 bit - name?
>>
>> In any case, extracting as a constant and documenting the choice would
>> be a good idea.
>> Especially since the code also uses US-ASCII or ASCII sometimes (why?)
>>
>
> Sebastian,
>
> I would like to propose to move to Java 1.6 at some point (rather sooner
> than later). One of the reasons to make this move is to be able to use
> the Charset variant of String#getBytes() method

Yes, that's definitely easier.

> and to clean up the use of
> various charsets throughout the code base, not just NTLM code.

Not sure that requires Java 1.6.

> Maintaining Java 1.5 compatibility has been getting increasingly
> difficult and increasingly pointless. The question is whether this is
> too late for 4.3 or not.

There's still quite a lot of Java 1.5 out there, so I would suggest
holding off requiring 1.6 until after 4.3.

There are a lot of useful fixes etc in 4.3, so why not make them
available to people still stuck on Java 5?

Also conversion to Java 1.6 requires lots of changes to @Override.
It's more work than might at first appear.
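
The @Override point presumably refers to the rule that javac with -source 1.5 rejects @Override on a method that only implements an interface method, while 1.6 and later accept it. A sketch with made-up types (OverrideDemo, Credentials and NtCredentials are hypothetical, not classes from the code base):

```java
final class OverrideDemo {

    /** Hypothetical interface standing in for any interface in the code base. */
    interface Credentials {
        String getUserName();
    }

    static final class NtCredentials implements Credentials {
        private final String userName;

        NtCredentials(String userName) {
            this.userName = userName;
        }

        // Compiles with -source 1.6 and later. With -source 1.5 this annotation
        // is an error, because the method implements an interface method rather
        // than overriding a superclass method.
        @Override
        public String getUserName() {
            return userName;
        }
    }

    public static void main(String[] args) {
        Credentials c = new NtCredentials("sebb");
        System.out.println(c.getUserName());
    }
}
```

Adding the annotation to every such implementation is mechanical, but it touches a large number of files.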

> Oleg

Re: UnicodeLittleUnmarked or UTF-16LE in NTLM code?

Posted by Oleg Kalnichevski <ol...@apache.org>.

On Mon, 2013-07-08 at 11:10 +0100, sebb wrote:
> The NTLM code uses the charset UnicodeLittleUnmarked a lot.
> 
> The official page:
> 
> http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html
> 
> says they are the same, but different APIs use a different canonical name.
> 
> I assume the methods will therefore take either.
> 
> Might be worth changing to the slightly shorter - but more obviously
> 16 bit - name?
> 
> In any case, extracting as a constant and documenting the choice would
> be a good idea.
> Especially since the code also uses US-ASCII or ASCII sometimes (why?)
> 

Sebastian,

I would like to propose moving to Java 1.6 at some point (rather sooner
than later). One of the reasons to make this move is to be able to use
the Charset variant of the String#getBytes() method and to clean up the
use of various charsets throughout the code base, not just NTLM code.
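
For reference, a sketch of the difference between the two String#getBytes overloads. The class GetBytesVariants and its method names are hypothetical and only serve to put the Java 1.5 and Java 1.6 forms side by side:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class GetBytesVariants {

    /** Looked up once; UTF-16LE is one of the charsets every Java platform must support. */
    static final Charset UTF_16LE = Charset.forName("UTF-16LE");

    /** Java 1.5 style: charset passed by name, checked exception must be handled. */
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-16LE");
        } catch (UnsupportedEncodingException e) {
            // Not expected for a required charset, but the compiler insists.
            throw new IllegalStateException("UTF-16LE not supported", e);
        }
    }

    /** Java 1.6 style: String#getBytes(Charset) declares no checked exception. */
    static byte[] encodeByCharset(String s) {
        return s.getBytes(UTF_16LE);
    }
}
```

Both methods return the same bytes; the Charset overload just removes the try/catch at every call site.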

Maintaining Java 1.5 compatibility has been getting increasingly
difficult and increasingly pointless. The question is whether this is
too late for 4.3 or not.

Oleg
