Posted to dev@directory.apache.org by Stefan Seelmann <se...@apache.org> on 2007/10/31 23:24:18 UTC

Problem with LdapDN class and escaped characters

Hi developers,

I am working on DIRSTUDIO-229, trying to use the LdapDN class from
shared-ldap in Studio. It works almost perfectly, but now I have a
problem with one particular DN.

The attribute type is just 'cn'.
The attribute value contains two characters:
  a) the German umlaut 'Ä', escaped UTF-8 value is '\C3\84'
  b) the plus '+', escaped UTF-8 value is '\2B'

Now I created four LdapDN objects with each combination of escaped or
not escaped characters:
  System.out.println(new LdapDN("cn=Ä\\+"));
  System.out.println(new LdapDN("cn=\\C3\\84\\+"));
  System.out.println(new LdapDN("cn=Ä\\2B"));
  System.out.println(new LdapDN("cn=\\C3\\84\\2B"));

The output is the following:
  cn=\C3\84\+
  cn=\C3\84\+
  cn=\EF\BF\BD\+
  cn=\C3\84\+

As you can see, the third variant is not correct. I also tried to
construct the Java String from a byte array:
  String dn = new String( new byte[] { 'c', 'n', '=', ( byte ) 0xC3,
      ( byte ) 0x84, ( byte ) 0x5C, ( byte ) 0x32, ( byte ) 0x42 }, "UTF-8" );
  System.out.println(new LdapDN(dn));

But the output is the same:
  cn=\EF\BF\BD\+

So my question is: is this a problem in the LdapDN class and parser, or
is this an illegal DN?

Such a DN, with the unescaped 'Ä' and the escaped '+', is returned by
OpenLDAP. Or is that a problem with OpenLDAP?

Kind Regards,
Stefan Seelmann



Re: Problem with LdapDN class and escaped characters

Posted by Emmanuel Lecharny <el...@gmail.com>.
Stefan Seelmann wrote:
> Hi developers,
>   
Hi Stefan !
> I am working on DIRSTUDIO-229, trying to use the LdapDN class from
> shared-ldap in Studio. It works almost perfectly, but now I have a
> problem with one particular DN.
>
> The attribute type is just 'cn'.
> The attribute value contains two characters:
>   a) the German umlaut 'Ä', escaped UTF-8 value is '\C3\84'
>   b) the plus '+', escaped UTF-8 value is '\2B'
>
> Now I created four LdapDN objects with each combination of escaped or
> not escaped characters:
>   System.out.println(new LdapDN("cn=Ä\\+"));
>   
Not legal.
>   System.out.println(new LdapDN("cn=\\C3\\84\\+"));
>   
Legal
>   System.out.println(new LdapDN("cn=Ä\\2B"));
>   
Not legal
>   System.out.println(new LdapDN("cn=\\C3\\84\\2B"));
>   
Legal
> The output is the following:
>   cn=\C3\84\+
>   cn=\C3\84\+
>   cn=\EF\BF\BD\+
>   cn=\C3\84\+
>
> As you can see, the third variant is not correct. I also tried to
> construct the Java String from a byte array:
>   String dn = new String( new byte[] { 'c', 'n', '=', ( byte ) 0xC3,
>       ( byte ) 0x84, ( byte ) 0x5C, ( byte ) 0x32, ( byte ) 0x42 }, "UTF-8" );
>   System.out.println(new LdapDN(dn));
>
> But the output is the same:
>   cn=\EF\BF\BD\+
>
> So my question is: is this a problem in the LdapDN class and parser, or
> is this an illegal DN?
>   
This is an illegal DN.
> Such a DN, with the unescaped 'Ä' and the escaped '+', is returned by
> OpenLDAP. Or is that a problem with OpenLDAP?
>   
RFC 4514 + RFC 4512 :

attributeValue = string / hexstring
string =   [ ( leadchar / pair ) [ *( stringchar / pair ) ( trailchar / pair ) ] ]
leadchar = LUTF1 / UTFMB
pair = ESC ( ESC / special / hexpair )
stringchar = SUTF1 / UTFMB
trailchar  = TUTF1 / UTFMB
LUTF1 = %x01-1F / %x21 / %x24-2A / %x2D-3A / %x3D / %x3F-5B / %x5D-7F
TUTF1 = %x01-1F / %x21 / %x23-2A / %x2D-3A / %x3D / %x3F-5B / %x5D-7F
SUTF1 = %x01-21 / %x23-2A / %x2D-3A / %x3D / %x3F-5B / %x5D-7F
UTFMB   = UTF2 / UTF3 / UTF4
UTF2    = %xC2-DF UTF0
UTF3    = %xE0 %xA0-BF UTF0 / %xE1-EC 2(UTF0) / %xED %x80-9F UTF0 / %xEE-EF 2(UTF0)
UTF4    = %xF0 %x90-BF 2(UTF0) / %xF1-F3 3(UTF0) / %xF4 %x80-8F 2(UTF0)
UTF0    = %x80-BF
ESC     = %x5C ; backslash ("\")
special = escaped / SPACE / SHARP / EQUALS
hexpair = HEX HEX
SPACE   = %x20 ; space (" ")
SHARP   = %x23 ; octothorpe (or sharp sign) ("#")
EQUALS  = %x3D ; equals sign ("=")
escaped = DQUOTE / PLUS / COMMA / SEMI / LANGLE / RANGLE
DQUOTE  = %x22 ; quote (""")
PLUS    = %x2B ; plus sign ("+")
COMMA   = %x2C ; comma (",")
SEMI    = %x3B ; semicolon (";")
LANGLE  = %x3C ; left angle bracket ("<")
RANGLE  = %x3E ; right angle bracket (">")
HEX     = DIGIT / %x41-46 / %x61-66 ; "0"-"9" / "A"-"F" / "a"-"f"
DIGIT   = %x30 / LDIGIT       ; "0"-"9"
LDIGIT  = %x31-39             ; "1"-"9"
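
To make the "pair" rule above concrete, here is a minimal sketch of an
unescaper for the "string" form of an attribute value. It is not the
LdapDN parser: the class and method names are made up for illustration,
the hexstring form is not handled, and nothing is validated beyond a
dangling backslash. The idea is to collect bytes (a hexpair is one raw
byte, an unescaped character contributes its UTF-8 bytes) and to decode
the whole sequence as UTF-8 only at the end.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of shared-ldap.
class DnValueUnescapeSketch {

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Unescapes an attribute value (the part after "cn="), "string" form only.
    static String unescape(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c != '\\') {
                // Unescaped character (SUTF1 or UTFMB): emit its UTF-8 bytes.
                int cp = value.codePointAt(i);
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(utf8, 0, utf8.length);
                i += Character.charCount(cp);
            } else if (i + 2 < value.length()
                    && isHex(value.charAt(i + 1)) && isHex(value.charAt(i + 2))) {
                // pair = ESC hexpair: the two hex digits are one raw byte.
                bytes.write(Integer.parseInt(value.substring(i + 1, i + 3), 16));
                i += 3;
            } else if (i + 1 < value.length()) {
                // pair = ESC ( ESC / special ): the next character stands for itself.
                // The sketch does not check that it really is ESC or a special.
                bytes.write(value.charAt(i + 1));
                i += 2;
            } else {
                throw new IllegalArgumentException("dangling backslash");
            }
        }
        // Decode once, after all escapes have been turned into bytes.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "\u00C4+"; // 'Ä' followed by '+'
        String[] values = { "\u00C4\\+", "\\C3\\84\\+", "\u00C4\\2B", "\\C3\\84\\2B" };
        for (String v : values) {
            // Prints true for each of the four variants.
            System.out.println(unescape(v).equals(expected));
        }
    }
}
```

The literals use \u00C4 instead of a raw 'Ä' so that the result does not
depend on the encoding of the source file.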

1) System.out.println(new LdapDN("cn=Ä\\+")); 
=> Depends on your locale's encoding. The 'Ä' will be encoded in a very
strange way if you use a Japanese locale... The \\+ is legal, though.

2) System.out.println(new LdapDN("cn=\\C3\\84\\+"));
=> a correct encoding of 'Ä+'

3) System.out.println(new LdapDN("cn=Ä\\2B"));
=> Again, it depends on your locale's encoding. \\2B resolves to +.

4) System.out.println(new LdapDN("cn=\\C3\\84\\2B"));
=> Correct encoding

5) String dn = new String( new byte[] { 'c', 'n', '=', ( byte ) 0xC3,
( byte ) 0x84, ( byte ) 0x5C, ( byte ) 0x32, ( byte ) 0x42 }, "UTF-8" );

You have found an error: the way the LdapDN parser unescapes the resulting
string is incorrect. Funnily enough, the following DN is parsed correctly:
String dn = new String( new byte[] { 'c', 'n', '=', ( byte ) 0xC3,
( byte ) 0x84, '\\', '+' }, "UTF-8" );

I'm investigating ...
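
For reference, \EF\BF\BD is the UTF-8 encoding of U+FFFD, the Unicode
replacement character that Java substitutes when it decodes malformed
bytes. The sketch below shows one way such output can arise. It is a
guess for illustration, not a diagnosis of the actual LdapDN code: it
assumes each char is narrowed to a single byte before the bytes are
decoded as UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not code from shared-ldap.
class ReplacementCharSketch {

    // Formats bytes the way the DN output in this thread does, e.g. \EF\BF\BD.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Assumption for illustration only: every char is cut down to one byte,
    // then the byte sequence is decoded as UTF-8.
    static String narrowThenDecode(String s) {
        byte[] narrowed = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            narrowed[i] = (byte) s.charAt(i);
        }
        return new String(narrowed, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The replacement character U+FFFD encoded as UTF-8: prints \EF\BF\BD
        System.out.println(hex("\uFFFD".getBytes(StandardCharsets.UTF_8)));

        // 'Ä' (U+00C4) narrowed to the single byte 0xC4 is a lead byte with no
        // continuation byte after it, so decoding replaces it with U+FFFD.
        String result = narrowThenDecode("\u00C4+");
        // prints \EF\BF\BD\2B
        System.out.println(hex(result.getBytes(StandardCharsets.UTF_8)));
    }
}
```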

Complex, isn't it ? ;)


--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org