Posted to dev@directory.apache.org by "Alan D. Cabrera" <ad...@toolazydogs.com> on 2005/02/04 16:41:20 UTC
DnNormalizer
Why does it reparse the string when it's normalizing?
Regards,
Alan
Re: DnNormalizer
Posted by Alex Karasulu <ao...@bellsouth.net>.
Emmanuel Lecharny wrote:
>I may have missed something, so the following should be taken as
>no more than my own perception of the problem:
>
>I think that we should consider two cases:
>- values that are sent through PDUs
>- values that are sent through files (LDIF)
>
>The first case does not need normalization: it's already done while
>decoding the PDU.
>
>
When conducting a search, an LDAP server must evaluate a filter
expression composed of assertion value pairs. Filters like
(& (locale= SanTA BaRBara) (OU=Human Resources ) )
need to be evaluated. Regardless of the whitespace or character case
variance in the values provided for these assertions (based on
case-insensitive attributes), the result set should be the same. Backends
usually build indices to rapidly look up entries within the system that
match these assertions. Beyond these there are system indices as well.
When a directory entry is added, any attributes of the entry
corresponding to indexed attributes are normalized based on the schema
associated with the attribute. So an attribute that is case sensitive,
like a UNIX file name, will not have its case normalized, whereas locale
and ou values will be case normalized. So ApacheDS pays the tax of
normalization when performing write-based operations like add,
modify(dn), and delete. This keeps searches fast; after all, LDAP is a
read-optimized store.
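
To make the schema-driven part concrete, here is a minimal sketch, not ApacheDS code. The class name IndexValueNormalizer, the tiny in-memory "schema" and the unixFileName attribute are all assumptions made up for illustration. It lower-cases and whitespace-collapses values of case-insensitive attributes and leaves everything else untouched.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the names and the tiny "schema" are illustrative,
// not actual ApacheDS classes.
class IndexValueNormalizer {
    // Assumed schema: attribute name (lower-cased) -> matching ignores case?
    private final Map<String, Boolean> caseInsensitive = new HashMap<>();

    IndexValueNormalizer() {
        caseInsensitive.put("locale", true);
        caseInsensitive.put("ou", true);
        caseInsensitive.put("unixfilename", false); // made-up case-sensitive attribute
    }

    /** Returns the form of the value that would be written to, or looked up in, an index. */
    String normalize(String attribute, String value) {
        Boolean ignoreCase = caseInsensitive.get(attribute.toLowerCase(Locale.ROOT));
        if (ignoreCase == null || !ignoreCase) {
            // Simplification: case-sensitive and unknown attributes are left as is.
            // A real schema would select a matching rule per attribute.
            return value;
        }
        // Strip the ends, collapse internal whitespace runs, then fold case.
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        IndexValueNormalizer n = new IndexValueNormalizer();
        System.out.println(n.normalize("locale", " SanTA   BaRBara"));
        System.out.println(n.normalize("OU", "Human Resources  "));
        System.out.println(n.normalize("unixFileName", "/home/Alex/README"));
    }
}
```

The same function would be applied both when an entry is written and to the assertion values of an incoming filter, so the two sides meet on the same index key.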
Now for DNs we need to normalize them in a similar fashion and keep
both the user-provided DN, as it was when the entry was added, and the
normalized DN, which is added to a system index for entry addressing.
This way, when scanning the normalized DN index, we do not need to
normalize the values of existing entries, only the DN arriving within a PDU.
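
A rough sketch of that idea, under loud assumptions: DnIndex and normalizeDn are hypothetical names, every attribute is treated as case insensitive, and the splitting ignores escaped or quoted commas, which a real DN parser must handle. The index is keyed by the normalized DN and stores the DN as the user supplied it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a DN index; not the ApacheDS implementation.
class DnIndex {
    // normalized DN -> DN exactly as the user supplied it
    private final Map<String, String> userDnByNormalizedDn = new HashMap<>();

    /**
     * Simplified normalization. Assumes no escaped or quoted commas and
     * treats every attribute value as case insensitive.
     */
    static String normalizeDn(String dn) {
        StringBuilder out = new StringBuilder();
        for (String rdn : dn.split(",")) {
            String[] pair = rdn.split("=", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Malformed RDN: " + rdn);
            }
            String type = pair[0].trim().toLowerCase(Locale.ROOT);
            String value = pair[1].trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(type).append('=').append(value);
        }
        return out.toString();
    }

    /** Write path: pay the normalization cost once, keep the original DN. */
    void add(String userProvidedDn) {
        userDnByNormalizedDn.put(normalizeDn(userProvidedDn), userProvidedDn);
    }

    /** Read path: only the incoming DN is normalized; returns null if absent. */
    String lookup(String incomingDn) {
        return userDnByNormalizedDn.get(normalizeDn(incomingDn));
    }

    public static void main(String[] args) {
        DnIndex index = new DnIndex();
        index.add("uid=akarasulu,ou=Wacky      Users, dc=apache, dc=org");
        System.out.println(index.lookup("UID=akarasulu, ou=wacky users,dc=Apache,dc=org"));
    }
}
```

Stored keys are never re-normalized during a lookup; only the arriving DN goes through normalizeDn.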
>For LDIF files, that's quite different. Spacing should never be a problem.
>The LDAP server should store trimmed values, so there is no difference. A
>space, a tab, and an nbsp are different characters, so they are stored as is.
>If a user sends a space instead of an nbsp, too bad for him! (In modify or
>delete operations, for instance.)
>
>
You always want to keep the data exactly as it was submitted so that you
can return it without modification. However, you obviously have to normalize
it before adding values to indices. The usual rule of thumb with
whitespace normalization is to do a deep trim without changing
token order, unless quotation marks are used to signify literal text.
In the LDAP space people call this the string prep function.
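
As an illustration of what a deep trim could look like, here is a small sketch. It is not the full LDAP string preparation algorithm, and the class and method names are made up. It strips leading and trailing whitespace, collapses internal runs to one space, keeps token order, and copies anything between double quotes verbatim.

```java
// Hypothetical sketch of a "deep trim"; not the complete LDAP string prep.
class DeepTrim {
    static String deepTrim(String input) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;     // inside a "..." literal?
        boolean pendingSpace = false; // saw whitespace that may need emitting
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // An opening quote counts as the start of a token.
                if (!inQuotes && pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                pendingSpace = false;
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes) {
                out.append(c); // literal text is preserved as is
            } else if (Character.isWhitespace(c)) {
                // Note: Character.isWhitespace does not treat the non-breaking
                // space U+00A0 as whitespace, so an nbsp is kept unchanged.
                pendingSpace = true;
            } else {
                // Emit one separator between tokens, none at the very start.
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        return out.toString(); // trailing whitespace is never emitted
    }

    public static void main(String[] args) {
        System.out.println("[" + deepTrim("  Human    Resources  ") + "]");
        System.out.println("[" + deepTrim("a   \"b   c\"   d") + "]");
    }
}
```

Only the indexed copy would go through this; the submitted value is still stored and returned unchanged.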
>There may be only one especially vicious case: an LDAP client that sends a
>request without trimming spaces. (M$ could do that! Embrace and extend
>stuff.) Then you are dead... I don't know if you have to deal with this
>kind of brain-dead client tier?
>
>
Clients can send anything; we must presume this. I did not think
clients were required to normalize things. As a matter of fact, they
should not. User data, including the DN, should be provided as is and
returned as is. For example, if I added an entry and used the following DN
(note the 5 extra whitespace characters between the words 'Wacky' and 'Users')
uid=akarasulu,ou=Wacky      Users, dc=apache, dc=org
then the client should not be changing this. That's the way the DN
might need to appear for some crazy reason. That's what the user may
have wanted. So when we search and return akarasulu's entry, we
should see the DN as it was given to us. However, behind the scenes the
server must normalize this so that a compare on the password of the user entry,
uid=akarasulu,ou=Wacky Users, dc=apache, dc=org
still addresses the right user and returns the correct result. I may be
wrong, but this was my impression. Please, someone double-check this,
because it's been so long. I may have lost my sanity here too.
>Am I a total fool, or just pretending that I'm sane?
>Please feel free to tell me !
>
>
Well, first off, you need to be insane to be here, so no need to talk about
sanity when we're all a little cuckoo.
No need to worry; these are all very good points. Let's keep discussing
them until we all have a better understanding. It will take time for
this stuff to sink in.
Really, the drive for all this craziness is just setting things up
for search: being able to match entries. Everything else is ancillary
and just there as setup for this function, which is the heart of a
directory server. I can't wait to talk to you about the search algorithm
once you spend time in the search engine, where all this normalization
craziness will make more sense.
Hope this helps,
Alex
>Cheers,
>Emmanuel
>
>On Friday, 4 February 2005 at 21:15 -0500, Alex Karasulu wrote:
>
>
>>Alan D. Cabrera wrote:
>>
>>
>>
>>>Why does it reparse the string when it's normalizing?
>>>
>>>
>>The string is reparsed because normalization is not just a matter of
>>handling whitespace. It involves normalizing values so that case and
>>whitespace variance do not affect the outcome of addressing the entry
>>node within the namespace. Things like the attribute schema determine
>>how this is going to happen. However, this might not contradict what you
>>are asking; I just don't have enough info from this one-liner.
>>
>>I think you are referring to when a non-normalized DN (user-provided
>>input) held as an LdapName is converted into a string and then put
>>through the parser again. It might not have to be, if I understand you.
>>
>>Alex
Re: DnNormalizer
Posted by Emmanuel Lecharny <el...@iktek.com>.
I may have missed something, so the following should be taken as
no more than my own perception of the problem:
I think that we should consider two cases:
- values that are sent through PDUs
- values that are sent through files (LDIF)
The first case does not need normalization: it's already done while
decoding the PDU.
For LDIF files, that's quite different. Spacing should never be a problem.
The LDAP server should store trimmed values, so there is no difference. A
space, a tab, and an nbsp are different characters, so they are stored as is.
If a user sends a space instead of an nbsp, too bad for him! (In modify or
delete operations, for instance.)
There may be only one especially vicious case: an LDAP client that sends a
request without trimming spaces. (M$ could do that! Embrace and extend
stuff.) Then you are dead... I don't know if you have to deal with this
kind of brain-dead client tier?
Am I a total fool, or just pretending that I'm sane?
Please feel free to tell me !
Cheers,
Emmanuel
On Friday, 4 February 2005 at 21:15 -0500, Alex Karasulu wrote:
> Alan D. Cabrera wrote:
>
> > Why does it reparse the string when it's normalizing?
>
> The string is reparsed because normalization is not just a matter of
> handling whitespace. It involves normalizing values so that case and
> whitespace variance do not affect the outcome of addressing the entry
> node within the namespace. Things like the attribute schema determine
> how this is going to happen. However, this might not contradict what you
> are asking; I just don't have enough info from this one-liner.
>
> I think you are referring to when a non-normalized DN (user-provided
> input) held as an LdapName is converted into a string and then put
> through the parser again. It might not have to be, if I understand you.
>
> Alex
>
>
Re: DnNormalizer
Posted by Alex Karasulu <ao...@bellsouth.net>.
Alan D. Cabrera wrote:
> Why does it reparse the string when it's normalizing?
The string is reparsed because normalization is not just a matter of
handling whitespace. It involves normalizing values so that case and
whitespace variance do not affect the outcome of addressing the entry
node within the namespace. Things like the attribute schema determine
how this is going to happen. However, this might not contradict what you
are asking; I just don't have enough info from this one-liner.
I think you are referring to when a non-normalized DN (user-provided
input) held as an LdapName is converted into a string and then put
through the parser again. It might not have to be, if I understand you.
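
For illustration only, here is one way normalization could work on the already parsed components instead of going back through a string. This sketch uses the JDK's javax.naming.ldap.LdapName and Rdn, which is an assumption made for the example and not necessarily the LdapName class the project uses. ParsedDnNormalizer is a made-up name, every string value is treated as case insensitive, and multi-valued RDNs are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical sketch built on the JDK's LdapName, not the project's own class.
class ParsedDnNormalizer {
    /** Parses user input once; wraps the checked exception for brevity. */
    static LdapName parse(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Invalid DN: " + dn, e);
        }
    }

    /**
     * Builds a normalized name from the parsed RDNs without a second parse.
     * Assumes single-valued RDNs and case-insensitive string values.
     */
    static LdapName normalize(LdapName parsed) {
        List<Rdn> normalized = new ArrayList<>();
        try {
            for (Rdn rdn : parsed.getRdns()) {
                String type = rdn.getType().toLowerCase(Locale.ROOT);
                Object value = rdn.getValue();
                if (value instanceof String) {
                    value = ((String) value).trim()
                            .replaceAll("\\s+", " ")
                            .toLowerCase(Locale.ROOT);
                }
                // Binary values (byte[]) are passed through unchanged.
                normalized.add(new Rdn(type, value));
            }
        } catch (InvalidNameException e) {
            throw new IllegalStateException("Could not rebuild RDN", e);
        }
        // getRdns() and this constructor use the same component ordering.
        return new LdapName(normalized);
    }

    public static void main(String[] args) {
        LdapName userDn = parse("uid=akarasulu,ou=Wacky      Users, dc=apache, dc=org");
        System.out.println(normalize(userDn));
    }
}
```

The user-provided LdapName stays untouched, so the original DN can still be returned as is while the normalized copy is used for addressing.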
Alex