You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2016/03/12 13:52:29 UTC
Re: Value processing on the server

Le 12/03/16 10:01, Stefan Seelmann a écrit :
> On 03/11/2016 01:18 AM, Emmanuel Lécharny wrote:
>
> Wow, thanks a lot for the good summary, I needed to read it 3 times :)
>
>> What we do wrong
>> ----------------
>>
>> We will only focus on the schema aware API here. This is what we use on
>> the server side anyway...
>>
>> * First, we are depending on the same API on both side (client and
>> server). This make things more complex, because the context is
>> different. For instance, there is no need to parse the DN on the client,
>> but we still do it.  I'm not sure that we could easily abvoid doing so.
>> To some extent, we are penalizing the client.
> I think it makes sense to parse the DN on the client to ensure that it
> is syntactically correct.
>
> But there are some cases where we should disable it e.g. when doing a
> simple bind to AD other forms like domain\sAMAccountName or
> sAMAccoutName@domain or the GUID are allowed by AD.

Agreed. Or to accept such constructs.
>
> Anyway, I'd keep the current implementation for now and parse the DN on
> the client side.

Note that we still have to parse the DN on the server side. Parsing it
on the client side is just good to avoid sending something wrong on the
server.

>
>> * The most complex situation is when we have to procesds the DN. This is
>> always done in two phases :
>> - slice the DN into RDNs, the RDNs into AVAs containg Values
>> - apply the schema on each value
>>
>> We coulde easily imagine doing the processing in one single pass.
>> Actually, this is an error not to do so : this cost time, and the
>> classes are therfore not immutable.
> Yes, I agree, that would be a huge improvement.
>
>> * One specific problematic point is when we process escaped chars. For
>> instance, something like : 'cn=a\ \ \ b' is just a cn with a value
>> containing 3 spaces. This is what should be returned to the user, and
>> not a value with only one space. *But* we will be able to retrieve this
>> value using one of those filters : (cn=a b) or (cn=a  b) or (cn=
>> a         b). Actually the number of spaces is irrelevant when comparing
>> the value, it's not when it comes to send back the value to the user.
>> Again, it has all to see with the distinction between storing values and
>> comparing values.
>> For filters, we must unescape the String before sending it to the
>> server. The server does not handle the Filter as a String.
> Ok. But the server then still normalizes the value before comaprison,
> right? 
Depends. Comparison is done on normalized/prepared Strings, so it has to
be done. The question is : should we do it as soon as we receive the
value, even if no comparison will be done ? I think so (see my other mail).

> So "a\ \ \ b" is unescaped at the client to "a   b" and sent with
> the 3 spaces to the server.
For filter, yes. For DN, this is not an obligation.

>  Then the server normalizes and prepares the
> value before comparison. 

yes

> And in the server we store the already
> normalized+prepared value so it just can be compared. And then we return
> the original (user provided) value back to the user.
Yes.
>
> I thought we are doing that already. Or is there something missing?

You are'nt missing anything, except that teh String preparation is done
on the fly, instead of being done once and for ever.
>
>> * The PrepareString class needs to be reviewed. We don't handle spaces
>> the way it's supposed to be done.
> Yes. And especially we already call the "unescape()" method during
> normalization which is wrong.
Yes, absolutely.

>
>> Value Class
>> -----------
>>
>> I'm not exactly proud of it. It was a way to avoid having code like :
>>
>>     if ( value instance of String )
>>     {
>>         // This is a String
>>     }
>>     else
>>     {
>>         // This is a byte[]
>>     }
>>
>> so now, we have StringValue and BinaryValue, both of them could be used
>> with an AttributeType when they are SchemaAware. In retrospect, I think
>> the distinction between String and Binary values was an error. We should
>> have a Value, holding both, with a flag in it. Chaning that means we
>> review the entire code, again...
> I think that is not so urgent.

No. But the thing is that Values are used all over (in AVA, in
AttributeTypes, in Filter, etc), so at some point, having a simpler
class would aleviate the work we have to do in these classes.

>
>> Conclusion
>> ==========
>>
>> This is not a pleasant situation. We have some cases where we don't
>> handle things correctly, and this is largely due to some choices made a
>> decade ago. Now, I don't think that this should be kept as is. Sometime
>> a big refactoring is better than patching this and that...
>>
>>
>> Now, feel free to express yourself, I would be vert happy to have your
>> opinion.
> I think the most important thing we have to make is to fix the unescape
> of values and get a release out. I spend more times to explain users
> what is wrong in Studio than fixing stuff ;)
:-) True, very true...
>
> Then I think a next step is to refactor the DN parsing and normalization
> and make Dn/Rdn/Ava immutable.
Totally agree.


>  But that's only the part of the code I
> looked into recently, I cannot say how urgent the other refactorings are.

I won't say the refactoring is really critical, it's just that it could
help. The good point is that we have already a deep knowledge of how
they work, so it shoudl not be that complex. I have conducted an
experiment last week-end : changing the way the simple DN (those parsed
with the FastDnParser) so that the schema is applied immediately. It was
a mtter of 4/5 h to get it working, and it sped up the DN parsing by
around 10%.

We can evaluate teh work as we go...


Thanks Stefan !