Posted to dev@directory.apache.org by Stefan Zoerner <st...@labeo.de> on 2005/09/04 15:09:18 UTC
Normalizer vs. Comparator
Hi all!
I have a question regarding Normalizers (first of all) and Comparators.
Here is the whole story:
I faced the problem that the compare operation does not adhere to the
matching rules. Therefore I successfully modified the CompareHandler
class in org.apache.ldap.server.protocol to do this (whether this is the
best place to fix this problem is not the question here).
It worked better, but not all matching rules satisfied my needs (some
are missing). One of these is telephoneNumberMatch, and I changed
SystemComparatorProducer to replace ComparableComparator with something
that implements the missing matching rule.
Two options here to implement this Comparator:
1. just implement this interface Comparator, call it
TelephoneNumberComparator
2. Create a Normalizer for telephone numbers (removing white space and
hyphens, transform to e.g. lower case), and instantiate a
NormalizingComparator in SystemComparatorProducer which uses it
This leads me (finally) to the question of where normalizers are
intended to be used. I do not want my telephone number to get
"normalized" before storing it, because that would delete the
formatting, which people might like to preserve.
Example: attribute value "0251-123-3333 0" is stored as is, but adding
an attribute value to the entry that matches according to the matching
rule (e.g. "025112333330") is rejected (attribute value in use).
Thanks in advance, Stefan
Btw.: If you advise me on the right thing to do, I'll contribute the
matching rule implementations which are missing. I need them to make my
compare ops work.
Re: Normalizer vs. Comparator
Posted by Stefan Zoerner <st...@labeo.de>.
Alex Karasulu wrote:
> Stefan Zoerner wrote:
>
>> Hi all!
>
>
> Hey sorry for taking so long to respond.
>
No problem Alex. There are currently so many things for me to do ...
>
> Hope this helps,
> Alex
>
Thank you very much for taking the time to describe the relation of
these components and their deeper meaning. And yes, it was very helpful.
I will add some test cases for matching rules in the near future (for
the compare op I already have some). It looks to me that I will be able
to make some of the missing rules work (the non-critical ones), at
least as implemented in the current SystemComparatorProducer.
Stefan.
Re: Normalizer vs. Comparator
Posted by Alex Karasulu <ao...@bellsouth.net>.
Stefan Zoerner wrote:
> Hi all!
Hey sorry for taking so long to respond.
> Here is the whole story:
> I faced the problem that the compare operation does not adhere to the
> matching rules. Therefore I successfully modified the CompareHandler
> class in org.apache.ldap.server.protocol to do this (whether this is
> the best place to fix this problem is not the question here).
OK, some theory behind these constructs might shed some light on what
role they serve in the server.
Most LDAP servers have a means to extend the schema; however, this means
is extremely limited when it comes to defining new Syntaxes or new
MatchingRules. These constructs are often built into the server and
cannot be changed without code changes.
When I started designing the schema subsystem of ApacheDS (still not
finished) I wanted it to be extensible with new Syntaxes and new
MatchingRules. To do this I had to understand the fundamental
components needed to represent new matchingRules and syntaxes. For
syntaxes I created an interface called SyntaxChecker. Every syntax must
have a SyntaxChecker so that the schema subsystem can check attribute
values for proper syntax. This SyntaxChecker can be a simple regex
or an entire parser. As long as the interface is adhered to, the schema
subsystem can use it to determine whether correct values are being used
for attributeTypes based on a schema.
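To make the "simple regex" case concrete, here is a minimal sketch of a
regex-backed checker. The interface name SimpleSyntaxChecker, its single
method, and the phone-like pattern are assumptions made up for this
illustration; the actual ApacheDS SyntaxChecker interface may look
different.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the SyntaxChecker idea; not the ApacheDS API.
interface SimpleSyntaxChecker {
    boolean isValidSyntax(Object value);
}

// Accepts a value only if it is a String that fully matches the regex.
final class RegexSyntaxChecker implements SimpleSyntaxChecker {
    private final Pattern pattern;

    RegexSyntaxChecker(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isValidSyntax(Object value) {
        return value instanceof String
                && pattern.matcher((String) value).matches();
    }
}

class SyntaxCheckerDemo {
    // Assumed pattern: optional '+', one digit, then digits, spaces or
    // hyphens. This is NOT the official telephoneNumber syntax.
    static final SimpleSyntaxChecker PHONE_LIKE =
            new RegexSyntaxChecker("\\+?[0-9][0-9 -]*");

    public static void main(String[] args) {
        System.out.println(PHONE_LIKE.isValidSyntax("0251-123-3333 0"));
        System.out.println(PHONE_LIKE.isValidSyntax("not a number"));
    }
}
```

A full parser could sit behind the same interface; the schema subsystem
would only ever see the boolean answer.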
The other half, dealing with Comparators and Normalizers, is much more
complex, and for this you must really understand what a matchingRule
does. The server uses matching rules to determine equality and
ordering. Before it can do this, string prep (normalization) must be
run on some values to remove the chance for variance to enter the
picture. Hence matchingRules can be broken down into Comparators and
Normalizers. Some may think a Normalizer is syntax specific; however,
how you want to match affects normalization, not the syntax. For
example, if I have an attribute that is a simple string and I want to
perform a case insensitive match, then the normalization differs from
that of a case sensitive match. This shows how normalization is
specific to matching and not just a syntax.
Anyway, Normalizers and Comparators are the basis of matchingRules. A
new matchingRule must have these defined for its OID, as you probably saw.
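The split between the two pieces can be sketched as follows: the same
string syntax yields two different matching behaviours simply by
swapping the normalizer in front of one comparator. ValueNormalizer and
NormalizingStringComparator are hypothetical names for this sketch, not
the ApacheDS Normalizer or NormalizingComparator classes.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical minimal normalizer contract, for illustration only.
interface ValueNormalizer {
    String normalize(String value);
}

// Normalizes both operands, then hands them to a plain comparator.
final class NormalizingStringComparator implements Comparator<String> {
    private final ValueNormalizer normalizer;
    private final Comparator<String> delegate;

    NormalizingStringComparator(ValueNormalizer normalizer,
                                Comparator<String> delegate) {
        this.normalizer = normalizer;
        this.delegate = delegate;
    }

    @Override
    public int compare(String a, String b) {
        return delegate.compare(normalizer.normalize(a),
                                normalizer.normalize(b));
    }
}

class MatchingRuleDemo {
    // Case sensitive match: the normalizer leaves the value untouched.
    static final Comparator<String> CASE_EXACT =
            new NormalizingStringComparator(v -> v,
                                            Comparator.naturalOrder());

    // Case insensitive match: same syntax, different normalization.
    static final Comparator<String> CASE_IGNORE =
            new NormalizingStringComparator(
                    v -> v.toLowerCase(Locale.ROOT),
                    Comparator.naturalOrder());

    public static void main(String[] args) {
        System.out.println(CASE_EXACT.compare("Stefan", "STEFAN") == 0);
        System.out.println(CASE_IGNORE.compare("Stefan", "STEFAN") == 0);
    }
}
```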
> It worked better, but not all matching rules satisfied my needs (some
> are missing).
Yep, we have not really filled in any of these, just some very critical
ones so the directory can operate. We need help in filling these in.
> One of these is telephoneNumberMatch, and I changed
> SystemComparatorProducer to replace ComparableComparator with
> something that implements the missing matching rule.
>
Cool. This is exactly what we need to do.
> Two options here to implement this Comparator:
> 1. just implement this interface Comparator, call it
> TelephoneNumberComparator
> 2. Create a Normalizer for telephone numbers (removing white space and
> hyphens, transform to e.g. lower case), and instantiate a
> NormalizingComparator in SystemComparatorProducer which uses it
>
Right, these would be the two steps to follow: one for the Comparator
and another for the Normalizer.
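As a rough sketch of what that could look like for telephone numbers:
the normalizer below strips whitespace and hyphens, as described in the
quoted proposal, and the comparator compares the normalized forms. The
class and member names are invented for this example, and the exact
normalization required by telephoneNumberMatch should be checked
against the relevant LDAP specification before relying on it.

```java
import java.util.Comparator;

// Hypothetical helper, not an ApacheDS class.
class TelephoneNumberMatch {

    // Removes all whitespace and hyphens; every other character is
    // kept unchanged. Assumed normalization for illustration only.
    static String normalize(String value) {
        return value.replaceAll("[\\s-]", "");
    }

    // Compares two values by their normalized forms. The inputs
    // themselves are never modified.
    static final Comparator<String> COMPARATOR =
            Comparator.comparing(TelephoneNumberMatch::normalize);

    public static void main(String[] args) {
        String stored = "0251-123-3333 0";
        String asserted = "025112333330";
        System.out.println(COMPARATOR.compare(stored, asserted) == 0);
        System.out.println(stored); // formatting is still intact
    }
}
```

Because normalization happens only inside the comparison, the value
handed to the store keeps the formatting the user typed.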
> This leads me (finally) to the question of where normalizers are
> intended to be used. I do not want my telephone number to get "normalized"
> before storing it, because that would delete the formatting, which
> people might like to preserve.
Good question. Let me try to answer this ...
Normalization is critical when attempting to match two values.
Sometimes there is extra white space that can be removed to better
enable correct comparisons. Sometimes normalization is not even
needed, if the syntax is very rigid without any room for case or space
variance. Consider matching for cn=Stefan Zoerner, which is in the
directory (this is what the user who added the entry put as the cn
attribute value). Now another user who is searching for these entries
may ask for cn=STEFAN ZOERNER with 3 spaces between STEFAN and
ZOERNER. The two users may be the same or different users. The second
user should be able to pull the same entries regardless of which
filter he uses below:
(cn=STEFAN   ZOERNER)
(cn= Stefan ZOerner)
(cn=stefan zoerner)
So a normalizer would come into play here by generating a canonical
representation of these inputs. By default ApacheDS case normalizes by
reducing values to lowercase and then comparing the filter string with
the normalized attribute value stored within the directory; this is
only done for matching rules that ignore case. For whitespace
normalization ApacheDS tries to follow the string prep operation
defined in various IETF documents. However, I'm sure we fall short. The
general rule of thumb for ApacheDS is to whitespace normalize while
retaining string tokenization order, meaning we do a deep trim of
values, replacing each run of whitespace with a single space character.
Whitespace on the ends is discarded. This, btw, is only done when space
and whitespace in general are not escaped.
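The deep trim plus case folding described above can be approximated in
a few lines. This is only a sketch under simplifying assumptions: the
class and method names are invented, escaped whitespace is not handled,
and it does not claim to implement the full IETF string prep rules.

```java
import java.util.Locale;

// Hypothetical helper, not the ApacheDS normalizer implementation.
class DeepTrim {

    // Collapses each run of whitespace to one space, drops whitespace
    // at both ends, and lowercases the result. Token order is kept.
    static String deepTrimToLower(String value) {
        return value.replaceAll("\\s+", " ")
                    .trim()
                    .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(deepTrimToLower("STEFAN   ZOERNER"));
        System.out.println(deepTrimToLower(" Stefan ZOerner"));
        System.out.println(deepTrimToLower("stefan zoerner"));
    }
}
```

All three filter values reduce to the same canonical string, so a plain
string comparison after normalization treats them as equal.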
Hope this helps,
Alex