Posted to dev@directory.apache.org by Radovan Semancik <ra...@evolveum.com.INVALID> on 2019/06/11 14:11:15 UTC

Native AD schema

Hi,

I thought this might be of interest to the community here. I have
created prototype code to process the "native" Active Directory schema
with Directory API:

https://github.com/Evolveum/connector-ldap/blob/feature/ad-native-schema/src/main/java/com/evolveum/polygon/connector/ldap/ad/AdSchemaLoader.java
https://github.com/Evolveum/connector-ldap/blob/feature/ad-native-schema/src/main/java/com/evolveum/polygon/connector/ldap/ad/AdSchemaManager.java

For those of you who are not aware of those AD "peculiarities", AD
provides a somewhat standard LDAP-like schema definition. But this
leaves a lot to be desired. E.g. it does not provide any means to work
with objectCategory, which is quite essential if you want to get any
decent performance out of your AD server. (Un)fortunately, there is
another form of the AD schema which is, quite surprisingly, not standard.
What I have done is to write code that parses that proprietary form of
the schema and loads it into a modified Directory API SchemaManager. Then
the API can work with it in the same way as it works with a standard
LDAP schema.

First tests suggest that this may work. Once I manage to stabilize and
test the code, I will most likely contribute that directly to Directory
API, as this may be useful also for other people. But that will be most
definitely after the 2.0 release of the API.

The current structure of Directory API, especially the schema-processing
parts (SchemaLoader, SchemaManager), is not ideal for this particular
kind of abomination. However, I have managed to make it work with just
a few minor changes in SchemaLoader and SchemaManager. Those classes
deserve a lot of tender loving care. But the current code seems to
work and the rework of SchemaLoader and SchemaManager is currently not
my priority. E.g. the dependency of the API on MINA is something that
bothers me much more.

--
Radovan Semancik
Software Architect
evolveum.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@directory.apache.org
For additional commands, e-mail: dev-help@directory.apache.org

Re: Native AD schema

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 12/06/2019 02:40, Brian Burch wrote:
> On 12/6/19 2:49 am, Emmanuel Lécharny wrote:
>>
>>
>> Can't wait for M$ to stop using Windows and switch to Linux. Even if 
>> I'm pretty sure they are going to Fxxx it up quite a bit.
>>
>> The main issue is the fact the API is asynchronous. It makes 
>> *everything* insanely complex. Anyone claiming concurrent code is 
>> simple is either a genius or an imbecile.
>
> 3. I still have a personal task to convert some of my own 
> infrastructure systems from Netscape LDAP API to Apacheds LDAP API, so 
> perhaps I'll get round to it in 2019? In the meantime, my old LDAP 
> client code plays so nicely with apacheds (of course, along with quite 
> a lot of custom schemas) that the job never gets to the top of my list.

Should you decide to play with this idea in the coming months, we would 
really appreciate your experienced feedback on the API, especially the 
async part. As you probably noticed, we went through some quite 
difficult issues related to the async aspect of the API lately, and some 
review would really help! Sadly, it would also require some analysis of 
MINA...

Anyway, interesting feedback. As we say in French: "Si jeunesse 
savait, si vieillesse pouvait..." (which translates to "If youth 
but knew, and age were able") (hopefully, even if we aren't 
young anymore, we are not old. Yet...)

Re: Native AD schema

Posted by Brian Burch <br...@pingtoo.com>.

On 12/6/19 2:49 am, Emmanuel Lécharny wrote:
> 
> On 11/06/2019 17:43, Radovan Semancik wrote:
>>
>>> I would be interested about having an exhaustive description of the 
>>> differences.
>>
>> E.g. AD objectclass definition for 'person' looks like this:
>>
>>
>> Completely different form. Part of the data is equivalent. But there 
>> are extensions (e.g. showInAdvancedViewOnly). Syntaxes are all 
>> different. Auxiliary object classes are handled in a different way 
>> (they can be "included" in structural object class definition). Lots 
>> of subtle differences there.
> 
> OMFG...
> 
> 
>>
>> But the most annoying is objectCategory. AD obviously cannot properly 
>> index objectClass. Therefore they have invented single-valued 
>> objectCategory. That is the primary reason that I have bothered with 
>> this.
> 
> 
> I have no words.
> 
> 
> Can't wait for M$ to stop using Windows and switch to Linux. Even if I'm 
> pretty sure they are going to Fxxx it up quite a bit.
> 
> The main issue is the fact the API is asynchronous. It makes 
> *everything* insanely complex. Anyone claiming concurrent code is simple 
> is either a genius or an imbecile.

I couldn't resist adding a couple personal comments, but I hope they 
either amuse or interest the readers of this list.

1. As soon as wifi 802.11 WPA arrived on some windows drivers, I was 
paid to write an authentication mechanism between it and a Radius server 
with a Netscape LDAP server back-end. It was a nightmare, of course, but 
I can smugly say it was successful and I got paid. Part of the project 
involved locating the "core" encryption logic and algorithm seed table 
- M$ created both, as well as requiring Radius which used SSL-over-UDP 
(implemented in java by me). To quote Emmanuel, M$ "Fxxxed it up quite a 
bit"!

2. I've been writing multi-threaded code in more languages than I can 
easily remember for most of my professional career. I wish I had 
Emmanuel's neat quote "Anyone claiming concurrent code is simple
is either a genius or an imbecile" to hand when proposing solutions to 
my clients!

3. I still have a personal task to convert some of my own infrastructure 
systems from Netscape LDAP API to Apacheds LDAP API, so perhaps I'll get 
round to it in 2019? In the meantime, my old LDAP client code plays so 
nicely with apacheds (of course, along with quite a lot of custom 
schemas) that the job never gets to the top of my list.

4. I had another $job to interface Netscape LDAP server (by then Oracle, 
or was it Fedora - my memory is hazy) to the first implementation of M$ 
ADS. Yet another nightmare, including a lot of network traces and very 
little documentation. I would not want to return to that project now the 
baby has grown into a monster!

No wonder I only see grey hairs on my head when I look in the mirror!

One of my clients was a full-blooded native American Indian, who loved 
to say "you can always spot the pioneers - they are the ones with arrows 
in their BACKS"!

If anyone was listening... thanks for letting me get slightly off-topic!

Good luck to you all - experience tells me you will need it,

Brian

Re: Native AD schema

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 12/06/2019 08:46, Radovan Semancik wrote:
>
>
> On 11. 6. 2019 18:49, Emmanuel Lécharny wrote:
>> The main issue is the fact the API is asynchronous. It makes 
>> *everything* insanely complex. Anyone claiming concurrent code is 
>> simple is either a genius or an imbecile.
>
> Curious how thin can be a boundary between an imbecile and a genius, 
> isn't it?

Yep. Maybe that is because many people - very smart ones - are 
advocating that async is the new normal, and many imbeciles just trust 
them because they successfully built a 'hello world++' application 
based on events...

But the vast majority of people are just struggling with the concepts, 
and I would guess that 99.99% have never heard of Petri nets... (they 
probably think it's a set of Petri dishes linked together ;-)

>
> However, even if MINA is async in nature, the client is usually using 
> it in a very sync way. 

Yes. And the API wraps all the operations as synchronous on top of 
asynchronous operations. However, as the API implementer, you have to 
take care of things like an Abandon operation that could be issued in the 
middle of another operation. But from the client PoV, asynchronous is 
just a burden, and the API could have been written entirely on top of a 
blocking socket.
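
To illustrate the "synchronous on top of asynchronous" wrapping described above, here is a minimal sketch using only the JDK. It is not the Directory API implementation: AsyncOperation, start(), abandon() and await() are hypothetical names made up for this example. The point it shows is that the blocking call also has to deal with abandoning the request when the caller gives up.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SyncOverAsyncSketch {

    /** Hypothetical handle on an in-flight request; not a Directory API type. */
    interface AsyncOperation<T> {
        CompletableFuture<T> start();   // sends the request, completes when the response arrives
        void abandon();                 // asks the server to drop the request (LDAP Abandon)
    }

    /** Blocks the calling thread until the response arrives or the timeout expires. */
    static <T> T await(AsyncOperation<T> op, Duration timeout) {
        CompletableFuture<T> response = op.start();
        try {
            return response.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            op.abandon();               // nobody is waiting anymore, so tell the server
            throw new IllegalStateException("no response within " + timeout, e);
        } catch (InterruptedException e) {
            op.abandon();
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for response", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("operation failed", e.getCause());
        }
    }
}
```

Even in this toy form, the timeout and interrupt paths are where the complexity hides; a client built directly on a blocking socket would not need the future at all.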

As I said in a previous mail, we need to drink the async Kool-Aid because 
the server is using the API (first to process incoming requests, second 
because we need to talk to other LDAP servers for replication, 
delegated auth, or referral handling). And then it's a very different 
story: ApacheDS wouldn't have existed without NIO, simply because we 
need to handle potentially millions of incoming connections, something 
you can't do with blocking IO.

> Opening connection, waiting, sending request, waiting for response and 
> so on. There is hardly any multiplexing. And if there is, it is not 
> essential for a basic use of the API. I'm thinking about a way how to 
> select sync implementation instead of MINA. 

We started to work on implementing blocking IO in MINA 4 years ago. 
Never completed the work. But yeah, basically, there is no reason a 
network framework couldn't offer such a possibility.

Re: Native AD schema

Posted by Radovan Semancik <ra...@evolveum.com.INVALID>.

On 11. 6. 2019 18:49, Emmanuel Lécharny wrote:
> The main issue is the fact the API is asynchronous. It makes 
> *everything* insanely complex. Anyone claiming concurrent code is simple 
> is either a genius or an imbecile.

Curious how thin can be a boundary between an imbecile and a genius, 
isn't it?

However, even if MINA is async in nature, the client is usually using it 
in a very sync way. Opening a connection, waiting, sending a request, 
waiting for the response and so on. There is hardly any multiplexing. And 
if there is, it is not essential for basic use of the API. I'm thinking 
about a way to select a sync implementation instead of MINA. The 
*Async methods of LdapNetworkConnection won't work (they would throw an 
unsupported operation exception). But they are not needed for simple 
sync use of the API. That may simplify things.
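
A rough sketch of what such a selectable sync implementation could look like, using only the JDK. Everything here is an assumption for illustration: LdapTransportSketch and BlockingSocketTransport are hypothetical names, not existing Directory API types, and the framing helper only reads a single BER element with a one-byte tag and a definite length.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;

/** Hypothetical seam between the API and its network layer. */
interface LdapTransportSketch {
    byte[] exchange(byte[] request) throws IOException;
    CompletableFuture<byte[]> exchangeAsync(byte[] request);
}

/** Plain blocking-socket transport: no selector threads, one request in flight at a time. */
final class BlockingSocketTransport implements LdapTransportSketch {
    private final String host;
    private final int port;
    private Socket socket; // opened lazily on the first exchange

    BlockingSocketTransport(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public synchronized byte[] exchange(byte[] request) throws IOException {
        if (socket == null) {
            socket = new Socket(host, port);
        }
        OutputStream out = socket.getOutputStream();
        out.write(request);
        out.flush();
        return readBerMessage(new DataInputStream(socket.getInputStream()));
    }

    @Override
    public CompletableFuture<byte[]> exchangeAsync(byte[] request) {
        // The sync transport deliberately does not support the async entry points.
        throw new UnsupportedOperationException("async exchange is not available on the blocking transport");
    }

    /** Reads one BER element (one-byte tag, definite length, content) and returns it whole. */
    static byte[] readBerMessage(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        int first = in.readUnsignedByte();
        byte[] header = new byte[6];
        header[0] = (byte) tag;
        header[1] = (byte) first;
        int lengthOctets = 0;
        int length = first;                       // short form: the byte is the length
        if (first >= 0x80) {                      // long form: low 7 bits count the length octets
            lengthOctets = first & 0x7f;
            if (lengthOctets == 0 || lengthOctets > 4) {
                throw new IOException("unsupported BER length encoding");
            }
            length = 0;
            for (int i = 0; i < lengthOctets; i++) {
                int b = in.readUnsignedByte();
                header[2 + i] = (byte) b;
                length = (length << 8) | b;
            }
            if (length < 0) {
                throw new IOException("BER length too large");
            }
        }
        int headerSize = 2 + lengthOctets;
        byte[] pdu = new byte[headerSize + length];
        System.arraycopy(header, 0, pdu, 0, headerSize);
        in.readFully(pdu, headerSize, length);    // blocks until the whole element has arrived
        return pdu;
    }
}
```

This reads exactly one response message per request, which is enough for operations like bind but not for search, where the server sends several messages; a real implementation would loop until the final result message.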

-- 
Radovan Semancik
Software Architect
evolveum.com

Re: Native AD schema

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 11/06/2019 17:43, Radovan Semancik wrote:
>
>> I would be interested about having an exhaustive description of the 
>> differences.
>
> E.g. AD objectclass definition for 'person' looks like this:
>
>
> Completely different form. Part of the data is equivalent. But there 
> are extensions (e.g. showInAdvancedViewOnly). Syntaxes are all 
> different. Auxiliary object classes are handled in a different way 
> (they can be "included" in structural object class definition). Lots 
> of subtle differences there.

OMFG...


>
> But the most annoying is objectCategory. AD obviously cannot properly 
> index objectClass. Therefore they have invented single-valued 
> objectCategory. That is the primary reason that I have bothered with 
> this.


I have no words.


>
> To be completely honest, all of this leaves quite a bad taste. I do 
> not want to go into all the gory details. I think I will keep the 
> implementation to the bare minimum to be able to survive with AD.


Makes sense.


Can't wait for M$ to stop using Windows and switch to Linux. Even if I'm 
pretty sure they are going to Fxxx it up quite a bit.

>
>> AFAICT, you are working in a branch, which is just fine. I don't 
>> think the LDAP API base code will move a lot in the coming weeks, now 
>> that we have cut a release, so you have time to do what you need and 
>> be safe merging all of it back in master.
>
> For now the code is in my AD connector for midPoint. That's where it 
> is easiest for me to have quick trial-and-error cycles, that are so 
> necessary whenever one works with AD. Once I'm OK with the code I'll 
> move it to Directory API. But that will take several weeks/months.

Makes sense.


>
>>
>>> Current structure of Directory API, especially the schema-processing 
>>> parts (SchemaLoader, SchemaManager) is not ideal for this particular 
>>> kind of abominations. 
>>
>> trudat. I saw that you made it extensible, which is a sane decision. 
>> Maybe creating an abstract class in the middle would make sense, I 
>> don't know. Or a factory?
>
> I do not know yet. Honestly, I'm not sure that I completely understand 
> current purpose of SchemaLoader/SchemaManager and how the 
> responsibilities are split between them. E.g. it looks like the design 
> is based on assumption that all the schemas will be in RFC4512 format 
> (or something very similar). Which obviously does not fit AD schemas. 
> But I'm not sure. I think I need more experience with the code to 
> understand it. Maybe more experiments. And I also have almost zero 
> understanding how all of that fits into the server. That is also the 
> reason why I've done only a very minimal changes. I do not want to 
> ruin original design just because I do not understand it yet.

The original idea was to stick to the RFC as much as possible (well, at 
the very beginning it was all about adhering strictly to the RFC, 
something that quickly became too rigid, as you realized).

The second aspect was about being able to modify the schema without 
breaking anything (which means a two-step update: you modify and check, 
and if it does not break anything, then you commit). There are many checks 
being done, according to the relations between the various schema 
objects (e.g. you can't add an OC if the ATs it uses don't exist, etc).
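
The two-step update can be pictured with a toy registry. This is only a sketch of the modify/check/commit idea, not the actual SchemaManager code; the class and method names are hypothetical, and the only consistency rule checked is that an object class may use only attribute types that already exist.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TwoStepSchemaSketch {
    private final Set<String> attributeTypes = new HashSet<>();
    private Map<String, List<String>> objectClasses = new HashMap<>();

    void addAttributeType(String name) {
        attributeTypes.add(name);   // attribute types have no dependencies in this sketch
    }

    /** Returns the list of errors; the object class is registered only if the list is empty. */
    List<String> addObjectClass(String name, List<String> usedAttributeTypes) {
        // Step 1: modify a working copy and check it, leaving the live schema untouched.
        Map<String, List<String>> candidate = new HashMap<>(objectClasses);
        candidate.put(name, new ArrayList<>(usedAttributeTypes));
        List<String> errors = check(attributeTypes, candidate);
        // Step 2: commit only if nothing broke.
        if (errors.isEmpty()) {
            objectClasses = candidate;
        }
        return errors;
    }

    boolean hasObjectClass(String name) {
        return objectClasses.containsKey(name);
    }

    private static List<String> check(Set<String> ats, Map<String, List<String>> ocs) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> oc : ocs.entrySet()) {
            for (String at : oc.getValue()) {
                if (!ats.contains(at)) {
                    errors.add("object class '" + oc.getKey()
                            + "' uses unknown attribute type '" + at + "'");
                }
            }
        }
        return errors;
    }
}
```

Adding 'person' with cn and sn while only cn is registered returns one error and leaves the registry unchanged; once sn is added, the same call succeeds.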

Those were the two drivers for the system.

We added one more: the ability to inject new schema elements using 
a classloader (that was for extending elements like 
syntaxCheckers/normalizers and comparators). Those are not defined in the 
RFC, but we need them in ApacheDS.


>
>> Actually, decoupling the API from MINA will be quite difficult. We 
>> can discuss that further, but the main issues are :
>
> Yes. That is something we need to discuss later. The primary problem 
> with MINA is its complexity. E.g. it makes it very inconvenient to use 
> Directory API in simple apps, because even the basic usage of the API 
> creates a lot of threads. And then it is a complete nightmare to 
> diagnose and fix issues, e.g. the recent issue with the bind. Such 
> things should be easy on the client. But they are not. I understand, 
> that there is a need for some of that complexity on server side. But 
> maybe, if we can make the "networking underside" pluggable, then we 
> can have MINA on the server and something simpler on the client. But 
> that is also something that needs to wait a bit. One problem at a time 
> ...

The main issue is the fact the API is asynchronous. It makes 
*everything* insanely complex. Anyone claiming concurrent code is simple 
is either a genius or an imbecile.

Re: Native AD schema

Posted by Radovan Semancik <ra...@evolveum.com.INVALID>.

Hi,

On 6/11/19 5:06 PM, Emmanuel Lécharny wrote:
>
>> For those of you who are not aware of those AD "peculiarities", AD 
>> provides a somewhat standard LDAP-like schema definition. But this 
>> leaves a lot to be desired. 
>
> Indeed. Like 'userPassword' is single-valued in AD, while it's 
> expected to be MultiValued per the RFCs, and many other atrocities 
> (aka "*Embrace, extend, and extinguish*").

In fact, userPassword is pretty much useless in AD. There is this 
marvelous unicodePwd thing instead.

> I would be interested about having an exhaustive description of the 
> differences.

E.g. AD objectclass definition for 'person' looks like this:

dn: CN=Person,CN=Schema,CN=Configuration,DC=ad,DC=evolveum,DC=com
objectClass: top
objectClass: classSchema
cn: Person
...
subClassOf: top
governsID: 2.5.6.6
mayContain: attributeCertificateAttribute
rDNAttID: cn
showInAdvancedViewOnly: TRUE
adminDisplayName: Person
adminDescription: Person
objectClassCategory: 0
lDAPDisplayName: person
name: Person
objectGUID:: /uA5AkzcX0W3/FG/QH8AFg==
schemaIDGUID:: p3qWv+YN0BGihQCqADBJ4g==
systemOnly: FALSE
systemPossSuperiors: organizationalUnit
systemPossSuperiors: container
systemMayContain: userPassword
systemMayContain: telephoneNumber
systemMayContain: sn
systemMayContain: serialNumber
systemMayContain: seeAlso
systemMustContain: cn
systemFlags: 16
defaultHidingValue: TRUE
objectCategory: CN=Class-Schema,CN=Schema,CN=Configuration,DC=ad,DC=evolveum,DC=com
defaultObjectCategory: CN=Person,CN=Schema,CN=Configuration,DC=ad,DC=evolveum,DC=com
....

Completely different form. Part of the data is equivalent. But there are 
extensions (e.g. showInAdvancedViewOnly). Syntaxes are all different. 
Auxiliary object classes are handled in a different way (they can be 
"included" in structural object class definition). Lots of subtle 
differences there.

But the most annoying is objectCategory. AD obviously cannot properly 
index objectClass. Therefore they have invented single-valued 
objectCategory. That is the primary reason that I have bothered with this.
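
For illustration, this is roughly how objectCategory ends up in a search filter. The sketch only builds the filter string, with RFC 4515 escaping of the values; the class and method names are hypothetical. As far as I know, AD accepts the short class name (e.g. person) in an objectCategory filter and resolves it to the DN form itself, but treat that as an assumption to verify against your server.

```java
final class AdFilterSketch {

    /** Escapes a value for use inside an LDAP search filter (RFC 4515). */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Narrows by the single-valued objectCategory first, then by objectClass. */
    static String byCategoryAndClass(String objectCategory, String objectClass) {
        return "(&(objectCategory=" + escape(objectCategory) + ")"
                + "(objectClass=" + escape(objectClass) + "))";
    }
}
```

For example, byCategoryAndClass("person", "user") produces (&(objectCategory=person)(objectClass=user)). Knowing which objectCategory belongs to which object class is what requires the native schema (the defaultObjectCategory attribute shown above).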

To be completely honest, all of this leaves quite a bad taste. I do not 
want to go into all the gory details. I think I will keep the 
implementation to the bare minimum to be able to survive with AD.

> AFAICT, you are working in a branch, which is just fine. I don't think 
> the LDAP API base code will move a lot in the coming weeks, now that 
> we have cut a release, so you have time to do what you need and be 
> safe merging all of it back in master.

For now the code is in my AD connector for midPoint. That's where it is 
easiest for me to have quick trial-and-error cycles, that are so 
necessary whenever one works with AD. Once I'm OK with the code I'll 
move it to Directory API. But that will take several weeks/months.

>
>> Current structure of Directory API, especially the schema-processing 
>> parts (SchemaLoader, SchemaManager) is not ideal for this particular 
>> kind of abominations. 
>
> trudat. I saw that you made it extensible, which is a sane decision. 
> Maybe creating an abstract class in the middle would make sense, I 
> don't know. Or a factory?

I do not know yet. Honestly, I'm not sure that I completely understand 
the current purpose of SchemaLoader/SchemaManager and how the 
responsibilities are split between them. E.g. it looks like the design 
is based on the assumption that all schemas will be in RFC 4512 format 
(or something very similar), which obviously does not fit AD schemas. 
But I'm not sure. I think I need more experience with the code to 
understand it. Maybe more experiments. And I also have almost zero 
understanding of how all of that fits into the server. That is also the 
reason why I've made only very minimal changes. I do not want to ruin 
the original design just because I do not understand it yet.

> I think that rethinking the design after you are done would benefit 
> from your experience.

Maybe later. Now I feel that my experience is not sufficient for this.

> Actually, decoupling the API from MINA will be quite difficult. We can 
> discuss that further, but the main issues are :

Yes. That is something we need to discuss later. The primary problem 
with MINA is its complexity. E.g. it makes it very inconvenient to use 
Directory API in simple apps, because even the basic usage of the API 
creates a lot of threads. And then it is a complete nightmare to 
diagnose and fix issues, e.g. the recent issue with the bind. Such 
things should be easy on the client. But they are not. I understand 
that there is a need for some of that complexity on the server side. But 
maybe, if we can make the "networking underside" pluggable, then we can 
have MINA on the server and something simpler on the client. But that is 
also something that needs to wait a bit. One problem at a time ...

-- 
Radovan Semancik
Software Architect
evolveum.com

Re: Native AD schema

Posted by Emmanuel Lécharny <el...@gmail.com>.

Hi Radovan,


On 11/06/2019 16:11, Radovan Semancik wrote:
> Hi,
>
> I thought that this may be of interest for the community here. I have 
> created a prototype code to process "native" Active Directory schema 
> with Directory API:
>
> https://github.com/Evolveum/connector-ldap/blob/feature/ad-native-schema/src/main/java/com/evolveum/polygon/connector/ldap/ad/AdSchemaLoader.java 
>
> https://github.com/Evolveum/connector-ldap/blob/feature/ad-native-schema/src/main/java/com/evolveum/polygon/connector/ldap/ad/AdSchemaManager.java 
>


This is much needed. We probably also need the same thing for OpenLDAP, 
as it uses a slightly different way to describe schema (typically, you 
can have macros).

>
> For those of you who are not aware of those AD "peculiarities", AD 
> provides a somewhat standard LDAP-like schema definition. But this 
> leaves a lot to be desired. 

Indeed. For example, 'userPassword' is single-valued in AD, while it's 
expected to be multi-valued per the RFCs, and there are many other 
atrocities (aka "*Embrace, extend, and extinguish*").


> E.g. it does not provide any means to work with objectCategory, which 
> is quite essential if you want to get any decent performance of your 
> AD server. (Un)fortunately, there is another form of AD schema which 
> is, quite surprisingly, not standard. What I have done is to write a 
> code to parse that proprietary form of the schema and load it into 
> modified Directory API SchemaManager. Then the API can work with the 
> same in the same way as it works with standard LDAP schema.


I would be interested in having an exhaustive description of the 
differences.

>
> First tests suggest that this may work. Once I manage to stabilize and 
> test the code, I will most likely contribute that directly to 
> Directory API, as this may be useful also for other people. But that 
> will be most definitely after the 2.0 release of the API.

AFAICT, you are working in a branch, which is just fine. I don't think 
the LDAP API base code will move a lot in the coming weeks, now that we 
have cut a release, so you have time to do what you need and be safe 
merging all of it back in master.


>
> Current structure of Directory API, especially the schema-processing 
> parts (SchemaLoader, SchemaManager) is not ideal for this particular 
> kind of abominations. 


trudat. I saw that you made it extensible, which is a sane decision. 
Maybe creating an abstract class in the middle would make sense, I don't 
know. Or a factory?


> However, I have managed to make it work just with a few minor changes 
> in SchemaLoader and SchemaManager. Those classes would deserve much 
> tender loving care. But the current code seems to work and the rework 
> of SchemaLoader and SchemaManager is currently not my priority.

I think that rethinking the design after you are done would benefit from 
your experience.


> E.g. dependency of the API on MINA is something that bothers me much 
> more.

Actually, decoupling the API from MINA will be quite difficult. We can 
discuss that further, but the main issues are :

- the API is used on both sides (client and server). We could imagine 
having a client based on a synchronous network layer (i.e. a plain socket) 
instead of something based on NIO. That would make the client code much 
simpler

- abstracting an asynchronous layer (be it MINA or Netty or whatever we 
decide to use) would be quite difficult.

- we could also decide to stop using a network framework, and roll our 
own. We don't need all the bells and whistles MINA offers, we just need 
the SSL part. Tomcat did that, to avoid having to depend on such a 
framework for many reasons (mainly because they wanted to be in 
control). However, we are pretty much in control of MINA.


Anyway, this is clearly something we can discuss further.


Thanks Radovan !


